diff --git a/docs/backup-strategy.md b/docs/backup-strategy.md new file mode 100644 index 00000000..2a85a54d --- /dev/null +++ b/docs/backup-strategy.md @@ -0,0 +1,127 @@ +# BrainLayer Backup Strategy + +Status: implemented for daily database snapshots. + +## Decision + +Use SQLite's online backup API, gzip the resulting snapshot, and upload it directly to Google Drive using the existing `~/.config/google-drive-mcp` OAuth credentials. + +Target folder: + +`Brain Drive/06_ARCHIVE/backups/brainlayer-db/YYYY-MM-DD.db.gz` + +Encryption posture: + +Backups are encrypted in transit by HTTPS and at rest by Google's infrastructure. Google holds the provider-side encryption keys. The database can contain user messages, code snippets, file paths, and agent memory, so client-side encryption should be added before upload if the threat model requires protection from the Drive account/provider layer. Recommended upgrade path: encrypt the gzip with `age` or GPG using a key stored in 1Password, then upload `YYYY-MM-DD.db.gz.age` and document the recovery key location. + +Schedule: + +Daily at 03:17 local time via `com.brainlayer.backup-daily`. + +Retention: + +Keep the latest 30 daily snapshots plus the latest snapshot for each of the latest 12 months. + +## Why This Approach + +The database runs in WAL mode and has active writers from BrainBar, enrichment, watch, and maintenance jobs. Copying `brainlayer.db` directly can miss WAL contents or capture an inconsistent file pair. SQLite's online backup API reads through SQLite itself, so the backup is a consistent snapshot without stopping the live services. + +Direct Google Drive API upload is used because the post-repair machine no longer has Google Drive Desktop mounted at the old CloudStorage path. Historical DriveFS logs show the previous path was: + +`~/Library/CloudStorage/GoogleDrive-etanface@gmail.com/My Drive/Brain Drive` + +That mount is not present after repair, and `/Applications/Google Drive.app` is also absent. The API path avoids depending on that local mount. + +## Implementation + +Repo files: + +- `src/brainlayer/backup_daily.py`: creates the SQLite backup, gzips it, uploads it to Drive, and prunes retention. +- `scripts/launchd/backup-daily.sh`: launchd wrapper installed to `~/.local/lib/brainlayer/backup-daily.sh`. +- `scripts/launchd/com.brainlayer.backup-daily.plist`: LaunchAgent template. +- `scripts/launchd/install.sh backup`: installs the wrapper and LaunchAgent. + +Local logs: + +- `~/.local/share/brainlayer/logs/backup-daily.log` +- `~/.local/share/brainlayer/logs/backup-daily.err` + +Manual run: + +```bash +PYTHONPATH=~/Gits/brainlayer/src python3 -m brainlayer.backup_daily +``` + +## Restore Drill + +1. Pick the newest good snapshot from Google Drive: + + `Brain Drive/06_ARCHIVE/backups/brainlayer-db/YYYY-MM-DD.db.gz` + +2. Download it to a local scratch path, for example: + + `/tmp/brainlayer-restore/YYYY-MM-DD.db.gz` + +3. Decompress and verify integrity: + + ```bash + mkdir -p /tmp/brainlayer-restore + gunzip -c /tmp/brainlayer-restore/YYYY-MM-DD.db.gz > /tmp/brainlayer-restore/brainlayer.db + sqlite3 /tmp/brainlayer-restore/brainlayer.db 'PRAGMA integrity_check; SELECT count(*) FROM chunks;' + ``` + +4. 
Stop writers before replacing the live DB: + + ```bash + launchctl unload ~/Library/LaunchAgents/com.brainlayer.brainbar.plist 2>/dev/null || true + launchctl unload ~/Library/LaunchAgents/com.brainlayer.enrichment.plist 2>/dev/null || true + launchctl unload ~/Library/LaunchAgents/com.brainlayer.watch.plist 2>/dev/null || true + launchctl unload ~/Library/LaunchAgents/com.brainlayer.decay.plist 2>/dev/null || true + ``` + +5. Preserve the corrupted DB and install the restored copy: + + ```bash + ts="$(date +%Y%m%d-%H%M%S)" + mkdir -p ~/.local/share/brainlayer/corrupt-$ts + ls -lh ~/.local/share/brainlayer/brainlayer.db ~/.local/share/brainlayer/brainlayer.db-wal ~/.local/share/brainlayer/brainlayer.db-shm 2>/dev/null || true + mv ~/.local/share/brainlayer/brainlayer.db* ~/.local/share/brainlayer/corrupt-$ts/ + ls -lh ~/.local/share/brainlayer/corrupt-$ts/ + cp /tmp/brainlayer-restore/brainlayer.db ~/.local/share/brainlayer/brainlayer.db + ``` + + The `brainlayer.db*` move preserves the main database plus SQLite auxiliary files: `brainlayer.db`, + `brainlayer.db-wal`, and `brainlayer.db-shm`. Verify the `ls` output before and after the move; the + wildcard will also move any other similarly named files in that directory. + +6. Verify the restored DB before re-enabling services: + + ```bash + sqlite3 ~/.local/share/brainlayer/brainlayer.db 'PRAGMA integrity_check; SELECT count(*) FROM chunks;' + ``` + +7. Re-enable services: + + ```bash + launchctl load ~/Library/LaunchAgents/com.brainlayer.brainbar.plist + launchctl load ~/Library/LaunchAgents/com.brainlayer.enrichment.plist + launchctl load ~/Library/LaunchAgents/com.brainlayer.watch.plist + launchctl load ~/Library/LaunchAgents/com.brainlayer.decay.plist + ``` + +8. Run a post-restore WAL checkpoint: + + ```bash + brainlayer wal-checkpoint --mode TRUNCATE + ``` + +## Monthly Drill + +Once per month: + +1. Download the newest snapshot from Drive. +2. Restore it into `/tmp/brainlayer-restore`. +3. Run `PRAGMA integrity_check` and `SELECT count(*) FROM chunks`. +4. Record the snapshot date, chunk count, and command output in the maintenance log. + +Do not replace the live DB during a drill unless the live DB is actually corrupted. 
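+
+## Scripted Drill Sketch
+
+The monthly drill above is currently manual. Below is a minimal Python sketch of steps 1-3, reusing the helpers added in this PR. `drill_restore_check` is an illustrative name, not an existing command; the scratch path mirrors the restore drill, and the Drive query relies on snapshot names being ISO dates so `name desc` sorts newest first:
+
+```python
+# Sketch only: assumes the backup_daily helpers from this PR are importable.
+import gzip
+import shutil
+import sqlite3
+from pathlib import Path
+
+from googleapiclient.http import MediaIoBaseDownload
+
+from brainlayer.backup_daily import (
+    DEFAULT_FOLDER_PARTS,
+    build_drive_service,
+    ensure_drive_folder_chain,
+)
+
+
+def drill_restore_check(scratch: Path = Path("/tmp/brainlayer-restore")) -> None:
+    service = build_drive_service()
+    folder_id = ensure_drive_folder_chain(service, DEFAULT_FOLDER_PARTS)
+    # Snapshot names are ISO dates, so "name desc" puts the newest first.
+    newest = (
+        service.files()
+        .list(
+            q=f"'{folder_id}' in parents and trashed = false",
+            orderBy="name desc",
+            pageSize=1,
+            fields="files(id,name)",
+        )
+        .execute()["files"][0]
+    )
+
+    scratch.mkdir(parents=True, exist_ok=True)
+    gz_path = scratch / newest["name"]
+    with gz_path.open("wb") as fh:
+        downloader = MediaIoBaseDownload(fh, service.files().get_media(fileId=newest["id"]))
+        done = False
+        while not done:
+            _, done = downloader.next_chunk()
+
+    db_path = scratch / "brainlayer.db"
+    with gzip.open(gz_path, "rb") as src, db_path.open("wb") as dst:
+        shutil.copyfileobj(src, dst)
+
+    conn = sqlite3.connect(db_path)
+    try:
+        print("snapshot:", newest["name"])
+        print("integrity:", conn.execute("PRAGMA integrity_check").fetchone()[0])
+        print("chunks:", conn.execute("SELECT count(*) FROM chunks").fetchone()[0])
+    finally:
+        conn.close()
+```
+
+If this gets promoted to a real command, its printed output (snapshot date, integrity result, chunk count) is exactly what step 4 records in the maintenance log.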
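+
+## Client-Side Encryption Sketch
+
+The `age` upgrade path described under "Encryption posture" is not implemented yet. A minimal sketch of the pre-upload hook it would add, assuming the `age` CLI is on PATH and an `AGE_RECIPIENT` public key is exported; both the environment variable and the function name are hypothetical, not current repo configuration:
+
+```python
+# Hypothetical pre-upload hook; AGE_RECIPIENT and encrypt_snapshot_with_age are
+# illustrative names, not part of the current module.
+import os
+import subprocess
+from pathlib import Path
+
+
+def encrypt_snapshot_with_age(snapshot: Path) -> Path:
+    """Encrypt YYYY-MM-DD.db.gz to YYYY-MM-DD.db.gz.age and drop the plaintext."""
+    recipient = os.environ["AGE_RECIPIENT"]  # age1... public key; private key lives in 1Password
+    encrypted = snapshot.with_name(snapshot.name + ".age")
+    subprocess.run(
+        ["age", "--recipient", recipient, "--output", str(encrypted), str(snapshot)],
+        check=True,
+    )
+    snapshot.unlink()  # upload only the encrypted artifact
+    return encrypted
+```
+
+Restores would then need `age --decrypt` with the identity file from 1Password before the `gunzip` step in the restore drill.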
diff --git a/pyproject.toml b/pyproject.toml
index 7bc18dee..27848fad 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -35,6 +35,8 @@ dependencies = [
     "scikit-learn>=1.0.0",  # K-means for cluster-based sampling
     "spacy>=3.7,<4.0",  # NER for PII sanitization (en_core_web_sm model)
     "requests>=2.28.0",  # HTTP calls to Ollama for enrichment
+    "google-api-python-client>=2.0.0",  # Daily DB backup upload to Google Drive
+    "google-auth>=2.0.0",  # OAuth refresh for Google Drive backups
     "ranx>=0.3.20",  # IR evaluation metrics + significance testing
     "abydos>=0.5.0",  # Beider-Morse phonetic matching for cross-script entity aliases
 ]
diff --git a/scripts/launchd/backup-daily.sh b/scripts/launchd/backup-daily.sh
new file mode 100755
index 00000000..8dba1ae3
--- /dev/null
+++ b/scripts/launchd/backup-daily.sh
@@ -0,0 +1,7 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+export PATH="/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:$HOME/.local/bin"
+export PYTHONUNBUFFERED=1
+
+exec "${BRAINLAYER_PYTHON:-python3}" -m brainlayer.backup_daily
diff --git a/scripts/launchd/com.brainlayer.backup-daily.plist b/scripts/launchd/com.brainlayer.backup-daily.plist
new file mode 100644
index 00000000..3f36443f
--- /dev/null
+++ b/scripts/launchd/com.brainlayer.backup-daily.plist
@@ -0,0 +1,43 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+    <key>Label</key>
+    <string>com.brainlayer.backup-daily</string>
+
+    <key>ProgramArguments</key>
+    <array>
+        <string>__HOME__/.local/lib/brainlayer/backup-daily.sh</string>
+    </array>
+
+    <key>StartCalendarInterval</key>
+    <dict>
+        <key>Hour</key>
+        <integer>3</integer>
+        <key>Minute</key>
+        <integer>17</integer>
+    </dict>
+
+    <key>StandardOutPath</key>
+    <string>__HOME__/.local/share/brainlayer/logs/backup-daily.log</string>
+    <key>StandardErrorPath</key>
+    <string>__HOME__/.local/share/brainlayer/logs/backup-daily.err</string>
+
+    <key>EnvironmentVariables</key>
+    <dict>
+        <key>PYTHONPATH</key>
+        <string>__BRAINLAYER_DIR__/src</string>
+        <key>BRAINLAYER_BACKUP_DRIVE_FOLDER</key>
+        <string>Brain Drive/06_ARCHIVE/backups/brainlayer-db</string>
+    </dict>
+
+    <key>Nice</key>
+    <integer>15</integer>
+
+    <key>ProcessType</key>
+    <string>Background</string>
+</dict>
+</plist>
diff --git a/scripts/launchd/install.sh b/scripts/launchd/install.sh
index 205bcf4a..48a9c191 100755
--- a/scripts/launchd/install.sh
+++ b/scripts/launchd/install.sh
@@ -9,6 +9,7 @@
 #   ./scripts/launchd/install.sh load enrichment
 #   ./scripts/launchd/install.sh unload enrichment
 #   ./scripts/launchd/install.sh checkpoint   # Install WAL checkpoint only
+#   ./scripts/launchd/install.sh backup       # Install daily DB backup only
 #   ./scripts/launchd/install.sh remove       # Unload and remove all
 
 set -euo pipefail
@@ -16,6 +17,7 @@ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
 LAUNCH_DIR="$HOME/Library/LaunchAgents"
 LOG_DIR="$HOME/Library/Logs"
 BRAINLAYER_LOG_DIR="$HOME/.local/share/brainlayer/logs"
+BRAINLAYER_LIB_DIR="$HOME/.local/lib/brainlayer"
 BRAINLAYER_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)"
 BRAINLAYER_BIN="${BRAINLAYER_BIN:-$(which brainlayer 2>/dev/null || echo "$HOME/.local/bin/brainlayer")}"
 GOOGLE_API_KEY="${GOOGLE_API_KEY:-}"
@@ -27,7 +29,7 @@ if [ ! -x "$BRAINLAYER_BIN" ]; then
     exit 1
 fi
 
-mkdir -p "$LAUNCH_DIR" "$LOG_DIR" "$BRAINLAYER_LOG_DIR"
+mkdir -p "$LAUNCH_DIR" "$LOG_DIR" "$BRAINLAYER_LOG_DIR" "$BRAINLAYER_LIB_DIR"
 
 resolve_google_api_key() {
     if [ -n "${GOOGLE_API_KEY:-}" ]; then
@@ -95,6 +97,20 @@ install_plist() {
     load_plist "$name"
 }
 
+install_backup_script() {
+    local src="$SCRIPT_DIR/backup-daily.sh"
+    local dst="$BRAINLAYER_LIB_DIR/backup-daily.sh"
+
+    if [ ! 
-f "$src" ]; then + echo "ERROR: $src not found" + return 1 + fi + + cp "$src" "$dst" + chmod 755 "$dst" + echo "Installed: $dst" +} + remove_plist() { local name="$1" local dst="$LAUNCH_DIR/com.brainlayer.${name}.plist" @@ -127,11 +143,17 @@ case "${1:-all}" in checkpoint) install_plist wal-checkpoint ;; + backup) + install_backup_script + install_plist backup-daily + ;; all) install_plist index install_plist enrichment install_plist decay install_plist wal-checkpoint + install_backup_script + install_plist backup-daily # Remove old enrich plist if present remove_plist enrich 2>/dev/null || true ;; @@ -141,9 +163,11 @@ case "${1:-all}" in remove_plist enrichment 2>/dev/null || true remove_plist decay 2>/dev/null || true remove_plist wal-checkpoint + remove_plist backup-daily 2>/dev/null || true + rm -f "$BRAINLAYER_LIB_DIR/backup-daily.sh" ;; *) - echo "Usage: $0 [index|enrich|enrichment|decay|load [name]|unload [name]|checkpoint|all|remove]" + echo "Usage: $0 [index|enrich|enrichment|decay|load [name]|unload [name]|checkpoint|backup|all|remove]" exit 1 ;; esac diff --git a/src/brainlayer/backup_daily.py b/src/brainlayer/backup_daily.py new file mode 100644 index 00000000..98afb66d --- /dev/null +++ b/src/brainlayer/backup_daily.py @@ -0,0 +1,355 @@ +"""Daily BrainLayer database backups. + +The backup path intentionally uses SQLite's online backup API instead of copying +the database file directly, so live WAL writes are folded into a consistent +snapshot without stopping BrainBar or the enrichment jobs. +""" + +from __future__ import annotations + +import datetime as dt +import fcntl +import gzip +import json +import os +import shutil +import sqlite3 +import tempfile +import time +import traceback +from pathlib import Path +from typing import Any + +import requests + +from .paths import get_db_path + +DEFAULT_TOKEN_PATH = Path.home() / ".config" / "google-drive-mcp" / "tokens.json" +DEFAULT_CLIENT_PATH = Path.home() / ".config" / "google-drive-mcp" / "gcp-oauth.keys.json" +DEFAULT_FOLDER_PARTS = ["Brain Drive", "06_ARCHIVE", "backups", "brainlayer-db"] +DEFAULT_STAGING_DIR = Path.home() / ".local" / "share" / "brainlayer" / "backups" +DRIVE_FOLDER_MIME = "application/vnd.google-apps.folder" +DRIVE_SCOPES = ["https://www.googleapis.com/auth/drive"] + + +def _today() -> str: + return dt.datetime.now(dt.UTC).date().isoformat() + + +def _escape_drive_query_value(value: str) -> str: + return value.replace("\\", "\\\\").replace("'", "\\'") + + +def create_sqlite_backup_gzip(db_path: Path, output_dir: Path, date_stamp: str | None = None) -> Path: + """Create a restorable `.db.gz` snapshot using SQLite's online backup API.""" + db_path = Path(db_path).expanduser() + output_dir = Path(output_dir).expanduser() + date_stamp = date_stamp or _today() + + if not db_path.exists(): + raise FileNotFoundError(f"BrainLayer database not found: {db_path}") + + output_dir.mkdir(parents=True, exist_ok=True) + required_bytes = (db_path.stat().st_size * 2) + (512 * 1024 * 1024) + free_bytes = shutil.disk_usage(output_dir).free + if free_bytes < required_bytes: + raise RuntimeError( + f"Insufficient free space for backup in {output_dir}: " + f"{free_bytes} bytes free, {required_bytes} bytes required" + ) + final_gz = output_dir / f"{date_stamp}.db.gz" + + with tempfile.TemporaryDirectory(prefix="brainlayer-backup-", dir=output_dir) as tmp: + raw_snapshot = Path(tmp) / f"{date_stamp}.db" + source = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True, timeout=60) + target = sqlite3.connect(raw_snapshot) + try: + 
source.backup(target, pages=10_000, sleep=0.1) + target.execute("PRAGMA wal_checkpoint(TRUNCATE)") + integrity = target.execute("PRAGMA integrity_check").fetchone() + if not integrity or integrity[0] != "ok": + raise RuntimeError(f"Backup integrity check failed: {integrity!r}") + finally: + target.close() + source.close() + + temp_gz = Path(tmp) / final_gz.name + with raw_snapshot.open("rb") as src, gzip.open(temp_gz, "wb", compresslevel=6) as dst: + shutil.copyfileobj(src, dst, length=1024 * 1024) + shutil.move(str(temp_gz), final_gz) + + return final_gz + + +def _atomic_write_text(path: Path, content: str) -> None: + temp_path = path.with_name(f".{path.name}.{os.getpid()}.tmp") + try: + temp_path.write_text(content) + os.replace(temp_path, path) + finally: + temp_path.unlink(missing_ok=True) + + +def get_drive_credentials(token_path: Path = DEFAULT_TOKEN_PATH, client_path: Path = DEFAULT_CLIENT_PATH): + """Load and refresh Google Drive OAuth credentials from the existing MCP auth files.""" + from google.auth.transport.requests import Request + from google.oauth2.credentials import Credentials + + token_path = Path(token_path).expanduser() + client_path = Path(client_path).expanduser() + if not token_path.exists(): + raise FileNotFoundError(f"Google Drive token file not found: {token_path}") + if not client_path.exists(): + raise FileNotFoundError(f"Google OAuth client file not found: {client_path}") + + lock_path = token_path.with_suffix(token_path.suffix + ".lock") + with lock_path.open("w") as lock_file: + fcntl.flock(lock_file, fcntl.LOCK_EX) + token_data = json.loads(token_path.read_text()) + client_data = json.loads(client_path.read_text())["installed"] + + expiry = token_data.get("expiry") + if not expiry and token_data.get("expiry_date"): + expiry = dt.datetime.fromtimestamp(int(token_data["expiry_date"]) / 1000, tz=dt.UTC).isoformat() + + parsed_expiry = dt.datetime.fromisoformat(expiry.replace("Z", "+00:00")) if expiry else None + if parsed_expiry and parsed_expiry.tzinfo: + parsed_expiry = parsed_expiry.astimezone(dt.UTC).replace(tzinfo=None) + elif parsed_expiry: + parsed_expiry = parsed_expiry.replace(tzinfo=None) + + creds = Credentials( + token=token_data.get("access_token"), + refresh_token=token_data.get("refresh_token"), + token_uri=client_data["token_uri"], + client_id=client_data["client_id"], + client_secret=client_data["client_secret"], + scopes=token_data.get("scope", " ".join(DRIVE_SCOPES)).split(), + expiry=parsed_expiry, + ) + + # google-auth Credentials.expired compares against a naive UTC helper, so keep expiry comparisons naive UTC. 
+ refresh_before = dt.datetime.now(dt.UTC).replace(tzinfo=None) + dt.timedelta(hours=2) + if creds.expired or not creds.valid or (creds.expiry and creds.expiry < refresh_before): + creds.refresh(Request()) + token_data["access_token"] = creds.token + token_data["expiry"] = creds.expiry.isoformat() if creds.expiry else None + _atomic_write_text(token_path, json.dumps(token_data, indent=2, sort_keys=True) + "\n") + + return creds + + +def build_drive_service(token_path: Path = DEFAULT_TOKEN_PATH, client_path: Path = DEFAULT_CLIENT_PATH): + from googleapiclient.discovery import build + + return build("drive", "v3", credentials=get_drive_credentials(token_path, client_path)) + + +def ensure_drive_folder(service: Any, name: str, parent_id: str | None = None) -> str: + escaped = _escape_drive_query_value(name) + clauses = [ + f"name = '{escaped}'", + f"mimeType = '{DRIVE_FOLDER_MIME}'", + "trashed = false", + ] + if parent_id: + clauses.append(f"'{parent_id}' in parents") + query = " and ".join(clauses) + + result = ( + service.files() + .list(q=query, spaces="drive", fields="files(id,name)", pageSize=10, supportsAllDrives=True) + .execute() + ) + files = result.get("files", []) + if files: + return files[0]["id"] + + metadata: dict[str, Any] = {"name": name, "mimeType": DRIVE_FOLDER_MIME} + if parent_id: + metadata["parents"] = [parent_id] + created = service.files().create(body=metadata, fields="id", supportsAllDrives=True).execute() + return created["id"] + + +def ensure_drive_folder_chain(service: Any, folder_parts: list[str]) -> str: + parent_id = None + for part in folder_parts: + parent_id = ensure_drive_folder(service, part, parent_id) + if parent_id is None: + raise ValueError("folder_parts must not be empty") + return parent_id + + +def upload_file_to_drive_raw( + file_path: Path, + folder_id: str, + credentials: Any, + chunk_size: int = 8 * 1024 * 1024, + max_attempts: int = 30, +) -> dict[str, Any]: + """Upload large backups with Drive's raw resumable protocol.""" + file_path = Path(file_path) + total = file_path.stat().st_size + metadata = {"name": file_path.name, "parents": [folder_id]} + init = requests.post( + "https://www.googleapis.com/upload/drive/v3/files?uploadType=resumable&supportsAllDrives=true&fields=id,name,size", + headers={ + "Authorization": f"Bearer {credentials.token}", + "Content-Type": "application/json; charset=UTF-8", + "X-Upload-Content-Type": "application/gzip", + "X-Upload-Content-Length": str(total), + }, + data=json.dumps(metadata), + timeout=60, + ) + init.raise_for_status() + upload_url = init.headers["Location"] + + sent = 0 + with file_path.open("rb") as handle: + while sent < total: + handle.seek(sent) + expected = min(chunk_size, total - sent) + chunk = handle.read(expected) + if len(chunk) != expected: + raise RuntimeError(f"Backup file changed during upload: expected {expected} bytes, got {len(chunk)}") + start = sent + end = sent + len(chunk) - 1 + headers = { + "Authorization": f"Bearer {credentials.token}", + "Content-Length": str(len(chunk)), + "Content-Range": f"bytes {start}-{end}/{total}", + } + for attempt in range(1, max_attempts + 1): + try: + response = requests.put(upload_url, headers=headers, data=chunk, timeout=120) + if response.status_code in {200, 201}: + return response.json() + if response.status_code == 308: + uploaded_range = response.headers.get("Range") + if uploaded_range and "-" in uploaded_range: + sent = int(uploaded_range.rsplit("-", 1)[1]) + 1 + else: + sent = end + 1 + print(f"drive upload progress: {sent}/{total} 
bytes", flush=True) + break + if response.status_code in {429, 500, 502, 503, 504}: + raise RuntimeError(f"retryable HTTP {response.status_code}: {response.text[:200]}") + response.raise_for_status() + except Exception as exc: + if attempt >= max_attempts: + raise + sleep_seconds = min(60, 2 ** min(attempt, 6)) + print( + f"drive upload retry chunk={start}-{end} attempt={attempt}/{max_attempts}: {exc}; " + f"sleeping {sleep_seconds}s", + flush=True, + ) + time.sleep(sleep_seconds) + + raise RuntimeError("Drive upload ended without final response") + + +def _parse_snapshot_date(name: str) -> dt.date | None: + if not name.endswith(".db.gz"): + return None + try: + return dt.date.fromisoformat(name[:10]) + except ValueError: + return None + + +def prune_drive_backups(service: Any, folder_parts: list[str] = DEFAULT_FOLDER_PARTS) -> list[str]: + """Keep 30 latest daily snapshots plus latest snapshot for each of 12 months.""" + folder_id = ensure_drive_folder_chain(service, folder_parts) + files: list[dict[str, str]] = [] + page_token = None + while True: + result = ( + service.files() + .list( + q=f"'{folder_id}' in parents and trashed = false", + spaces="drive", + fields="nextPageToken,files(id,name)", + pageSize=1000, + pageToken=page_token, + supportsAllDrives=True, + ) + .execute() + ) + files.extend(result.get("files", [])) + page_token = result.get("nextPageToken") + if not page_token: + break + dated = [] + for item in files: + parsed = _parse_snapshot_date(item.get("name", "")) + if parsed: + dated.append((parsed, item)) + + dated.sort(key=lambda pair: pair[0], reverse=True) + keep_ids = {item["id"] for _, item in dated[:30]} + + months_seen: set[tuple[int, int]] = set() + for snapshot_date, item in dated: + month = (snapshot_date.year, snapshot_date.month) + if month in months_seen: + continue + if len(months_seen) >= 12: + continue + months_seen.add(month) + keep_ids.add(item["id"]) + + deleted: list[str] = [] + for _, item in dated: + if item["id"] in keep_ids: + continue + service.files().delete(fileId=item["id"], supportsAllDrives=True).execute() + deleted.append(item["name"]) + return deleted + + +def run_backup( + db_path: Path | None = None, + staging_dir: Path = DEFAULT_STAGING_DIR, + folder_parts: list[str] = DEFAULT_FOLDER_PARTS, + date_stamp: str | None = None, + upload: bool = True, +) -> dict[str, Any]: + snapshot = create_sqlite_backup_gzip(db_path or get_db_path(), staging_dir, date_stamp=date_stamp) + result: dict[str, Any] = { + "db": str(db_path or get_db_path()), + "snapshot": str(snapshot), + "bytes": snapshot.stat().st_size, + "uploaded": False, + } + if upload: + credentials = get_drive_credentials() + service = build_drive_service() + folder_id = ensure_drive_folder_chain(service, folder_parts) + uploaded = upload_file_to_drive_raw(snapshot, folder_id, credentials) + deleted = prune_drive_backups(service, folder_parts=folder_parts) + result.update({"uploaded": True, "drive_file": uploaded, "retention_deleted": deleted}) + return result + + +def main() -> int: + try: + result = run_backup( + staging_dir=Path(os.environ.get("BRAINLAYER_BACKUP_STAGING_DIR", str(DEFAULT_STAGING_DIR))), + # Prefer BRAINLAYER_BACKUP_DRIVE_FOLDER; BRAINLAYER_BACKUP_DRIVE_PATH is a legacy alias before DEFAULT_FOLDER_PARTS. 
+ folder_parts=os.environ.get( + "BRAINLAYER_BACKUP_DRIVE_FOLDER", + os.environ.get("BRAINLAYER_BACKUP_DRIVE_PATH", "/".join(DEFAULT_FOLDER_PARTS)), + ).split("/"), + ) + except Exception as exc: + print(f"brainlayer backup failed: {exc}\n{traceback.format_exc()}", flush=True) + return 1 + print(json.dumps(result, sort_keys=True), flush=True) + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/tests/test_backup_daily.py b/tests/test_backup_daily.py new file mode 100644 index 00000000..54422518 --- /dev/null +++ b/tests/test_backup_daily.py @@ -0,0 +1,115 @@ +import gzip +import sqlite3 +from pathlib import Path + +import pytest + + +def test_create_snapshot_gzip_is_restorable(tmp_path): + from brainlayer.backup_daily import create_sqlite_backup_gzip + + source = tmp_path / "brainlayer.db" + conn = sqlite3.connect(source) + journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0] + assert journal_mode.upper() == "WAL" + conn.execute("CREATE TABLE chunks (id TEXT PRIMARY KEY, content TEXT)") + conn.execute("INSERT INTO chunks VALUES ('c1', 'hello')") + conn.commit() + conn.close() + + out_dir = tmp_path / "out" + snapshot = create_sqlite_backup_gzip(source, out_dir, date_stamp="2026-05-13") + + assert snapshot == out_dir / "2026-05-13.db.gz" + assert snapshot.exists() + + restored = tmp_path / "restored.db" + with gzip.open(snapshot, "rb") as src, restored.open("wb") as dst: + dst.write(src.read()) + + restored_conn = sqlite3.connect(restored) + try: + assert restored_conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok" + assert restored_conn.execute("SELECT content FROM chunks WHERE id = 'c1'").fetchone()[0] == "hello" + finally: + restored_conn.close() + + +def test_create_snapshot_rejects_low_disk_space(tmp_path, monkeypatch): + from brainlayer import backup_daily + + source = tmp_path / "brainlayer.db" + conn = sqlite3.connect(source) + conn.execute("CREATE TABLE chunks (id TEXT PRIMARY KEY, content TEXT)") + conn.commit() + conn.close() + + class LowDisk: + free = 1 + + monkeypatch.setattr(backup_daily.shutil, "disk_usage", lambda _path: LowDisk()) + + with pytest.raises(RuntimeError, match="Insufficient free space"): + backup_daily.create_sqlite_backup_gzip(source, tmp_path / "out", date_stamp="2026-05-13") + + +def test_ensure_drive_folder_chain_creates_missing_folders(): + from brainlayer.backup_daily import ensure_drive_folder_chain + + class FakeExecute: + def __init__(self, value): + self.value = value + + def execute(self): + return self.value + + class FakeFiles: + def __init__(self): + self.created = [] + + def list(self, **kwargs): + query = kwargs["q"] + if "name = 'Brain Drive'" in query: + return FakeExecute({"files": [{"id": "brain-drive"}]}) + return FakeExecute({"files": []}) + + def create(self, body, fields=None, **kwargs): # noqa: ARG002 + folder_id = f"folder-{body['name']}" + self.created.append((body["name"], body["parents"][0])) + return FakeExecute({"id": folder_id}) + + class FakeService: + def __init__(self): + self._files = FakeFiles() + + def files(self): + return self._files + + service = FakeService() + + result = ensure_drive_folder_chain( + service, + ["Brain Drive", "06_ARCHIVE", "backups", "brainlayer-db"], + ) + + assert result == "folder-brainlayer-db" + assert ("06_ARCHIVE", "brain-drive") in service.files().created + assert ("backups", "folder-06_ARCHIVE") in service.files().created + assert ("brainlayer-db", "folder-backups") in service.files().created + + +def 
test_launchd_installer_knows_backup_target():
+    install_path = Path("scripts/launchd/install.sh")
+    plist_path = Path("scripts/launchd/com.brainlayer.backup-daily.plist")
+
+    assert install_path.is_file(), f"Installer not found at {install_path}; check test working directory"
+    assert plist_path.is_file(), f"Backup plist not found at {plist_path}; check launchd template is committed"
+
+    install = install_path.read_text()
+    plist = plist_path.read_text()
+
+    assert "backup-daily" in install
+    assert "install_backup_script" in install
+    assert "com.brainlayer.backup-daily" in plist
+    assert "<integer>3</integer>" in plist
+    assert "<integer>17</integer>" in plist
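+
+
+# Sketch (hedged): the retention rule from docs/backup-strategy.md -- keep the 30
+# newest dailies plus the newest snapshot of each recent month -- can be unit-tested
+# with the same fake-service pattern used above. The fake implements only the calls
+# prune_drive_backups actually makes; all names here are illustrative.
+def test_prune_retention_keeps_dailies_and_month_tops():
+    import datetime as dt
+
+    from brainlayer.backup_daily import prune_drive_backups
+
+    class FakeExecute:
+        def __init__(self, value):
+            self.value = value
+
+        def execute(self):
+            return self.value
+
+    class FakeFiles:
+        def __init__(self, snapshots):
+            self.snapshots = snapshots
+
+        def list(self, **kwargs):
+            if "mimeType" in kwargs["q"]:
+                # Folder lookups from ensure_drive_folder_chain all resolve.
+                return FakeExecute({"files": [{"id": "folder", "name": "folder"}]})
+            return FakeExecute({"files": self.snapshots})
+
+        def delete(self, fileId, **kwargs):  # noqa: ARG002, N803 - Drive API spelling
+            return FakeExecute({})
+
+    class FakeService:
+        def __init__(self, snapshots):
+            self._files = FakeFiles(snapshots)
+
+        def files(self):
+            return self._files
+
+    # 60 consecutive dailies: 2026-01-01 .. 2026-03-01.
+    start = dt.date(2026, 1, 1)
+    snapshots = [
+        {"id": f"id-{i}", "name": f"{start + dt.timedelta(days=i)}.db.gz"}
+        for i in range(60)
+    ]
+
+    deleted = prune_drive_backups(FakeService(snapshots))
+
+    # The 30 newest snapshots (2026-01-31 .. 2026-03-01) already include each
+    # month's newest day, so exactly the 30 oldest January dailies are pruned.
+    assert len(deleted) == 30
+    assert "2026-01-30.db.gz" in deleted
+    assert "2026-01-31.db.gz" not in deleted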