Add daily BrainLayer DB backups #280
Conversation
@codex review
@coderabbitai review
📝 Walkthrough
This PR adds a complete daily database backup system that captures SQLite snapshots, compresses them with gzip, uploads them to Google Drive with a resumable protocol and retries, maintains a 30-daily-plus-monthly retention policy, and integrates with a macOS LaunchAgent for automatic scheduling at 03:17 UTC daily.
Changes: Database Backup System
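As a rough illustration of the snapshot-and-compress step the walkthrough describes, here is a minimal sketch using SQLite's online backup API and gzip (the function name, arguments, and file layout are illustrative, not the PR's exact `backup_daily.py` API):

```python
import gzip
import shutil
import sqlite3
from pathlib import Path


def snapshot_and_gzip(db_path: Path, staging_dir: Path, date_stamp: str) -> Path:
    """Take a consistent online snapshot of a live SQLite DB, then gzip it (sketch)."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    raw_snapshot = staging_dir / f"{date_stamp}.db"
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(raw_snapshot)
    try:
        src.backup(dst)  # online backup API: consistent even while the live DB is in use
    finally:
        dst.close()
        src.close()
    gz_path = staging_dir / f"{date_stamp}.db.gz"
    with raw_snapshot.open("rb") as fin, gzip.open(gz_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    raw_snapshot.unlink()  # keep only the compressed artifact in staging
    return gz_path
```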
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ Passed checks (5 passed)
✅ Actions performed: Review triggered.
Based on my review of the PR "Add daily BrainLayer DB backups", I've identified several issues ranging from critical to minor. Here's my comprehensive analysis:
🔴 Critical Issues
1. Token Refresh Race Condition (
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0f0f0739d1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```sh
export PATH="/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:$HOME/.local/bin"
export PYTHONUNBUFFERED=1

exec "${BRAINLAYER_PYTHON:-python3}" -m brainlayer.backup_daily
```
Use the installed Python for backups
When BrainLayer is installed in a venv, pipx, or any non-default interpreter, the existing launchd jobs still work because their plists execute __BRAINLAYER_BIN__, but this wrapper ignores that configured executable and falls back to launchd's python3. In that setup the scheduled backup can fail at import time for project or Google Drive dependencies even though scripts/launchd/install.sh successfully found a working BrainLayer install; the backup job should run with the same interpreter/environment as the installed CLI or have the installer populate BRAINLAYER_PYTHON.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/backup-strategy.md`:
- Line 87: The mv command shown uses the glob brainlayer.db* which will move the
main SQLite file and its auxiliary files; update the docs to explicitly list the
exact files matched (e.g., brainlayer.db, brainlayer.db-wal, brainlayer.db-shm)
and clarify that the command preserves the main DB plus WAL and shared-memory
files by moving brainlayer.db, brainlayer.db-wal, and brainlayer.db-shm into
corrupt-$ts; mention that other similarly named files will also be included by
the wildcard and recommend verifying the moved files with ls before and after
running the mv command.
In `@src/brainlayer/backup_daily.py`:
- Around line 191-200: The function upload_file_to_drive is defined but never
used (upload_file_to_drive_raw is used instead); remove the unused
upload_file_to_drive function and its internal import of MediaFileUpload to
reduce dead code, or alternatively replace the call sites that use
upload_file_to_drive_raw with upload_file_to_drive if you intend to switch to
the resumable MediaFileUpload implementation—update only the function definition
(upload_file_to_drive) and any references so there are no lingering unused
imports or duplicate semantics.
- Around line 91-110: The code mixes naive and aware datetimes around
parsed_expiry, Credentials(... expiry=parsed_expiry) and refresh_before; make
all datetimes timezone-aware in UTC instead of stripping tzinfo: when computing
parsed_expiry (in the block that sets parsed_expiry) keep the tzinfo (remove
.replace(tzinfo=None)) and ensure it is converted to UTC with
.astimezone(dt.UTC), then pass that aware datetime into Credentials; likewise
create refresh_before as dt.datetime.now(dt.UTC) + dt.timedelta(hours=2) (keep
tzinfo) so the comparison in the if checking creds.expired / creds.valid /
creds.expiry < refresh_before uses consistent timezone-aware datetimes.
- Around line 354-358: The code currently falls back from
BRAINLAYER_BACKUP_DRIVE_FOLDER to BRAINLAYER_BACKUP_DRIVE_PATH when computing
folder_parts (using DEFAULT_FOLDER_PARTS as a final default); add a brief inline
comment next to that expression explaining that BRAINLAYER_BACKUP_DRIVE_PATH is
checked for backward compatibility/alternate config and that
BRAINLAYER_BACKUP_DRIVE_FOLDER is the preferred variable—reference the variables
BRAINLAYER_BACKUP_DRIVE_FOLDER, BRAINLAYER_BACKUP_DRIVE_PATH and
DEFAULT_FOLDER_PARTS in the comment so future readers know the intended
precedence.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9839ee43-e028-4724-ac21-df792d1aa44d
📒 Files selected for processing (7)
- docs/backup-strategy.md
- pyproject.toml
- scripts/launchd/backup-daily.sh
- scripts/launchd/com.brainlayer.backup-daily.plist
- scripts/launchd/install.sh
- src/brainlayer/backup_daily.py
- tests/test_backup_daily.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (3.12)
- GitHub Check: test (3.13)
- GitHub Check: test (3.11)
- GitHub Check: Cursor Bugbot
- GitHub Check: Macroscope - Correctness Check
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests
**/*.py: Use `paths.py:get_db_path()` for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches
Files:
src/brainlayer/backup_daily.py, tests/test_backup_daily.py
src/brainlayer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/brainlayer/**/*.py: Use retry logic on `SQLITE_BUSY` errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve `ai_code`, `stack_trace`, and `user_message` verbatim; skip `noise` entries entirely and summarize `build_log` and `dir_listing` entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as the primary backend (cloud, configured in the launchd plist), Gemini as fallback via `enrichment_controller.py`, and Ollama as the offline last resort; allow override via the `BRAINLAYER_ENRICH_BACKEND` env var
Configure the enrichment rate via the `BRAINLAYER_ENRICH_RATE` environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns `superseded_by`, `aggregated_into`, `archived_at` on the chunks table; exclude lifecycle-managed chunks from default search; allow `include_archived=True` to show history
Implement `brain_supersede` with a safety gate for personal data (journals, notes, health/finance); use soft-delete for `brain_archive` with a timestamp
Add a `supersedes` parameter to `brain_store` for atomic store-and-replace operations
Run linting and formatting with: `ruff check src/ && ruff format src/`
Run tests with `pytest`
Use `PRAGMA wal_checkpoint(FULL)` before and after bulk database operations to prevent WAL bloat
Files:
src/brainlayer/backup_daily.py
🔇 Additional comments (19)
docs/backup-strategy.md (4)
5-24: LGTM!
25-33: LGTM!
35-53: LGTM!
112-121: LGTM!
pyproject.toml (1)
38-39: LGTM!
src/brainlayer/backup_daily.py (5)
1-32: LGTM!
42-74: LGTM!
125-158: LGTM!
277-324: LGTM!
202-265: ⚡ Quick win: Implementation matches the Google Drive API v3 resumable upload specification.
The custom resumable upload correctly follows the Drive API v3 protocol:
- Initiates the resumable session with proper Authorization, Content-Type, and X-Upload-Content-* headers
- Sends chunks via PUT with correct Content-Range format (bytes start-end/total)
- Handles status codes correctly: 200/201 for completion, 308 for resume (parsing Range header to determine next offset), and 429/5xx for retryable errors
- Range header parsing (extracting the final byte position and adding 1) is correct; the fallback to
`end + 1` when the Range header is missing is reasonable for 308 responses.
No changes needed.
tests/test_backup_daily.py (3)
6-33: LGTM!
36-88: LGTM!
91-105: LGTM!
scripts/launchd/backup-daily.sh (1)
1-7: LGTM!
scripts/launchd/com.brainlayer.backup-daily.plist (1)
1-43: LGTM!
scripts/launchd/install.sh (4)
12-12: LGTM!
20-20: LGTM! Also applies to: 32-32
100-112: LGTM!
146-149: LGTM! Also applies to: 155-156, 166-167, 170-170
```python
expiry = token_data.get("expiry")
if not expiry and token_data.get("expiry_date"):
    expiry = dt.datetime.fromtimestamp(int(token_data["expiry_date"]) / 1000, tz=dt.UTC).isoformat()

parsed_expiry = dt.datetime.fromisoformat(expiry.replace("Z", "+00:00")) if expiry else None
if parsed_expiry and parsed_expiry.tzinfo:
    parsed_expiry = parsed_expiry.astimezone(dt.UTC).replace(tzinfo=None)

creds = Credentials(
    token=token_data.get("access_token"),
    refresh_token=token_data.get("refresh_token"),
    token_uri=client_data["token_uri"],
    client_id=client_data["client_id"],
    client_secret=client_data["client_secret"],
    scopes=token_data.get("scope", " ".join(DRIVE_SCOPES)).split(),
    expiry=parsed_expiry,
)

refresh_before = dt.datetime.now(dt.UTC).replace(tzinfo=None) + dt.timedelta(hours=2)
if creds.expired or not creds.valid or (creds.expiry and creds.expiry < refresh_before):
```
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Consider simplifying timezone handling.
The credential refresh logic mixes timezone-aware and naive datetimes (lines 95-97 parse to aware, then convert to naive; line 109 creates naive UTC). While this works for google-auth's Credentials API, mixing aware and naive datetimes is fragile and discouraged in Python's datetime best practices.
If refactoring later, consider: keep all datetimes timezone-aware throughout, or clearly document why naive UTC is required here.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/brainlayer/backup_daily.py` around lines 91 - 110, The code mixes naive
and aware datetimes around parsed_expiry, Credentials(... expiry=parsed_expiry)
and refresh_before; make all datetimes timezone-aware in UTC instead of
stripping tzinfo: when computing parsed_expiry (in the block that sets
parsed_expiry) keep the tzinfo (remove .replace(tzinfo=None)) and ensure it is
converted to UTC with .astimezone(dt.UTC), then pass that aware datetime into
Credentials; likewise create refresh_before as dt.datetime.now(dt.UTC) +
dt.timedelta(hours=2) (keep tzinfo) so the comparison in the if checking
creds.expired / creds.valid / creds.expiry < refresh_before uses consistent
timezone-aware datetimes.
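If the comparison logic is later refactored toward the fully aware option, a minimal sketch could look like this, kept outside the `Credentials` object (the author later notes that google-auth compares `Credentials.expiry` against naive UTC internally, so this only applies to the surrounding comparison code):

```python
import datetime as dt


def parse_expiry_utc(expiry_iso: str | None) -> dt.datetime | None:
    """Parse an ISO-8601 expiry string (possibly ending in 'Z') into an aware UTC datetime."""
    if not expiry_iso:
        return None
    parsed = dt.datetime.fromisoformat(expiry_iso.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=dt.UTC)  # assume bare timestamps are UTC
    return parsed.astimezone(dt.UTC)


def needs_refresh(expiry: dt.datetime | None, margin: dt.timedelta = dt.timedelta(hours=2)) -> bool:
    """Compare only aware UTC values, so naive and aware datetimes never mix."""
    return expiry is None or expiry < dt.datetime.now(dt.UTC) + margin
```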
Addressed review durability feedback in 99cc4e1: added token-file locking plus atomic token writes, backup staging free-space guard, stricter snapshot checkpoint, upload chunk-length validation/progress, traceback logging, and clearer encryption threat-model docs. Local focused checks passed and the full pre-push gate passed (
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 99cc4e1045
```python
required_bytes = (db_path.stat().st_size * 2) + (512 * 1024 * 1024)
free_bytes = shutil.disk_usage(output_dir).free
```
Account for WAL size in the space check
When the live database has a large uncheckpointed brainlayer.db-wal, this estimate only counts brainlayer.db, but the SQLite online backup materializes committed WAL pages into raw_snapshot. In the known WAL-heavy cases, a tiny main DB plus multi-GB WAL can pass this check with far less free space than the backup actually needs, then fail midway instead of refusing up front; include the WAL/logical backup size in the required-space calculation.
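A sketch of the guard extended to count the WAL and shared-memory sidecars as well (the 2x multiplier and 512 MiB headroom are carried over from the excerpt above; the sidecar suffixes follow SQLite's WAL-mode naming):

```python
import shutil
from pathlib import Path


def ensure_free_space(db_path: Path, output_dir: Path, headroom: int = 512 * 1024 * 1024) -> None:
    """Refuse to start a backup unless staging can hold the DB plus uncheckpointed WAL pages."""
    logical_size = db_path.stat().st_size
    for suffix in ("-wal", "-shm"):
        sidecar = db_path.with_name(db_path.name + suffix)
        if sidecar.exists():
            logical_size += sidecar.stat().st_size
    required = (logical_size * 2) + headroom  # raw snapshot + gzip temp copy, plus headroom
    free = shutil.disk_usage(output_dir).free
    if free < required:
        raise RuntimeError(f"not enough free space: need {required} bytes, have {free}")
```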
```python
    date_stamp: str | None = None,
    upload: bool = True,
) -> dict[str, Any]:
    snapshot = create_sqlite_backup_gzip(db_path or get_db_path(), staging_dir, date_stamp=date_stamp)
```
Prune or delete local staging snapshots
With the default launchd path, every successful run leaves a dated .db.gz in ~/.local/share/brainlayer/backups, and the later retention code only deletes Drive files. For a large BrainLayer DB this unbounded local cache can eventually fill the same disk used by the live DB and prevent future writes/backups, so either remove the staging artifact after upload or apply the same retention policy locally.
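One way to address this, sketched under the assumption of the date-stamped `YYYY-MM-DD.db.gz` naming used elsewhere in the PR (`keep` is an illustrative parameter, not an existing setting):

```python
from pathlib import Path


def prune_local_snapshots(staging_dir: Path, keep: int = 3) -> list[Path]:
    """Delete all but the newest `keep` staged snapshots; returns the removed paths."""
    snapshots = sorted(staging_dir.glob("*.db.gz"))  # ISO-dated names sort chronologically
    doomed = snapshots if keep <= 0 else snapshots[:-keep]
    for old in doomed:
        old.unlink()
    return doomed
```

Calling this (or simply unlinking the snapshot) right after a confirmed upload would keep the staging directory bounded.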
Addressed the latest CodeRabbit actionable comments in edea74e: clarified the restore wildcard docs, removed the unused MediaFileUpload helper/dead code, documented env-var precedence, and verified google-auth requires naive UTC expiry values (kept with inline comment after real
```python
while True:
    result = (
        service.files()
        .list(
            q=f"'{folder_id}' in parents and trashed = false",
            spaces="drive",
            fields="nextPageToken,files(id,name)",
            pageSize=1000,
```
🟠 High brainlayer/backup_daily.py:268
When a 308 response lacks a Range header, the code sets sent = end + 1, assuming the chunk was received. Per the Drive resumable upload protocol, a missing Range header means the server received 0 bytes, so sent should remain unchanged. This causes the upload to skip that chunk, resulting in a corrupted backup file on Drive.
```diff
 if response.status_code == 308:
     uploaded_range = response.headers.get("Range")
     if uploaded_range and "-" in uploaded_range:
         sent = int(uploaded_range.rsplit("-", 1)[1]) + 1
     else:
-        sent = end + 1
+        pass  # Server received 0 bytes; retry same chunk
 print(f"drive upload progress: {sent}/{total} bytes", flush=True)
```
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file src/brainlayer/backup_daily.py around lines 268-275:
When a 308 response lacks a `Range` header, the code sets `sent = end + 1`, assuming the chunk was received. Per the Drive resumable upload protocol, a missing `Range` header means the server received 0 bytes, so `sent` should remain unchanged. This causes the upload to skip that chunk, resulting in a corrupted backup file on Drive.
Evidence trail:
src/brainlayer/backup_daily.py lines 270-273 (REVIEWED_COMMIT): `else: sent = end + 1` when Range header is missing from 308 response. Google Drive resumable upload docs: https://developers.google.com/workspace/drive/api/guides/manage-uploads — absent Range header in 308 means 0 bytes received.
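A sketch of the corrected chunk loop under that reading of the protocol (`session_uri` and the in-memory `data` buffer are simplifications of the excerpted code; `requests` is assumed):

```python
import requests


def upload_chunks(session_uri: str, data: bytes, chunk_size: int = 8 * 1024 * 1024) -> None:
    """Resumable-upload loop: on a 308 without a Range header, resend the same chunk."""
    total = len(data)
    sent = 0
    while sent < total:
        chunk = data[sent : sent + chunk_size]
        end = sent + len(chunk) - 1
        resp = requests.put(
            session_uri,
            data=chunk,
            headers={"Content-Range": f"bytes {sent}-{end}/{total}"},
            timeout=120,
        )
        if resp.status_code in (200, 201):
            return  # upload complete
        if resp.status_code == 308:
            uploaded_range = resp.headers.get("Range")
            if uploaded_range and "-" in uploaded_range:
                sent = int(uploaded_range.rsplit("-", 1)[1]) + 1
            # else: the server persisted nothing for this chunk; leave `sent` unchanged
            continue
        resp.raise_for_status()
```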
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: edea74e33d
```python
start = sent
end = sent + len(chunk) - 1
headers = {
    "Authorization": f"Bearer {credentials.token}",
```
Refresh Drive credentials on upload 401s
For multi-GB backups on a slow connection, the OAuth access token can expire after the resumable upload starts; each chunk's Authorization header is built from the same credentials.token, and a 401 falls through to raise_for_status() so the retry loop just resends the stale token up to max_attempts before failing. Refresh the credentials and rebuild the header when Drive returns 401/403 auth failures so long uploads can complete reliably.
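A sketch of how the per-chunk request might recover from token expiry, assuming a google-auth `credentials` object (`google.auth.transport.requests.Request` performs the refresh; the surrounding retry/backoff loop is omitted):

```python
import requests
from google.auth.transport.requests import Request as AuthRequest


def put_chunk_with_auth_retry(session_uri, chunk, sent, end, total, credentials):
    """Send one chunk; on 401/403 refresh the token once and rebuild the Authorization header."""
    for attempt in range(2):
        headers = {
            "Authorization": f"Bearer {credentials.token}",
            "Content-Range": f"bytes {sent}-{end}/{total}",
        }
        resp = requests.put(session_uri, data=chunk, headers=headers, timeout=120)
        if resp.status_code in (401, 403) and attempt == 0:
            credentials.refresh(AuthRequest())  # mint a fresh access token, then retry once
            continue
        return resp
```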
Cursor Bugbot has reviewed your changes and found 3 potential issues.
| f"sleeping {sleep_seconds}s", | ||
| flush=True, | ||
| ) | ||
| time.sleep(sleep_seconds) |
Non-retryable HTTP errors retried 30 times
Medium Severity
response.raise_for_status() on line 239 raises an HTTPError for non-retryable 4xx client errors (e.g., 400, 401, 403, 404), which is then caught by the generic except Exception handler on line 240. This causes permanent client errors to be retried up to 30 times with exponential backoff (up to 60s per sleep), potentially wasting 30+ minutes before ultimately failing. Only 429 and 5xx errors are intentionally retryable; other 4xx errors indicate a permanent problem (expired token, bad request, forbidden resource) that retrying won't fix.
Additional Locations (1)
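A sketch of a retry predicate that keeps the backoff loop for transient statuses only (the 429/5xx split is standard HTTP semantics rather than something taken from the PR's code):

```python
import requests


def is_retryable(status_code: int) -> bool:
    """Only rate limiting and server errors are worth retrying; other 4xx are permanent."""
    return status_code == 429 or 500 <= status_code < 600


def check_response(resp: requests.Response) -> None:
    """Fail fast on permanent client errors instead of entering the 30-attempt backoff loop."""
    if resp.ok:
        return
    if is_retryable(resp.status_code):
        raise RuntimeError(f"transient Drive error {resp.status_code}; retry with backoff")
    resp.raise_for_status()  # 400/401/403/404: retrying will not help
```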
```python
# google-auth Credentials.expired compares against a naive UTC helper, so keep expiry comparisons naive UTC.
refresh_before = dt.datetime.now(dt.UTC).replace(tzinfo=None) + dt.timedelta(hours=2)
if creds.expired or not creds.valid or (creds.expiry and creds.expiry < refresh_before):
```
Refresh threshold always exceeds Google token lifetime
Low Severity
refresh_before is set to now + 2 hours, but Google OAuth access tokens have a ~1-hour lifetime. Since a fresh token's expiry (~now + 1h) is always less than refresh_before (~now + 2h), the condition creds.expiry < refresh_before is always true, causing a token refresh on every single invocation. Combined with run_backup calling get_drive_credentials() twice (once directly, once inside build_drive_service()), this results in two unnecessary Google token refreshes per backup run.
Additional Locations (1)
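A small sketch of a margin below the roughly one-hour access-token lifetime, so a still-fresh token is reused rather than refreshed on every run (the 10-minute figure is an illustrative choice; the naive-UTC convention from the excerpt above is kept):

```python
import datetime as dt

REFRESH_MARGIN = dt.timedelta(minutes=10)


def should_refresh(expiry: dt.datetime | None) -> bool:
    """Refresh only when the token would expire within the margin, not unconditionally."""
    if expiry is None:
        return True
    return expiry < dt.datetime.now(dt.UTC).replace(tzinfo=None) + REFRESH_MARGIN
```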
```python
uploaded = upload_file_to_drive_raw(snapshot, folder_id, credentials)
deleted = prune_drive_backups(service, folder_parts=folder_parts)
result.update({"uploaded": True, "drive_file": uploaded, "retention_deleted": deleted})
return result
```
Local staging snapshots never pruned, accumulating indefinitely
Medium Severity
run_backup prunes remote Drive backups via prune_drive_backups but never removes old local .db.gz files from staging_dir. Each daily run of create_sqlite_backup_gzip creates a new date-stamped file (~5GB per the PR verification) that persists indefinitely. After 30 days, ~150GB of local disk is consumed; the free-space guard will eventually reject new backups, silently breaking the entire backup pipeline.
Additional Locations (1)
Summary
- Database backed up: `~/.local/share/brainlayer/brainlayer.db`
- Drive destination: `Brain Drive/06_ARCHIVE/backups/brainlayer-db/YYYY-MM-DD.db.gz`
- Restore and drill procedure documented in `docs/backup-strategy.md`

Verification
- `python3 -m pytest tests/test_backup_daily.py tests/test_wal_checkpoint_module.py tests/test_cli_enrich.py::test_cli_wal_checkpoint_routes_to_helper -q` -> 7 passed
- `plutil -lint scripts/launchd/com.brainlayer.backup-daily.plist` -> OK
- `python3 -m compileall -q src/brainlayer/backup_daily.py` -> OK
- Full suite: 1826 passed, 9 skipped, 75 deselected, 1 xfailed; MCP registration 3 passed; eval/hook routing 32 passed; Bun 1 pass; regression shell pass
- Real backup run: `2026-05-13.db.gz` uploaded, Drive file id `1tnAhWnmfpjG6Kn4peHTCGAptD0mrY8HF`, size 5243572021
- `PRAGMA integrity_check` returned `ok`; `SELECT count(*) FROM chunks` returned 361029
Notes
- Reuses the existing OAuth credentials in `~/.config/google-drive-mcp` and uploads via the Drive API.
- `age`/GPG encryption is listed as a recommended future hardening step if provider-layer protection is required.

Note
Medium Risk
Adds new automated data-backup logic plus Google Drive OAuth/token handling and remote deletion for retention; mistakes could impact backup reliability or unintentionally delete snapshots.
Overview
- Implements an automated daily backup flow that creates a consistent SQLite snapshot via the online backup API, gzips it, uploads it to Google Drive using the existing `~/.config/google-drive-mcp` OAuth credentials, and applies retention (30 daily + 12 monthly) via remote pruning; a retention sketch follows this list.
- Adds a `launchd` LaunchAgent (`com.brainlayer.backup-daily`) and installer support (a new `backup` target plus wrapper-script install) to run at 03:17, with logs under `~/.local/share/brainlayer/logs`.
- Updates `pyproject.toml` with Google Drive client/auth dependencies, adds tests covering snapshot restoration, low-disk-space rejection, Drive folder creation logic, and installer/plist wiring, and documents restore/drill procedures in `docs/backup-strategy.md`.
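A sketch of one way the 30-daily-plus-monthly selection could be computed, assuming the `YYYY-MM-DD.db.gz` naming described above (the PR's actual `prune_drive_backups` may differ in details):

```python
def names_to_keep(names: list[str], daily: int = 30, monthly: int = 12) -> set[str]:
    """Keep the newest `daily` snapshots plus one snapshot for each of the last `monthly` months."""
    dated = sorted(n for n in names if n.endswith(".db.gz"))  # ISO dates sort chronologically
    keep = set(dated[-daily:])  # newest daily snapshots
    first_of_month: dict[str, str] = {}
    for name in dated:
        first_of_month.setdefault(name[:7], name)  # name[:7] is "YYYY-MM"
    for month in sorted(first_of_month)[-monthly:]:
        keep.add(first_of_month[month])
    return keep
```

Anything not in the returned set would then be deleted from the Drive folder.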
Summary by CodeRabbit
New Features
Documentation
Tests
Chores
Note
Add daily BrainLayer SQLite backups with Google Drive upload and retention
- `backup_daily.py`, which takes an online SQLite snapshot via the backup API, gzips it, uploads it to Google Drive using resumable chunked transfer with retries, and prunes old backups (keeps 30 daily + 1 per month for 12 months).
- A LaunchAgent plist that runs daily at 03:17 with Nice level 15.
- `install.sh` with a `backup` subcommand to deploy the wrapper script and plist; `remove` also cleans them up.

📊 Macroscope summarized 99cc4e1. 2 files reviewed, 1 issue evaluated, 0 issues filtered, 1 comment posted