
Add daily BrainLayer DB backups#280

Merged
EtanHey merged 4 commits into main from feat/brainlayer-db-backup on May 13, 2026
Conversation

Owner

@EtanHey EtanHey commented May 13, 2026

Summary

  • add a daily SQLite online backup job for ~/.local/share/brainlayer/brainlayer.db
  • gzip snapshots and upload them to Google Drive: Brain Drive/06_ARCHIVE/backups/brainlayer-db/YYYY-MM-DD.db.gz
  • install via launchd at 03:17 local time with 30 daily + 12 monthly retention pruning
  • document restore and monthly restore drill in docs/backup-strategy.md

Verification

  • python3 -m pytest tests/test_backup_daily.py tests/test_wal_checkpoint_module.py tests/test_cli_enrich.py::test_cli_wal_checkpoint_routes_to_helper -q -> 7 passed
  • plutil -lint scripts/launchd/com.brainlayer.backup-daily.plist -> OK
  • python3 -m compileall -q src/brainlayer/backup_daily.py -> OK
  • pre-push gate -> 1826 passed, 9 skipped, 75 deselected, 1 xfailed; MCP registration 3 passed; eval/hook routing 32 passed; Bun 1 pass; regression shell pass
  • first production snapshot uploaded to Drive: 2026-05-13.db.gz, Drive file id 1tnAhWnmfpjG6Kn4peHTCGAptD0mrY8HF, size 5243572021 bytes (~5.2 GB)
  • restore verification from gzip -> PRAGMA integrity_check returned ok; SELECT count(*) FROM chunks returned 361029
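
The restore check in the last bullet can be reproduced in a few lines of Python; a minimal sketch (local filenames are placeholders):

import gzip
import shutil
import sqlite3

# Decompress the downloaded snapshot (filenames are placeholders).
with gzip.open("2026-05-13.db.gz", "rb") as src, open("restored.db", "wb") as dst:
    shutil.copyfileobj(src, dst)

conn = sqlite3.connect("restored.db")
print(conn.execute("PRAGMA integrity_check").fetchone()[0])       # expect: ok
print(conn.execute("SELECT count(*) FROM chunks").fetchone()[0])  # expect: 361029
conn.close()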

Notes

  • Google Drive Desktop was not mounted on this repaired Mac, so the implementation uses the existing Google Drive OAuth token from ~/.config/google-drive-mcp and uploads via Drive API.
  • Backups are HTTPS/in-Drive encrypted but not client-side encrypted; the doc calls out age/GPG as a recommended future hardening step if provider-layer protection is required.

Note

Medium Risk
Adds new automated data-backup logic plus Google Drive OAuth/token handling and remote deletion for retention; mistakes could impact backup reliability or unintentionally delete snapshots.

Overview
Implements an automated daily backup flow that creates a consistent SQLite snapshot via the online backup API, gzips it, uploads it to Google Drive using existing ~/.config/google-drive-mcp OAuth credentials, and applies retention (30 daily + 12 monthly) via remote pruning.
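
The snapshot-and-compress half of that flow maps directly onto the standard library; a minimal sketch (the function name and layout here are illustrative, not the module's actual API):

import gzip
import shutil
import sqlite3
from pathlib import Path

def snapshot_and_gzip(db_path: Path, out_dir: Path, stamp: str) -> Path:
    """Take a consistent snapshot of a live SQLite DB, then gzip it."""
    raw = out_dir / f"{stamp}.db"
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(raw)
    try:
        src.backup(dst)  # SQLite online backup API: consistent even with a live writer
    finally:
        dst.close()
        src.close()
    gz = out_dir / f"{stamp}.db.gz"
    with raw.open("rb") as f_in, gzip.open(gz, "wb", compresslevel=6) as f_out:
        shutil.copyfileobj(f_in, f_out)
    raw.unlink()  # keep only the compressed artifact for upload
    return gz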

Adds a launchd LaunchAgent (com.brainlayer.backup-daily) and installer support (new backup target plus wrapper script install) to run at 03:17 with logs under ~/.local/share/brainlayer/logs.

Updates pyproject.toml with Google Drive client/auth deps, adds tests covering snapshot restoration, low-disk-space rejection, Drive folder creation logic, and installer/plist wiring, and documents restore/drill procedures in docs/backup-strategy.md.

Reviewed by Cursor Bugbot for commit edea74e.

Summary by CodeRabbit

  • New Features

    • Added daily automated database backups to Google Drive with gzip compression.
    • Implemented automatic backup retention policy: 30 most recent daily snapshots plus latest monthly backups for 12 months.
  • Documentation

    • Added comprehensive backup and restore strategy guide with detailed recovery procedures and monthly drill checklist.
  • Tests

    • Added integration tests for backup creation, Drive upload, and installer configuration verification.
  • Chores

    • Added Google Drive API dependencies to support backup uploads.

Review Change Stack

Note

Add daily BrainLayer SQLite backups with Google Drive upload and retention

  • Adds backup_daily.py which takes an online SQLite snapshot via the backup API, gzips it, uploads to Google Drive using resumable chunked transfer with retries, and prunes old backups (keeps 30 daily + 1 per month for 12 months).
  • Schedules the backup via a macOS launchd LaunchAgent plist that runs daily at 03:17 with Nice level 15.
  • Extends install.sh with a backup subcommand to deploy the wrapper script and plist; remove also cleans them up.
  • OAuth credentials are loaded from existing local MCP config paths and refreshed automatically when expired or within 2 hours of expiry.
  • Risk: first-run requires pre-existing OAuth token files in the expected config paths; no automatic OAuth flow is included.
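
The retention rule in the first bullet reduces to a pure function over the dated filenames; a sketch assuming snapshots named YYYY-MM-DD.db.gz (the helper name is hypothetical):

def select_keep(names: list[str]) -> set[str]:
    """Keep the 30 newest dailies plus the newest snapshot per month for 12 months."""
    dated = sorted(n for n in names if n.endswith(".db.gz"))  # ISO dates sort chronologically
    keep = set(dated[-30:])
    newest_per_month: dict[str, str] = {}
    for name in dated:
        newest_per_month[name[:7]] = name  # "YYYY-MM" -> latest snapshot that month
    for month in sorted(newest_per_month)[-12:]:
        keep.add(newest_per_month[month])
    return keep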
📊 Macroscope summarized 99cc4e1. 2 files reviewed, 1 issue evaluated, 0 issues filtered, 1 comment posted


Owner Author

EtanHey commented May 13, 2026

@cursor @BugBot review

Owner Author

EtanHey commented May 13, 2026

@codex review

Owner Author

EtanHey commented May 13, 2026

@coderabbitai review


coderabbitai Bot commented May 13, 2026

📝 Walkthrough

This PR adds a complete daily database backup system that captures SQLite snapshots, compresses them with gzip, uploads to Google Drive with a resumable protocol and retries, maintains a retention policy of 30 daily snapshots plus one per month for 12 months, and integrates with a macOS LaunchAgent for automatic daily scheduling at 03:17 local time.

Changes

Database Backup System

  • Core backup module and validation (pyproject.toml, src/brainlayer/backup_daily.py, tests/test_backup_daily.py): Creates SQLite snapshots via the online backup API, validates integrity with PRAGMA integrity_check, compresses to gzip, loads and refreshes Google Drive OAuth tokens, manages nested folder hierarchies, uploads with a resumable protocol and exponential backoff, and prunes to the 30 latest daily snapshots plus one per month. Tests confirm snapshot restoration and Drive folder chain creation.
  • LaunchAgent scheduler and launcher script (scripts/launchd/backup-daily.sh, scripts/launchd/com.brainlayer.backup-daily.plist): The Bash launcher enables strict mode, sets PATH and environment variables, and execs the Python backup module with a configurable interpreter. The LaunchAgent plist schedules daily 03:17 execution, configures stdout/stderr logging under the user directory, sets environment variables for the Drive folder and Python path, and runs as a background process with nice value 15.
  • Installation and removal integration (scripts/launchd/install.sh): Adds a $BRAINLAYER_LIB_DIR path variable, a new install_backup_script() helper that copies and permissions the launcher, and a backup command option for installing backup infrastructure; extends the all target to include backup setup and updates the remove target to unload the LaunchAgent and delete the launcher script.
  • Backup strategy and restore procedures (docs/backup-strategy.md): Documents the end-to-end backup strategy covering the SQLite online API approach, gzip compression, the Google Drive destination, encryption posture, manual run instructions, and a detailed restore drill workflow including download, decompression, integrity checks, service unload/reload, and WAL checkpoint verification.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

📦 A backup script hops each night,
Sqlite snapshots gzipped tight,
Drive keeps monthly, days so few,
Should trouble come, there's always brew! 🍺
Restore with care, the drill knows true.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

  • Title check: ✅ Passed. The pull request title 'Add daily BrainLayer DB backups' accurately and concisely summarizes the main change: implementing a daily backup mechanism for the BrainLayer database.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Description check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.



coderabbitai Bot commented May 13, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


cursor Bot commented May 13, 2026

Based on my review of the PR "Add daily BrainLayer DB backups", I've identified several issues ranging from critical to minor. Here's my comprehensive analysis:

🔴 Critical Issues

1. Token Refresh Race Condition (backup_daily.py:110-114)

The OAuth token refresh logic is not thread-safe:

refresh_before = dt.datetime.now(dt.UTC).replace(tzinfo=None) + dt.timedelta(hours=2)
if creds.expired or not creds.valid or (creds.expiry and creds.expiry < refresh_before):
    creds.refresh(Request())
    token_data["access_token"] = creds.token
    token_data["expiry"] = creds.expiry.isoformat() if creds.expiry else None
    token_path.write_text(json.dumps(token_data, indent=2, sort_keys=True) + "\n")

Problem: If two backup processes run simultaneously (manual + scheduled, or launchd retry), both could:

  1. Read the same token file
  2. Both refresh
  3. Race to write back, potentially corrupting the file

Fix: Add file locking using fcntl or a lock file pattern before reading/writing tokens.
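
A sketch of that lock-file pattern (POSIX-only; the helper name and lock path are illustrative):

import fcntl
import json
from pathlib import Path

def refresh_token_locked(token_path: Path, refresh_fn) -> None:
    """Serialize read/refresh/write of the token file across processes."""
    lock_path = token_path.with_suffix(".lock")
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until any concurrent run finishes
        try:
            token_data = json.loads(token_path.read_text())
            token_data = refresh_fn(token_data)  # re-check expiry after acquiring the lock
            token_path.write_text(json.dumps(token_data, indent=2, sort_keys=True) + "\n")
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)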

2. Non-Atomic Token File Updates (backup_daily.py:114)

Writing directly to the token file can leave it corrupted if the process crashes mid-write.

Fix: Use atomic write pattern:

temp_path = token_path.with_suffix('.tmp')
temp_path.write_text(json.dumps(token_data, indent=2, sort_keys=True) + "\n")
temp_path.replace(token_path)  # atomic on POSIX

3. No Disk Space Validation

With a 5GB database, the backup needs ~10GB free (uncompressed copy + compressed output). The staging directory could run out of space, leaving corrupted files.

Fix: Add a disk space check before starting:

statvfs = os.statvfs(staging_dir)
free_bytes = statvfs.f_bavail * statvfs.f_frsize
if free_bytes < snapshot_size_estimate * 2:
    raise RuntimeError(f"Insufficient disk space: {free_bytes} bytes free")

🟡 High-Severity Issues

4. WAL Checkpoint Mode Too Passive (backup_daily.py:60)

target.execute("PRAGMA wal_checkpoint(PASSIVE)")

PASSIVE mode doesn't guarantee the WAL is fully checkpointed. For a backup that should be self-contained, use TRUNCATE or RESTART.

Fix: Change to PRAGMA wal_checkpoint(TRUNCATE)

5. Upload Chunk Size Not Validated (backup_daily.py:231)

chunk = handle.read(min(chunk_size, total - sent))

If the file changes during upload, len(chunk) might not match the expected size, causing range header mismatches.

Fix: Add validation after read:

expected = min(chunk_size, total - sent)
if len(chunk) != expected:
    raise RuntimeError(f"File changed during upload: expected {expected} bytes, got {len(chunk)}")

6. Retry Logic Doesn't Handle Duplicate Folder Creation

In ensure_drive_folder (lines 125-149), if two processes create folders simultaneously, both might see no results and both try to create, causing conflicts.

Fix: Catch duplicate errors and retry the list query.
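
A sketch of that list-create-relist pattern (hedged; service stands for a google-api-python-client Drive v3 client, and the function name is illustrative):

def ensure_folder_idempotent(service, name: str, parent_id: str) -> str:
    """List first, create if absent, and re-list if the create races a peer."""
    query = (
        f"name = '{name}' and '{parent_id}' in parents and "
        "mimeType = 'application/vnd.google-apps.folder' and trashed = false"
    )
    for _ in range(2):
        found = service.files().list(q=query, fields="files(id)").execute().get("files", [])
        if found:
            return found[0]["id"]  # prefer an existing folder, even one a peer just made
        try:
            body = {"name": name, "parents": [parent_id],
                    "mimeType": "application/vnd.google-apps.folder"}
            return service.files().create(body=body, fields="id").execute()["id"]
        except Exception:
            pass  # a concurrent run may have won the race; re-list on the next pass
    raise RuntimeError(f"could not ensure Drive folder {name!r}")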

7. Timezone-Unaware Date Handling

The backup uses dt.date.today() for naming, but prune logic doesn't consider timezone shifts. DST or timezone changes could cause incorrect pruning.

Fix: Use UTC consistently:

def _today() -> str:
    return dt.datetime.now(dt.UTC).date().isoformat()

🟢 Medium-Severity Issues

8. No Progress Reporting for Large Uploads

A 5GB upload with no progress indication makes debugging difficult. Add periodic progress prints in upload_file_to_drive_raw.

9. Hardcoded Configuration Values

  • Compression level: compresslevel=6 (line 69)
  • Retry attempts: max_attempts=30
  • Chunk size: 8 * 1024 * 1024

These should be configurable via environment variables for different network conditions.
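
A sketch of the env-override pattern (these variable names are hypothetical; the module does not define them):

import os

# Hypothetical knobs; defaults mirror the currently hardcoded values.
COMPRESS_LEVEL = int(os.environ.get("BRAINLAYER_BACKUP_GZIP_LEVEL", "6"))
MAX_ATTEMPTS = int(os.environ.get("BRAINLAYER_BACKUP_MAX_ATTEMPTS", "30"))
CHUNK_SIZE = int(os.environ.get("BRAINLAYER_BACKUP_CHUNK_BYTES", str(8 * 1024 * 1024)))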

10. Test Coverage Gaps

Missing tests for:

  • Token refresh logic
  • Prune retention with same-month snapshots
  • Upload resume after network failure
  • Disk space exhaustion

⚪ Minor Issues

11. Inconsistent Import Placement

Some imports are at function level (lines 78-79, 120) while most are at module level. This is intentional for optional dependencies but inconsistent with style guides.

12. Error Context Too Minimal (backup_daily.py:361)

print(f"brainlayer backup failed: {exc}", flush=True)

Should include traceback for debugging production failures:

import traceback
print(f"brainlayer backup failed: {exc}\n{traceback.format_exc()}", flush=True)

13. Documentation Misleading on Encryption

Backups are encrypted in transit by HTTPS and at rest by Google Drive.

"Encrypted at rest by Google Drive" means Google holds the keys. This should be clearer about the threat model.

Suggested rewording: "Backups are encrypted in transit (HTTPS) and at rest by Google's infrastructure. Google holds the encryption keys. For protection against provider-level access, add client-side encryption using age or GPG before upload."

📋 Overall Assessment

Verdict: Request Changes ⚠️

The implementation is solid and well-tested for the happy path, but has several critical concurrency and reliability issues that could cause production failures:

  1. Token file corruption from concurrent access
  2. Disk space exhaustion leading to corrupted backups
  3. WAL checkpoint mode leaving incomplete backups

These should be fixed before merging to production, especially given that this is a data durability feature where failures are silent until you need the backup.

Strengths:

  • Excellent use of SQLite's online backup API
  • Good retry logic for network failures
  • Comprehensive documentation
  • Production verification completed

Recommended Actions:

  1. Fix critical issues #1-3
  2. Address high-severity issues #4-7
  3. Add integration test for concurrent backup attempts
  4. Consider adding a backup validation step that runs PRAGMA integrity_check on the uploaded file


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0f0f0739d1


export PATH="/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:$HOME/.local/bin"
export PYTHONUNBUFFERED=1

exec "${BRAINLAYER_PYTHON:-python3}" -m brainlayer.backup_daily

P2: Use the installed Python for backups

When BrainLayer is installed in a venv, pipx, or any non-default interpreter, the existing launchd jobs still work because their plists execute __BRAINLAYER_BIN__, but this wrapper ignores that configured executable and falls back to launchd's python3. In that setup the scheduled backup can fail at import time for project or Google Drive dependencies even though scripts/launchd/install.sh successfully found a working BrainLayer install; the backup job should run with the same interpreter/environment as the installed CLI or have the installer populate BRAINLAYER_PYTHON.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4


Inline comments:
In `@docs/backup-strategy.md`:
- Line 87: The mv command shown uses the glob brainlayer.db* which will move the
main SQLite file and its auxiliary files; update the docs to explicitly list the
exact files matched (e.g., brainlayer.db, brainlayer.db-wal, brainlayer.db-shm)
and clarify that the command preserves the main DB plus WAL and shared-memory
files by moving brainlayer.db, brainlayer.db-wal, and brainlayer.db-shm into
corrupt-$ts; mention that other similarly named files will also be included by
the wildcard and recommend verifying the moved files with ls before and after
running the mv command.

In `@src/brainlayer/backup_daily.py`:
- Around line 191-200: The function upload_file_to_drive is defined but never
used (upload_file_to_drive_raw is used instead); remove the unused
upload_file_to_drive function and its internal import of MediaFileUpload to
reduce dead code, or alternatively replace the call sites that use
upload_file_to_drive_raw with upload_file_to_drive if you intend to switch to
the resumable MediaFileUpload implementation—update only the function definition
(upload_file_to_drive) and any references so there are no lingering unused
imports or duplicate semantics.
- Around line 91-110: The code mixes naive and aware datetimes around
parsed_expiry, Credentials(... expiry=parsed_expiry) and refresh_before; make
all datetimes timezone-aware in UTC instead of stripping tzinfo: when computing
parsed_expiry (in the block that sets parsed_expiry) keep the tzinfo (remove
.replace(tzinfo=None)) and ensure it is converted to UTC with
.astimezone(dt.UTC), then pass that aware datetime into Credentials; likewise
create refresh_before as dt.datetime.now(dt.UTC) + dt.timedelta(hours=2) (keep
tzinfo) so the comparison in the if checking creds.expired / creds.valid /
creds.expiry < refresh_before uses consistent timezone-aware datetimes.
- Around line 354-358: The code currently falls back from
BRAINLAYER_BACKUP_DRIVE_FOLDER to BRAINLAYER_BACKUP_DRIVE_PATH when computing
folder_parts (using DEFAULT_FOLDER_PARTS as a final default); add a brief inline
comment next to that expression explaining that BRAINLAYER_BACKUP_DRIVE_PATH is
checked for backward compatibility/alternate config and that
BRAINLAYER_BACKUP_DRIVE_FOLDER is the preferred variable—reference the variables
BRAINLAYER_BACKUP_DRIVE_FOLDER, BRAINLAYER_BACKUP_DRIVE_PATH and
DEFAULT_FOLDER_PARTS in the comment so future readers know the intended
precedence.


📥 Commits

Reviewing files that changed from the base of the PR and between 2d934bd and 0f0f073.

📒 Files selected for processing (7)
  • docs/backup-strategy.md
  • pyproject.toml
  • scripts/launchd/backup-daily.sh
  • scripts/launchd/com.brainlayer.backup-daily.plist
  • scripts/launchd/install.sh
  • src/brainlayer/backup_daily.py
  • tests/test_backup_daily.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

**/*.py: Use paths.py:get_db_path() for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches

Files:

  • src/brainlayer/backup_daily.py
  • tests/test_backup_daily.py
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/**/*.py: Use retry logic on SQLITE_BUSY errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve ai_code, stack_trace, and user_message verbatim; skip noise entries entirely and summarize build_log and dir_listing entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via enrichment_controller.py, and Ollama as offline last-resort; allow override via BRAINLAYER_ENRICH_BACKEND env var
Configure enrichment rate via BRAINLAYER_ENRICH_RATE environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: superseded_by, aggregated_into, archived_at on chunks table; exclude lifecycle-managed chunks from default search; allow include_archived=True to show history
Implement brain_supersede with safety gate for personal data (journals, notes, health/finance); use soft-delete for brain_archive with timestamp
Add supersedes parameter to brain_store for atomic store-and-replace operations
Run linting and formatting with: ruff check src/ && ruff format src/
Run tests with pytest
Use PRAGMA wal_checkpoint(FULL) before and after bulk database operations to prevent WAL bloat

Files:

  • src/brainlayer/backup_daily.py
🔇 Additional comments (19)
docs/backup-strategy.md (4): lines 5-24, 25-33, 35-53, 112-121: LGTM!

pyproject.toml (1): lines 38-39: LGTM!
src/brainlayer/backup_daily.py (5): lines 1-32, 42-74, 125-158, 277-324: LGTM! One additional note follows:
202-265: ⚡ Quick win

Implementation matches Google Drive API v3 resumable upload specification.

The custom resumable upload correctly follows the Drive API v3 protocol:

  • Initiates the resumable session with proper Authorization, Content-Type, and X-Upload-Content-* headers
  • Sends chunks via PUT with correct Content-Range format (bytes start-end/total)
  • Handles status codes correctly: 200/201 for completion, 308 for resume (parsing Range header to determine next offset), and 429/5xx for retryable errors
  • Range header parsing (extracting the final byte position and adding 1) is correct; the fallback to end + 1 when Range header is missing is reasonable for 308 responses

No changes needed.
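
For readers unfamiliar with that protocol, the chunk loop being reviewed reduces to roughly this shape (a simplified sketch with requests; session initiation, retries, and error handling elided):

import requests
from pathlib import Path

def upload_chunks(session_url: str, path: Path, token: str,
                  chunk_size: int = 8 * 1024 * 1024) -> dict:
    total = path.stat().st_size
    sent = 0
    with path.open("rb") as handle:
        while True:
            handle.seek(sent)
            chunk = handle.read(min(chunk_size, total - sent))
            end = sent + len(chunk) - 1
            resp = requests.put(session_url, data=chunk, headers={
                "Authorization": f"Bearer {token}",
                "Content-Range": f"bytes {sent}-{end}/{total}",
            })
            if resp.status_code in (200, 201):  # final chunk accepted
                return resp.json()
            if resp.status_code == 308:  # incomplete: Range says how much landed
                rng = resp.headers.get("Range")
                sent = int(rng.rsplit("-", 1)[1]) + 1 if rng else sent  # no Range: resend
            else:
                resp.raise_for_status()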

tests/test_backup_daily.py (3): lines 6-33, 36-88, 91-105: LGTM!
scripts/launchd/backup-daily.sh (1): lines 1-7: LGTM!

scripts/launchd/com.brainlayer.backup-daily.plist (1): lines 1-43: LGTM!

scripts/launchd/install.sh (4): lines 12, 20 (also 32), 100-112, 146-149 (also 155-156, 166-167, 170): LGTM!

Comment on lines +91 to +110
expiry = token_data.get("expiry")
if not expiry and token_data.get("expiry_date"):
    expiry = dt.datetime.fromtimestamp(int(token_data["expiry_date"]) / 1000, tz=dt.UTC).isoformat()

parsed_expiry = dt.datetime.fromisoformat(expiry.replace("Z", "+00:00")) if expiry else None
if parsed_expiry and parsed_expiry.tzinfo:
    parsed_expiry = parsed_expiry.astimezone(dt.UTC).replace(tzinfo=None)

creds = Credentials(
    token=token_data.get("access_token"),
    refresh_token=token_data.get("refresh_token"),
    token_uri=client_data["token_uri"],
    client_id=client_data["client_id"],
    client_secret=client_data["client_secret"],
    scopes=token_data.get("scope", " ".join(DRIVE_SCOPES)).split(),
    expiry=parsed_expiry,
)

refresh_before = dt.datetime.now(dt.UTC).replace(tzinfo=None) + dt.timedelta(hours=2)
if creds.expired or not creds.valid or (creds.expiry and creds.expiry < refresh_before):

🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider simplifying timezone handling.

The credential refresh logic mixes timezone-aware and naive datetimes (lines 95-97 parse to aware, then convert to naive; line 109 creates naive UTC). While this works for google-auth's Credentials API, mixing aware and naive datetimes is fragile and discouraged in Python's datetime best practices.

If refactoring later, consider: keep all datetimes timezone-aware throughout, or clearly document why naive UTC is required here.



Owner Author

EtanHey commented May 13, 2026

Addressed review durability feedback in 99cc4e1: added token-file locking plus atomic token writes, backup staging free-space guard, stricter snapshot checkpoint, upload chunk-length validation/progress, traceback logging, and clearer encryption threat-model docs. Local focused checks passed and the full pre-push gate passed (1827 passed, MCP 3 passed, eval/hook 32 passed, Bun 1 pass, regression shell pass).


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99cc4e1045


Comment on lines +54 to +55
required_bytes = (db_path.stat().st_size * 2) + (512 * 1024 * 1024)
free_bytes = shutil.disk_usage(output_dir).free

P2: Account for WAL size in the space check

When the live database has a large uncheckpointed brainlayer.db-wal, this estimate only counts brainlayer.db, but the SQLite online backup materializes committed WAL pages into raw_snapshot. In the known WAL-heavy cases, a tiny main DB plus multi-GB WAL can pass this check with far less free space than the backup actually needs, then fail midway instead of refusing up front; include the WAL/logical backup size in the required-space calculation.
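
A sketch of the WAL-aware estimate this comment suggests, keeping the existing doubling and 512 MiB headroom (helper names are illustrative):

import shutil
from pathlib import Path

def required_backup_bytes(db_path: Path) -> int:
    """Size the staging guard on main DB plus uncheckpointed WAL, doubled, plus headroom."""
    wal = db_path.with_name(db_path.name + "-wal")
    logical = db_path.stat().st_size + (wal.stat().st_size if wal.exists() else 0)
    return logical * 2 + 512 * 1024 * 1024

def has_space(db_path: Path, output_dir: Path) -> bool:
    return shutil.disk_usage(output_dir).free >= required_backup_bytes(db_path)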


    date_stamp: str | None = None,
    upload: bool = True,
) -> dict[str, Any]:
    snapshot = create_sqlite_backup_gzip(db_path or get_db_path(), staging_dir, date_stamp=date_stamp)

P2: Prune or delete local staging snapshots

With the default launchd path, every successful run leaves a dated .db.gz in ~/.local/share/brainlayer/backups, and the later retention code only deletes Drive files. For a large BrainLayer DB this unbounded local cache can eventually fill the same disk used by the live DB and prevent future writes/backups, so either remove the staging artifact after upload or apply the same retention policy locally.



Owner Author

EtanHey commented May 13, 2026

Addressed the latest CodeRabbit actionable comments in edea74e: clarified the restore wildcard docs, removed the unused MediaFileUpload helper/dead code, documented env-var precedence, and verified google-auth requires naive UTC expiry values (kept with inline comment after real get_drive_credentials() smoke test). Validation: focused backup tests 8 passed; ruff clean; full pre-push gate 1827 passed, MCP 3 passed, eval/hook 32 passed, Bun 1 pass, regression shell pass.

Comment on lines +268 to +275
while True:
result = (
service.files()
.list(
q=f"'{folder_id}' in parents and trashed = false",
spaces="drive",
fields="nextPageToken,files(id,name)",
pageSize=1000,

🟠 High brainlayer/backup_daily.py:268

When a 308 response lacks a Range header, the code sets sent = end + 1, assuming the chunk was received. Per the Drive resumable upload protocol, a missing Range header means the server received 0 bytes, so sent should remain unchanged. This causes the upload to skip that chunk, resulting in a corrupted backup file on Drive.

                    if response.status_code == 308:
                        uploaded_range = response.headers.get("Range")
                        if uploaded_range and "-" in uploaded_range:
                            sent = int(uploaded_range.rsplit("-", 1)[1]) + 1
                        else:
-                            sent = end + 1
+                            pass  # Server received 0 bytes; retry same chunk
                        print(f"drive upload progress: {sent}/{total} bytes", flush=True)
Evidence trail:
src/brainlayer/backup_daily.py lines 270-273 (REVIEWED_COMMIT): `else: sent = end + 1` when Range header is missing from 308 response. Google Drive resumable upload docs: https://developers.google.com/workspace/drive/api/guides/manage-uploads — absent Range header in 308 means 0 bytes received.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: edea74e33d


start = sent
end = sent + len(chunk) - 1
headers = {
    "Authorization": f"Bearer {credentials.token}",

P2: Refresh Drive credentials on upload 401s

For multi-GB backups on a slow connection, the OAuth access token can expire after the resumable upload starts; each chunk's Authorization header is built from the same credentials.token, and a 401 falls through to raise_for_status() so the retry loop just resends the stale token up to max_attempts before failing. Refresh the credentials and rebuild the header when Drive returns 401/403 auth failures so long uploads can complete reliably.

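A sketch of that mid-upload refresh (hedged; it reuses google-auth's standard refresh pattern already present in the module, and the helper name is illustrative):

import requests
from google.auth.transport.requests import Request

def put_chunk_with_auth_retry(credentials, session_url: str, chunk: bytes, headers: dict):
    """PUT one chunk; on an auth failure, refresh the token once and resend it."""
    response = requests.put(session_url, data=chunk, headers=headers)
    if response.status_code in (401, 403):
        credentials.refresh(Request())  # re-mint the expired access token mid-upload
        headers["Authorization"] = f"Bearer {credentials.token}"
        response = requests.put(session_url, data=chunk, headers=headers)
    return response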


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.


Reviewed by Cursor Bugbot for commit edea74e.

f"sleeping {sleep_seconds}s",
flush=True,
)
time.sleep(sleep_seconds)

Non-retryable HTTP errors retried 30 times

Medium Severity

response.raise_for_status() on line 239 raises an HTTPError for non-retryable 4xx client errors (e.g., 400, 401, 403, 404), which is then caught by the generic except Exception handler on line 240. This causes permanent client errors to be retried up to 30 times with exponential backoff (up to 60s per sleep), potentially wasting 30+ minutes before ultimately failing. Only 429 and 5xx errors are intentionally retryable; other 4xx errors indicate a permanent problem (expired token, bad request, forbidden resource) that retrying won't fix.
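
A sketch of the retryable/permanent split the finding calls for:

RETRYABLE_STATUS = {429, 500, 502, 503, 504}

def is_retryable(status_code: int) -> bool:
    """Rate limits and server errors are transient; other 4xx errors are permanent."""
    return status_code in RETRYABLE_STATUS

In the retry loop, a permanent status would then call response.raise_for_status() immediately instead of sleeping and retrying.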



# google-auth Credentials.expired compares against a naive UTC helper, so keep expiry comparisons naive UTC.
refresh_before = dt.datetime.now(dt.UTC).replace(tzinfo=None) + dt.timedelta(hours=2)
if creds.expired or not creds.valid or (creds.expiry and creds.expiry < refresh_before):

Refresh threshold always exceeds Google token lifetime

Low Severity

refresh_before is set to now + 2 hours, but Google OAuth access tokens have a ~1-hour lifetime. Since a fresh token's expiry (~now + 1h) is always less than refresh_before (~now + 2h), the condition creds.expiry < refresh_before is always true, causing a token refresh on every single invocation. Combined with run_backup calling get_drive_credentials() twice (once directly, once inside build_drive_service()), this results in two unnecessary Google token refreshes per backup run.
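
A sketch of a threshold compatible with Google's roughly one-hour token lifetime (the five-minute buffer is an illustrative choice):

import datetime as dt

# Refresh only when the token is genuinely close to expiry; a 2-hour window
# always exceeds the ~1h lifetime and forces a refresh on every run.
refresh_before = dt.datetime.now(dt.UTC).replace(tzinfo=None) + dt.timedelta(minutes=5)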


uploaded = upload_file_to_drive_raw(snapshot, folder_id, credentials)
deleted = prune_drive_backups(service, folder_parts=folder_parts)
result.update({"uploaded": True, "drive_file": uploaded, "retention_deleted": deleted})
return result

Local staging snapshots never pruned, accumulating indefinitely

Medium Severity

run_backup prunes remote Drive backups via prune_drive_backups but never removes old local .db.gz files from staging_dir. Each daily run of create_sqlite_backup_gzip creates a new date-stamped file (~5GB per the PR verification) that persists indefinitely. After 30 days, ~150GB of local disk is consumed; the free-space guard will eventually reject new backups, silently breaking the entire backup pipeline.
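
A sketch of a local prune mirroring the Drive-side retention (the keep count and helper name are illustrative):

from pathlib import Path

def prune_local_staging(staging_dir: Path, keep: int = 3) -> list[Path]:
    """Delete all but the newest `keep` snapshots left in the staging directory."""
    snapshots = sorted(staging_dir.glob("*.db.gz"))  # ISO-dated names sort chronologically
    stale = snapshots[:-keep] if keep > 0 else snapshots
    for path in stale:
        path.unlink()
    return stale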


@EtanHey EtanHey merged commit ec4a544 into main May 13, 2026
7 checks passed