Memory-ingest has hardcoded 35-min timeout that SIGTERMs /sync-gbrain --full on big brains; no resume from import-checkpoint #1611

@pennyultima

Repro

On a brain with ~2000+ staged files (~/.gstack-brain-worktree/ + transcripts), /sync-gbrain --full reliably fails the memory stage:

gstack-gbrain-sync (full):
  OK    code         registered + synced gstack-code-… (18.1s)
  ERR   memory       staged 1989 pages → gbrain import (exit 143) after 2100.2s
  OK    brain-sync   curated artifacts pushed (2.4s)

exit 143 = 128 + 15, i.e. the child was killed by signal 15 (SIGTERM). Runtime 2100.2s = 35 minutes, to within a fraction of a second.
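For readers unfamiliar with the 128 + N convention, here is a minimal sketch of how the status in the log decodes to a signal. `describeExit` and `SIGNAL_NAMES` are hypothetical names used only for illustration; they are not part of gstack or gbrain.

```typescript
// Common shell convention: an exit status above 128 means the process
// was terminated by signal number (status - 128).
// Only a few well-known POSIX signal numbers are listed here.
const SIGNAL_NAMES: Record<number, string> = {
  2: "SIGINT",
  9: "SIGKILL",
  15: "SIGTERM",
};

interface ExitInfo {
  status: number;
  signal: number | null; // null when the status is an ordinary exit code
  signalName: string | null; // null when the signal is not in the table above
}

// Hypothetical helper: split an exit status into "normal exit" vs "killed by signal".
function describeExit(status: number): ExitInfo {
  if (status <= 128) {
    return { status, signal: null, signalName: null };
  }
  const signal = status - 128;
  return { status, signal, signalName: SIGNAL_NAMES[signal] ?? null };
}

// The memory stage above reported exit 143 after 2100.2s.
const info = describeExit(143);
const minutes = 2100.2 / 60;
console.log(`signal=${info.signal} name=${info.signalName} minutes=${minutes.toFixed(1)}`);
```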

Root cause

bin/gstack-gbrain-sync.ts:756 hardcodes:

const result = spawnSync("bun", ingestArgs, {
  encoding: "utf-8",
  timeout: 35 * 60 * 1000,   // ← hardcoded; no escape hatch for big brains
  env: buildGbrainEnv({ announce: false }),
});

(Same value also at line 602 for the code stage.)

What happens after the timeout

gbrain leaves its checkpoint on disk:

$ cat ~/.gbrain/import-checkpoint.json
{
  "dir": "/Users/x/.gstack/.staging-ingest-42625-1779217370565",
  "totalFiles": 1989,
  "processedIndex": 1000,        ← made it 50% through
  "completedFiles": 1000,
  "timestamp": "2026-05-19T19:30:05.008Z"
}

But the next /sync-gbrain doesn't resume from this checkpoint: the memory-ingest child cleans up the staging dir on SIGTERM, so the checkpoint references a dir that no longer exists. When the user re-runs, the next bulk pass starts on a fresh set of files and is also killed at 35 min. Progress is real (1000 pages were embedded into the default source), but the verdict reads as ERR.

Two asks

1. Make the timeout configurable

const memoryTimeoutMs = parseInt(
  process.env.GSTACK_SYNC_MEMORY_TIMEOUT_MS ?? `${35 * 60 * 1000}`,
  10,
);
const codeTimeoutMs = parseInt(
  process.env.GSTACK_SYNC_CODE_TIMEOUT_MS ?? `${35 * 60 * 1000}`,
  10,
);

Default unchanged. Big-brain users set GSTACK_SYNC_MEMORY_TIMEOUT_MS=7200000 (2h) in their shell rc and the stage finishes instead of dying mid-import. The documentation needs a matching correction: the embedded estimate "~25-35 min for ~11.7K transcripts = ~150ms/page synchronous" is wildly off in practice (see #1612; actual is ~2.1s/file because gbrain isn't batching embeddings).
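One caveat with a bare `parseInt`: `parseInt("2h", 10)` returns 2 and `parseInt("abc", 10)` returns NaN, so a mistyped value could become a 2 ms timeout or an invalid one. A possible hardened variant is sketched below. `readTimeoutMs` is a hypothetical helper, not existing gstack code, and it takes the environment as a plain object so it can be exercised without touching `process.env`.

```typescript
// Same default as the value hardcoded today: 35 minutes in milliseconds.
const DEFAULT_TIMEOUT_MS = 35 * 60 * 1000;

// Hypothetical helper (an assumption, not gstack's API): read a millisecond
// timeout from an env-like map, falling back to the default when the value
// is missing, empty, non-numeric, fractional, zero, or negative.
function readTimeoutMs(
  env: Record<string, string | undefined>,
  name: string,
  fallbackMs: number = DEFAULT_TIMEOUT_MS,
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") {
    return fallbackMs;
  }
  // Number() rejects trailing garbage such as "2h" (NaN), unlike parseInt.
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    return fallbackMs;
  }
  return parsed;
}

// Example with the variable names proposed in this issue.
const exampleEnv = { GSTACK_SYNC_MEMORY_TIMEOUT_MS: "7200000" };
const memoryTimeout = readTimeoutMs(exampleEnv, "GSTACK_SYNC_MEMORY_TIMEOUT_MS");
const codeTimeout = readTimeoutMs(exampleEnv, "GSTACK_SYNC_CODE_TIMEOUT_MS");
console.log({ memoryTimeout, codeTimeout });
```

In the real script the first argument would presumably be `process.env`. Silently falling back is one design choice; logging a warning on an invalid value might be friendlier.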

2. Resume from ~/.gbrain/import-checkpoint.json on next run

If the memory-ingest stage exits with SIGTERM, the staging dir should NOT be cleaned up by the SIGTERM handler — leave it on disk so the next run's gbrain import picks up at processedIndex+1. Alternatively, persist the staging dir path to ~/.gstack/.gbrain-sync-state.json so the next --full consults it first.
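The resume decision could look roughly like the sketch below. The `ImportCheckpoint` shape mirrors the JSON shown earlier in this issue; `planResume`, `ResumePlan`, and the injected `dirExists` callback are hypothetical names for illustration. Whether `gbrain import` resumes on its own once the directory is preserved is an assumption taken from this ask, not something the sketch verifies.

```typescript
// Shape of ~/.gbrain/import-checkpoint.json as shown in this issue.
interface ImportCheckpoint {
  dir: string;
  totalFiles: number;
  processedIndex: number;
  completedFiles: number;
  timestamp: string;
}

type ResumePlan =
  | { action: "resume"; dir: string; remaining: number }
  | { action: "fresh"; reason: string };

// Hypothetical helper: decide whether the next --full run should reuse the
// staging dir named in the checkpoint or stage everything from scratch.
// dirExists is injected so the logic can be tested without touching disk;
// in the sync script it could be existsSync from node:fs.
function planResume(
  checkpoint: ImportCheckpoint | null,
  dirExists: (path: string) => boolean,
): ResumePlan {
  if (checkpoint === null) {
    return { action: "fresh", reason: "no checkpoint on disk" };
  }
  if (checkpoint.completedFiles >= checkpoint.totalFiles) {
    return { action: "fresh", reason: "previous import already finished" };
  }
  if (!dirExists(checkpoint.dir)) {
    // This is the situation described above: the SIGTERM handler removed the dir.
    return { action: "fresh", reason: "staging dir no longer exists" };
  }
  return {
    action: "resume",
    dir: checkpoint.dir,
    remaining: checkpoint.totalFiles - checkpoint.completedFiles,
  };
}

// Checkpoint values copied from the report above.
const reported: ImportCheckpoint = {
  dir: "/Users/x/.gstack/.staging-ingest-42625-1779217370565",
  totalFiles: 1989,
  processedIndex: 1000,
  completedFiles: 1000,
  timestamp: "2026-05-19T19:30:05.008Z",
};
console.log(planResume(reported, () => false)); // today: dir was cleaned up
console.log(planResume(reported, () => true)); // with ask 2: dir preserved
```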

Net: a /sync-gbrain --full that SIGTERMs at the 35-min wall today loses 35 minutes of work. With resume, it loses zero — next run picks up where it left off.

Environment

  • macOS 15.x, Apple Silicon (M-series Mac mini)
  • gstack v1.40.0.0
  • gbrain v0.33.1.0
  • Brain size: ~188k markdown files in ~/brain/ (mostly emails from iCloud + Gmail backfill), ~2000 in ~/.gstack-brain-worktree/
  • Engine: Supabase Session Pooler (postgres)

Adjacent issue

The 2.1s/file throughput points at a deeper gbrain-side issue (single-file vs batched OpenAI embedding calls). Filed separately at garrytan/gbrain.
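To put that throughput next to the 35-minute limit: 1989 files at ~2.1 s/file is about 4177 s, roughly 70 minutes, while the documented ~150 ms/page for ~11.7K transcripts would be about 29 minutes. A small sketch of the arithmetic follows; `estimateImportMs` and `fitsInTimeout` are hypothetical helpers and the per-file figures are the ones quoted in this issue.

```typescript
const THIRTY_FIVE_MIN_MS = 35 * 60 * 1000;

// Hypothetical helper: estimated wall time in ms for a serial import,
// assuming a constant per-file cost (a simplification).
function estimateImportMs(fileCount: number, secondsPerFile: number): number {
  return Math.ceil(fileCount * secondsPerFile * 1000);
}

// Hypothetical helper: would the estimate finish before the timeout fires?
function fitsInTimeout(estimateMs: number, timeoutMs: number): boolean {
  return estimateMs < timeoutMs;
}

// Observed in this report: 1989 staged pages at ~2.1 s/file.
const observedMs = estimateImportMs(1989, 2.1);
// Documented estimate: ~11.7K transcripts at ~150 ms/page.
const documentedMs = estimateImportMs(11700, 0.15);

console.log({
  observedMinutes: (observedMs / 60000).toFixed(1),
  documentedMinutes: (documentedMs / 60000).toFixed(1),
  observedFitsDefault: fitsInTimeout(observedMs, THIRTY_FIVE_MIN_MS),
  observedFitsTwoHours: fitsInTimeout(observedMs, 2 * 60 * 60 * 1000),
});
```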

Happy to PR the timeout-knob change if it helps.
