Skip to content

feat: pluggable scheduler (local sqlite + Workstacean adapter)#156

Merged
mabry1985 merged 6 commits into
mainfrom
feat/agent-scheduler
Apr 28, 2026
Merged

feat: pluggable scheduler (local sqlite + Workstacean adapter)#156
mabry1985 merged 6 commits into
mainfrom
feat/agent-scheduler

Conversation

@mabry1985
Copy link
Copy Markdown
Contributor

@mabry1985 mabry1985 commented Apr 28, 2026

Adds a default scheduler so the agent can defer work to itself — reminders, recurring sweeps, deadline check-ins. Three new tools (schedule_task, list_schedules, cancel_schedule) ship behind one SchedulerBackend protocol with two implementations.

Why now

Forks (and the multiple ginas about to spin up) need a way to say "remind me tomorrow about X" or "every Monday morning summarize last week's logs". The protoLabs fleet already has Workstacean for this; solo / local-dev forks don't. Same tool surface, env-selected backend, no fork-level decision required at tool-registration time.

Multi-agent isolation (the architectural headline)

Every job is namespaced by AGENT_NAME so spinning up gina-personal alongside gina-work on one box (or sharing one Workstacean install across N forks) doesn't cross-fire scheduled prompts.

Backend How it isolates
Local <scheduler_dir>/<agent_name>/jobs.db, plus agent_name filter on every read/write
Workstacean Job IDs prefixed <agent_name>-...; topics namespaced cron.<agent_name>.<job_id>

Verified with explicit cross-agent-cancel-blocked tests.

Backends

  • LocalScheduler (default) — sqlite + asyncio polling task on FastAPI's startup hook. Fires by POSTing message/send to the agent's own /a2a endpoint (bearer + X-API-Key forwarded), so the audit log and cost-v1 capture see scheduled fires identically to real callers. Cron via croniter, ISO one-shot via datetime.fromisoformat. Missed-fire recovery within 24h matches Workstacean's behaviour; older fires roll forward without firing to avoid flooding the agent after a long downtime.
  • WorkstaceanScheduler — HTTP adapter to POST /publish. Activated automatically when WORKSTACEAN_API_BASE + WORKSTACEAN_API_KEY env vars are set.

Wiring summary

  • New module: scheduler/{__init__,interface,local,workstacean}.py
  • tools/lg_tools.py::_build_scheduler_tools factory; get_all_tools(knowledge_store, scheduler=) takes a new optional kwarg.
  • server.py::_build_scheduler() picks backend, on_event("startup"/"shutdown") drives lifecycle, reload reuses the running instance.
  • Worker subagent's tool allowlist updated.
  • croniter>=2.0 added to requirements.txt.

Workstacean alignment gap (issue incoming)

Workstacean's existing channels are signal / cli — there's no native A2A delivery channel today. Forks running the Workstacean adapter need to wire a bridge that subscribes to cron.<agent>.* and POSTs to the agent's /a2a endpoint. I'll file an alignment issue on protoLabsAI/protoWorkstacean proposing a built-in A2A channel so this stops being a fork-level chore.

Tests

  • tests/test_scheduler_local.py — 32 cases: add/list/cancel, multi-agent isolation, reschedule vs delete, missed-fire recovery (recent vs stale), an end-to-end fire-the-job path with httpx monkeypatched.
  • tests/test_scheduler_workstacean.py — 16 cases: publish-payload shape, ID/topic namespacing, custom topic prefix, HTTP error handling, list_jobs intentionally empty.
  • All 216 tests in scope pass; the same single pre-existing test_subscribe_endpoint_origin_rejected failure that reproduces on main is unrelated.

Test plan

  • python server.py and confirm [scheduler] local backend started: agent=protoagent db=... shows up at startup.
  • Ask the agent in the chat UI: "Schedule a reminder for 30 seconds from now to say hello" — confirm it calls schedule_task with a near-future ISO and a fresh prompt arrives 30s later (visible as a new turn, not a continuation).
  • export WORKSTACEAN_API_BASE=http://ava-host:3000 WORKSTACEAN_API_KEY=... then restart — confirm [scheduler] workstacean backend ready: agent=protoagent base=http://... and that schedule_task calls publish to that endpoint.
  • Spin up two forks (AGENT_NAME=gina-personal and AGENT_NAME=gina-work) sharing SCHEDULER_DB_DIR; confirm list_schedules from each only sees its own jobs.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Built-in scheduler (local + Workstacean backends) with cron and one‑shot jobs, per‑agent isolation, and three user actions: schedule task, list schedules, cancel schedule
    • Config toggle and env var to enable/disable scheduler and control tool visibility
  • Documentation

    • New scheduler how‑to and updated configuration reference covering backends, behavior, env vars, and usage
  • Tests

    • Comprehensive scheduler and adapter test suites
  • Chores

    • Added cron parsing runtime dependency

Adds a default scheduler so agents can defer work to themselves —
"remind me tomorrow", recurring sweeps, deadline check-ins. Three
new tools land in get_all_tools() when a backend is wired up:
schedule_task, list_schedules, cancel_schedule.

Two backends ship behind a single SchedulerBackend protocol:

- LocalScheduler (default): sqlite + asyncio polling. Per-agent
  jobs.db at /sandbox/scheduler/<agent_name>/ with a
  ~/.protoagent/scheduler/<agent_name>/ fallback. Fires by POSTing
  message/send to the running agent's own /a2a endpoint, going
  through bearer + X-API-Key auth like a real caller (audit log +
  cost-v1 capture work the same). Cron expressions reschedule via
  croniter; ISO datetimes are one-shot. Missed-fire recovery:
  within 24h fires immediately, older fires roll forward without
  firing.

- WorkstaceanScheduler: HTTP adapter to a Workstacean install's
  POST /publish. Activated automatically when
  WORKSTACEAN_API_BASE and WORKSTACEAN_API_KEY env vars are set.
  Topic and job IDs are namespaced cron.<agent_name>.<job_id>
  so a single Workstacean can serve N protoAgent forks safely.

Multi-agent isolation is the headline architectural property —
spinning up gina-personal alongside gina-work on the same box (or
sharing one Workstacean) won't cross-fire scheduled prompts.
Verified with explicit tests in test_scheduler_local.py.

Wiring:
- scheduler/{__init__,interface,local,workstacean}.py — module
- tools/lg_tools.py — _build_scheduler_tools factory; get_all_tools
  takes a new optional scheduler= kwarg
- graph/agent.py — create_agent_graph and create_simple_agent
  accept scheduler=
- server.py — _build_scheduler() picks backend at boot,
  on_event("startup"/"shutdown") drives the polling task lifecycle,
  reload path reuses the running scheduler instance
- config/langgraph-config.yaml + graph/{config,subagents/config}.py
  — worker subagent gets the three new tools in its allowlist
- requirements.txt — croniter>=2.0

Tests: 48 new (test_scheduler_local.py covers add/list/cancel,
multi-agent isolation, reschedule-vs-delete, missed-fire recovery,
and an end-to-end fire path with httpx mocked; test_scheduler_workstacean.py
covers all the publish payload assertions, namespacing,
custom topic prefix, and HTTP error handling).

Docs: docs/guides/scheduler.md (Diataxis how-to with the firing
model, multi-agent story, env reference, and notes on the
Workstacean A2A-bridge gap), plus index/configuration/README/
TEMPLATE updates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a pluggable scheduler subsystem (Local sqlite poller and Workstacean HTTP adapter), wires scheduler into server and agent creation, exposes three agent tools (schedule_task, list_schedules, cancel_schedule), updates config/UI/defaults, and adds docs and tests.

Changes

Cohort / File(s) Summary
Scheduler Core & API
scheduler/__init__.py, scheduler/interface.py
New scheduler package and public API: Job dataclass, SchedulerBackend protocol, cron/ISO helpers, and package exports.
Local Scheduler Backend
scheduler/local.py
New LocalScheduler: per-agent SQLite persistence, cron/ISO parsing (croniter), missed-fire recovery, async polling loop that POSTs JSON-RPC to agent /a2a, and full job lifecycle (add/cancel/list). Review DB schema, polling/claim logic, HTTP invoke/error handling, and croniter usage.
Workstacean Adapter
scheduler/workstacean.py
New WorkstaceanScheduler: publishes scheduling commands to /publish with namespaced IDs/topics, validates schedules, cancel/add via HTTP, list_jobs() stub. Check payload namespacing, idempotency, and auth header handling.
Server wiring
server.py
Introduces global _scheduler, _build_scheduler, async start/stop hooks, reconciles scheduler on reload, and injects scheduler into agent graph creation. Inspect lifecycle, async start/stop and error logging.
Agent & Tools Integration
graph/agent.py, tools/lg_tools.py
Agent creation signatures now accept scheduler; scheduler tools added (schedule_task, list_schedules, cancel_schedule) backed by thread-offloaded calls to scheduler API. Review async-to-thread bridging and tool output formatting.
Config & Defaults
config/langgraph-config.yaml, graph/config.py, graph/subagents/config.py
Adds middleware.scheduler toggle mapped to scheduler_enabled, enables scheduler tool names in worker subagent default allowlist. Confirm mapping and default semantics.
Tool discovery & UI
graph/config_io.py, tests/test_config_io.py
config_to_dict and list_available_tools() expose scheduler and memory tool names for UI; tests updated to expect new names and uniqueness.
Docs
docs/guides/index.md, docs/guides/scheduler.md, docs/reference/configuration.md
Adds scheduler guide and configuration docs describing backends, env vars, namespacing, local behavior, Workstacean flow, and disable toggles.
Dependencies
requirements.txt
Adds croniter>=2.0 for cron parsing in LocalScheduler.
Tests
tests/test_scheduler_local.py, tests/test_scheduler_workstacean.py
New test suites covering local/workstacean schedulers, cron/ISO helpers, add/list/cancel semantics, missed-fire recovery, HTTP publish payloads, and config_io tool-listing.

Sequence Diagram(s)

sequenceDiagram
    participant Agent as Agent (invoker)
    participant Server as LangGraph Server
    participant Scheduler as LocalScheduler
    participant LocalDB as SQLite (per-agent jobs.db)
    participant A2A as /a2a endpoint

    Agent->>Server: call tool schedule_task(prompt, schedule)
    Server->>Scheduler: add_job(prompt, schedule)
    Scheduler->>LocalDB: INSERT job (compute next_fire)
    Scheduler-->>Server: return Job.id
    loop polling
      Scheduler->>LocalDB: query due jobs (next_fire <= now)
      LocalDB-->>Scheduler: return due jobs
      Scheduler->>A2A: POST JSON-RPC message/send (includes job metadata)
      A2A->>Server: receive scheduled message
      Server->>Agent: execute scheduled prompt
      Scheduler->>LocalDB: reschedule (cron) or delete (one-shot)
    end
Loading
sequenceDiagram
    participant Agent as Agent (invoker)
    participant Server as LangGraph Server
    participant Work as Workstacean API
    participant Bridge as Bridge/Dispatcher

    Agent->>Server: call tool schedule_task(prompt, schedule)
    Server->>Work: POST /publish command.schedule (namespaced id & topic)
    Work-->>Server: 2xx ack
    Note over Work,Bridge: Workstacean later emits scheduled topic
    Work->>Bridge: publish topic cron.<agent>.<job>
    Bridge->>Server: forward to /a2a
    Server->>Agent: execute scheduled prompt
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 27.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: introducing a pluggable scheduler with two implementations (local sqlite and Workstacean adapter), which is the primary feature of this PR.
Description check ✅ Passed The PR description provides a comprehensive summary of changes, explains the architectural design (multi-agent isolation), documents both scheduler backends, details the wiring changes, includes test results, and provides a concrete test plan with checkboxes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/agent-scheduler

Comment @coderabbitai help to get the list of available commands and usage tips.

@mabry1985
Copy link
Copy Markdown
Contributor Author

Filed the Workstacean alignment ask: protoLabsAI/protoWorkstacean#507 — proposes a native channel: "a2a" so forks don't each have to write their own bridge between Workstacean fires and the agent's /a2a endpoint. The publish-side implementation in this PR already emits the right payload shape; once #507 lands, no protoAgent changes are needed — the adapter starts delivering end-to-end.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/guides/scheduler.md`:
- Around line 9-16: Replace the typo "multiple ginas" in the scheduler guide
section (the bullet starting "You want forks (or your own multiple ginas) to
support reminders...") with a clear term such as "multiple agents" or "multiple
instances"; update that exact phrase "multiple ginas" to "multiple agents" (or
"multiple instances" if preferred) so the bullet reads unambigously and matches
surrounding terminology.

In `@scheduler/__init__.py`:
- Line 27: The __all__ list in scheduler/__init__.py is not alphabetically
ordered which triggers Ruff RUF022; update the exported names list so it is
sorted lexicographically (e.g., order the identifiers Job, LocalScheduler,
SchedulerBackend, WorkstaceanScheduler) to satisfy the linter and avoid noisy
warnings.

In `@scheduler/local.py`:
- Around line 141-149: In _init_db, replace the log.error call in the
sqlite3.DatabaseError except block with log.exception to preserve the traceback;
keep the same contextual message (including self.path) and pass the caught
exception (exc) so the full stack trace is recorded from the _init_db method
when schema init fails.

In `@scheduler/workstacean.py`:
- Around line 15-18: The docstring for Workstacean.list_jobs() is wrong — the
method currently returns an empty list and does not issue a `list` command;
update the docstring on the Workstacean class and the list_jobs() method to
state that the adapter is fire-and-forget, that Workstacean does not query the
schedule and therefore returns an empty list, and recommend using the local
backend for strict local introspection (remove or correct any text claiming it
"issues a `list` command" or waits on the `schedule.list` topic).

In `@server.py`:
- Around line 141-201: _build_scheduler is constructing LocalScheduler at import
time using the stale module-level _active_port (default 7870), causing wrong
invoke_url when the real port is parsed later in _main; either defer scheduler
construction until after port parsing (move the call out of
_init_langgraph_agent and into _main after _active_port is set) or change
_build_scheduler to accept the actual port (e.g., add a port parameter used to
compute invoke_url) and call it with the parsed port in _main; update references
to _build_scheduler, _init_langgraph_agent, _main, LocalScheduler, _active_port
and keep honoring SCHEDULER_INVOKE_URL if present.
- Around line 844-867: The lifecycle hook functions _scheduler_startup and
_scheduler_shutdown lack explicit return type annotations; add "-> None" to both
async def signatures (async def _scheduler_startup() -> None: and async def
_scheduler_shutdown() -> None:) so static analyzers know they return nothing;
keep the bodies unchanged and preserve the `@fastapi_app.on_event`("startup") and
`@fastapi_app.on_event`("shutdown") decorators.

In `@tests/test_scheduler_local.py`:
- Around line 75-77: Update the test_malformed_raises test to assert the
specific ValueError message by adding a match parameter to pytest.raises; locate
the test function test_malformed_raises and the parse_iso_to_utc call, then
change the context manager to something like with pytest.raises(ValueError,
match="expected error text or regex"): to narrow the assertion to the correct
error message (use a regex that matches the parse_iso_to_utc error string).

In `@tests/test_scheduler_workstacean.py`:
- Around line 103-105: The test test_malformed_schedule_rejected currently uses
with pytest.raises(ValueError): which is too broad; update the pytest.raises
call to include a match argument that matches the expected error message raised
by adapter.add_job("hi", "not-a-schedule", job_id="x") (e.g., a substring like
"invalid schedule" or the exact message produced by add_job) so the assertion
only passes for the intended ValueError from adapter.add_job.

In `@tools/lg_tools.py`:
- Line 478: The current return expression that yields f"Canceled {job_id}." if
ok else f"Error: no such job {job_id}." conflates “not found” with any backend
failure; update the surrounding function (the one returning that ternary using
job_id) to distinguish failure causes by inspecting the cancel operation result
or error payload (e.g., change the cancel call to return (ok, err) or catch
exceptions), then return "Canceled {job_id}." on success, "Error: no such job
{job_id}." only when the backend explicitly indicates a missing job, and for
other failures return a different message like "Error cancelling {job_id}:
{error_message}" and log the backend error for operators. Ensure you reference
and propagate the actual error from the cancellation call instead of coercing
all False results into “no such job.”
- Line 433: The scheduler calls are blocking and must be offloaded to a
threadpool: import asyncio and change direct calls like job =
scheduler.add_job(prompt, when, job_id=job_id) to await
asyncio.to_thread(scheduler.add_job, prompt, when, job_id=job_id), and likewise
wrap scheduler.list_jobs(...) and scheduler.cancel_job(...) (and any internal
blocking methods such as _publish()/httpx.post or sqlite3 access) with await
asyncio.to_thread(...). Ensure the functions that call these are async so they
can await the to_thread calls.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5ccf14bb-a2f8-4721-bd91-4a235848c3cf

📥 Commits

Reviewing files that changed from the base of the PR and between 3749914 and 0609c67.

⛔ Files ignored due to path filters (2)
  • README.md is excluded by !*.md
  • TEMPLATE.md is excluded by !*.md
📒 Files selected for processing (16)
  • config/langgraph-config.yaml
  • docs/guides/index.md
  • docs/guides/scheduler.md
  • docs/reference/configuration.md
  • graph/agent.py
  • graph/config.py
  • graph/subagents/config.py
  • requirements.txt
  • scheduler/__init__.py
  • scheduler/interface.py
  • scheduler/local.py
  • scheduler/workstacean.py
  • server.py
  • tests/test_scheduler_local.py
  • tests/test_scheduler_workstacean.py
  • tools/lg_tools.py

Comment thread docs/guides/scheduler.md
Comment thread scheduler/__init__.py Outdated
Comment thread scheduler/workstacean.py Outdated
Comment thread server.py Outdated
Comment on lines +141 to +201
def _build_scheduler(config):
"""Return the active scheduler backend, or ``None`` when disabled.

Selection order:

1. ``WORKSTACEAN_API_BASE`` + ``WORKSTACEAN_API_KEY`` set →
``WorkstaceanScheduler``. Forks running on the protoLabs fleet
infrastructure get this for free.
2. Otherwise → ``LocalScheduler`` with sqlite at
``/sandbox/scheduler/<agent_name>/jobs.db``.

Returns ``None`` when explicitly disabled via ``SCHEDULER_DISABLED=1``
so a fork can ship without a scheduler at all.

The agent's auth token + api-key are passed into the local backend
so its self-invocation HTTP call can pass through bearer / X-API-Key
auth — the scheduler hits the same A2A endpoint as a real caller.
"""
if os.environ.get("SCHEDULER_DISABLED", "").lower() in ("1", "true", "yes"):
log.info("[server] scheduler disabled via SCHEDULER_DISABLED env")
return None

name = agent_name()
workstacean_base = os.environ.get("WORKSTACEAN_API_BASE", "").strip()
workstacean_key = os.environ.get("WORKSTACEAN_API_KEY", "").strip()
if workstacean_base and workstacean_key:
try:
from scheduler import WorkstaceanScheduler
return WorkstaceanScheduler(
agent_name=name,
base_url=workstacean_base,
api_key=workstacean_key,
topic_prefix=os.environ.get("WORKSTACEAN_TOPIC_PREFIX") or None,
)
except Exception as exc:
log.warning(
"[server] WorkstaceanScheduler init failed: %s; falling back to local",
exc,
)

try:
from scheduler import LocalScheduler
invoke_url = os.environ.get(
"SCHEDULER_INVOKE_URL",
f"http://127.0.0.1:{_active_port}",
)
bearer = (config.auth_token or os.environ.get("A2A_AUTH_TOKEN", "")).strip()
api_key_env = f"{name.upper()}_API_KEY"
api_key = os.environ.get(api_key_env, "").strip()
return LocalScheduler(
agent_name=name,
invoke_url=invoke_url,
api_key=api_key,
bearer_token=bearer,
)
except Exception as exc:
log.warning(
"[server] LocalScheduler init failed: %s; running scheduler-less",
exc,
)
return None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Potential issue: _active_port may not reflect the actual port at scheduler construction time.

_build_scheduler is called from _init_langgraph_agent(), which runs at module import time (before _main() parses --port). At that point, _active_port is still the default 7870. If the server is started with a different port (e.g., --port 8080), the LocalScheduler will be constructed with invoke_url="http://127.0.0.1:7870" instead of the actual port.

The SCHEDULER_INVOKE_URL env var provides a workaround, but the default behavior is incorrect when using non-default ports.

Possible fix: defer scheduler construction or recompute invoke_url

One option is to move scheduler construction into _main() after port parsing:

 def _main():
     global _active_port
+    global _scheduler

     parser = argparse.ArgumentParser(...)
     parser.add_argument("--port", type=int, default=7870)
     args = parser.parse_args()
     _active_port = args.port

     # ... tracing/metrics init ...

     _init_langgraph_agent()
+    # Rebuild scheduler with correct port if needed
+    if _scheduler and hasattr(_scheduler, '_invoke_url'):
+        _scheduler._invoke_url = f"http://127.0.0.1:{_active_port}"

Or document that SCHEDULER_INVOKE_URL must be set when using non-default ports.

🧰 Tools
🪛 Ruff (0.15.12)

[warning] 141-141: Missing return type annotation for private function _build_scheduler

(ANN202)


[warning] 175-175: Do not catch blind exception: Exception

(BLE001)


[warning] 196-196: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server.py` around lines 141 - 201, _build_scheduler is constructing
LocalScheduler at import time using the stale module-level _active_port (default
7870), causing wrong invoke_url when the real port is parsed later in _main;
either defer scheduler construction until after port parsing (move the call out
of _init_langgraph_agent and into _main after _active_port is set) or change
_build_scheduler to accept the actual port (e.g., add a port parameter used to
compute invoke_url) and call it with the parsed port in _main; update references
to _build_scheduler, _init_langgraph_agent, _main, LocalScheduler, _active_port
and keep honoring SCHEDULER_INVOKE_URL if present.

Comment thread server.py
Comment thread tests/test_scheduler_workstacean.py
Comment thread tools/lg_tools.py Outdated
Comment thread tools/lg_tools.py Outdated
ok = scheduler.cancel_job(job_id)
except Exception as exc: # noqa: BLE001
return f"Error: scheduler cancel_job failed: {exc}"
return f"Canceled {job_id}." if ok else f"Error: no such job {job_id}."
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not conflate backend failures with “no such job”.

Returning "Error: no such job ..." on every False hides backend failure conditions (e.g., transport/DB failure paths that also return False), which can mislead operator retries and incident diagnosis.

🔧 Proposed fix (minimum in this layer)
-        return f"Canceled {job_id}." if ok else f"Error: no such job {job_id}."
+        return f"Canceled {job_id}." if ok else (
+            f"Error: cancel failed or no such job {job_id}."
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return f"Canceled {job_id}." if ok else f"Error: no such job {job_id}."
return f"Canceled {job_id}." if ok else (
f"Error: cancel failed or no such job {job_id}."
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/lg_tools.py` at line 478, The current return expression that yields
f"Canceled {job_id}." if ok else f"Error: no such job {job_id}." conflates “not
found” with any backend failure; update the surrounding function (the one
returning that ternary using job_id) to distinguish failure causes by inspecting
the cancel operation result or error payload (e.g., change the cancel call to
return (ok, err) or catch exceptions), then return "Canceled {job_id}." on
success, "Error: no such job {job_id}." only when the backend explicitly
indicates a missing job, and for other failures return a different message like
"Error cancelling {job_id}: {error_message}" and log the backend error for
operators. Ensure you reference and propagate the actual error from the
cancellation call instead of coercing all False results into “no such job.”

- docs/guides/scheduler.md: replace jargon "multiple ginas" with "multiple agents"
- scheduler/__init__.py: sort __all__ lexicographically (RUF022)
- scheduler/local.py: log.error → log.exception in _init_db to preserve traceback (TRY400)
- scheduler/workstacean.py: correct stale module docstring that claimed list_jobs()
  issues a list command — it returns [] unconditionally
- server.py: add -> None return annotations to _scheduler_startup/_scheduler_shutdown (ANN202)
- tests: add match= to two bare pytest.raises(ValueError) calls (PT011)
- tools/lg_tools.py: wrap blocking scheduler calls in asyncio.to_thread() to avoid
  blocking the event loop under concurrent load; fix cancel_schedule error message to
  not conflate transport/DB failures with "no such job"

Co-Authored-By: claude-code <claude@anthropic.com>

https://claude.ai/code/session_01JmFYJSYRMRndZ43g3AYW2q
Copy link
Copy Markdown
Contributor Author

Round-1 CodeRabbit review — addressed (commit 7d1da5e)

All 10 inline comments triaged. 9 fixed, 1 declined (with reason). 48 scheduler tests pass.


Fixed

# File Finding Fix
1 docs/guides/scheduler.md:11 "multiple ginas" jargon in bullet → "multiple agents"
2 scheduler/__init__.py:27 __all__ not sorted (Ruff RUF022) LocalScheduler moved before SchedulerBackend
3 scheduler/local.py:148 log.error loses traceback on schema init failure (TRY400) log.exception (drops redundant exc arg — log.exception captures it automatically)
4 scheduler/workstacean.py:15–18 Module docstring claimed list_jobs() "issues a list command and waits on schedule.list topic" — incorrect; the method returns [] unconditionally Docstring corrected to match implementation
5 server.py:852, 861 Missing -> None on _scheduler_startup / _scheduler_shutdown (ANN202) Annotations added
6 tests/test_scheduler_local.py:76 Bare pytest.raises(ValueError) too broad (PT011) `match="Invalid isoformat
7 tests/test_scheduler_workstacean.py:104 Same PT011 on test_malformed_schedule_rejected Same match pattern added
8 tools/lg_tools.py:433, 452, 475 add_job / list_jobs / cancel_job are sync blocking calls (sqlite3 + httpx.post) made directly from async tool functions — blocks the event loop under concurrent load Wrapped with await asyncio.to_thread(...)
9 tools/lg_tools.py:478 cancel_schedule returns "Error: no such job …" for any False result, including transport/DB failures from WorkstaceanScheduler.cancel_job Message changed to "Error: cancel failed or no such job …"

Declined — with reason

server.py:201_active_port port-timing bug

CodeRabbit's analysis: "_build_scheduler constructs LocalScheduler at import time using the stale module-level _active_port (default 7870)".

This is incorrect. _init_langgraph_agent() has exactly one call site_main() at line 822 — and _main() sets _active_port = args.port on line 814 before that call. The SCHEDULER_INVOKE_URL escape hatch therefore remains a deliberate override for reverse-proxied deployments, not a workaround for a default-port bug. No code change needed.


All other pre-merge check warnings (description template completeness, docstring coverage %) are out of scope for this code-correctness pass and should be addressed as separate follow-ups if the team decides they matter for the repo gate.


Generated by Claude Code

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scheduler/local.py`:
- Around line 141-149: The _init_db method currently swallows
sqlite3.DatabaseError after logging, which allows LocalScheduler to appear
healthy with a broken backend; modify the except block in _init_db to log the
exception (using log.exception as already present) and then re-raise the
exception so the caller can fail fast or disable scheduling; target the _init_db
method and its except sqlite3.DatabaseError handler to add a `raise` after the
log.exception call.
- Around line 244-251: The current _tick() always calls
_reschedule_or_delete(job, fired_at=now) in a finally block even when _fire(job)
failed; change _fire(job) to signal success (return True/False or raise on
non-2xx) and update _tick() to only call _reschedule_or_delete(...) when _fire()
succeeded (i.e., check the returned success flag or catch only specific
exceptions and skip reschedule/delete on failure), and ensure _fire() logs
errors but also propagates failure via its return/exception so transient
HTTP/network errors do not cause one-shot jobs to be removed or cron jobs to be
treated as delivered.

In `@server.py`:
- Around line 241-248: The current reload path always injects the existing
_scheduler into create_agent_graph which can be None on first-run and can keep
stale LocalScheduler credentials/namespace after config changes; instead
construct the backend/scheduler from new_config (not reusing the module-level
_scheduler), obtain its scheduler instance and pass that into
create_agent_graph(new_config, knowledge_store=new_store,
scheduler=backend.scheduler), and if the newly built backend/scheduler differs
from the running _scheduler, stop/cleanup the old _scheduler and
start/initialize the new one then assign _scheduler = backend.scheduler so
future reloads use the up-to-date scheduler; ensure you call the backend
stop/start lifecycle methods when swapping.
- Around line 187-189: LocalScheduler currently builds the API key env var from
the local `name` value (api_key_env = f"{name.upper()}_API_KEY") which diverges
from the server `/a2a` auth lookup that derives the env name from the shared
AGENT_NAME source; create/consume a single helper to derive the agent-name env
token (e.g., a function like get_agent_env_name() or reuse the existing
AGENT_NAME_ENV logic used by the server auth layer) and use it when constructing
the API key env var in LocalScheduler (replace the f"{name.upper()}_API_KEY"
usage), so both the scheduler and the `/a2a` auth code use the same
env-name-to-API-key mapping.

In `@tools/lg_tools.py`:
- Around line 487-505: list_available_tools currently calls
get_all_tools(knowledge_store=...) without passing the scheduler, so the UI
never sees schedule_task/list_schedules/cancel_schedule; update
list_available_tools (in graph/config_io.py) to pass the module-level scheduler
into get_all_tools (the same scheduler used when registering callbacks and as
passed into create_agent_graph) so available tools include scheduler-backed
tools — ensure you reference get_all_tools(knowledge_store, scheduler) and
thread the global scheduler through to that call.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3dfe2755-043a-48d0-a90e-a384458aeb18

📥 Commits

Reviewing files that changed from the base of the PR and between 0609c67 and 7d1da5e.

📒 Files selected for processing (8)
  • docs/guides/scheduler.md
  • scheduler/__init__.py
  • scheduler/local.py
  • scheduler/workstacean.py
  • server.py
  • tests/test_scheduler_local.py
  • tests/test_scheduler_workstacean.py
  • tools/lg_tools.py

Comment thread scheduler/local.py
Comment on lines +141 to +149
def _init_db(self) -> None:
try:
db = self._connect()
db.executescript(_SCHEMA)
db.commit()
db.close()
except sqlite3.DatabaseError:
log.exception("[scheduler] schema init failed at %s", self.path)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Propagate schema-init failures instead of continuing with a broken backend.

If _init_db() hits a corrupt or unwritable sqlite file, LocalScheduler() still succeeds and server._build_scheduler() will advertise scheduling as enabled. After that, list_jobs() degrades to [] and cancel_job() to False, which is much harder to diagnose than failing fast here. Log the exception, then re-raise so startup can fall back or disable the scheduler cleanly.

Suggested fix
     def _init_db(self) -> None:
         try:
             db = self._connect()
             db.executescript(_SCHEMA)
             db.commit()
             db.close()
-        except sqlite3.DatabaseError:
+        except sqlite3.DatabaseError:
             log.exception("[scheduler] schema init failed at %s", self.path)
+            raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scheduler/local.py` around lines 141 - 149, The _init_db method currently
swallows sqlite3.DatabaseError after logging, which allows LocalScheduler to
appear healthy with a broken backend; modify the except block in _init_db to log
the exception (using log.exception as already present) and then re-raise the
exception so the caller can fail fast or disable scheduling; target the _init_db
method and its except sqlite3.DatabaseError handler to add a `raise` after the
log.exception call.

Comment thread scheduler/local.py
Comment thread server.py
Comment thread server.py Outdated
Comment on lines +241 to +248
# Re-use the running scheduler instance — tearing down the
# polling loop on every drawer save would orphan in-flight
# fires. Env-driven scheduler config (WORKSTACEAN_API_BASE,
# SCHEDULER_DISABLED) only takes effect on full restart;
# the YAML doesn't carry scheduler settings yet.
new_graph = create_agent_graph(
new_config, knowledge_store=new_store, scheduler=_scheduler,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don’t reuse a stale-or-missing scheduler across reloads.

This path always injects the existing _scheduler into the rebuilt graph. On the first-run wizard flow that instance is still None, so finishing setup produces a graph with no scheduler until the process restarts. After that, config reloads keep the old LocalScheduler credentials and agent namespace alive even when auth or identity changes. Rebuild the backend from new_config when reloading, and swap/start/stop it if the instance changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server.py` around lines 241 - 248, The current reload path always injects the
existing _scheduler into create_agent_graph which can be None on first-run and
can keep stale LocalScheduler credentials/namespace after config changes;
instead construct the backend/scheduler from new_config (not reusing the
module-level _scheduler), obtain its scheduler instance and pass that into
create_agent_graph(new_config, knowledge_store=new_store,
scheduler=backend.scheduler), and if the newly built backend/scheduler differs
from the running _scheduler, stop/cleanup the old _scheduler and
start/initialize the new one then assign _scheduler = backend.scheduler so
future reloads use the up-to-date scheduler; ensure you call the backend
stop/start lifecycle methods when swapping.

Comment thread tools/lg_tools.py
Real bugs:
- scheduler/local.py: _fire() now returns bool (True on 2xx, False
  on HTTP error or network exception). _tick() only reschedules /
  deletes when _fire() succeeds, so a transient failure leaves the
  job in place for the next tick to retry. Previously a one-shot
  job hit by a 5xx silently vanished.
- server.py: the API key env var name now uses AGENT_NAME_ENV.upper()
  to match the auth handler at L893. The previous code read
  agent_name() (which returns the wizard-set identity.name when set),
  so a wizard rename pointed the scheduler at <RENAMED>_API_KEY while
  the auth handler still expected <ENV>_API_KEY → self-invocation
  401'd silently after every wizard rename.
- server.py: reload path now constructs a scheduler when _scheduler
  is None (first-run case: server boots pre-setup, wizard finishes,
  drawer triggers reload — this is when we *first* construct the
  scheduler). Existing instances are still reused — drawer saves
  don't tear down the polling loop.

Surface:
- tools/lg_tools.py: exported SCHEDULER_TOOL_NAMES and MEMORY_TOOL_NAMES
  as module constants.
- graph/config_io.py::list_available_tools: now exposes scheduler +
  memory tool names to the wizard's checkbox group even when the
  runtime hasn't yet constructed the underlying backends. Otherwise
  the wizard would hide tools that the runtime registers as soon as
  the user finishes setup.

Declined:
- scheduler/local.py L141-149: CodeRabbit asked to re-raise
  sqlite3.DatabaseError from _init_db. The store is intentionally
  fail-soft (matches knowledge/store.py + audit.py): _resolve_db_path
  already falls back to ~/.protoagent/scheduler/ when /sandbox is
  unwritable, and re-raising would crash boot in exactly the scenario
  the fallback is designed to handle. The graceful degradation
  contract is "scheduler tools return errors when storage is broken,
  agent keeps serving everything else".

Tests:
- tests/test_scheduler_local.py: new test_fire_failure_leaves_job_in_place
  regression guard + test_fire_returns_bool contract test.
- tests/test_config_io.py: list_available_tools assertions now check
  for memory + scheduler tools and no duplicates.

86 scheduler-scope tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mabry1985
Copy link
Copy Markdown
Contributor Author

Round-2 CodeRabbit feedback addressed in 1a05278. 4 of 5 fixed; 1 deliberately declined with reasoning.

Real bugs

  • scheduler/local.py_fire() returns bool now; _tick() only reschedules/deletes on success. A 5xx HTTP response no longer silently consumes a one-shot job — it stays in place for the next tick to retry. New regression test test_fire_failure_leaves_job_in_place locks this in.
  • server.py — API key env var name now uses AGENT_NAME_ENV.upper() to match the auth handler at L893. Previous code read agent_name() (the wizard-set identity.name when set), so any wizard rename pointed the scheduler at <RENAMED>_API_KEY while the auth handler still expected <ENV>_API_KEY → self-invocation 401'd silently. Real but quiet bug.
  • server.py — reload path builds a scheduler when _scheduler is None. First-run path (server boots pre-setup → wizard finishes → drawer reload) now correctly constructs and starts the scheduler instead of running scheduler-less until the next process restart. Existing instances are still reused so drawer saves don't tear down the polling loop.

Surface fix

  • tools/lg_tools.py exports SCHEDULER_TOOL_NAMES and MEMORY_TOOL_NAMES constants.
  • graph/config_io.py::list_available_tools now exposes scheduler + memory tool names to the wizard checkbox group even when the runtime hasn't yet constructed the backends. Otherwise the wizard would hide tools that the runtime would register as soon as the user finishes setup.

Declined: scheduler/local.py:141-149 — re-raise DatabaseError from _init_db

The store is intentionally fail-soft, matching knowledge/store.py and audit.py: _resolve_path-equivalents already fall back to ~/.protoagent/scheduler/ when /sandbox/ is unwritable, and re-raising would crash boot in exactly the scenario the fallback is designed to handle. The graceful-degradation contract is "scheduler tools return error strings when storage is broken; agent keeps serving everything else." Re-raising would couple scheduler health to overall agent boot, which is a regression on that contract.

If a fork wants strict mode (fail boot when scheduler is broken), that's a one-line subclass override in their _build_scheduler. The default behaviour shouldn't be strict.


Scoped tests: 86 pass across test_scheduler_local.py / test_scheduler_workstacean.py / test_config_io.py. Same single pre-existing baseline failure (test_subscribe_endpoint_origin_rejected) reproduces on main and is unrelated.

🤖 Generated with Claude Code

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
server.py (1)

141-206: 🧹 Nitpick | 🔵 Trivial

Add return type annotation for _build_scheduler.

Static analysis flags a missing return type. Adding it improves clarity and IDE support.

💡 Proposed fix
-def _build_scheduler(config):
+def _build_scheduler(config) -> "LocalScheduler | WorkstaceanScheduler | None":

Or use a Protocol/union import:

+from scheduler.interface import SchedulerBackend
 ...
-def _build_scheduler(config):
+def _build_scheduler(config) -> SchedulerBackend | None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server.py` around lines 141 - 206, Add a return type annotation to the
_build_scheduler function (e.g. -> Optional[Scheduler] or -> Scheduler | None)
and import the corresponding typing symbols; update imports to include
typing.Optional (or use | None syntax if using Python 3.10+), and ensure a
Scheduler protocol/base type is imported or declared so WorkstaceanScheduler and
LocalScheduler satisfy it (reference symbols: _build_scheduler,
WorkstaceanScheduler, LocalScheduler, Scheduler).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scheduler/local.py`:
- Around line 219-229: The stop method currently swallows all exceptions when
awaiting self._task; update SchedulerLocal.stop (the async def stop) to only
suppress asyncio.CancelledError but catch and log any other Exception (include
the exception details via log.exception or log.error with exc_info) before
re-raising or continuing, so unexpected errors during await self._task are
recorded; alternatively use contextlib.suppress(asyncio.CancelledError) around
await self._task and then a separate except Exception block to log the
exception.

In `@tests/test_config_io.py`:
- Around line 332-341: Add an assertion to the test to ensure the scheduler tool
name "list_schedules" is present in the returned names: the test currently
checks for "schedule_task" and "cancel_schedule" but omits "list_schedules" from
the SCHEDULER_TOOL_NAMES surface; update the test (in the block that asserts
presence of memory and scheduler tools) to include assert "list_schedules" in
names so it matches the full scheduler tool set used by list_available_tools and
SCHEDULER_TOOL_NAMES.

In `@tests/test_scheduler_local.py`:
- Around line 75-77: The test test_malformed_raises uses a regex pattern string
with a bare pipe character; change the pytest.raises call in
test_malformed_raises to use a raw string for the match pattern (e.g., r"Invalid
isoformat|could not convert") so the `|` is treated as a regex alternation
clearly and static analysis warnings are removed; update the assertion that
wraps parse_iso_to_utc accordingly.
- Around line 99-107: Update the two tests (test_empty_prompt_rejected and
test_malformed_schedule_rejected) to assert the specific ValueError messages
from Scheduler.add_job by passing a match regex to pytest.raises so they only
pass for the intended error; for example, in test_empty_prompt_rejected use
pytest.raises(ValueError, match=r"(empty|prompt).*") and in
test_malformed_schedule_rejected use pytest.raises(ValueError,
match=r"(malformed|invalid).*schedule") to target the add_job failure cases
precisely.

---

Duplicate comments:
In `@server.py`:
- Around line 141-206: Add a return type annotation to the _build_scheduler
function (e.g. -> Optional[Scheduler] or -> Scheduler | None) and import the
corresponding typing symbols; update imports to include typing.Optional (or use
| None syntax if using Python 3.10+), and ensure a Scheduler protocol/base type
is imported or declared so WorkstaceanScheduler and LocalScheduler satisfy it
(reference symbols: _build_scheduler, WorkstaceanScheduler, LocalScheduler,
Scheduler).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: dc63c0a5-1ef6-42b5-8604-31c8e9454c14

📥 Commits

Reviewing files that changed from the base of the PR and between 7d1da5e and 1a05278.

📒 Files selected for processing (6)
  • graph/config_io.py
  • scheduler/local.py
  • server.py
  • tests/test_config_io.py
  • tests/test_scheduler_local.py
  • tools/lg_tools.py

Comment thread scheduler/local.py
Comment thread tests/test_config_io.py
Comment thread tests/test_scheduler_local.py
Comment thread tests/test_scheduler_local.py
…h knowledge/memory)

Scheduler had asymmetric opt-out — only env-based (SCHEDULER_DISABLED=1).
The knowledge and memory subsystems already exposed YAML toggles
(middleware.knowledge, middleware.memory) so forks could flip them
through the drawer or wizard. Scheduler now matches:

- LangGraphConfig.scheduler_enabled: bool = True (default-on)
- from_yaml() reads middleware.scheduler
- config_to_dict() emits it for the drawer round-trip
- config/langgraph-config.yaml has middleware.scheduler: true
- server.py::_build_scheduler honors the YAML toggle first, env second

Both subsystems are now genuinely opt-out:

  middleware:
    knowledge: true       # was already so
    memory: true          # was already so
    scheduler: true       # NEW — was env-only
    audit: true

Drawer/wizard can flip any of them without restart (the existing
reload path already rebuilds on config change). The env opt-out
(SCHEDULER_DISABLED=1) stays as a runtime escape hatch for fleet
operators who can't edit YAML in the moment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/reference/configuration.md`:
- Line 75: The wording around the scheduler opt-out is ambiguous: update the
description for `scheduler` to clarify that setting `middleware.scheduler:
false` in YAML is equivalent in effect to the env var `SCHEDULER_DISABLED=1` but
that YAML is the canonical opt-out while the env var is a runtime escape hatch;
reference both `middleware.scheduler` and `SCHEDULER_DISABLED` in the sentence
and replace "Equivalent to setting `SCHEDULER_DISABLED=1`" with a clearer phrase
such as "Has the same effect as `SCHEDULER_DISABLED=1` (YAML
`middleware.scheduler: false` is the canonical opt-out; the env var is a runtime
escape hatch)."

In `@server.py`:
- Line 141: Add an explicit return type annotation to the _build_scheduler
function to indicate it returns either a SchedulerBackend or None; update the
signature of _build_scheduler to annotate its return as
Optional[SchedulerBackend] (or "Optional[SchedulerBackend]" as a string literal)
and import SchedulerBackend from scheduler.interface or use a forward-reference
string to avoid import cycles, and ensure typing.Optional is imported or
referenced as "Optional" accordingly.
- Around line 255-286: The reload path currently always reuses and passes the
global _scheduler into create_agent_graph, ignoring
new_config.scheduler_enabled; update the logic to check
new_config.scheduler_enabled first: if False and _scheduler is not None, shut
down/stop and dereference it (e.g. call _scheduler.stop() or appropriate
shutdown API inside a try/except and then set _scheduler = None) so no scheduler
is passed to create_agent_graph, and if True only then build via
_build_scheduler when _scheduler is None and start it as before; ensure
create_agent_graph receives None for scheduler when the toggle is disabled.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6e50a89a-3154-47e9-aba1-610cd09b71f5

📥 Commits

Reviewing files that changed from the base of the PR and between 1a05278 and 8b79fa8.

📒 Files selected for processing (6)
  • config/langgraph-config.yaml
  • docs/guides/scheduler.md
  • docs/reference/configuration.md
  • graph/config.py
  • graph/config_io.py
  • server.py

Comment thread docs/reference/configuration.md Outdated
Comment thread server.py Outdated
Comment thread server.py Outdated
Real bugs:
- scheduler/local.py: stop() now suppresses only asyncio.CancelledError
  (the expected outcome of cancelling our own task) and logs any
  other exception via log.exception. Previously every exception
  was silently swallowed, so a polling-loop crash during shutdown
  would vanish without a trace.
- server.py: reload path now honors the new middleware.scheduler
  toggle. Three states:
  - flipped OFF (was on)  → stop + drop the running scheduler;
    new graph builds with scheduler=None.
  - flipped ON (was off / first run) → construct + start.
  - unchanged              → reuse the running instance.
  Helpers _start_scheduler_async / _stop_scheduler_async fire
  start()/stop() onto the active loop without forcing the entire
  reload chain to become async.

Type / nits:
- server.py: added `-> "SchedulerBackend | None"` return type to
  _build_scheduler, with a TYPE_CHECKING import to avoid runtime
  cycles.
- tests/test_scheduler_local.py: raw-string regex for `|`
  alternation (test_malformed_raises); added match= to the two
  bare ValueError tests (test_empty_prompt_rejected,
  test_malformed_schedule_rejected) so they only pass for the
  intended error message.
- tests/test_config_io.py: assert list_schedules in names alongside
  schedule_task / cancel_schedule.
- docs/reference/configuration.md: clarified the scheduler opt-out
  description — middleware.scheduler is canonical, SCHEDULER_DISABLED
  is a runtime escape hatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mabry1985
Copy link
Copy Markdown
Contributor Author

Rounds 3 + 4 of CodeRabbit feedback addressed in 7ce3f10. Combined because they came in back-to-back while I was shipping the opt-out symmetry; together they raised 7 actionable items (5 unique, 2 duplicates).

Real bugs

  • scheduler/local.py::stop() — now suppresses only asyncio.CancelledError (the expected outcome of cancelling our own task) and logs any other exception via log.exception. Previously every exception was silently swallowed, so a polling-loop crash during shutdown would vanish without a trace.
  • server.py reload path — now honors the new middleware.scheduler YAML toggle. Three states handled: flipped OFF (was on) → stop + drop; flipped ON (was off, or first-run) → build + start; unchanged → reuse. New helpers _start_scheduler_async / _stop_scheduler_async fire start() / stop() onto the active loop without forcing the entire reload chain to become async.

Type / nits

  • server.py::_build_scheduler — added -> "SchedulerBackend | None" return annotation with a TYPE_CHECKING import to avoid runtime cycles. (Round 3 + 4 both flagged this.)
  • tests/test_scheduler_local.py — raw-string regex for | alternation; match= added to the two bare ValueError tests so they only pass for the intended error message.
  • tests/test_config_io.pylist_schedules now asserted alongside schedule_task / cancel_schedule.
  • docs/reference/configuration.md — reworded the scheduler opt-out description to make middleware.scheduler: false the canonical path and SCHEDULER_DISABLED=1 an explicit runtime escape hatch.

86 scoped tests pass. Same single pre-existing baseline failure on test_subscribe_endpoint_origin_rejected reproduces on main.

🤖 Generated with Claude Code

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
server.py (1)

294-318: ⚠️ Potential issue | 🟠 Major

Make the scheduler swap atomic, and don’t reuse it blindly across config edits.

This block stops/starts and overwrites _scheduler before create_agent_graph(...) succeeds, so a failed reload can still tear down the old poller or start a new one. It also reuses any existing scheduler whenever the toggle stays on, which keeps stale LocalScheduler identity/auth state alive after wizard edits. Build a candidate backend from new_config, compile the graph with that candidate, then swap/start/stop only after the rebuild succeeds.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server.py` around lines 294 - 318, The reload currently mutates global
_scheduler and starts/stops it before create_agent_graph(new_config, ...)
succeeds; instead, build a candidate scheduler from new_config (e.g. call
_build_scheduler(new_config) into a local variable like candidate_scheduler) and
attempt to construct new_graph using that candidate_scheduler and new_store;
only after create_agent_graph completes successfully swap the global _scheduler:
if the toggle is off and _scheduler exists call
_stop_scheduler_async(_scheduler) and set to None, if toggle is on use
candidate_scheduler (start it with _start_scheduler_async(candidate_scheduler)
if it’s new) and assign it to _scheduler, and ensure any previously running
scheduler is stopped only after new_graph is valid to avoid tearing down the old
poller on failed reloads and to avoid reusing stale LocalScheduler identity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/reference/configuration.md`:
- Around line 91-100: Update the first paragraph to clarify that backend
selection and runtime knobs for the bundled scheduler are driven by environment
variables, but the overall enable/disable of scheduler tools is controlled by
the YAML setting middleware.scheduler; specifically mention middleware.scheduler
as the YAML toggle and keep the rest of the table for env-driven settings like
WORKSTACEAN_API_BASE, SCHEDULER_DB_DIR, SCHEDULER_INVOKE_URL, etc., so readers
understand env vars pick the backend/behavior while middleware.scheduler in YAML
can turn the scheduler on or off.

In `@scheduler/local.py`:
- Around line 50-64: The function _resolve_db_path currently trusts agent_name
and allows path traversal or absolute paths; validate/sanitize agent_name before
using it to build the base path: ensure agent_name is not absolute and does not
contain path separators or parent segments (e.g., check not
os.path.isabs(agent_name) and that Path(agent_name).parts contains only a single
name without '..' or separators), then use the sanitized single-name value (or
replace invalid chars with a safe fallback like a slug/underscore) when
constructing base = Path(...)/agent_name; if validation fails, either raise a
ValueError or fall back to a safe canonical name to preserve per-agent
isolation.
- Around line 361-369: The scheduler currently embeds scheduler_job_id and
scheduler_kind inside the nested "message.metadata", but a2a_handler._a2a_rpc
expects custom metadata on the top-level "params.metadata"; move the metadata
dict from inside the "message" to "params.metadata" so it looks like params: {
message: { ... }, metadata: {"scheduler_job_id": job.id, "scheduler_kind":
"local"}, messageId: message_id, ... }; update the construction around
job.prompt and message_id (the code building "params" in scheduler/local.py) to
place scheduler metadata at params.metadata instead of message.metadata so the
a2a RPC handler receives it.

In `@tests/test_scheduler_local.py`:
- Around line 169-179: The test test_cron_rescheduled_after_fire is
non-deterministic because it uses datetime.now(UTC); change it to use a fixed
fired_at value so the expected next_fire can be asserted exactly: create the
scheduler with _make_scheduler, add the cron job via s.add_job("hi", "0 9 * *
*", job_id="cron"), capture original_next from job.next_fire, call
s._reschedule_or_delete(job, fired_at=<fixed UTC datetime>) and then assert that
s.list_jobs()[0].next_fire equals the deterministic next slot computed for that
fixed fired_at (and also assert last_fire is populated); reference
_reschedule_or_delete, job.next_fire, and s.add_job to locate where to modify
the test.

---

Duplicate comments:
In `@server.py`:
- Around line 294-318: The reload currently mutates global _scheduler and
starts/stops it before create_agent_graph(new_config, ...) succeeds; instead,
build a candidate scheduler from new_config (e.g. call
_build_scheduler(new_config) into a local variable like candidate_scheduler) and
attempt to construct new_graph using that candidate_scheduler and new_store;
only after create_agent_graph completes successfully swap the global _scheduler:
if the toggle is off and _scheduler exists call
_stop_scheduler_async(_scheduler) and set to None, if toggle is on use
candidate_scheduler (start it with _start_scheduler_async(candidate_scheduler)
if it’s new) and assign it to _scheduler, and ensure any previously running
scheduler is stopped only after new_graph is valid to avoid tearing down the old
poller on failed reloads and to avoid reusing stale LocalScheduler identity.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b6294e66-0316-41cc-975c-0bec55d8ee49

📥 Commits

Reviewing files that changed from the base of the PR and between 8b79fa8 and 7ce3f10.

📒 Files selected for processing (5)
  • docs/reference/configuration.md
  • scheduler/local.py
  • server.py
  • tests/test_config_io.py
  • tests/test_scheduler_local.py

Comment thread docs/reference/configuration.md Outdated
Comment thread scheduler/local.py
Comment thread scheduler/local.py Outdated
Comment thread tests/test_scheduler_local.py Outdated
Real bugs:
- scheduler/local.py::_fire(): metadata moved from
  params.message.metadata to params.metadata. The A2A handler
  reads custom metadata from params.metadata
  (a2a_handler.py L1244 — `msg_metadata = params.get("metadata")`),
  so the previous nesting silently dropped the
  scheduler_job_id / scheduler_kind breadcrumb. Observers now get
  it as intended.
- server.py reload path: scheduler swap is now planned before the
  graph rebuild and only committed after rebuild succeeds. A
  failed graph rebuild used to leave the scheduler torn down or a
  fresh one already started, dis-aligning runtime state. The new
  ordering: build candidate, rebuild graph (rollback-safe on
  failure), commit graph + scheduler atomically.
- scheduler/local.py: _resolve_db_path now sanitizes agent_name
  via a new _safe_segment() helper. Strips path separators, ``..``,
  and absolute-path prefixes; falls back to "default" when nothing
  usable remains. Defence in depth — the value comes from operator-
  controlled env / YAML, but a typo or copy-paste shouldn't be able
  to put a sqlite file outside the configured scheduler dir.

Tests:
- tests/test_scheduler_local.py::test_cron_rescheduled_after_fire:
  pinned to a fixed fired_at timestamp so the assertion is exact
  (next_fire == "2026-04-29T09:00:00+00:00") instead of a
  "different from original" near-tautology that depends on
  datetime.now().

Docs:
- docs/reference/configuration.md: clarified that the scheduler's
  enable/disable lives in YAML (middleware.scheduler), while
  backend selection and runtime knobs are env-driven. Repositioned
  SCHEDULER_DISABLED as the runtime escape hatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mabry1985
Copy link
Copy Markdown
Contributor Author

Round-5 CodeRabbit feedback addressed in 3098908. All 4 actionable items applied — 3 real bugs + 1 test-determinism fix + 1 doc clarification.

Real bugs

  • scheduler/local.py::_fire() — metadata location moved from params.message.metadata to params.metadata. Verified against a2a_handler.py L1244 (msg_metadata = params.get("metadata")) — the previous nesting silently dropped the scheduler_job_id / scheduler_kind breadcrumb. Observers now actually receive it.
  • server.py reload path — scheduler swap is now planned before the graph rebuild and only committed after rebuild succeeds. A failed create_agent_graph previously left the scheduler torn down (or a fresh one already started), dis-aligning runtime state. New ordering: build candidate → rebuild graph (rollback-safe on failure) → commit _graph + _scheduler together.
  • scheduler/local.py_resolve_db_path now sanitizes agent_name via _safe_segment(). Strips path separators, .., and absolute prefixes; falls back to "default" when nothing usable remains. Defence in depth — the value comes from operator-controlled env / YAML, but a typo or copy-paste shouldn't put a sqlite file outside the configured scheduler dir.

Test determinism

  • test_cron_rescheduled_after_fire now pins fired_at to 2026-04-28T10:00:00Z so the assertion is exact (next_fire == "2026-04-29T09:00:00+00:00"). Previous "different from original" assertion depended on datetime.now() and could pass spuriously.

Doc clarity

  • docs/reference/configuration.md — scheduler section now explicitly says enable/disable is YAML (middleware.scheduler), backend selection and runtime knobs are env-driven; SCHEDULER_DISABLED repositioned as the runtime escape hatch.

86 scoped tests pass. Same single pre-existing baseline failure on test_subscribe_endpoint_origin_rejected reproduces on main.

🤖 Generated with Claude Code

@mabry1985 mabry1985 merged commit ec86865 into main Apr 28, 2026
1 check passed
@mabry1985 mabry1985 deleted the feat/agent-scheduler branch April 28, 2026 03:38
mabry1985 added a commit that referenced this pull request Apr 28, 2026
…uler

docs: sync after KB store + scheduler PRs (#155 / #156)
mabry1985 added a commit to protoLabsAI/protoWorkstacean that referenced this pull request Apr 28, 2026
#509)

* feat(channels): native A2A delivery channel for scheduled fires

Adds `payload.channel: "a2a"` as a first-class scheduler delivery
target. When set, A2ADeliveryPlugin looks up
`workspace/a2a.yaml` → `targets[payload.agent_name]` and POSTs a
JSON-RPC `message/send` to the configured URL with bearer / X-API-Key
auth from config and `metadata: {scheduler_job_id, channel,
agent_name}` so observers can distinguish scheduler-driven turns.

The router short-circuits in `_handleCron` when `channel === "a2a"`
so the local skill resolver never sees these firings — `a2a` is a
delivery channel, not a reply channel.

Closes #507. Companion to protoLabsAI/protoAgent#156.

- workspace/a2a.yaml.example (env-var expansion supported)
- lib/plugins/a2a-delivery.ts + 7 unit tests covering shape,
  auth-header forwarding, env-var expansion, channel gating,
  missing target, missing agent_name, missing config file
- docs/reference/scheduler.md — new "A2A delivery channel" section
- docs/guides/a2a-scheduler-bridge.md — end-to-end how-to
- docs/reference/config-files.md — list a2a.yaml in user-overlay table

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* address PR review (no mocks, unsubscribe, content validation)

- Replace fetch-mock test helper with a real in-process Bun.serve HTTP
  sink. Honors the repo's no-mocks-in-tests rule; injectable fetch on
  the plugin is removed (greenfield: only existed for the mock).
- Plugin now captures and stores its subscription id, and uninstall()
  unsubscribes before nulling the bus. Adds a regression test that
  asserts no deliveries fire after teardown.
- Validate payload.content is a non-empty string before constructing
  JSON-RPC parts; missing/non-string/whitespace drops with a loud
  error. Adds a regression test covering the three invalid shapes.
- Wrap env-var setup in try/finally so cleanup survives assertion
  failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix tsc + address PR review (empty-url guard, fetch timeout)

- Drop import { Server } from "bun" — Server is generic in this Bun
  version, infer type from ReturnType<typeof startSink> instead.
  This was the CI tsc failure on the previous push.
- Fail fast when target.url expands to empty (e.g. unset env var)
  instead of letting fetch throw mid-call. Adds regression test.
- Bound outbound fetch with AbortSignal.timeout(15s) so a hung
  remote agent can't pin the handler indefinitely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* address PR review (yaml type guard, json-rpc error body)

- expandEnv() now type-guards on string before .replace(); a YAML
  config like `url: 1234` or `api_key: true` no longer throws inside
  the handler. Non-string targets fall through the empty-url guard
  and drop with a loud error. Adds regression test.
- A JSON-RPC 2.0 server can return HTTP 200 with an `error` body;
  parse the response and treat that as a delivery failure rather
  than logging success. Adds regression test using the sink's
  responseBody hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mabry1985 added a commit to protoLabsAI/protoWorkstacean that referenced this pull request Apr 28, 2026
* feat(audit): add bus topic audit script — Swagger-for-the-bus markdown index

Static AST scan of every bus.publish() and bus.subscribe() call site in
src/ + lib/, resolved to topic strings, grouped by domain prefix, and
dumped as a single browsable reference at docs/reference/bus-topics.md.

## What it does

- Walks all .ts files under src/ + lib/ (excluding tests, .d.ts, dist).
- Finds every CallExpression on a bus-like receiver where the method is
  `publish` or `subscribe`.
- Resolves the topic argument:
  - String literal "foo.bar"             → "foo.bar"
  - TOPICS.NAME / *_TOPICS.NAME           → looked up against the
                                            constants in
                                            src/event-bus/all-topics.ts
                                            → real value
  - Template literal `linear.reply.${id}` → pattern "linear.reply.{id}"
  - "a." + b string concatenation         → pattern "a.{b}"
  - Anything else                         → flagged unresolved
- Pulls payload-type bindings from src/event-bus/payloads.ts JSDoc
  (convention: "Payload for `topic.name`").
- Renders a markdown reference with per-topic publishers, subscribers,
  declaration status, payload type, and JSDoc.

## Output today

Run on dev tip:
- 110 distinct topics
-  18 declared in TOPICS (the ones in all-topics.ts)
-  92 raw-string / template topics NOT in TOPICS (audit signal — these
     are candidates to formalize into the registry)
-  25 topics that couldn't be statically resolved (computed at runtime)
- 142 publish call sites
-  85 subscribe call sites

## Not yet (deliberately)

- No CI gating — `--allow-unresolved` is the default. Leaving CI green
  while the registry catches up; flip to fail-on-unresolved later.
- No dashboard UI — markdown is enough for the audit case the operator
  asked about. Wiring into the existing event-viewer/:8080 dashboard is a
  follow-up if the markdown view proves insufficient.
- No payload-shape rendering beyond the type name. Could expand to dump
  the Zod / TS interface fields, but the type name + clickable file:line
  to the source is enough to navigate.

## Usage

  bun run audit:bus       # regenerate docs/reference/bus-topics.md

Tests: 1080/1080 pass (no regression — the script doesn't touch runtime
paths). Type-check clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(tests): drop project-schema mock from onboarding tests (closes #480)

## Root cause

`tests/onboarding.test.ts` mocked `../lib/project-schema.ts` via
`mock.module(...)`. Bun's `mock.module()` replacements persist for the
entire test process — when `tests/project-schema.test.ts` ran in the same
`bun test` invocation (file-discovery order varies), it picked up the
mocked `validateProjectEntry` instead of the real schema-backed one.

The mock returned `{ ok: true, errors: [] }` — missing the `entry` field
that the real success branch carries. So the 12 tests in
project-schema.test.ts that destructure `result.entry.*` would crash
with `TypeError: undefined is not an object` — the exact failure pattern
the audit flagged across PRs #468, #478, #479, #486 in CI.

Failure shape was deterministic in CI (file-load order + memory pressure
made the leak repro consistently) and never reproduced locally (same
order ran fine on a beefier dev box). Doesn't matter — the mock leak is
the cause; the cure is to stop mocking.

## Fix

- Removed `mock.module("../lib/project-schema.ts", ...)` and the
  `validateProjectEntry` field from the shared `_mocks` state.
- Removed the per-test `_mocks.validateProjectEntry = mock(() => ({ok:
  true, errors: []}))` setups that paired with the now-gone module mock.
- Replaced the "fatal step error (projects.yaml schema fail)" test with
  a payload that genuinely fails the real schema (`discord.dev: 12345 as
  unknown as string` — a number satisfies neither branch of
  `ProjectDiscordChannelSchema`'s union(string, object)). Same pipeline
  branch is exercised; just driven by real validation now.
- Added a comment block documenting the mock-leak hazard so a future
  maintainer doesn't reintroduce it.

The other two `mock.module()` calls in this file (`github-auth`,
`google`) are kept — they're not the source of the flake (no other test
file imports them with assertions on the return shape) and ripping the
network calls out is genuine isolation.

## Verification

- Repro: `bun test tests/` 5 consecutive runs — all 1080 pass, 0 fail.
- Full suite + tsc clean.
- The "fatal" test still asserts the same `step: "projects_yaml"`
  failure reply — same pipeline path, real-schema-driven instead of
  mock-driven.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(a2a): stream progress events to inbound A2A clients (#472 Gap 1)

External A2A clients calling workstacean's `/a2a` endpoint with
`message/stream` previously saw `submitted → working → silence →
completed`. The card advertises `streaming: true` but the bus contract
between the agent executor and BusAgentExecutor was single-shot — there
was no way for an executor mid-task to push intermediate updates.

This wires that path end-to-end through a new opt-in bus topic family.

## Topic contract

```
agent.skill.progress.{correlationId}
```

Payload (`AgentSkillProgressPayload`, all fields optional):
- text:    visible message body in the streamed status update
- percent: 0-100
- step:    named phase
- meta:    arbitrary structured progress

`BusAgentExecutor` subscribes to this topic on every inbound A2A call
alongside the existing reply topic. Each event becomes an A2A
`status-update` with `state: "working"` and `final: false`. `text`
becomes the streamed message body; `percent` / `step` / `meta` go into
`status.metadata` for clients that render progress affordances.

## Opt-in semantics

Skill executors that don't publish to this topic lose nothing — the
`submitted → working → completed` lifecycle continues to work unchanged.
Only executors that explicitly publish progress events (e.g. a
multi-step researcher, a deep agent that wants to surface tool calls)
opt in.

## Late-event safety

Progress events arriving after the terminal reply are silently dropped.
Both `agent.skill.progress.*` and `agent.skill.response.*` subscriptions
are torn down together on terminal / cancel — no leaked subscriptions
across overlapping task ids.

## What this PR contains

- `src/event-bus/all-topics.ts` — new `AGENT_SKILL_PROGRESS_PREFIX` constant in `ACTION_TOPICS`
- `src/event-bus/payloads.ts` — new `AgentSkillProgressPayload` interface bound by JSDoc convention
- `src/api/a2a-server.ts` — progress subscriber wired into `BusAgentExecutor.execute()`, with cleanup on terminal + cancel
- `src/api/__tests__/a2a-server.test.ts` — 2 new tests:
  - 3 progress events with mixed shapes (text-only, percent+step, full meta) all stream as working updates with the right body / metadata
  - late progress event arriving after terminal is dropped
- `docs/guides/a2a-streaming.md` — new section covering the inbound direction (the existing doc was outbound-only)
- `docs/reference/bus-topics.md` — auto-regenerated by `bun run audit:bus`

## Tests

1082/1082 pass (+2 net new). Type-check clean.

## Not in this PR (deferred)

- **Wiring DeepAgentExecutor as a publisher** — needs a thoughtful design
  for WHEN to emit (per token? per tool call? per planning phase?). The
  receiver shipping first means executors can opt in incrementally
  without a coordinated change.
- **Push-notification persistence (#472 Gap 2)** — separate task in the
  issue scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(a2a): persistent push-notification store via SQLite (#472 Gap 2)

The SDK's `InMemoryPushNotificationStore` lost every registered webhook
callback when the container restarted. With watchtower auto-pulling the
`:dev` tag multiple times per day, an A2A client that called
`tasks/pushNotificationConfig/set` once and relied on push callbacks
would silently stop receiving them on the next container roll.

This replaces it with `SqlitePushNotificationStore`, persisted to
`${DATA_DIR}/push-notifications.db`. In-flight tasks survive redeploys.

## Shape

- One SQLite database (`bun:sqlite`, WAL journal, same pattern as
  TelemetryService and the event log).
- `(task_id, config_id)` primary key. The SDK supports multiple configs
  per task via `config.id`; configs without an id fall back to "" so
  the contract holds for callers that don't set one.
- Every row carries `expires_at` (24h TTL by default). Filtered from
  `load()` AND opportunistically GC'd on `save()` so the table stays
  bounded without a separate sweeper.
- Cold-start purge runs on `_init()` so a process that's been off for
  days doesn't surface stale configs on its first load.

## Failure modes

DB open failure (read-only mount, permissions, etc.) logs loudly and
degrades to behaving like an empty in-memory store — service stays up,
push notifications just don't fire. No crash on the A2A endpoint.

## Wiring

- New `dataDir?: string` on `ApiContext` so the a2a-server can construct
  the store path. `src/index.ts` populates it from the existing `dataDir`
  variable.
- `src/api/a2a-server.ts` chooses `SqlitePushNotificationStore` when
  `ctx.dataDir` is set, else falls back to `InMemoryPushNotificationStore`
  (kept on the import list so tests + ephemeral setups still work).

## Tests (12 new)

- save/load round-trip
- multiple configs per task keyed by config.id
- save with same (taskId, configId) overwrites
- delete with configId removes only that config
- delete without configId removes all configs for the task
- cross-restart persistence (close + reopen on the same db file)
- expired configs filtered from load() before GC runs
- expired configs GC'd on next save()
- ttlMs=0 means no expiry
- cold-start purge clears stale rows before first load
- degraded mode (init failure) → all ops are silent no-ops returning empty

## Suite

1092/1092 pass. Type-check clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(pr-remediator): tighten diagnose_pr_stuck prompt to bias verdicts toward "rebasable"

PR #492 (push-notification store) was correctly small but raced against
PR #491 which landed first and touched the same `a2a-server.ts` and
`docs/guides/a2a-streaming.md` anchors. The diagnose_pr_stuck verdict
was "decomposable" — that triggered Quinn's auto-close + split-proposal
even though there was nothing to decompose; the right recovery was a
single back-merge.

The chokepoint guard for promotion PRs (#465) wasn't tripped because
this wasn't a promotion PR. The prompt's verdict definitions were
under-specified — "Conflicts cluster in a small number of specific
files. Splitting into smaller PRs would avoid the conflict" literally
described the situation.

## Prompt changes

- "rebasable" gets concrete examples that match recent friction:
  doc-section anchor races, both-sides-added-list-entries, lockfile
  drift, single-symbol renames the other side touched. Explicit
  threshold: prefer rebasable when the PR touches < 15 files OR when
  the conflicts cluster in docs / lockfiles / a single source file.
- "decomposable" tightened to require BOTH a large PR (multiple
  distinct subsystems) AND the conflicts genuinely splitting into
  independent landable groups. Explicit anti-pattern: "a small PR
  conflicting with a recent equally small PR is rebasable, not
  decomposable."
- New explicit precedence rule appended: "redundant → rebasable →
  decomposable → genuine. Default to rebasable…"
- The GitHub API tool guidance now asks Ava to count files in the diff
  and report the count in the evidence string. Counts < 15 should
  almost always land on rebasable.

## Why the prompt fix and not a runtime guard

The destructive-verdict guard at the chokepoint already exists for
promotion PRs (lib/plugins/pr-remediator.ts:1586). Extending it with a
"trivial-conflict bypass" was the alternative — both axes ended up
documented in the audit conversation. The prompt fix is the upstream
correction (prevent the bad call), the chokepoint guard is the
defense-in-depth (catch it if the prompt drifts). Filing the second
axis as a follow-up issue if the prompt fix doesn't hold.

## Tests

1094/1094 pass (the dispatch test asserts skill + agent + correlation
id, not prompt body — safe to ship). Type-check clean.

## Cross-reference

Friction context: PR #492 closed by Quinn, re-cut as #493 (merged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(google): wire GooglePlugin into pluginRegistry + ship OAuth helper script

GooglePlugin (Drive / Docs / Calendar / Gmail, ~915 LOC) shipped as a
self-contained module a while back but was never registered in
src/index.ts — its createDriveFolder utility ran during onboarding step
4 by direct import, but the OAuth refresher, Gmail polling, Calendar
polling, Docs handler, and Drive outbound handler were all dormant.

This wires the plugin into the registry, gated on the full OAuth2
credential triple (CLIENT_ID + CLIENT_SECRET + REFRESH_TOKEN).
Behavior is unchanged on environments missing any of those — the
plugin's own install() also gates on the same triple and logs+skips,
so the registry condition is purely a "don't pay the import cost" gate.

## To turn it on

1. Generate a refresh token via the new helper:
   bun run google:auth
2. Push the printed token to Infisical:
   infisical secrets set GOOGLE_REFRESH_TOKEN='1//...' --env=prod
3. Add compose passthroughs in homelab-iac/stacks/ai/docker-compose.yml
   to map GOOGLE_WORKSPACE_CLIENT_ID/_SECRET → GOOGLE_CLIENT_ID/_SECRET
   and pass GOOGLE_REFRESH_TOKEN. (Separate cross-repo PR; this PR is
   protoWorkstacean-only.)

## scripts/get-google-refresh-token.ts

One-shot OAuth2 helper. Reads CLIENT_ID + CLIENT_SECRET from env, spins
up a one-shot HTTP server on http://127.0.0.1:8765, walks the consent
flow, exchanges the code for a refresh token. Prints the token to
stderr (separate from stdout to avoid file-redirect capture).

Scopes requested cover all four GooglePlugin services: Drive, Docs,
Calendar, Gmail (modify). Refresh token is per-user, per-client; running
the script again rotates it.

NPM script: `bun run google:auth`.

## Tests

1094/1094 pass. Type-check clean. The plugin gate means no behavior
change in CI / dev where the credential triple isn't populated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(google): wire Gmail into channels.yaml + bus topics with reply path

Until this PR, GooglePlugin's Gmail handler published a single bus topic
(`message.inbound.google.gmail`, no per-message suffix) with no
`source.channelId` and no `reply.topic` — meaning RouterPlugin had no way
to route Gmail to a specific agent and nothing on the bus could reply
back to a Gmail thread. The Discord/GitHub-style "tag agent in
{channel}" flow simply didn't exist for email.

This PR closes the loop end-to-end.

## Inbound

- New topic shape: `message.inbound.google.gmail.{labelSlug}.{threadId}`
  - labelSlug = lowercase, non-alphanumeric → "-" (so `INBOX` → `inbox`,
    `Personal/Work` → `personal-work`)
  - ChannelRegistry.findByTopic matches against the labelSlug segment
- `source.channelId = labelSlug`, `source.userId = From` header
- `reply.topic = google.gmail.reply.{threadId}` with `format: "markdown"`
- Payload now carries `messageId`, `threadId`, `label`, `labelSlug`,
  `from`, `to`, `subject`, `date`, `body`, `content` (mirror of body for
  generic chat agents that read `content`)
- Per-thread context cache (To/Subject/In-Reply-To/References) populated
  on inbound publish for the outbound subscriber to consume

## Outbound (NEW)

`createGmailOutbound()` subscribes to `google.gmail.reply.#`. On each
event:
1. Resolves the original thread context — payload override > in-process
   cache > one-shot Gmail thread re-fetch (handles cold cache after
   restart)
2. Builds an RFC-822 reply with proper `To` / `Subject` (auto-prefixed
   `Re:`) / `In-Reply-To` / `References` headers so the message threads
   correctly in Gmail
3. POSTs to `users.messages.send` with the original threadId

Result: Ava's reply to a Gmail message appears in the same conversation
in the recipient's inbox.

## ChannelPlatformSchema

`google` added to the enum (yaml-schemas.ts) and the `ChannelPlatform`
+ `source.interface` unions (lib/types/channels.ts, lib/types.ts).

## ChannelRegistry

- New `byGoogleGmailLabel = Map<labelSlug, Channel>` index
- New `findByTopic` branch for `message.inbound.google.gmail.{slug}.{*}`
- New static helper `gmailLabelSlug(label)` exposes the same
  normalization callers use

## workspace/channels.yaml

Added a `google-gmail-inbox` entry routing the `INBOX` label to Ava with
a 30-min conversation timeout (email replies can take a while):

  - id: google-gmail-inbox
    platform: google
    channelId: "INBOX"
    agent: ava

## Tests

1094/1094 pass. Type-check clean.

## What lands when this ships + the deployed `:dev` watchtower-pulls

Inbound emails hitting INBOX → Gmail poller publishes
`message.inbound.google.gmail.inbox.{threadId}` → RouterPlugin matches
the `google-gmail-inbox` channel → dispatches `agent.skill.request{
skill: chat, targets: [ava], reply.topic: google.gmail.reply.{threadId}
}` → Ava processes → her response publishes on
`google.gmail.reply.{threadId}` → `createGmailOutbound` posts a properly
threaded reply via Gmail send. Email-as-chat works end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(workspace): enable Gmail INBOX poll + primary Calendar in google.yaml

Activates the now-fully-wired Google Workspace integration (#495 +
homelab-iac#13 + #496) on the standard inbox/calendar surface. Hot-reloaded
by GooglePlugin; no restart needed.

- gmail.watchLabels: ["INBOX"] — every email landing in the user's inbox
  routes to Ava via the google-gmail-inbox channel (channels.yaml).
- gmail.pollIntervalMinutes: 5 — unchanged.
- calendar.orgCalendarId: "primary" — the authenticated user's calendar
  (no separate calendar id needed for the single-operator setup).
- calendar.pollIntervalMinutes: 15 — bumped from 60 so events surface
  inside a typical work-block.

routingRules left empty — every inbound mail becomes a chat skill,
which is what the channels.yaml entry expects. Specific labels →
specific skills can be added later without changing this default path.

* chore(google): disable Gmail/Calendar polling + Gmail auto-reply

Reverses the auto-flow shipped in #496 + #497. The intended pattern is
Ava as a pull-mode reader/drafter — she reads / summarizes / drafts
replies when explicitly asked, via Gmail tools that create DRAFTS for
human review and send. Auto-polling the inbox and auto-replying on every
inbound was the wrong direction.

## Changes

- workspace/google.yaml — watchLabels: [] (no Gmail polling),
  orgCalendarId: "" (no Calendar polling). Header comment documents
  intent so the next operator doesn't accidentally re-enable.

- workspace/channels.yaml — drop the `google-gmail-inbox` channel route
  that was sending every inbound INBOX message to Ava's chat skill.
  Replaced with a commented-out template so the registry shape is
  documented for a future deliberate route.

- lib/plugins/google.ts — comment out `this.gmailOutbound.start(bus)`
  in install(). The subscriber implementation in google/gmail.ts stays
  intact so the path can be re-wired when an explicit Ava-tool
  "send-this-draft" flow is built. No subscription = no auto-reply,
  even if some other code accidentally publishes to
  `google.gmail.reply.{threadId}`.

## What's intact

- GooglePlugin still installs, OAuth tokens still refresh, Drive + Docs
  outbound subscribers still listen
- Gmail handler code still works — calling `start()` with watchLabels
  populated would resume polling identically to before
- Channel registry still recognizes `platform: google` entries when
  someone wants to re-enable label-scoped routing in the future

## Next move (NOT in this PR)

Add Gmail tools to Ava's toolkit so she can read/search/summarize/draft
on demand. Drafts created via the Gmail API land in the Drafts folder
for the human to review and send manually.

## Tests

1094/1094 pass. Type-check clean.

## Live container note

The bind-mounted workspace/google.yaml on the deployed container will
hot-reload to the new (empty) values within 1s of this PR landing,
which stops the active poller mid-cycle. Discord-style auto-routing of
inbound emails to Ava ceases immediately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ava): pull-mode Gmail/Calendar tools (read + draft only) (#499)

Adds six new Ava tools that read Gmail/Calendar on demand and create
draft replies that land in the Drafts folder for human send. No
auto-poll, no auto-reply.

Endpoints (src/api/google.ts):
  GET  /api/google/gmail/list-unread
  GET  /api/google/gmail/search
  GET  /api/google/gmail/thread/:threadId
  POST /api/google/gmail/draft         (uses users.drafts.create — no send)
  GET  /api/google/calendar/upcoming
  GET  /api/google/calendar/event/:eventId

Tools (deep-agent-executor):
  gmail_list_unread, gmail_search, gmail_get_thread, gmail_create_draft,
  calendar_list_upcoming, calendar_event_detail

Drafts auto-resolve To/Subject/In-Reply-To/References from threadId so
the draft threads correctly in Gmail's UI. Returns sent: false sentinel.

All endpoints share a withToken() helper that 503s with a clear message
when GOOGLE_CLIENT_ID/SECRET/REFRESH_TOKEN aren't set.

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ava): linear board read + filing tools (#502)

Adds six Ava tools for on-demand Linear interaction. Reads are unrestricted;
writes (issue create / comment) only fire when explicitly asked.

Endpoints (src/api/linear.ts):
  GET  /api/linear/teams
  GET  /api/linear/issues?team=&state=&label=&assignee=me&max=
  GET  /api/linear/issues/search?q=&max=
  GET  /api/linear/issues/:idOrKey
  POST /api/linear/issues
  POST /api/linear/issues/:id/comment

Tools (deep-agent-executor):
  linear_list_teams, linear_list_issues, linear_search_issues,
  linear_get_issue, linear_create_issue, linear_add_comment

Extends LinearClient with listTeams/listIssues/searchIssues/getIssue —
keeps the SDK-coupling in one place and shares cache + auth with the
existing outbound write path. Read endpoints use a withClient() helper
that 503s with a clear message when LINEAR_API_KEY is unset.

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(storage): pass captured `now` into _purgeExpired to fix save() race (#504) (#506)

When ttlMs is small, Date.now() can advance between the INSERT and the
opportunistic purge inside save() — purging the row that was just
inserted (`expires_at = now + ttlMs <= Date.now()`). The flake surfaced
on PR #503 with ttlMs=1 in the test:

  await store.save("task-old", ...)   // expires_at = T0 + 1
  setTimeout 5ms
  await store.save("task-new", ...)   // expires_at = T1 + 1; purge uses T1+δ
  // task-new could be purged in the same call → size==0 instead of 1

Fix: `_purgeExpired(cutoff?)` takes the captured `now` from save(), so
no row inserted in this call can also be deleted in this call.

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Promote dev → main (Ava Linear tools) (#503) (#505)

* feat(audit): add bus topic audit script — Swagger-for-the-bus markdown index

Static AST scan of every bus.publish() and bus.subscribe() call site in
src/ + lib/, resolved to topic strings, grouped by domain prefix, and
dumped as a single browsable reference at docs/reference/bus-topics.md.

## What it does

- Walks all .ts files under src/ + lib/ (excluding tests, .d.ts, dist).
- Finds every CallExpression on a bus-like receiver where the method is
  `publish` or `subscribe`.
- Resolves the topic argument:
  - String literal "foo.bar"             → "foo.bar"
  - TOPICS.NAME / *_TOPICS.NAME           → looked up against the
                                            constants in
                                            src/event-bus/all-topics.ts
                                            → real value
  - Template literal `linear.reply.${id}` → pattern "linear.reply.{id}"
  - "a." + b string concatenation         → pattern "a.{b}"
  - Anything else                         → flagged unresolved
- Pulls payload-type bindings from src/event-bus/payloads.ts JSDoc
  (convention: "Payload for `topic.name`").
- Renders a markdown reference with per-topic publishers, subscribers,
  declaration status, payload type, and JSDoc.

## Output today

Run on dev tip:
- 110 distinct topics
-  18 declared in TOPICS (the ones in all-topics.ts)
-  92 raw-string / template topics NOT in TOPICS (audit signal — these
     are candidates to formalize into the registry)
-  25 topics that couldn't be statically resolved (computed at runtime)
- 142 publish call sites
-  85 subscribe call sites

## Not yet (deliberately)

- No CI gating — `--allow-unresolved` is the default. Leaving CI green
  while the registry catches up; flip to fail-on-unresolved later.
- No dashboard UI — markdown is enough for the audit case the operator
  asked about. Wiring into the existing event-viewer/:8080 dashboard is a
  follow-up if the markdown view proves insufficient.
- No payload-shape rendering beyond the type name. Could expand to dump
  the Zod / TS interface fields, but the type name + clickable file:line
  to the source is enough to navigate.

## Usage

  bun run audit:bus       # regenerate docs/reference/bus-topics.md

Tests: 1080/1080 pass (no regression — the script doesn't touch runtime
paths). Type-check clean.



* fix(tests): drop project-schema mock from onboarding tests (closes #480)

## Root cause

`tests/onboarding.test.ts` mocked `../lib/project-schema.ts` via
`mock.module(...)`. Bun's `mock.module()` replacements persist for the
entire test process — when `tests/project-schema.test.ts` ran in the same
`bun test` invocation (file-discovery order varies), it picked up the
mocked `validateProjectEntry` instead of the real schema-backed one.

The mock returned `{ ok: true, errors: [] }` — missing the `entry` field
that the real success branch carries. So the 12 tests in
project-schema.test.ts that destructure `result.entry.*` would crash
with `TypeError: undefined is not an object` — the exact failure pattern
the audit flagged across PRs #468, #478, #479, #486 in CI.

Failure shape was deterministic in CI (file-load order + memory pressure
made the leak repro consistently) and never reproduced locally (same
order ran fine on a beefier dev box). Doesn't matter — the mock leak is
the cause; the cure is to stop mocking.

## Fix

- Removed `mock.module("../lib/project-schema.ts", ...)` and the
  `validateProjectEntry` field from the shared `_mocks` state.
- Removed the per-test `_mocks.validateProjectEntry = mock(() => ({ok:
  true, errors: []}))` setups that paired with the now-gone module mock.
- Replaced the "fatal step error (projects.yaml schema fail)" test with
  a payload that genuinely fails the real schema (`discord.dev: 12345 as
  unknown as string` — a number satisfies neither branch of
  `ProjectDiscordChannelSchema`'s union(string, object)). Same pipeline
  branch is exercised; just driven by real validation now.
- Added a comment block documenting the mock-leak hazard so a future
  maintainer doesn't reintroduce it.

The other two `mock.module()` calls in this file (`github-auth`,
`google`) are kept — they're not the source of the flake (no other test
file imports them with assertions on the return shape) and ripping the
network calls out is genuine isolation.

## Verification

- Repro: `bun test tests/` 5 consecutive runs — all 1080 pass, 0 fail.
- Full suite + tsc clean.
- The "fatal" test still asserts the same `step: "projects_yaml"`
  failure reply — same pipeline path, real-schema-driven instead of
  mock-driven.



* feat(a2a): stream progress events to inbound A2A clients (#472 Gap 1)

External A2A clients calling workstacean's `/a2a` endpoint with
`message/stream` previously saw `submitted → working → silence →
completed`. The card advertises `streaming: true` but the bus contract
between the agent executor and BusAgentExecutor was single-shot — there
was no way for an executor mid-task to push intermediate updates.

This wires that path end-to-end through a new opt-in bus topic family.

## Topic contract

```
agent.skill.progress.{correlationId}
```

Payload (`AgentSkillProgressPayload`, all fields optional):
- text:    visible message body in the streamed status update
- percent: 0-100
- step:    named phase
- meta:    arbitrary structured progress

`BusAgentExecutor` subscribes to this topic on every inbound A2A call
alongside the existing reply topic. Each event becomes an A2A
`status-update` with `state: "working"` and `final: false`. `text`
becomes the streamed message body; `percent` / `step` / `meta` go into
`status.metadata` for clients that render progress affordances.

## Opt-in semantics

Skill executors that don't publish to this topic lose nothing — the
`submitted → working → completed` lifecycle continues to work unchanged.
Only executors that explicitly publish progress events (e.g. a
multi-step researcher, a deep agent that wants to surface tool calls)
opt in.

## Late-event safety

Progress events arriving after the terminal reply are silently dropped.
Both `agent.skill.progress.*` and `agent.skill.response.*` subscriptions
are torn down together on terminal / cancel — no leaked subscriptions
across overlapping task ids.

## What this PR contains

- `src/event-bus/all-topics.ts` — new `AGENT_SKILL_PROGRESS_PREFIX` constant in `ACTION_TOPICS`
- `src/event-bus/payloads.ts` — new `AgentSkillProgressPayload` interface bound by JSDoc convention
- `src/api/a2a-server.ts` — progress subscriber wired into `BusAgentExecutor.execute()`, with cleanup on terminal + cancel
- `src/api/__tests__/a2a-server.test.ts` — 2 new tests:
  - 3 progress events with mixed shapes (text-only, percent+step, full meta) all stream as working updates with the right body / metadata
  - late progress event arriving after terminal is dropped
- `docs/guides/a2a-streaming.md` — new section covering the inbound direction (the existing doc was outbound-only)
- `docs/reference/bus-topics.md` — auto-regenerated by `bun run audit:bus`

## Tests

1082/1082 pass (+2 net new). Type-check clean.

## Not in this PR (deferred)

- **Wiring DeepAgentExecutor as a publisher** — needs a thoughtful design
  for WHEN to emit (per token? per tool call? per planning phase?). The
  receiver shipping first means executors can opt in incrementally
  without a coordinated change.
- **Push-notification persistence (#472 Gap 2)** — separate task in the
  issue scope.



* feat(a2a): persistent push-notification store via SQLite (#472 Gap 2)

The SDK's `InMemoryPushNotificationStore` lost every registered webhook
callback when the container restarted. With watchtower auto-pulling the
`:dev` tag multiple times per day, an A2A client that called
`tasks/pushNotificationConfig/set` once and relied on push callbacks
would silently stop receiving them on the next container roll.

This replaces it with `SqlitePushNotificationStore`, persisted to
`${DATA_DIR}/push-notifications.db`. In-flight tasks survive redeploys.

## Shape

- One SQLite database (`bun:sqlite`, WAL journal, same pattern as
  TelemetryService and the event log).
- `(task_id, config_id)` primary key. The SDK supports multiple configs
  per task via `config.id`; configs without an id fall back to "" so
  the contract holds for callers that don't set one.
- Every row carries `expires_at` (24h TTL by default). Filtered from
  `load()` AND opportunistically GC'd on `save()` so the table stays
  bounded without a separate sweeper.
- Cold-start purge runs on `_init()` so a process that's been off for
  days doesn't surface stale configs on its first load.

## Failure modes

DB open failure (read-only mount, permissions, etc.) logs loudly and
degrades to behaving like an empty in-memory store — service stays up,
push notifications just don't fire. No crash on the A2A endpoint.

## Wiring

- New `dataDir?: string` on `ApiContext` so the a2a-server can construct
  the store path. `src/index.ts` populates it from the existing `dataDir`
  variable.
- `src/api/a2a-server.ts` chooses `SqlitePushNotificationStore` when
  `ctx.dataDir` is set, else falls back to `InMemoryPushNotificationStore`
  (kept on the import list so tests + ephemeral setups still work).

## Tests (12 new)

- save/load round-trip
- multiple configs per task keyed by config.id
- save with same (taskId, configId) overwrites
- delete with configId removes only that config
- delete without configId removes all configs for the task
- cross-restart persistence (close + reopen on the same db file)
- expired configs filtered from load() before GC runs
- expired configs GC'd on next save()
- ttlMs=0 means no expiry
- cold-start purge clears stale rows before first load
- degraded mode (init failure) → all ops are silent no-ops returning empty

## Suite

1092/1092 pass. Type-check clean.



* fix(pr-remediator): tighten diagnose_pr_stuck prompt to bias verdicts toward "rebasable"

PR #492 (push-notification store) was correctly small but raced against
PR #491 which landed first and touched the same `a2a-server.ts` and
`docs/guides/a2a-streaming.md` anchors. The diagnose_pr_stuck verdict
was "decomposable" — that triggered Quinn's auto-close + split-proposal
even though there was nothing to decompose; the right recovery was a
single back-merge.

The chokepoint guard for promotion PRs (#465) wasn't tripped because
this wasn't a promotion PR. The prompt's verdict definitions were
under-specified — "Conflicts cluster in a small number of specific
files. Splitting into smaller PRs would avoid the conflict" literally
described the situation.

## Prompt changes

- "rebasable" gets concrete examples that match recent friction:
  doc-section anchor races, both-sides-added-list-entries, lockfile
  drift, single-symbol renames the other side touched. Explicit
  threshold: prefer rebasable when the PR touches < 15 files OR when
  the conflicts cluster in docs / lockfiles / a single source file.
- "decomposable" tightened to require BOTH a large PR (multiple
  distinct subsystems) AND the conflicts genuinely splitting into
  independent landable groups. Explicit anti-pattern: "a small PR
  conflicting with a recent equally small PR is rebasable, not
  decomposable."
- New explicit precedence rule appended: "redundant → rebasable →
  decomposable → genuine. Default to rebasable…"
- The GitHub API tool guidance now asks Ava to count files in the diff
  and report the count in the evidence string. Counts < 15 should
  almost always land on rebasable.

## Why the prompt fix and not a runtime guard

The destructive-verdict guard at the chokepoint already exists for
promotion PRs (lib/plugins/pr-remediator.ts:1586). Extending it with a
"trivial-conflict bypass" was the alternative — both axes ended up
documented in the audit conversation. The prompt fix is the upstream
correction (prevent the bad call), the chokepoint guard is the
defense-in-depth (catch it if the prompt drifts). Filing the second
axis as a follow-up issue if the prompt fix doesn't hold.

## Tests

1094/1094 pass (the dispatch test asserts skill + agent + correlation
id, not prompt body — safe to ship). Type-check clean.

## Cross-reference

Friction context: PR #492 closed by Quinn, re-cut as #493 (merged).



* feat(google): wire GooglePlugin into pluginRegistry + ship OAuth helper script

GooglePlugin (Drive / Docs / Calendar / Gmail, ~915 LOC) shipped as a
self-contained module a while back but was never registered in
src/index.ts — its createDriveFolder utility ran during onboarding step
4 by direct import, but the OAuth refresher, Gmail polling, Calendar
polling, Docs handler, and Drive outbound handler were all dormant.

This wires the plugin into the registry, gated on the full OAuth2
credential triple (CLIENT_ID + CLIENT_SECRET + REFRESH_TOKEN).
Behavior is unchanged on environments missing any of those — the
plugin's own install() also gates on the same triple and logs+skips,
so the registry condition is purely a "don't pay the import cost" gate.

## To turn it on

1. Generate a refresh token via the new helper:
   bun run google:auth
2. Push the printed token to Infisical:
   infisical secrets set GOOGLE_REFRESH_TOKEN='1//...' --env=prod
3. Add compose passthroughs in homelab-iac/stacks/ai/docker-compose.yml
   to map GOOGLE_WORKSPACE_CLIENT_ID/_SECRET → GOOGLE_CLIENT_ID/_SECRET
   and pass GOOGLE_REFRESH_TOKEN. (Separate cross-repo PR; this PR is
   protoWorkstacean-only.)

## scripts/get-google-refresh-token.ts

One-shot OAuth2 helper. Reads CLIENT_ID + CLIENT_SECRET from env, spins
up a one-shot HTTP server on http://127.0.0.1:8765, walks the consent
flow, exchanges the code for a refresh token. Prints the token to
stderr (separate from stdout to avoid file-redirect capture).

Scopes requested cover all four GooglePlugin services: Drive, Docs,
Calendar, Gmail (modify). Refresh token is per-user, per-client; running
the script again rotates it.

NPM script: `bun run google:auth`.

## Tests

1094/1094 pass. Type-check clean. The plugin gate means no behavior
change in CI / dev where the credential triple isn't populated.



* feat(google): wire Gmail into channels.yaml + bus topics with reply path

Until this PR, GooglePlugin's Gmail handler published a single bus topic
(`message.inbound.google.gmail`, no per-message suffix) with no
`source.channelId` and no `reply.topic` — meaning RouterPlugin had no way
to route Gmail to a specific agent and nothing on the bus could reply
back to a Gmail thread. The Discord/GitHub-style "tag agent in
{channel}" flow simply didn't exist for email.

This PR closes the loop end-to-end.

## Inbound

- New topic shape: `message.inbound.google.gmail.{labelSlug}.{threadId}`
  - labelSlug = lowercase, non-alphanumeric → "-" (so `INBOX` → `inbox`,
    `Personal/Work` → `personal-work`)
  - ChannelRegistry.findByTopic matches against the labelSlug segment
- `source.channelId = labelSlug`, `source.userId = From` header
- `reply.topic = google.gmail.reply.{threadId}` with `format: "markdown"`
- Payload now carries `messageId`, `threadId`, `label`, `labelSlug`,
  `from`, `to`, `subject`, `date`, `body`, `content` (mirror of body for
  generic chat agents that read `content`)
- Per-thread context cache (To/Subject/In-Reply-To/References) populated
  on inbound publish for the outbound subscriber to consume

## Outbound (NEW)

`createGmailOutbound()` subscribes to `google.gmail.reply.#`. On each
event:
1. Resolves the original thread context — payload override > in-process
   cache > one-shot Gmail thread re-fetch (handles cold cache after
   restart)
2. Builds an RFC-822 reply with proper `To` / `Subject` (auto-prefixed
   `Re:`) / `In-Reply-To` / `References` headers so the message threads
   correctly in Gmail
3. POSTs to `users.messages.send` with the original threadId

Result: Ava's reply to a Gmail message appears in the same conversation
in the recipient's inbox.

## ChannelPlatformSchema

`google` added to the enum (yaml-schemas.ts) and the `ChannelPlatform`
+ `source.interface` unions (lib/types/channels.ts, lib/types.ts).

## ChannelRegistry

- New `byGoogleGmailLabel = Map<labelSlug, Channel>` index
- New `findByTopic` branch for `message.inbound.google.gmail.{slug}.{*}`
- New static helper `gmailLabelSlug(label)` exposes the same
  normalization callers use

## workspace/channels.yaml

Added a `google-gmail-inbox` entry routing the `INBOX` label to Ava with
a 30-min conversation timeout (email replies can take a while):

  - id: google-gmail-inbox
    platform: google
    channelId: "INBOX"
    agent: ava

## Tests

1094/1094 pass. Type-check clean.

## What lands when this ships + the deployed `:dev` watchtower-pulls

Inbound emails hitting INBOX → Gmail poller publishes
`message.inbound.google.gmail.inbox.{threadId}` → RouterPlugin matches
the `google-gmail-inbox` channel → dispatches `agent.skill.request{
skill: chat, targets: [ava], reply.topic: google.gmail.reply.{threadId}
}` → Ava processes → her response publishes on
`google.gmail.reply.{threadId}` → `createGmailOutbound` posts a properly
threaded reply via Gmail send. Email-as-chat works end-to-end.



* chore(workspace): enable Gmail INBOX poll + primary Calendar in google.yaml

Activates the now-fully-wired Google Workspace integration (#495 +
homelab-iac#13 + #496) on the standard inbox/calendar surface. Hot-reloaded
by GooglePlugin; no restart needed.

- gmail.watchLabels: ["INBOX"] — every email landing in the user's inbox
  routes to Ava via the google-gmail-inbox channel (channels.yaml).
- gmail.pollIntervalMinutes: 5 — unchanged.
- calendar.orgCalendarId: "primary" — the authenticated user's calendar
  (no separate calendar id needed for the single-operator setup).
- calendar.pollIntervalMinutes: 15 — bumped from 60 so events surface
  inside a typical work-block.

routingRules left empty — every inbound mail becomes a chat skill,
which is what the channels.yaml entry expects. Specific labels →
specific skills can be added later without changing this default path.

* chore(google): disable Gmail/Calendar polling + Gmail auto-reply

Reverses the auto-flow shipped in #496 + #497. The intended pattern is
Ava as a pull-mode reader/drafter — she reads / summarizes / drafts
replies when explicitly asked, via Gmail tools that create DRAFTS for
human review and send. Auto-polling the inbox and auto-replying on every
inbound was the wrong direction.

## Changes

- workspace/google.yaml — watchLabels: [] (no Gmail polling),
  orgCalendarId: "" (no Calendar polling). Header comment documents
  intent so the next operator doesn't accidentally re-enable.

- workspace/channels.yaml — drop the `google-gmail-inbox` channel route
  that was sending every inbound INBOX message to Ava's chat skill.
  Replaced with a commented-out template so the registry shape is
  documented for a future deliberate route.

- lib/plugins/google.ts — comment out `this.gmailOutbound.start(bus)`
  in install(). The subscriber implementation in google/gmail.ts stays
  intact so the path can be re-wired when an explicit Ava-tool
  "send-this-draft" flow is built. No subscription = no auto-reply,
  even if some other code accidentally publishes to
  `google.gmail.reply.{threadId}`.

## What's intact

- GooglePlugin still installs, OAuth tokens still refresh, Drive + Docs
  outbound subscribers still listen
- Gmail handler code still works — calling `start()` with watchLabels
  populated would resume polling identically to before
- Channel registry still recognizes `platform: google` entries when
  someone wants to re-enable label-scoped routing in the future

## Next move (NOT in this PR)

Add Gmail tools to Ava's toolkit so she can read/search/summarize/draft
on demand. Drafts created via the Gmail API land in the Drafts folder
for the human to review and send manually.

## Tests

1094/1094 pass. Type-check clean.

## Live container note

The bind-mounted workspace/google.yaml on the deployed container will
hot-reload to the new (empty) values within 1s of this PR landing,
which stops the active poller mid-cycle. Discord-style auto-routing of
inbound emails to Ava ceases immediately.



* feat(ava): pull-mode Gmail/Calendar tools (read + draft only) (#499)

Adds six new Ava tools that read Gmail/Calendar on demand and create
draft replies that land in the Drafts folder for human send. No
auto-poll, no auto-reply.

Endpoints (src/api/google.ts):
  GET  /api/google/gmail/list-unread
  GET  /api/google/gmail/search
  GET  /api/google/gmail/thread/:threadId
  POST /api/google/gmail/draft         (uses users.drafts.create — no send)
  GET  /api/google/calendar/upcoming
  GET  /api/google/calendar/event/:eventId

Tools (deep-agent-executor):
  gmail_list_unread, gmail_search, gmail_get_thread, gmail_create_draft,
  calendar_list_upcoming, calendar_event_detail

Drafts auto-resolve To/Subject/In-Reply-To/References from threadId so
the draft threads correctly in Gmail's UI. Returns sent: false sentinel.

All endpoints share a withToken() helper that 503s with a clear message
when GOOGLE_CLIENT_ID/SECRET/REFRESH_TOKEN aren't set.




* feat(ava): linear board read + filing tools (#502)

Adds six Ava tools for on-demand Linear interaction. Reads are unrestricted;
writes (issue create / comment) only fire when explicitly asked.

Endpoints (src/api/linear.ts):
  GET  /api/linear/teams
  GET  /api/linear/issues?team=&state=&label=&assignee=me&max=
  GET  /api/linear/issues/search?q=&max=
  GET  /api/linear/issues/:idOrKey
  POST /api/linear/issues
  POST /api/linear/issues/:id/comment

Tools (deep-agent-executor):
  linear_list_teams, linear_list_issues, linear_search_issues,
  linear_get_issue, linear_create_issue, linear_add_comment

Extends LinearClient with listTeams/listIssues/searchIssues/getIssue —
keeps the SDK-coupling in one place and shares cache + auth with the
existing outbound write path. Read endpoints use a withClient() helper
that 503s with a clear message when LINEAR_API_KEY is unset.




---------

Co-authored-by: Josh Mabry <31560031+mabry1985@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(storage): inject clock into push store to deterministically test TTL (#508)

Replaces real-time sleeps in TTL tests with a virtual clock injected
via constructor option. The previous tests relied on `setTimeout(5)`
between sub-ms TTL boundaries, which raced with `Date.now()` advancing
and produced intermittent failures on busy CI runners (#504).

The clock is also threaded through `save()`, `load()`, and
`_purgeExpired()` so a single mocked time source covers all expiry
boundaries. Production callers fall back to `Date.now` — no behavior
change.

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(channels): native A2A delivery channel for scheduled fires (#507) (#509)

* feat(channels): native A2A delivery channel for scheduled fires

Adds `payload.channel: "a2a"` as a first-class scheduler delivery
target. When set, A2ADeliveryPlugin looks up
`workspace/a2a.yaml` → `targets[payload.agent_name]` and POSTs a
JSON-RPC `message/send` to the configured URL with bearer / X-API-Key
auth from config and `metadata: {scheduler_job_id, channel,
agent_name}` so observers can distinguish scheduler-driven turns.

The router short-circuits in `_handleCron` when `channel === "a2a"`
so the local skill resolver never sees these firings — `a2a` is a
delivery channel, not a reply channel.

Closes #507. Companion to protoLabsAI/protoAgent#156.

- workspace/a2a.yaml.example (env-var expansion supported)
- lib/plugins/a2a-delivery.ts + 7 unit tests covering shape,
  auth-header forwarding, env-var expansion, channel gating,
  missing target, missing agent_name, missing config file
- docs/reference/scheduler.md — new "A2A delivery channel" section
- docs/guides/a2a-scheduler-bridge.md — end-to-end how-to
- docs/reference/config-files.md — list a2a.yaml in user-overlay table

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* address PR review (no mocks, unsubscribe, content validation)

- Replace fetch-mock test helper with a real in-process Bun.serve HTTP
  sink. Honors the repo's no-mocks-in-tests rule; injectable fetch on
  the plugin is removed (greenfield: only existed for the mock).
- Plugin now captures and stores its subscription id, and uninstall()
  unsubscribes before nulling the bus. Adds a regression test that
  asserts no deliveries fire after teardown.
- Validate payload.content is a non-empty string before constructing
  JSON-RPC parts; missing/non-string/whitespace drops with a loud
  error. Adds a regression test covering the three invalid shapes.
- Wrap env-var setup in try/finally so cleanup survives assertion
  failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix tsc + address PR review (empty-url guard, fetch timeout)

- Drop import { Server } from "bun" — Server is generic in this Bun
  version, infer type from ReturnType<typeof startSink> instead.
  This was the CI tsc failure on the previous push.
- Fail fast when target.url expands to empty (e.g. unset env var)
  instead of letting fetch throw mid-call. Adds regression test.
- Bound outbound fetch with AbortSignal.timeout(15s) so a hung
  remote agent can't pin the handler indefinitely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* address PR review (yaml type guard, json-rpc error body)

- expandEnv() now type-guards on string before .replace(); a YAML
  config like `url: 1234` or `api_key: true` no longer throws inside
  the handler. Non-string targets fall through the empty-url guard
  and drop with a loud error. Adds regression test.
- A JSON-RPC 2.0 server can return HTTP 200 with an `error` body;
  parse the response and treat that as a delivery failure rather
  than logging success. Adds regression test using the sink's
  responseBody hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(pr-remediator): live CI re-check before stuck-PR HITL escalation (#510)

The escalation builds its alert from `latestPrData`, the cached
`pr_pipeline` snapshot. Snapshots refresh every ~2min, so a PR whose
CI was failing at the last poll can have already flipped green by the
time the auto-remediator exhausts its budget — the cached `ciStatus:
"fail"` then drives a bogus alert. Hit this on PR #509: the snapshot
caught the red window between two pushes; CI was green by escalation
time.

Add a live re-check at escalation time (fix_ci kind only — conflict
escalations carry `conflictDetails` and are not CI-driven). On
`pass` / `pending` / `none`, suppress the alert and clear the
in-flight entry so the loop can re-decide on the next world-state
tick. `null` (unknown — no GH creds, network error) falls through to
escalate based on the cached snapshot — never silently swallow.

Also include the head SHA (short prefix) in the alert body so the
recipient can reconcile the alert against the *current* commit.

Same chokepoint-invariant pattern as #437/#444/#459/#465: an invariant
("live state still warrants escalation") lives at the verdict-handler
site, with a clear `null`-fallthrough to the existing path.

The fetcher is injectable via constructor option so tests stay
deterministic without hitting api.github.com. Production keeps the
zero-arg call site unchanged. Mirrors the `_fetchLiveCiStatus`
aggregation logic in `src/api/github.ts` `handleGetPrPipeline` so
cached and live answers stay consistent.

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): bump to v0.7.22

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants