Skip to content

fix: auto-reset telegram bridge session after lock failure#862

Open
Junior00619 wants to merge 3 commits intoNVIDIA:mainfrom
Junior00619:fix/auto-reset-session-on-lock-failure
Open

fix: auto-reset telegram bridge session after lock failure#862
Junior00619 wants to merge 3 commits intoNVIDIA:mainfrom
Junior00619:fix/auto-reset-session-on-lock-failure

Conversation

@Junior00619
Copy link
Contributor

@Junior00619 Junior00619 commented Mar 25, 2026

Fixes #833

Problem

A session-file lock failure could leave the Telegram bridge reusing a corrupted session on later messages. In practice this caused stale tool-call context and invalid history entries to bleed into subsequent turns, producing erratic replies.

Root Cause

runAgentInSandbox returned only a response string, so the caller had no access to exit status or stderr. That made session-lock failures indistinguishable from normal completions, and the corresponding entry in activeSessions was left intact.

Fix

Changed runAgentInSandbox to return structured process results:

{ response, exitCode, stderr }

Added isSessionLockFailure(...) to identify the specific class of failures that should invalidate session state. Detection uses both content and exit status:

  • explicit lock indicators in stderr or response text
  • exit code 255 only when paired with lock/corruption signals

This avoids resetting sessions for unrelated failures that can also return 255, such as SSH connection or auth errors.

On a detected lock failure, the poll handler now:

  • deletes the session from activeSessions
  • logs the reset
  • asks the user to resend so the next turn runs with a clean session

Unrelated error handling is unchanged.

Also gated the entrypoint with require.main === module to make the file safely loadable in tests.

Tests

Added 13 regression tests covering detection behavior, false-positive avoidance, edge cases, and poll-handler integration.

Summary by CodeRabbit

  • Bug Fixes

    • Improved session recovery: corrupted sessions are auto-rotated, users are prompted to resend when a session lock is detected, and the /reset command now rotates the session suffix for a fresh session.
  • Reliability

    • Startup no longer triggers exits during imports; environment checks and external initialization are deferred. Agent runs now return structured results (response, exit status, error output) so failures are detected before forwarding.
  • Tests

    • Added tests verifying session-lock detection, automatic session rotation, and the user-notification flow.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Bridge startup is lazy-initialized; agent sandbox calls return structured results { response, exitCode, stderr }. The bridge detects session-lock failures, rotates per-chat session IDs (including on /reset), notifies users to resend, and prevents forwarding corrupted agent output.

Changes

Cohort / File(s) Summary
Telegram bridge core
scripts/telegram-bridge.js
Introduced lazy init() and require.main === module guard; added crypto usage, environment init, per-chat getSessionId(chatId) and rotateSession(chatId); runAgentInSandbox now returns { response, exitCode, stderr }; added isSessionLockFailure(result); polling flow rotates sessions on lock, notifies user to resend, and no longer forwards corrupted output.
Tests — session reset
test/telegram-bridge-session-reset.test.js
New Vitest suites validating isSessionLockFailure behavior across result shapes and asserting source-level wiring: runAgentInSandbox(..., getSessionId(chatId)), lock branch calls rotateSession(chatId), and users are sent a "session reset / resend" notification.

Sequence Diagram

sequenceDiagram
    participant User as User (Telegram)
    participant Bridge as Telegram Bridge
    participant Sandbox as Agent Sandbox
    participant Session as Session Store

    User->>Bridge: Send message
    Bridge->>Session: getSessionId(chatId)
    Bridge->>Sandbox: runAgentInSandbox(message, sessionId)
    Sandbox-->>Bridge: { response, exitCode, stderr }
    Bridge->>Bridge: isSessionLockFailure(result)?
    alt not locked
        Bridge-->>User: Send result.response
    else locked
        Bridge->>Session: rotateSession(chatId)
        Bridge-->>User: Notify "session reset, please resend"
    end
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I sniffed a locked file and twitched my nose,
spun a fresh session where the old one froze.
If a lock comes knocking, I swap the key —
resend your message, and fresh replies you'll see. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the primary change: detecting session-file lock failures and automatically resetting the session.
Linked Issues check ✅ Passed The PR fully addresses all coding requirements from #833: detects lock/corruption failures via exit code and stderr, rotates session UUID on lock failure, distinguishes lock failures from unrelated errors, and prevents corrupted history persistence.
Out of Scope Changes check ✅ Passed All changes are in scope: lazy initialization and require.main guard improve module testability, runAgentInSandbox return value change enables lock detection, session UUID rotation replaces history deletion, and test coverage validates the new behavior.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/telegram-bridge.js`:
- Around line 166-170: The isSessionLockFailure function currently treats any
response text containing "session file locked" as a lock failure even when the
process succeeded; change the logic so that checks against result.response are
only performed when the run actually failed (e.g., result.exitCode !== 0), while
continuing to always check result.stderr regardless of exitCode; update the
first branch that tests output.includes("session file locked") to only consider
result.response when exitCode indicates failure and use stderr unconditionally,
keeping the existing second branch (result.exitCode === 255 &&
/lock|session.*corrupt/i.test(output)) as-is.
- Around line 224-233: The session-reset branch only deletes the local
activeSessions entry but does not change the session id used by
runAgentInSandbox, so retries reopen the same remote session; update the reset
logic in the isSessionLockFailure handling to rotate the per-chat session id
stored in activeSessions (e.g., generate a new unique session id and set
activeSessions.set(chatId, newId)) before sending the user the reset message, or
alternatively call the helper that deletes the remote session state used by
runAgentInSandbox/SSH (remove remote session file) so the next
runAgentInSandbox(msg.text, chatId) call will use a fresh session; apply the
same fix in the /reset path to ensure activeSessions is actually changed or
remote state cleared.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8d794e28-ec62-4a02-969b-dce09d0ac93d

📥 Commits

Reviewing files that changed from the base of the PR and between ba824a6 and 2e1db39.

📒 Files selected for processing (2)
  • scripts/telegram-bridge.js
  • test/telegram-bridge-session-reset.test.js

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/telegram-bridge.js`:
- Around line 294-296: Top-level environment and OpenShell validation that can
call process.exit is still executed on require; move those checks into the
entrypoint guard or into lazy initialization inside the exported main() so
requiring the module is safe. Specifically, take the top-level env/OpenShell
validation (the block currently run before main and that may call process.exit)
and either invoke it only when require.main === module or call it from the start
of the exported main() (or a new init function) so tests can require the module
without triggering process.exit; ensure the existing require.main === module
guard still calls main() after the validation is relocated.
- Around line 42-54: The session rotation currently uses an in-memory Map
(activeSessions) and a deterministic counter (sessionCounter) in getSessionId
and rotateSession, so restarts can recreate prior session IDs; change this by
generating collision-resistant per-chat IDs (e.g., crypto.randomUUID or secure
random suffix) instead of chatId-<counter> and persist the mapping for
activeSessions across restarts (load on startup and save on update) or store the
session id per chat in durable storage (file/DB). Update getSessionId to load or
generate a UUID when missing and persist it, and update rotateSession to
generate a new UUID and persist the new mapping, removing reliance on
sessionCounter.
- Around line 181-185: The fallback that treats exitCode 255 as a session-lock
error is too broad because /lock|session.*corrupt/i will match unrelated "lock"
mentions; update the matcher in the condition checking result.exitCode === 255
to require session-specific phrasing (e.g., require "session" near "lock" or
explicit phrases like "session file", "session.*locked", or "session.*corrupt")
so only outputs that clearly reference a session issue trigger the reset; locate
the condition using result.exitCode === 255 and the current regex and tighten it
to match session-scoped tokens (for example by combining "session" with
"lock"/"locked"/"corrupt") instead of matching bare "lock".

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dc92bc7d-ff72-4560-a844-a34beec67331

📥 Commits

Reviewing files that changed from the base of the PR and between 2e1db39 and c0dde1e.

📒 Files selected for processing (2)
  • scripts/telegram-bridge.js
  • test/telegram-bridge-session-reset.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/telegram-bridge-session-reset.test.js

Comment on lines +42 to +54
const activeSessions = new Map(); // chatId → session suffix
let sessionCounter = 0;

function getSessionId(chatId) {
if (!activeSessions.has(chatId)) {
activeSessions.set(chatId, `${chatId}-${++sessionCounter}`);
}
return activeSessions.get(chatId);
}

function rotateSession(chatId) {
activeSessions.set(chatId, `${chatId}-${++sessionCounter}`);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Session rotation is not durable across bridge restarts.

The new session choice only lives in activeSessions. If the bridge restarts after /reset or after a detected lock failure, the map is empty again and Line 47/Line 53 fall back to the deterministic chatId-counter pattern, which can reopen the remote session you just rotated away from. Persist the per-chat session ID, or use a collision-resistant ID scheme that cannot recreate abandoned session IDs after restart.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/telegram-bridge.js` around lines 42 - 54, The session rotation
currently uses an in-memory Map (activeSessions) and a deterministic counter
(sessionCounter) in getSessionId and rotateSession, so restarts can recreate
prior session IDs; change this by generating collision-resistant per-chat IDs
(e.g., crypto.randomUUID or secure random suffix) instead of chatId-<counter>
and persist the mapping for activeSessions across restarts (load on startup and
save on update) or store the session id per chat in durable storage (file/DB).
Update getSessionId to load or generate a UUID when missing and persist it, and
update rotateSession to generate a new UUID and persist the new mapping,
removing reliance on sessionCounter.

@Junior00619 Junior00619 force-pushed the fix/auto-reset-session-on-lock-failure branch from 60aa4a2 to cfe3d63 Compare March 25, 2026 05:43
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
scripts/telegram-bridge.js (1)

179-197: Consider exporting utilities for testability.

The AI summary indicates tests extract and test isSessionLockFailure. If tests are parsing the source file to extract functions, consider exporting the utility functions (like isSessionLockFailure, getSessionId, rotateSession) for cleaner test imports:

module.exports = { isSessionLockFailure, getSessionId, rotateSession, init };

This is optional since the current test approach works, but explicit exports would make the test setup more maintainable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/telegram-bridge.js` around lines 179 - 197, Export the session
utility functions to make them directly importable for tests: add explicit
exports for isSessionLockFailure, getSessionId, rotateSession and init from
scripts/telegram-bridge.js so tests can require/import these helpers instead of
parsing the file; update the module.exports (or export object) to include those
function names and ensure existing default behavior remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@scripts/telegram-bridge.js`:
- Around line 179-197: Export the session utility functions to make them
directly importable for tests: add explicit exports for isSessionLockFailure,
getSessionId, rotateSession and init from scripts/telegram-bridge.js so tests
can require/import these helpers instead of parsing the file; update the
module.exports (or export object) to include those function names and ensure
existing default behavior remains unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8f01d2ae-2094-4051-91cf-e6c913c2d0a9

📥 Commits

Reviewing files that changed from the base of the PR and between 60aa4a2 and cfe3d63.

📒 Files selected for processing (2)
  • scripts/telegram-bridge.js
  • test/telegram-bridge-session-reset.test.js
✅ Files skipped from review due to trivial changes (1)
  • test/telegram-bridge-session-reset.test.js

…ID on reset

- Only check result.response for lock text when exitCode !== 0, so a
  successful reply quoting the error string is not misidentified.
- Replace activeSessions.delete() with rotateSession() that assigns a
  new unique session suffix, ensuring the next runAgentInSandbox call
  uses a different --session-id and avoids reopening the corrupted
  remote session file.
- Update tests: add false-positive coverage, verify rotateSession
  wiring and getSessionId usage.
…op-level init

- Replace deterministic chatId-counter with crypto.randomUUID() so
  restarted bridges never recreate abandoned session IDs.
- Tighten exit-255 fallback regex to require session-specific phrasing
  (session.*lock, lock.*session, session.*corrupt) instead of bare 'lock'.
- Move env/OpenShell validation into init(), called only when
  require.main === module, so requiring the file in tests is safe.
- Add test for bare 'lock' false positive on exit 255.
@Junior00619 Junior00619 force-pushed the fix/auto-reset-session-on-lock-failure branch from cfe3d63 to a91573e Compare March 25, 2026 15:34
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
scripts/telegram-bridge.js (1)

188-197: Consider exporting isSessionLockFailure for cleaner testing.

The test file currently extracts this function via regex and eval, which is fragile if the function formatting changes. Exporting it would enable direct import in tests:

// At end of file, before the entrypoint guard:
module.exports = { isSessionLockFailure };

This is optional since the current approach works, but it would make tests more maintainable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/telegram-bridge.js` around lines 188 - 197, Export the
isSessionLockFailure function so tests can import it directly instead of using
regex+eval; add a module.exports assignment that exposes isSessionLockFailure
(e.g., module.exports = { isSessionLockFailure }) near the end of the file
before the entrypoint guard so tests can require/import the function by name.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@scripts/telegram-bridge.js`:
- Around line 188-197: Export the isSessionLockFailure function so tests can
import it directly instead of using regex+eval; add a module.exports assignment
that exposes isSessionLockFailure (e.g., module.exports = { isSessionLockFailure
}) near the end of the file before the entrypoint guard so tests can
require/import the function by name.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 74eef71a-8990-442f-82f9-06897edf3710

📥 Commits

Reviewing files that changed from the base of the PR and between cfe3d63 and a91573e.

📒 Files selected for processing (2)
  • scripts/telegram-bridge.js
  • test/telegram-bridge-session-reset.test.js
✅ Files skipped from review due to trivial changes (1)
  • test/telegram-bridge-session-reset.test.js

@Junior00619
Copy link
Contributor Author

🧹 Nitpick comments (1)

scripts/telegram-bridge.js (1)> 188-197: Consider exporting isSessionLockFailure for cleaner testing.

The test file currently extracts this function via regex and eval, which is fragile if the function formatting changes. Exporting it would enable direct import in tests:

// At end of file, before the entrypoint guard:
module.exports = { isSessionLockFailure };

This is optional since the current approach works, but it would make tests more maintainable.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@scripts/telegram-bridge.js` around lines 188 - 197, Export the
isSessionLockFailure function so tests can import it directly instead of using
regex+eval; add a module.exports assignment that exposes isSessionLockFailure
(e.g., module.exports = { isSessionLockFailure }) near the end of the file
before the entrypoint guard so tests can require/import the function by name.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@scripts/telegram-bridge.js`:
- Around line 188-197: Export the isSessionLockFailure function so tests can
import it directly instead of using regex+eval; add a module.exports assignment
that exposes isSessionLockFailure (e.g., module.exports = { isSessionLockFailure
}) near the end of the file before the entrypoint guard so tests can
require/import the function by name.

ℹ️ Review info

Good suggestion but im keeping it as-is for now since the PR scope is the queue fix, but happy to add exports in a follow-up if the team wants it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: Session not automatically reset after lock failure

1 participant