fix: auto-reset telegram bridge session after lock failure#862
fix: auto-reset telegram bridge session after lock failure#862Junior00619 wants to merge 3 commits intoNVIDIA:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughBridge startup is lazy-initialized; agent sandbox calls return structured results Changes
Sequence DiagramsequenceDiagram
participant User as User (Telegram)
participant Bridge as Telegram Bridge
participant Sandbox as Agent Sandbox
participant Session as Session Store
User->>Bridge: Send message
Bridge->>Session: getSessionId(chatId)
Bridge->>Sandbox: runAgentInSandbox(message, sessionId)
Sandbox-->>Bridge: { response, exitCode, stderr }
Bridge->>Bridge: isSessionLockFailure(result)?
alt not locked
Bridge-->>User: Send result.response
else locked
Bridge->>Session: rotateSession(chatId)
Bridge-->>User: Notify "session reset, please resend"
end
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/telegram-bridge.js`:
- Around line 166-170: The isSessionLockFailure function currently treats any
response text containing "session file locked" as a lock failure even when the
process succeeded; change the logic so that checks against result.response are
only performed when the run actually failed (e.g., result.exitCode !== 0), while
continuing to always check result.stderr regardless of exitCode; update the
first branch that tests output.includes("session file locked") to only consider
result.response when exitCode indicates failure and use stderr unconditionally,
keeping the existing second branch (result.exitCode === 255 &&
/lock|session.*corrupt/i.test(output)) as-is.
- Around line 224-233: The session-reset branch only deletes the local
activeSessions entry but does not change the session id used by
runAgentInSandbox, so retries reopen the same remote session; update the reset
logic in the isSessionLockFailure handling to rotate the per-chat session id
stored in activeSessions (e.g., generate a new unique session id and set
activeSessions.set(chatId, newId)) before sending the user the reset message, or
alternatively call the helper that deletes the remote session state used by
runAgentInSandbox/SSH (remove remote session file) so the next
runAgentInSandbox(msg.text, chatId) call will use a fresh session; apply the
same fix in the /reset path to ensure activeSessions is actually changed or
remote state cleared.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8d794e28-ec62-4a02-969b-dce09d0ac93d
📒 Files selected for processing (2)
scripts/telegram-bridge.jstest/telegram-bridge-session-reset.test.js
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/telegram-bridge.js`:
- Around line 294-296: Top-level environment and OpenShell validation that can
call process.exit is still executed on require; move those checks into the
entrypoint guard or into lazy initialization inside the exported main() so
requiring the module is safe. Specifically, take the top-level env/OpenShell
validation (the block currently run before main and that may call process.exit)
and either invoke it only when require.main === module or call it from the start
of the exported main() (or a new init function) so tests can require the module
without triggering process.exit; ensure the existing require.main === module
guard still calls main() after the validation is relocated.
- Around line 42-54: The session rotation currently uses an in-memory Map
(activeSessions) and a deterministic counter (sessionCounter) in getSessionId
and rotateSession, so restarts can recreate prior session IDs; change this by
generating collision-resistant per-chat IDs (e.g., crypto.randomUUID or secure
random suffix) instead of chatId-<counter> and persist the mapping for
activeSessions across restarts (load on startup and save on update) or store the
session id per chat in durable storage (file/DB). Update getSessionId to load or
generate a UUID when missing and persist it, and update rotateSession to
generate a new UUID and persist the new mapping, removing reliance on
sessionCounter.
- Around line 181-185: The fallback that treats exitCode 255 as a session-lock
error is too broad because /lock|session.*corrupt/i will match unrelated "lock"
mentions; update the matcher in the condition checking result.exitCode === 255
to require session-specific phrasing (e.g., require "session" near "lock" or
explicit phrases like "session file", "session.*locked", or "session.*corrupt")
so only outputs that clearly reference a session issue trigger the reset; locate
the condition using result.exitCode === 255 and the current regex and tighten it
to match session-scoped tokens (for example by combining "session" with
"lock"/"locked"/"corrupt") instead of matching bare "lock".
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: dc92bc7d-ff72-4560-a844-a34beec67331
📒 Files selected for processing (2)
scripts/telegram-bridge.jstest/telegram-bridge-session-reset.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
- test/telegram-bridge-session-reset.test.js
scripts/telegram-bridge.js
Outdated
| const activeSessions = new Map(); // chatId → session suffix | ||
| let sessionCounter = 0; | ||
|
|
||
| function getSessionId(chatId) { | ||
| if (!activeSessions.has(chatId)) { | ||
| activeSessions.set(chatId, `${chatId}-${++sessionCounter}`); | ||
| } | ||
| return activeSessions.get(chatId); | ||
| } | ||
|
|
||
| function rotateSession(chatId) { | ||
| activeSessions.set(chatId, `${chatId}-${++sessionCounter}`); | ||
| } |
There was a problem hiding this comment.
Session rotation is not durable across bridge restarts.
The new session choice only lives in activeSessions. If the bridge restarts after /reset or after a detected lock failure, the map is empty again and Line 47/Line 53 fall back to the deterministic chatId-counter pattern, which can reopen the remote session you just rotated away from. Persist the per-chat session ID, or use a collision-resistant ID scheme that cannot recreate abandoned session IDs after restart.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/telegram-bridge.js` around lines 42 - 54, The session rotation
currently uses an in-memory Map (activeSessions) and a deterministic counter
(sessionCounter) in getSessionId and rotateSession, so restarts can recreate
prior session IDs; change this by generating collision-resistant per-chat IDs
(e.g., crypto.randomUUID or secure random suffix) instead of chatId-<counter>
and persist the mapping for activeSessions across restarts (load on startup and
save on update) or store the session id per chat in durable storage (file/DB).
Update getSessionId to load or generate a UUID when missing and persist it, and
update rotateSession to generate a new UUID and persist the new mapping,
removing reliance on sessionCounter.
60aa4a2 to
cfe3d63
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
scripts/telegram-bridge.js (1)
179-197: Consider exporting utilities for testability.The AI summary indicates tests extract and test
isSessionLockFailure. If tests are parsing the source file to extract functions, consider exporting the utility functions (likeisSessionLockFailure,getSessionId,rotateSession) for cleaner test imports:module.exports = { isSessionLockFailure, getSessionId, rotateSession, init };This is optional since the current test approach works, but explicit exports would make the test setup more maintainable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/telegram-bridge.js` around lines 179 - 197, Export the session utility functions to make them directly importable for tests: add explicit exports for isSessionLockFailure, getSessionId, rotateSession and init from scripts/telegram-bridge.js so tests can require/import these helpers instead of parsing the file; update the module.exports (or export object) to include those function names and ensure existing default behavior remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@scripts/telegram-bridge.js`:
- Around line 179-197: Export the session utility functions to make them
directly importable for tests: add explicit exports for isSessionLockFailure,
getSessionId, rotateSession and init from scripts/telegram-bridge.js so tests
can require/import these helpers instead of parsing the file; update the
module.exports (or export object) to include those function names and ensure
existing default behavior remains unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8f01d2ae-2094-4051-91cf-e6c913c2d0a9
📒 Files selected for processing (2)
scripts/telegram-bridge.jstest/telegram-bridge-session-reset.test.js
✅ Files skipped from review due to trivial changes (1)
- test/telegram-bridge-session-reset.test.js
…ID on reset - Only check result.response for lock text when exitCode !== 0, so a successful reply quoting the error string is not misidentified. - Replace activeSessions.delete() with rotateSession() that assigns a new unique session suffix, ensuring the next runAgentInSandbox call uses a different --session-id and avoids reopening the corrupted remote session file. - Update tests: add false-positive coverage, verify rotateSession wiring and getSessionId usage.
…op-level init - Replace deterministic chatId-counter with crypto.randomUUID() so restarted bridges never recreate abandoned session IDs. - Tighten exit-255 fallback regex to require session-specific phrasing (session.*lock, lock.*session, session.*corrupt) instead of bare 'lock'. - Move env/OpenShell validation into init(), called only when require.main === module, so requiring the file in tests is safe. - Add test for bare 'lock' false positive on exit 255.
cfe3d63 to
a91573e
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
scripts/telegram-bridge.js (1)
188-197: Consider exportingisSessionLockFailurefor cleaner testing.The test file currently extracts this function via regex and
eval, which is fragile if the function formatting changes. Exporting it would enable direct import in tests:// At end of file, before the entrypoint guard: module.exports = { isSessionLockFailure };This is optional since the current approach works, but it would make tests more maintainable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/telegram-bridge.js` around lines 188 - 197, Export the isSessionLockFailure function so tests can import it directly instead of using regex+eval; add a module.exports assignment that exposes isSessionLockFailure (e.g., module.exports = { isSessionLockFailure }) near the end of the file before the entrypoint guard so tests can require/import the function by name.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@scripts/telegram-bridge.js`:
- Around line 188-197: Export the isSessionLockFailure function so tests can
import it directly instead of using regex+eval; add a module.exports assignment
that exposes isSessionLockFailure (e.g., module.exports = { isSessionLockFailure
}) near the end of the file before the entrypoint guard so tests can
require/import the function by name.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 74eef71a-8990-442f-82f9-06897edf3710
📒 Files selected for processing (2)
scripts/telegram-bridge.jstest/telegram-bridge-session-reset.test.js
✅ Files skipped from review due to trivial changes (1)
- test/telegram-bridge-session-reset.test.js
Good suggestion but im keeping it as-is for now since the PR scope is the queue fix, but happy to add exports in a follow-up if the team wants it. |
Fixes #833
Problem
A session-file lock failure could leave the Telegram bridge reusing a corrupted session on later messages. In practice this caused stale tool-call context and invalid history entries to bleed into subsequent turns, producing erratic replies.
Root Cause
runAgentInSandboxreturned only a response string, so the caller had no access to exit status orstderr. That made session-lock failures indistinguishable from normal completions, and the corresponding entry inactiveSessionswas left intact.Fix
Changed
runAgentInSandboxto return structured process results:{ response, exitCode, stderr }Added
isSessionLockFailure(...)to identify the specific class of failures that should invalidate session state. Detection uses both content and exit status:stderror response text255only when paired with lock/corruption signalsThis avoids resetting sessions for unrelated failures that can also return
255, such as SSH connection or auth errors.On a detected lock failure, the poll handler now:
activeSessionsUnrelated error handling is unchanged.
Also gated the entrypoint with
require.main === moduleto make the file safely loadable in tests.Tests
Added 13 regression tests covering detection behavior, false-positive avoidance, edge cases, and poll-handler integration.
Summary by CodeRabbit
Bug Fixes
Reliability
Tests