Skip to content

feat: add unanswered Slack mention detection to heartbeat#186

Merged
benvinegar merged 5 commits intomodem-dev:mainfrom
vltbaudbot:feat/unanswered-slack-mentions
Mar 1, 2026
Merged

feat: add unanswered Slack mention detection to heartbeat#186
benvinegar merged 5 commits intomodem-dev:mainfrom
vltbaudbot:feat/unanswered-slack-mentions

Conversation

@vltbaudbot
Copy link
Contributor

Summary

Adds periodic checking for Slack app_mention events that didn't receive a reply, helping catch messages lost during agent restarts.

Problem

During control-agent restarts, messages forwarded to dead sockets are lost forever because the Slack bridge acks them before confirming agent receipt. This creates a window where user messages can be silently dropped.

Example incident: On 2026-02-27, 3 messages were lost during a 45-minute restart window (05:03-05:48 UTC).

Solution

This PR adds an unanswered mention check to the heartbeat extension that:

  1. Scans bridge logs for recent app_mention events (last hour)
  2. Checks for replies by searching:
    • Bridge logs for POST /send or /reply with the thread_ts
    • Control-agent session logs for thread_ts references
  3. Reports unanswered mentions older than 5 minutes (grace period for processing)
  4. Runs automatically every 10 minutes via existing heartbeat mechanism
  5. Zero-token when healthy - only prompts LLM when issues are found

Changes

  • Add checkUnansweredMentions() function
  • Add hasRepliedToThread() helper for reply detection
  • Integrate into main heartbeat flow
  • Add HEARTBEAT_CHECK_UNANSWERED_MENTIONS env var (enabled by default)
  • Add configuration output for new settings

Testing

Tested locally during incident recovery - successfully detected the 3 lost messages from the restart.

Configuration

# Enabled by default, disable with:
HEARTBEAT_CHECK_UNANSWERED_MENTIONS=0

# Adjust grace period (default 5 min) by editing UNANSWERED_MENTION_THRESHOLD_MS constant

Follow-up Work

This is a detection/monitoring solution. Long-term reliability improvements would require changes to the bridge architecture (two-phase ack, persistent queue, etc) and potentially the broker service itself.

Adds periodic checking for Slack app_mention events that didn't receive a
reply, helping catch messages lost during agent restarts.

Changes:
- Add checkUnansweredMentions() function that scans bridge log for recent
  app_mention events (last hour, older than 5 min)
- Add hasRepliedToThread() helper that checks bridge logs and control-agent
  session logs for evidence of replies
- Integrate check into main heartbeat flow (runs every 10 min by default)
- Add HEARTBEAT_CHECK_UNANSWERED_MENTIONS env var (enabled by default)
- Add UNANSWERED_MENTION_THRESHOLD_MS constant (5 min grace period)
- Update heartbeat config output to show new settings

This addresses the issue where messages forwarded to dead sockets during
restarts are lost because the bridge acks them to Slack before confirming
agent receipt. The heartbeat will now alert the control-agent about
unanswered mentions so it can follow up.

Fixes dropped messages during agent restart windows.
The initial implementation checked for any occurrence of thread_ts in session
logs, which would match both inbound mentions AND outbound replies. This led
to false positives.

Improvements:
1. Add support for slack-reply-log.jsonl (most reliable source)
2. Check for thread_ts in outbound context specifically (JSON keys in curl
   commands or send_to_session calls)
3. Use proper session subdirectory path
4. Add multiple JSON formatting variations to catch escaped quotes
5. Add documentation for future reaction-based checking

This makes the check more precise - only counting actual replies, not just
seeing the mention.
@greptile-apps
Copy link

greptile-apps bot commented Feb 27, 2026

Greptile Summary

Added periodic unanswered Slack mention detection to the heartbeat extension to catch messages lost during agent restarts. The implementation scans bridge logs for app_mention events and checks for replies by searching both bridge and session logs.

Key changes:

  • Scans last 500 lines of bridge log for mentions within the past hour
  • Checks for replies via POST /send or /reply in bridge logs, or thread_ts references in control-agent session logs
  • Reports mentions older than 5 minutes without replies
  • Configurable via HEARTBEAT_CHECK_UNANSWERED_MENTIONS env var (enabled by default)
  • Zero-token when healthy, only prompts LLM when unanswered mentions are found

Issues found:

  • Documentation: New env var HEARTBEAT_CHECK_UNANSWERED_MENTIONS is not documented in CONFIGURATION.md (violates guideline: "When behavior changes, update docs in the same PR")
  • Tests: No tests added for new functions (isUnansweredMentionsCheckEnabled(), checkUnansweredMentions(), hasRepliedToThread()) despite existing comprehensive test suite
  • Comment on line 18 incorrectly states the env var needs to be set to "1" to enable when it's actually enabled by default

Confidence Score: 4/5

  • Safe to merge with minor documentation and testing improvements recommended
  • Implementation is solid with proper error handling and follows existing patterns. Deducted one point for missing documentation in CONFIGURATION.md and lack of tests for new functionality. The code is safe and will work correctly, but doesn't meet the project's documentation and testing standards.
  • Check that CONFIGURATION.md is updated with the new env var before merging

Important Files Changed

Filename Overview
pi/extensions/heartbeat.ts Added unanswered Slack mention detection with log parsing and reply checking. Minor documentation and test coverage issues.

Last reviewed commit: ed2c48d

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

* HEARTBEAT_INTERVAL_MS — interval between heartbeats (default: 600000 = 10 min)
* HEARTBEAT_ENABLED — set to "0" or "false" to disable (default: enabled)
* HEARTBEAT_EXPECTED_SESSIONS — comma-separated session aliases to check (default: "sentry-agent")
* HEARTBEAT_CHECK_UNANSWERED_MENTIONS — set to "1" or "true" to enable (default: enabled)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is misleading - should say "enabled by default, set to 0/false/no to disable"

Suggested change
* HEARTBEAT_CHECK_UNANSWERED_MENTIONS set to "1" or "true" to enable (default: enabled)
* HEARTBEAT_CHECK_UNANSWERED_MENTIONS enabled by default, set to "0" or "false" to disable
Prompt To Fix With AI
This is a comment left during a code review.
Path: pi/extensions/heartbeat.ts
Line: 18

Comment:
Comment is misleading - should say "enabled by default, set to 0/false/no to disable"

```suggestion
 *   HEARTBEAT_CHECK_UNANSWERED_MENTIONS — enabled by default, set to "0" or "false" to disable
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this comment for accuracy: HEARTBEAT_CHECK_UNANSWERED_MENTIONS is enabled by default, and can be disabled with 0, false, or no.

Responded by pi-coding-agent using openai/gpt-5.

try {
const content = readFileSync(join(controlAgentSessionDir, file), "utf-8");
// Look for evidence of a reply: the thread_ts appearing in a curl /send command
// or in a send_to_session message. The thread_ts in an outbound context
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simple string search could have false positives if threadTs appears in error messages or logs without being an actual reply. Consider checking for more specific patterns like the thread being mentioned in a tool call or message send context.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: pi/extensions/heartbeat.ts
Line: 411

Comment:
Simple string search could have false positives if `threadTs` appears in error messages or logs without being an actual reply. Consider checking for more specific patterns like the thread being mentioned in a tool call or message send context.

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — I tightened hasRepliedToThread() to avoid broad file-level string matches. It now:

  • parses slack-reply-log.jsonl line-by-line and matches exact thread_ts
  • scans recent control-agent JSONL sessions for assistant bash tool calls that are explicit outbound curl ... /send commands with matching thread_ts

I also added tests covering outbound detection and false-positive avoidance.

Responded by pi-coding-agent using openai/gpt-5.

@benvinegar benvinegar merged commit d19502c into modem-dev:main Mar 1, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants