Skip to content

feat(contentsafety): add content-safety scanning at CLI output boundary#435

Open
deane0310 wants to merge 1 commit intomainfrom
feat/content-safety
Open

feat(contentsafety): add content-safety scanning at CLI output boundary#435
deane0310 wants to merge 1 commit intomainfrom
feat/content-safety

Conversation

@deane0310
Copy link
Copy Markdown
Collaborator

@deane0310 deane0310 commented Apr 13, 2026

Summary

  • Add a cross-cutting content-safety layer that scans API responses for prompt injection patterns before they reach AI agents
  • The feature is opt-in via two environment variables (LARK_CONTENT_SAFETY_MODE + LARK_CONTENT_SAFETY_ALLOWLIST) and defaults to off with zero impact on existing behavior
  • Built-in regex provider with 4 rules, pluggable Provider interface for extensibility

Architecture

  • extension/contentsafety: pluggable Provider interface and registry
  • internal/security/contentsafety: built-in regex provider (4 rules)
  • internal/output: ScanForSafety entry point + EmitLarkResponse
  • runner.go Out/OutFormat: 5-line block check at top, minimal invasiveness
  • response.go HandleResponse: routes through EmitLarkResponse

Key design decisions

  • Single-provider registry (last-write-wins), aligned with extension/transport
  • 100ms deadline enforced via goroutine+select (works even if provider ignores ctx)
  • Panic/error/timeout all fail-open with stderr warning
  • Built-in provider self-registers via init(), activated by blank import in factory_default.go

Test plan

  • Unit tests for scanner, provider, normalizer, injection detector
  • Integration tests for EmitLarkResponse with safety scanning
  • Verify opt-in behavior: feature off by default, no impact on existing commands
  • Verify fail-open behavior on timeout/error/panic

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Built-in content-safety scanning for CLI outputs and API responses with modes: off/warn/block and command-path allowlist.
    • Injects a response alert field or emits console warnings; blocking prevents output and returns a structured error.
    • New environment variables to configure mode and allowlist.
  • Tests

    • Comprehensive unit and integration tests covering scanning, allowlist, modes, provider behavior, and robustness.

@github-actions github-actions bot added the size/L Large or sensitive change across domains or core paths label Apr 13, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 54315e34-74ad-4d19-b97c-c8a70024fabc

📥 Commits

Reviewing files that changed from the base of the PR and between b50cb5b and 17655ae.

📒 Files selected for processing (23)
  • cmd/api/api.go
  • cmd/service/service.go
  • extension/contentsafety/registry.go
  • extension/contentsafety/types.go
  • extension/contentsafety/types_test.go
  • internal/client/response.go
  • internal/cmdutil/factory_default.go
  • internal/envvars/envvars.go
  • internal/output/emit.go
  • internal/output/emit_core.go
  • internal/output/emit_core_test.go
  • internal/output/emit_integration_test.go
  • internal/output/emit_test.go
  • internal/output/envelope.go
  • internal/security/contentsafety/injection.go
  • internal/security/contentsafety/injection_test.go
  • internal/security/contentsafety/normalize.go
  • internal/security/contentsafety/normalize_test.go
  • internal/security/contentsafety/provider.go
  • internal/security/contentsafety/provider_test.go
  • internal/security/contentsafety/scanner.go
  • internal/security/contentsafety/scanner_test.go
  • shortcuts/common/runner.go
✅ Files skipped from review due to trivial changes (9)
  • internal/envvars/envvars.go
  • extension/contentsafety/registry.go
  • internal/security/contentsafety/injection_test.go
  • internal/security/contentsafety/scanner.go
  • internal/security/contentsafety/normalize_test.go
  • internal/security/contentsafety/scanner_test.go
  • extension/contentsafety/types.go
  • internal/output/emit_test.go
  • internal/output/emit_integration_test.go
🚧 Files skipped from review as they are similar to previous changes (10)
  • internal/cmdutil/factory_default.go
  • cmd/service/service.go
  • internal/output/envelope.go
  • internal/security/contentsafety/normalize.go
  • internal/security/contentsafety/injection.go
  • extension/contentsafety/types_test.go
  • internal/security/contentsafety/provider.go
  • internal/client/response.go
  • internal/security/contentsafety/provider_test.go
  • shortcuts/common/runner.go

📝 Walkthrough

Walkthrough

Adds a content-safety scanning subsystem for CLI responses: provider interface and registry, a regex-based provider, allowlist/mode controls, scan-before-emit integration into output paths, command-path propagation to response handling, and extensive tests.

Changes

Cohort / File(s) Summary
API & Service call-sites
cmd/api/api.go, cmd/service/service.go
Pass CommandPath into client.ResponseOptions at HandleResponse call sites.
Content-safety public API & registry
extension/contentsafety/types.go, extension/contentsafety/registry.go, extension/contentsafety/types_test.go
Add Provider interface, ScanRequest/Alert/RuleMatch types, and mutex-protected Register/GetProvider.
Output emission & scan orchestration
internal/output/emit.go, internal/output/emit_core.go, internal/output/envelope.go, shortcuts/common/runner.go
Introduce scan entrypoints (ScanForSafety), emit helpers (EmitShortcut, EmitLarkResponse), allowlist/mode logic, envelope _content_safety_alert injection and scan-before-emit behavior for shortcut and non-JSON formats.
Response plumbing
internal/client/response.go, internal/cmdutil/factory_default.go
Add ResponseOptions.CommandPath; refactor JSON response handling to use EmitLarkResponse; add blank import to register default provider.
Env vars
internal/envvars/envvars.go
Add CliContentSafetyMode and CliContentSafetyAllowlist constants.
Regex provider implementation
internal/security/contentsafety/injection.go, internal/security/contentsafety/normalize.go, internal/security/contentsafety/scanner.go, internal/security/contentsafety/provider.go
Add precompiled injection regex rules, normalization helper, traversal/scanning logic with limits, and regex-based provider that returns sorted matches.
Provider & core tests
internal/security/contentsafety/*.go tests (*_test.go), internal/output/emit_core_test.go, internal/output/emit_integration_test.go, internal/output/emit_test.go, extension/contentsafety/types_test.go
Extensive unit and integration tests covering mode/allowlist behavior, panic/timeout recovery, injection detection, envelope injection, and emit flows.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant APIHandler as API/Service Handler
    participant RespHandler as Response Handler
    participant OutputEmitter as Output Emitter
    participant SafetyGate as Content-Safety Gate
    participant Provider as Content-Safety Provider
    participant StdOut as StdOut/Err

    Client->>APIHandler: request
    APIHandler->>RespHandler: HandleResponse(data, opts with CommandPath)
    RespHandler->>OutputEmitter: EmitLarkResponse/EmitShortcut(data, format...)
    OutputEmitter->>SafetyGate: ScanForSafety(cmdPath, data)
    SafetyGate->>SafetyGate: check mode & allowlist
    alt scan needed
        SafetyGate->>Provider: Scan(ctx, ScanRequest)
        Provider-->>SafetyGate: Alert or nil
        alt Alert & Mode=Block
            SafetyGate-->>OutputEmitter: return block error (no output)
        else Alert & Mode=Warn
            SafetyGate-->>OutputEmitter: ScanResult{Alert}
            OutputEmitter->>StdOut: emit output with _content_safety_alert / warn to Err
        else no alert
            SafetyGate-->>OutputEmitter: proceed
            OutputEmitter->>StdOut: emit normal output
        end
    else not scanning
        SafetyGate-->>OutputEmitter: proceed
        OutputEmitter->>StdOut: emit normal output
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested labels

size/XL

Suggested reviewers

  • liangshuo-1
  • chanthuang
  • fangshuyu-768

Poem

🐰 I hop and sniff through outputs bright,
I scan for tricks that hide from sight,
With regex ears and allowlist map,
I warn or block the sneaky trap,
Safe responses—hop!—back to the light.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 32.10% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding content-safety scanning at the CLI output boundary, which aligns with the PR's core objective.
Description check ✅ Passed The description covers key sections: Summary explains the feature and opt-in nature, Architecture outlines the structure, Key design decisions highlights implementation approach, and Test plan is provided. However, the Test plan section uses checkboxes with unchecked items rather than describing actual verification performed.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/content-safety

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 13, 2026

🚀 PR Preview Install Guide

🧰 CLI update

npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@17655ae6fbe5e28bb2faeea91f144b0cf6dcb8b8

🧩 Skill update

npx skills add larksuite/cli#feat/content-safety -y -g

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/security/contentsafety/normalize_test.go (1)

79-87: Strengthen the unmarshalable fallback assertion.

The current check only verifies non-nil. Assert type preservation so regressions are caught.

Suggested test tightening
 func TestNormalize_Unmarshalable(t *testing.T) {
 	// A struct with a func field cannot be marshaled; normalize must return original.
 	type Bad struct{ F func() }
 	v := Bad{F: func() {}}
 	got := normalize(v)
-	// Just verify no panic and that we got something back (the original value).
-	if got == nil {
-		t.Error("expected non-nil return for unmarshalable value")
-	}
+	if _, ok := got.(Bad); !ok {
+		t.Fatalf("expected original Bad value, got %T", got)
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/security/contentsafety/normalize_test.go` around lines 79 - 87, The
test TestNormalize_Unmarshalable must not only ensure normalize(v) doesn't
return nil but also that it returns a value of the original type; update the
assertion on the result of calling normalize(v) to verify the returned value's
dynamic type matches the input type (i.e., preserve type Bad) using a type
assertion or reflect.TypeOf on the result from normalize, so regressions that
change the fallback behavior are detected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/security/contentsafety/injection.go`:
- Around line 35-37: The delimiter_smuggle regex currently uses
regexp.MustCompile(...) and is case-sensitive, so tokens like "### SYSTEM:" or
"### Assistant:" bypass it; update the pattern in delimiter_smuggle by adding
the case-insensitive flag at the start (e.g. prepend "(?i)" to the existing
regex passed to regexp.MustCompile) so the whole alternation
(`<\|im_(start|end|sep)\|>|<\|endoftext\|>|###\s*(system|assistant|user)\s*:`)
is matched case-insensitively. Ensure you modify the pattern string where
regexp.MustCompile is called (the variable/reference named delimiter_smuggle)
and run tests to confirm behavior.

---

Nitpick comments:
In `@internal/security/contentsafety/normalize_test.go`:
- Around line 79-87: The test TestNormalize_Unmarshalable must not only ensure
normalize(v) doesn't return nil but also that it returns a value of the original
type; update the assertion on the result of calling normalize(v) to verify the
returned value's dynamic type matches the input type (i.e., preserve type Bad)
using a type assertion or reflect.TypeOf on the result from normalize, so
regressions that change the fallback behavior are detected.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 695887df-217d-49bf-b922-dbc8e6d6086d

📥 Commits

Reviewing files that changed from the base of the PR and between dc0d927 and b50cb5b.

📒 Files selected for processing (23)
  • cmd/api/api.go
  • cmd/service/service.go
  • extension/contentsafety/registry.go
  • extension/contentsafety/types.go
  • extension/contentsafety/types_test.go
  • internal/client/response.go
  • internal/cmdutil/factory_default.go
  • internal/envvars/envvars.go
  • internal/output/emit.go
  • internal/output/emit_core.go
  • internal/output/emit_core_test.go
  • internal/output/emit_integration_test.go
  • internal/output/emit_test.go
  • internal/output/envelope.go
  • internal/security/contentsafety/injection.go
  • internal/security/contentsafety/injection_test.go
  • internal/security/contentsafety/normalize.go
  • internal/security/contentsafety/normalize_test.go
  • internal/security/contentsafety/provider.go
  • internal/security/contentsafety/provider_test.go
  • internal/security/contentsafety/scanner.go
  • internal/security/contentsafety/scanner_test.go
  • shortcuts/common/runner.go

Comment on lines +35 to +37
Pattern: regexp.MustCompile(
`<\|im_(start|end|sep)\|>|<\|endoftext\|>|###\s*(system|assistant|user)\s*:`,
),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Make delimiter_smuggle case-insensitive to avoid simple bypasses.

Line 36 currently misses (?i), so variants like ### SYSTEM: / ### Assistant: won’t match.

Suggested fix
 		{
 			ID: "delimiter_smuggle",
 			Pattern: regexp.MustCompile(
-				`<\|im_(start|end|sep)\|>|<\|endoftext\|>|###\s*(system|assistant|user)\s*:`,
+				`(?i)<\|im_(start|end|sep)\|>|<\|endoftext\|>|###\s*(system|assistant|user)\s*:`,
 			),
 		},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/security/contentsafety/injection.go` around lines 35 - 37, The
delimiter_smuggle regex currently uses regexp.MustCompile(...) and is
case-sensitive, so tokens like "### SYSTEM:" or "### Assistant:" bypass it;
update the pattern in delimiter_smuggle by adding the case-insensitive flag at
the start (e.g. prepend "(?i)" to the existing regex passed to
regexp.MustCompile) so the whole alternation
(`<\|im_(start|end|sep)\|>|<\|endoftext\|>|###\s*(system|assistant|user)\s*:`)
is matched case-insensitively. Ensure you modify the pattern string where
regexp.MustCompile is called (the variable/reference named delimiter_smuggle)
and run tests to confirm behavior.

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Apr 13, 2026

Greptile Summary

This PR adds an opt-in content-safety scanning layer (LARKSUITE_CLI_CONTENT_SAFETY_MODE + LARKSUITE_CLI_CONTENT_SAFETY_ALLOWLIST) that intercepts API responses before they reach AI agents and checks them for prompt-injection patterns using a built-in regex provider (4 rules) with a pluggable Provider interface. The feature is wired into shortcuts/common/runner.go (Out/OutFormat) and internal/client/response.go (EmitLarkResponse), defaults to off, and is fail-open on timeout/panic/error.

Confidence Score: 4/5

Safe to merge with reservations — three open P1 findings from prior threads (map mutation, goroutine data race on bytes.Buffer, dead EmitShortcut export) should be addressed before shipping.

The core scanning logic is well-structured, fail-open, and thoroughly tested. Two new P2 findings (isLarkShapedMap too broad, wrapBlockError nil fragility) are minor. However, three P1 issues from prior review threads remain unaddressed: routeLarkAlert mutates the caller's map in-place, the leaked goroutine can race on non-thread-safe errOut writers in tests, and EmitShortcut is documented as the canonical shortcut path but is never actually called by runner.go.

internal/output/emit.go (routeLarkAlert mutation, isLarkShapedMap), internal/output/emit_core.go (goroutine/errOut race, wrapBlockError fragility)

Important Files Changed

Filename Overview
internal/output/emit.go New: ScanForSafety, EmitShortcut, EmitLarkResponse, and routeLarkAlert; routeLarkAlert mutates caller's map in-place (flagged in prior thread); isLarkShapedMap heuristic is too broad
internal/output/emit_core.go New: runContentSafety with goroutine+select timeout enforcement; leaked-goroutine/data-race risk on bytes.Buffer errOut (flagged in prior thread); wrapBlockError fragile if nil alert is ever passed
internal/security/contentsafety/injection.go New: 4 compiled injection rules (instruction_override, role_injection, system_prompt_leak, delimiter_smuggle); patterns look correct and well-tested
internal/security/contentsafety/scanner.go New: depth-first walker with maxDepth=64 and maxStringBytes=128KiB guards; byte-index truncation on UTF-8 boundary noted in prior thread
shortcuts/common/runner.go Modified: ScanForSafety wired into Out and OutFormat; pretty+nil prettyFn path correctly delegates scanning to Out; three separate scan sites are consistent but scattered
internal/client/response.go Modified: adds CommandPath to ResponseOptions and routes JSON output through EmitLarkResponse; correct
extension/contentsafety/registry.go New: mutex-protected global provider registry with last-write-wins semantics; clean and correct
extension/contentsafety/types.go New: Provider interface, ScanRequest, Alert, and RuleMatch types; well-documented with clear contracts
internal/output/envelope.go Modified: adds ContentSafetyAlert interface{} field serialized as _content_safety_alert; omitempty ensures zero impact when feature is off
internal/cmdutil/factory_default.go Modified: blank import registers the default regex provider via init(); consistent with existing localfileio pattern
internal/envvars/envvars.go Modified: two new env var constants added; PR description incorrectly refers to them as LARK_CONTENT_SAFETY_* instead of LARKSUITE_CLI_CONTENT_SAFETY_*

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[API Response] --> B{MODE env var}
    B -- off / unset --> Z[Pass through - no scanning]
    B -- warn / block --> C{Command in ALLOWLIST?}
    C -- no --> Z
    C -- yes --> D{Provider registered?}
    D -- nil --> Z
    D -- regex provider --> E[runContentSafety goroutine + 100ms deadline]
    E -- timeout / panic / error --> Z
    E -- clean nil alert --> Z
    E -- hit + warn mode --> F[alert non-nil errBlocked = nil]
    E -- hit + block mode --> G[alert non-nil errBlocked returned]
    G --> H[wrapBlockError - return ExitError]
    F --> I{Output path}
    I -- EmitShortcut json/jq --> J[Envelope._content_safety_alert]
    I -- EmitShortcut pretty+prettyFn --> K[WriteAlertWarning to stderr]
    I -- EmitLarkResponse Lark-shaped JSON --> L[Inject into response map]
    I -- EmitLarkResponse non-JSON / jq --> K
    I -- runner.Out --> J
    I -- runner.OutFormat pretty+fn / default --> K
Loading

Reviews (2): Last reviewed commit: "feat(contentsafety): add content-safety ..." | Re-trigger Greptile

Add a cross-cutting content-safety layer that scans API responses for
prompt injection patterns before they reach AI agents. The feature is
opt-in via two environment variables (MODE + ALLOWLIST) and defaults
to off with zero impact on existing behavior.

Architecture:
- extension/contentsafety: pluggable Provider interface and registry
- internal/security/contentsafety: built-in regex provider (4 rules)
- internal/output: ScanForSafety entry point + EmitLarkResponse
- runner.go Out/OutFormat: 5-line block check at top, minimal invasiveness
- response.go HandleResponse: routes through EmitLarkResponse

Key design decisions:
- Single-provider registry (last-write-wins), aligned with extension/transport
- 100ms deadline enforced via goroutine+select (works even if provider ignores ctx)
- Panic/error/timeout all fail-open with stderr warning
- Built-in provider self-registers via init(), activated by blank import in factory_default.go

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@deane0310 deane0310 force-pushed the feat/content-safety branch from b50cb5b to 17655ae Compare April 13, 2026 04:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L Large or sensitive change across domains or core paths

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant