feat(contentsafety): add content-safety scanning at CLI output boundary#435
feat(contentsafety): add content-safety scanning at CLI output boundary#435
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (23)
✅ Files skipped from review due to trivial changes (9)
🚧 Files skipped from review as they are similar to previous changes (10)
📝 WalkthroughWalkthroughAdds a content-safety scanning subsystem for CLI responses: provider interface and registry, a regex-based provider, allowlist/mode controls, scan-before-emit integration into output paths, command-path propagation to response handling, and extensive tests. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant APIHandler as API/Service Handler
participant RespHandler as Response Handler
participant OutputEmitter as Output Emitter
participant SafetyGate as Content-Safety Gate
participant Provider as Content-Safety Provider
participant StdOut as StdOut/Err
Client->>APIHandler: request
APIHandler->>RespHandler: HandleResponse(data, opts with CommandPath)
RespHandler->>OutputEmitter: EmitLarkResponse/EmitShortcut(data, format...)
OutputEmitter->>SafetyGate: ScanForSafety(cmdPath, data)
SafetyGate->>SafetyGate: check mode & allowlist
alt scan needed
SafetyGate->>Provider: Scan(ctx, ScanRequest)
Provider-->>SafetyGate: Alert or nil
alt Alert & Mode=Block
SafetyGate-->>OutputEmitter: return block error (no output)
else Alert & Mode=Warn
SafetyGate-->>OutputEmitter: ScanResult{Alert}
OutputEmitter->>StdOut: emit output with _content_safety_alert / warn to Err
else no alert
SafetyGate-->>OutputEmitter: proceed
OutputEmitter->>StdOut: emit normal output
end
else not scanning
SafetyGate-->>OutputEmitter: proceed
OutputEmitter->>StdOut: emit normal output
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
🚀 PR Preview Install Guide🧰 CLI updatenpm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@17655ae6fbe5e28bb2faeea91f144b0cf6dcb8b8🧩 Skill updatenpx skills add larksuite/cli#feat/content-safety -y -g |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/security/contentsafety/normalize_test.go (1)
79-87: Strengthen the unmarshalable fallback assertion.The current check only verifies non-nil. Assert type preservation so regressions are caught.
Suggested test tightening
func TestNormalize_Unmarshalable(t *testing.T) { // A struct with a func field cannot be marshaled; normalize must return original. type Bad struct{ F func() } v := Bad{F: func() {}} got := normalize(v) - // Just verify no panic and that we got something back (the original value). - if got == nil { - t.Error("expected non-nil return for unmarshalable value") - } + if _, ok := got.(Bad); !ok { + t.Fatalf("expected original Bad value, got %T", got) + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/security/contentsafety/normalize_test.go` around lines 79 - 87, The test TestNormalize_Unmarshalable must not only ensure normalize(v) doesn't return nil but also that it returns a value of the original type; update the assertion on the result of calling normalize(v) to verify the returned value's dynamic type matches the input type (i.e., preserve type Bad) using a type assertion or reflect.TypeOf on the result from normalize, so regressions that change the fallback behavior are detected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/security/contentsafety/injection.go`:
- Around line 35-37: The delimiter_smuggle regex currently uses
regexp.MustCompile(...) and is case-sensitive, so tokens like "### SYSTEM:" or
"### Assistant:" bypass it; update the pattern in delimiter_smuggle by adding
the case-insensitive flag at the start (e.g. prepend "(?i)" to the existing
regex passed to regexp.MustCompile) so the whole alternation
(`<\|im_(start|end|sep)\|>|<\|endoftext\|>|###\s*(system|assistant|user)\s*:`)
is matched case-insensitively. Ensure you modify the pattern string where
regexp.MustCompile is called (the variable/reference named delimiter_smuggle)
and run tests to confirm behavior.
---
Nitpick comments:
In `@internal/security/contentsafety/normalize_test.go`:
- Around line 79-87: The test TestNormalize_Unmarshalable must not only ensure
normalize(v) doesn't return nil but also that it returns a value of the original
type; update the assertion on the result of calling normalize(v) to verify the
returned value's dynamic type matches the input type (i.e., preserve type Bad)
using a type assertion or reflect.TypeOf on the result from normalize, so
regressions that change the fallback behavior are detected.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 695887df-217d-49bf-b922-dbc8e6d6086d
📒 Files selected for processing (23)
cmd/api/api.gocmd/service/service.goextension/contentsafety/registry.goextension/contentsafety/types.goextension/contentsafety/types_test.gointernal/client/response.gointernal/cmdutil/factory_default.gointernal/envvars/envvars.gointernal/output/emit.gointernal/output/emit_core.gointernal/output/emit_core_test.gointernal/output/emit_integration_test.gointernal/output/emit_test.gointernal/output/envelope.gointernal/security/contentsafety/injection.gointernal/security/contentsafety/injection_test.gointernal/security/contentsafety/normalize.gointernal/security/contentsafety/normalize_test.gointernal/security/contentsafety/provider.gointernal/security/contentsafety/provider_test.gointernal/security/contentsafety/scanner.gointernal/security/contentsafety/scanner_test.goshortcuts/common/runner.go
| Pattern: regexp.MustCompile( | ||
| `<\|im_(start|end|sep)\|>|<\|endoftext\|>|###\s*(system|assistant|user)\s*:`, | ||
| ), |
There was a problem hiding this comment.
Make delimiter_smuggle case-insensitive to avoid simple bypasses.
Line 36 currently misses (?i), so variants like ### SYSTEM: / ### Assistant: won’t match.
Suggested fix
{
ID: "delimiter_smuggle",
Pattern: regexp.MustCompile(
- `<\|im_(start|end|sep)\|>|<\|endoftext\|>|###\s*(system|assistant|user)\s*:`,
+ `(?i)<\|im_(start|end|sep)\|>|<\|endoftext\|>|###\s*(system|assistant|user)\s*:`,
),
},🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/security/contentsafety/injection.go` around lines 35 - 37, The
delimiter_smuggle regex currently uses regexp.MustCompile(...) and is
case-sensitive, so tokens like "### SYSTEM:" or "### Assistant:" bypass it;
update the pattern in delimiter_smuggle by adding the case-insensitive flag at
the start (e.g. prepend "(?i)" to the existing regex passed to
regexp.MustCompile) so the whole alternation
(`<\|im_(start|end|sep)\|>|<\|endoftext\|>|###\s*(system|assistant|user)\s*:`)
is matched case-insensitively. Ensure you modify the pattern string where
regexp.MustCompile is called (the variable/reference named delimiter_smuggle)
and run tests to confirm behavior.
Greptile SummaryThis PR adds an opt-in content-safety scanning layer ( Confidence Score: 4/5Safe to merge with reservations — three open P1 findings from prior threads (map mutation, goroutine data race on bytes.Buffer, dead EmitShortcut export) should be addressed before shipping. The core scanning logic is well-structured, fail-open, and thoroughly tested. Two new P2 findings (isLarkShapedMap too broad, wrapBlockError nil fragility) are minor. However, three P1 issues from prior review threads remain unaddressed: routeLarkAlert mutates the caller's map in-place, the leaked goroutine can race on non-thread-safe errOut writers in tests, and EmitShortcut is documented as the canonical shortcut path but is never actually called by runner.go. internal/output/emit.go (routeLarkAlert mutation, isLarkShapedMap), internal/output/emit_core.go (goroutine/errOut race, wrapBlockError fragility) Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[API Response] --> B{MODE env var}
B -- off / unset --> Z[Pass through - no scanning]
B -- warn / block --> C{Command in ALLOWLIST?}
C -- no --> Z
C -- yes --> D{Provider registered?}
D -- nil --> Z
D -- regex provider --> E[runContentSafety goroutine + 100ms deadline]
E -- timeout / panic / error --> Z
E -- clean nil alert --> Z
E -- hit + warn mode --> F[alert non-nil errBlocked = nil]
E -- hit + block mode --> G[alert non-nil errBlocked returned]
G --> H[wrapBlockError - return ExitError]
F --> I{Output path}
I -- EmitShortcut json/jq --> J[Envelope._content_safety_alert]
I -- EmitShortcut pretty+prettyFn --> K[WriteAlertWarning to stderr]
I -- EmitLarkResponse Lark-shaped JSON --> L[Inject into response map]
I -- EmitLarkResponse non-JSON / jq --> K
I -- runner.Out --> J
I -- runner.OutFormat pretty+fn / default --> K
Reviews (2): Last reviewed commit: "feat(contentsafety): add content-safety ..." | Re-trigger Greptile |
Add a cross-cutting content-safety layer that scans API responses for prompt injection patterns before they reach AI agents. The feature is opt-in via two environment variables (MODE + ALLOWLIST) and defaults to off with zero impact on existing behavior. Architecture: - extension/contentsafety: pluggable Provider interface and registry - internal/security/contentsafety: built-in regex provider (4 rules) - internal/output: ScanForSafety entry point + EmitLarkResponse - runner.go Out/OutFormat: 5-line block check at top, minimal invasiveness - response.go HandleResponse: routes through EmitLarkResponse Key design decisions: - Single-provider registry (last-write-wins), aligned with extension/transport - 100ms deadline enforced via goroutine+select (works even if provider ignores ctx) - Panic/error/timeout all fail-open with stderr warning - Built-in provider self-registers via init(), activated by blank import in factory_default.go Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
b50cb5b to
17655ae
Compare
Summary
LARK_CONTENT_SAFETY_MODE+LARK_CONTENT_SAFETY_ALLOWLIST) and defaults to off with zero impact on existing behaviorArchitecture
extension/contentsafety: pluggable Provider interface and registryinternal/security/contentsafety: built-in regex provider (4 rules)internal/output:ScanForSafetyentry point +EmitLarkResponserunner.goOut/OutFormat: 5-line block check at top, minimal invasivenessresponse.goHandleResponse: routes throughEmitLarkResponseKey design decisions
extension/transportinit(), activated by blank import infactory_default.goTest plan
EmitLarkResponsewith safety scanning🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests