Skip to content

[SECURITY] No independent LLM verification before command execution #800

@h-network

Description

@h-network

Problem Statement

NemoClaw relies on OpenShell policies (filesystem + network rules) to constrain agent behavior. There is no AI-level verification of agent actions before execution — the agent decides and acts within the policy boundaries.

This means:

  • Any action within policy boundaries is automatically allowed
  • No second opinion on whether an action is appropriate
  • No distinction between "technically allowed" and "operationally safe"

Impact

An agent operating within policy boundaries can still perform harmful actions:

  • Delete all user data in writable directories (allowed by filesystem policy)
  • Send sensitive data to allowed endpoints (allowed by network policy)
  • Execute a sequence of individually-safe actions that are collectively dangerous

Proposed Design

Implement a stateless LLM safety gate that evaluates every command before execution:

  1. Separate model: Use a small, dedicated safety model (e.g., 8B parameter) distinct from the main agent
  2. Zero conversation context: The safety model sees ONLY the proposed action, not the conversation that led to it. This prevents social engineering through context buildup
  3. Binary decision: ALLOW or DENY with reasoning
  4. Runs after pattern denylist: Deterministic checks first (zero-latency), LLM gate second (for novel patterns)
safety:
  pattern_denylist: enabled    # Layer 1: deterministic, zero-latency
  llm_gate:                    # Layer 2: catches what patterns miss
    model: nvidia/llama-3.1-nemotron-safety-guard-8b-v3
    context: none              # stateless — no conversation history
    action: deny_and_kill      # on denial: abort the agent

The key insight: a single LLM cannot reliably judge its own actions (self-enforcement fails under adversarial conditions). A separate, stateless model with no shared context provides independent verification.

References

Alternatives Considered

No response

Category

enhancement: feature

Checklist

  • I searched existing issues and this is not a duplicate
  • This is a design proposal, not a "please build this" request

Metadata

Metadata

Assignees

Labels

enhancement: featureUse this label to identify requests for new capabilities in NemoClaw.priority: highImportant issue that should be resolved in the next releasesecuritySomething isn't securestatus: wont-fixThis will not be worked on

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions