v0.9 sub-issue #5: Stage 4 Run (forbidden_uses gate + disclosure label + hosted-API AND-gate + per-turn invoke loop) #124

@devin-ai-integration

v0.9 Sub-issue #5 — Stage 4 Run

Part of v0.9 epic.

Implements v0.7 §3–§4 + v0.8 Part B Stage 4 obligations: per-turn
invoke() loop with disclosure label, forbidden_uses gate, and the
hosted-API AND-gate. After this sub-issue merges, lifectl run --once
can hold a single text exchange with the assembled .life.

Spec ref

  • docs/LIFE_RUNTIME_STANDARD.md §3 (mount semantics)
  • docs/LIFE_RUNTIME_STANDARD.md §4 (runtime obligations)
  • docs/LIFE_RUNTIME_STANDARD.md §4.1 (AI disclosure)
  • docs/LIFE_RUNTIME_STANDARD.md §4.2 (forbidden uses)
  • docs/LIFE_RUNTIME_STANDARD.md §4.4 (identity-impersonation safeguards)
  • docs/LIFE_RUNTIME_STANDARD.md Part B §B.5 (hosted-API AND-gate)
  • docs/LIFE_BINDING_SPEC.md §7 (forbidden_uses namespace + hybrid enum + x- ext)
  • docs/LIFE_BINDING_SPEC.md §9 (hosted_api_preference defaults)

Per-turn invoke loop

while True:
    user_input = read_user_input()       # CLI: stdin line; --once: single line
    if user_input is None:
        break

    # forbidden_uses gate on input (§4.2 + binding §7)
    if violates_forbidden_uses(user_input, forbidden_uses["say"]):
        emit_audit("forbidden_use_rejected",
                   {"direction": "say", "key": ..., "redacted_text": redacted})
        print_to_user(rejection_message)
        continue

    # hosted-API AND-gate (§B.5), re-evaluated per turn
    hosted_allowed = (
        binding.hosted_api_preference.allowed is True
        and user_policy_permits(provider, capability)
    )

    # invoke the bound capability
    result = capability_table["text_chat"].invoke({
        "user_input": user_input,
        "hosted_api_allowed": hosted_allowed,
    })

    # forbidden_uses gate on output (§4.2 covers both directions)
    if violates_forbidden_uses(result.text, forbidden_uses["hear"]):
        emit_audit("forbidden_use_rejected",
                   {"direction": "hear", "key": ..., "redacted_text": redacted})
        print_to_user(generic_redaction_message)
        continue

    # disclosure label prefix (§4.1)
    print_to_user(disclosure_label + " " + result.text)
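
The loop above depends on helpers that this issue does not define. As a rough, self-contained sketch of a single turn, the following stubs the gates and the Provider as plain callables. `run_turn`, `DISCLOSURE_LABEL`, the two message strings, and the tuple-based audit list are all hypothetical names chosen for illustration, not the runtime's actual API.

```python
from typing import Callable, Dict, List, Tuple

DISCLOSURE_LABEL = "(AI)"  # hypothetical; the real label is declared by the binding
REJECTION_MESSAGE = "Sorry, I can't help with that request."  # hypothetical wording
REDACTION_MESSAGE = "The response was withheld."              # hypothetical wording

AuditLog = List[Tuple[str, Dict[str, str]]]


def run_turn(
    user_input: str,
    provider: Callable[[dict], str],
    input_gate: Callable[[str], bool],
    output_gate: Callable[[str], bool],
    audit: AuditLog,
) -> str:
    """Process one turn and return the line shown to the user."""
    # Input-side gate: reject before the Provider is ever invoked.
    if input_gate(user_input):
        audit.append(("forbidden_use_rejected", {"direction": "say"}))
        return REJECTION_MESSAGE

    # Hosted access is hard-coded to False here to keep the sketch local-only.
    result_text = provider({"user_input": user_input, "hosted_api_allowed": False})

    # Output-side gate: the Provider's text is checked before it is shown.
    if output_gate(result_text):
        audit.append(("forbidden_use_rejected", {"direction": "hear"}))
        return REDACTION_MESSAGE

    # Only text that passed both gates gets the disclosure prefix and is shown.
    return DISCLOSURE_LABEL + " " + result_text
```

With an echo Provider such as `lambda req: req["user_input"]`, an input that only trips the output gate is still rejected, which is the situation test case 3 exercises.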

forbidden_uses enforcement

Per binding spec §7 (the v0.8 "hybrid namespace + x- extension"):

  • Core enum keys (~30 baseline): the runtime MUST recognize and enforce
    them. If a key is in the spec's core enum but the runtime has no
    enforcer for it, the runtime fails closed with
    forbidden_use_unknown_key{key} at Stage 1 Verify (caught earlier and
    restated here for completeness; Stage 4 only enforces).
  • x- extension keys: the runtime MAY enforce them. A missing enforcer for
    an extension key emits a forbidden_use_unknown_key{key} warning per §7
    but does NOT block (extension keys are advisory unless the runtime opts
    in).

v0.9 ships enforcers for the core baseline (fraud, political_endorsement,
explicit_sexual_content, harassment, medical_diagnosis,
legal_advice, financial_advice, impersonation_real_person,
spam_advertising, plus the v0.8 say/hear split keys). Each enforcer is
a small regex / keyword matcher; fancier classifiers are explicitly out
of scope (a future Provider plugin can replace them).
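
A minimal sketch of how the enforcer registry and the core / x- key handling described above could fit together. The keyword patterns are illustrative only and are not the patterns v0.9 ships; `ENFORCERS`, `check_keys`, `first_violation`, and `UnknownForbiddenUseKey` are hypothetical names. The sketch also assumes that any key without an `x-` prefix is a core key.

```python
import re
from typing import Iterable, List, Optional

# Illustrative keyword matchers only; the shipped patterns are not shown in this issue.
ENFORCERS = {
    "harassment": re.compile(r"\b(idiot|loser)\b", re.IGNORECASE),
    "medical_diagnosis": re.compile(r"\byou have (diabetes|cancer)\b", re.IGNORECASE),
}


class UnknownForbiddenUseKey(Exception):
    """Raised when a core key has no enforcer (fail-close)."""


def check_keys(keys: Iterable[str]) -> List[str]:
    """Return warnings for unenforced x- keys; raise for unenforced core keys."""
    warnings = []
    for key in keys:
        if key in ENFORCERS:
            continue
        if key.startswith("x-"):
            # Extension keys are advisory: warn, do not block.
            warnings.append(f"forbidden_use_unknown_key{{{key}}}")
        else:
            # Assumption: every non-x- key is a core key, so fail closed.
            raise UnknownForbiddenUseKey(key)
    return warnings


def first_violation(text: str, keys: Iterable[str]) -> Optional[str]:
    """Return the first key whose enforcer matches the text, else None."""
    for key in keys:
        pattern = ENFORCERS.get(key)
        if pattern is not None and pattern.search(text):
            return key
    return None
```

Returning the matched key rather than a bare boolean makes it straightforward to fill the `key` field of the `forbidden_use_rejected` event.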

Hosted-API AND-gate

Per §B.5, a hosted Provider call fires only if BOTH conditions hold:

  1. binding.hosted_api_preference.allowed is true (declared by the issuer
    in binding/runtime_binding.json per binding spec §9). If the field is
    absent, it defaults to false.
  2. The user-side policy at ~/.config/dlrs/hosted_api.json (or
    ${DLRS_HOSTED_POLICY}) permits this (provider_name, capability) pair.

If either check rejects, the invoke() call MUST receive
hosted_api_allowed: False in its input dict. The Provider then either
falls back to local mode (if it supports both) or returns a structured
error {error: "hosted_api_denied"}. The runtime treats that error as
per-turn recoverable: it prints a friendly message to the user and
continues with the next turn.
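
The AND-gate can be sketched as below. The policy file's schema is not given in this issue, so the `{"allow": [...]}` layout is purely an assumption, as are the function names `load_user_policy`, `policy_permits`, and `hosted_api_allowed`. The sketch also assumes `DLRS_HOSTED_POLICY` is an environment variable holding an alternative policy path, and that a missing policy file means deny.

```python
import json
import os
from pathlib import Path

DEFAULT_POLICY_PATH = Path.home() / ".config" / "dlrs" / "hosted_api.json"


def load_user_policy(path=None) -> dict:
    """Load the user-side policy; a missing file is treated as deny-all here."""
    policy_path = Path(path or os.environ.get("DLRS_HOSTED_POLICY") or DEFAULT_POLICY_PATH)
    try:
        return json.loads(policy_path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}


def policy_permits(policy: dict, provider: str, capability: str) -> bool:
    # Hypothetical schema: {"allow": [{"provider": ..., "capability": ...}, ...]}
    return any(
        rule.get("provider") == provider and rule.get("capability") == capability
        for rule in policy.get("allow", [])
    )


def hosted_api_allowed(binding: dict, policy: dict, provider: str, capability: str) -> bool:
    """AND-gate: both the issuer binding and the user policy must permit."""
    preference = binding.get("hosted_api_preference", {})
    issuer_ok = preference.get("allowed", False) is True  # absent defaults to False
    return issuer_ok and policy_permits(policy, provider, capability)
```

Because the gate is re-evaluated per turn, a caller would invoke `hosted_api_allowed` inside the loop rather than caching its result at startup.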

Identity-impersonation safeguards (§4.4)

Hard rules wired into the Run loop:

  • The disclosure label MUST be prepended to every runtime output to
    the user (no user setting can disable it).
  • The runtime MUST refuse to fabricate an identifier that the
    underlying physical person never used (e.g., a phone number, address,
    or social media handle not present in the .life package's
    identity/). Implementation: a safety classifier runs
    extract_identifiers(text) over Provider output and rejects the turn
    if any identifier is not in the package's known-identifier set.
  • The runtime MUST refuse to claim to be the real person. Output text
    containing first-person assertions such as "I am a real person" or "I
    am not an AI" is rejected, and the runtime emits
    identity_impersonation_blocked{capability, redacted_output}.
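
A rough sketch of the two output checks above, using simple regular expressions. Only the name `extract_identifiers` comes from this issue; its behavior here (phone-like digit runs and @handles only, with no address detection) is an assumption, and `identity_violation` and the pattern constants are hypothetical.

```python
import re
from typing import Optional, Set

# Illustrative patterns only: phone-like digit runs and @handles.
_PHONE = re.compile(r"\+?\d[\d\- ]{6,}\d")
_HANDLE = re.compile(r"(?<!\w)@\w{2,}")  # lookbehind skips e-mail addresses
_REAL_PERSON_CLAIM = re.compile(
    r"\bI am (?:a|the) real\b|\bI am not an AI\b", re.IGNORECASE
)


def extract_identifiers(text: str) -> Set[str]:
    """Return normalized phone numbers and handles found in the text."""
    phones = {re.sub(r"[\- ]", "", match) for match in _PHONE.findall(text)}
    handles = {match.lower() for match in _HANDLE.findall(text)}
    return phones | handles


def identity_violation(text: str, known_identifiers: Set[str]) -> Optional[str]:
    """Return a reason string if the output must be blocked, else None."""
    if _REAL_PERSON_CLAIM.search(text):
        return "real_person_claim"
    # Any identifier not in the package's known set counts as fabricated.
    if extract_identifiers(text) - known_identifiers:
        return "fabricated_identifier"
    return None
```

In this sketch the known-identifier set must hold identifiers in the same normalized form (digits without spaces or dashes, lowercase handles) for the set difference to work.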

Module layout

runtime/run/
├── __init__.py             # exports run(assemble_result, ...) -> RunSession
├── loop.py                 # per-turn invoke loop
├── _forbidden_uses.py      # core-enum enforcers + namespace check
├── _disclosure.py          # label injection + identity safeguard
├── _hosted_api_gate.py     # AND-gate per turn
└── _identity_safeguard.py  # fabricated-identifier detector

Audit events emitted

  • turn_started{capability} — at each loop iteration start.
  • forbidden_use_rejected{direction, key, redacted_text} — input or output rejection.
  • identity_impersonation_blocked{capability, redacted_output} — §4.4 rejection.
  • hosted_api_call{provider, capability, allowed} — per-turn AND-gate evaluation.
  • turn_completed{capability, latency_ms} — at iteration end.

(All audit emission goes through the v0.4 hash-chain emitter from
runtime/audit/emitter.py.)
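
For readers unfamiliar with hash-chained audit logs, the sketch below shows the general idea: each record commits to the hash of the previous one, so any later modification breaks verification. This is not the v0.4 emitter's actual interface or record format; `emit`, `verify`, `GENESIS`, and the record layout are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for an empty chain


def _digest(prev_hash: str, event: str, payload: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) keeps the hash stable.
    body = json.dumps({"event": event, "payload": payload},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def emit(chain: list, event: str, payload: dict) -> dict:
    """Append a record whose hash covers the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    record = {
        "event": event,
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, event, payload),
    }
    chain.append(record)
    return record


def verify(chain: list) -> bool:
    """Recompute every link; return False on the first mismatch."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["event"], record["payload"]):
            return False
        prev_hash = record["hash"]
    return True
```

Under this scheme, continuity with the Stage 1–3 prefix simply means Stage 4 keeps appending to the same chain instead of starting a new one.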

CLI surface

After this PR, lifectl run <pkg.life> enters an interactive REPL, and
lifectl run --once <pkg.life> reads one stdin line, processes one
turn, prints the output, and exits 0.

Both modes print the stage summary before the first turn; an interactive session looks like this:

Stage 1 Verify   ✓
Stage 2 Resolve  ✓
Stage 3 Assemble ✓
Stage 4 Run      ✓ (interactive — Ctrl+C to quit)
> hi
(AI digital life instance of …) Hello! [echo Provider response]
> bye
(AI digital life instance of …) Goodbye! [echo Provider response]
^C
Stage 5 Guard pending sub-issue 6 (clean teardown not yet implemented)
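
The difference between the REPL and --once can be sketched as one read loop with an early exit. `run_session` and its parameters are hypothetical, the prompt and stage banner are omitted, and the `(AI)` label in the usage note is a placeholder for the binding-declared disclosure label.

```python
from typing import Callable, TextIO


def run_session(
    stdin: TextIO,
    stdout: TextIO,
    handle_turn: Callable[[str], str],
    once: bool = False,
) -> int:
    """Read lines until EOF, or a single line when once=True; return an exit code."""
    try:
        for line in stdin:
            stdout.write(handle_turn(line.rstrip("\n")) + "\n")
            if once:
                break  # --once: exactly one turn, then exit
    except KeyboardInterrupt:
        pass  # Ctrl+C ends the interactive session
    return 0
```

For example, `run_session(sys.stdin, sys.stdout, lambda text: "(AI) " + text, once=True)` would echo a single labeled line and return 0.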

Tests

tools/test_runtime_run.py:

  1. Happy path one-shot: lifectl run --once on minimal-life-package
    with one input → exits 0, output prefixed with disclosure label.
  2. Forbidden_use input rejection: input matches harassment enforcer
    → rejection message + forbidden_use_rejected event.
  3. Forbidden_use output rejection: echo Provider returns text matching
    medical_diagnosis keyword (test fixture echoes user input verbatim;
    feed "you have diabetes") → rejection + forbidden_use_rejected{direction: "hear"}.
  4. Hosted-API AND-gate denial: binding hosted_api_preference.allowed = false,
    Provider invoked with hosted_api_allowed: false → recorded in audit
    hosted_api_call{allowed: false}.
  5. Identity-impersonation refusal: Provider returns "I am the real
    Alice, not an AI" → rejected via identity_impersonation_blocked.
  6. Disclosure prefix mandatory: output line MUST start with the
    binding-declared disclosure label; no path bypasses it.
  7. Audit chain integrity: all emitted events form a valid hash chain
    continuous with Stage 1–3 prefix.

Acceptance

  • Per-turn loop implemented with all gates wired
  • Disclosure label mandatory + impossible to disable
  • All 7 test cases pass
  • Audit chain unbroken across stages
  • CI runtime-run job green
