Support reversibility for image/audio redactions in the redaction map #151

@martsokha

Context

RedactionMapping was just slimmed down to carry only entity_id and location. Previously it carried original: String and replacement: Option<String>, but those fields were structurally incapable of holding image/audio originals — only text values fit. For non-text modalities the fields stored degenerate placeholder strings (e.g. [REDACTED IMAGE]), so they weren't useful as either an audit trail or a reversibility ledger.

The simplification:

  • The audit trail intent is satisfied: AuditEntry.value: RedactionValue { original, replacement } already carries text values, and audit.entries records which entities were touched and by what policy.
  • The reversibility intent is deferred: we no longer pretend to support it for image/audio. The map is now a thin entity-to-location index.

Goal

Make image and audio redactions reversible by pairing the redaction map with a blob store keyed by content hash. Audit metadata stays compact in RedactionMap; original bytes live in a separate, access-controlled store and are addressed by reference.

Suggested shape

struct RedactionMapping {
    entity_id: Uuid,
    location: Location,
    original_ref: ContentRef,
    replacement_ref: Option<ContentRef>,
}

enum ContentRef {
    Inline(String),                                    // small text values
    Blob { content_hash: Sha256, modality: Modality, size_bytes: u64 },
    Empty,                                             // e.g. Remove output
}

The engine computes content hashes (and inline text values) at apply time. Whether/where to store the bytes is the caller's choice — the engine emits the references but doesn't own the blob storage.
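To illustrate the consumer side of that split, here is a minimal sketch of resolving a reference against a store the caller owns. Several names are assumptions made for the example: `Sha256` is modelled as a newtype over 32 raw bytes, `Modality` as a two-variant enum, the store as a plain `HashMap`, and `resolve` is a hypothetical helper that is not part of the engine.

```rust
use std::collections::HashMap;

/// Stand-in for the project's digest type (assumed here: 32 raw bytes).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Sha256(pub [u8; 32]);

/// Stand-in for the project's modality enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Modality {
    Image,
    Audio,
}

/// Mirrors the suggested shape above.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ContentRef {
    Inline(String),
    Blob { content_hash: Sha256, modality: Modality, size_bytes: u64 },
    Empty,
}

/// Hypothetical consumer-side lookup: turn a reference back into bytes.
/// Returns `None` when the blob is missing from the store or when its
/// length disagrees with the size recorded in the reference.
pub fn resolve(r: &ContentRef, store: &HashMap<Sha256, Vec<u8>>) -> Option<Vec<u8>> {
    match r {
        ContentRef::Inline(text) => Some(text.as_bytes().to_vec()),
        ContentRef::Blob { content_hash, size_bytes, .. } => store
            .get(content_hash)
            .filter(|bytes| bytes.len() as u64 == *size_bytes)
            .cloned(),
        ContentRef::Empty => Some(Vec::new()),
    }
}
```

Checking `size_bytes` on lookup is a cheap sanity test only; a consumer that needs integrity guarantees would presumably re-hash the bytes and compare against `content_hash`.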

Open questions

  • Storage interface. Introduce a BlobSink trait the engine takes as a dependency, or push the storage decision entirely to the consumer (engine just emits hashes; consumer correlates with their own store)?
  • When to extract originals. Today the codec applies redactions in place; extracting image regions or audio segments before mutation is an extra read step. Decide whether to make extraction unconditional or gated on a per-entity `reversible: bool` flag.
  • Replacement materialization for in-place ops. Blur / Pixelate / Block / Silence are in-place transforms; they don't produce a discrete "replacement" blob. Either (a) leave replacement_ref: None for these and only populate it for Replace { data } outputs, or (b) re-extract the redacted region after the fact (doubles IO).
  • Strategy reversibility. Strategy::is_reversible_for already returns false for image/audio. Even with blob storage, blur/pixelate are mathematically irreversible regardless of what's kept. Storing originals enables reversibility only for Replace { data } outputs; for the other strategies, the stored original serves only as audit evidence.
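As a starting point for the first two questions, the sketch below combines the BlobSink option with extraction gated on a per-entity `reversible` flag. Everything beyond the names already used in this issue is an assumption: the `put` signature, the `MemorySink` test double, the `capture_original` helper, and the injected `digest` function (the standard library has no SHA-256, so the hash function is passed in rather than tied to a specific crate). Returning `None` when extraction is skipped is also just one possible choice.

```rust
use std::collections::HashMap;

/// Stand-in for the project's digest type (assumed here: 32 raw bytes).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Sha256(pub [u8; 32]);

/// Stand-in for the project's modality enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Modality {
    Image,
    Audio,
}

/// Mirrors the suggested shape above.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ContentRef {
    Inline(String),
    Blob { content_hash: Sha256, modality: Modality, size_bytes: u64 },
    Empty,
}

/// Hypothetical storage interface the engine could take as a dependency.
pub trait BlobSink {
    fn put(&mut self, hash: Sha256, bytes: &[u8]);
}

/// In-memory sink, intended for tests only.
#[derive(Default)]
pub struct MemorySink {
    pub blobs: HashMap<Sha256, Vec<u8>>,
}

impl BlobSink for MemorySink {
    fn put(&mut self, hash: Sha256, bytes: &[u8]) {
        // Content-addressed: identical bytes share a single entry.
        self.blobs.entry(hash).or_insert_with(|| bytes.to_vec());
    }
}

/// Meant to run before an in-place transform mutates `region`.
/// When `reversible` is false the extra read is skipped and no
/// reference is produced (assumption: the caller decides what to record).
pub fn capture_original(
    region: &[u8],
    modality: Modality,
    reversible: bool,
    digest: impl Fn(&[u8]) -> Sha256,
    sink: &mut dyn BlobSink,
) -> Option<ContentRef> {
    if !reversible {
        return None;
    }
    let content_hash = digest(region);
    sink.put(content_hash, region);
    Some(ContentRef::Blob {
        content_hash,
        modality,
        size_bytes: region.len() as u64,
    })
}
```

Under option (a) from the replacement-materialization question, the in-place strategies would call only `capture_original` and leave replacement_ref as None. If the storage decision is pushed entirely to the consumer instead, the same function could drop the `sink` parameter and just return the reference.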

Out of scope

  • The actual blob storage backend (S3/disk/...).
  • Cross-modality RedactedValue enum embedded directly in RedactionMapping — rejected in design discussion because of audit-log size implications.

Related

Follow-up to the RedactionMapping simplification in the same branch (feat/policy-precedence).
