Frequently asked questions about the Toolpath format, its design decisions, and how to use it.
Toolpath is a format for recording artifact transformation provenance. It tracks who changed what, why, what they tried that didn't work, and how to verify all of it. Think "git blame, but for everything that happens to code — including the stuff git doesn't see."
Toolpath is useful when you want to:
- Record the full provenance of a code change across multiple actors (humans, AI agents, formatters, linters)
- Preserve abandoned approaches alongside the successful path
- Attach structured intent, external references, and signatures to changes
- Track changes at finer granularity than VCS commits
Toolpath is not the right tool for:
- Real-time collaboration — Toolpath is for provenance, not live editing (use CRDTs or OT for that)
- Replacing your VCS — Toolpath complements git/jj/hg, it doesn't replace them
- Large binary artifacts — The diff-based change model assumes text-like content; binary blobs don't produce meaningful unified diffs
Yes, Toolpath can be used without a VCS. A path's base can use a toolpath: URI
to branch from another path's step, creating a pure Toolpath chain with no VCS
backing. You can also use file:/// URIs for local-only provenance.
W3C PROV is a general-purpose provenance data model (entities, activities, agents). Toolpath is narrower and more opinionated:
- PROV models arbitrary provenance relationships across any domain
- Toolpath models artifact transformations specifically, with built-in support for diffs, actor types, DAG structure, dead ends, and signatures
If you need general provenance, use PROV. If you need to track how code (or code-like artifacts) evolved through multiple actors, Toolpath gives you a tighter, more useful model out of the box.
in-toto and Sigstore focus on supply chain integrity — attesting that specific steps were performed by specific actors in a pipeline. Toolpath focuses on transformation provenance — recording what happened to artifacts and why.
They complement each other: you might use Toolpath to record the full history of a PR, then use Sigstore to attest that the release was built from that provenance chain.
The Document enum uses external tagging: every Toolpath JSON file has exactly
one top-level key — "Step", "Path", or "Graph" — that identifies the
document type.
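A reader can dispatch on that single key. The following is a minimal sketch, not part of any official Toolpath library; the function name `document_type` and the error messages are assumptions made for illustration.

```python
import json

# The three tags named in the text above.
DOCUMENT_TYPES = {"Step", "Path", "Graph"}


def document_type(text: str) -> str:
    """Return the single top-level key of an externally tagged document."""
    doc = json.loads(text)
    if not isinstance(doc, dict) or len(doc) != 1:
        raise ValueError("expected a JSON object with exactly one top-level key")
    (tag,) = doc  # unpack the only key
    if tag not in DOCUMENT_TYPES:
        raise ValueError(f"unknown document type: {tag!r}")
    return tag


print(document_type('{"Path": {"path": {}, "steps": []}}'))
```

The three document shapes look like this: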
{ "Step": { "step": {...}, "change": {...} } }
{ "Path": { "path": {...}, "steps": [...] } }
{ "Graph": { "graph": {...}, "paths": [...] } }

Unified Diff (the format produced by diff -u and used by git) is:
- Widely understood across tools and ecosystems
- Human-readable
- Well-specified with clear semantics
- Backward-compatible with existing tooling
Future versions may add alternative perspectives (e.g., span for byte-range
edits), but raw is always Unified Diff.
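As an illustration of how a raw perspective might be produced, the sketch below uses Python's standard `difflib` module. The helper name `raw_diff`, the `a/` and `b/` path prefixes, and the exact nesting of the change object (`{artifact: {"raw": ...}}`) are assumptions for this example, not something the format is known to mandate.

```python
import difflib


def raw_diff(old: str, new: str, path: str) -> str:
    """Build a Unified Diff string for one text artifact."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


old = 'fn main() {\n    println!("hello");\n}\n'
new = 'fn main() {\n    println!("hello, world");\n}\n'

# Assumed shape: artifact key -> perspectives, with "raw" holding the diff.
change = {"src/main.rs": {"raw": raw_diff(old, new, "src/main.rs")}}
print(change["src/main.rs"]["raw"])
```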
Dead ends are implicit — no explicit marking is required. A step is a dead end
if it's not an ancestor of path.head:
active_steps = ancestors(path.head) // walk parents backwards
dead_ends = all_steps - active_steps
Steps don't know their fate. It's determined by the graph structure relative to the current head. This keeps the format simple: you never need to update a step's metadata when the path evolves.
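The pseudocode above can be made concrete with a short traversal. This is a sketch under two assumptions: steps are represented here as a plain mapping from step ID to its parents list, and the head itself counts as active.

```python
def ancestors(steps: dict[str, list[str]], head: str) -> set[str]:
    """Return head plus every step reachable by following parents."""
    seen: set[str] = set()
    stack = [head]
    while stack:
        step_id = stack.pop()
        if step_id in seen:
            continue
        seen.add(step_id)
        stack.extend(steps.get(step_id, []))
    return seen


def dead_ends(steps: dict[str, list[str]], head: str) -> set[str]:
    """Every step that is not on some route back from head."""
    return set(steps) - ancestors(steps, head)


steps = {
    "step-001": [],
    "step-002": ["step-001"],
    "step-003": ["step-001"],  # an abandoned attempt
    "step-004": ["step-002"],
}
print(sorted(dead_ends(steps, "step-004")))  # ['step-003']
```

Moving the head to `step-003` would make `step-002` and `step-004` the dead ends instead, with no change to any step.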
Steps have a parents array that supports merges:
| Parents | Meaning |
|---|---|
| `[]` or omitted | Root step (no parents) |
| `["step-001"]` | Single parent (linear history) |
| `["step-A", "step-B"]` | Merge (derived from parallel work) |
Toolpath models the DAG structure; it doesn't prescribe merge strategies or conflict resolution.
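The three cases in the table can be told apart by the length of the parents array alone. A small sketch, assuming a step is available as a dictionary with an optional `parents` key (the function name and the returned labels are illustrative):

```python
def step_kind(step: dict) -> str:
    """Classify a step as root, linear, or merge from its parents array."""
    parents = step.get("parents") or []  # an omitted key is treated like []
    if not parents:
        return "root"
    if len(parents) == 1:
        return "linear"
    return "merge"


print(step_kind({"parents": ["step-A", "step-B"]}))  # merge
```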
Artifacts are identified by URL. The keys in the change object are URLs,
with bare paths as shorthand for files relative to path.base:
| Key Format | Interpretation |
|---|---|
| `src/foo.rs` | Relative file path within path.base context |
| `file:///abs/path` | Absolute file path |
| `https://...` | Web resource |
| `s3://...` | S3 object |
| `<scheme>://...` | Any URL scheme |
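One way to apply the table is to look at the URL scheme of each key and treat a key with no scheme as a bare path. The sketch below uses the standard `urllib.parse.urlsplit`; the function name and the returned descriptions are assumptions for illustration. It does not resolve bare paths against path.base, and a Windows-style key such as `C:/x` would be read as having the scheme `c`.

```python
from urllib.parse import urlsplit


def interpret_key(key: str) -> str:
    """Describe how an artifact key would be interpreted."""
    scheme = urlsplit(key).scheme
    if not scheme:
        return "relative path within path.base"
    if scheme == "file":
        return "absolute file path"
    if scheme in ("http", "https"):
        return "web resource"
    if scheme == "s3":
        return "S3 object"
    return f"{scheme} resource"  # any other URL scheme


for key in ("src/foo.rs", "file:///abs/path", "s3://bucket/object"):
    print(key, "->", interpret_key(key))
```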
A minimal document needs only step + change. The meta object holds
context: intent, refs, actors, signatures. Making it optional means:
- Simple changes require minimal ceremony
- Streaming steps can be lightweight
- You can add provenance incrementally
Toolpath is VCS-agnostic. Any VCS commit can become a step. The path.base
URI scheme indicates which VCS (github:, hg:, fossil:, file:, etc.).
What Toolpath adds beyond any VCS:
- Finer granularity — multiple steps between VCS commits
- Abandoned paths — VCS tools lose deleted branches; Toolpath preserves dead ends
- Multi-perspective changes — raw diff + structural AST ops + semantic intent
- Actor provenance — link actors to external identities
- Multi-party signatures — author, reviewer, CI attestation on same artifact
- Tool-agnostic history — changes from AI agents, formatters, linters that may not create VCS commits
These questions are not yet resolved. They need more thought before the format stabilizes.
Options under consideration:
- Content-addressed: hash of (parent, actor, change, timestamp). Good for deduplication and verification, but you can't know the ID until you've finalized the content.
- UUID: random, simple, but opaque.
- Hierarchical: session-abc/turn-5/step-2. Readable but couples to session structure.
- Sequential: step-001, step-002 within some scope.
The current examples use sequential IDs for readability. No formal requirement yet.
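To make the content-addressed option concrete, here is one possible sketch. Nothing in it is decided by the format: the choice of SHA-256, the canonical JSON serialization, the `sha256:` prefix, and the field names are all assumptions for illustration.

```python
import hashlib
import json


def content_id(parents: list, actor: str, change: dict, timestamp: str) -> str:
    """Derive a step ID from the step's content (hypothetical scheme)."""
    payload = json.dumps(
        {"parents": parents, "actor": actor, "change": change, "timestamp": timestamp},
        sort_keys=True,          # stable key order so equal content hashes equally
        separators=(",", ":"),   # no insignificant whitespace
    )
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


step_id = content_id(
    ["step-001"], "human:alice", {"src/foo.rs": {"raw": "..."}}, "2026-01-01T00:00:00Z"
)
print(step_id)
```

The sketch also shows the drawback noted above: the ID cannot be computed until every input, including the change itself, is final.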
Options under consideration:
- Central registry: Toolpath org maintains canonical op types
- Namespaced extensions: rust.add_method, typescript.add_interface
- Schema-per-language: each language community maintains their own
- Emergent: let tools emit whatever, see what converges
Current leaning: Namespaced with a core namespace for universal ops
(e.g., core.replace, core.insert).
Options under consideration:
- Flat list: tree structure implicit via parent refs (current approach)
- Nested tree: explicit hierarchy
- Separate index: steps stored flat, tree index computed/stored separately
The flat list is simpler for append-only logs and streaming. Querying "all descendants of step X" requires scanning, but the step counts in practice are small enough that this hasn't been a problem.
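The descendant query over a flat list can be sketched as one scan that builds a child index, followed by a traversal. The `id` and `parents` field names are assumptions about how a step is represented in memory.

```python
from collections import defaultdict


def descendants(steps: list[dict], root_id: str) -> set[str]:
    """Return the IDs of all steps derived, directly or indirectly, from root_id."""
    children = defaultdict(list)
    for step in steps:  # single scan over the flat list
        for parent in step.get("parents", []):
            children[parent].append(step["id"])

    found: set[str] = set()
    stack = [root_id]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


flat = [
    {"id": "step-001"},
    {"id": "step-002", "parents": ["step-001"]},
    {"id": "step-003", "parents": ["step-001"]},
    {"id": "step-004", "parents": ["step-002"]},
]
print(sorted(descendants(flat, "step-002")))  # ['step-004']
```

The child index built in the first loop is essentially the "separate index" option, computed on demand rather than stored.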
Scenarios: Agent reasoning might contain proprietary information. Human identity might need anonymization. Refs might point to internal docs.
Options under consideration:
- Reference, don't embed: meta.refs stores URIs, not content
- Redaction markers: {"redacted": true, "reason": "..."}
- Access tiers: different views for different audiences
- Encryption: sensitive fields encrypted, keys managed separately
Current leaning: Reference by default, with optional redaction markers.
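A redaction marker could be applied with a helper like the one below. This is only a sketch of the idea: the function name, the choice to redact whole top-level fields of meta, and the `reasoning` field used in the example are hypothetical.

```python
import copy


def redact(meta: dict, fields: list[str], reason: str) -> dict:
    """Return a copy of meta with the named top-level fields replaced by markers."""
    out = copy.deepcopy(meta)  # leave the caller's object untouched
    for name in fields:
        if name in out:
            out[name] = {"redacted": True, "reason": reason}
    return out


meta = {"intent": "Fix null check", "reasoning": "internal design notes"}
print(redact(meta, ["reasoning"], "proprietary"))
```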
Options under consideration:
- Semver: 1.2.3 with compatibility rules
- Date-based: 2026-01
- Extension-based: core is frozen, everything else is extensions
Current leaning: Semver for core schema, with "old readers ignore unknown fields" policy.
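The "old readers ignore unknown fields" policy might look like the following on the reader side. This is a sketch only: where the version string lives in a document is not settled, so it is passed in separately here, and `SUPPORTED_MAJOR` and the helper names are hypothetical.

```python
# Fields of a Step document that this hypothetical reader understands.
KNOWN_STEP_FIELDS = {"step", "change", "meta"}
SUPPORTED_MAJOR = 1  # assumed: a major version bump signals a breaking change


def can_read(version: str) -> bool:
    """Accept any minor or patch level within the supported major version."""
    return int(version.split(".")[0]) == SUPPORTED_MAJOR


def read_known(body: dict) -> dict:
    """Keep the fields this reader knows and silently drop the rest."""
    return {k: v for k, v in body.items() if k in KNOWN_STEP_FIELDS}


newer = {"step": {}, "change": {}, "future_field": 42}
if can_read("1.4.0"):
    print(read_known(newer))  # {'step': {}, 'change': {}}
```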