SDIF Benchmarks
Evidence-first benchmarks measuring SDIF against JSON, YAML, XML, CSV Bundle
and other formats from the perspective of AI and LLM developers.
Tracks · Quick start · Latest results · Corpus model · Result model · Environment
Every compared representation is derived from the same canonical JSON source. Claims must name the tokenizer and document coverage that produced them. Optional external tools degrade gracefully.
| Track | What it measures |
|---|---|
| Token efficiency | Byte and token reduction across shared semantic fixtures. Ranks all formats against JSON Compact as the stable baseline. |
| Context packing | How many document copies fit inside fixed token budgets (4K, 8K, 32K, 128K). Fit rate and median copies per budget. |
| Round-trip fidelity | JSON→format→JSON preservation. Scores value, type and structure fidelity. N/A for SDIF AI and TOON. |
| Delta compactness | Token overhead of re-sending a mutated document. Applies a deterministic mutation to the first 10% of leaf values. |
| Retrieval accuracy | LLM answer quality by format. Deterministic validators, no LLM judge. Opt-in: requires ANTHROPIC_API_KEY. |
| Semantic quality | Guards that SDIF preserves relations, rules, schema validation, canonicalization and reversible AI projection boundaries. |
| Semantic fidelity | Structural recovery after format conversion. Separate axes for relations, rules, tables, and scalar fields. Unparsed formats report not_measured, not zero. |
| Operability | Static capability matrix across 8 formats: canonical forms, stable hashing, native relation support, rule declaration vs. evaluation, semantic type vocabulary. |
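To make the context packing metrics concrete, the sketch below computes, per budget, how many whole copies of each document fit, the share of documents that fit at least once, and the median copy count. The helper name, the exact budget sizes (4K read as 4096 tokens, and so on) and the sample token counts are assumptions for illustration; the track's real implementation may differ.

```python
from statistics import median

# Budgets named in the track description; reading them as powers of two is an assumption.
BUDGETS = {"4K": 4096, "8K": 8192, "32K": 32768, "128K": 131072}


def packing_summary(token_counts, budgets=BUDGETS):
    """Hypothetical helper: fit rate and median whole copies per budget.

    token_counts holds one positive token count per document, all measured
    with the same tokenizer and in the same format.
    """
    summary = {}
    for label, budget in budgets.items():
        # Whole copies of each document that fit inside this budget.
        copies = [budget // n for n in token_counts]
        # Share of documents that fit at least once.
        fit_rate = sum(1 for c in copies if c >= 1) / len(copies)
        summary[label] = {"fit_rate": fit_rate, "median_copies": median(copies)}
    return summary


# Made-up token counts for three documents; not benchmark data.
example = packing_summary([1000, 3000, 5000])
```

With these made-up counts, the 5000-token document does not fit the 4K budget at all, so the 4K fit rate is two out of three.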
This repository expects access to the core SDIF repository. By default it looks for it at ../sdif; override this with SDIF_CORE_REPO.
# Token reduction across formats
make benchmark-token
# Context-window fit rate by budget
make benchmark-packing
# JSON→format→JSON round-trip fidelity
make benchmark-roundtrip
# Mutation sensitivity (re-send overhead)
make benchmark-delta
# LLM retrieval accuracy by format — opt-in
SDIF_BENCHMARK_RETRIEVAL=1 ANTHROPIC_API_KEY=<key> make benchmark-retrieval
# Semantic quality checks
make benchmark-quality
# Structural recovery fidelity (semantic fidelity track)
make benchmark-semantic
# Format capability matrix (operability track)
make benchmark-operability
Alternatively, you can run them directly as Python modules or using the CLI command:
# Run the full suite using the CLI entry point
uv run sdif-benchmarks
# Run individual tracks as python modules
uv run python -m sdif_benchmarks.tracks.token_efficiency
uv run python -m sdif_benchmarks.tracks.context_packing
uv run python -m sdif_benchmarks.tracks.roundtrip_fidelity
uv run python -m sdif_benchmarks.tracks.delta_compactness
uv run python -m sdif_benchmarks.tracks.semantic_fidelity
uv run python -m sdif_benchmarks.tracks.operability
uv run python -m sdif_benchmarks.checks.check_semantic_quality
Results from the most recent token efficiency run across 21 documents and 3 tokenizers (Estimate, TokenX, tiktoken).
| Format | Consensus avg rank | Median ratio vs JSON Compact | Wins (63 pairs) |
|---|---|---|---|
| SDIF AI | 1.10 | 56.8% | 57 |
| SDIF | 2.60 | 59.5% | 2 |
| CSV Bundle | 2.70 | 61.2% | 4 |
| TOON | 3.60 | 63.2% | 0 |
| YAML | 5.35 | 95.3% | 0 |
| JSON Compact | 5.65 | 100.0% | 0 |
| JSON Pretty | 7.00 | 137.3% | 0 |
| XML | 8.00 | 171.7% | 0 |
Tokenizer-specific winners:
| Tokenizer | Winning format | Wins |
|---|---|---|
| Estimate | SDIF AI | 19/21 |
| TokenX | SDIF AI | 20/21 |
| tiktoken | SDIF AI | 18/21 |
These results are corpus-dependent. Results for Claude and Llama3 tokenizers require separate opt-in. Full per-document breakdowns live in results/token_efficiency/.
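For readers who want to recompute two of the columns above from raw counts, here is a minimal sketch. It assumes that the ratio is a format's token count divided by the JSON Compact count for the same (document, tokenizer) pair, and that a win means the lowest count for that pair. The function name and the sample counts are hypothetical, and the consensus average rank is not attempted because its definition is not given here.

```python
from statistics import median

BASELINE = "JSON Compact"


def summarize(counts):
    """counts maps (document, tokenizer) -> {format: token_count}.

    Returns (median_ratio, wins) under the assumptions stated in the lead-in.
    """
    ratios, wins = {}, {}
    for per_format in counts.values():
        base = per_format[BASELINE]
        for fmt, n in per_format.items():
            ratios.setdefault(fmt, []).append(n / base)
            wins.setdefault(fmt, 0)
        # Lowest token count wins the pair; ties go to the first format listed.
        winner = min(per_format, key=per_format.get)
        wins[winner] += 1
    median_ratio = {fmt: median(r) for fmt, r in ratios.items()}
    return median_ratio, wins


# Made-up counts for two (document, tokenizer) pairs; not benchmark data.
toy = {
    ("doc_a", "tiktoken"): {"JSON Compact": 100, "SDIF": 60, "YAML": 95},
    ("doc_b", "tiktoken"): {"JSON Compact": 200, "SDIF": 130, "YAML": 180},
}
median_ratio, wins = summarize(toy)
```

The baseline's own median ratio is 1.0 by construction, which is a quick sanity check on any real input.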
The canonical semantic corpus lives in the core repo's examples/golden/ directory, not duplicated here. This avoids drift between parser fixtures and benchmark fixtures.
Each fixture contains:
../sdif/examples/golden/<fixture>/
├── equivalent.json # canonical semantic source (benchmark input)
├── source.sdif # hand-authored or generated SDIF source
├── canonical.sdif # canonical SDIF form
└── canonical.sha256 # canonical hash evidence
The benchmark path defaults to ../sdif/examples/golden/ and can be overridden with SDIF_BENCHMARK_GOLDEN_DIR.
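The lookup order described above could be sketched as follows. The helper names are hypothetical, and deriving the golden directory from SDIF_CORE_REPO when SDIF_BENCHMARK_GOLDEN_DIR is unset is an assumption based on the two documented defaults, not a confirmed detail of the benchmark code.

```python
import os
from pathlib import Path


def resolve_golden_dir(env=None):
    """Hypothetical helper mirroring the documented defaults and overrides."""
    env = os.environ if env is None else env
    explicit = env.get("SDIF_BENCHMARK_GOLDEN_DIR")
    if explicit:
        return Path(explicit)  # a custom corpus takes precedence
    # Assumption: the default corpus lives inside the core repo checkout.
    core_repo = Path(env.get("SDIF_CORE_REPO", "../sdif"))
    return core_repo / "examples" / "golden"


def list_fixtures(golden_dir):
    """Return fixture directories that contain equivalent.json, sorted by path."""
    return sorted(p.parent for p in Path(golden_dir).glob("*/equivalent.json"))
```

Passing a plain dict as `env` keeps the function easy to test without touching the real environment.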
Each benchmark run writes scratch output to tmp/<track>/ while running and promotes it to results/<track>/ on success. Failed runs leave tmp/<track>/ for diagnosis without touching the last clean result.
results/<track>/
├── comparison.log # console output
├── comparison.md # per-document detail
├── summary.md # key findings
├── summary.json # machine-readable summary
├── summary.sdif # SDIF encoding
├── summary.sdif.ai # compact AI projection
├── dashboard.html # self-contained HTML dashboard
└── corpus/ # exact format files measured
    └── <document>/
        ├── json_compact.json
        ├── json_pretty.json
        ├── yaml.yaml
        ├── xml.xml
        ├── csv_bundle.csv
        ├── sdif.sdif
        ├── sdif_ai.sdif.ai
        └── toon.toon # when TOON is enabled
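The scratch-then-promote behaviour described for tmp/<track>/ and results/<track>/ can be illustrated with a small wrapper. This is a sketch under assumptions: `run_track` and its `run` callback are hypothetical names, and the real tracks may promote output differently.

```python
import shutil
from pathlib import Path


def run_track(track, run, root="."):
    """Hypothetical wrapper: run(scratch_dir) writes files; promote only on success."""
    scratch = Path(root) / "tmp" / track
    final = Path(root) / "results" / track
    if scratch.exists():
        shutil.rmtree(scratch)  # start from a clean scratch directory
    scratch.mkdir(parents=True)
    run(scratch)  # an exception here leaves tmp/<track>/ in place for diagnosis
    if final.exists():
        shutil.rmtree(final)  # replace the previous clean result
    final.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(scratch), str(final))
    return final
```

Because promotion happens only after `run` returns, a failing run never touches results/<track>/. A stricter variant could rename the old result aside before moving the new one in, to avoid a brief window with no result directory.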
Common switches (all tracks):
SDIF_BENCHMARK_OUTPUT_DIR=/tmp/sdif-benchmarks # redirect all output
SDIF_CORE_REPO=../sdif # path to core repo
SDIF_BENCHMARK_GOLDEN_DIR=/tmp/golden-fixtures # use a custom corpus
SDIF_BENCHMARK_TOON=0 # disable TOON comparison
SDIF_BENCHMARK_VERBOSE=1 # print optional-tool diagnostics
SDIF_ENV_OVERRIDE=0 # keep existing env vars; skip .env
Token efficiency additional switches:
SDIF_TIKTOKEN_ENCODING=cl100k_base # tiktoken encoding (default)
SDIF_BENCHMARK_TOKENX=0 # disable TokenX estimation
SDIF_BENCHMARK_LLAMA=0 # disable Llama tokenizer
SDIF_BENCHMARK_CLAUDE=1 # enable Claude counting; needs ANTHROPIC_API_KEY
Retrieval accuracy:
SDIF_BENCHMARK_RETRIEVAL=1 # opt-in
ANTHROPIC_API_KEY=<key> # required
All scripts load .env from the repository root when present, unless SDIF_ENV_OVERRIDE=0.
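One way to read the .env behaviour is sketched below using only the standard library. It is not the repository's loader: the function name is hypothetical, the parser handles only simple KEY=VALUE lines, and letting .env values overwrite existing variables when the opt-out is not set is an assumption drawn from the switch's name.

```python
import os
from pathlib import Path


def load_dotenv_file(path, environ=None):
    """Minimal KEY=VALUE loader; a sketch, not the repository's loader."""
    environ = os.environ if environ is None else environ
    if environ.get("SDIF_ENV_OVERRIDE") == "0":
        return environ  # documented opt-out: keep existing variables, skip .env
    path = Path(path)
    if not path.is_file():
        return environ  # .env is optional
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # ignore blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        # Assumption: values from .env replace any existing value.
        environ[key.strip()] = value.strip().strip('"').strip("'")
    return environ
```

Passing a dict as `environ` lets you try the behaviour without modifying the process environment.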
sdif-benchmarks/
├── src/ # packaged source code, helpers, tracks, generators, checks
├── results/ # completed benchmark output (committed evidence)
└── tmp/ # in-progress output (gitignored)
- Packaged modules (tracks, generators, checks) belong under src/sdif_benchmarks/.
- Reusable helpers belong under src/sdif_benchmarks/, e.g. formats.py, infra.py, report.py.
- Each track writes scratch output to tmp/<track>/; completed evidence goes to results/<track>/.
- Canonical semantic sources belong in the sdif core repo's examples/golden/, unless SDIF_BENCHMARK_GOLDEN_DIR overrides.
- Optional external tools (TOON, tiktoken) must degrade gracefully.
- Claims must name the tokenizer and model coverage that produced them.
- Retrieval accuracy must use deterministic validators, not subjective LLM judging.
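As an illustration of what a deterministic validator can look like, the sketch below compares a model answer with an expected value without any LLM judge. The function names and the normalization rules are assumptions; the retrieval track's actual validators are not shown in this README.

```python
import re


def normalize(text):
    """Lowercase, trim, and collapse whitespace so formatting noise is ignored."""
    return re.sub(r"\s+", " ", text.strip().lower())


def validate_answer(answer, expected):
    """Hypothetical validator: numbers compare by value, other text by normalized form."""
    try:
        return float(answer) == float(expected)
    except ValueError:
        # At least one side is not numeric, so fall back to text comparison.
        return normalize(answer) == normalize(expected)
```

The same inputs always produce the same verdict, which is the property the rule above asks for.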
This GitHub organization hosts the official SDIF ecosystem: the core format, reference tooling, benchmarks, examples, libraries, and editor extensions.
| Component | Description |
|---|---|
| PYTHON CLIENT & CLI | Specification, parser, canonicalizer, and CLI. |
| SPECIFICATION (SSOT) | Official format specification, canonicalization rules, |
| BENCHMARKS | Reproducible benchmark datasets and reports comparing SDIF with JSON, YAML, XML, and CSV. |
| RUST IMPLEMENTATION | Pure Rust parser implementation with a span-annotated AST designed for editor tooling. |
| LANGUAGE SERVER (LSP) | LSP language server binary (tower-lsp) providing real-time diagnostics and IDE features. |
| EDITOR INTEGRATION | VS Code extension client providing syntax highlighting, diagnostics, and LSP configuration. |
| GRAMMAR FOUNDATION | Tree-sitter grammar foundation for syntax highlighting and incremental parsing. |
| DOCUMENTATION | Official documentation website containing specification guides, tutorials, and examples. |
| ORGANIZATION META | Organization profile, assets, and shared community configuration files. |
Repository map
| Repository | Purpose |
|---|---|
| sdif-py | Core Python parser, validator, canonicalizer, and CLI |
| sdif-spec | Official format specification and conformance test suite (SSOT) |
| sdif-benchmarks | Benchmark datasets, reports, and comparison tooling |
| sdif-rs | Rust parser crate with span-annotated AST |
| sdif-lsp | LSP language server binary |
| tree-sitter-sdif | Tree-sitter grammar foundation for syntax highlighting |
| vscode-sdif | VS Code extension client for SDIF |
| sdif-format.github.io | Public documentation website (Docusaurus) |
| .github | Organization profile, assets, and shared GitHub community files |
MIT. See LICENSE.
