GitHub - sdif-format/sdif-benchmarks: Reproducible benchmarks for SDIF size, token efficiency, latency and format comparison.

SDIF Benchmarks

Evidence-first benchmarks measuring SDIF against JSON, YAML, XML, CSV Bundle
and other formats from the perspective of AI and LLM developers.

Tracks · Quick start · Latest results · Corpus model · Result model · Environment

Every compared representation is derived from the same canonical JSON source. Claims must name the tokenizer and document coverage that produced them. Optional external tools degrade gracefully.

Benchmark tracks

Track	What it measures
Token efficiency	Byte and token reduction across shared semantic fixtures. Ranks all formats against JSON Compact as the stable baseline.
Context packing	How many document copies fit inside fixed token budgets (4K, 8K, 32K, 128K). Reports fit rate and median copies per budget.
Round-trip fidelity	JSON→format→JSON preservation. Scores value, type and structure fidelity. N/A for SDIF AI and TOON.
Delta compactness	Token overhead of re-sending a mutated document. Applies a deterministic mutation to the first 10% of leaf values.
Retrieval accuracy	LLM answer quality by format. Deterministic validators, no LLM judge. Opt-in: requires `ANTHROPIC_API_KEY`.
Semantic quality	Guards that SDIF preserves relations, rules, schema validation, canonicalization and reversible AI projection boundaries.
Semantic fidelity	Structural recovery after format conversion. Separate axes for relations, rules, tables, and scalar fields. Unparsed formats report `not_measured`, not zero.
Operability	Static capability matrix across 8 formats: canonical forms, stable hashing, native relation support, rule declaration vs. evaluation, semantic type vocabulary.
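
To make the context packing metrics concrete, here is a minimal sketch of how fit rate and median copies could be derived from per-document token counts. It is not the track's actual code: the `pack` helper, the sample counts, and the reading of 4K as 4,096 tokens (and so on) are assumptions for illustration, as is the definition of fit rate as the share of documents that fit at least once.

```python
from statistics import median

# Budgets named in the track description; exact token values are assumed.
BUDGETS = {"4K": 4_096, "8K": 8_192, "32K": 32_768, "128K": 131_072}


def pack(token_counts: list[int], budget: int) -> tuple[float, float]:
    """Return (fit_rate, median_copies) for one format under one budget."""
    # Whole copies of each document that fit inside the budget.
    copies = [budget // n for n in token_counts]
    # Assumed definition: share of documents that fit at least once.
    fit_rate = sum(c >= 1 for c in copies) / len(copies)
    return fit_rate, median(copies)


# Hypothetical per-document token counts for a single format.
counts = [1_200, 3_000, 5_000]
for label, budget in BUDGETS.items():
    rate, med = pack(counts, budget)
    print(f"{label}: fit rate {rate:.0%}, median copies {med}")
```

A more compact format lowers every entry in `counts`, which raises both numbers for the same budget.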

Quick start

This repository expects access to the core SDIF repository. By default it looks for it at ../sdif; override this with SDIF_CORE_REPO.

# Token reduction across formats
make benchmark-token

# Context-window fit rate by budget
make benchmark-packing

# JSON→format→JSON round-trip fidelity
make benchmark-roundtrip

# Mutation sensitivity (re-send overhead)
make benchmark-delta

# LLM retrieval accuracy by format — opt-in
SDIF_BENCHMARK_RETRIEVAL=1 ANTHROPIC_API_KEY=<key> make benchmark-retrieval

# Semantic quality checks
make benchmark-quality

# Structural recovery fidelity (semantic fidelity track)
make benchmark-semantic

# Format capability matrix (operability track)
make benchmark-operability

Alternatively, run the full suite through the CLI entry point, or run individual tracks directly as Python modules:

# Run the full suite using the CLI entry point
uv run sdif-benchmarks

# Run individual tracks as python modules
uv run python -m sdif_benchmarks.tracks.token_efficiency
uv run python -m sdif_benchmarks.tracks.context_packing
uv run python -m sdif_benchmarks.tracks.roundtrip_fidelity
uv run python -m sdif_benchmarks.tracks.delta_compactness
uv run python -m sdif_benchmarks.tracks.semantic_fidelity
uv run python -m sdif_benchmarks.tracks.operability
uv run python -m sdif_benchmarks.checks.check_semantic_quality

Latest results

Results from the most recent token efficiency run across 21 documents and 3 tokenizers (Estimate, TokenX, tiktoken).

Format	Consensus avg rank	Median ratio vs JSON Compact	Wins (63 pairs)
SDIF AI	1.10	56.8%	57
SDIF	2.60	59.5%	2
CSV Bundle	2.70	61.2%	4
TOON	3.60	63.2%	0
YAML	5.35	95.3%	0
JSON Compact	5.65	100.0%	0
JSON Pretty	7.00	137.3%	0
XML	8.00	171.7%	0

Tokenizer-specific winners:

Tokenizer	Winning format	Wins
Estimate	SDIF AI	19/21
TokenX	SDIF AI	20/21
tiktoken	SDIF AI	18/21

These results are corpus-dependent. Results for Claude and Llama3 tokenizers require separate opt-in. Full per-document breakdowns live in results/token_efficiency/.
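
As a rough illustration of how the table columns relate to raw token counts, the sketch below computes a median ratio against JSON Compact and a win count per (document, tokenizer) pair. The data, the `summarize` helper, and the tie-breaking behavior are hypothetical; the repository's own ranking logic, including the consensus average rank, may differ.

```python
from collections import Counter
from statistics import median

BASELINE = "JSON Compact"

# Hypothetical token counts keyed by (document, tokenizer) pair.
counts = {
    ("doc_a", "tiktoken"): {"SDIF AI": 55, "SDIF": 60, "JSON Compact": 100},
    ("doc_b", "tiktoken"): {"SDIF AI": 70, "SDIF": 65, "JSON Compact": 110},
}


def summarize(counts):
    """Return {format: (median ratio vs baseline, wins)} over all pairs."""
    wins = Counter()
    ratios = {}
    for per_format in counts.values():
        base = per_format[BASELINE]
        # The format with the fewest tokens wins the pair (first one on ties).
        wins[min(per_format, key=per_format.get)] += 1
        for fmt, n in per_format.items():
            ratios.setdefault(fmt, []).append(n / base)
    return {fmt: (median(r), wins[fmt]) for fmt, r in ratios.items()}


for fmt, (ratio, won) in summarize(counts).items():
    print(f"{fmt}: median ratio {ratio:.1%}, wins {won}")
```

With 21 documents and 3 tokenizers there are 63 such pairs, which is why the win counts in the table sum to 63.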

Corpus model

The canonical semantic corpus lives in the core repo's examples/golden/ directory, not duplicated here. This avoids drift between parser fixtures and benchmark fixtures.

Each fixture contains:

../sdif/examples/golden/<fixture>/
├── equivalent.json     # canonical semantic source (benchmark input)
├── source.sdif         # hand-authored or generated SDIF source
├── canonical.sdif      # canonical SDIF form
└── canonical.sha256    # canonical hash evidence

The benchmark path defaults to ../sdif/examples/golden/ and can be overridden with SDIF_BENCHMARK_GOLDEN_DIR.
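
The path resolution described above could look roughly like the following sketch. The function names and the precedence order (explicit golden directory first, then the core repo path) are assumptions rather than the repository's actual helpers; only the two environment variable names and the `../sdif` default come from this README.

```python
import os
from pathlib import Path


def golden_dir(env=os.environ) -> Path:
    """Resolve the fixture directory, preferring the explicit override."""
    override = env.get("SDIF_BENCHMARK_GOLDEN_DIR")
    if override:
        return Path(override)
    # Assumed: the default is derived from the core repo location.
    core = Path(env.get("SDIF_CORE_REPO", "../sdif"))
    return core / "examples" / "golden"


def fixtures(root: Path) -> list[Path]:
    """List fixture directories that contain the benchmark input file."""
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if (p / "equivalent.json").is_file())


if __name__ == "__main__":
    root = golden_dir()
    print(f"{root}: {len(fixtures(root))} fixture(s) found")
```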

Result model

Each benchmark run writes scratch output to tmp/<track>/ while running and promotes it to results/<track>/ on success. Failed runs leave tmp/<track>/ for diagnosis without touching the last clean result.

results/<track>/
├── comparison.log       # console output
├── comparison.md        # per-document detail
├── summary.md           # key findings
├── summary.json         # machine-readable summary
├── summary.sdif         # SDIF encoding
├── summary.sdif.ai      # compact AI projection
├── dashboard.html       # self-contained HTML dashboard
└── corpus/              # exact format files measured
    └── <document>/
        ├── json_compact.json
        ├── json_pretty.json
        ├── yaml.yaml
        ├── xml.xml
        ├── csv_bundle.csv
        ├── sdif.sdif
        ├── sdif_ai.sdif.ai
        └── toon.toon    # when TOON is enabled
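
The promote-on-success behavior described in this section can be pictured with the sketch below. It is an illustration under assumptions, not the repository's implementation: `run_track` and its `work` callback are hypothetical names, and the real tracks may promote files differently.

```python
import shutil
from pathlib import Path


def run_track(track: str, work, root: Path = Path(".")) -> Path:
    """Run `work` in tmp/<track>/ and promote to results/<track>/ on success."""
    scratch = root / "tmp" / track
    final = root / "results" / track

    # Start from an empty scratch directory.
    shutil.rmtree(scratch, ignore_errors=True)
    scratch.mkdir(parents=True)

    # If this raises, scratch stays behind for diagnosis and the
    # previous results directory is never touched.
    work(scratch)

    # Reached only on success: replace the last clean result.
    shutil.rmtree(final, ignore_errors=True)
    final.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(scratch), str(final))
    return final
```

Because the old results are removed only after `work` returns, a crash midway leaves the last committed evidence intact.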

Environment

Common switches (all tracks):

SDIF_BENCHMARK_OUTPUT_DIR=/tmp/sdif-benchmarks   # redirect all output
SDIF_CORE_REPO=../sdif                            # path to core repo
SDIF_BENCHMARK_GOLDEN_DIR=/tmp/golden-fixtures    # use a custom corpus
SDIF_BENCHMARK_TOON=0                             # disable TOON comparison
SDIF_BENCHMARK_VERBOSE=1                          # print optional-tool diagnostics
SDIF_ENV_OVERRIDE=0                               # keep existing env vars; skip .env

Token efficiency additional switches:

SDIF_TIKTOKEN_ENCODING=cl100k_base    # tiktoken encoding (default)
SDIF_BENCHMARK_TOKENX=0               # disable TokenX estimation
SDIF_BENCHMARK_LLAMA=0                # disable Llama tokenizer
SDIF_BENCHMARK_CLAUDE=1               # enable Claude counting; needs ANTHROPIC_API_KEY

Retrieval accuracy:

SDIF_BENCHMARK_RETRIEVAL=1    # opt-in
ANTHROPIC_API_KEY=<key>       # required

All scripts load .env from the repository root when present, unless SDIF_ENV_OVERRIDE=0.
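
For orientation, here is one way the switches above could be read into a typed structure. The `Switches` dataclass, the `flag` helper, and the inferred defaults (TOON and TokenX on unless set to 0, Claude and retrieval off unless set to 1 with a key present) are assumptions drawn from the comments in this section, not the project's actual configuration code. The Llama switch is left out because its default is not stated unambiguously here.

```python
import os
from dataclasses import dataclass


def flag(env, name: str, default: bool) -> bool:
    """Treat "1" as on and "0" as off; anything else keeps the default."""
    value = env.get(name)
    return default if value not in ("0", "1") else value == "1"


@dataclass(frozen=True)
class Switches:
    toon: bool
    tokenx: bool
    claude: bool
    retrieval: bool
    tiktoken_encoding: str


def read_switches(env=os.environ) -> Switches:
    """Collect benchmark switches from an environment-like mapping."""
    has_key = bool(env.get("ANTHROPIC_API_KEY"))
    return Switches(
        toon=flag(env, "SDIF_BENCHMARK_TOON", True),
        tokenx=flag(env, "SDIF_BENCHMARK_TOKENX", True),
        # Both opt-in features also need an API key.
        claude=flag(env, "SDIF_BENCHMARK_CLAUDE", False) and has_key,
        retrieval=flag(env, "SDIF_BENCHMARK_RETRIEVAL", False) and has_key,
        tiktoken_encoding=env.get("SDIF_TIKTOKEN_ENCODING", "cl100k_base"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check in a test.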

Project structure

sdif-benchmarks/
├── src/           # packaged source code, helpers, tracks, generators, checks
├── results/       # completed benchmark output (committed evidence)
└── tmp/           # in-progress output (gitignored)

Organization contract

Packaged modules (tracks, generators, checks) belong under src/sdif_benchmarks/.
Reusable helpers belong under src/sdif_benchmarks/ — e.g. formats.py, infra.py, report.py.
Each track writes scratch output to tmp/<track>/; completed evidence goes to results/<track>/.
Canonical semantic sources belong in the sdif core repo's examples/golden/, unless SDIF_BENCHMARK_GOLDEN_DIR overrides.
Optional external tools (TOON, tiktoken) must degrade gracefully.
Claims must name the tokenizer and model coverage that produced them.
Retrieval accuracy must use deterministic validators, not subjective LLM judging.
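
The "degrade gracefully" rule for optional tools might be satisfied with a pattern like the one below. This is a hedged sketch: `make_counter` and `estimate_tokens` are hypothetical names, and the four-characters-per-token fallback is an assumed heuristic, not necessarily what the repository's Estimate tokenizer does. The only tiktoken calls used are `tiktoken.get_encoding` and `Encoding.encode`.

```python
def estimate_tokens(text: str) -> int:
    """Crude fallback: assume roughly four characters per token."""
    return (len(text) + 3) // 4


def make_counter(encoding: str = "cl100k_base"):
    """Return (backend_name, count_fn), falling back if tiktoken is unusable."""
    try:
        import tiktoken  # optional dependency

        enc = tiktoken.get_encoding(encoding)
    except Exception:
        # Missing package, unknown encoding, or vocabulary unavailable:
        # keep the benchmark running with the estimate instead of failing.
        return "estimate", estimate_tokens
    return "tiktoken", lambda text: len(enc.encode(text))
```

Returning the backend name alongside the counting function lets a report state which tokenizer produced each number, in line with the rule that claims must name their tokenizer.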

Ecosystem

This GitHub organization hosts the official SDIF ecosystem: the core format, reference tooling, benchmarks, examples, libraries, and editor extensions.

sdif-py (Python client and CLI)
Specification, parser, canonicalizer, and CLI. The normative reference implementation.

sdif-spec (specification, SSOT)
Official format specification, canonicalization rules, and portable conformance test suite.

sdif-benchmarks (benchmarks)
Reproducible benchmark datasets and reports comparing SDIF with JSON, YAML, XML, and CSV.

sdif-rs (Rust implementation)
Pure Rust parser implementation with a span-annotated AST designed for editor tooling.

sdif-lsp (language server, LSP)
LSP language server binary (tower-lsp) providing real-time diagnostics and IDE features.

vscode-sdif (editor integration)
VS Code extension client providing syntax highlighting, diagnostics, and LSP configuration.

tree-sitter-sdif (grammar foundation)
Tree-sitter grammar foundation for syntax highlighting and incremental parsing.

sdif-format.github.io (documentation)
Official documentation website containing specification guides, tutorials, and examples.

.github (organization meta)
Organization profile, assets, and shared community configuration files.

Repository map

Repository	Purpose
`sdif-py`	Core Python parser, validator, canonicalizer, and CLI
`sdif-spec`	Official format specification and conformance test suite (SSOT)
`sdif-benchmarks`	Benchmark datasets, reports, and comparison tooling
`sdif-rs`	Rust parser crate with span-annotated AST
`sdif-lsp`	LSP language server binary
`tree-sitter-sdif`	Tree-sitter grammar foundation for syntax highlighting
`vscode-sdif`	VS Code extension client for SDIF
`sdif-format.github.io`	Public documentation website (Docusaurus)
`.github`	Organization profile, assets, and shared GitHub community files

License

MIT. See LICENSE.
