sdif-format/sdif-benchmarks

SDIF Benchmarks

Evidence-first benchmarks measuring SDIF against JSON, YAML, XML, CSV Bundle
and other formats from the perspective of AI and LLM developers.

Tracks · Quick start · Latest results · Corpus model · Result model · Environment

Evidence first · Shared canonical fixtures · Deterministic


Every compared representation is derived from the same canonical JSON source. Claims must name the tokenizer and document coverage that produced them. Optional external tools degrade gracefully.



Benchmark tracks

Token efficiency

Byte and token reduction across shared semantic fixtures. Ranks all formats against JSON Compact as the stable baseline.
Context packing

How many document copies fit inside fixed token budgets (4K, 8K, 32K, 128K). Fit rate and median copies per budget.
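
The packing arithmetic can be sketched as follows. This is a minimal illustration, not the track's implementation: it assumes a copy fits whenever whole copies divide into the budget, and it reads "4K" as 4,000 tokens (the track may use 4,096). The function names and sample token counts are hypothetical.

```python
from statistics import median

# Assumed budgets: "4K" read as 4,000 tokens; the real track may use 4,096.
BUDGETS = {"4K": 4_000, "8K": 8_000, "32K": 32_000, "128K": 128_000}

def copies_that_fit(doc_tokens: int, budget: int) -> int:
    """Whole copies of one document that fit inside a token budget."""
    if doc_tokens <= 0:
        raise ValueError("doc_tokens must be positive")
    return budget // doc_tokens

def packing_summary(token_counts: list[int], budget: int) -> dict[str, float]:
    """Fit rate (share of documents with at least one copy) and median copies."""
    copies = [copies_that_fit(n, budget) for n in token_counts]
    return {
        "fit_rate": sum(c >= 1 for c in copies) / len(copies),
        "median_copies": median(copies),
    }

# Hypothetical token counts for three documents in one format.
sample = [1_200, 3_900, 9_500]
summary = packing_summary(sample, BUDGETS["4K"])
```
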
Round-trip fidelity

JSON→format→JSON preservation. Scores value, type and structure fidelity. N/A for SDIF AI and TOON.
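
One way to score the three axes is to compare scalar leaves by path. The sketch below is an assumption about how such scoring could work, not the track's actual metric; `leaves` and `fidelity` are hypothetical helpers.

```python
def leaves(node, path=()):
    """Yield (path, value) for each scalar leaf; keys and indexes form the path."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from leaves(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from leaves(value, path + (index,))
    else:
        yield path, node

def fidelity(original, restored):
    """Hypothetical scores in [0, 1] for structure, type, and value preservation."""
    before, after = dict(leaves(original)), dict(leaves(restored))
    shared = before.keys() & after.keys()
    all_paths = before.keys() | after.keys()
    if not all_paths:  # leafless documents: nothing can be lost
        return {"structure": 1.0, "type": 1.0, "value": 1.0}
    total = len(before) or 1
    same_type = [p for p in shared if type(before[p]) is type(after[p])]
    return {
        "structure": len(shared) / len(all_paths),
        "type": len(same_type) / total,
        "value": sum(before[p] == after[p] for p in same_type) / total,
    }

original = {"id": 1, "tags": ["a", "b"], "ok": True}
restored = {"id": "1", "tags": ["a", "b"], "ok": "true"}  # a format that stringifies scalars
scores = fidelity(original, restored)
```

Here every path survives, so structure stays at 1.0, while the stringified `id` and `ok` values halve the type and value scores.
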
Delta compactness

Token overhead of re-sending a mutated document. Applies a deterministic mutation to the first 10% of leaf values.
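
A deterministic mutation of this kind might look like the sketch below. The traversal order (document order), the rounding (ceiling of 10%), and the mutation rule itself are all assumptions made for illustration; the input is expected to be a dict or list.

```python
import copy

def leaf_paths(node, path=()):
    """Yield paths to scalar leaves in document order (dict insertion order)."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from leaf_paths(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from leaf_paths(value, path + (index,))
    else:
        yield path

def mutate_value(value):
    """Hypothetical mutation rule; the track's real rule is not documented here."""
    if isinstance(value, bool):          # check bool first: bool is an int subclass
        return not value
    if isinstance(value, (int, float)):
        return value + 1
    if isinstance(value, str):
        return value + "_x"
    return "mutated"                     # None and any other scalar

def mutate_first_tenth(doc):
    """Return a mutated deep copy plus the paths that were changed."""
    mutated = copy.deepcopy(doc)
    paths = list(leaf_paths(mutated))
    count = (len(paths) + 9) // 10       # ceiling of 10%, in integer arithmetic
    for path in paths[:count]:
        parent = mutated
        for step in path[:-1]:
            parent = parent[step]
        parent[path[-1]] = mutate_value(parent[path[-1]])
    return mutated, paths[:count]
```

Because nothing is random, two runs over the same document produce the same mutated copy, so the re-send overhead is comparable across formats.
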
Retrieval accuracy

LLM answer quality by format. Deterministic validators — no LLM judge. Opt-in: requires ANTHROPIC_API_KEY.
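
As an illustration of what a deterministic validator can look like, the sketch below checks answers by normalized string match or numeric comparison. The normalization rules and function names are hypothetical and are not taken from the track's code.

```python
import re

def normalize(text: str) -> str:
    """Lowercase, trim, collapse whitespace, and drop trailing punctuation."""
    collapsed = re.sub(r"\s+", " ", text.strip().lower())
    return collapsed.rstrip(".!?")

def validate_exact(answer: str, expected: str) -> bool:
    """Pass when the normalized answer equals the normalized expected string."""
    return normalize(answer) == normalize(expected)

def validate_number(answer: str, expected: float, tolerance: float = 1e-9) -> bool:
    """Pass when the first number in the answer matches the expected value."""
    match = re.search(r"-?\d+(?:\.\d+)?", answer.replace(",", ""))
    return match is not None and abs(float(match.group()) - expected) <= tolerance
```

The same answer always yields the same verdict, which is the property an LLM judge cannot guarantee.
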
Semantic quality

Guards that SDIF preserves relations, rules, schema validation, canonicalization and reversible AI projection boundaries.
Semantic fidelity

Structural recovery after format conversion. Separate axes for relations, rules, tables, and scalar fields. Unparsed formats report not_measured, not zero.
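
The distinction between not_measured and zero matters when scores are averaged. The following sketch shows one possible way to keep the two apart; the marker handling and helper names are assumptions, not the track's implementation.

```python
from statistics import mean

NOT_MEASURED = "not_measured"

def axis_score(recovered, expected):
    """Score one axis; `recovered` is None when the format could not be parsed back."""
    if recovered is None or expected == 0:   # assumption: nothing to measure either way
        return NOT_MEASURED
    return recovered / expected

def average(scores):
    """Average only measured scores so parse failures do not count as zero."""
    measured = [s for s in scores if s != NOT_MEASURED]
    return mean(measured) if measured else NOT_MEASURED

# Hypothetical results: 4 of 4 relations, an unparsed format, 3 of 4 relations.
scores = [axis_score(4, 4), axis_score(None, 4), axis_score(3, 4)]
```
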
Operability

Static capability matrix across 8 formats: canonical forms, stable hashing, native relation support, rule declaration vs. evaluation, semantic type vocabulary.


Quick start

This repository expects access to the core SDIF repository. By default it looks for it at ../sdif; override this with SDIF_CORE_REPO.

# Token reduction across formats
make benchmark-token

# Context-window fit rate by budget
make benchmark-packing

# JSON→format→JSON round-trip fidelity
make benchmark-roundtrip

# Mutation sensitivity (re-send overhead)
make benchmark-delta

# LLM retrieval accuracy by format — opt-in
SDIF_BENCHMARK_RETRIEVAL=1 ANTHROPIC_API_KEY=<key> make benchmark-retrieval

# Semantic quality checks
make benchmark-quality

# Structural recovery fidelity (semantic fidelity track)
make benchmark-semantic

# Format capability matrix (operability track)
make benchmark-operability

Alternatively, run the tracks directly as Python modules or through the CLI entry point:

# Run the full suite using the CLI entry point
uv run sdif-benchmarks

# Run individual tracks as Python modules
uv run python -m sdif_benchmarks.tracks.token_efficiency
uv run python -m sdif_benchmarks.tracks.context_packing
uv run python -m sdif_benchmarks.tracks.roundtrip_fidelity
uv run python -m sdif_benchmarks.tracks.delta_compactness
uv run python -m sdif_benchmarks.tracks.semantic_fidelity
uv run python -m sdif_benchmarks.tracks.operability
uv run python -m sdif_benchmarks.checks.check_semantic_quality


Latest results

Results from the most recent token efficiency run across 21 documents and 3 tokenizers (Estimate, TokenX, tiktoken).

| Format | Consensus avg rank | Median ratio vs JSON Compact | Wins (63 pairs) |
|---|---|---|---|
| SDIF AI | 1.10 | 56.8% | 57 |
| SDIF | 2.60 | 59.5% | 2 |
| CSV Bundle | 2.70 | 61.2% | 4 |
| TOON | 3.60 | 63.2% | 0 |
| YAML | 5.35 | 95.3% | 0 |
| JSON Compact | 5.65 | 100.0% | 0 |
| JSON Pretty | 7.00 | 137.3% | 0 |
| XML | 8.00 | 171.7% | 0 |

Tokenizer-specific winners:

| Tokenizer | Winning format | Wins |
|---|---|---|
| Estimate | SDIF AI | 19/21 |
| TokenX | SDIF AI | 20/21 |
| tiktoken | SDIF AI | 18/21 |

These results are corpus-dependent. Results for the Claude and Llama3 tokenizers require a separate opt-in. Full per-document breakdowns live in results/token_efficiency/.
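
To make the "Wins" and "Median ratio" columns concrete, here is a small sketch under two assumptions: a win goes to the format with the fewest tokens for a (document, tokenizer) pair, and the ratio is that format's token count divided by JSON Compact's. The token counts are invented for illustration and are not benchmark results.

```python
from collections import Counter
from statistics import median

# Hypothetical token counts: (document, tokenizer) -> {format: tokens}.
counts = {
    ("doc_a", "tiktoken"): {"JSON Compact": 200, "SDIF": 120, "SDIF AI": 110},
    ("doc_a", "TokenX"): {"JSON Compact": 210, "SDIF": 126, "SDIF AI": 130},
    ("doc_b", "tiktoken"): {"JSON Compact": 400, "SDIF": 260, "SDIF AI": 220},
}

def wins(counts):
    """Count, per format, the pairs in which it has the fewest tokens."""
    return Counter(min(by_format, key=by_format.get) for by_format in counts.values())

def median_ratio(counts, fmt, baseline="JSON Compact"):
    """Median of tokens(fmt) / tokens(baseline) across all pairs."""
    return median(c[fmt] / c[baseline] for c in counts.values())

win_table = wins(counts)
sdif_ai_ratio = median_ratio(counts, "SDIF AI")
```
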



Corpus model

The canonical semantic corpus lives in the core repo's examples/golden/ directory and is not duplicated here. This avoids drift between parser fixtures and benchmark fixtures.

Each fixture contains:

../sdif/examples/golden/<fixture>/
├── equivalent.json     # canonical semantic source (benchmark input)
├── source.sdif         # hand-authored or generated SDIF source
├── canonical.sdif      # canonical SDIF form
└── canonical.sha256    # canonical hash evidence

The benchmark path defaults to ../sdif/examples/golden/ and can be overridden with SDIF_BENCHMARK_GOLDEN_DIR.
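
A sketch of how the corpus location and a fixture's contents could be checked. It assumes that SDIF_BENCHMARK_GOLDEN_DIR takes precedence over SDIF_CORE_REPO and that canonical.sha256 begins with the hex SHA-256 digest of canonical.sdif's bytes; neither detail is confirmed here, and the helper names are hypothetical.

```python
import hashlib
import os
from pathlib import Path

EXPECTED_FILES = ("equivalent.json", "source.sdif", "canonical.sdif", "canonical.sha256")

def golden_dir() -> Path:
    """Assumed precedence: SDIF_BENCHMARK_GOLDEN_DIR, else <core repo>/examples/golden."""
    override = os.environ.get("SDIF_BENCHMARK_GOLDEN_DIR")
    if override:
        return Path(override)
    core = Path(os.environ.get("SDIF_CORE_REPO", "../sdif"))
    return core / "examples" / "golden"

def missing_files(fixture: Path) -> list[str]:
    """Names from EXPECTED_FILES that are absent from a fixture directory."""
    return [name for name in EXPECTED_FILES if not (fixture / name).is_file()]

def hash_matches(fixture: Path) -> bool:
    """Assumes canonical.sha256 starts with the hex digest of canonical.sdif."""
    parts = (fixture / "canonical.sha256").read_text().split()
    actual = hashlib.sha256((fixture / "canonical.sdif").read_bytes()).hexdigest()
    return bool(parts) and parts[0].lower() == actual
```
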



Result model

Each benchmark run writes scratch output to tmp/<track>/ while running and promotes it to results/<track>/ on success. Failed runs leave tmp/<track>/ for diagnosis without touching the last clean result.

results/<track>/
├── comparison.log       # console output
├── comparison.md        # per-document detail
├── summary.md           # key findings
├── summary.json         # machine-readable summary
├── summary.sdif         # SDIF encoding
├── summary.sdif.ai      # compact AI projection
├── dashboard.html       # self-contained HTML dashboard
└── corpus/              # exact format files measured
    └── <document>/
        ├── json_compact.json
        ├── json_pretty.json
        ├── yaml.yaml
        ├── xml.xml
        ├── csv_bundle.csv
        ├── sdif.sdif
        ├── sdif_ai.sdif.ai
        └── toon.toon    # when TOON is enabled
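
The scratch-then-promote pattern described above can be sketched as follows. `run_track` and its `produce` callback are hypothetical names; the repository's own helper may differ, and this version is not atomic (the previous result is removed just before the move).

```python
import shutil
from pathlib import Path

def run_track(track: str, produce, root: Path = Path(".")) -> Path:
    """Run produce(scratch_dir); promote scratch output to results only on success."""
    scratch = root / "tmp" / track
    final = root / "results" / track
    shutil.rmtree(scratch, ignore_errors=True)   # start from a clean scratch directory
    scratch.mkdir(parents=True)
    produce(scratch)                             # an exception here leaves tmp/<track>/ in place
    final.parent.mkdir(parents=True, exist_ok=True)
    shutil.rmtree(final, ignore_errors=True)     # replace the previous clean result
    shutil.move(str(scratch), str(final))
    return final
```

If `produce` raises, the function never reaches the promotion step, so results/<track>/ keeps the last clean run and tmp/<track>/ holds the partial output for diagnosis.
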


Environment

Common switches (all tracks):

SDIF_BENCHMARK_OUTPUT_DIR=/tmp/sdif-benchmarks   # redirect all output
SDIF_CORE_REPO=../sdif                            # path to core repo
SDIF_BENCHMARK_GOLDEN_DIR=/tmp/golden-fixtures    # use a custom corpus
SDIF_BENCHMARK_TOON=0                             # disable TOON comparison
SDIF_BENCHMARK_VERBOSE=1                          # print optional-tool diagnostics
SDIF_ENV_OVERRIDE=0                               # keep existing env vars; skip .env

Additional switches for token efficiency:

SDIF_TIKTOKEN_ENCODING=cl100k_base    # tiktoken encoding (default)
SDIF_BENCHMARK_TOKENX=0               # disable TokenX estimation
SDIF_BENCHMARK_LLAMA=0                # disable Llama tokenizer
SDIF_BENCHMARK_CLAUDE=1               # enable Claude counting; needs ANTHROPIC_API_KEY

Retrieval accuracy:

SDIF_BENCHMARK_RETRIEVAL=1    # opt-in
ANTHROPIC_API_KEY=<key>       # required

All scripts load .env from the repository root when present, unless SDIF_ENV_OVERRIDE=0.



Project structure

sdif-benchmarks/
├── src/           # packaged source code, helpers, tracks, generators, checks
├── results/       # completed benchmark output (committed evidence)
└── tmp/           # in-progress output (gitignored)


Organization contract

  • Packaged modules (tracks, generators, checks) belong under src/sdif_benchmarks/.
  • Reusable helpers belong under src/sdif_benchmarks/ — e.g. formats.py, infra.py, report.py.
  • Each track writes scratch output to tmp/<track>/; completed evidence goes to results/<track>/.
  • Canonical semantic sources belong in the sdif core repo's examples/golden/, unless SDIF_BENCHMARK_GOLDEN_DIR overrides.
  • Optional external tools (TOON, tiktoken) must degrade gracefully.
  • Claims must name the tokenizer and model coverage that produced them.
  • Retrieval accuracy must use deterministic validators, not subjective LLM judging.
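
Graceful degradation for an optional tokenizer could be sketched like this. The fallback of roughly four characters per token is a hypothetical stand-in, not the repository's "Estimate" tokenizer, and `make_token_counter` is an illustrative name.

```python
def make_token_counter(encoding_name: str = "cl100k_base"):
    """Return (label, count_fn); fall back to a rough estimate without tiktoken."""
    try:
        import tiktoken                          # optional dependency
        encoding = tiktoken.get_encoding(encoding_name)
    except Exception:                            # missing package, unknown encoding, or vocab download failure
        # Hypothetical fallback: roughly four characters per token.
        return "estimate", lambda text: max(1, len(text) // 4) if text else 0
    # disallowed_special=() treats special-token text as ordinary text.
    return (
        f"tiktoken:{encoding_name}",
        lambda text: len(encoding.encode(text, disallowed_special=())),
    )

label, count_tokens = make_token_counter()
```

Returning the label alongside the counter keeps each reported number tied to the tokenizer that actually produced it, in line with the contract above.
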


Ecosystem

This GitHub organization hosts the official SDIF ecosystem: the core format, reference tooling, benchmarks, examples, libraries, and editor extensions.

PYTHON CLIENT & CLI

sdif-py

Specification, parser, canonicalizer, and CLI.
The normative reference implementation.

Explore sdif-py →

SPECIFICATION (SSOT)

sdif-spec

Official format specification, canonicalization rules,
and portable conformance test suite.

View specification →

BENCHMARKS

sdif-benchmarks

Reproducible benchmark datasets and reports comparing SDIF with JSON, YAML, XML, and CSV.

View benchmarks →

RUST IMPLEMENTATION

sdif-rs

Pure Rust parser implementation with a span-annotated AST designed for editor tooling.

Explore sdif-rs →

LANGUAGE SERVER (LSP)

sdif-lsp

LSP language server binary (tower-lsp) providing real-time diagnostics and IDE features.

View sdif-lsp →

EDITOR INTEGRATION

vscode-sdif

VS Code extension client providing syntax highlighting, diagnostics, and LSP configuration.

Open extension →

GRAMMAR FOUNDATION

tree-sitter-sdif

Tree-sitter grammar foundation for syntax highlighting and incremental parsing.

Open grammar →

DOCUMENTATION

sdif-format.github.io

Official documentation website containing specification guides, tutorials, and examples.

Read docs →

ORGANIZATION META

.github

Organization profile, assets, and shared community configuration files.

View profile →


Repository map

| Repository | Purpose |
|---|---|
| sdif-py | Core Python parser, validator, canonicalizer, and CLI |
| sdif-spec | Official format specification and conformance test suite (SSOT) |
| sdif-benchmarks | Benchmark datasets, reports, and comparison tooling |
| sdif-rs | Rust parser crate with span-annotated AST |
| sdif-lsp | LSP language server binary |
| tree-sitter-sdif | Tree-sitter grammar foundation for syntax highlighting |
| vscode-sdif | VS Code extension client for SDIF |
| sdif-format.github.io | Public documentation website (Docusaurus) |
| .github | Organization profile, assets, and shared GitHub community files |

License

MIT. See LICENSE.
