112 changes: 87 additions & 25 deletions README.md
@@ -55,6 +55,7 @@ the pipelines in each benchmark's TOML entry (or all if unspecified);
| `evmasm` | `"viaIR": false` — EVM assembly codegen |
| `ir` | `"viaIR": true` — IR-based codegen |
| `ir-ssacfg` | `"viaIR": true, "viaSSACFG": true` — SSA-CFG experimental codegen |
| `ir-ethdebug` | `"viaIR": true`, optimizer disabled — unoptimized IR codegen with ETHDebug outputs requested (see [ETHDebug overhead](#ethdebug-overhead)) |

## Metrics

@@ -142,7 +143,7 @@ solc-bench fetch develop --output ./solc --force

Benchmarks a suite, or a single `.sol`/`.json` `input_file` (which bypasses
the suite and needs no `--benchmark-dir`). Results land in
`bench-results.json` in `--output-dir`.
`bench-results.json` in `--output-dir`, unless `-o/--output-file` is used.

| Flag | Default | Description |
|------|---------|-------------|
@@ -152,8 +153,9 @@ the suite and needs no `--benchmark-dir`). Results land in
| `--tags TAGS` | (none) | Comma-separated tags, AND'd with `--only` |
| `--iterations N` | `3` | Number of iterations |
| `--output-dir DIR` | current dir | Where to write results + logs |
| `-o, --output-file FILE` | (none) | Write result JSON to a specific file |
| `--stdout` | off | Also print results to stdout |
| `--pipeline P` | (all) | Single pipeline: `evmasm`/`ir`/`ir-ssacfg` |
| `--pipeline P` | (all) | Single pipeline: `evmasm`/`ir`/`ir-ssacfg`/`ir-ethdebug` |
| `--no-optimize` | off | Disable the optimizer |

```bash
@@ -163,44 +165,102 @@ solc-bench run --solc ./solc contract.sol --pipeline ir # single file

### ETHDebug overhead

`--ethdebug-overhead` measures the extra compilation cost of producing
ETHDebug output with the same compiler. It runs every selected benchmark twice:
`ir` is the unoptimized IR baseline, and `ir-ethdebug` is the same unoptimized
IR compilation with `evm.bytecode.ethdebug`,
`ir-ethdebug` is a regular pipeline: the same unoptimized IR compilation as
`ir` with `--no-optimize`, plus `evm.bytecode.ethdebug`,
`evm.deployedBytecode.ethdebug`, `ethdebug.resources`, and
`ethdebug.compilation` requested. This mode intentionally disables the
optimizer because ETHDebug program output does not support optimization yet,
and skips gas benchmarks because it is intended to measure compilation cost.
The `ir-ethdebug` results also include `ethdebug_size`, the serialized byte
size of all requested ETHDebug artifacts. It is stored as bytes in the result
JSON and rendered as MiB in comparison tables.
`ethdebug.compilation` requested. It requires `--no-optimize` because
ETHDebug program output does not support optimization yet, and gas
benchmarks are skipped because the pipeline measures compilation cost. Its
results include `ethdebug_size`, the serialized byte size of all requested
ETHDebug artifacts, stored as bytes in the result JSON and rendered as MiB in
comparison tables.

Producing datasets and comparing them are orthogonal: each `run` produces one
result dataset, and `compare` runs whatever pairwise comparisons you ask for
with `--vs`. Use `--no-optimize` for the plain `ir` datasets so the baseline
matches the unoptimized IR that `ir-ethdebug` compiles.

ETHDebug overhead of a single compiler:

```bash
solc-bench run --solc ./solc --benchmark-dir ./benchmark_data \
--tags med --iterations 5 --pipeline ir --no-optimize -o ./ir.json
solc-bench run --solc ./solc --benchmark-dir ./benchmark_data \
--tags med --iterations 5 --pipeline ir-ethdebug --no-optimize -o ./ed.json

solc-bench compare ./ir.json ./ed.json --vs ed ir
solc-bench compare ./ir.json ./ed.json --vs ed ir --max-regression cpu_time:30
```

To review an ETHDebug PR against `develop`, produce four datasets and compare
the pairs you care about:

```bash
solc-bench run \
--solc ./solc-develop \
--benchmark-dir ./benchmark_data \
--pipeline ir \
--no-optimize \
--tags med \
--iterations 5 \
-o ./dev-ir.json

solc-bench run \
--solc ./solc-develop \
--benchmark-dir ./benchmark_data \
--pipeline ir-ethdebug \
--no-optimize \
--tags med \
--iterations 5 \
-o ./dev-ed.json

solc-bench run \
--solc ./solc \
--solc ./solc-current \
--benchmark-dir ./benchmark_data \
--pipeline ir \
--no-optimize \
--tags med \
--iterations 5 \
--ethdebug-overhead \
--output-dir ./ethdebug-overhead
-o ./feat-ir.json

solc-bench compare ./ethdebug-overhead/bench-results.json --pipelines ir-ethdebug:ir
solc-bench compare ./ethdebug-overhead/bench-results.json --pipelines ir-ethdebug:ir --max-regression cpu_time:30
solc-bench run \
--solc ./solc-current \
--benchmark-dir ./benchmark_data \
--pipeline ir-ethdebug \
--no-optimize \
--tags med \
--iterations 5 \
-o ./feat-ed.json

solc-bench compare \
./dev-ir.json ./dev-ed.json ./feat-ir.json ./feat-ed.json \
--vs feat-ir dev-ir \
--vs feat-ed dev-ed \
--vs dev-ed dev-ir \
--vs feat-ed feat-ir
```

### `solc-bench compare <baseline> [target]`
This reports `ir` across branches, `ir-ethdebug` across branches, ETHDebug
overhead on `develop`, and ETHDebug overhead on the feature branch.

### `solc-bench compare <results...>`

Compares two result files (cross-version), or two pipelines within one file
via `--pipelines TARGET:REF`. The output shows each metric's signed percent
delta; every metric is lower-is-better, so negative is an improvement. The
`winner` column names the better side, but shows `~noise` unless the gap
passes a Welch t-test and exceeds 0.10% (statistically real and large enough
to act on). `--per-function` adds a per-function gas delta table when both
files have gas data.
Compares two result files (cross-version), two pipelines within one file via
`--pipelines TARGET:REF`, or any number of named result datasets via repeated
`--vs TARGET REF` pairs. `--vs` references the datasets defined by the
positional files — a single-pipeline file by its label, a multi-pipeline file
by `LABEL:PIPELINE` — and takes no path itself. The output shows each metric's
signed percent delta; every metric is lower-is-better, so negative is an
improvement. The `winner`
column names the better side, but shows `~noise` unless the gap passes a
Welch t-test and exceeds 0.10% (statistically real and large enough to act
on). `--per-function` adds a per-function gas delta table when both files
have gas data.

| Flag | Default | Description |
|------|---------|-------------|
| `--pipelines TARGET:REF` | cross-version | Compare two pipelines in one file (e.g. `ir:evmasm`) |
| `--vs TARGET REF` | off | Compare two named datasets; repeatable |
| `--format table`/`json` | `table` | Output format |
| `--output FILE` | (none) | Write comparison JSON to file |
| `--per-function STAT` | `median` | Per-function gas deltas: `min`/`mean`/`median`/`max` |
@@ -211,6 +271,8 @@ files have gas data.
```bash
solc-bench compare baseline/bench-results.json target/bench-results.json --per-function
solc-bench compare bench-results.json --pipelines ir:evmasm --plot diff.png
solc-bench compare dev-ir.json feat-ir.json --vs feat-ir dev-ir
solc-bench compare dev=dev/bench-results.json feat=feat/bench-results.json --vs feat:ir dev:ir
```

### `solc-bench extract`
75 changes: 51 additions & 24 deletions src/solc_bench/benchmark.py
@@ -7,6 +7,7 @@

from solc_bench.config import (
DEFAULT_PIPELINES,
DEFAULT_RESULT_FILENAME,
load_benchmarks,
)
from solc_bench.gas import ensure_project, run_gas_benchmark
@@ -35,6 +36,16 @@ def perf_available():
return False


def _ru_maxrss_mib(ru_maxrss):
    """Normalize resource.ru_maxrss to MiB.

    Linux reports ru_maxrss in KiB, while macOS reports it in bytes.
    """
    if sys.platform == "darwin":
        return ru_maxrss / (1024 * 1024)
    return ru_maxrss / 1024


class Benchmark:
"""Runs solc and collects all metrics."""

@@ -101,7 +112,7 @@ def invoke_solc(self, input_file):
metrics = {
"cpu_time": rusage.ru_utime + rusage.ru_stime,
"wall_time": wall_time,
"peak_rss": rusage.ru_maxrss / 1024, # KiB -> MiB
"peak_rss": _ru_maxrss_mib(rusage.ru_maxrss),
"exit_code": proc.returncode,
}

@@ -114,11 +125,19 @@ def invoke_solc(self, input_file):
class BenchmarkSuite:
"""Orchestrates benchmarks across pipelines and inputs."""

def __init__(self, solc, iterations, output_dir, keep_inputs=False):
def __init__(
self,
solc,
iterations,
output_dir,
keep_inputs=False,
output_file=None,
):
self.solc_version = get_solc_version(solc)
self.benchmark = Benchmark(solc)
self.output_dir = Path(output_dir)
self.output_dir.mkdir(parents=True, exist_ok=True)
self.output_file = Path(output_file) if output_file else None
self.iterations = iterations
self.keep_inputs = keep_inputs
self.results = {}
@@ -186,7 +205,7 @@ def _write_error_log(self, result, name, pipeline):
log_path.write_text("\n".join(error_messages), encoding="utf-8")
return str(log_path)

def run_file(self, input_file, pipeline, no_optimize, ethdebug_overhead=False):
def run_file(self, input_file, pipeline, no_optimize):
"""Run benchmark on a single .sol or .json input file.

pipeline is a pipeline name (str) or None for all pipelines.
@@ -195,7 +214,6 @@ def run_file(self, input_file, pipeline, no_optimize, ethdebug_overhead=False):
pipeline_runs = self._pipeline_runs(
[pipeline] if pipeline else DEFAULT_PIPELINES,
no_optimize,
ethdebug_overhead,
)

for label, solc_settings, ethdebug in pipeline_runs:
@@ -214,7 +232,6 @@ def run_suite(
pipeline,
no_optimize,
tags=None,
ethdebug_overhead=False,
):
"""Run configured benchmarks from benchmarks.toml.

@@ -253,7 +270,7 @@ def run_suite(
pipelines = config.get("pipelines", DEFAULT_PIPELINES)

gas_project_dir = None
if config.get("gas") and not ethdebug_overhead:
if config.get("gas"):
try:
gas_project_dir = ensure_project(
benchmark_dir,
@@ -270,15 +287,18 @@ def run_suite(
for label, solc_settings, ethdebug in self._pipeline_runs(
pipelines,
no_optimize,
ethdebug_overhead,
):
with override_json_settings(
input_file,
solc_settings,
ethdebug,
) as tmp_file:
self.run_pipeline(
tmp_file, name, label, solc_settings, gas_project_dir
tmp_file,
name,
label,
solc_settings,
None if ethdebug else gas_project_dir,
)

if (selected or tag_set) and not matched_any:
@@ -288,21 +308,28 @@ def run_suite(
)

    @staticmethod
    def _pipeline_runs(pipelines, no_optimize, ethdebug_overhead=False):
        if not ethdebug_overhead:
            return [
                (p, resolve_solc_settings(p, no_optimize), False)
                for p in pipelines
            ]

        return [
            ("ir", resolve_solc_settings("ir", True), False),
            (
                "ir-ethdebug",
                resolve_solc_settings("ir", True, ethdebug=True),
                True,
            ),
        ]
    def _pipeline_runs(pipelines, no_optimize):
        runs = []
        for pipeline in pipelines:
            if pipeline == "ir-ethdebug":
                # ETHDebug program output does not support the optimizer yet;
                # resolve_solc_settings requires --no-optimize for this pipeline.
                runs.append(
                    (
                        pipeline,
                        resolve_solc_settings("ir", no_optimize, ethdebug=True),
                        True,
                    )
                )
Comment on lines +314 to +323

Member

This'll silently change behavior once it does support it, doesn't it? Wouldn't it be better to just throw if ethdebug is requested but `no_optimize` is False?

Contributor Author

Yeah, makes sense to raise if requested without `no_optimize`.
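
For reference, a minimal sketch of the guard discussed in this thread. It is not the code in this PR: `resolve_solc_settings` below is a hypothetical stand-in for the real helper (whose actual return value is not shown in this diff), and `pipeline_runs` is an illustrative free-function version of `_pipeline_runs`. The only point is that `ir-ethdebug` raises as soon as it is requested without `no_optimize`, instead of silently compiling with different settings.

```python
def resolve_solc_settings(pipeline, no_optimize, ethdebug=False):
    """Hypothetical stand-in for solc_bench's helper; illustration only."""
    return {"pipeline": pipeline, "optimize": not no_optimize, "ethdebug": ethdebug}


def pipeline_runs(pipelines, no_optimize):
    """Build (label, settings, ethdebug) tuples for each requested pipeline.

    Raises ValueError when ir-ethdebug is requested with the optimizer on,
    so the mismatch is reported rather than silently changing behavior.
    """
    runs = []
    for pipeline in pipelines:
        ethdebug = pipeline == "ir-ethdebug"
        if ethdebug and not no_optimize:
            raise ValueError(
                "pipeline 'ir-ethdebug' requires --no-optimize: "
                "ETHDebug program output does not support the optimizer yet"
            )
        # ir-ethdebug compiles through the plain "ir" settings plus ETHDebug outputs.
        base = "ir" if ethdebug else pipeline
        settings = resolve_solc_settings(base, no_optimize, ethdebug=ethdebug)
        runs.append((pipeline, settings, ethdebug))
    return runs
```

With this shape, `pipeline_runs(["ir", "ir-ethdebug"], no_optimize=True)` yields two runs, while the same call with `no_optimize=False` raises before any compilation starts. Whether the check belongs here or inside `resolve_solc_settings` is a design choice left to the author.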

            else:
                runs.append(
                    (
                        pipeline,
                        resolve_solc_settings(pipeline, no_optimize),
                        False,
                    )
                )
        return runs

def write_results(self, stdout=False):
"""Write results JSON to output dir, optionally also to stdout."""
@@ -313,7 +340,7 @@ def write_results(self, stdout=False):
output = reporter.build_result_json(
self.results, self.solc_version, self.iterations
)
result_path = self.output_dir / "bench-results.json"
result_path = self.output_file or self.output_dir / DEFAULT_RESULT_FILENAME
reporter.write_result_json(output, result_path, stdout=stdout)

