bench: Compare ETHDebug overhead across branches by djolertrk · Pull Request #13 · argotorg/solc-bench

djolertrk · 2026-06-11T13:25:49Z

No description provided.

djolertrk · 2026-06-11T13:26:02Z

djolertrk · 2026-06-11T13:27:45Z

Usage:

  uv run solc-bench run \
    --solc /path/to/develop/solc/solc \
    --benchmark-dir ./benchmark_data \
    --tags med \
    --iterations 5 \
    --ethdebug-overhead \
    --output-dir ./ethdebug-overhead-develop

  uv run solc-bench run \
    --solc /path/to/current/solc/solc \
    --benchmark-dir ./benchmark_data \
    --tags med \
    --iterations 5 \
    --ethdebug-overhead \
    --output-dir ./ethdebug-overhead-current

Then compare:

  uv run solc-bench compare \
    ./ethdebug-overhead-develop/bench-results.json \
    ./ethdebug-overhead-current/bench-results.json \
    --ethdebug-branches \
    --baseline-label develop \
    --target-label feature/internal-spec

The output:

Benchmark          Metric         feature/internal-spec ir-ethdebug  develop ir-ethdebug   Δ ir-ethdebug  feature/internal-spec ir  develop ir           Δ ir    feature/internal-spec overhead  develop overhead  Δ overhead
-----------------  -------------  ---------------------------------  --------------------  -------------  ------------------------  -------------------  ------  ------------------------------  ----------------  ----------
prb-math-4.1.1     cpu_time       7.8683s ± 0.2517s                  7.6545s ± 0.2175s     +2.79%         6.8656s ± 0.0663s         6.7416s ± 0.1290s    +1.84%  +14.61%                         +13.54%           +1.07 pp
                   creation_size  592,270 ± 0                        592,270 ± 0           0.0%           592,270 ± 0               592,270 ± 0          0.0%    0.0%                            0.0%              0.0 pp
                   peak_rss       2169072 ± 118595 MiB               2175808 ± 158756 MiB  -0.31%         1351712 ± 33354 MiB       1348928 ± 26619 MiB  +0.21%  +60.47%                         +61.3%            -0.83 pp
                   runtime_size   589,640 ± 0                        589,640 ± 0           0.0%           589,640 ± 0               589,640 ± 0          0.0%    0.0%                            0.0%              0.0 pp
                   wall_time      7.8968s ± 0.3190s                  7.6873s ± 0.2856s     +2.73%         6.9299s ± 0.0886s         6.7658s ± 0.1669s    +2.43%  +13.95%                         +13.62%           +0.33 pp
                   ethdebug_size  44.80 MiB ± 0.00 MiB               44.76 MiB ± 0.00 MiB  +0.08%         n/a                       n/a                  N/A     N/A                             N/A               N/A

forge-std-1.16.1   cpu_time       7.9549s ± 0.0795s                  7.9994s ± 0.0906s     -0.56%         6.8365s ± 0.0742s         6.9321s ± 0.0928s    -1.38%  +16.36%                         +15.4%            +0.96 pp
                   creation_size  725,280 ± 0                        725,280 ± 0           0.0%           725,280 ± 0               725,280 ± 0          0.0%    0.0%                            0.0%              0.0 pp
                   peak_rss       1568768 ± 36955 MiB                1612816 ± 63618 MiB   -2.73%         841536 ± 3475 MiB         825392 ± 75444 MiB   +1.96%  +86.42%                         +95.4%            -8.98 pp
                   runtime_size   698,851 ± 0                        698,851 ± 0           0.0%           698,851 ± 0               698,851 ± 0          0.0%    0.0%                            0.0%              0.0 pp
                   wall_time      7.9942s ± 0.1036s                  8.0431s ± 0.0981s     -0.61%         6.8578s ± 0.0758s         6.9591s ± 0.1023s    -1.46%  +16.57%                         +15.58%           +0.99 pp
                   ethdebug_size  44.94 MiB ± 0.00 MiB               44.88 MiB ± 0.00 MiB  +0.13%         n/a                       n/a                  N/A     N/A                             N/A               N/A

morpho-blue-1.0.0  cpu_time       11.1818s ± 0.2293s                 11.0610s ± 0.1896s    +1.09%         9.5450s ± 0.1405s         9.6678s ± 0.1083s    -1.27%  +17.15%                         +14.41%           +2.74 pp
                   creation_size  1,261,421 ± 0                      1,261,421 ± 0         0.0%           1,261,421 ± 0             1,261,421 ± 0        0.0%    0.0%                            0.0%              0.0 pp
                   peak_rss       1893616 ± 71660 MiB                1703536 ± 21941 MiB   +11.16%        957168 ± 28410 MiB        948432 ± 46672 MiB   +0.92%  +97.84%                         +79.62%           +18.22 pp
                   runtime_size   1,251,246 ± 0                      1,251,246 ± 0         0.0%           1,251,246 ± 0             1,251,246 ± 0        0.0%    0.0%                            0.0%              0.0 pp
                   wall_time      11.3639s ± 0.2788s                 11.1654s ± 0.2197s    +1.78%         9.5693s ± 0.1873s         9.7115s ± 0.1305s    -1.46%  +18.75%                         +14.97%           +3.78 pp
                   ethdebug_size  57.38 MiB ± 0.00 MiB               57.29 MiB ± 0.00 MiB  +0.15%         n/a                       n/a                  N/A     N/A                             N/A               N/A

solmate-6          cpu_time       4.7389s ± 0.0616s                  4.6239s ± 0.0854s     +2.49%         4.0220s ± 0.0828s         3.9918s ± 0.1613s    +0.76%  +17.83%                         +15.84%           +1.99 pp
                   creation_size  540,758 ± 0                        540,758 ± 0           0.0%           540,758 ± 0               540,758 ± 0          0.0%    0.0%                            0.0%              0.0 pp
                   peak_rss       985184 ± 29731 MiB                 968240 ± 3574 MiB     +1.75%         385376 ± 5713 MiB         372192 ± 1248 MiB    +3.54%  +155.64%                        +160.15%          -4.51 pp
                   runtime_size   523,577 ± 0                        523,577 ± 0           0.0%           523,577 ± 0               523,577 ± 0          0.0%    0.0%                            0.0%              0.0 pp
                   wall_time      4.7608s ± 0.0685s                  4.6550s ± 0.0867s     +2.27%         4.0436s ± 0.1090s         4.0118s ± 0.1698s    +0.79%  +17.74%                         +16.03%           +1.71 pp
                   ethdebug_size  30.92 MiB ± 0.00 MiB               30.89 MiB ± 0.00 MiB  +0.07%         n/a                       n/a                  N/A     N/A                             N/A               N/A

clonker · 2026-06-11T14:56:46Z

I am not a fan of introducing yet another two flags here. Basically what we want is four different result datasets on

develop without ethdebug
develop with ethdebug
feat without ethdebug
feat with ethdebug

The current approach conflates the concepts of which datasets and which pairwise comparisons over them to run. These should be orthogonal, IMO.

Ideally we have something like

solc-bench run --solc ./solc-develop --pipeline ir          -o dev-ir.json
solc-bench run --solc ./solc-develop --pipeline ir-ethdebug -o dev-ed.json
solc-bench run --solc ./solc-feat    --pipeline ir          -o feat-ir.json
solc-bench run --solc ./solc-feat    --pipeline ir-ethdebug -o feat-ed.json

to produce data and then compare whatever

solc-bench compare dev-ir.json dev-ed.json feat-ir.json feat-ed.json \
  --vs feat-ir dev-ir   \   # delta ir across branches
  --vs feat-ed dev-ed   \   # delta ir-ethdebug across branches
  --vs dev-ed  dev-ir   \   # overhead on develop
  --vs feat-ed feat-ir      # overhead on feat

I understand this is not (just) on you, still, this PR adds onto it.

ir-ethdebug is a regular pipeline now, so the run mode that bundled an unoptimized ir baseline and an ir-ethdebug run into one result file is redundant and conflates dataset selection with a special mode. Each run produces exactly the dataset its flags describe, and compare runs the pairwise comparisons requested via --vs: solc-bench run --solc ./solc --pipeline ir --no-optimize -o ir.json solc-bench run --solc ./solc --pipeline ir-ethdebug -o ed.json solc-bench compare ir.json ed.json --vs ed ir

djolertrk · 2026-06-12T12:56:04Z

I am not a fan of introducing yet another two flags here. Basically what we want is four different result datasets on

develop without ethdebug

develop with ethdebug

feat without ethdebug

feat with ethdebug

The current approach conflates the concepts of which datasets and which pairwise comparisons over them to run. These should be orthogonal, IMO.

Ideally we have something like
solc-bench run --solc ./solc-develop --pipeline ir          -o dev-ir.json
solc-bench run --solc ./solc-develop --pipeline ir-ethdebug -o dev-ed.json
solc-bench run --solc ./solc-feat    --pipeline ir          -o feat-ir.json
solc-bench run --solc ./solc-feat    --pipeline ir-ethdebug -o feat-ed.json
to produce data and then compare whatever
solc-bench compare dev-ir.json dev-ed.json feat-ir.json feat-ed.json \
  --vs feat-ir dev-ir   \   # delta ir across branches
  --vs feat-ed dev-ed   \   # delta ir-ethdebug across branches
  --vs dev-ed  dev-ir   \   # overhead on develop
  --vs feat-ed feat-ir      # overhead on feat
I understand this is not (just) on you, still, this PR adds onto it.

@clonker sure, I agree with this. Please check it now:

Comparison: feat-ir vs dev-ir

Benchmark          Metric         feat-ir              dev-ir               Δ%       winner
-----------------  -------------  -------------------  -------------------  -------  ------
prb-math-4.1.1     cpu_time       7.4697s ± 0.1706s    7.2882s ± 0.1161s    +2.49%   ~noise
                   creation_size  592,270 ± 0          592,270 ± 0          0.0%     tie
                   peak_rss       1283664 ± 18758 MiB  1300368 ± 89882 MiB  -1.28%   ~noise
                   runtime_size   589,640 ± 0          589,640 ± 0          0.0%     tie
                   wall_time      7.6108s ± 0.2776s    7.3393s ± 0.1567s    +3.7%    ~noise

forge-std-1.16.1   cpu_time       7.3679s ± 0.1564s    7.5212s ± 0.0777s    -2.04%   ~noise
                   creation_size  725,280 ± 0          725,280 ± 0          0.0%     tie
                   peak_rss       835728 ± 26586 MiB   757408 ± 40316 MiB   +10.34%  ~noise
                   runtime_size   698,851 ± 0          698,851 ± 0          0.0%     tie
                   wall_time      7.4271s ± 0.1986s    7.6105s ± 0.0980s    -2.41%   ~noise

morpho-blue-1.0.0  cpu_time       10.3606s ± 0.2164s   10.2282s ± 0.1335s   +1.29%   ~noise
                   creation_size  1,261,421 ± 0        1,261,421 ± 0        0.0%     tie
                   peak_rss       925232 ± 11855 MiB   885600 ± 40382 MiB   +4.48%   ~noise
                   runtime_size   1,251,246 ± 0        1,251,246 ± 0        0.0%     tie
                   wall_time      10.5072s ± 0.6548s   10.3461s ± 0.1563s   +1.56%   ~noise

solmate-6          cpu_time       4.2158s ± 0.1264s    4.1597s ± 0.0604s    +1.35%   ~noise
                   creation_size  540,758 ± 0          540,758 ± 0          0.0%     tie
                   peak_rss       387104 ± 4444 MiB    373008 ± 1400 MiB    +3.78%   dev-ir
                   runtime_size   523,577 ± 0          523,577 ± 0          0.0%     tie
                   wall_time      4.2523s ± 0.1784s    4.1910s ± 0.0587s    +1.46%   ~noise

Comparison: feat-ed vs dev-ed

Benchmark          Metric         feat-ed               dev-ed                Δ%      winner
-----------------  -------------  --------------------  --------------------  ------  ------
prb-math-4.1.1     cpu_time       8.6285s ± 0.1877s     8.3880s ± 0.2046s     +2.87%  ~noise
                   creation_size  592,270 ± 0           592,270 ± 0           0.0%    tie
                   ethdebug_size  44.80 MiB ± 0.00 MiB  44.76 MiB ± 0.00 MiB  +0.08%  ~noise
                   peak_rss       1984880 ± 121188 MiB  1867712 ± 87870 MiB   +6.27%  ~noise
                   runtime_size   589,640 ± 0           589,640 ± 0           0.0%    tie
                   wall_time      8.8527s ± 0.2882s     8.4716s ± 0.2892s     +4.5%   ~noise

forge-std-1.16.1   cpu_time       8.7960s ± 0.4597s     8.3687s ± 0.0730s     +5.11%  ~noise
                   creation_size  725,280 ± 0           725,280 ± 0           0.0%    tie
                   ethdebug_size  44.94 MiB ± 0.00 MiB  44.88 MiB ± 0.00 MiB  +0.13%  dev-ed
                   peak_rss       1609776 ± 81821 MiB   1571776 ± 121598 MiB  +2.42%  ~noise
                   runtime_size   698,851 ± 0           698,851 ± 0           0.0%    tie
                   wall_time      8.9539s ± 1.0165s     8.4514s ± 0.0777s     +5.95%  ~noise

morpho-blue-1.0.0  cpu_time       11.8989s ± 0.3171s    11.5593s ± 0.1716s    +2.94%  ~noise
                   creation_size  1,261,421 ± 0         1,261,421 ± 0         0.0%    tie
                   ethdebug_size  57.38 MiB ± 0.00 MiB  57.29 MiB ± 0.00 MiB  +0.15%  dev-ed
                   peak_rss       1853424 ± 87738 MiB   1832544 ± 49277 MiB   +1.14%  ~noise
                   runtime_size   1,251,246 ± 0         1,251,246 ± 0         0.0%    tie
                   wall_time      12.0327s ± 0.3882s    11.7122s ± 0.2232s    +2.74%  ~noise

solmate-6          cpu_time       4.8951s ± 0.0925s     4.9858s ± 0.0576s     -1.82%  ~noise
                   creation_size  540,758 ± 0           540,758 ± 0           0.0%    tie
                   ethdebug_size  30.92 MiB ± 0.00 MiB  30.89 MiB ± 0.00 MiB  +0.07%  ~noise
                   peak_rss       988512 ± 30735 MiB    971360 ± 2383 MiB     +1.77%  ~noise
                   runtime_size   523,577 ± 0           523,577 ± 0           0.0%    tie
                   wall_time      4.9308s ± 0.1116s     5.0301s ± 0.0607s     -1.98%  ~noise

Comparison: dev-ed vs dev-ir

Benchmark          Metric         dev-ed                dev-ir               Δ%        winner
-----------------  -------------  --------------------  -------------------  --------  ------
prb-math-4.1.1     cpu_time       8.3880s ± 0.2046s     7.2882s ± 0.1161s    +15.09%   dev-ir
                   creation_size  592,270 ± 0           592,270 ± 0          0.0%      tie
                   peak_rss       1867712 ± 87870 MiB   1300368 ± 89882 MiB  +43.63%   dev-ir
                   runtime_size   589,640 ± 0           589,640 ± 0          0.0%      tie
                   wall_time      8.4716s ± 0.2892s     7.3393s ± 0.1567s    +15.43%   dev-ir
                   ethdebug_size  44.76 MiB ± 0.00 MiB  n/a                  N/A       n/a

forge-std-1.16.1   cpu_time       8.3687s ± 0.0730s     7.5212s ± 0.0777s    +11.27%   dev-ir
                   creation_size  725,280 ± 0           725,280 ± 0          0.0%      tie
                   peak_rss       1571776 ± 121598 MiB  757408 ± 40316 MiB   +107.52%  dev-ir
                   runtime_size   698,851 ± 0           698,851 ± 0          0.0%      tie
                   wall_time      8.4514s ± 0.0777s     7.6105s ± 0.0980s    +11.05%   dev-ir
                   ethdebug_size  44.88 MiB ± 0.00 MiB  n/a                  N/A       n/a

morpho-blue-1.0.0  cpu_time       11.5593s ± 0.1716s    10.2282s ± 0.1335s   +13.01%   dev-ir
                   creation_size  1,261,421 ± 0         1,261,421 ± 0        0.0%      tie
                   peak_rss       1832544 ± 49277 MiB   885600 ± 40382 MiB   +106.93%  dev-ir
                   runtime_size   1,251,246 ± 0         1,251,246 ± 0        0.0%      tie
                   wall_time      11.7122s ± 0.2232s    10.3461s ± 0.1563s   +13.2%    dev-ir
                   ethdebug_size  57.29 MiB ± 0.00 MiB  n/a                  N/A       n/a

solmate-6          cpu_time       4.9858s ± 0.0576s     4.1597s ± 0.0604s    +19.86%   dev-ir
                   creation_size  540,758 ± 0           540,758 ± 0          0.0%      tie
                   peak_rss       971360 ± 2383 MiB     373008 ± 1400 MiB    +160.41%  dev-ir
                   runtime_size   523,577 ± 0           523,577 ± 0          0.0%      tie
                   wall_time      5.0301s ± 0.0607s     4.1910s ± 0.0587s    +20.02%   dev-ir
                   ethdebug_size  30.89 MiB ± 0.00 MiB  n/a                  N/A       n/a

Comparison: feat-ed vs feat-ir

Benchmark          Metric         feat-ed               feat-ir              Δ%        winner
-----------------  -------------  --------------------  -------------------  --------  -------
prb-math-4.1.1     cpu_time       8.6285s ± 0.1877s     7.4697s ± 0.1706s    +15.51%   feat-ir
                   creation_size  592,270 ± 0           592,270 ± 0          0.0%      tie
                   peak_rss       1984880 ± 121188 MiB  1283664 ± 18758 MiB  +54.63%   feat-ir
                   runtime_size   589,640 ± 0           589,640 ± 0          0.0%      tie
                   wall_time      8.8527s ± 0.2882s     7.6108s ± 0.2776s    +16.32%   feat-ir
                   ethdebug_size  44.80 MiB ± 0.00 MiB  n/a                  N/A       n/a

forge-std-1.16.1   cpu_time       8.7960s ± 0.4597s     7.3679s ± 0.1564s    +19.38%   feat-ir
                   creation_size  725,280 ± 0           725,280 ± 0          0.0%      tie
                   peak_rss       1609776 ± 81821 MiB   835728 ± 26586 MiB   +92.62%   feat-ir
                   runtime_size   698,851 ± 0           698,851 ± 0          0.0%      tie
                   wall_time      8.9539s ± 1.0165s     7.4271s ± 0.1986s    +20.56%   ~noise
                   ethdebug_size  44.94 MiB ± 0.00 MiB  n/a                  N/A       n/a

morpho-blue-1.0.0  cpu_time       11.8989s ± 0.3171s    10.3606s ± 0.2164s   +14.85%   feat-ir
                   creation_size  1,261,421 ± 0         1,261,421 ± 0        0.0%      tie
                   peak_rss       1853424 ± 87738 MiB   925232 ± 11855 MiB   +100.32%  feat-ir
                   runtime_size   1,251,246 ± 0         1,251,246 ± 0        0.0%      tie
                   wall_time      12.0327s ± 0.3882s    10.5072s ± 0.6548s   +14.52%   ~noise
                   ethdebug_size  57.38 MiB ± 0.00 MiB  n/a                  N/A       n/a

solmate-6          cpu_time       4.8951s ± 0.0925s     4.2158s ± 0.1264s    +16.11%   feat-ir
                   creation_size  540,758 ± 0           540,758 ± 0          0.0%      tie
                   peak_rss       988512 ± 30735 MiB    387104 ± 4444 MiB    +155.36%  feat-ir
                   runtime_size   523,577 ± 0           523,577 ± 0          0.0%      tie
                   wall_time      4.9308s ± 0.1116s     4.2523s ± 0.1784s    +15.95%   feat-ir
                   ethdebug_size  30.92 MiB ± 0.00 MiB  n/a                  N/A       n/a

and I run (for different cases) it as:

solc-bench run \
    --solc /path/to/solidity/build_develop/solc/solc \
    --benchmark-dir ./benchmark_data \
    --pipeline ir \
    --no-optimize \
    --tags med \
    --iterations 5 \
    -o dev-ir.json

clonker

Thanks! A few comments regarding things I noticed when doing a quick pass. I'll want to do a more thorough one later.

clonker · 2026-06-14T12:19:01Z

+            if pipeline == "ir-ethdebug":
+                # ETHDebug program output does not support the optimizer yet,
+                # so this pipeline always compiles unoptimized IR.
+                runs.append(
+                    (
+                        pipeline,
+                        resolve_solc_settings("ir", True, ethdebug=True),
+                        True,
+                    )
+                )


this'll silently change behavior once it does support it, doesn't it? wouldn't it be better to just throw if ethdebug is requested but nooptimize is False?

yeah, makes sense to raise if requested without no_optimize

clonker · 2026-06-14T12:20:33Z

            self.results, self.solc_version, self.iterations
        )
-        result_path = self.output_dir / "bench-results.json"
+        result_path = self.output_file or self.output_dir / "bench-results.json"


is there anything that relies on the output file being named bench-results.json?

No :) introducing DEFAULT_RESULT_FILENAME, but also considering --output-file (if used)

clonker · 2026-06-14T12:22:01Z

+    if args.output_file and args.output_dir == ".":
+        output_dir = str(Path(args.output_file).parent)


doesn't this break or at least lead to inconsistent behavior if i request an output dir like ./subdir?

Yeah, good catch. Fixing it.

clonker · 2026-06-14T12:24:21Z

+
+
+def _result_label_and_path(spec):
+    if "=" in spec:


this seems brittle. what precisely is the spec? can we find a better variable name? what if the path or the label contain = themselves?

spec is a positional results arg, either PATH or LABEL=PATH, so I renamed it result_arg and added a doc spelling that out. The = ambiguity was a bug: a path that happened to contain = got silently mis-split into a bogus label + path. Now I only treat = as the separator when the arg isn't an existing file, so real paths with = load fine; a label still can't contain =, which I documented.

clonker · 2026-06-14T12:30:30Z

        raise FileExistsError(
            f"results file already exists: {result_path} "
-            "(remove it or choose a different --output-dir)"
+            "(remove it or choose a different output path)"


could we reference to the actual args here instead of removing --output-dir?

yes, thanks

clonker · 2026-06-14T12:32:28Z

+            "Compare two named datasets as TARGET vs REF. Repeat to compare "
+            "multiple pairs. Single-pipeline result files use their inferred "
+            "label; multi-pipeline files expose LABEL:PIPELINE datasets."


so now we have two ways of referencing datasets, via the filename, via label:pipeline. does label:pipeline still need a path qualifier?

No path needed, since the path is given once as the positional arg (optionally LABEL=PATH), and --vs just references the name it got. Reworded the help to say that.

clonker · 2026-06-14T12:33:44Z

        help="Compare two pipelines in one file: TARGET:REF (e.g. ir:evmasm)",
    )
+    cmp_parser.add_argument(
+        "--vs",


compare a.json b.json c.json --vs a b loads c.json and ignores it, there's no validation

Fixed, thanks

msooseth · 2026-06-17T12:27:04Z

Something seems off here? I added this Welch t-test with AI (my bad, sorry), but this:

                   peak_rss       835728 ± 26586 MiB   757408 ± 40316 MiB   +10.34%  ~noise

seems off? 10% change being "noise" seems wrong. Do you know what could be going on there?

msooseth · 2026-06-17T12:37:32Z

                   peak_rss       835728 ± 26586 MiB   757408 ± 40316 MiB   +10.34%  ~noise
seems off? 10% change being "noise" seems wrong. Do you know what could be going on there?

Thinking a bit more about this -- maybe we should print the MEDIAN not the AVERAGE, and then the difference should also be over the MEDIAN. I have a feeling we'd get better values? What's n by default right now? Would be best to make it an odd number, like 3 or 5, or 7.

djolertrk · 2026-06-18T12:15:32Z

                   peak_rss       835728 ± 26586 MiB   757408 ± 40316 MiB   +10.34%  ~noise
seems off? 10% change being "noise" seems wrong. Do you know what could be going on there?
Thinking a bit more about this -- maybe we should print the MEDIAN not the AVERAGE, and then the difference should also be over the MEDIAN. I have a feeling we'd get better values? What's n by default right now? Would be best to make it an odd number, like 3 or 5, or 7.

Hmmm, yeah, good catch. What was weird here is that the table was showing medians, but the Welch t-test was testing means. So the +10.34% number wasn’t actually the value used for the ~noise decision. For that sample, the means are 821085 vs 782778, so the mean delta is +4.89%, and Welch gives p ~= 0.12, so ~noise is reasonable there. n defaults to 3, and this run used 5. I changed the comparison output to show mean values/deltas so it matches the t-test, and I made the output say n=... so this is less confusing.

Comparison: feat-ir vs dev-ir

Benchmark          Metric         feat-ir              dev-ir               Δ%      winner
-----------------  -------------  -------------------  -------------------  ------  ------
prb-math-4.1.1     cpu_time       7.4721s ± 0.1706s    7.2767s ± 0.1161s    +2.69%  ~noise
                   creation_size  592,270 ± 0          592,270 ± 0          0.0%    tie
                   peak_rss       1277715 ± 18758 MiB  1271731 ± 89882 MiB  +0.47%  ~noise
                   runtime_size   589,640 ± 0          589,640 ± 0          0.0%    tie
                   wall_time      7.6527s ± 0.2776s    7.3529s ± 0.1567s    +4.08%  ~noise

forge-std-1.16.1   cpu_time       7.4001s ± 0.1564s    7.5066s ± 0.0777s    -1.42%  ~noise
                   creation_size  725,280 ± 0          725,280 ± 0          0.0%    tie
                   peak_rss       821085 ± 26586 MiB   782778 ± 40316 MiB   +4.89%  ~noise
                   runtime_size   698,851 ± 0          698,851 ± 0          0.0%    tie
                   wall_time      7.4746s ± 0.1986s    7.6138s ± 0.0980s    -1.83%  ~noise

morpho-blue-1.0.0  cpu_time       10.3839s ± 0.2164s   10.2912s ± 0.1335s   +0.9%   ~noise
                   creation_size  1,261,421 ± 0        1,261,421 ± 0        0.0%    tie
                   peak_rss       930016 ± 11855 MiB   890010 ± 40382 MiB   +4.5%   ~noise
                   runtime_size   1,251,246 ± 0        1,251,246 ± 0        0.0%    tie
                   wall_time      10.7231s ± 0.6548s   10.3930s ± 0.1563s   +3.18%  ~noise

solmate-6          cpu_time       4.2701s ± 0.1264s    4.1882s ± 0.0604s    +1.96%  ~noise
                   creation_size  540,758 ± 0          540,758 ± 0          0.0%    tie
                   peak_rss       385434 ± 4444 MiB    373802 ± 1400 MiB    +3.11%  dev-ir
                   runtime_size   523,577 ± 0          523,577 ± 0          0.0%    tie
                   wall_time      4.3283s ± 0.1784s    4.2186s ± 0.0587s    +2.6%   ~noise

Please check it.

bench: Compare ETHDebug overhead across branches

17d8e2a

djolertrk added 2 commits June 12, 2026 01:25

bench: Compare arbitrary result datasets

aaf6ee7

clonker reviewed Jun 14, 2026

View reviewed changes

djolertrk added 7 commits June 14, 2026 16:22

bench: Require --no-optimize for the ir-ethdebug pipeline

335cc11

bench: Name the offending flag when the results file exists

2d1f45b

bench: Co-locate logs with -o only when --output-dir is unset

57a55c1

bench: Centralize the default bench-results.json filename

5be762b

bench: Disambiguate LABEL=PATH compare specs from real paths

669fa60

bench: Clarify how --vs references named datasets

19a8bf3

bench: Reject unused result files in --vs compares

afd1125

djolertrk requested a review from clonker June 15, 2026 11:25

djolertrk added 2 commits June 17, 2026 16:02

bench: Judge significance by a df-aware p-value, not a fixed |t|

db36ffa

bench: Show comparison values as mean to match the t-test

a1cd5a9

bench: Normalize peak RSS units on macOS

1272e49

		if args.output_file and args.output_dir == ".":
		output_dir = str(Path(args.output_file).parent)

Conversation

djolertrk commented Jun 11, 2026

Uh oh!

djolertrk commented Jun 11, 2026

Uh oh!

djolertrk commented Jun 11, 2026

Uh oh!

clonker commented Jun 11, 2026

Uh oh!

djolertrk commented Jun 12, 2026

Uh oh!

clonker left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

msooseth commented Jun 17, 2026

Uh oh!

msooseth commented Jun 17, 2026

Uh oh!

djolertrk commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants