perf: restore realistic benchmark suite #1223

Merged

mldangelo-oai merged 2 commits into main from mdangelo/codex/restore-realistic-benchmarks on May 3, 2026
Conversation

@mldangelo-oai
Contributor

Summary

  • restore the PR benchmark workflow and report generator
  • replace helper-level scan probes with persona-oriented end-to-end workloads
  • condense docs/agents/performance-audit.md into a current maintainer guide
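The "persona-oriented end-to-end workloads" above exercise a whole scan flow (build a synthetic repository, scan it, measure wall time) rather than probing individual helpers. A minimal, self-contained sketch of that idea follows; the repository layout, the `build_release_candidate_repo`/`scan_repository` names, and the opcode-byte heuristic are all hypothetical stand-ins, not the actual modelaudit scanner (the real tests use the pytest-benchmark `benchmark` fixture instead of manual timing):

```python
import pickle
import tempfile
import time
from pathlib import Path


def build_release_candidate_repo(root: Path, n_shards: int = 8) -> None:
    """Lay out a tiny synthetic model repository: a config plus pickle shards."""
    (root / "config.json").write_text('{"model_type": "demo"}')
    for i in range(n_shards):
        payload = {"layer": i, "weights": list(range(64))}
        (root / f"shard_{i}.pkl").write_bytes(pickle.dumps(payload))


def scan_repository(root: Path) -> tuple[int, int]:
    """Stand-in scanner: walk every file and apply a cheap opcode-byte check."""
    scanned = suspicious = 0
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        scanned += 1
        # 0x93 is the STACK_GLOBAL pickle opcode; a real scanner walks the
        # opcode stream properly, this byte check is only a crude proxy.
        if path.suffix == ".pkl" and b"\x93" in path.read_bytes():
            suspicious += 1
    return scanned, suspicious


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_release_candidate_repo(root)
    start = time.perf_counter()
    scanned, suspicious = scan_repository(root)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"scanned {scanned} files in {elapsed_ms:.2f}ms ({suspicious} suspicious)")
```

The payoff of this style is that the measured number tracks what a user actually experiences (directory walk, I/O, and analysis together), so regressions in any layer show up in one figure.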

Validation

  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run --with pytest-benchmark pytest tests/benchmarks/test_scan_benchmarks.py tests/benchmarks/test_picklescan_benchmarks.py --benchmark-json=/tmp/modelaudit-benchmark-head.json -q
  • uv run python scripts/benchmark_report.py --current /tmp/modelaudit-benchmark-head.json --summary-file /tmp/modelaudit-benchmark-summary.md
  • uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1
  • npx prettier --check .github/workflows/perf.yml .github/workflows/README.md scripts/README.md docs/agents/performance-audit.md CHANGELOG.md
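The second command above feeds a pytest-benchmark JSON file to `scripts/benchmark_report.py` to produce a markdown summary. A hedged sketch of what such a generator does is below; the sample data and the `summarize` helper are illustrative only (pytest-benchmark's JSON exposes a top-level `benchmarks` list whose entries carry a `stats` dict with `median`, `mean`, and `rounds`, but the real script's logic may differ):

```python
import json

# Hypothetical sample mimicking the pytest-benchmark JSON shape.
raw = json.dumps({
    "benchmarks": [
        {"fullname": "tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository",
         "stats": {"median": 0.25456, "mean": 0.26832, "rounds": 3}},
        {"fullname": "tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload",
         "stats": {"median": 0.0003631, "mean": 0.0003586, "rounds": 6}},
    ]
})


def summarize(report_json: str) -> str:
    """Render a benchmark JSON report as a markdown table, slowest first."""
    data = json.loads(report_json)
    rows = sorted(data["benchmarks"], key=lambda b: b["stats"]["median"], reverse=True)
    lines = ["| Benchmark | Median | Rounds |", "| --- | --- | --- |"]
    for b in rows:
        stats = b["stats"]
        # pytest-benchmark stores times in seconds; report them in ms.
        lines.append(f"| {b['fullname']} | {stats['median'] * 1000:.2f}ms | {stats['rounds']} |")
    return "\n".join(lines)


summary = summarize(raw)
print(summary)
```

Writing the result to a summary file (as `--summary-file` does) lets the CI workflow attach it to the PR as a comment, which is where the table below comes from.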


github-actions Bot commented May 3, 2026

Workflow run and artifacts

Base branch does not include the benchmark suite yet; showing current results only.

Performance Benchmarks

Captured 12 benchmark results across 10 workloads.
Aggregate median (sum of per-benchmark medians): 637.55ms.

Slowest benchmarks:

  • tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository at 254.56ms (mixed-model-repository, release-candidate, size=547.3 KiB, files=32)
  • tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot at 196.85ms (duplicate-heavy-registry, registry-snapshot, size=915.2 KiB, files=13)
  • tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake at 79.77ms (suspicious-pickle-intake, suspicious-intake, size=183.8 KiB, files=4)
| Workload | Benchmark | Target | Size | Files | Median | Mean | Rounds |
| --- | --- | --- | --- | --- | --- | --- | --- |
| mixed-model-repository | tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository | release-candidate | 547.3 KiB | 32 | 254.56ms | 268.32ms | 3 |
| duplicate-heavy-registry | tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot | registry-snapshot | 915.2 KiB | 13 | 196.85ms | 196.66ms | 3 |
| suspicious-pickle-intake | tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake | suspicious-intake | 183.8 KiB | 4 | 79.77ms | 93.68ms | 3 |
| single-checkpoint-preflight | tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load | single_checkpoint.pkl | 183.0 KiB | 1 | 37.72ms | 37.82ms | 3 |
| warm-cache-rescan | tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan | release-candidate | 547.3 KiB | 32 | 34.30ms | 34.27ms | 3 |
| chunked-upload-stream | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream | chunked_stream | 278.2 KiB | 1 | 18.13ms | 18.66ms | 6 |
| clean-training-checkpoint | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint | safe_large | 278.2 KiB | 1 | 15.12ms | 15.16ms | 6 |
| padded-multi-stream-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload | multi_stream_padded | 4.1 KiB | 1 | 413.1us | 411.9us | 6 |
| direct-malicious-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload | malicious_reduce | 52 B | 1 | 363.1us | 358.6us | 6 |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] | nested_hex | 130 B | 1 | 109.6us | 109.8us | 6 |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] | nested_raw | 78 B | 1 | 106.1us | 107.5us | 6 |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] | nested_base64 | 98 B | 1 | 106.0us | 109.4us | 6 |
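The 637.55ms headline figure appears to be the sum of the per-benchmark medians rather than a median of medians (the true median of these twelve values would be around 16ms). Summing the Median column above reproduces it:

```python
# Per-benchmark medians read off the table above, in milliseconds.
medians_ms = [254.56, 196.85, 79.77, 37.72, 34.30, 18.13,
              15.12, 0.4131, 0.3631, 0.1096, 0.1061, 0.1060]

aggregate_ms = sum(medians_ms)
print(f"aggregate: {aggregate_ms:.2f}ms")  # → aggregate: 637.55ms
```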

mldangelo-oai merged commit 9c36efb into main on May 3, 2026.
27 checks passed
mldangelo-oai deleted the mdangelo/codex/restore-realistic-benchmarks branch on May 3, 2026 at 19:44.