perf: restore realistic benchmark suite #1223

Merged

mldangelo-oai merged 2 commits into main from mdangelo/codex/restore-realistic-benchmarks on May 3, 2026
Conversation

@mldangelo-oai
Contributor

Summary

  • restore the PR benchmark workflow and report generator
  • replace helper-level scan probes with persona-oriented end-to-end workloads
  • condense docs/agents/performance-audit.md into a current maintainer guide
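The "persona-oriented end-to-end workloads" above exercise a whole scan flow (build a synthetic repository, scan it, measure wall time) rather than probing individual helpers. A minimal, self-contained sketch of that idea follows; the repository layout, the `build_release_candidate_repo`/`scan_repository` names, and the opcode-byte heuristic are all hypothetical stand-ins, not the actual modelaudit scanner (the real tests use the pytest-benchmark `benchmark` fixture instead of manual timing):

```python
import pickle
import tempfile
import time
from pathlib import Path


def build_release_candidate_repo(root: Path, n_shards: int = 8) -> None:
    """Lay out a tiny synthetic model repository: a config plus pickle shards."""
    (root / "config.json").write_text('{"model_type": "demo"}')
    for i in range(n_shards):
        payload = {"layer": i, "weights": list(range(64))}
        (root / f"shard_{i}.pkl").write_bytes(pickle.dumps(payload))


def scan_repository(root: Path) -> tuple[int, int]:
    """Stand-in scanner: walk every file and apply a cheap opcode-byte check."""
    scanned = suspicious = 0
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        scanned += 1
        # 0x93 is the STACK_GLOBAL pickle opcode; a real scanner walks the
        # opcode stream properly, this byte check is only a crude proxy.
        if path.suffix == ".pkl" and b"\x93" in path.read_bytes():
            suspicious += 1
    return scanned, suspicious


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_release_candidate_repo(root)
    start = time.perf_counter()
    scanned, suspicious = scan_repository(root)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"scanned {scanned} files in {elapsed_ms:.2f}ms ({suspicious} suspicious)")
```

The payoff of this style is that the measured number tracks what a user actually experiences (directory walk, I/O, and analysis together), so regressions in any layer show up in one figure.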

Validation

  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run --with pytest-benchmark pytest tests/benchmarks/test_scan_benchmarks.py tests/benchmarks/test_picklescan_benchmarks.py --benchmark-json=/tmp/modelaudit-benchmark-head.json -q
  • uv run python scripts/benchmark_report.py --current /tmp/modelaudit-benchmark-head.json --summary-file /tmp/modelaudit-benchmark-summary.md
  • uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1
  • npx prettier --check .github/workflows/perf.yml .github/workflows/README.md scripts/README.md docs/agents/performance-audit.md CHANGELOG.md
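The second command above feeds a pytest-benchmark JSON file to `scripts/benchmark_report.py` to produce a markdown summary. A hedged sketch of what such a generator does is below; the sample data and the `summarize` helper are illustrative only (pytest-benchmark's JSON exposes a top-level `benchmarks` list whose entries carry a `stats` dict with `median`, `mean`, and `rounds`, but the real script's logic may differ):

```python
import json

# Hypothetical sample mimicking the pytest-benchmark JSON shape.
raw = json.dumps({
    "benchmarks": [
        {"fullname": "tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository",
         "stats": {"median": 0.25456, "mean": 0.26832, "rounds": 3}},
        {"fullname": "tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload",
         "stats": {"median": 0.0003631, "mean": 0.0003586, "rounds": 6}},
    ]
})


def summarize(report_json: str) -> str:
    """Render a benchmark JSON report as a markdown table, slowest first."""
    data = json.loads(report_json)
    rows = sorted(data["benchmarks"], key=lambda b: b["stats"]["median"], reverse=True)
    lines = ["| Benchmark | Median | Rounds |", "| --- | --- | --- |"]
    for b in rows:
        stats = b["stats"]
        # pytest-benchmark stores times in seconds; report them in ms.
        lines.append(f"| {b['fullname']} | {stats['median'] * 1000:.2f}ms | {stats['rounds']} |")
    return "\n".join(lines)


summary = summarize(raw)
print(summary)
```

Writing the result to a summary file (as `--summary-file` does) lets the CI workflow attach it to the PR as a comment, which is where the table below comes from.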


github-actions Bot commented May 3, 2026

Workflow run and artifacts

Base branch does not include the benchmark suite yet; showing current results only.

Performance Benchmarks

Captured 12 benchmark results across 10 workloads.
Aggregate median (sum of per-benchmark medians): 637.55ms.

Slowest benchmarks:

  • tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository at 254.56ms (mixed-model-repository, release-candidate, size=547.3 KiB, files=32)
  • tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot at 196.85ms (duplicate-heavy-registry, registry-snapshot, size=915.2 KiB, files=13)
  • tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake at 79.77ms (suspicious-pickle-intake, suspicious-intake, size=183.8 KiB, files=4)
| Workload | Benchmark | Target | Size | Files | Median | Mean | Rounds |
| --- | --- | --- | --- | --- | --- | --- | --- |
| mixed-model-repository | tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository | release-candidate | 547.3 KiB | 32 | 254.56ms | 268.32ms | 3 |
| duplicate-heavy-registry | tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot | registry-snapshot | 915.2 KiB | 13 | 196.85ms | 196.66ms | 3 |
| suspicious-pickle-intake | tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake | suspicious-intake | 183.8 KiB | 4 | 79.77ms | 93.68ms | 3 |
| single-checkpoint-preflight | tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load | single_checkpoint.pkl | 183.0 KiB | 1 | 37.72ms | 37.82ms | 3 |
| warm-cache-rescan | tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan | release-candidate | 547.3 KiB | 32 | 34.30ms | 34.27ms | 3 |
| chunked-upload-stream | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream | chunked_stream | 278.2 KiB | 1 | 18.13ms | 18.66ms | 6 |
| clean-training-checkpoint | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint | safe_large | 278.2 KiB | 1 | 15.12ms | 15.16ms | 6 |
| padded-multi-stream-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload | multi_stream_padded | 4.1 KiB | 1 | 413.1us | 411.9us | 6 |
| direct-malicious-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload | malicious_reduce | 52 B | 1 | 363.1us | 358.6us | 6 |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] | nested_hex | 130 B | 1 | 109.6us | 109.8us | 6 |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] | nested_raw | 78 B | 1 | 106.1us | 107.5us | 6 |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] | nested_base64 | 98 B | 1 | 106.0us | 109.4us | 6 |
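The 637.55ms headline figure appears to be the sum of the per-benchmark medians rather than a median of medians (the true median of these twelve values would be around 16ms). Summing the Median column above reproduces it:

```python
# Per-benchmark medians read off the table above, in milliseconds.
medians_ms = [254.56, 196.85, 79.77, 37.72, 34.30, 18.13,
              15.12, 0.4131, 0.3631, 0.1096, 0.1061, 0.1060]

aggregate_ms = sum(medians_ms)
print(f"aggregate: {aggregate_ms:.2f}ms")  # → aggregate: 637.55ms
```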

mldangelo-oai merged commit 9c36efb into main on May 3, 2026.
27 checks passed
mldangelo-oai deleted the mdangelo/codex/restore-realistic-benchmarks branch on May 3, 2026 at 19:44.