
test(workflows): cover performance benchmark workflow #1226

Open
mldangelo-oai wants to merge 1 commit into main from test/perf-workflow-guardrails

@mldangelo-oai
Contributor

Summary

  • add YAML parser coverage for the performance benchmark workflow
  • assert base/head worktree comparison, advisory regression reporting, base-missing fallback, and same-repo PR comment behavior
  • allow the workflow guardrail test in reduced Python lanes
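Assertions like the ones listed above can be checked without running the workflow: parse the YAML once and inspect its structure. The example below is a minimal sketch, assuming the workflow has already been loaded into a dict (for instance with PyYAML's `yaml.safe_load`). The job name `benchmarks`, the step names, and the `compare.py` command are hypothetical and are not taken from the workflow in this PR; the base-missing fallback is not covered.

```python
from typing import Any

# Hypothetical guard expression; fork PRs receive a read-only GITHUB_TOKEN,
# so only same-repo PRs should attempt to post a comment.
SAME_REPO_GUARD = "github.event.pull_request.head.repo.full_name == github.repository"

# Hypothetical parsed workflow. In a real test this dict would come from
# yaml.safe_load() on the workflow file; job and step names here are made up.
WORKFLOW: dict[Any, Any] = {
    True: {"pull_request": {}},  # PyYAML (YAML 1.1) loads a bare `on:` key as True
    "jobs": {
        "benchmarks": {
            "steps": [
                {"name": "Prepare base worktree", "run": 'git worktree add ../base "$BASE_SHA"'},
                {"name": "Compare benchmarks", "run": "python compare.py", "continue-on-error": True},
                {"name": "Comment on PR", "if": SAME_REPO_GUARD},
            ]
        }
    },
}


def check_perf_workflow(workflow: dict[Any, Any]) -> list[str]:
    """Return guardrail violations for a parsed workflow (an empty list means it passes)."""
    problems: list[str] = []
    # Accept the trigger block under either the string key or PyYAML's boolean key.
    triggers = workflow.get("on", workflow.get(True, {}))
    if "pull_request" not in triggers:
        problems.append("workflow does not run on pull_request")

    steps = workflow.get("jobs", {}).get("benchmarks", {}).get("steps", [])
    by_name = {step.get("name"): step for step in steps}

    # Base/head comparison: some step must create a separate worktree for the base commit.
    if not any("git worktree add" in step.get("run", "") for step in steps):
        problems.append("no base worktree is created for the comparison")
    # Advisory reporting: a detected regression should not fail the job.
    if by_name.get("Compare benchmarks", {}).get("continue-on-error") is not True:
        problems.append("benchmark comparison is not advisory")
    # Same-repo PR comment behavior: the comment step must carry the guard.
    if SAME_REPO_GUARD not in by_name.get("Comment on PR", {}).get("if", ""):
        problems.append("PR comment step is not limited to same-repo PRs")
    return problems


print(check_perf_workflow(WORKFLOW))  # prints: []
```

Because this style of check only inspects parsed keys, it needs no GitHub runner and finishes almost instantly; the trade-off is that it cannot catch a step whose behavior changes while its keys stay the same.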

Validation

  • UV_CACHE_DIR=/tmp/modelaudit-uv-cache PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_perf_workflow.py -q
  • UV_CACHE_DIR=/tmp/modelaudit-uv-cache uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • UV_CACHE_DIR=/tmp/modelaudit-uv-cache uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • UV_CACHE_DIR=/tmp/modelaudit-uv-cache uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/

Notes

  • A full non-slow pytest run was attempted twice under xdist. Both runs stopped on unrelated timing or state failures (pre-existing or specific to the local environment), and each failing test passed when run in isolation: tests/test_debug_command.py::TestDebugCommand::test_debug_command_is_fast and packages/modelaudit-picklescan/tests/test_api.py::test_scan_file_leaves_hidden_pickle_like_zip_without_pytorch_metadata_unrecognized.

@github-actions
Contributor

github-actions Bot commented May 4, 2026

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 621.33ms -> 628.02ms (+1.1%).
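The status line sorts every benchmark into one of five buckets. A minimal sketch of such a classification follows, assuming the change is the relative difference between head and base medians and that the 15% threshold applies symmetrically (slower than +15% is a regression, faster than -15% is an improvement). The function name `classify` and these exact rules are assumptions, not the workflow's actual comparison script.

```python
from collections import Counter

THRESHOLD_PCT = 15.0  # regression threshold reported in the comment


def classify(
    baseline: dict[str, float],
    current: dict[str, float],
    threshold_pct: float = THRESHOLD_PCT,
) -> dict[str, str]:
    """Map each benchmark to a status under the assumed rules described above."""
    statuses: dict[str, str] = {}
    for name in baseline.keys() | current.keys():
        if name not in baseline:
            statuses[name] = "new"  # only measured on head
        elif name not in current:
            statuses[name] = "missing"  # only measured on base
        else:
            change_pct = (current[name] - baseline[name]) / baseline[name] * 100
            if change_pct > threshold_pct:
                statuses[name] = "regression"
            elif change_pct < -threshold_pct:
                statuses[name] = "improved"
            else:
                statuses[name] = "stable"
    return statuses


# Two medians from the table (milliseconds) plus two synthetic entries.
base = {"nested_hex": 0.1116, "mixed-model-repository": 250.48, "old_only": 1.0}
head = {"nested_hex": 0.1058, "mixed-model-repository": 257.19, "new_only": 2.0}
print(sorted(Counter(classify(base, head).values()).items()))
# prints: [('missing', 1), ('new', 1), ('stable', 2)]
```

Under these assumed rules a change of exactly 15% counts as stable; the comment does not say whether the real comparison is strict or inclusive at the boundary.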

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
|---|---|---|---|---|---|---|---|---|
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] | nested_hex | 130 B | 1 | 111.6us | 105.8us | -5.2% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] | nested_base64 | 98 B | 1 | 102.4us | 105.3us | +2.8% | stable |
| mixed-model-repository | tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository | release-candidate | 547.3 KiB | 32 | 250.48ms | 257.19ms | +2.7% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] | nested_raw | 78 B | 1 | 100.3us | 102.6us | +2.3% | stable |
| chunked-upload-stream | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream | chunked_stream | 278.2 KiB | 1 | 17.40ms | 17.76ms | +2.1% | stable |
| suspicious-pickle-intake | tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake | suspicious-intake | 183.8 KiB | 4 | 73.81ms | 73.03ms | -1.0% | stable |
| single-checkpoint-preflight | tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load | single_checkpoint.pkl | 183.0 KiB | 1 | 37.32ms | 37.53ms | +0.6% | stable |
| direct-malicious-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload | malicious_reduce | 52 B | 1 | 347.6us | 349.6us | +0.6% | stable |
| warm-cache-rescan | tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan | release-candidate | 547.3 KiB | 32 | 33.42ms | 33.29ms | -0.4% | stable |
| clean-training-checkpoint | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint | safe_large | 278.2 KiB | 1 | 14.46ms | 14.52ms | +0.4% | stable |
| padded-multi-stream-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload | multi_stream_padded | 4.1 KiB | 1 | 418.2us | 417.0us | -0.3% | stable |
| duplicate-heavy-registry | tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot | registry-snapshot | 915.2 KiB | 13 | 193.35ms | 193.62ms | +0.1% | stable |
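The aggregate line appears to be the sum of the per-benchmark medians: adding up the Baseline and Current columns of the table gives about 621.32 ms and 628.02 ms, which matches the reported 621.33ms -> 628.02ms to within display rounding. The sketch below reproduces that arithmetic from the table, assuming Change is (Current - Baseline) / Baseline. Recomputing individual rows from the rounded values can drift by 0.1 point (suspicious-intake gives about -1.1% against the reported -1.0%), which suggests the workflow works from unrounded medians.

```python
# (target, baseline median, current median) in milliseconds, copied from the
# table above; microsecond values have been divided by 1000.
ROWS = [
    ("nested_hex", 0.1116, 0.1058),
    ("nested_base64", 0.1024, 0.1053),
    ("release-candidate", 250.48, 257.19),
    ("nested_raw", 0.1003, 0.1026),
    ("chunked_stream", 17.40, 17.76),
    ("suspicious-intake", 73.81, 73.03),
    ("single_checkpoint.pkl", 37.32, 37.53),
    ("malicious_reduce", 0.3476, 0.3496),
    ("release-candidate (warm rescan)", 33.42, 33.29),
    ("safe_large", 14.46, 14.52),
    ("multi_stream_padded", 0.4182, 0.4170),
    ("registry-snapshot", 193.35, 193.62),
]


def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


baseline_total = sum(baseline for _, baseline, _ in ROWS)
current_total = sum(current for _, _, current in ROWS)
print(
    f"{baseline_total:.2f}ms -> {current_total:.2f}ms "
    f"({pct_change(baseline_total, current_total):+.1f}%)"
)
# prints: 621.32ms -> 628.02ms (+1.1%)
```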
