99 changes: 99 additions & 0 deletions scripts/e2e_eval/README.md
@@ -91,6 +91,105 @@ uv run python scripts/e2e_eval/run_eval.py --retry-failed
| `--verbose` | off | Print stderr for failed models |
| `--continue` | off | Skip models with existing results |
| `--retry-failed [TYPE ...]` | — | Re-run failed models (implies `--continue`) |
| `--build-only` | off | Build with `--no-compile`, writing each stage's ONNX (no EP needed). Loops over the EP matrix when `--ep`/`--device` are omitted |

#### `--build-only` — Generate per-stage models (no EP required)

`--build-only` runs the config and build steps with `--no-compile` and writes the ONNX
file for each stage: `export.onnx`, `optimized.onnx`, and `quantized.onnx`. Because
compilation is skipped, this needs **no execution-provider hardware** and runs on any
CPU machine. The perf and accuracy phases are skipped.

When `--ep`/`--device` are **omitted**, every model is built once per EP/device combo
in the build-only matrix, each into its own `<ep>_<device>/` subdir:

| Label | EP | Device |
|---|---|---|
| `qnn_npu` | qnn | npu |
| `qnn_gpu` | qnn | gpu |
| `ov_cpu` | openvino | cpu |
| `ov_npu` | openvino | npu |
| `ov_gpu` | openvino | gpu |
| `mlas_cpu` | cpu (MLAS) | cpu |
| `dml_gpu` | dml | gpu |
| `vitisai_npu` | vitisai | npu |

Precision per combo follows the eval policy: NPU builds default to `w8a16`, CPU and GPU
builds omit the precision flag (winml auto), and native-quant EPs (VitisAI) are built
unquantized (`--no-quant`). When `--ep` or `--device` is pinned, a single build is
written directly into `<output-dir>/models/<slug>/`.

```bash
# Build all EP-matrix variants for P0 models (8 builds per model)
uv run python scripts/e2e_eval/run_eval.py --build-only --priority P0

# Pin a single EP/device (no matrix; writes directly to model dir)
uv run python scripts/e2e_eval/run_eval.py --build-only --hf-model microsoft/resnet-50 --ep qnn --device npu
```
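
The matrix, precision policy, and output layout described above can be pictured with a small Python sketch. This is an illustration, not the script's actual code: the names `Combo`, `BUILD_ONLY_MATRIX`, `precision_for`, and `build_dirs` are hypothetical, the subdirectory name is assumed to equal the Label column, and the native-quant set is assumed to contain only VitisAI because that is the only EP the text names.

```python
from pathlib import Path
from typing import List, NamedTuple, Optional


class Combo(NamedTuple):
    label: str   # assumed to be the subdirectory name, e.g. "qnn_npu"
    ep: str
    device: str


# The eight combos from the table above.
BUILD_ONLY_MATRIX = [
    Combo("qnn_npu", "qnn", "npu"),
    Combo("qnn_gpu", "qnn", "gpu"),
    Combo("ov_cpu", "openvino", "cpu"),
    Combo("ov_npu", "openvino", "npu"),
    Combo("ov_gpu", "openvino", "gpu"),
    Combo("mlas_cpu", "cpu", "cpu"),
    Combo("dml_gpu", "dml", "gpu"),
    Combo("vitisai_npu", "vitisai", "npu"),
]

# Assumption: VitisAI is the only native-quant EP mentioned in the README.
NATIVE_QUANT_EPS = {"vitisai"}


def precision_for(combo: Combo) -> Optional[str]:
    """Return "no-quant", "w8a16", or None (leave precision to winml auto)."""
    if combo.ep in NATIVE_QUANT_EPS:
        return "no-quant"   # built unquantized, even though the device is an NPU
    if combo.device == "npu":
        return "w8a16"
    return None             # CPU/GPU: omit the precision flag


def build_dirs(model_dir: str, ep: Optional[str] = None,
               device: Optional[str] = None) -> List[Path]:
    """One directory per matrix combo, or the model dir itself when pinned."""
    if ep is not None or device is not None:
        return [Path(model_dir)]
    return [Path(model_dir) / combo.label for combo in BUILD_ONLY_MATRIX]
```

With neither option given, `build_dirs("models/resnet")` yields eight subdirectories; pinning either option collapses that to the model directory alone.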

Composite models (multiple sub-components) are built into per-component subdirectories
under each EP subdir.

**Export dedup**: the `export.onnx` stage is EP/device-independent, so it is identical
across all matrix combos. It is stored once under `<model_dir>/_shared/export.onnx`
and removed from each `<ep>_<device>/` subdir, keeping only one copy on disk.
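
As a rough sketch of what this dedup step amounts to, the function below keeps one `export.onnx` under `_shared/` and deletes the per-combo copies. The function name `dedup_export` is hypothetical, and the sketch assumes the simple (non-composite) layout where `export.onnx` sits directly inside each `<ep>_<device>/` subdir; the real script may handle composite models and error cases differently.

```python
import shutil
from pathlib import Path
from typing import Optional


def dedup_export(model_dir: Path) -> Optional[Path]:
    """Keep a single export.onnx under _shared/ and remove per-combo copies.

    Returns the shared path, or None if no export.onnx was found at all.
    """
    shared = model_dir / "_shared" / "export.onnx"
    # Per-combo copies live one level down, e.g. qnn_npu/export.onnx.
    copies = sorted(
        p for p in model_dir.glob("*/export.onnx") if p.parent.name != "_shared"
    )
    if not copies and not shared.exists():
        return None
    if not shared.exists():
        # Promote the first copy to the shared location.
        shared.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(copies[0]), str(shared))
        copies = copies[1:]
    # The stage is EP/device-independent, so the remaining copies are redundant.
    for extra in copies:
        extra.unlink()
    return shared
```

Other stage files such as `optimized.onnx` and `quantized.onnx` are left untouched, since they do depend on the EP/device combo.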

#### Streaming upload to the Azure Artifacts feed (`--upload`)

Running the full matrix over many models fills the local disk fast. `--upload`
publishes each model's artifacts to the **`Modelkit`** Azure Artifacts feed
(Universal Package) as soon as its combos are built, then deletes the local copy —
so peak disk stays at roughly one model's matrix.

- **Auth**: uses `az login` (Entra ID); no PAT is needed. The script checks that the
`azure-devops` az extension is installed (adding it automatically if missing) and that
you are logged in. If you are not, it aborts so that the disk isn't silently filled.
- **Package**: a single package, `winml-cli-models`, with **one version per model**.
Each version is named `0.0.0-<run-stamp>-<model-slug>`, where the run-stamp is a date
(`YYYYMMDD`, default today), for example
`0.0.0-20260609-microsoft-resnet-50-image-classification`. The `0.0.0-` core keeps the
name valid SemVer 2.0; the stamp and slug form the pre-release segment. The shared
run-stamp prefix groups a batch together.
- A `build_only_uploads.json` manifest (version → run-stamp → combos → status) is
written in the output dir; it drives `--continue`.
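
The version naming scheme above can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: `slugify`, `package_version`, and `is_valid_semver_prerelease` are hypothetical helpers, and the slug rule (lowercase, every run of other characters becomes `-`) is a guess that happens to reproduce the example; the script's real slug logic may differ. The validity check covers only the `core-prerelease` form used here and ignores SemVer build metadata.

```python
import re
from datetime import date
from typing import Optional

# A SemVer 2.0 pre-release identifier: a number without leading zeros, or an
# identifier containing at least one letter or hyphen.
_PRERELEASE_ID = re.compile(r"0|[1-9]\d*|\d*[A-Za-z-][0-9A-Za-z-]*")
_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def slugify(name: str) -> str:
    """Hypothetical slug rule: lowercase, runs of other characters become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def package_version(model_slug: str, run_stamp: Optional[str] = None) -> str:
    """Build 0.0.0-<run-stamp>-<model-slug>; the stamp defaults to today."""
    stamp = run_stamp or date.today().strftime("%Y%m%d")
    return f"0.0.0-{stamp}-{model_slug}"


def is_valid_semver_prerelease(version: str) -> bool:
    """Check the MAJOR.MINOR.PATCH-prerelease shape (no build metadata)."""
    core, sep, pre = version.partition("-")
    if not sep or not _CORE.fullmatch(core):
        return False
    return all(_PRERELEASE_ID.fullmatch(part) for part in pre.split("."))


slug = slugify("microsoft/resnet-50 image-classification")
version = package_version(slug, run_stamp="20260609")
```

Because the stamp and slug are joined by hyphens rather than dots, the whole `20260609-microsoft-...` string is a single pre-release identifier, which is why the leading digits do not trip SemVer's rule against leading zeros in numeric identifiers.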

```bash
# Build the matrix and stream each model to the feed, deleting locals
uv run python scripts/e2e_eval/run_eval.py --build-only --upload --priority P0

# Resume an interrupted batch: same run-stamp + --continue skips models already
# uploaded (per the manifest) without rebuilding them.
uv run python scripts/e2e_eval/run_eval.py --build-only --upload --continue \
--run-stamp 20260609 --priority P0

# --upload-skip-existing: if the feed already has a version (e.g. manifest lost),
# treat the publish conflict as done and delete the local copy.
uv run python scripts/e2e_eval/run_eval.py --build-only --upload --upload-skip-existing

# Upload but keep local copies (debug)
uv run python scripts/e2e_eval/run_eval.py --build-only --upload --keep-local
```
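
The publish-then-delete behaviour behind `--upload` and `--keep-local` could look roughly like the sketch below. It is an assumption-laden illustration, not the script's code: `publish_command` and `upload_model` are hypothetical names, and the defaults are simply the flag defaults from the upload table in this section. The `az artifacts universal publish` options mirror the ones used by the download command in this README.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List


def publish_command(model_dir: Path, version: str, *,
                    org: str = "https://dev.azure.com/microsoft",
                    project: str = "windows.ai.toolkit",
                    feed: str = "Modelkit",
                    name: str = "winml-cli-models") -> List[str]:
    """Build the argv for publishing one model dir as a Universal Package."""
    return [
        "az", "artifacts", "universal", "publish",
        "--organization", org, "--project", project,
        "--scope", "project", "--feed", feed,
        "--name", name, "--version", version,
        "--path", str(model_dir),
    ]


def upload_model(model_dir: Path, version: str, *, keep_local: bool = False,
                 run: Callable = subprocess.run) -> bool:
    """Publish a model dir, then delete it locally unless keep_local is set.

    `run` is injectable so the flow can be exercised without calling az.
    """
    result = run(publish_command(model_dir, version))
    if result.returncode != 0:
        return False          # publish failed: keep the local copy
    if not keep_local:
        shutil.rmtree(model_dir)
    return True
```

Deleting only after a successful publish is what keeps peak disk usage near one model's matrix without risking lost artifacts. On Windows, `az` is a `.cmd` shim, so a real invocation may need the resolved path from `shutil.which("az")`.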

To download a single file from one model's package later, use `--file-filter`:

```bash
az artifacts universal download \
--organization https://dev.azure.com/microsoft --project windows.ai.toolkit \
--scope project --feed Modelkit --name winml-cli-models \
--version 0.0.0-20260609-microsoft-resnet-50-image-classification \
--path ./out --file-filter 'qnn_npu/quantized.onnx'
```

| Upload flag | Default | Description |
|---|---|---|
| `--upload` | off | Publish each model dir to the feed, then delete it locally |
| `--run-stamp` | today (`YYYYMMDD`) | Version prefix; pass the same stamp + `--continue` to resume |
| `--continue` | off | Skip models already uploaded for this run-stamp (no rebuild) |
| `--feed` | `Modelkit` | Azure Artifacts feed name |
| `--feed-org` | `https://dev.azure.com/microsoft` | Azure DevOps org URL |
| `--feed-project` | `windows.ai.toolkit` | Project for the project-scoped feed |
| `--package-name` | `winml-cli-models` | Universal Package name |
| `--keep-local` | off | Upload but do not delete the local dir |
| `--upload-skip-existing` | off | Treat an existing feed version as done (feed-based resume) |

### `generate_report.py` — Regenerate Reports
