Add centralized tolerance registry for backend test harness

## Summary

ExecuTorch's numerical accuracy validation is currently fragmented: tolerances are scattered across 50+ files, checked through 8+ different comparison mechanisms, with no central standard. This issue tracks the implementation of a centralized tolerance registry for the backend test harness.

See the full cross-backend audit in [this comment on #13347](https://github.com/pytorch/executorch/issues/13347#issuecomment-4536276554).

## Problem

- Each backend defines its own tolerance defaults (ranging from `atol=1e-8` to `atol=1e-1`)
- At least 8 different comparison mechanisms are used across backends
- Only 4 of 15+ backends use the shared `Tester` harness
- FP16/BF16 handling has no shared convention
- `ErrorStatistics` already captures accuracy metrics (SNR, MAE, max error), but these metrics aren't tracked over time

## Proposal

Add a centralized tolerance registry to `backends/test/harness/` that documents backend defaults, provides a lookup API, and integrates with the existing `Tester` harness — without breaking any existing tests.

## Implementation Plan

### Phase 1 — Tolerance registry (purely additive)

- [ ] Create `backends/test/harness/tolerance.py`
  - `ToleranceConfig` dataclass (atol, rtol, qtol)
  - `BACKEND_TOLERANCES` registry with defaults extracted from current tests
  - `get_tolerance(backend, dtype, quantized, op)` lookup with fallback chain: op-specific → dtype-specific → backend default → global default
- [ ] Create `backends/test/harness/tests/test_tolerance.py` with unit tests
- [ ] Separate FP16 and BF16 as distinct dtype keys
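
To make the fallback chain concrete, here is a minimal sketch of what the Phase 1 module could look like. Only the names `ToleranceConfig`, `BACKEND_TOLERANCES`, and `get_tolerance` come from the plan above; the `BackendTolerances` container, the `"example_backend"` entry, the `(dtype, quantized)` key shape, and every numeric value are placeholders assumed for illustration, not defaults extracted from real tests.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ToleranceConfig:
    # Placeholder values; real defaults would come from the audit.
    atol: float = 1e-3
    rtol: float = 1e-3
    qtol: int = 0


@dataclass
class BackendTolerances:
    # Hypothetical container, not part of the proposal text.
    default: ToleranceConfig
    # Assumption: dtype-specific entries are keyed by (dtype name, quantized).
    by_dtype: Dict[Tuple[str, bool], ToleranceConfig] = field(default_factory=dict)
    by_op: Dict[str, ToleranceConfig] = field(default_factory=dict)


GLOBAL_DEFAULT = ToleranceConfig()

# Illustrative registry with FP16 and BF16 as distinct dtype keys.
BACKEND_TOLERANCES: Dict[str, BackendTolerances] = {
    "example_backend": BackendTolerances(
        default=ToleranceConfig(atol=1e-4, rtol=1e-4),
        by_dtype={
            ("float16", False): ToleranceConfig(atol=1e-2, rtol=1e-2),
            ("bfloat16", False): ToleranceConfig(atol=5e-2, rtol=5e-2),
        },
        by_op={"softmax": ToleranceConfig(atol=1e-3, rtol=1e-3)},
    ),
}


def get_tolerance(
    backend: str,
    dtype: str = "float32",
    quantized: bool = False,
    op: Optional[str] = None,
) -> ToleranceConfig:
    """Resolve op-specific, then dtype-specific, then backend, then global."""
    entry = BACKEND_TOLERANCES.get(backend)
    if entry is None:
        return GLOBAL_DEFAULT
    if op is not None and op in entry.by_op:
        return entry.by_op[op]
    if (dtype, quantized) in entry.by_dtype:
        return entry.by_dtype[(dtype, quantized)]
    return entry.default
```

With this shape, `get_tolerance("example_backend", dtype="float16")` resolves to the FP16 entry, while an unregistered backend falls through to `GLOBAL_DEFAULT`. How `quantized` should interact with the chain is an open design question; folding it into the dtype key is only one option.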

### Phase 2 — Integration with shared Tester harness

- [ ] Add optional `backend` param to `Tester.__init__()`
- [ ] `run_method_and_compare_outputs()` uses registry when atol/rtol not explicitly provided
- [ ] Fully backward-compatible — explicit values always win
- [ ] Update XNNPACK tester as proof of concept
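
The "explicit values always win" rule can be sketched as a small resolution helper. This is a standalone illustration, not the actual `Tester` code: `resolve_tolerances` and `REGISTRY_DEFAULTS` are hypothetical names, and the values are placeholders. It assumes `None` is used as the "not provided" sentinel, which may require changing numeric keyword defaults on the real method.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for a registry lookup: backend -> (atol, rtol).
REGISTRY_DEFAULTS: Dict[str, Tuple[float, float]] = {
    "example_backend": (1e-4, 1e-4),
}

# Used when no backend is given or the backend is not registered.
FALLBACK: Tuple[float, float] = (1e-3, 1e-3)


def resolve_tolerances(
    backend: Optional[str] = None,
    atol: Optional[float] = None,
    rtol: Optional[float] = None,
) -> Tuple[float, float]:
    """Return (atol, rtol), preferring explicitly passed values."""
    if backend is None:
        reg_atol, reg_rtol = FALLBACK
    else:
        reg_atol, reg_rtol = REGISTRY_DEFAULTS.get(backend, FALLBACK)
    # Compare against None rather than using `or`, so that an explicit
    # tolerance of 0.0 (exact match) is respected instead of overridden.
    resolved_atol = reg_atol if atol is None else atol
    resolved_rtol = reg_rtol if rtol is None else rtol
    return resolved_atol, resolved_rtol
```

Each argument is resolved independently, so a test that passes only `atol` still picks up the registry's `rtol`. Callers that pass neither a backend nor tolerances see the same fallback as before, which is what keeps the change backward-compatible in this sketch.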

### Phase 3 — Backend migration

- [ ] Wire ARM tester to registry
- [ ] Wire Cortex-M tester to registry
- [ ] Wire Samsung tester to registry

### Phase 4 — CI tracking

- [ ] Output `ErrorStatistics` as JSON CI artifacts
- [ ] Document tolerance decisions per backend
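
As a rough sketch of the JSON artifact step, the snippet below serializes one record per test as a JSON line. `ErrorStatsRecord` is a hypothetical stand-in, not the real `ErrorStatistics` class; its field names are assumptions based only on the metrics named in this issue (SNR, MAE, max error).

```python
import json
import math
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class ErrorStatsRecord:
    # Hypothetical fields; the real ErrorStatistics layout may differ.
    backend: str
    test_name: str
    snr_db: Optional[float]
    mean_abs_error: float
    max_abs_error: float


def _finite_or_none(value: Any) -> Any:
    # SNR can be infinite when outputs match exactly; inf and nan are not
    # valid JSON, so they are mapped to null here.
    if isinstance(value, float) and not math.isfinite(value):
        return None
    return value


def to_json_line(record: ErrorStatsRecord) -> str:
    """Serialize one record as a single line of strict JSON."""
    payload = {k: _finite_or_none(v) for k, v in asdict(record).items()}
    return json.dumps(payload, sort_keys=True, allow_nan=False)
```

Writing one line per test keeps artifacts easy to append and to diff between CI runs; the file name and upload mechanism are left open here.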

## Related Issues

- #13347 — [Delegate Testing] Determine tolerance / numerical accuracy validation strategy (umbrella discussion)
- #18424 — ATOL/RTOL configurable per-test for Cortex-M BundleIO (complementary, different layer)
- #14023 — ExecuTorch Test Infrastructure (umbrella tracking)

