Summary
ExecuTorch's numerical accuracy validation is currently fragmented — tolerances are scattered across 50+ files with 8+ different comparison mechanisms and no central standard. This issue tracks the implementation of a centralized tolerance registry for the backend test harness.
See the full cross-backend audit in this comment on #13347.
Problem
- Each backend defines its own tolerance defaults (ranging from atol=1e-8 to atol=1e-1)
- At least 8 different comparison mechanisms are used across backends
- Only 4 of 15+ backends use the shared Tester harness
- FP16/BF16 handling has no shared convention
- ErrorStatistics already captures accuracy metrics (SNR, MAE, max error) but they aren't tracked over time
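For readers unfamiliar with the metrics named above, here is a minimal sketch of how max error, MAE, and SNR are commonly defined. It is not the ErrorStatistics implementation; the function name `error_metrics` and the dictionary keys are hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import Sequence


def error_metrics(reference: Sequence[float], actual: Sequence[float]) -> dict:
    """Return max abs error, MAE, and SNR (dB) of `actual` against `reference`.

    Hypothetical helper for illustration only, not the ErrorStatistics API.
    """
    if len(reference) != len(actual) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")

    errors = [r - a for r, a in zip(reference, actual)]
    n = len(errors)

    # Mean power of the reference signal and of the error ("noise").
    signal_power = sum(r * r for r in reference) / n
    noise_power = sum(e * e for e in errors) / n

    if noise_power == 0.0:
        snr_db = math.inf  # exact match
    elif signal_power == 0.0:
        snr_db = -math.inf  # all-zero reference with nonzero error
    else:
        snr_db = 10.0 * math.log10(signal_power / noise_power)

    return {
        "max_abs_error": max(abs(e) for e in errors),
        "mae": sum(abs(e) for e in errors) / n,
        "snr_db": snr_db,
    }
```

Unlike a pass/fail atol/rtol check, these values are continuous, which is what makes them suitable for tracking over time.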
Proposal
Add a centralized tolerance registry to backends/test/harness/ that documents backend defaults, provides a lookup API, and integrates with the existing Tester harness — without breaking any existing tests.
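A rough sketch of what the registry and lookup could look like, using the names from the implementation plan (ToleranceConfig, BACKEND_TOLERANCES, get_tolerance). Everything else is an assumption made for illustration: the nested dict layout, the meaning of qtol, the string dtype keys (a real version would presumably key on torch.dtype), the way `quantized` enters the lookup, the `resolve_tolerances` helper, and all numeric values, which are placeholders rather than real backend defaults.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToleranceConfig:
    atol: float
    rtol: float
    qtol: int = 0  # assumed meaning: allowed error in quantization steps


# Placeholder values only; real defaults would be extracted from current tests.
GLOBAL_DEFAULT = ToleranceConfig(atol=1e-3, rtol=1e-3)

# Hypothetical layout: backend -> {"default", "dtype", "op"} entries.
BACKEND_TOLERANCES = {
    "example_backend": {
        "default": ToleranceConfig(atol=1e-4, rtol=1e-4),
        "dtype": {
            ("float16", False): ToleranceConfig(atol=1e-2, rtol=1e-2),
            ("int8", True): ToleranceConfig(atol=1e-1, rtol=1e-1, qtol=1),
        },
        "op": {"softmax": ToleranceConfig(atol=5e-3, rtol=5e-3)},
    },
}


def get_tolerance(
    backend: str,
    dtype: Optional[str] = None,
    quantized: bool = False,
    op: Optional[str] = None,
) -> ToleranceConfig:
    """Fallback chain: op-specific -> dtype-specific -> backend default -> global."""
    entry = BACKEND_TOLERANCES.get(backend, {})
    if op is not None and op in entry.get("op", {}):
        return entry["op"][op]
    if (dtype, quantized) in entry.get("dtype", {}):
        return entry["dtype"][(dtype, quantized)]
    return entry.get("default", GLOBAL_DEFAULT)


def resolve_tolerances(
    backend: str,
    atol: Optional[float] = None,
    rtol: Optional[float] = None,
    **lookup,
) -> Tuple[float, float]:
    """Hypothetical helper: explicit atol/rtol always win over the registry."""
    cfg = get_tolerance(backend, **lookup)
    return (
        cfg.atol if atol is None else atol,
        cfg.rtol if rtol is None else rtol,
    )
```

Because explicitly passed values take precedence, existing tests that already pass atol/rtol would behave exactly as before; only callers that omit them would pick up registry defaults.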
Implementation Plan
Phase 1: Tolerance registry (purely additive)
- backends/test/harness/tolerance.py
- ToleranceConfig dataclass (atol, rtol, qtol)
- BACKEND_TOLERANCES registry with defaults extracted from current tests
- get_tolerance(backend, dtype, quantized, op) lookup with fallback chain: op-specific → dtype-specific → backend default → global default
- backends/test/harness/tests/test_tolerance.py with unit tests
Phase 2: Integration with shared Tester harness
- backend param to Tester.__init__()
- run_method_and_compare_outputs() uses registry when atol/rtol not explicitly provided
Phase 3: Backend migration
Phase 4: CI tracking
- ErrorStatistics as JSON CI artifacts
Related Issues