Add centralized tolerance registry for backend test harness #19910

@RiyaP2508

Summary

ExecuTorch's numerical accuracy validation is currently fragmented — tolerances are scattered across 50+ files with 8+ different comparison mechanisms and no central standard. This issue tracks the implementation of a centralized tolerance registry for the backend test harness.

See the full cross-backend audit in this comment on #13347.

Problem

  • Each backend defines its own tolerance defaults (ranging from atol=1e-8 to atol=1e-1)
  • At least 8 different comparison mechanisms are used across backends
  • Only 4 of 15+ backends use the shared Tester harness
  • FP16/BF16 handling has no shared convention
  • ErrorStatistics already captures accuracy metrics (SNR, MAE, max error) but they aren't tracked over time

Proposal

Add a centralized tolerance registry to backends/test/harness/ that documents backend defaults, provides a lookup API, and integrates with the existing Tester harness — without breaking any existing tests.

Implementation Plan

Phase 1 — Tolerance registry (purely additive)

  • Create backends/test/harness/tolerance.py
    • ToleranceConfig dataclass (atol, rtol, qtol)
    • BACKEND_TOLERANCES registry with defaults extracted from current tests
    • get_tolerance(backend, dtype, quantized, op) lookup with fallback chain: op-specific → dtype-specific → backend default → global default
  • Create backends/test/harness/tests/test_tolerance.py with unit tests
  • Separate FP16 and BF16 as distinct dtype keys
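
The Phase 1 lookup could be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: the `BackendTolerances` container, the string dtype keys, the `"example_backend"` entry, and every numeric value are placeholders assumed for the example, and how `quantized` participates in the key is one possible reading of the plan.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ToleranceConfig:
    atol: float
    rtol: float
    qtol: int = 0  # allowed difference in quantized steps (assumed meaning)


@dataclass(frozen=True)
class BackendTolerances:
    """Hypothetical per-backend container; not part of the issue text."""
    default: ToleranceConfig
    # Keyed by (dtype name, quantized flag); FP16 and BF16 are distinct keys.
    by_dtype: Dict[Tuple[str, bool], ToleranceConfig] = field(default_factory=dict)
    by_op: Dict[str, ToleranceConfig] = field(default_factory=dict)


# Placeholder value, not taken from any existing test.
GLOBAL_DEFAULT = ToleranceConfig(atol=1e-3, rtol=1e-3)

# Placeholder registry; real entries would be extracted from current tests.
BACKEND_TOLERANCES: Dict[str, BackendTolerances] = {
    "example_backend": BackendTolerances(
        default=ToleranceConfig(atol=1e-4, rtol=1e-4),
        by_dtype={
            ("fp16", False): ToleranceConfig(atol=1e-2, rtol=1e-2),
            ("bf16", False): ToleranceConfig(atol=2e-2, rtol=2e-2),
            ("fp32", True): ToleranceConfig(atol=1e-1, rtol=1e-1, qtol=1),
        },
        by_op={"softmax": ToleranceConfig(atol=5e-2, rtol=5e-2)},
    ),
}


def get_tolerance(
    backend: str,
    dtype: str = "fp32",
    quantized: bool = False,
    op: Optional[str] = None,
) -> ToleranceConfig:
    """Resolve op-specific, then dtype-specific, then backend, then global."""
    entry = BACKEND_TOLERANCES.get(backend)
    if entry is None:
        return GLOBAL_DEFAULT
    if op is not None and op in entry.by_op:
        return entry.by_op[op]
    if (dtype, quantized) in entry.by_dtype:
        return entry.by_dtype[(dtype, quantized)]
    return entry.default
```

Keeping the config frozen means a looked-up value cannot be mutated by one test and leak into another.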

Phase 2 — Integration with shared Tester harness

  • Add optional backend param to Tester.__init__()
  • run_method_and_compare_outputs() uses registry when atol/rtol not explicitly provided
  • Fully backward-compatible — explicit values always win
  • Update XNNPACK tester as proof of concept
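
The "explicit values always win" rule in Phase 2 might look like the sketch below. `resolve_tolerances`, the stub lookup, and all numbers are hypothetical stand-ins for illustration; the real change would live inside `Tester` and call the registry rather than a local table.

```python
from typing import Callable, Optional, Tuple

# Placeholder for whatever defaults the harness uses today (assumed value).
LEGACY_DEFAULTS: Tuple[float, float] = (1e-3, 1e-3)


def _stub_registry_lookup(backend: str) -> Tuple[float, float]:
    """Stand-in for the registry; returns (atol, rtol) for a backend."""
    table = {"example_backend": (1e-2, 1e-2)}  # placeholder entry
    return table.get(backend, LEGACY_DEFAULTS)


def resolve_tolerances(
    atol: Optional[float] = None,
    rtol: Optional[float] = None,
    backend: Optional[str] = None,
    lookup: Callable[[str], Tuple[float, float]] = _stub_registry_lookup,
) -> Tuple[float, float]:
    """Pick tolerances: explicit argument first, then registry, then legacy."""
    if backend is not None:
        base_atol, base_rtol = lookup(backend)
    else:
        # No backend given: behave exactly as before the registry existed.
        base_atol, base_rtol = LEGACY_DEFAULTS
    return (
        atol if atol is not None else base_atol,
        rtol if rtol is not None else base_rtol,
    )
```

Using `None` as the "not provided" sentinel lets a caller pass `atol=0.0` explicitly and still have it respected, which a truthiness check would not.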

Phase 3 — Backend migration

  • Wire ARM tester to registry
  • Wire Cortex-M tester to registry
  • Wire Samsung tester to registry

Phase 4 — CI tracking

  • Output ErrorStatistics as JSON CI artifacts
  • Document tolerance decisions per backend
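
As an illustration of the Phase 4 artifact, the sketch below computes max error, MAE, and SNR for one comparison and serializes them as JSON. `AccuracyRecord` and its field names are assumptions made for this example and are not the actual `ErrorStatistics` fields; plain Python lists stand in for tensors.

```python
import json
import math
from dataclasses import asdict, dataclass
from typing import Optional, Sequence


@dataclass
class AccuracyRecord:
    """Hypothetical record shape for a CI artifact."""
    backend: str
    test_name: str
    max_abs_error: float
    mean_abs_error: float
    snr_db: Optional[float]  # None when the outputs match exactly


def summarize(
    backend: str,
    test_name: str,
    reference: Sequence[float],
    actual: Sequence[float],
) -> AccuracyRecord:
    if not reference or len(reference) != len(actual):
        raise ValueError("reference and actual must be non-empty and equal length")
    errors = [a - r for r, a in zip(reference, actual)]
    abs_errors = [abs(e) for e in errors]
    signal_power = sum(r * r for r in reference) / len(reference)
    noise_power = sum(e * e for e in errors) / len(errors)
    # SNR is undefined for zero noise or zero signal; store None so the
    # JSON stays valid (json would otherwise emit the non-standard Infinity).
    if noise_power > 0 and signal_power > 0:
        snr_db = 10.0 * math.log10(signal_power / noise_power)
    else:
        snr_db = None
    return AccuracyRecord(
        backend=backend,
        test_name=test_name,
        max_abs_error=max(abs_errors),
        mean_abs_error=sum(abs_errors) / len(abs_errors),
        snr_db=snr_db,
    )


def to_json(record: AccuracyRecord) -> str:
    """Serialize one record; sorted keys keep artifact diffs stable."""
    return json.dumps(asdict(record), sort_keys=True)
```

Writing one JSON object per test would let a CI job collect the records and compare them across runs, which addresses the "not tracked over time" gap noted under Problem.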
