
analyze: detect Gemm→Reshape→Transpose hybrid-unfold pattern; warn before applying highdimRTR #921

@DingmaomaoBJTU

Summary

winml analyze should detect the Gemm→Reshape→Transpose pattern that signals a CNN-ViT hybrid unfold block and emit a warning when the highdimRTR_lowdimRTR optimization is requested, because this flag caused a measured regression on such an architecture (MobileViT-small: -19% on QNN NPU, -6.9% on QNN GPU).


Background: What is highdimRTR and why does it regress?

highdimRTR_lowdimRTR is an ORT optimization pass that rewrites Reshape→Transpose→Reshape (RTR) chains from a high-dimensional layout to a lower-dimensional one. This is beneficial for pure-ViT Attention patterns.

However, on CNN-ViT hybrid architectures (e.g., MobileViT), it backfires:

| Model | Architecture | highdimRTR effect |
| --- | --- | --- |
| DINOv2-small | Pure ViT | +38% speedup |
| MobileViT-small | CNN-ViT hybrid | -19% regression (QNN NPU) / -6.9% (QNN GPU) |

Root cause

MobileViT's CNN encoder uses an unfold operation (sliding window patch extraction) implemented in ONNX as:

Conv → Gemm → Reshape → Transpose → ...

The Gemm→Reshape→Transpose sequence is the architectural fingerprint of this unfold block. When highdimRTR runs on these RTR chains, it inserts ~36 spurious Reshape nodes after the Gemm layers instead of simplifying them, creating unnecessary layout round-trips that increase memory traffic on both the HTP (NPU) and DX12 compute pipeline (GPU).

Verified results from QNN catalog sweep (research/autoconfig — 3×500 iters, Phase C confirmed):

  • MobileViT QNN NPU: h9 highdimRTR → median 31.8 ms vs baseline 26.6 ms = -19.5% regression, DISCARD
  • MobileViT QNN GPU: h9 highdimRTR → -6.9% regression (cross-EP, same mechanism)
  • DINOv2 QNN NPU: h9 highdimRTR → +38.1% speedup (pure-ViT, no Gemm-unfold blocks)
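
The NPU percentage follows directly from the two medians quoted in the list. A minimal sketch of that arithmetic, where the helper name `latency_change_pct` is illustrative and not part of the sweep tooling:

```python
def latency_change_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Percent change in latency relative to baseline; positive means slower."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0


# MobileViT on QNN NPU, medians taken from the sweep results.
change = latency_change_pct(baseline_ms=26.6, candidate_ms=31.8)
print(f"{change:+.1f}% latency")  # a positive value is a regression
```

With these inputs the latency increase rounds to 19.5%, which the issue reports as a -19.5% regression.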

What the static analyzer should do

Detection

Add a graph-level pattern check in analyze_insight.py (or equivalent in the static analyzer pipeline) to count Gemm→Reshape→Transpose chains:

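from collections import defaultdict

# `graph` is an ONNX GraphProto, e.g. onnx.load(path).graph
producer = {out: n for n in graph.node for out in n.output}
consumers = defaultdict(list)
for n in graph.node:
    for inp in n.input:
        consumers[inp].append(n)

gemm_unfold_count = 0
for node in graph.node:
    if node.op_type == "Reshape":
        pred = producer.get(node.input[0])
        if pred is not None and pred.op_type in ("Gemm", "MatMul"):
            # Count the Reshape only if its single consumer is a Transpose
            users = consumers[node.output[0]]
            if len(users) == 1 and users[0].op_type == "Transpose":
                gemm_unfold_count += 1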
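
The check can be exercised without a model file. The sketch below is a self-contained variant under stated assumptions: `Node` is a stand-in that carries only the three `onnx.NodeProto` fields the check reads (`op_type`, `input`, `output`), `count_gemm_unfold` is a hypothetical function name, and the tensor names are made up.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in for onnx.NodeProto: only the fields the check reads."""
    op_type: str
    input: list = field(default_factory=list)
    output: list = field(default_factory=list)


def count_gemm_unfold(nodes):
    """Count Gemm/MatMul -> Reshape -> Transpose chains in a node list."""
    producer = {out: n for n in nodes for out in n.output}
    consumers = defaultdict(list)
    for n in nodes:
        for inp in n.input:
            consumers[inp].append(n)

    count = 0
    for node in nodes:
        if node.op_type != "Reshape" or not node.input:
            continue
        pred = producer.get(node.input[0])
        if pred is None or pred.op_type not in ("Gemm", "MatMul"):
            continue
        # Require that the Reshape output has exactly one consumer, a Transpose.
        users = consumers[node.output[0]]
        if len(users) == 1 and users[0].op_type == "Transpose":
            count += 1
    return count


# Toy graphs with hypothetical tensor names.
hybrid = [
    Node("Conv", ["x", "w0"], ["c"]),
    Node("Gemm", ["c", "w1"], ["g"]),
    Node("Reshape", ["g", "shape"], ["r"]),
    Node("Transpose", ["r"], ["t"]),
]
plain = [
    Node("Conv", ["x", "w0"], ["c"]),
    Node("Reshape", ["c", "shape"], ["r"]),
    Node("Transpose", ["r"], ["t"]),
]
print(count_gemm_unfold(hybrid), count_gemm_unfold(plain))
```

Because the function only touches `op_type`, `input`, and `output`, passing `graph.node` from a loaded ONNX model should work the same way. Requiring a single consumer is a conservative choice: a Reshape whose output fans out to several nodes is not counted.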

Warning / skip hint

When gemm_unfold_count > 0 and the model also has RTR chains (highdimRTR candidate):

  • Surface as a FusionCandidate with risk="HIGH" and tag highdimRTR_risky
  • In the autoconfig sweep: add to skip_set so the sweep skips h_highdimRTR for this model
  • In winml analyze output: print a warning, e.g.:
⚠  Detected 12 Gemm→Reshape→Transpose unfold blocks (CNN-ViT hybrid pattern).
   highdimRTR_lowdimRTR optimization inserts spurious Reshape nodes after
   Gemm layers on this architecture → confirmed -19% regression on QNN NPU.
   Recommendation: skip this optimization flag for this model.
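
One possible way to turn the count into the candidate and the warning text is sketched here. The `FusionCandidate` dataclass is a hypothetical shape that carries only the `risk` and tag values named in this issue, and `highdim_rtr_hint` and `has_rtr_chains` are illustrative names. The real analyzer types may differ.

```python
from dataclasses import dataclass


@dataclass
class FusionCandidate:
    """Hypothetical shape; the real analyzer class may have other fields."""
    name: str
    risk: str
    tags: tuple


def highdim_rtr_hint(gemm_unfold_count: int, has_rtr_chains: bool):
    """Return (candidate, warning) when the risky pattern applies, else None."""
    if gemm_unfold_count <= 0 or not has_rtr_chains:
        return None
    candidate = FusionCandidate(
        name="highdimRTR_lowdimRTR", risk="HIGH", tags=("highdimRTR_risky",)
    )
    warning = (
        f"⚠  Detected {gemm_unfold_count} Gemm→Reshape→Transpose unfold blocks "
        "(CNN-ViT hybrid pattern).\n"
        "   highdimRTR_lowdimRTR optimization inserts spurious Reshape nodes after\n"
        "   Gemm layers on this architecture → confirmed -19% regression on QNN NPU.\n"
        "   Recommendation: skip this optimization flag for this model."
    )
    return candidate, warning


result = highdim_rtr_hint(gemm_unfold_count=12, has_rtr_chains=True)
if result is not None:
    print(result[1])
```

Returning `None` when either condition fails keeps the hint out of the results for models that have no RTR chains to optimize in the first place.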

Discriminator logic (architecture classification)

This ties into the broader need for architecture-aware optimization gating:

| Graph signature | Architecture class | highdimRTR recommendation |
| --- | --- | --- |
| Dense Transpose (≥49 nodes), no Gemm-unfold | Pure ViT (DINOv2, ViT-B) | ✅ Candidate (+38%) |
| Gemm→Reshape→Transpose blocks present | CNN-ViT hybrid (MobileViT) | ❌ SKIP (confirmed -19% NPU) |
| Sparse Transpose, Gemm-dominated | Pure CNN (ResNet) | Neutral (test, low priority) |
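
The table could be encoded as a small gating function. This is a sketch under assumptions, not a settled design. The function name and return labels are hypothetical. Checking for Gemm-unfold blocks first is an assumed precedence, and everything that matches neither of the first two rows falls into the neutral class.

```python
def classify_for_highdim_rtr(transpose_count: int,
                             gemm_unfold_count: int,
                             dense_transpose_min: int = 49):
    """Map graph statistics to (architecture class, highdimRTR recommendation)."""
    if gemm_unfold_count > 0:
        # Gemm→Reshape→Transpose blocks present: treat as CNN-ViT hybrid.
        return ("cnn_vit_hybrid", "skip")
    if transpose_count >= dense_transpose_min:
        # Dense Transpose usage and no unfold blocks: treat as pure ViT.
        return ("pure_vit", "candidate")
    # Sparse Transpose usage: neutral, test with low priority.
    return ("pure_cnn_or_other", "neutral")


# Hypothetical counts, chosen only to hit each row of the table.
print(classify_for_highdim_rtr(transpose_count=60, gemm_unfold_count=0))
print(classify_for_highdim_rtr(transpose_count=60, gemm_unfold_count=12))
print(classify_for_highdim_rtr(transpose_count=4, gemm_unfold_count=0))
```

The 49-node threshold comes from the table. Keeping it as a parameter makes it easy to revisit once more architectures have been measured.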

Acceptance Criteria

  • analyze_insight.py (or static analyzer) counts Gemm→(Reshape→)Transpose unfold blocks
  • When count > 0 and the model has RTR chains: FusionCandidate with risk="HIGH" and tag highdimRTR_risky added to results
  • catalog_qnn_sweep.py (and equivalent sweep scripts) consume this hint → skip h_highdimRTR for affected models
  • winml analyze CLI output includes a human-readable warning for this pattern
  • Test: MobileViT-small is classified as highdimRTR_risky; DINOv2-small is NOT
  • No false positives on pure-transformer models (BERT, GPT-2)
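
How a sweep script might consume the hint is sketched below. Candidates are represented as plain dicts for illustration, `plan_sweep` and the `h_baseline` flag are hypothetical, and only the `h_highdimRTR` flag and the `highdimRTR_risky` tag come from this issue.

```python
def plan_sweep(all_flags, candidates):
    """Drop optimization flags that the analyzer marked as risky."""
    skip_set = set()
    for cand in candidates:
        if "highdimRTR_risky" in cand.get("tags", ()):
            skip_set.add("h_highdimRTR")
    kept = [flag for flag in all_flags if flag not in skip_set]
    return kept, skip_set


flags = ["h_baseline", "h_highdimRTR"]

# Analyzer output for a hybrid model (tagged) and a pure-ViT model (untagged).
hybrid_hints = [{"risk": "HIGH", "tags": ["highdimRTR_risky"]}]
pure_vit_hints = []

print(plan_sweep(flags, hybrid_hints))
print(plan_sweep(flags, pure_vit_hints))
```

The same two calls double as a shape for the acceptance test: the tagged model loses `h_highdimRTR` from its plan, while the untagged model keeps the full flag list.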

Labels: P2 (Medium: minor bug or non-critical improvement), graph-optimizer, static-analyzer, triaged
