analyze: detect Gemm→Reshape→Transpose hybrid-unfold pattern; warn before applying highdimRTR

## Summary

`winml analyze` should detect the `Gemm→Reshape→Transpose` pattern that signals a **CNN-ViT hybrid unfold block** and emit a warning when the `highdimRTR_lowdimRTR` optimization is requested, because this flag causes a measured latency regression on such architectures (-19% on QNN NPU and -6.9% on QNN GPU for MobileViT-small).

---

## Background: What is highdimRTR and why does it regress?

`highdimRTR_lowdimRTR` is an ONNX Runtime (ORT) optimization pass that rewrites `Reshape→Transpose→Reshape` (RTR) chains from a high-dimensional to a lower-dimensional layout, which benefits pure-ViT attention patterns.

**However, on CNN-ViT hybrid architectures (e.g., MobileViT), it backfires:**

| Model | Architecture | highdimRTR effect |
|---|---|---|
| DINOv2-small | Pure ViT | **+38% speedup (QNN NPU)** |
| MobileViT-small | CNN-ViT hybrid | **-19% regression (QNN NPU) / -6.9% (QNN GPU)** |

### Root cause

MobileViT's CNN encoder uses an **unfold operation** (sliding window patch extraction) implemented in ONNX as:

```
Conv → Gemm → Reshape → Transpose → ...
```

The `Gemm→Reshape→Transpose` sequence is the architectural fingerprint of this unfold block. When `highdimRTR` runs on these RTR chains, it **inserts ~36 spurious Reshape nodes** after the Gemm layers instead of simplifying them. The extra nodes create unnecessary layout round-trips that increase memory traffic on both the HTP (NPU) and the DX12 compute pipeline (GPU).

**Verified results from QNN catalog sweep (research/autoconfig — 3×500 iters, Phase C confirmed):**
- MobileViT QNN NPU: `h9 highdimRTR` → median 31.8ms vs baseline 26.6ms = **-19.5% regression, DISCARD**
- MobileViT QNN GPU: `h9 highdimRTR` → **-6.9% regression** (cross-EP, same mechanism)
- DINOv2 QNN NPU: `h9 highdimRTR` → **+38.1% speedup** (pure-ViT, no Gemm-unfold blocks)
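
The NPU figure can be re-derived from the two medians quoted above. This is a minimal check, assuming the percentage is the relative latency increase over baseline; the helper name `latency_change_pct` is illustrative and not part of the sweep scripts.

```python
def latency_change_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Relative latency change versus baseline, in percent (positive = slower)."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0


# Medians quoted above for MobileViT on QNN NPU.
slowdown = latency_change_pct(baseline_ms=26.6, candidate_ms=31.8)
print(f"{slowdown:.1f}% slower")  # prints "19.5% slower"
```

The list above writes the same number as -19.5% because it counts a slowdown as a negative effect.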

---

## What the static analyzer should do

### Detection

Add a graph-level pattern check in `analyze_insight.py` (or equivalent in the static analyzer pipeline) to count `Gemm→Reshape→Transpose` chains:

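```python
# Map each tensor name to the node that produces it and to the nodes that consume it.
producer = {out: n for n in graph.node for out in n.output}
consumers = {}
for n in graph.node:
    for name in n.input:
        consumers.setdefault(name, []).append(n)

gemm_unfold_count = 0
for node in graph.node:
    if node.op_type != "Reshape":
        continue
    # The Reshape's data input must come from a Gemm (MatMul is also accepted here).
    pred = producer.get(node.input[0])
    if pred is None or pred.op_type not in ("Gemm", "MatMul"):
        continue
    # Count the chain only if the Reshape feeds exactly one node, a Transpose.
    users = consumers.get(node.output[0], [])
    if len(users) == 1 and users[0].op_type == "Transpose":
        gemm_unfold_count += 1
```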
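
To exercise the check without loading a model, the same logic can be run on stand-in nodes. The sketch below is an illustration under stated assumptions: `Node` is a hypothetical stand-in exposing only the `op_type`, `input`, and `output` fields of an ONNX `NodeProto`, and `count_gemm_unfold` is an illustrative name, not an existing function in `analyze_insight.py`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for onnx.NodeProto (only the fields used here)."""
    op_type: str
    input: list = field(default_factory=list)
    output: list = field(default_factory=list)


def count_gemm_unfold(nodes, sources=("Gemm", "MatMul")):
    """Count source -> Reshape -> Transpose chains in a list of nodes."""
    producer = {out: n for n in nodes for out in n.output}
    consumers = {}
    for n in nodes:
        for name in n.input:
            consumers.setdefault(name, []).append(n)

    count = 0
    for node in nodes:
        if node.op_type != "Reshape" or not node.input:
            continue
        pred = producer.get(node.input[0])
        if pred is None or pred.op_type not in sources:
            continue
        users = consumers.get(node.output[0], [])
        # Require the Reshape output to feed exactly one node, a Transpose.
        if len(users) == 1 and users[0].op_type == "Transpose":
            count += 1
    return count


# Synthetic hybrid-style chain: Conv -> Gemm -> Reshape -> Transpose.
hybrid = [
    Node("Conv", ["x", "w"], ["c0"]),
    Node("Gemm", ["c0", "w1", "b1"], ["g0"]),
    Node("Reshape", ["g0", "shape0"], ["r0"]),
    Node("Transpose", ["r0"], ["t0"]),
]
# Synthetic chain where the Reshape is fed by an Add instead of a Gemm.
other = [
    Node("Add", ["a", "b"], ["s0"]),
    Node("Reshape", ["s0", "shape1"], ["r1"]),
    Node("Transpose", ["r1"], ["t1"]),
]
print(count_gemm_unfold(hybrid), count_gemm_unfold(other))  # prints "1 0"
```

Passing `sources=("Gemm",)` narrows the match, which may be worth trying if MatMul-fed chains turn out to trigger on pure-transformer models.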

### Warning / skip hint

When `gemm_unfold_count > 0` and the model also has RTR chains (highdimRTR candidate):

- Surface as a `FusionCandidate` with `risk="HIGH"` and tag `highdimRTR_risky`
- In the autoconfig sweep: **add to `skip_set`** so the sweep skips `h_highdimRTR` for this model
- In `winml analyze` output: print a warning, e.g.:

```
⚠  Detected 12 Gemm→Reshape→Transpose unfold blocks (CNN-ViT hybrid pattern).
   highdimRTR_lowdimRTR optimization inserts spurious Reshape nodes after
   Gemm layers on this architecture → confirmed -19% regression on QNN NPU.
   Recommendation: skip this optimization flag for this model.
```
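
One way the three outputs above could hang together is sketched below. Everything beyond the names quoted in this issue is assumed: the fields of `FusionCandidate` (only `risk` and the tag are named here), the `apply_unfold_hint` helper, and the representation of `skip_set` as a plain set of flag names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionCandidate:
    """Hypothetical shape; the real class in the analyzer may differ."""
    tag: str
    risk: str
    detail: str


def apply_unfold_hint(gemm_unfold_count, has_rtr_chains, skip_set):
    """Return (candidate, warning) when the risky pattern is present, else (None, None)."""
    if gemm_unfold_count <= 0 or not has_rtr_chains:
        return None, None
    candidate = FusionCandidate(
        tag="highdimRTR_risky",
        risk="HIGH",
        detail=f"{gemm_unfold_count} Gemm->Reshape->Transpose unfold blocks",
    )
    # Sweep side: record the flag so the sweep does not try it for this model.
    skip_set.add("h_highdimRTR")
    warning = (
        f"Detected {gemm_unfold_count} Gemm->Reshape->Transpose unfold blocks "
        "(CNN-ViT hybrid pattern). Recommendation: skip highdimRTR_lowdimRTR."
    )
    return candidate, warning


skip_set = set()
candidate, warning = apply_unfold_hint(12, has_rtr_chains=True, skip_set=skip_set)
print(warning)
```

Returning the candidate and the warning from one function keeps the CLI message and the sweep hint derived from the same condition, so the two cannot drift apart.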

---

## Discriminator logic (architecture classification)

This ties into the broader need for architecture-aware optimization gating:

| Graph signature | Architecture class | highdimRTR recommendation |
|---|---|---|
| Dense Transpose (≥49 nodes), no Gemm-unfold | Pure ViT (DINOv2, ViT-B) | ✅ Candidate (+38%) |
| Gemm→Reshape→Transpose blocks present | CNN-ViT hybrid (MobileViT) | ❌ SKIP (confirmed -19% NPU) |
| Sparse Transpose, Gemm-dominated | Pure CNN (ResNet) | Neutral (test, low priority) |
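
The table can be read as a small decision procedure. The sketch below is illustrative only: the function name, the return labels, and the treatment of "sparse" as simply "fewer than 49 Transpose nodes" are assumptions, since the table gives a threshold only for the dense case, and the "Gemm-dominated" condition is not checked. The counts in the example calls are made up.

```python
DENSE_TRANSPOSE_MIN = 49  # threshold quoted in the table for "dense Transpose"


def highdim_rtr_recommendation(transpose_count: int, gemm_unfold_count: int) -> str:
    """Map the two graph counts to a highdimRTR recommendation."""
    if gemm_unfold_count > 0:
        # Unfold check first: hybrids are skipped regardless of Transpose density.
        return "skip"
    if transpose_count >= DENSE_TRANSPOSE_MIN:
        return "candidate"
    # Assumption: anything below the dense threshold is treated as the sparse row.
    return "neutral"


print(highdim_rtr_recommendation(transpose_count=60, gemm_unfold_count=0))
print(highdim_rtr_recommendation(transpose_count=60, gemm_unfold_count=12))
print(highdim_rtr_recommendation(transpose_count=8, gemm_unfold_count=0))
```

Checking for unfold blocks before Transpose density means a hybrid with many Transpose nodes is still skipped, which matches the row order of the table.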

---

## Acceptance Criteria

- [ ] `analyze_insight.py` (or static analyzer) counts `Gemm→Reshape→Transpose` unfold blocks
- [ ] When count > 0: `FusionCandidate` with tag `highdimRTR_risky` added to results
- [ ] `catalog_qnn_sweep.py` (and equivalent sweep scripts) consume this hint → skip `h_highdimRTR` for affected models
- [ ] `winml analyze` CLI output includes a human-readable warning for this pattern
- [ ] Test: MobileViT-small is classified as `highdimRTR_risky`; DINOv2-small is NOT
- [ ] No false positives on pure-transformer models (BERT, GPT-2)

---

## Related

- #180 — RTR pattern matcher: should unmergeable high-dim RTR patterns be surfaced? (companion question — the current issue is about pre-detection of the *source* pattern before rewrite is attempted)
- Empirical data: `research/autoconfig/ep_knowledge/qnn_npu.json` → `npu-010`, `research/autoconfig/ep_knowledge/qnn_gpu.json` → `gpu-008`
- Sweep results: `research/autoconfig/catalog-qnn-sweep/apple--mobilevit-small/results.json` (h9)
