[Misc] Reject ndarray struct passed to template-annotated kernel param by hughperkins · Pull Request #608 · Genesis-Embodied-AI/quadrants

hughperkins · 2026-05-01T18:37:02Z

Issue: #

Brief Summary

copilot:summary

Walkthrough

copilot:walkthrough

Passing structs containing ndarrays through qd.template() bypasses argument pruning — every ndarray field gets registered regardless of whether the kernel uses it — and inflates the cached launch context, causing a measured 42% launch overhead on real workloads. Raise a clear QuadrantsCompilationError guiding users to use a concrete dataclass type annotation instead. Co-authored-by: Cursor <cursoragent@cursor.com>

claude

Additional findings (outside current diff — PR may have been updated during review):

🟡 python/quadrants/lang/ast/ast_transformers/function_def_transformer.py:192-198 — This PR removes the only writer of ctx.global_context.struct_ndarray_launch_info (the inner _register_ndarray helper) and the only writer of ctx.global_context.ndarray_to_any_array (the cache[key] = arr line in the same helper), leaving roughly half a dozen named pieces of infrastructure as dead/unreachable code: _struct_ndarray_launch_info_by_key (kernel.py:324), the launch-time read at kernel.py:435-437, the entire stale-cache guard at kernel.py:466-481, the _set_struct_ndarray_args call at kernel.py:511-513, the static helpers _resolve_struct_ndarray (kernel.py:596) and _set_struct_ndarray_args (kernel.py:606), the struct_ndarray_launch_info field on ASTTransformerGlobalContext (ast_transformer_utils.py:185), _promote_ndarray_if_declared (ast_transformer.py:649), and the ndarray_to_any_array lookup branches in impl.py:212, function_def_transformer.py:276, and function_def_transformer.py:298. Not a functional bug — failing paths still fail and succeeding paths still succeed — but a meaningful chunk of code is now unreachable, and the comment at impl.py:207-208 ("AnyArray cache populated by _predeclare_struct_ndarrays during kernel compilation") is now actively misleading. Suggest removing this infrastructure as part of this PR or a follow-up before it lands.
Extended reasoning...

What is dead and why

This PR replaces the body of _predeclare_struct_ndarrays with code that only raises QuadrantsCompilationError. The two helpers it removed — the inner _register_ndarray and the implicit cache[key] = arr / launch_info.append(...) writes — were the sole producers of two pieces of compile-context state:
1. ctx.global_context.struct_ndarray_launch_info (initialized to [] in ast_transformer_utils.py:185)
2. ctx.global_context.ndarray_to_any_array (initialized to {} in ast_transformer_utils.py:184)
After this PR, grep confirms there are zero remaining writers to either. So both are permanently empty for the lifetime of any kernel compile.

The infrastructure left dangling

struct_ndarray_launch_info was consumed by an entire stale-cache-guard subsystem in Kernel:
- kernel.py:324 — self._struct_ndarray_launch_info_by_key: dict[CompiledKernelKeyType, list] = {} is still initialized.
- kernel.py:435-437 — at materialize-time, self._struct_ndarray_launch_info_by_key[key] = getattr(ctx.global_context, "struct_ndarray_launch_info", []). The getattr always returns [] now.
- kernel.py:466-481 — the mutable-ndarray stale-cache guard. The outer if self._struct_ndarray_launch_info_by_key: (dict-truthy) is still True after at least one kernel materializes, but the inner if struct_nd_info: (list-truthy) at 468 is always False because the value is []. So self._mutable_nd_cached_val is always assigned [], and the if self._mutable_nd_cached_val: block at 477-481 is unreachable.
- kernel.py:511-513 — struct_nd_info = self._struct_ndarray_launch_info_by_key.get(key) always returns [] (falsy), so _set_struct_ndarray_args is never called.
- kernel.py:596 _resolve_struct_ndarray and kernel.py:606 _set_struct_ndarray_args — only invoked from kernel.py:480 and kernel.py:513 respectively, both unreachable per above. The static methods themselves are dead.
- ast_transformer_utils.py:185 — the field initializer is now write-only-by-default and read-but-always-empty.
ndarray_to_any_array was consumed at three sites:
- impl.py:209-220 — subscript checks the cache to promote a bare Ndarray to an AnyArray proxy. The gc.ndarray_to_any_array.get(id(value)) lookup at 212 always returns None, so the only outcome for a bare Ndarray reaching this code path is the Ndarray ... not registered as a kernel parameter raise at 216-220.
- ast_transformer.py:649-658 — _promote_ndarray_if_declared is now a no-op for every input (it always returns value unchanged).
- function_def_transformer.py:276, 298 — _transform_func_arg looks up promoted ndarrays for qd.Tensor / dataclass-field cases. promoted is always None so the ternary always falls through to the original data/data_child.
The misleading comment

impl.py:207-208 still says:

# Ndarray from struct template field: resolve via the AnyArray cache populated by _predeclare_struct_ndarrays
# during kernel compilation.

This comment is now actively wrong: _predeclare_struct_ndarrays no longer populates that cache (it raises). A future reader stepping through subscript will follow the comment expecting a cache hit, then be confused that get(...) returns None for every input.

Step-by-step proof for one path

Take the launch hot path for any kernel that compiles successfully under this PR:
1. materialize runs _predeclare_struct_ndarrays, which either raises (template struct contains an ndarray) or returns silently without writing to either context dict.
2. After successful materialization, kernel.py:435-437 runs: self._struct_ndarray_launch_info_by_key[key] = getattr(ctx.global_context, "struct_ndarray_launch_info", []) → assigns [].
3. On launch_kernel, kernel.py:466 if self._struct_ndarray_launch_info_by_key: is True (dict has the key), so we enter the branch.
4. kernel.py:467 struct_nd_info = self._struct_ndarray_launch_info_by_key.get(key) → [].
5. kernel.py:468 if struct_nd_info: is False (empty list), so we hit the else at 472 → self._mutable_nd_cached_val = [].
6. kernel.py:477 if self._mutable_nd_cached_val: is False, so kernel.py:478-481 (the _resolve_struct_ndarray call site) is skipped.
7. kernel.py:511 struct_nd_info = self._struct_ndarray_launch_info_by_key.get(key) → [] again.
8. kernel.py:512 if struct_nd_info: is False, so kernel.py:513 (the _set_struct_ndarray_args call site) is skipped.
9. _resolve_struct_ndarray and _set_struct_ndarray_args therefore never execute. They are unreachable.
Severity & fix

This is not a behavioral bug — the kernel still launches correctly, and the new raise covers the path that this infrastructure used to serve. It is, however, a meaningful amount of unreachable code (one instance dict, one global-context field, one static helper, one set-args helper, one lookup helper, three call sites, and a misleading docstring comment) that AGENTS.md asks PRs to keep tidy. Suggested cleanup:
- Delete _struct_ndarray_launch_info_by_key, _mutable_nd_cached_key, _mutable_nd_cached_val, _resolve_struct_ndarray, _set_struct_ndarray_args, and the kernel.py:435-437 / 466-481 / 511-513 read sites.
- Delete ASTTransformerGlobalContext.struct_ndarray_launch_info and ndarray_to_any_array (or keep the latter and add a # always empty after PR #608 comment if you want to defer).
- Delete _promote_ndarray_if_declared and its ndarray_to_any_array-using callers in function_def_transformer.py and the subscript lookup in impl.py — replace impl.py:209-220 with the bare raise since that is the only outcome now.
- Either fix or delete the impl.py:207-208 comment.
Either do this in this PR or open an immediate follow-up so the cleanup does not get lost.
🟡 python/quadrants/lang/ast/ast_transformers/function_def_transformer.py:209-216 — Error message in _predeclare_struct_ndarrays hard-codes "annotated as qd.template()" and advises "Use a concrete struct annotation instead of qd.template()", but the entry guard at lines 238-240 also accepts qd.Tensor annotations. When a user writes x: qd.Tensor and passes a struct/dataclass containing a bare Ndarray attribute, the raised message references an annotation the user never wrote. Fix by formatting the wording from the actual arg_meta.annotation (e.g. emit "qd.Tensor" vs "qd.template()").
Extended reasoning...

What's wrong

_predeclare_struct_ndarrays builds a list of parameters to walk using:
```
is_template = anno is annotations.template or isinstance(anno, annotations.template)
is_tensor_anno = anno is _TensorClass
if not (is_template or is_tensor_anno):
    continue
```
So both qd.template()-annotated and qd.Tensor-annotated parameters can reach _walk_obj. But the two new raise QuadrantsCompilationError(...) sites inside _walk_obj (the dataclass branch at L209-216 and the __dict__ branch at L226-233) hard-code:

Kernel parameter '{param_name}' is annotated as qd.template(), but ...

…and the remediation hint says:

Use a concrete struct annotation (e.g. a @dataclass type hint) instead of qd.template() for struct parameters that contain ndarrays.

When the user actually wrote x: qd.Tensor, both the diagnosis ("annotated as qd.template()") and the advice ("instead of qd.template()") are factually wrong and will confuse anyone debugging.

Reachability — step-by-step proof
1. User writes a kernel def k(x: qd.Tensor): ... and passes a dataclass instance whose field is a bare qd.ndarray (i.e. not a qd.Tensor wrapper).
2. _template_mapper_hotpath._extract_arg routes non-Ndarray values for qd.Tensor annotations through _TENSOR_T_FIELD_MARKER, so the kernel proceeds to _predeclare_struct_ndarrays.
3. In the loop, anno is _TensorClass ⇒ is_tensor_anno=True. The guard passes.
4. val = ctx.py_args[i] is the dataclass; it is not a _TensorClass (so no unwrap), not an _ndarray.Ndarray (so no continue), is a dataclass ⇒ _walk_obj(val, i, ()) runs.
5. _walk_obj finds the Ndarray field and raises with the hard-coded "annotated as qd.template()" message — even though the user wrote qd.Tensor.
Addressing the refutation

One verifier argued the qd.Tensor branch is unreachable in practice, claiming a non-Tensor struct passed to a qd.Tensor-annotated arg would error elsewhere first. That objection only covers the Tensor → Field._unwrap() sub-path (where Field's __dict__ indeed contains no Ndarrays). It does not cover the more direct path above, where a plain dataclass-with-ndarray value is passed to x: qd.Tensor and routes through _TENSOR_T_FIELD_MARKER. Two other verifiers traced exactly this path and confirmed reachability. Even if reachability were marginal, the is_tensor_anno branch was deliberately added to the guard — emitting a wrong-annotation message in any branch the guard explicitly admits is a self-inconsistency.

Impact

Error-message correctness only; no behavioral or correctness regression. Just user confusion in the qd.Tensor case.

Fix

Compute an annotation string once per param (e.g. anno_str = "qd.Tensor" if is_tensor_anno else "qd.template()") and pass it into _walk_obj, then format both raise sites with that string. The remediation hint should likewise be conditional (or simply drop the "instead of qd.template()" clause).
_{🔬 also observed by verify-runtime}
🔴 python/quadrants/lang/ast/ast_transformers/function_def_transformer.py:207-216 — The new rejection in _predeclare_struct_ndarrays fires for qd.Tensor-annotated dataclass fields whose runtime value is NDARRAY-backed, regressing the previously supported pattern exercised by tests/python/test_tensor_annotation_in_func.py::test_tensor_struct_field_kernel_data_oriented[ndarray] and ::test_tensor_struct_vector_field_roundtrip[ndarray] (both call qd.tensor(..., backend=qd.Backend.NDARRAY) and stash the wrapper on a frozen dataclass passed via qd.template()). The check unwraps the Tensor wrapper before the isinstance(_ndarray.Ndarray) test, so the wrapper is indistinguishable from a raw qd.ndarray. Fix: skip when the value was a qd.Tensor wrapper (capture wrap state before _unwrap()), or only reject when the field is a bare qd.ndarray. The same flaw exists in the non-dataclass __dict__ branch.
Extended reasoning...

What the bug is

FunctionDefTransformer._predeclare_struct_ndarrays now rejects any template-annotated struct that contains an Ndarray impl. Inside _walk_obj (lines 207-216), the code does:
```
child = getattr(obj, field.name)
if isinstance(child, _TensorClass):
    child = child._unwrap()
if isinstance(child, _ndarray.Ndarray):
    raise QuadrantsCompilationError(...)
```
The order is the problem: _unwrap is called before the isinstance(Ndarray) test, so a qd.Tensor wrapper holding an Ndarray impl is treated identically to a bare qd.ndarray.

Why qd.Tensor was previously fine

qd.tensor(dtype, shape=..., backend=Backend.NDARRAY) returns _wrap_impl(impl.ndarray(...)) (python/quadrants/_tensor.py:228), i.e. a Tensor wrapper whose _impl is a ScalarNdarray / VectorNdarray / MatrixNdarray (all subclasses of _ndarray.Ndarray). Pre-PR, this path went through _register_ndarray and worked via ndarray_to_any_array resolution. The FIELD-backend variant accidentally still works post-PR only because Tensor._unwrap() of a FIELD-backed tensor returns a Field (not an Ndarray), so the new isinstance check skips it. The new rejection therefore silently changes the meaning of a qd.Tensor-typed dataclass field based on the runtime backend the caller picked — clearly unintended.

Step-by-step proof using an existing test

tests/python/test_tensor_annotation_in_func.py:56 (test_tensor_struct_field_kernel_data_oriented parametrized with backend=Backend.NDARRAY):
1. t = qd.tensor(qd.i32, shape=(6,), backend=qd.Backend.NDARRAY) → _wrap_impl(impl.ndarray(...)) → a Tensor wrapper, t._impl is a ScalarNdarray.
2. @dataclasses.dataclass(frozen=True) class S: vals: qd.Tensor; s = S(vals=t).
3. @qd.kernel def fill(st: qd.template()): ...; fill(s).
4. In _predeclare_struct_ndarrays: anno = qd.template() → is_template = True; val = s is a dataclass → _walk_obj(s, 0, ()) runs.
5. Field vals → child = t (_TensorClass); child = child._unwrap() returns the ScalarNdarray impl.
6. isinstance(child, _ndarray.Ndarray) is True → raises QuadrantsCompilationError.
The same chain breaks test_tensor_struct_vector_field_roundtrip[ndarray] (tests/python/test_tensor_annotation_in_func.py:226) and any other [ndarray] parametrization that stuffs an NDARRAY-backed qd.Tensor into a dataclass / data_oriented struct passed via qd.template(). The PR does not modify the test file, so CI on the NDARRAY parametrizations regresses.

Inconsistency that confirms the bug is unintended

At the top level in this same function (lines 242-246) the code explicitly handles a Tensor wrapper holding an Ndarray and continues without raising — the rejection only fires inside structs. There is no good reason for the same wrapper-around-Ndarray to be allowed at the top level but rejected one level of nesting deeper.

Impact and fix

Impact: blocks the PR, since two existing tests fail on the [ndarray] parametrization, and any user code following the documented qd.Tensor-as-struct-field pattern (pinned as a stork-23 invariant in the test module's docstring) breaks based on the runtime backend choice.

Fix: distinguish qd.Tensor wrappers from raw qd.ndarray by capturing the wrap state before unwrapping, e.g.:
```
child = getattr(obj, field.name)
was_tensor_wrapper = isinstance(child, _TensorClass)
if was_tensor_wrapper:
    child = child._unwrap()
if not was_tensor_wrapper and isinstance(child, _ndarray.Ndarray):
    raise QuadrantsCompilationError(...)
```
The same change is needed in the non-dataclass __dict__ branch (lines 226-233). Equivalently, you could test field.type is qd.Tensor and only reject when the annotation is qd.ndarray. Either preserves the user's annotation-level intent rather than gating the check on the runtime backend.

github-actions · 2026-05-01T20:17:50Z

Coverage Report (`a44badcdd`)

File	Coverage	Missing
🟢 `python/quadrants/lang/ast/ast_transformers/function_def_transformer.py`	100%
🟢 `tests/python/test_tensor_annotation_in_func.py`	100%

Diff coverage: 100% · Overall: 67% · 31 lines, 0 missing

Full annotated report

The 5 ndarray-backend variants of struct-via-template tests now assert that passing a struct containing ndarrays through qd.template() raises QuadrantsCompilationError. Field-backend variants are unchanged. Co-authored-by: Cursor <cursoragent@cursor.com>

hughperkins · 2026-05-02T05:23:38Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 22d4341ceb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-02T05:25:59Z

+            raise QuadrantsCompilationError(
+                f"Kernel parameter '{param_name}' is annotated as qd.template(), but "
+                f"'{param_name}.{attr_path}' is a qd.ndarray. Passing ndarrays through "
+                f"template structs is not supported because it bypasses argument pruning "
+                f"and degrades launch performance. Use a concrete struct annotation "
+                f"(e.g. a @dataclass type hint) instead of qd.template() for struct "


Document new template-struct ndarray compile failure

This change introduces a user-visible API behavior change by turning previously valid qd.template() struct usage with qd.ndarray fields into a hard QuadrantsCompilationError in _predeclare_struct_ndarrays, but the commit does not update any user docs under docs/. Per AGENTS.md ("Keep user-facing docs in sync with public API changes"), this should be documented so users understand the new restriction and migration path.

Useful? React with 👍 / 👎.

github-actions · 2026-05-02T06:23:41Z

Coverage Report (`22d4341ce`)

File	Coverage	Missing
🟢 `python/quadrants/lang/ast/ast_transformers/function_def_transformer.py`	100%
🟢 `tests/python/test_tensor_annotation_in_func.py`	100%

Diff coverage: 100% · Overall: 67% · 31 lines, 0 missing

Full annotated report

hughperkins force-pushed the hp/reject-ndarray-in-template branch from ae92ccc to 2003fbb Compare May 1, 2026 18:37

hughperkins marked this pull request as draft May 1, 2026 18:40

hughperkins force-pushed the hp/reject-ndarray-in-template branch from 2003fbb to 337795f Compare May 1, 2026 18:40

claude Bot reviewed May 1, 2026

View reviewed changes

hughperkins force-pushed the hp/reject-ndarray-in-template branch from a44badc to 733a160 Compare May 2, 2026 05:21

Merge branch 'main' into hp/reject-ndarray-in-template

22d4341

chatgpt-codex-connector Bot reviewed May 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Misc] Reject ndarray struct passed to template-annotated kernel param#608

[Misc] Reject ndarray struct passed to template-annotated kernel param#608
hughperkins wants to merge 3 commits intomainfrom
hp/reject-ndarray-in-template

hughperkins commented May 1, 2026

Uh oh!

claude Bot left a comment

Uh oh!

github-actions Bot commented May 1, 2026

Uh oh!

hughperkins commented May 2, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 2, 2026

Uh oh!

github-actions Bot commented May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hughperkins commented May 1, 2026

Brief Summary

Walkthrough

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

What is dead and why

The infrastructure left dangling

The misleading comment

Step-by-step proof for one path

Severity & fix

What the bug is

Why qd.Tensor was previously fine

Step-by-step proof using an existing test

Inconsistency that confirms the bug is unintended

Impact and fix

Uh oh!

github-actions Bot commented May 1, 2026

Coverage Report (a44badcdd)

Uh oh!

hughperkins commented May 2, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 2, 2026

Coverage Report (22d4341ce)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Why `qd.Tensor` was previously fine

Coverage Report (`a44badcdd`)

Coverage Report (`22d4341ce`)