IRGen: fix EnumConstantDecl miscompile + verifier; symex: structural Concat fold for byte shadows#588
Merged
Merged
Conversation
…HECKs A DeclRefExpr to an EnumConstantDecl was routed through EmitRValue's lvalue fallback, then through EmitLValue's local-VarDecl fallback, which silently default-inserted into object_to_alloca_ and returned 0 — the entry IMPLICIT_GOTO's slot. The resulting MEMORY/LOAD_LE_* [%term] reads address 0x0 at runtime with no diagnostic at codegen time. EmitRValue now handles EnumConstantDecl before the lvalue fallback, emitting CONST with the enumerator's integer value via pasta::EnumConstantDecl::InitializerValue(). EmitLValue's DeclRefExpr arm enumerates known decl kinds explicitly and CHECKs on unknown kinds or missing ALLOCAs. Generate()'s parameter-ALLOCA registration uses try_emplace + CHECK to trap duplicate emission. EmitInstruction adds a choke-point CHECK that no non-terminator opcode consumes a terminator as a value operand. EmitLoadFromLValue adds the same check at the lvalue site for a tighter stack. A new VerifyIR() post-pass runs after ComputeRPO in Generate and GenerateGlobalInit. It logs (no abort) operand-references-terminator, LOAD/STORE operand-count mismatches, switch-default-to-slot-0, and missing parent_structure_index — backstop for any malformed IR that slips past the EmitInstruction CHECK. Tests: EnumConstantDecl regression corpus added to tests/symex/c/symex_integration.c, new test_irgen_decl_ref.py (8 tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The byte-shadow store/load pipeline used z3.simplify to recover load- modify-store identity. z3 folds Extract(7, 0, X-Y) through byte arithmetic (the low byte has no incoming borrow), but leaves the upper Extracts unfolded because their borrow chain depends on the low byte. After a round trip, the value was Concat(Extract(31, 8, X), 212 + 255*b) — the parent identity for the low byte was destroyed and the full-Concat- collapse rule could no longer recover X. New _z3_concat_fold helper: flattens nested binary Concats z3 stores internally into a single MSB→LSB sequence, merges adjacent same-parent Extracts at contiguous bit ranges, collapses a full-width single Extract to its parent. Pure structural rewrite — never folds Extract through arithmetic. _shadow_write and _shadow_read now use the structural fold instead of z3.simplify. Multi-byte round trips recover the parent expression cleanly. Tests: new test_concat_fold.py covers the fold rules directly, the round-trip via _shadow_write + _shadow_read, partial-overwrite preserving minimal Concat, and an assertion that z3.simplify can't recover the parent (so the structural fold is load-bearing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two adjacent fixes:
IRGen miscompile:
DeclRefExprto anEnumConstantDeclwas routedthrough
EmitRValue's lvalue fallback, then throughEmitLValue'slocal-VarDecl fallback, which silently
operator[]-inserted intoobject_to_alloca_and returned0— the entryIMPLICIT_GOTO's slot.The resulting
MEMORY/LOAD_LE_* [%term]reads address0x0at runtimewith no codegen-time diagnostic. Now lowered to
CONSTdirectly, withchoke-point
CHECKs and a post-passVerifyIRto make a futurerecurrence abort with a live stack.
Symex shadow expression bloat:
z3.simplifyon multi-byte loadConcats folds the low byte through byte arithmetic asymmetrically (the
low byte has no incoming borrow; the high bytes do). For
len -= p[1]with
len = 1492, the round-trip yieldedConcat(Extract(31, 8, X), 212 + 255*b)instead of1492 - zext_32(b),destroying parent identity for downstream loads. Replaced with a
targeted structural Concat-of-consecutive-Extracts fold.
IRGen changes (
bin/Index/IRGen.{cpp,h})EmitRValue: handleEnumConstantDeclbefore the lvalue fallback,emitting
CONSTwith the enumerator'sAPSIntvalue viapasta::EnumConstantDecl::InitializerValue().EmitLValueDeclRefExprarm: explicit decl-kind enumeration;CHECKon unknown kinds or missing ALLOCA registration.
EmitInstructionchoke-pointCHECK: no non-terminator opcode mayconsume a terminator instruction as a value operand. Aborts at the
codegen origin, not post-mortem.
EmitLoadFromLValue: same check at the lvalue site for a tighter stack.Generate()parameter ALLOCAs:try_emplace + CHECK(inserted)trapsduplicate registration.
VerifyIR()post-pass afterComputeRPOin bothGenerateandGenerateGlobalInit.LOG(ERROR)(not fatal) for: operand references aterminator/scope marker,
MEMORYload/store operand-count mismatch,SWITCHdefault to slot 0, missingparent_structure_index. Partial IRsurvives for investigation.
Symex changes (
bindings/Python/symex/dispatch.py)_z3_concat_fold(z3, expr)helper. Flattens nested binary Concats(z3's internal storage), merges adjacent same-parent
Extractranges,collapses a full-width single
Extractto its parent. Pure structuralrewrite — never folds
Extractthrough arithmetic._shadow_writeand_shadow_readnow call_z3_concat_foldinstead ofz3.simplify. Multi-byte round trips recover the parent expression.Tests
tests/symex/c/symex_integration.c: newEnumConstantDeclregressioncorpus (aggregate init, scalar return, comparison, function argument).
tests/symex/test_irgen_decl_ref.py(8 tests, all green).tests/symex/test_concat_fold.py(6 tests, all green) — direct foldrules, full-width round trip, partial-overwrite preserving minimal
Concat, plus an explicit assertion that
z3.simplifycan't recoverthe parent (so the structural fold is load-bearing).
Test results
main, unrelated).main, unrelated).Test plan
z3.simplifyissueand that the structural fold recovers parent identity.
VerifyIRemits noLOG(ERROR)lines on clean IR.🤖 Generated with Claude Code