Skip to content

Closed: superseded#586

Closed
pgoodman wants to merge 7 commits into
mainfrom
symex-hardening-and-irgen-fixes
Closed

Closed: superseded#586
pgoodman wants to merge 7 commits into
mainfrom
symex-hardening-and-irgen-fixes

Conversation

@pgoodman

@pgoodman pgoodman commented May 1, 2026

Copy link
Copy Markdown
Collaborator

Superseded.

pgoodman and others added 7 commits April 29, 2026 10:02
The old shadow keyed on (addr, size) → z3 expr, so a 4-byte symbolic
write followed by a byte-at-a-time read (e.g. memcpy) produced zeros
instead of symbolic values.

Replace with a byte-granular shadow keyed on addr → BitVec(8):
- Write: decompose the z3 value into per-byte Extract() entries.
- Concrete write: erase the covered byte slots so concrete memory
  stays the source of truth; partial overlaps handled naturally.
- Read: if any byte in the range is shadowed, reconstruct via
  z3.Concat over the covered slots (lifting concrete bytes to
  BitVecVal(8) where needed), then z3.simplify to collapse
  Concat(Extract(...)...) back to the original variable.

This means memcpy-style byte-at-a-time loops through symbolic memory
propagate symbolics correctly and trigger the expected branch forks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- _shadow_write: while loops instead of for/range
- _shadow_read: while loop for the fast "any shadowed byte?" scan;
  pre-allocated buf (held on Path, grown lazily) replaces per-read
  list; iterative z3.Concat instead of reversed(pieces)
- Path._shadow_buf: durable working buffer reused across all reads
  within and between steps
- _make_default_mem_read: accepts buf= and forwards to _shadow_read;
  InterceptorPolicy pulls buf from path._shadow_buf

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rete fast path

Write a sentinel byte (0xCD) to concrete memory alongside the shadow-dict
entries when storing a symbolic value. Reads now do concrete_read first,
scan the returned bytes for the sentinel, and only touch the dict on a hit.

Fully-concrete reads (the common case) cost zero dict lookups — just a
byte scan of already-fetched data. False positives (a real 0xCD byte not
from a symbolic write) cause one extra dict miss then return the correct
concrete value.

Also add _SENTINEL_BYTES and _ZERO_BYTES as module-level pre-built tuples
indexed by size, eliminating per-call bytes() allocation on both paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces direct calls to _shadow_write from user code. Infers size
from the BitVec width when not supplied; stamps the sentinel into
concrete memory so reads detect symbolic presence cheaply.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three call sites were catching all exceptions from user-defined handlers
and returning None / setting a terminal state silently:
- intercept.address_for chain call
- observe.address_resolved observer loop
- intercept.indirect_call chain call

All three are pure Python call sites (no C++ boundary), so exceptions
propagate naturally. The C++ boundary handlers in dispatch.py already
use _record_handler_error + abort_requested() correctly and are unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Instruction.cpp: add variable name annotation to GLOBAL_PTR_*,
THREAD_LOCAL_PTR_*, and FUNC_PTR_* in format(), matching what CALL
already does for its target. Output now reads e.g. GLOBAL_PTR_64 @g_buf.

dispatch.py: _shadow_read was returning a z3 concrete expression for
real 0xCD bytes in memory that had no shadow entry (false positives).
Add a second scan after the sentinel scan: if no sentinel-marked byte
has an actual shadow dict entry, return None and let the concrete path
handle it unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ards

Substantial set of correctness and ergonomics fixes accumulated while
working through real-world symbolic-execution traces:

z3 expression hygiene
- New `_z3_resize(val, target_bits, signed=False)` and
  `_z3_byte_at(val, i)` helpers in dispatch.py — both no-op when the
  operand is already the right width, killing the redundant
  `Extract(7, 0, 8bit)` and `ZeroExt(0, ...)` wrappers that were
  accumulating across decompose/recompose cycles.
- `_z3_binary` and `_z3_unary` now resize both operands to the
  opcode's declared width via `_dispatch_op_sized`, so an upstream
  cast that left an operand wider/narrower than its companion no
  longer crashes z3 with a sort mismatch (e.g. BIT_OR_16 with a
  ZeroExt'd 32-bit rhs).
- Width-coerce both compare operands to the larger width when they
  don't match.
- Every cast path goes through `_z3_resize` (so a same-width cast is
  identity), and every binary/unary/cast/ptr-add result is run
  through `z3.simplify` before storage. `_shadow_write` simplifies
  the value before per-byte decomposition so load/modify/store
  cycles collapse instead of compounding.

Path API
- `path.write_memory(addr, value, size, byte_order=Endian.BIG)` now
  handles plain ints (the docstring claimed it did but the code
  didn't), accepts an explicit `byte_order` / `endian` override
  (both spellings work to avoid keyword churn), and falls back to
  the path's byte order otherwise.
- `path.write_symbolic` gains the same `byte_order=` kwarg.
- `Endian` moved out of `events.py` into the leaf `_types.py`
  alongside the type-check tuples; `events.py` re-exports for
  back-compat.

PythonPolicy NULL boundary guards
- `or_none(p)` helper substitutes `Py_None` for a real C `NULL` at
  every `PyObject_CallFunction(... "O" ...)` call site (mem_read /
  write, exec_symbolic_load / store, binary_op, unary_op, compare,
  cast, ptr_add / diff / offset, is_true, resolve_branch,
  resolve_call args). A default-constructed `SharedPyPtr` carries
  a real null pointer (vs `default_value()` which carries `Py_None`)
  — without the guard, a substrate path that bypasses
  `make_default()` and stuffs that into the value cache would crash
  the step with `SystemError: NULL object passed to Py_BuildValue`.
  Surface the bad value as Python `None` instead so it's handleable.

IRGen — IRBlock.parent_function() always resolves
- Implicit switch default now creates its own `SWITCH_DEFAULT` block
  and associates it with the implicit-default `SWITCH_CASE`
  structure. `IRSwitchCaseStructure::target_block()` reads the
  structure's first child block, so without an associated block the
  case's target was `{}` (entity id 0) and `decide_switch` skipped
  it — leaving the orphaned callee frame on the call stack while
  the caller's work items executed. Re-indexing fixes traces that
  showed `[default -> %0]`.
- `SwitchToDeadBlock()`'s synthetic `UNREACHABLE` block now
  associates with the surrounding scope. Same for the Duff's-device
  `LOOP_CONDITION` and `LOOP_EXIT` blocks inside `emit_case_bodies`,
  which were the only loop blocks not being associated.
- `COMPENSATION` blocks created during `InsertGotoCompensationBlocks`
  (a post-emit pass where `current_structure_index_` is already
  popped) now anchor explicitly to the goto's common-ancestor scope
  via a new `AssociateBlockWithStructure(block, struct)` overload,
  with `func_.body_scope_index` as the fallback.
- Safety-net pass at the end of `Generate()` /
  `GenerateGlobalInit()` attaches any remaining orphan block (e.g.
  a forward-referenced LABEL whose label was never defined) to the
  function body scope, so `IRBlock::parent_function()` is now
  guaranteed to resolve for every block in every function.

Interpreter
- Reverted `decide_switch`'s temporary UNREACHABLE termination for
  a `[default -> %0]` switch with a non-matching concrete selector
  — the right fix is in IRGen above; the interpreter once again
  just returns and lets the (now valid) default block be entered.

Other
- Per-byte symbolic shadow is now dict-based for detection (no
  sentinel scan in ConcreteMemory), removing the cross-path
  poisoning where one path's 0xCD stamp leaked into a sibling's
  read.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@pgoodman pgoodman requested a review from kumarak May 1, 2026 02:51
@pgoodman pgoodman closed this May 1, 2026
@pgoodman pgoodman changed the title Symex hardening: z3 simplify pipeline, IRGen parent_function, NULL guards Closed: superseded May 1, 2026
@kumarak kumarak deleted the symex-hardening-and-irgen-fixes branch May 1, 2026 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant