Skip to content

Prove revert-on-reject state transitions#389

Open
diegonehab wants to merge 146 commits into
mainfrom
feature/revert-root-hash
Open

Prove revert-on-reject state transitions#389
diegonehab wants to merge 146 commits into
mainfrom
feature/revert-root-hash

Conversation

@diegonehab

@diegonehab diegonehab commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Part of #388 (the issue stays open for the Dave integration items). The companion PR regenerates the Solidity side: cartesi/machine-solidity-step#98

Problem

When the guest rejects an advance-state input, the canonical machine state goes back to
the pre-input state, but dispute verifiers could not prove that transition. The
pre-input root hash was poked into the machine by the host through a separate,
unverified write_revert_root_hash() call, and no verifier ever substituted it for the
machine root hash after a rejection.

Solution

The first commit implements the write side. send_cmio_response,
log_send_cmio_response, and verify_send_cmio_response take the revert root hash as
their first argument, threaded as const_machine_hash_view through every interface
layer (machine, i_machine, local and jsonrpc machines, the JSON-RPC wire protocol, the
C API, and the Lua bindings) and as a bytes32 alias in the Solidity-translatable
core. The core commits the hash into the revert_root_hash shadow leaf as a single
full-leaf logged write right after the iflags.Y precondition check, so the value
becomes part of the proof. The record/replay rx-buffer write logic is hoisted into
label-parameterized helpers shared with the new accessor.

The second commit implements the read side. uarch_reset_state resets the uarch as
before, then reads iflags.Y, conditionally reads htif.tohost, decodes the
dev/cmd/reason fields, and on a rejected manual yield calls the new revert_state()
accessor, which reads the revert leaf (one logged access carrying the value) and
switches the canonical root hash atomically. The replay accessor replaces its context
root hash, so verify_reset_uarch is untouched. The record accessor reports the
resulting hash through a caller-owned context, so log_reset_uarch needs no branch.
For the ZK path, replay_step_state_access::finish() applies the same rule, reading
the revert leaf from the shadow registers page that every step log already contains,
and log_step records the substituted hash in the log header. The risc0 guest inherits
the behavior with no source changes. Unknown yield reasons (GIO domains, accepted,
exception) cause no substitution and no error. The physical machine is never reverted,
a rejected log_reset_uarch still resets the uarch.

cm.h gains the HTIF tohost/fromhost field shifts and masks, exposed through the Lua
module so the machine-solidity-step constants generator can query them.

Tests

  • machine-bind (local and jsonrpc) covers the new access shapes, the rejected reset and
    step round trips against the revert hash, tampered revert leaf values, and the
    accepted-reason control case.
  • The corrupted-log specs cover the new revert-hash write access of the send.
  • The uarch json log generator emits a rejected-reset fixture consumed by the
    machine-solidity-step Foundry tests.
  • Verified locally: all Lua suites, C API tests, the 54 uarch tests, uarch.bin
    cross-build, and the risc0 cpp guest cross-build.

The dtc interrupt-provider checker warns that interrupt providers must
declare both #address-cells and #interrupt-cells. The per-CPU
riscv,cpu-intc node only set #interrupt-cells, surfacing a lint
warning whenever the DTB was decompiled (e.g. via dtc -I dtb -O dts
/sys/firmware/fdt inside a running machine). The node has no
addressable children, so #address-cells = <0> is the right value and
Linux's behaviour is unchanged. The DTB bytes do change, so every
freshly-instantiated machine's state hash shifts going forward.
Expose the revert_root_hash shadow register (at AR_SHADOW_REVERT_ROOT_HASH_START)
through all API layers: machine, i_machine/local_machine (NVI pattern), JSON-RPC
client/server, cm.h C API, and Lua bindings. Previously the only way to reach this
register from Lua or the C API was write_memory/read_memory against the magic offset.
Add cm_pma_did enum to cm.h with all device IDs (empty, memory, shadow
state, flash drive, CLINT, HTIF, PLIC, CMIO RX/TX, uarch shadow, VirtIO,
NVRAM). Static asserts in cm.cpp verify the values against pmas-defines.h.
Expose all constants as cartesi.PMA_*_DID in the Lua API via clua-cartesi.cpp.
Before: DTB_BOOTARGS_ROOT glued the kernel `root=` and `init=`
arguments into one string, and DTB_BOOTARGS_INIT was the combined
default for dtb.bootargs. cm.h mirrored that shape with CM_ prefix.

Two issues. The atom set (CONSOLE / UIO / ROOT / INIT in cm.h) was
asymmetric: ROOT covered two kernel arguments, so splicing one
without the other required parsing. And the project-wide <NAME>_INIT
convention ("initial value of NAME", as in PC_INIT, MCYCLE_INIT,
MARCHID_INIT) collides with any symmetrisation that puts INIT on the
init= atom.

After: one atom per kernel argument, suffixed _PART; DTB_BOOTARGS_INIT
returns to meaning the combined initial value.

  DTB_BOOTARGS_CONSOLE_PART  "quiet earlycon=sbi console=hvc0 "
  DTB_BOOTARGS_UIO_PART      "uio_pdrv_genirq.of_id=generic-uio "
  DTB_BOOTARGS_ROOT_PART     "root=/dev/pmem0 rw"
  DTB_BOOTARGS_INIT_PART     "init=/usr/sbin/cartesi-init"
  DTB_BOOTARGS_INIT          (CONSOLE_PART UIO_PART ROOT_PART INIT_PART)

cm.h mirrors with CM_ prefix; cm.cpp's static_asserts match the new
set. clua-cartesi.cpp exposes the four _PART atoms alongside the
unchanged DTB_BOOTARGS_INIT. cartesi-machine.lua's
--no-root-flash-drive handler now strips both ROOT_PART and
INIT_PART (the previous single gsub against the glued DTB_BOOTARGS_ROOT
covered both); the spec-cm-cli.lua assertion that `pmem0` and `init=`
are absent after the flag continues to pass.
- Add docgen.lua filter that executes {pipe=sh} blocks and spans,
  splicing command stdout back into the document.
- Add host-replacements/_build_all.sh orchestrator that runs every
  expensive helper once and caches outputs under $CACHE_DIR, so every
  block/span reduces to a `cat` of a pre-computed file.
- Rewrite README.md.template: %machine.* placeholders replaced with
  pipe spans (or printf forms for mixed-content backtick spans).
- Simplify Dockerfile around cartesi/machine-emulator:$TAG: drop the
  panpipe stage, install pandoc + net-tools + genext2fs, symlink
  doc-local images into /usr/share/cartesi-machine/images so the
  emulator finds them at its default paths.
- Update Makefile rule to invoke pandoc with --lua-filter docgen.lua.
- Remove panpipe.diff.

Compatibility fixes uncovered along the way:
- Lua 5.4: cartesi.machine(require "X") now passes require's two
  return values; split into a local + call in every helper script.
- Use sha256sum (Debian) instead of shasum.
- Add `printf --` terminator so spans starting with `--flag=...`
  aren't parsed as printf options.
Mirror the DEV_ENV_HAS_TOOLCHAIN dispatch pattern from src/Makefile:
- Add DEV_ENV_HAS_DOCGEN ?= no to doc/Makefile. The README.md rule
  delegates to playground-exec when unset, and falls through to the
  direct pandoc invocation (_README.md.local) when set to "yes".
- Set ENV DEV_ENV_HAS_DOCGEN=yes in doc/Dockerfile so the recursive
  make README.md call inside the container takes the local branch
  without looping.
- Fix playground-exec: use $(PLAYGROUND_IMAGE):$(TAG) (was
  cartesi/machine-emulator:linux-env), mount doc/ at /work (was
  /opt/cartesi/machine-emulator, leaving the Dockerfile symlinks
  dangling), rename hostname to playground.
- Install make in the playground image.
- Drop the empty images: phony target.
A Makefile under doc/recipes/ replaces the 226-line bash orchestrator.
Required cache files are auto-detected from the template, and script-
to-script dependencies are auto-derived by grep'ing each recipe for
$CACHE_DIR reads. Default convention is one script per cache key
(cache.<key>.sh -> $CACHE_DIR/<key>.out); five multi-output workloads
(rolling-ioctl-echo-loop, rolling-calc-template, remote, remote-end,
lua.remote) use grouped targets.

Retarget to cartesi/playground:0.5.0, the pre-uarch version the docs
were written for. Switch pandoc to the markdown reader (2.9.2.1 lacks
gfm+attributes), convert template fence attrs to pandoc syntax, and
revert addresses to the long-form layout used by 0.5.0.

Move CACHE_DIR under doc/recipes/cache so it persists via the existing
$(CURDIR) bind mount instead of the ephemeral /tmp/ path.
Add a `cache=FILE` attribute handled by the lua filter that reads
$CACHE_DIR/FILE and substitutes the contents - collapsing the most
common pipeline pattern from a 3-line ```{pipe=sh}\ncat ...\n```
fence to a 2-line ```{cache=X.out}\n``` fence (42 block sites).

Inline forms use an empty bracketed Span: []{cache=X.out} for a
plain substitution, [...]{cache=X.out} for a value with a literal
trailing ellipsis. The Span body becomes the suffix and the whole
thing renders as inline Code, so hashes etc. stay monospace inline
(27 inline sites).

Inline `pipe=sh` Code now keeps its Code wrapper instead of being
replaced by a plain Str, so the few remaining inline shell
substitutions also render as monospace.

Pandoc 2.9.2.1's gfm writer renders unattributed CodeBlocks as
4-space-indented; give classless substitutions a synthetic .text
class so they emit as fenced ``` blocks consistently.

recipes/Makefile auto-detection picks up `cache=KEY.out` references
alongside the existing `$CACHE_DIR/KEY.out` reads.
Each cache.*.sh now declares its own outputs and reads as bash arrays
before sourcing lib.sh, which emits a self-contained make rule to stdout
when invoked with --emit-deps. The Makefile shrinks to ~25 lines: one
pattern rule generates each .d sidecar, a second pattern rule runs
single-output scripts, and -include pulls everything in. docgen.lua
emits template.d as a standard make rule augmenting the all target's
prereqs directly.

Also replace the &&-chained subshells in 20 scripts with flat command
sequences: set -euo pipefail (inherited from lib.sh) handles
abort-on-error correctly, and trap '...' EXIT guarantees artifact
cleanup on both success and failure.
- Fold strip-ansi.sh's sed/tr pipe into docgen.lua's read_cache(); cache
  .out files now hold raw output and 34 cache.*.sh scripts drop the
  trailing | bash strip-ansi.sh pipe.
- Move `cd "$HERE"` from 20 cache scripts into lib.sh, after the
  --emit-deps early exit.
- Add .DELETE_ON_ERROR: to recipes/Makefile so a failed recipe does not
  leave a stale target that Make treats as up-to-date; drop the
  $@.tmp && mv atomic-write workaround from the %.sh.d rule.
- Add .NOTPARALLEL: to recipes/Makefile because cache scripts share
  files in $HERE and bind fixed localhost ports, so make -jN would
  corrupt state or fail with EADDRINUSE.
- Inline the _README.md.local helper target into README.md and add a
  clean target in doc/Makefile.
- Add `script=<KEY> [name=<NAME>] [subst=VAR->KEY,...]` attribute to
  docgen.lua. It extracts a `# docs:begin/end` region from
  `recipes/cache.<KEY>.sh`, strips the trailing `> "$out" 2>&1` redirect,
  and substitutes `$VAR` references with the value of `cache/<KEY>.out`.
  Eliminates the duplication where the README hand-typed a command above
  the cached output of a script that ran a different command.
- Rename the existing `cache=<KEY>.out` attribute to `output=<KEY>`. The
  `.out` extension was an internal cache-layout detail.
- Promote RECIPES_DIR to the canonical, configurable env var; CACHE_DIR
  is unconditionally derived as `$(RECIPES_DIR)/cache`. Both Makefiles
  and docgen.lua now share this anchor.
- Add `# docs:begin/end` markers and convert 16 README blocks to the new
  form (limit-exec, nothing, persistent-flash, persistent-machine,
  persistent-stored-hash, proofs-pristine-run, rarely-append-bootargs-
  loglevel, rarely-append-bootargs-single-id, rarely-id, rarely-periodic-
  hashes, rarely-step, remote, rolling-ioctl-echo-loop, state-hashes-
  limit-exec, state-hashes-no-limit, templates-store).
- persistent-{machine,stored-hash} now name the store directory after the
  final state hash (`machine-$hash`) so the displayed
  `--load="machine-<HASH>"` matches what actually runs.
Continues the script= migration. Each conversion replaces a hand-typed
fenced bash block in README.md.template with a script= attribute that
extracts the displayed command from the same cache.<KEY>.sh file that
produces the cached output, eliminating prose/recipe drift.

- ls: trivial single-region wrap.
- flash: recipe is now self-contained — runs mkdir foo and writes
  bar.txt at build time inside named "setup" / "run" regions, so the
  README displays the same multi-step pipeline the prose previously
  hand-typed. The pre-existing recipes/foo/ orphan is removed; trap
  cleans up foo, foo.tar, foo.ext2.
- templates-run: split into templates-run.sh (cartesi-machine
  invocation, persists output.raw which is gitignored) and a new
  templates-run-result.sh (depends on it via reads=, runs the
  displayed lua5.3 line). The long decimal result, previously
  hand-typed in the template as a constant, is now driven by the
  cached output of templates-run-result.
- proofs-output-run: wrap the post-setup truncate-through-cartesi-
  machine section. Template's leading "rm -f output.raw" is dropped
  (recipe trap handles cleanup).
- interactive-ls: intentionally not converted; recipe heredoc form
  would mislead readers about how to use interactive mode. Header
  comment in the recipe documents the decision.

Also fix bugs in three cache recipes

- templates-hash: redirections were in the wrong order
  (2>&1 > "$out"), letting stderr leak to the terminal and
  capturing only stdout. Fixed to > "$out" 2>&1.

- remote-begin-client: the recipe ran --remote-shutdown so the
  background server would terminate, but the prose at this section
  shows --no-remote-destroy. The captured "shutting down" message
  was therefore wrong relative to what the README displays. Folded
  the capture into remote-end.sh (which already runs the
  prose-correct first invocation) and removed the divergent recipe.

- rolling-calc-template: the printf-built JSON missed a comma after
  the "msg_sender" line; same typo lived in the prose heredocs.
  rollup-memory-range happened to accept both, but the input is
  still invalid JSON. Added the comma in all four locations.
Convert templates-hash, templates-run-result, proofs-input-json,
remote-end (begin/end-client), rolling-calc-template
(template/client), and rolling-ioctl-echo-loop (encode) blocks to
script= filter invocations using # docs:begin/end region markers.

Also fix the parallel JSON missing-comma bug in
rolling-ioctl-echo-loop msg_sender (matching the Tier 3
rolling-calc-template fix), align payload strings with prose
("hello from input N!" with trailing !), and make
persistent-flash.sh self-contained so it no longer depends on
flash.sh leaving foo/ behind.
Bash bodies move from doc/recipes/*.sh into the doc template itself
as keyed code blocks. Each block carries its dependencies, outputs,
and render directive (replace=output|source|null) inline -- no more
matching identifiers between *.sh files and the template, no more
.null declarations, no more separate inputs=/outputs= attributes
layered on the reference site.

docgen.lua walks the template in document order, hashing each resolved
body, writing recipes/cache/<hash>/script.sh, and emitting a Make 3.81
fragment with primary+sibling rules for multi-output scripts. The $K
substitution resolves to the dep's cache directory using longest-key
matching against the in-memory key map, so dotted/dashed keys parse
unambiguously. Cache invalidation is structural: a body change
rehashes it; a dep change rehashes consumers because dep paths are
textually substituted into the body before hashing.

The lib.sh preamble is gone -- docgen.lua prepends the preamble to
each cached script directly. The 74 doc/recipes/*.sh files and lib.sh
are deleted.
docgen.lua

- Replace the single-pass walk with two passes. Pass 1 records every
  key= block body into `pending` via pandoc.walk_block. Pass 2 walks
  in document order; each ref=K (or subst=...->K) calls
  ensure_defined(K), which lazily defines K via DFS through depends=.
  Both ref= and depends= are now order-independent within the
  template -- a key= block may sit anywhere relative to the prose
  and the inline references that use it.

- Make replace= required on every key=/ref= site. No more implicit
  "source" / "output" defaults that hid intent.

- Document the full pipeline in the file header: dry-run with
  DEPS_FILE emits a self-contained makefile fragment; the outer
  Makefile includes it so `make` populates the cache via topological
  execution of cache/<hash>/script.sh recipes; the real run reads
  the cached outputs and renders the final document.

- Simplify the script PREAMBLE: drop the $HERE indirection. Scripts
  now cd into their own cache directory and use a relative `out`
  variable, which matches what authors actually write in block bodies.

README.md.template

- Replace seven hardcoded hashes (three 64-hex command outputs, four
  trunc-8 inline references) with derived ref= blocks driven by
  merkle-tree-hash recipes. With the order-independent filter, every
  key= definition now lives at its visual location.

- Consolidate the "create the calculator template" setup that had
  been duplicated across seven keys. The canonical instance lives on
  templates-store as outputs=out,calculator-template; consumers
  reach the artifact via depends=templates-store/calculator-template
  plus a one-line ln -s symlink prelude. This caught a latent bug
  where the duplicated copies had been silently omitting
  --append-rom-bootargs="single=yes".

- Convert the remaining inline backtick-form ref placeholders to the
  empty-Span form for consistency.

Build infrastructure

- doc/Makefile: add a run-playground target for an interactive shell
  in the playground container.
- doc/recipes/Makefile: drop .NOTPARALLEL -- recipes are independent
  and safe to run in parallel under make.
- doc/recipes/find-htif-putchar.lua: switch to
  require"config.nothing-to-do" to match the directory layout.
- doc/recipes/config/nothing-to-do.lua: regenerate as a full dumped
  config (baseline refresh).
Validate that every base key (key=, depends=, subst=, ref=) matches
[a-zA-Z_][a-zA-Z0-9_]* at parse time, so dots and hyphens are caught
immediately with a named-offender error rather than failing later with a
confusing pattern mismatch.

Add a postamble in build_script_content that checks each declared output
exists after the body runs.  Without this, a missing "> "$out"" redirect
produces either a cryptic "cannot open .../out" from the downstream
consumer or a "No rule to make target" from make -- both blaming the
wrong place.  The postamble exits with "key=<name>: declared output
"<sub>" was not created" pointing at the producing block.  The postamble
is only in the generated script; sources[key] (used by replace=source)
holds the substituted body only, so rendered source blocks are unaffected.
Replaces the old "every block redirects to $out, strip_redirect scrubs
it from rendered source" scheme with a uniform addressing system and
runner-side capture.

docgen.lua
- PREAMBLE now self-captures all three streams via process substitution:
    exec > >(tee stdout >> both) 2> >(tee stderr >> both)
  so script bodies are pure code in their own language. The combined
  `both` file is roughly time-ordered (two parallel tees, not byte-exact
  kernel ordering); good enough for current consumers, swap to script(1)
  if a block ever needs precise interleaving.
- The PREAMBLE EXIT trap closes FDs 1/2 *before* `wait` so the tees see
  EOF and exit; without the close, `wait` deadlocks (trap runs while the
  shell still holds the tee pipes open, tees never EOF).
- Reserved names {stdout, stderr, both, source, null} cannot appear in
  outputs=. Single addressing scheme covers display, references, and
  substitutions:
    replace=<thing>            -- own stream/artifact/source
    replace=K/<thing>          -- cross-block
    replace=null               -- drop
    subst=VAR->K[/<thing>]     -- substitute in rendered source
  `ref=` and `block=` are folded into the slash-form. `subst=` and
  `depends=` defaults updated (both -> stream, stdout -> stream prereq).
- emit_rule emits stdout/stderr/both as siblings; declared artifacts
  remain. strip_redirect, resolve_ref, sub_to_filename, and the per-out
  variable emission are gone.

README.md.template
- Big-bang migration of all ~88 key blocks and ~60 inline references:
    cmd > "$out" 2>&1     -> cmd, plus replace=both
    cmd > "$out"          -> cmd, plus replace=stdout
    cmd 2> "$out"         -> cmd, plus replace=stderr  (rarely_step)
    > "$out_X" 2>&1       -> > X 2>&1 (cwd is the cache dir)
    ref=K replace=output  -> replace=K/<thing>
    block=R replace=source -> replace=source/R
- Multi-output blocks (cmdline_remote, cmdline_remote_end,
  rolling_ioctl_echo_loop, rolling_calc_template) wrap their per-region
  cartesi-machine commands in `{ ... } > <artifact> 2>&1` braces with
  # docs:begin/end *inside* the braces, so the redirect stays out of the
  rendered source without needing strip_redirect.
- Cycle/hash readers that flow through last-cycles.sh / find-hash.sh now
  read /both (cartesi-machine prints these to stderr); blocks whose own
  primary output is on stdout (printf, find-lua-val.sh, ...) keep
  /stdout.

Verified by `make README.md TAG=0.5.0` from a clean cache; rendered
prose differs from the previous render only in expected places (cycle
counts, e2ls cosmetics for the new mtime/owner-pinned foo_ext2 recipe,
and the rarely_step splash interleaving noted above).
…unners

Rename doc/docgen.lua to doc/replace.lua and extend it with
multi-language runner support, docs:begin/end null regions,
\$_REPLACE_KEY substitution, and runner-content hashing. Add the
per-language runners doc/run-bash.sh and doc/run-lua.sh, and add
doc/alerts.lua (pandoc filter for GitHub-style alerts). Update
doc/Makefile, doc/README.md.template, and doc/recipes/Makefile
accordingly.
Make `make README.md` work again on the 0.20 branch by rebasing the
docs container on a locally-built emulator and adding an opt-in
mechanism that lets snippets be migrated to the 0.20 surface one at
a time instead of all at once.

Container:
- doc/Dockerfile: FROM cartesi/playground:0.5.0 → cartesi/machine-emulator:$TAG.
  The released cartesi/machine-emulator:0.20.0 binary cannot drive
  tests/dependencies' kernel (built against this branch's unreleased
  emulator with UIO/etc.), so the docs image must layer on top of
  what `build-emulator-image` produces locally.
- doc/Makefile: rename PLAYGROUND_IMAGE → DOCS_IMAGE; TAG default
  0.5.0 → devel (matches root Makefile); build-docs-image now first
  invokes `$(MAKE) -C .. build-emulator-image TAG=$(TAG)`.

Per-block enable/disable:
- doc/replace.lua: add `enabled=yes|no` block attribute and read
  `-M default-replace=` from doc.meta. Disabled blocks render their
  body verbatim (no execution, no region trimming, no cache rules);
  cross-block replace= sites for disabled keys render empty;
  ensure_defined errors if an enabled block depends on a disabled one.
- doc/{Makefile,recipes/Makefile}: pass `-M default-replace=false` to
  every pandoc invocation. With no per-block enabled=, the README
  renders source-only against pandoc alone — the container is only
  needed for blocks explicitly opted in.

First batch of opted-in snippets:
- doc/README.md.template: enabled=yes on `help`, `linux_sha`,
  `rootfs_sha`, `nothing`; image paths /opt/cartesi/share/images →
  /usr/share/cartesi-machine/images. Rewrite cmdline_interactive_ls
  to feed input via --console-io=input_source:from_file: 0.20's `-i`
  defaults console.input_source to from_stdin which calls open_tty()
  and fails in a non-TTY container; from_file goes through fopen()
  and /sbin/init halts the machine when the user shell exits.
Update the Initialization section: drop --rom-image (ROM is gone),
update image paths to /usr/share/cartesi-machine/images/, rename
filename: to data_filename: in --flash-drive, and replace the Emulator
SDK reference with machine-linux-image and machine-guest-tools.  Point
readers to the cartesi/machine-emulator-docs image for sample files.

Enable the ls, foo_ext2, flash, not_shared_flash,
e2ls_not_shared_foo_ext2, shared_flash, and e2ls_shared_foo_ext2
snippets.

Update the Persistent flash drives section: drop the ROM mention,
rename --dump-pmas to --dump-address-ranges.  Add a note explaining
that /sbin/init runs commands as the unprivileged user dapp, that
flash drives are mounted with root ownership by default, and that
user:dapp must be passed to --flash-drive to allow writes (with
--user=root as a non-preferred alternative).  Add user:dapp to the
not_shared_flash and shared_flash examples accordingly.
Update recipes/find-htif-putchar.lua to use the generic read_reg("name")
API introduced in 0.19, replacing the removed per-accessor methods
read_iflags_H/Y(), read_htif_tohost(), and read_mcycle().

Enable config_dump_nothing, cycles_limit_exec, max_mcycle_arg, limit_exec,
and all state_hashes_* blocks. Also add enabled=yes to the inline
[]{replace=...} spans in the surrounding prose.
Enable the persistent_machine_cmd, machine_dirname, persistent_machine,
and persistent_stored_hash blocks (plus the cross-block replace= sites
and inline spans they reference).

Rewrite the prose claim that --store writes the state hash inside the
directory and --load verifies it: in 0.20 the store directory contains
config.json, per-AR data files, and hash_tree.sht/.phtc, but no separate
hash file. --load verifies the archive format version recorded in
config.json instead. The pre-store and post-load state hashes still
match because the same machine state is restored.
Drop --htif-yield-automatic from the example command (automatic yield is
now the default). Rephrase the surrounding prose to reflect the inverted
model: the emulator honors automatic yields by default, and
--no-htif-yield-automatic disables them (with a note that doing so
prevents Rolling Cartesi Machines from generating outputs). Replace the
gRPC interface reference with JSON-RPC. Drop the now-redundant mention
of --htif-yield-manual from the closing sentence.
Replace --append-rom-bootargs="single=yes" with --user=root, switch
flash-drive filename: to data_filename:, swap --replace-flash-drive for
--replace-memory-range using labels, and bump lua5.3 to lua5.4. Add
mke2fs:false,mount:false to the raw flash drives so init does not try
to format and mount them, plus a prose paragraph explaining the default
init behavior. Replace gRPC with JSON-RPC in the interfaces sentence,
and rewrite the --replace-memory-range explanation now that labels
identify drives.

Also port the dtc example.
Break the combined Merkle tree operations section in two. "Slicing and
splicing" keeps the word-leaf state-tree mechanics and precedes its
applications, Template instantiation and Result extraction, now promoted to
siblings. A new "The outputs tree" section explains the keccak256-leaf
outputs tree and its frontier operations (frontier_node, frontier_push_back,
frontier_get_root_hash, frontier, frontier_next_proofs), and precedes Output
verification. Repoint the three cross-references that each meant one half or
the other, and capture the outputs-tree height and leaf size from cm.h.

Also make the verification-game example deterministic: the players are told
apart only by connection order, so wait on an established connection to the
server before launching the dishonest player, in both vg_run and
vg_run_reset.
Add a figure to the outputs tree section showing the three regions
frontier_node reads left to right: the frontier supplying a complete left
subtree (AB), this epoch's new outputs as the active region (C-F), and
pristine padding (GH). The frontier and pristine subtrees enter whole at
aligned boundaries, drawn collapsed, so only the active outputs are
individual leaves, mirroring frontier_node and frontier_next_proofs.

The figure renders from a committed Graphviz source, images/outputs-tree.dot,
into images/outputs-tree.svg. A pattern rule drives dot, and the svg is an
order-only prerequisite of README.md so a render refreshes it only when the
source changes, and never forces a re-render when just the image bytes do.
Add graphviz to the docs image so dot is available in the pinned container.
Regenerate README.md.
Add the physical-memory attributes of each address range to its
address_range_description, so callers can tell memory from devices and
inspect access permissions. The new fields are is_memory, is_device,
is_readable, is_writeable, is_executable, is_read_idempotent,
is_write_idempotent, and driver_id, each read from the matching
address_range accessor. get_address_ranges still returns every range,
devices included, since callers need the whole occupied address space to
check for overlap. The fields flow through to_json, ju_get_opt_field, and
the JSON-RPC AddressRangeDescription schema.

Use the new is_memory flag in --dump-address-ranges to write a file only
for memory ranges. Device ranges are not memory-backed and hash as
pristine, so dumping their bytes produced meaningless files. Update the
cm-cli test to expect a file per memory range and none for devices.
Add two tested SVG figures to the documentation. In "Hash-view of state", a
state hash-tree figure draws a real machine's state as a Merkle tree over the
address space, generated by images/state-tree.lua, which instantiates a
machine and reads its memory ranges. In "The outputs tree", a frontier figure
shows the frontier holding completed left subtrees while the active region and
pristine padding fill the rest, from images/outputs-tree.dot.

Both render through the images/ diagram rules as order-only prerequisites of
README.md, so a render refreshes them without forcing a rebuild. state-tree is
a Lua program that emits SVG directly; outputs-tree pins its node positions and
renders with neato -n.
apt-get source --print-uris emits whichever location currently serves
each file. While a version is still in the pool that is the live mirror
with a SHA256 digest, and once it has been superseded it is the pinned
snapshot host with a SHA512 digest. The committed report therefore
drifted whenever Ubuntu published a security update that pushed an
installed package out of the live pool, even with the snapshot date and
the rootfs unchanged, and the CI sync check failed. The two forms cannot
be reconciled as text because the digest algorithms differ and SHA512
cannot be derived from SHA256.

Normalize every captured source URI to the durable snapshot host and
drop the digest field, which is not required to satisfy the source
distribution obligation. The report now depends only on the committed
ext2 and the snapshot pin, so it stays byte stable until one of those
changes.
…rfaces

The microarchitecture had a dedicated ECALL (function code 3) that let
uarch interpreter code mark a physical page dirty. This was the wrong
abstraction layer. The uarch should not know about dirty-page tracking,
and exposing the call through i_state_access bloated the interface for
every accessor that had nothing to do with dirty pages.

Pull the responsibility up to the machine layer. Introduce
i_accept_dirty_pages as an opt-in capability interface, mirroring
i_accept_counters, and guard the mark call in write_ram_uint64 with
is_an_i_accept_dirty_pages_v. Add machine::mark_write_tlb_dirty_page
so the write-TLB eviction path and the overwrite path both mark the
outgoing page through a single spot. Remove the host-address overload of
machine::mark_dirty_page so all callers pass a physical address.

On the uarch side, delete the ECALL handler, its constants, and all
plumbing that threaded do_mark_dirty_page through every accessor
implementation. Function code 3 is left as an intentional gap in the
ECALL table. Replace the ecall-mark-page-dirty assembly test with
ecall-removed-mark-page-dirty, which issues fn=3 and asserts it traps,
keeping the gap behavior under test.
Address reviewer confusion in the output proofs and output hashes tree
sections without changing any behavior.

Define what an output index is and how an output is verified from a
proof that its hash is the leaf at that index. Note that the report
command takes the same input format as notice but has no index to
print, because reports are not verifiable.

Drop the vague "shared accessor", "running accumulator", and "matching
reader" labels in favor of the plain function names. Spell out the
frontier_node argument placeholders and define is_proof where the
frontier constructor uses it.

Make explicit that the outputs and their proofs live outside the
machine, in the Cartesi Node, and that the machine state commits to
them through the output hashes root hash alone. Add that proofs for a
past epoch are produced when that epoch is finalized and are checked
against the hash settled then, not against a later one.

Rename "outputs tree" to "output hashes tree" and "rolling calculator"
to "Rolling Cartesi Machine calculator" throughout, and unify the root
name as "output hashes root hash", removing the stray "output-hashes
Merkle root" from the guest pseudocode.
The option only writes a file for memory ranges. Device ranges hold no
state and are always pristine, so they are skipped, which makes "memory
ranges" the accurate name. This reverts the rename made earlier in this
unreleased cycle and restores the 0.20.0 spelling.

Rename the option, its handler variable, and the dump function, and align
the help text, the spec-cm-cli coverage, and the doc prose, fixing the
prose that claimed all mapped spans in the address space were dumped.
The static verify_step, verify_step_uarch, verify_reset_uarch, and
verify_send_cmio_response now take their root-hash inputs as
const_machine_hash_view instead of const machine_hash &, matching the
revert-hash parameter that already used the view. A view is a non-owning
borrow that suits a consumed parameter, while owning hashes are still
returned by value or const reference.

The uarch and send-cmio replay context constructors take the view as well,
which lets the verify bodies pass the parameter straight through with no
materialization copy. Comparisons of an owning machine_hash against a view
use std::ranges::equal.

Also rename the freestanding std::vector guard in machine-hash.hpp from the
ZKARCHITECTURE/MICROARCHITECTURE pair to a single NO_STD_VECTOR, defined by
the uarch and risc0 builds, so the include site names the actual constraint
instead of inferring it from the architecture.
mpernambuco
mpernambuco previously approved these changes Jun 26, 2026

@mpernambuco mpernambuco left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Nice call with i_accept_dirty_pages

Comment thread src/machine.cpp
…stdin

htif-console.lua now passes input_source = "from_buffer" in the runtime
config and calls machine:write_console_input to supply the test string,
removing the dependency on host stdin. Because run() breaks early with
BREAK_REASON_CONSOLE_INPUT when the buffer drains, the test loops until
iflags_H is set. run-lua-tests.sh drops the echo pipe that was feeding
stdin to every test script, since only htif-console.lua ever consumed it.
@diegonehab diegonehab force-pushed the feature/revert-root-hash branch from 9b3880f to b1988c5 Compare June 26, 2026 15:28
machine::write_memory has a special case that replaces the entire shadow
state and reinitializes the hot TLB. Pages written by the guest through
the write TLB are dirtied in host memory but only tracked by the write
TLB until the next hash update flushes them into the dirty-page tree.
Resetting the hot TLB and overwriting the shadow TLB discarded that
pending information, so any such page lost its dirty mark. A later
incremental hash then failed to recompute it and produced a root hash
that did not reflect actual memory.

Flush the write TLB dirty pages into the dirty-page tree before replacing
the shadow, matching what update_hash_tree and verify_hash_tree already do.

Add a regression test that runs a small program whose store dirties a page
through the write TLB, overwrites the shadow state, and checks that the
root hash still matches the hash computed directly from memory. The test
fails before this change and passes after.
@diegonehab diegonehab force-pushed the feature/revert-root-hash branch 2 times, most recently from b1a927a to f7ad3ee Compare June 29, 2026 18:29
…tion

The uarch PC coverage merge resolves captured uarch PCs against the
coverage uarch ELF. reset_uarch restores the embedded pristine RAM,
which is generated from the stock uarch-ram.bin. advance_machine_with_uarch_coverage
resets the uarch after every machine instruction, so only the first
instruction of each test ran the coverage binary. Every later
instruction ran the stock binary while its PCs were resolved against
the coverage symbol table. That dropped real coverage of interpret.cpp
branches (for example the verify_cold read and write slow paths) and
credited unrelated lines.

Scope the fix to the PC-collection test instead of changing the global
pristine, so the functional uarch tests keep running the fast stock
uarch. Under coverage=yes the uarch build also emits a pristine ram/hash
pair generated from the instrumented binary. test-coverage-uarch-pcs
relinks cartesi.so with that pair (swapping only the pristine objects,
no instrumented object recompiled) so reset_uarch restores the
instrumented uarch for the whole run. The relink permanently swaps the
embedded pristine in the build tree, so test-coverage-uarch-pcs is moved
to run last among the uarch tests in CI.

Also compile the coverage uarch with -ffunction-sections and -fno-inline
so each function keeps a clean, non-overlapping address range. Without
it, -O0 leaves overlapping symbols and inlined bodies that make
addr2line attribute PCs to the wrong functions.
dont_write_x0.S exercises the rd==0 (discard write) handler path for many
instructions, but not the M-extension division and remainder ops. Their
execute_*<rd_kind::x0> branches were therefore never covered. Add div,
divu, rem, remu and the word variants divw, divuw, remw, remuw with rd=x0,
following the existing pattern. Update the expected cycle count.
The Zcb handlers (c.mul, c.zext/sext, c.not, c.lbu/lhu/lh, c.sb/sh) were
not covered: the upstream rv64uc-p-rvc test predates Zcb, and compressed.S
only covers hint/illegal/trap behavior. Add zcb.S, which value-tests each
Zcb instruction. Results match the non-compressed counterparts, so the
vectors are taken from the corresponding rv64um/rv64ui tests; operands use
the compressed register set (x8-x15).
The FS-is-OFF checks in the FP load, store, fused-multiply, and OP-FP
handlers were never reached by any deliberate test. FSW was only hit
incidentally by the fuzzer seed program. Add a block to illegal_insn.S
that fires each of FLW, FLD, FSW, FSD, FMADD, FMSUB, FNMSUB, FNMADD,
and OP-FP before float state is enabled in mstatus, asserting that every
one traps as an illegal instruction. Update the expected final mcycle for
illegal_insn from 972 to 1080 in the test catalog and fuzz seed corpus
generator to match the nine new cases.
… spec

Walk the shadow register page by address, derive each register name from
get_address_name, and round-trip a 64-bit value through read_reg/write_reg
for every writable register. Assert that x0, mvendorid, marchid, and mimpid
reject writes. Round-trip each htif tohost/fromhost field alias separately.
Add write_word tests asserting rejection of unmapped shadow state addresses,
the protected shadow TLB range, and a read-only nvram range. These tests
cover the exhaustive read_reg/write_reg switch arms and write_word error
paths that were previously unreachable.
The dont_write_x0 test only exercised lb into x0, leaving the
_rd0 dispatch labels for lbu, lh, lhu, lw, lwu, and ld uncovered.
Add each of those loads followed by a bnez x0 guard, and widen
the data datum from .word to .dword so the 8-byte accesses stay
within bounds. Update the expected cycle count to 92.
Under a coverage build, a machine.fork in the remote server left the parent
and child both flushing gcov counters to the same ../src/*.gcda files at exit.
Running concurrently, that races and libgcov prints "Merge mismatch for
function N" to the child's stdout, which a captured-output test then sees as
unexpected data (e.g. "X" vs "Xlibgcov profiling error:..."), failing
test-jsonrpc.

In the child, redirect counters to a per-pid directory via GCOV_PREFIX and
reset them so the child records only its post-fork execution. The coverage
report symlinks each child's .gcno alongside its .gcda and sums the per-child
trees back into the merged .gcov, so no coverage is lost. Active only under
CODE_COVERAGE when the test harness sets CARTESI_COVERAGE_FORK_DIR.

run-gcov.lua gains a generic "extra .gcda" argument for this; the Makefile owns
the per-child symlinking and file discovery. Also fix its header comment, which
claimed the merge takes the maximum count per line when it actually sums them.
@diegonehab diegonehab force-pushed the feature/revert-root-hash branch from a8d01bd to 57937cc Compare July 1, 2026 22:11
Coverage and sanitizer test suites ran serially even though most tests
do not share mutable state. Restructure tests/Makefile so make -j
drives them, and parallelize the gcov merge so the report step scales
with the available cores.

test-lua is split into one make target per lua program and per lester
spec suite, so each runs as its own process and can be scheduled
independently. Every spec gained parse_args/report/exit calls so it
runs standalone. The old test-spec.lua aggregator is gone, and the
installed test-spec wrapper now loops over the spec files instead. A
PARALLEL_TESTS group lists the targets safe to run concurrently.
test-coverage-uarch-pcs swaps the embedded uarch pristine in place to
collect PC coverage, so it must not overlap any other test, and
coverage-report must read every .gcda last. Both are sequenced after
the group with order-only prerequisites. coverage-all chains the
group, PC collection, and report, then relinks the stock pristine (new
swap-uarch-pristine-stock target) so the tree is ready for the next
run.

Forked jsonrpc servers still redirect their gcov counters to a per-pid
directory and reset them inline in the fork handler. run-gcov.lua now
merges those per-child directories back in with a parallel map-reduce.
The gcda files are partitioned across forked workers, each worker
merges its own subset via gcov -t without touching disk, and the
parent reduces the partials the same way. Worker count defaults to the
online CPU count and can be overridden with RUN_GCOV_JOBS.

Under coverage=yes NUM_JOBS is forced to 1 so the lua test runners do
not fork workers that would race the shared .gcda files. Parallelism
now comes entirely from make -j across targets.

Add clean-coverage to remove stale gcda, luacov, and uarch PC data
between runs, used by run-coverage-local.sh. Declare test targets
.PHONY, fix SPEC_FILTER quoting, and make tests that wrote fixed /tmp
paths use mktemp/os.tmpname/per-process temp dirs so concurrent runs
do not collide.

CI runs the coverage job as make -j coverage-all and the sanitize job
as make -j parallel-tests, and drops the dead "Show uncovered lines"
step. The top-level Makefile forwards clean-coverage and documents
coverage-all in help.

This also fixes the bug that aborted parallel coverage runs. An
order-only prerequisite variable that carried its own leading "|"
expanded into a second pipe in a prerequisite list, which make parsed
as a literal prerequisite named "|". The pipe now lives only in the
rules that need it.
@diegonehab diegonehab force-pushed the feature/revert-root-hash branch from 57937cc to 1e7f19a Compare July 1, 2026 22:26
apt-get update --snapshot=X does not pin the installs that follow it.
The snapshot flag is honored only on the command it is passed to, where
apt transiently rewrites the source URIs to the snapshot host. A plain
apt-get install afterwards reads the live lists still sitting in
/var/lib/apt/lists and resolves against the live archive.

As a result the rootfs was assembled from the live archive rather than
the pinned snapshot, baking in package versions absent from that
snapshot. It broke once the live archive advanced past a version still
recorded in the committed image (curl 8.5.0-2ubuntu10.9). The license
report could no longer resolve that source at the pinned snapshot, which
only ever held 8.5.0-2ubuntu10.8, and the docs build failed.

Pass --snapshot on the install commands in both the cross-toolchain and
rootfs stages so the build resolves against the pinned snapshot, the
same way the license scanner already does. Rebuild rootfs-docs.ext2 and
regenerate the license report against the snapshot.
@diegonehab diegonehab force-pushed the feature/revert-root-hash branch from ef36663 to 4382cb5 Compare July 2, 2026 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

Status: Waiting Merge

Development

Successfully merging this pull request may close these issues.

3 participants