zcashd-anchored harness tests bottleneck at ~6.4 s on Orchard proof-system parameter loading #254

@zancas

Summary

After the recent harness optimizations (transparent-default miner, generatetoaddress switch, bind-then-release collision pre-check, post-launch_once instrumentation), every zcashd-anchored integration test in zcash_local_net now lands at the same ~6.4 s wall-clock floor. There's no harness-side knob left to push it below that — the cost lives in zcashd's startup, before its RPC binds.

This issue is the harness-side record of where we landed and why; a sibling issue will be filed on zcash/zcash (or wherever zcashd's bug tracker lives) to request the daemon-side change that would unlock further savings.

The data

Most-recent native sequential nextest run, all tests at the floor (within 8 ms of each other):

PASS [   6.380s] launch_zcashd_with_nu6_1_at_height_2
PASS [   6.382s] launch_recovers_from_rpc_port_collision::zcashd
PASS [   6.385s] launch_recovers_from_rpc_port_collision::lightwalletd
PASS [   6.381s] launch_zcashd_custom_activation_heights
PASS [   6.387s] launch_localnet_lightwalletd_zcashd
PASS [   6.381s] launch_zcashd
PASS [   6.388s] launch_localnet_zainod_zcashd

Decomposing one of them via the in-tree tracing::info! instrumentation in Zcashd::launch_once:

| Phase | ~Duration | Notes |
|---|---|---|
| Binary spawn + config parse | ~0.3 s | small |
| Loading Orchard proof system parameters | ~6.0 s | dominant term |
| Sapling parameters | ~0 s | bundled in the binary |
| RPC HTTP server bind + 'Done loading' | ~0.05 s | once params are loaded |
| Genesis block mine via generatetoaddress (Transparent miner) | ~0.2 s | no Halo2 cost on transparent coinbase |
| zcash-cli stop shutdown | ~0.5 s | graceful shutdown of the daemon |
| Total | ~6.4 s | matches observed wall |
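A breakdown like the one above can be collected with a small timer that is marked at each phase boundary. The following is a minimal sketch using only `std::time::Instant`; `PhaseTimer` and its methods are hypothetical names for illustration and are not the instrumentation that actually lives in Zcashd::launch_once.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: records how long each named launch phase took.
pub struct PhaseTimer {
    last: Instant,
    phases: Vec<(String, Duration)>,
}

impl PhaseTimer {
    /// Start timing the first phase now.
    pub fn start() -> Self {
        Self { last: Instant::now(), phases: Vec::new() }
    }

    /// Close the current phase under `name` and start timing the next one.
    pub fn mark(&mut self, name: &str) {
        let now = Instant::now();
        self.phases.push((name.to_string(), now - self.last));
        self.last = now;
    }

    /// Name and duration of the slowest phase recorded so far.
    pub fn dominant(&self) -> Option<(&str, Duration)> {
        self.phases
            .iter()
            .max_by_key(|(_, d)| *d)
            .map(|(n, d)| (n.as_str(), *d))
    }

    /// Sum of all recorded phase durations.
    pub fn total(&self) -> Duration {
        self.phases.iter().map(|(_, d)| *d).sum()
    }
}
```

In a launch path, `mark` would be called after the spawn, after the "Done loading" log line is seen, after mining, and after shutdown; `dominant` then names the phase worth optimizing.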

zcashd's own log confirms this breakdown (Zcash Daemon version v6.12.1-840b9ceaf):

0.000s  spawn
...
0.300s  'Loading Orchard parameters'
6.300s  'Loaded proof system parameters in 6.30s seconds.'
6.300s  bind RPC :PORT — completes ~immediately

The proof-parameter load completes before the RPC bind, so nothing the harness does over RPC can start earlier. Every other startup phase is small by comparison.
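To track this number across runs without reading logs by hand, the elapsed time can be pulled out of the "Loaded proof system parameters" line. This is a minimal sketch that assumes the line has the shape quoted above; `parse_param_load_secs` is a hypothetical helper, not an existing harness function.

```rust
/// Extract the elapsed seconds from a zcashd log line such as
/// "Loaded proof system parameters in 6.30s seconds."
/// Returns `None` if the line does not contain that message.
pub fn parse_param_load_secs(line: &str) -> Option<f64> {
    const PREFIX: &str = "Loaded proof system parameters in ";
    // Skip everything up to and including the fixed message prefix.
    let start = line.find(PREFIX)? + PREFIX.len();
    let rest = &line[start..];
    // The number ends at the first 's' (the unit suffix in "6.30s").
    let end = rest.find('s')?;
    rest[..end].trim().parse().ok()
}
```

Because the function searches for the prefix rather than anchoring at the start of the line, it tolerates a leading timestamp or quote character.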

What workload do the tests actually exercise? Validation-only.

A grep across both repos that consume this harness — infras/dev/zcash_local_net/tests/ and the downstream zainos/dev/integration-tests/ — shows zero active call sites that drive zcashd to create shielded proofs:

| Probe | Active call sites |
|---|---|
| z_sendmany / z_shieldcoinbase / z_mergetoaddress directly against zcashd | 0 |
| Mining to an Orchard or Sapling miner address (-mineraddress=u…, PoolType::ORCHARD/SAPLING in set_test_parameters) | 0 in active code paths |
| quick_shield / from_inputs::quick_send | ~25 (zaino integration-tests), but all client-side: zingolib lightclients build the proof locally and submit only the finished transaction via sendrawtransaction. zcashd verifies but does not create. |
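The first probe can be approximated in code as well as with grep. The sketch below counts non-comment lines that mention each proof-creating RPC; the function name and the "line does not start with `//`" definition of an active call site are assumptions made for illustration, and the real audit was a manual grep.

```rust
/// RPC names that make zcashd itself create a shielded proof.
const PROVING_RPCS: [&str; 3] = ["z_sendmany", "z_shieldcoinbase", "z_mergetoaddress"];

/// Count lines in `source` that mention each proving RPC,
/// ignoring lines that are entirely a `//` comment.
pub fn count_proving_call_sites(source: &str) -> Vec<(&'static str, usize)> {
    PROVING_RPCS
        .iter()
        .map(|rpc| {
            let hits = source
                .lines()
                .filter(|l| !l.trim_start().starts_with("//"))
                .filter(|l| l.contains(*rpc))
                .count();
            (*rpc, hits)
        })
        .collect()
}
```

A check like this could run in CI so that a new test which starts calling one of these RPCs is noticed before it silently depends on the proving keys.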

What zcashd actually does in our suite:

  1. Validate transactions submitted via sendrawtransaction — needs verifying keys (small, fast to load).
  2. Validate blocks during sync — same, verifying keys only.
  3. Mine coinbase outputs to a transparent address — no Halo2/Groth16 proving.

What zcashd never does:

  • Build a shielded proof itself.

So the ~100 MiB of Orchard proving parameters loaded at startup is dead weight for our entire integration-test surface. The proving keys are loaded but never consulted.

Why the harness can't go lower on its own

What's already shipped (multiple commits on the zcash_local_net refactor_and_upgrade branch):

  1. Transparent default miner: ZcashdConfig::default() mines coinbase to a transparent address. Saves several seconds per test that mines >0 extra blocks.
  2. Switched to generatetoaddress — explicit address argument, no wallet-default lookup, fewer config-file dependencies on the hot path.
  3. Bind-then-release pre-check in launch::with_retry_on_collision — saved ~3 s on the zcashd collision-recovery test (9.2 s → 6.4 s) and ~11.6 s on the lightwalletd collision-recovery test (18.0 s → 6.4 s).
  4. Tightened poll cadences — total gain <100 ms; in the noise.
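The bind-then-release pre-check in item 3 can be sketched as follows. This is an illustration of the general technique using `std::net::TcpListener`, not the code in launch::with_retry_on_collision; `port_is_free` and `first_free_port` are hypothetical names.

```rust
use std::net::{Ipv4Addr, SocketAddrV4, TcpListener};

/// Returns true if `port` can be bound on localhost right now.
/// The listener is dropped at the end of the expression, which
/// releases the port again so the daemon can bind it.
pub fn port_is_free(port: u16) -> bool {
    TcpListener::bind(SocketAddrV4::new(Ipv4Addr::LOCALHOST, port)).is_ok()
}

/// Try candidates in order and return the first that passes the pre-check.
pub fn first_free_port(candidates: &[u16]) -> Option<u16> {
    candidates.iter().copied().find(|p| port_is_free(*p))
}
```

The check is inherently racy: another process can take the port between the release and the daemon's own bind, so a pre-check like this reduces collisions but does not replace a retry path.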

What was tried and rejected as not viable:

  1. -disablewallet: zcashd 6.12.1 still registers generatetoaddress (and generate) in the wallet RPC group, so launching with -disablewallet strips the methods we need to mine. The harness's Validator::generate_blocks then silently mines zero blocks until the chain-poll timeout fires (60 s) and the test panics. Field plumbing is left in place as ZcashdConfig::disable_wallet (default false) for future work; today it's a known-broken opt-in.
  2. Skipping txindex / insightexplorer / experimentalfeatures from the config — those gate index builds done later in startup; they don't affect proof-param load.
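The -disablewallet failure mode in item 1 is a poll loop that never sees the height advance. A minimal sketch of such a loop is shown below; it returns the last observed height on timeout so the failure message can say that no blocks were mined. `wait_for_height` is a hypothetical helper and does not reproduce Validator::generate_blocks.

```rust
use std::time::{Duration, Instant};

/// Poll `current_height` until it reaches `target` or `timeout` elapses.
/// `Ok(h)` is the height that satisfied the target; `Err(h)` is the last
/// height seen before giving up.
pub fn wait_for_height(
    mut current_height: impl FnMut() -> u32,
    target: u32,
    timeout: Duration,
    poll_every: Duration,
) -> Result<u32, u32> {
    let deadline = Instant::now() + timeout;
    loop {
        let h = current_height();
        if h >= target {
            return Ok(h);
        }
        if Instant::now() >= deadline {
            return Err(h);
        }
        std::thread::sleep(poll_every);
    }
}
```

With the last height in the error, a stalled run reports "still at height 0 after 60 s" instead of a bare timeout, which points more directly at a missing mining RPC.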

What we have not tried (because it wouldn't help on the dominant term):

  1. chain_cache reuse — would save genesis re-bootstrap, but proof-param loading happens before any chain state is consulted. No effect on the 6 s floor.
  2. Sequential nextest with warm OS page cache — running zcashd N times in sequence does drop subsequent loads to ~3 s via the kernel page cache. But forcing sequential test execution undoes the parallelism win, so net wall-clock is worse, not better. CI machines on cold caches pay the full 6 s on every test regardless.
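The trade-off in item 2 is simple arithmetic. The sketch below works in milliseconds and rests on two assumptions that are not measurements from this issue: a warm test costs about 3.4 s (the 6.4 s floor minus the ~3 s saved on the parameter load), and every parallel test pays the full cold cost.

```rust
/// Wall-clock (ms) for `n` tests run one after another: the first pays
/// the cold cost, the remaining ones pay the warm cost.
pub fn sequential_wall_ms(n: u64, cold_ms: u64, warm_ms: u64) -> u64 {
    if n == 0 { 0 } else { cold_ms + (n - 1) * warm_ms }
}

/// Wall-clock (ms) for `n` cold tests spread over `workers` parallel slots.
/// `workers` must be non-zero.
pub fn parallel_wall_ms(n: u64, workers: u64, cold_ms: u64) -> u64 {
    n.div_ceil(workers) * cold_ms
}
```

For the seven tests listed earlier, `sequential_wall_ms(7, 6400, 3400)` gives 26800 ms, while `parallel_wall_ms(7, 7, 6400)` gives 6400 ms, so under these assumptions the warm cache does not make up for the lost parallelism.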

What the harness can still do (small wins, mentioned for completeness)

  • Parallelize teardowns — the ~0.5 s zcash-cli stop per test is sequential per-test today. Modest, ~10% of test wall, low risk.
  • Pre-build a chain cache once per workspace and reuse it for tests that need a chain at height N — useful for future tests that mine many blocks; orthogonal to the current floor.

What zcashd-side change would move the floor — and how the harness will use it

Filed-separately issue (link forthcoming): a zcashd CLI flag that skips loading the proving keys entirely — something like -noshieldedproving or a more zcashd-idiomatic name. Behavior:

  • Load verifying keys (small, fast — milliseconds, not seconds). Block validation, transaction validation, sync all work.
  • Skip loading proving keys. The ~6 s Orchard parameter load is not done.
  • Reject the proof-creating RPCs (z_sendmany, z_shieldcoinbase, z_mergetoaddress, mining to a shielded mineraddress, anything else that would need a proof) with a clear error pointing at the flag.
  • Permit everything else: validate transactions, validate blocks, mine to transparent addresses, all RPC-sync-related operations.
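The requested behavior amounts to one guard consulted before any proof-creating RPC runs. The following Rust model is purely illustrative: zcashd is a C++ codebase, none of these names exist there, and the RPC list covers only the three methods named above.

```rust
/// Hypothetical error returned when a proof-creating RPC is called
/// while proving keys were skipped at startup.
#[derive(Debug, PartialEq)]
pub enum RpcError {
    ProvingDisabled { method: String },
}

/// RPCs that would need proving keys (the ones named in this issue).
const PROOF_CREATING_RPCS: [&str; 3] = ["z_sendmany", "z_shieldcoinbase", "z_mergetoaddress"];

/// Reject proof-creating RPCs when proving is disabled; allow everything else.
pub fn check_rpc_allowed(method: &str, proving_disabled: bool) -> Result<(), RpcError> {
    if proving_disabled && PROOF_CREATING_RPCS.contains(&method) {
        return Err(RpcError::ProvingDisabled { method: method.to_string() });
    }
    Ok(())
}
```

Validation-path calls such as sendrawtransaction pass through unchanged, which is the property the harness depends on.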

For our suite, the flag would drop zcashd cold-start from ~6.4 s to ~0.4 s for the lifecycle tests.

The diff is much smaller than implementing parameter caching: it's a bool config option + skipping a constructor + guards at the proving-RPC call sites. That's why this is the right ask, given the validation-only finding above.

Harness-side opt-in for tests that legitimately need proving

The harness will pair the upstream flag with a ZcashdConfig field:

pub struct ZcashdConfig {
    // ... existing fields ...

    /// When `true` (the default), zcashd is launched with the
    /// upstream "skip proving-key load" flag. Saves ~6 s of cold
    /// start on every spawn. Set to `false` for any test that
    /// drives zcashd to *create* a shielded proof — `z_sendmany`,
    /// `z_shieldcoinbase`, mining to a shielded address, etc.
    /// (No current test in this repo or its downstream consumers
    /// needs to set this to `false`; the field exists so that
    /// future tests which legitimately need proof creation can
    /// opt in without regressing the rest of the suite.)
    pub disable_shielded_proving: bool,
}

ZcashdConfig::default() sets disable_shielded_proving to true, so the suite-wide default is fast. Tests that exercise zcashd's proof-creation paths set it to false explicitly, paying the ~6 s setup cost only on the small number of tests that genuinely need it.
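A minimal sketch of how the default and the per-test override might fit together is shown below. The struct is cut down to the one field discussed here, `extra_launch_args` is a hypothetical helper, and `-noshieldedproving` is only the placeholder flag name from this issue, since the upstream name is not settled.

```rust
/// Reduced stand-in for the harness config; the real struct has more fields.
#[derive(Debug, Clone)]
pub struct ZcashdConfig {
    pub disable_shielded_proving: bool,
}

impl Default for ZcashdConfig {
    fn default() -> Self {
        // Fast path by default: skip the proving-key load.
        Self { disable_shielded_proving: true }
    }
}

/// Placeholder flag name; the real upstream flag does not exist yet.
const SKIP_PROVING_FLAG: &str = "-noshieldedproving";

/// Extra command-line arguments implied by the config.
pub fn extra_launch_args(config: &ZcashdConfig) -> Vec<&'static str> {
    if config.disable_shielded_proving {
        vec![SKIP_PROVING_FLAG]
    } else {
        Vec::new()
    }
}
```

A test that needs zcashd to create proofs would build its config as `ZcashdConfig { disable_shielded_proving: false, ..Default::default() }` and leave every other field at its default.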

This shape parallels the existing ZcashdConfig::disable_wallet field: a per-test boolean on the config with a documented default and an explicit override for tests that need the other path. The difference is that disable_shielded_proving defaults to the lighter path, because validation-only is the common case.

Tracking

  • Sibling zcashd-side issue: TBD (zancas to file).
  • Harness changes that brought us to this floor: see the refactor_and_upgrade branch on this repo, ending in commit acd5eb0 and the bind-pre-check follow-up.
  • This issue's disable_shielded_proving config-field plan lands in the harness once the upstream flag is available and named.
