fix(validator): recipe restart-safety on Agave 4.1+ (port range + restart-only flags)#402

Merged
POPPIN-FUMI merged 3 commits into main from fix/dynamic-port-range-min on Jun 6, 2026

@POPPIN-FUMI

Contributor

Problem

Agave/Jito 4.1.0+ rejects a --dynamic-port-range smaller than 27 ports:

error: Invalid value for '--dynamic-port-range <MIN_PORT-MAX_PORT>': Port range is too small. Try --dynamic-port-range 8900-8926

The recipe defaults were 26 ports (8000-8025, 8900-8925), and 21 ports (8000-8020) for pythnet, so a freshly deployed or upgraded 4.1+ validator crash-loops immediately at start (ExecStart … status=1/FAILURE, ~3ms CPU). Older versions accepted 26 ports, so the failure only surfaces after the version bump.

Fix

Bump every --dynamic-port-range default (and the init/inventory defaults that feed it) to a 31-port range:

  • 8000-8025 → 8000-8030
  • 8900-8925 → 8900-8930
  • 8000-8020 → 8000-8030 (pythnet)

Covers validator, RPC, and pythnet start templates plus the TypeScript init defaults (initTestnetConfig, initMainnetConfig, addMainnetInventory, the RPC inits, and the AI console tool). Firedancer TOML configs use a separate parser and are left unchanged.

Existing inventories that already set dynamic_port_range explicitly are unaffected; this only changes the fallback default. Hosts with a too-small explicit value should widen it to ≥27 ports.
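
For operators checking an explicit `dynamic_port_range` before upgrading, the arithmetic can be sketched in a few lines of Python. This is only an illustration and not part of the recipe: the helper names `port_count` and `is_wide_enough` are hypothetical, and the range is assumed to be inclusive on both ends, which matches the counts quoted above (8000-8025 = 26 ports).

```python
# Minimal sketch: count the ports in a "MIN-MAX" range and compare against the
# 27-port minimum reported for Agave/Jito 4.1.0+ in this PR.
# Helper names are hypothetical; the range is assumed to be inclusive.

MIN_PORTS = 27


def port_count(port_range: str) -> int:
    """Return the number of ports in an inclusive 'MIN-MAX' range string."""
    low_text, high_text = port_range.split("-")
    low, high = int(low_text), int(high_text)
    if low > high:
        raise ValueError(f"range start {low} is greater than range end {high}")
    return high - low + 1


def is_wide_enough(port_range: str, minimum: int = MIN_PORTS) -> bool:
    """True when the range holds at least `minimum` ports."""
    return port_count(port_range) >= minimum


if __name__ == "__main__":
    for candidate in ("8000-8020", "8000-8025", "8900-8926", "8000-8030"):
        status = "ok" if is_wide_enough(candidate) else "too small"
        print(f"{candidate}: {port_count(candidate)} ports ({status})")
```

Under these assumptions the old defaults (26 and 21 ports) fall below the minimum, while the range suggested by the error message (8900-8926) and the new 31-port defaults pass.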

🤖 Generated with Claude Code

POPPIN-FUMI and others added 3 commits June 6, 2026 14:42
…ave 4.1+

Agave/Jito 4.1.0+ rejects a dynamic-port-range smaller than 27 ports:
  "Invalid value for '--dynamic-port-range': Port range is too small."
The recipe defaults (8000-8025 / 8900-8925 = 26 ports, pythnet 8000-8020 = 21)
caused new validators to crash-loop instantly on start (ExecStart status=1).

Bump every `--dynamic-port-range` default and the init/inventory defaults to a
31-port range (8000-8030 / 8900-8930) across validator, RPC, and pythnet start
templates. Firedancer TOML configs are left as-is (separate parser).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… start scripts

Review follow-up: the mainnet-rpc agave start scripts (index/grpc/tx/main) hardcode
`--dynamic-port-range 8000-8020` (21 ports) instead of using the templated default,
so they were missed by the first pass and would also crash-loop on Agave 4.1+.
Bump to 8000-8030. Firedancer TOML configs (separate parser) remain unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lates

The testnet agave/jito start templates emitted --wait-for-supermajority and
--expected-bank-hash unconditionally with hardcoded defaults. These are only
valid for a coordinated cluster restart; on a normal restart a stale slot/hash
makes the validator hang or panic with a bank-hash mismatch. Gate both behind
`is defined` (matching the generic start-validator.sh.j2) so they appear only
when an operator sets them. --expected-shred-version stays (default 57087).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
POPPIN-FUMI changed the title from "fix(validator): widen default dynamic_port_range for Agave 4.1+ (>=27 ports)" to "fix(validator): recipe restart-safety on Agave 4.1+ (port range + restart-only flags)" on Jun 6, 2026
POPPIN-FUMI merged commit e83272f into main on Jun 6, 2026
3 checks passed
POPPIN-FUMI deleted the fix/dynamic-port-range-min branch on June 6, 2026 05:57
POPPIN-FUMI added a commit that referenced this pull request Jun 6, 2026
…ed-version (#405)

* fix(rpc): gate testnet-rpc expected-bank-hash; template hardcoded shred-version

testnet-rpc/start-validator.sh.j2 emitted `--expected-bank-hash <stale>` and a
hardcoded `--expected-shred-version 57087` unconditionally. The bank hash is a
coordinated-cluster-restart parameter; carrying a stale value into a normal
restart can hang the node or fail with a bank-hash mismatch (same class of bug
already fixed for the testnet *validator* templates in #402). Now:
- expected-shred-version is templated with a default (overridable per host)
- expected-bank-hash is gated behind `is defined` (emitted only when set)

Also template the hardcoded `--expected-shred-version 50093` in the mainnet-rpc
(index/grpc/tx/main + start-validator) and mainnet-validator start scripts so the
shred version is overridable; the default preserves current behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* harden(rpc/validator): gate restart params on truthiness, not just is defined

Review follow-up: `{% if x is defined %}` is true for a declared-but-empty var,
and the example inventories document `expected_bank_hash: ""` /
`wait_for_supermajority: ""` as the commented "unset" form. Uncommenting that
would render `--expected-bank-hash  ` (empty) and break the launch. Gate on
`is defined and x` so empty-string/null also suppresses the flag. Applied to the
new testnet-rpc gate and the existing testnet-validator gates.
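
The difference between the two gates can be illustrated with a rough Python analogue. This is not the Jinja2 template itself: `gate_is_defined` and `gate_truthy` are hypothetical helpers, and a plain dict stands in for the host variables. Key membership plays the role of `is defined`, and Python truthiness plays the role of `and x`.

```python
# Rough analogue of the two template gates, assuming host vars arrive as a dict.
# Function names are hypothetical and exist only for this illustration.

def gate_is_defined(host_vars: dict, var: str, flag: str) -> list:
    """Emit the flag whenever the variable is declared, even if it is empty."""
    if var in host_vars:
        return [flag, str(host_vars[var])]
    return []


def gate_truthy(host_vars: dict, var: str, flag: str) -> list:
    """Emit the flag only when the variable is declared and non-empty."""
    value = host_vars.get(var)
    if value:
        return [flag, str(value)]
    return []


if __name__ == "__main__":
    # The documented "unset" form, uncommented by an operator.
    inventory = {"expected_bank_hash": ""}
    print(gate_is_defined(inventory, "expected_bank_hash", "--expected-bank-hash"))
    print(gate_truthy(inventory, "expected_bank_hash", "--expected-bank-hash"))
```

With the empty-string inventory value, the first gate still produces the flag with an empty argument, while the second produces nothing, which is the behavior this follow-up is after.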

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>