fix(validator): recipe restart-safety on Agave 4.1+ (port range + restart-only flags) #402
Merged
…ave 4.1+

Agave/Jito 4.1.0+ rejects a dynamic-port-range smaller than 27 ports: "Invalid value for '--dynamic-port-range': Port range is too small." The recipe defaults (8000-8025 / 8900-8925 = 26 ports, pythnet 8000-8020 = 21) caused new validators to crash-loop instantly on start (ExecStart status=1). Bump every `--dynamic-port-range` default and the init/inventory defaults to a 31-port range (8000-8030 / 8900-8930) across validator, RPC, and pythnet start templates. Firedancer TOML configs are left as-is (separate parser).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… start scripts

Review follow-up: the mainnet-rpc agave start scripts (index/grpc/tx/main) hardcode `--dynamic-port-range 8000-8020` (21 ports) instead of using the templated default, so they were missed by the first pass and would also crash-loop on Agave 4.1+. Bump to 8000-8030. Firedancer TOML configs (separate parser) remain unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…lates

The testnet agave/jito start templates emitted `--wait-for-supermajority` and `--expected-bank-hash` unconditionally with hardcoded defaults. These flags are only valid for a coordinated cluster restart; on a normal restart a stale slot/hash makes the validator hang or panic with a bank-hash mismatch. Gate both behind `is defined` (matching the generic start-validator.sh.j2) so they appear only when an operator sets them. `--expected-shred-version` stays (default 57087).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
POPPIN-FUMI added a commit that referenced this pull request on Jun 6, 2026
…ed-version (#405)

* fix(rpc): gate testnet-rpc expected-bank-hash; template hardcoded shred-version

testnet-rpc/start-validator.sh.j2 emitted `--expected-bank-hash <stale>` and a hardcoded `--expected-shred-version 57087` unconditionally. The bank hash is a coordinated-cluster-restart parameter; carrying a stale value into a normal restart can hang the node or fail with a bank-hash mismatch (same class of bug already fixed for the testnet *validator* templates in #402).

Now:
- expected-shred-version is templated with a default (overridable per host)
- expected-bank-hash is gated behind `is defined` (emitted only when set)

Also template the hardcoded `--expected-shred-version 50093` in the mainnet-rpc (index/grpc/tx/main + start-validator) and mainnet-validator start scripts so the shred version is overridable; the default preserves current behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* harden(rpc/validator): gate restart params on truthiness, not just is defined

Review follow-up: `{% if x is defined %}` is true for a declared-but-empty var, and the example inventories document `expected_bank_hash: ""` / `wait_for_supermajority: ""` as the commented "unset" form. Uncommenting that would render `--expected-bank-hash ` (empty) and break the launch. Gate on `is defined and x` so empty-string/null also suppresses the flag. Applied to the new testnet-rpc gate and the existing testnet-validator gates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
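The difference between the two gates can be sketched in Python. This is only a rough analogue of the Jinja2 conditions, not the recipe's template code: `RESTART_FLAGS`, `gate_defined`, `gate_truthy`, and `restart_flags` are hypothetical names, and a plain dict stands in for the host variables.

```python
from typing import Callable, List, Mapping, Optional

HostVars = Mapping[str, Optional[str]]

# Hypothetical mapping from inventory variable to CLI flag (illustration only).
RESTART_FLAGS = {
    "wait_for_supermajority": "--wait-for-supermajority",
    "expected_bank_hash": "--expected-bank-hash",
}


def gate_defined(host_vars: HostVars, name: str) -> bool:
    """Analogue of `{% if x is defined %}`: passes even when the value is empty."""
    return name in host_vars


def gate_truthy(host_vars: HostVars, name: str) -> bool:
    """Analogue of `{% if x is defined and x %}`: empty string and None fail."""
    return name in host_vars and bool(host_vars[name])


def restart_flags(host_vars: HostVars, gate: Callable[[HostVars, str], bool]) -> List[str]:
    """Collect flag/value pairs for every restart variable that passes the gate."""
    args: List[str] = []
    for name, flag in RESTART_FLAGS.items():
        if gate(host_vars, name):
            args += [flag, host_vars[name] or ""]
    return args


host_vars = {"expected_bank_hash": ""}  # declared but empty, as in the example inventories
print(restart_flags(host_vars, gate_defined))  # ['--expected-bank-hash', '']
print(restart_flags(host_vars, gate_truthy))   # []
```

With the presence-only gate, the empty value still produces a flag with nothing after it; the truthiness gate drops it, which is the behavior the follow-up commit describes.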
Problem

Agave/Jito 4.1.0+ rejects a `--dynamic-port-range` smaller than 27 ports with "Invalid value for '--dynamic-port-range': Port range is too small."

The recipe defaults were 26 ports (`8000-8025`, `8900-8925`), and `8000-8020` (21 ports) for pythnet, so a freshly deployed or upgraded 4.1+ validator crash-loops instantly at start (ExecStart … status=1/FAILURE, ~3ms CPU). Older versions accepted 26 ports, so this only surfaces on the version bump.

Fix

Bump every `--dynamic-port-range` default (and the init/inventory defaults that feed it) to a 31-port range:

- `8000-8025` → `8000-8030`
- `8900-8925` → `8900-8930`
- `8000-8020` → `8000-8030` (pythnet)

Covers validator, RPC, and pythnet start templates plus the TypeScript init defaults (`initTestnetConfig`, `initMainnetConfig`, `addMainnetInventory`, the RPC inits, and the AI console tool). Firedancer TOML configs use a separate parser and are left unchanged.

Existing inventories that already set `dynamic_port_range` explicitly are unaffected; this only changes the fallback default. Hosts with a too-small explicit value should widen it to ≥27 ports.

🤖 Generated with Claude Code
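For hosts with an explicit `dynamic_port_range`, a quick check along these lines may help spot values that are too small. It is a minimal sketch, assuming the range is counted inclusively (which matches the 26-, 21-, and 31-port figures above); it is not Agave's own validation code, and `port_range_size` and `is_wide_enough` are hypothetical helper names.

```python
MIN_PORTS = 27  # minimum stated in this PR for Agave/Jito 4.1.0+


def port_range_size(port_range: str) -> int:
    """Count the ports in an inclusive 'START-END' range such as '8000-8030'."""
    start_text, end_text = port_range.split("-", 1)
    start, end = int(start_text), int(end_text)
    if end < start:
        raise ValueError(f"invalid port range: {port_range!r}")
    return end - start + 1


def is_wide_enough(port_range: str, minimum: int = MIN_PORTS) -> bool:
    """Return True when the range holds at least `minimum` ports."""
    return port_range_size(port_range) >= minimum


for value in ("8000-8025", "8900-8925", "8000-8020", "8000-8030"):
    print(value, port_range_size(value), is_wide_enough(value))
# 8000-8025 26 False
# 8900-8925 26 False
# 8000-8020 21 False
# 8000-8030 31 True
```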