solx 0.5.0: thin spine — stdlib argparse dispatch, 6–13× faster startup#30

Merged
Shu-Wan merged 14 commits into main from v0.5.0-thin-spine
Jun 11, 2026

@Shu-Wan Shu-Wan commented Jun 11, 2026

v0.5.0 — thin spine

Rewrites the CLI's dispatch layer on the Python standard library (argparse,
replacing Typer/click/rich) so startup latency drops to the same order as a
raw SLURM call. The skill no longer steers agents to raw squeue for one-off
reads — solx and raw SLURM reads are now treated as equivalent.

Startup latency

Warm median on a Sol compute node (NFS $HOME, single-file .pyz install):

| command | raw squeue | v0.4.0 | v0.5.0 | speedup |
|---|---|---|---|---|
| solx --version | | 1.35s | 0.10s | 13× |
| solx job list | 0.08s | 2.51s | 0.39s | 6.4× |
| solx job time | 0.08s | 2.51s | 0.31s | 8.1× |

The win is removing the Typer/click/rich import tree: --version/version
short-circuit before the parser tree is built, command bodies import inside
their handlers, and --json/piped runs never load rich.
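The three mechanisms named above can be sketched in a few lines. This is a minimal illustration, not the code in solx.main: the `__version__` value, the `_job_list` handler, its placeholder data, and the printed version format are assumptions made for the example.

```python
import sys

__version__ = "0.5.0"  # stand-in; the real package exposes its own __version__


def _job_list(args):
    # Anything heavy is imported inside the handler that needs it.
    import json

    rows = [{"id": 1, "state": "RUNNING"}]  # placeholder, not real squeue output
    print(json.dumps(rows) if args.json else f"{len(rows)} job(s)")
    return 0


def _build_parser():
    # argparse is imported here, so the version fast path never pays for it.
    import argparse

    parser = argparse.ArgumentParser(prog="solx", allow_abbrev=False)
    groups = parser.add_subparsers(dest="command")
    job = groups.add_parser("job", allow_abbrev=False)
    leaves = job.add_subparsers(dest="job_command")
    job_list = leaves.add_parser("list", allow_abbrev=False)
    job_list.add_argument("--json", action="store_true")
    job_list.set_defaults(handler=_job_list)
    return parser


def main(argv=None):
    argv = sys.argv[1:] if argv is None else list(argv)
    # Fast path: only the bare forms short-circuit before the parser exists.
    if argv in (["--version"], ["version"]):
        print(f"solx {__version__}")
        return 0
    parser = _build_parser()
    args = parser.parse_args(argv)
    handler = getattr(args, "handler", None)
    if handler is None:
        # Root or bare group: help on stdout, exit code 2.
        parser.print_help()
        return 2
    return handler(args)
```

Calling `main(["--version"])` returns before `argparse` is imported, while `main(["job", "list", "--json"])` builds the tree and imports `json` only inside the handler.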

Changes

  • CLI dispatch is stdlib argparse — entry point solx.main:main
    (replacing solx.cli:app). Command surface, aliases, exit codes, and the
    output contract are unchanged apart from two documented supersets (--json
    placement and -h).
  • Static shell completions: solx completions <bash|zsh|fish> renders
    fully static scripts from one description; nothing execs solx at completion
    time, so the first Tab of a session costs no interpreter start.
  • Behavioral parity matrix (evals/parity/) — 80 cases over the full
    command surface, each run in an isolated fake $HOME against deterministic
    SLURM mocks and compared byte-for-byte against a captured golden run. Used to
    verify the rewrite reproduces v0.4.0 behavior.
  • Docs, changelog, roadmap, and SKILL.md updated for the latency results and
    the solx/raw-SLURM equivalence; ~/.solkeep removal deferred to 1.0.0.

Upgrading

Completion scripts installed as files must be regenerated after upgrading (e.g.
zsh fpath: solx completions zsh > ~/.zfunc/_solx). Scripts from solx ≤ 0.4.0
use the Typer completion protocol, which 0.5.0 answers with zero candidates.
Eval/source install modes regenerate the script at each shell startup and need no action.

Release

Merge this PR, then push the unprefixed v0.5.0 tag to trigger the CI release.

🤖 Generated with Claude Code

Shu-Wan and others added 14 commits June 9, 2026 14:28
The legacy keep-list fallback stays supported through the 0.x line; one
release was not enough migration runway, so removal now lands with 1.0.0.
The deprecation nudge in solx keep names the new version via
SOLKEEP_REMOVED_IN.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Completion no longer shells back into solx: one data structure describes
the CLI surface (commands, subcommands, flags, descriptions) and
bash/zsh/fish scripts are rendered from it as fully static text, so the
first Tab of a session costs no interpreter start — which on Sol's NFS
home is the difference between instant and a ~1s stall.

The zsh script keeps the dual-mode footer (eval/source registers via
compdef; fpath autoload calls the completer directly) so both install
modes complete on the first Tab. Tests assert the script shapes, that
every command is listed in all three shells, and run each shell's
syntax checker over the emitted script when the shell is installed.
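As an illustration of rendering a static script from one description, here is a minimal sketch for bash only. The `COMMANDS` contents and the `render_bash` name are hypothetical, and the emitted function is far simpler than what solx generates; the point is only that the output is plain text that never invokes solx.

```python
# Hypothetical, heavily reduced description of the CLI surface.
COMMANDS = {
    "job": {"list": ["--json", "-h"], "time": ["--json", "-h"]},
    "completions": {},
    "version": {},
}


def render_bash(commands):
    """Render a fully static bash completion script from `commands`."""
    top = " ".join(commands)
    lines = [
        "_solx() {",
        '    local cur="${COMP_WORDS[COMP_CWORD]}"',
        '    local words=""',
        '    if [ "$COMP_CWORD" -eq 1 ]; then',
        f'        words="{top}"',
        "    else",
        '        case "${COMP_WORDS[1]}" in',
    ]
    for name, subcommands in commands.items():
        if subcommands:
            joined = " ".join(subcommands)
            lines.append(f'            {name}) words="{joined}" ;;')
    lines += [
        "        esac",
        "    fi",
        # mapfile avoids word splitting and glob expansion of candidates.
        '    mapfile -t COMPREPLY < <(compgen -W "$words" -- "$cur")',
        "}",
        "complete -F _solx solx",
    ]
    return "\n".join(lines) + "\n"


script = render_bash(COMMANDS)
```

The resulting `script` contains every command name as literal text, so a shell can source it without starting a Python interpreter at completion time.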

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
On Sol's NFS home, importing typer alone costs ~1s per invocation and is
the entire startup-latency floor. The new solx.main builds an argparse
tree (allow_abbrev=False everywhere, matching the old no-prefix-match
behavior) and keeps module-level imports to the stdlib plus __version__;
command bodies, rich, and pathspec load only inside the handler that
needs them. typer, click, and shellingham drop out of the dependency
set entirely.

Dispatch preserves the v0.4.0 surface byte-for-byte on the parity
matrix: bare --version/version fast path, hidden jobs/ls aliases
rewritten before parsing, help-on-stdout + exit 2 for the root and bare
groups, and a hand-rolled 'job start' tail parser (options consumed
anywhere, first unconsumed token names the template — including after
'--' — first '--' swallowed, everything else passed through to salloc
in order). One deliberate superset: every leaf except 'job start' now
also accepts a trailing --json; after 'job start' it remains salloc
passthrough.

CLI dispatch tests are ported off CliRunner onto main([...]) +
SystemExit with the same monkeypatch seams, plus import-hygiene checks
(no typer in sys.modules after a dispatch; importing solx.main never
pulls rich). The zipapp entry point follows to solx.main:main.
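One way to express an import-hygiene check is to run the import in a fresh interpreter and inspect `sys.modules` afterwards. The sketch below is an assumed variant, not the test in the repository: `modules_loaded_by` is a hypothetical helper, and `import json` stands in for `import solx.main` so the example runs without solx installed.

```python
import subprocess
import sys


def modules_loaded_by(statement):
    """Run `statement` in a fresh interpreter and return the names in sys.modules."""
    probe = f"{statement}\nimport sys\nprint('\\n'.join(sorted(sys.modules)))"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    return set(result.stdout.split())


# Stand-in for "importing solx.main never pulls rich".
loaded = modules_loaded_by("import json")
```

With the real package the assertion would read along the lines of `"rich" not in modules_loaded_by("import solx.main")`; a subprocess keeps the result independent of whatever the test runner itself has already imported.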

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
zipapp -p makes the artifact directly executable in place
(./dist/solx.pyz) instead of requiring an explicit interpreter.
install.sh strips that build-machine shebang before stamping one
bound to the destination machine's uv-resolved interpreter, so the
installed binary carries exactly one shebang and stays bound to the
interpreter version the bytecode was compiled for. Shebang-less
artifacts still install unchanged.
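install.sh is a shell script, but the strip-then-stamp step is easy to show in Python. This is a sketch of the idea only; `restamp_shebang` and the interpreter paths are hypothetical. It relies on the fact that a zip archive stays readable with arbitrary bytes prepended, which is what makes a shebang on a .pyz possible in the first place.

```python
def restamp_shebang(data: bytes, interpreter: str) -> bytes:
    """Return `data` with exactly one shebang line pointing at `interpreter`."""
    if data.startswith(b"#!"):
        # Drop the build-machine shebang, up to and including its newline.
        _, _, data = data.partition(b"\n")
    return b"#!" + interpreter.encode() + b"\n" + data


# Fake archive payload; a real .pyz body starts with the zip magic bytes.
payload = b"PK\x03\x04fake-zip-body"
built = b"#!/usr/bin/env python3\n" + payload

stamped = restamp_shebang(built, "/opt/example/bin/python3.12")
unchanged_input = restamp_shebang(payload, "/opt/example/bin/python3.12")
```

Both calls produce the same bytes, which mirrors the note above that shebang-less artifacts install the same way as shebanged ones.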

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
solx --version and solx version short-circuit in main() before the
parser tree is built, but importing solx.main still paid module-scope
argparse, pathlib, and typing -- on Sol's NFS home, and under CPU
contention, that import cost is most of the command's wall time.
Import argparse and pathlib where the parser tree is built (and in the
one handler that uses Path), and replace typing.TYPE_CHECKING with a
module-level constant, so importing solx.main loads nothing beyond the
interpreter's startup set.

Measured on a 4-core allocation at load ~126: warm venv solx --version
median 0.23s over 56 runs (batch medians 0.11-0.27s), down from
0.41-0.54s before this change. python -X importtime -c "import
solx.main" now lists only the solx package and __future__. Parity
matrix vs golden-v040: 65/67 pass, 2 expected-diff, 0 fail; all 220
tests pass; ruff clean.
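A warm-median measurement of this kind can be approximated with a short script. This is a sketch under stated assumptions, not the repository's bench script: `median_wall_time` is a hypothetical helper, one untimed run is used as the warm-up, and the interpreter itself stands in for the solx command.

```python
import statistics
import subprocess
import sys
import time


def median_wall_time(argv, runs=9):
    """Median wall-clock seconds for `argv` over `runs` timed runs, after one warm-up."""
    quiet = {"stdout": subprocess.DEVNULL, "stderr": subprocess.DEVNULL}
    subprocess.run(argv, check=False, **quiet)  # warm-up run, not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=False, **quiet)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in command; on Sol this would be something like ["solx", "--version"].
baseline = median_wall_time([sys.executable, "-c", "pass"], runs=3)
```

The median is used rather than the mean because a loaded node produces occasional slow outliers that would otherwise dominate the figure.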

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Port the harness that verified the dispatch-layer rewrite into the repo
as a durable eval asset: 67 cases over the full command surface, each in
an isolated fake HOME with deterministic SLURM mocks, captured as
stdout/stderr/exit code and compared byte-for-byte between two solx
builds. Future surface-preserving rewrites (the native single-binary
port is already in development) need the same proof, and the harness is
useless if it lives in /tmp. Goldens stay uncommitted because they are
environment-captured; the README documents the capture/compare workflow.
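The capture step described above can be sketched as follows. The names `Capture` and `run_case` are hypothetical, the SLURM mocks are omitted, and a tiny Python one-liner stands in for a solx build so the example is self-contained.

```python
import os
import subprocess
import sys
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    stdout: bytes
    stderr: bytes
    exit_code: int


def run_case(argv):
    """Run `argv` with a throwaway HOME and capture its observable behavior."""
    with tempfile.TemporaryDirectory() as fake_home:
        # Minimal environment: nothing from the real HOME can leak in.
        env = {"HOME": fake_home, "PATH": os.environ.get("PATH", "")}
        proc = subprocess.run(argv, env=env, capture_output=True, check=False)
    return Capture(proc.stdout, proc.stderr, proc.returncode)


# Stand-in for "the same case run against two solx builds".
case = [sys.executable, "-c", "import sys; print('ok'); sys.exit(3)"]
golden = run_case(case)
candidate = run_case(case)
parity = golden == candidate  # byte-for-byte on stdout, stderr, and exit code
```

Because `Capture` is a frozen dataclass of bytes and an integer, equality is exactly the byte-for-byte comparison; no normalization is applied.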

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The thin-spine rewrite changed what the docs should tell people, in
three ways:

- Latency guidance inverted. A warm solx job read now costs ~0.13s with
  the .pyz install vs ~0.08s for raw squeue (measured on a Sol compute
  node, warm median of 9), so SKILL.md, references/solx.md, docs/solx.md,
  coverage.md, and the bench script's takeaway stop steering agents to
  raw squeue/scancel for one-off reads; raw commands stay documented as
  equivalents and as the no-solx fallback. The full measured table
  (v0.4.0 vs v0.5.0, venv vs pyz, with the cold-ish and filesystem-
  placement caveats) lives in ROADMAP.md.
- Roadmap: Stage 4 is shipped; Stage 5 is the native single-binary
  rewrite (Rust) targeting v1.0 — cold-start immunity on NFS, no
  Python/uv runtime, single-file install. Decisions confirmed now record
  argparse as the CLI framework, static generated completion scripts,
  and the solkeep removal moving from 0.5.0 to 1.0.0 (every doc that
  named 0.5.0 as the removal release is updated to match
  keep.SOLKEEP_REMOVED_IN).
- Behavior/manual updates: --json is accepted after the subcommand too
  (except job start, whose tail passes through to salloc); completions
  are fully static, with the zsh fpath install mode documented alongside
  eval/source; solx/DEVELOPMENT.md's architecture, aliases, and coverage
  sections describe main.py/_completions.py and the runtime dependency
  list (rich + pathspec). CHANGELOG records all of it under Unreleased.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pin the review-confirmed dispatch divergences with goldens: `--`
shielding for sbatch passthrough, repeated `--`, bundled short flags,
`--dry-run=true`, version-command junk args, `keep -j 0`, `help job`,
and `-h`. The js-* shielding cases are STRICT (byte parity); the
error-wording cases are RELAXED (exit-code parity only, wording may
differ from Click's); `-h` is EXPECTED_DIFF as a documented v0.5.0
superset (v0.4.0 exits 2, v0.5.0 prints help and exits 0).
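The three comparison modes could be modeled like this. It is a sketch, not the harness code: the `Mode` enum, the `verdict` function, and the tuple layout are assumptions, and the choice to report "pass" when an EXPECTED_DIFF case happens to match is made up for the example.

```python
from enum import Enum


class Mode(Enum):
    STRICT = "strict"                # stdout, stderr, and exit code must match
    RELAXED = "relaxed"              # only the exit code must match
    EXPECTED_DIFF = "expected-diff"  # a documented, intentional divergence


def verdict(mode, golden, actual):
    """Compare two (stdout, stderr, exit_code) captures under `mode`."""
    if mode is Mode.STRICT:
        return "pass" if golden == actual else "fail"
    if mode is Mode.RELAXED:
        return "pass" if golden[2] == actual[2] else "fail"
    # EXPECTED_DIFF: a difference is reported but does not fail the run.
    return "pass" if golden == actual else "expected-diff"


# The -h case from the text: v0.4.0 exits 2, v0.5.0 prints help and exits 0.
old_h = (b"", b"usage error\n", 2)   # output bytes here are illustrative
new_h = (b"usage: solx ...\n", b"", 0)

h_result = verdict(Mode.EXPECTED_DIFF, old_h, new_h)
wording_result = verdict(Mode.RELAXED, (b"", b"Click wording\n", 2), (b"", b"argparse wording\n", 2))
```

Here `h_result` is `"expected-diff"` and `wording_result` is `"pass"`, since RELAXED ignores the differing error text.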

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- job start: after the first `--` no token is parsed as a flag — the
  first leftover still names the template, later `--` tokens forward
  literally — and pre-`--` short bundles (-nn) expand when every letter
  is a known short flag; --dry-run=X is a usage error. Without the
  shield, `gpu -- -n` silently flipped a submit into a dry-run and
  -n/--timeout could not be forwarded to salloc at all.
- version: fast-path only bare --version/version; everything else goes
  through argparse with a deferred --version action, and the version
  subcommand takes no arguments, so junk argv exits 2 again instead of
  printing the version.
- keep: -j/--jobs must be >= 1 (exit 2), restoring the min=1 bound the
  Typer option enforced.
- _SOLX_COMPLETE: exit 0 silently, so completion scripts generated by
  solx <= 0.4.0 (Typer's runtime protocol, persisted by fpath installs)
  get zero candidates instead of parsing help text as completions.
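The job start rules in the first bullet can be made concrete with a reduced tail parser. This is a sketch under stated assumptions: it models only a boolean -n/--dry-run flag (value-taking options such as --timeout are left out), and `parse_start_tail`, `UsageError`, and `BOOL_FLAGS` are hypothetical names rather than the ones in solx.main.

```python
class UsageError(Exception):
    """Raised where the real CLI would exit with code 2."""


# Assumed subset of job start's own flags.
BOOL_FLAGS = {"-n": "dry_run", "--dry-run": "dry_run"}


def _is_short_bundle(token):
    # "-nn" style: a single dash followed by known short-flag letters only.
    return (
        len(token) > 2
        and token[0] == "-"
        and token[1] != "-"
        and all("-" + letter in BOOL_FLAGS for letter in token[1:])
    )


def parse_start_tail(tokens):
    """Split the tail into (options, template, salloc passthrough)."""
    options = {"dry_run": False}
    leftovers = []
    shielded = False
    for token in tokens:
        if shielded:
            leftovers.append(token)  # after the first "--" nothing is a flag
        elif token == "--":
            shielded = True          # the first "--" is swallowed
        elif token in BOOL_FLAGS:
            options[BOOL_FLAGS[token]] = True
        elif token.startswith("--") and token.split("=", 1)[0] in BOOL_FLAGS:
            raise UsageError(f"{token.split('=', 1)[0]} takes no value")
        elif _is_short_bundle(token):
            for letter in token[1:]:
                options[BOOL_FLAGS["-" + letter]] = True
        else:
            leftovers.append(token)
    template = leftovers[0] if leftovers else None
    return options, template, leftovers[1:]
```

With this shape, `parse_start_tail(["gpu", "--", "-n"])` leaves `dry_run` false and forwards `-n`, which is the case the bullet describes as previously flipping a submit into a dry-run.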

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Parser/table drift: config edit, completions, version, and help no
longer take a trailing --json (matching the COMMANDS table, which never
offered it — their output is one fixed text); a pinning test now walks
the argparse tree and asserts COMMANDS mirrors it (commands,
subcommands, flag forms, positionals), with the --stage/shell choice
tuples pinned to their owners, so the two can never drift silently
again. The module docstring claims correspondence, not byte-equal help
strings.
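A pinning test of this kind might walk the parser as sketched below. The `COMMANDS` shape, `build_parser`, and `describe` are hypothetical, and the walk relies on `parser._actions` and `argparse._SubParsersAction`, which are private argparse internals; treat it as an illustration of the idea rather than the repository's test.

```python
import argparse

# Hypothetical description table in the shape the walk below produces.
COMMANDS = {
    "flags": {"-h", "--help"},
    "subcommands": {
        "job": {
            "flags": {"-h", "--help"},
            "subcommands": {
                "list": {"flags": {"-h", "--help", "--json"}, "subcommands": {}},
            },
        },
        "version": {"flags": {"-h", "--help"}, "subcommands": {}},
    },
}


def build_parser():
    parser = argparse.ArgumentParser(prog="solx", allow_abbrev=False)
    groups = parser.add_subparsers(dest="command")
    job = groups.add_parser("job", allow_abbrev=False)
    leaves = job.add_subparsers(dest="job_command")
    leaves.add_parser("list", allow_abbrev=False).add_argument("--json", action="store_true")
    groups.add_parser("version", allow_abbrev=False)
    return parser


def describe(parser):
    """Recursively collect flag strings and subcommand names from a parser."""
    flags = set()
    subcommands = {}
    for action in parser._actions:  # private attribute, stable in practice
        if isinstance(action, argparse._SubParsersAction):
            for name, child in action.choices.items():
                subcommands[name] = describe(child)
        else:
            flags.update(action.option_strings)
    return {"flags": flags, "subcommands": subcommands}


mirrored = describe(build_parser()) == COMMANDS
```

Adding a flag to the parser without updating `COMMANDS` (or the reverse) makes `mirrored` false, which is the drift the test is meant to catch.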

bash script: COMPREPLY is filled via mapfile (no IFS word splitting, no
glob expansion — a path with spaces is one candidate) with a guarded
'compopt -o filenames' on the --solkeep/--csv-dir sites so readline
escapes inserted paths; mid-word Tab completes against the part of the
word left of the cursor (COMP_LINE/COMP_POINT); leaf and subcommand
flag lists include -h; group commands offer -h/--help; positional
choices are not re-offered once filled.

zsh: group functions (_solx_job/_solx_config) offer -h/--help.
fish: group level and every leaf offer -h/--help; the completions
shell argument is guarded so bash/zsh/fish are not re-offered.

Functional bash probes (simulated COMP_WORDS/COMP_LINE) cover the
space/glob/mid-word/re-offer cases; fish behavior verified with
complete -C.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CHANGELOG: an Upgrading note — file-installed completion scripts (zsh
fpath) must be regenerated after upgrading because <=0.4.0 scripts use
the _SOLX_COMPLETE runtime protocol that 0.5.0 answers with zero
candidates; a superset entry for -h alongside the --json placement one,
and the parity claim now points at both; --json's no-trailing-flag
commands named; parity matrix case count 67 -> 80 (here and in
evals/parity/README.md).

docs/solx.md: the completions section tells fpath installs to rerun the
redirect after every upgrade, and the scripting section names the four
commands that take no trailing --json.

keep.py: the comment above SOLKEEP_REMOVED_IN said the fallback loses
support 'in this release line' while the constant says 1.0.0; it now
describes the current schedule.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a Highlights speedup table at the top of the unreleased section and
align the latency figures across CHANGELOG and ROADMAP on the
apples-to-apples NFS-home install (both .pyz on ~/.local/bin), the
location install.sh actually writes to. The node-local /tmp figure stays
documented as the best case rather than as the recommended install.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The deferred client side isn't laptop-specific; "local machine" covers
any workstation a user SSHes to Sol from.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a build job that runs build-pyz.sh and uploads dist/solx.pyz, so a
reviewer can install and test a PR's solx on Sol without building it.
DEVELOPMENT.md documents building/installing locally and fetching the
artifact from a PR (install via install.sh, which re-stamps the shebang
for the local interpreter).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Shu-Wan Shu-Wan merged commit aea8688 into main Jun 11, 2026
6 checks passed
@Shu-Wan Shu-Wan deleted the v0.5.0-thin-spine branch June 11, 2026 00:57