
v1.0: retire Python, ship the native Rust binary #32

Merged: Shu-Wan merged 23 commits into main from v1.0-rust on Jun 11, 2026
@Shu-Wan Shu-Wan commented Jun 11, 2026

v1.0 — the native Rust binary is now the only solx

This promotes the Rust rewrite to v1.0 and makes a clean break: the Python implementation is gone and a single static binary is the entire product. Install is download + chmod +x — no Python, no uv, no toolchain.

Built on the 18 solx-rs commits (rebased onto main @ 0.5.1); the commits on top do the promotion, docs refresh, and the ~/.solkeep removal.

Code

  • Renamed the crate solx-rs/ → solx/ and deleted the Python package (CLI, tests, .pyz build, install.sh, pyproject.toml/uv.lock, uv tool channel).
  • Bumped to 1.0.0 (Cargo.toml + SKILL.md; lock synced).
  • Removed ~/.solkeep end to end. The config [keep] block is the only keep-list source. solx keep never reads a legacy ~/.solkeep; the config import-solkeep subcommand and the keep --solkeep <file> flag are gone; solx init no longer offers to import one. Dead config helpers removed, embedded completions regenerated.

CI

  • Folded Rust lint/test/build into ci.yml (replaces the Python workflow).
  • release.yml builds + publishes the static x86_64-unknown-linux-musl binary on a vX.Y.Z tag, verifying the tag against Cargo.toml + SKILL.md.

Docs

  • Binary-only install everywhere (README, SKILL.md, references/, docs/solx.md); Python badge → Rust.
  • CHANGELOG: proper [1.0.0] entry (Python retired + ~/.solkeep removed). Historical Python/.pyz references kept as release history.
  • ROADMAP made forward-facing: v1.0 recorded concisely as shipped, laptop-side integration promoted from "out of scope" to the next focus.
  • Refreshed DEVELOPMENT.md, docs/coverage.md, eval harness docs + mocks (the fake $HOME now ships a [keep] config instead of a .solkeep), and the parity matrix (solkeep cases/fixtures dropped).
  • Stripped retired-tooling narration (sol_renew.py, deprecation/removal history) from the skill.

Verification

  • cargo fmt --check, cargo clippy -D warnings, cargo test all green: 103 unit + 39 end-to-end tests pass.
  • bash/zsh/fish completion scripts syntax-check clean and emit zero solkeep references.

🤖 Generated with Claude Code

Shu-Wan and others added 23 commits June 10, 2026 22:54
Single [[bin]] crate named solx at version 1.0.0-dev, with the dependency
set the port needs (clap derive for the command tree, serde_json with
preserve_order for byte-stable JSON, toml with preserve_order so [jobs.*]
tables keep file order, ignore for gitignore-semantics matching and
enumeration, csv/filetime/shlex for the keep and config-edit paths).
Cargo.lock is committed and stable is pinned via rust-toolchain.toml so
builds resolve identically on Sol and in CI.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
output.rs renders JSON byte-identically to Python's json.dumps(indent=2)
— two-space indent, \uXXXX escapes for everything outside printable
ASCII, insertion-ordered keys — because agents diff solx output across
implementations and the goldens compare byte-strict. Diagnostics always
go to stderr; prompting is gated on stdin being a TTY, separately from
the stdout format choice.
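
As an illustration of the escaping rule described above, here is a minimal sketch of a string escaper that follows Python's default `json.dumps` behavior (`ensure_ascii=True`): printable ASCII passes through, the usual short escapes apply, and everything else becomes lowercase `\uXXXX`, with a surrogate pair for code points above U+FFFF. The function name `json_escape_ascii` is hypothetical and is not taken from output.rs.

```rust
/// Escape `s` as a JSON string literal in the style of Python's
/// json.dumps with ensure_ascii=True (hypothetical helper, not solx code).
fn json_escape_ascii(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            '\u{08}' => out.push_str("\\b"),
            '\u{0c}' => out.push_str("\\f"),
            // Printable ASCII (space through tilde) is emitted unchanged.
            ' '..='~' => out.push(c),
            // Everything else is written as UTF-16 code units, so code
            // points above U+FFFF become a surrogate pair.
            _ => {
                let mut units = [0u16; 2];
                for unit in c.encode_utf16(&mut units).iter() {
                    out.push_str(&format!("\\u{:04x}", unit));
                }
            }
        }
    }
    out.push('"');
    out
}
```

For example, `json_escape_ascii("é")` yields `"\u00e9"`, the same bytes Python emits for that string.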

config.rs ports the TOML schema with the exact user-facing validation
messages, [keep] rules compiled to gitignore matchers rooted at /, the
.solkeep loader/splitter with the order-sensitivity probe, and the
starter-config text verbatim. slurm.rs ports verb-aware jobid resolution
(read/attach verbs auto-pick most recent, stop never auto-picks, inside
an allocation is the default target with a self-action flag), the argv
builders, and salloc execution with a wall-clock timeout.

Unit tests port the Python suite's vectors for row parsing, resolution
branches, argv shapes, durations, validation errors, solkeep import,
and keep matching.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
list/start/stop/jump/time mirror the Python bodies: identical JSON
payloads and stderr strings, exit 1 for runtime failures, exit 2 for
refusals (ambiguous stop, non-interactive without -y). jump exec-replaces
the process with `srun --jobid=N --overlap --pty SHELL`.

`job start` gets a hand-written tail parser because its grammar predates
clap conventions: -n/--dry-run and --timeout V are consumed wherever they
appear before the first `--`, the first `--` is dropped, and the first
unconsumed bare token — even one after `--` that looks like a flag —
becomes the template, with every other leftover token passed through to
salloc in original order. The unit tests pin each branch of that split.
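
The split described above can be sketched as follows. This is one reading of the grammar, not the actual parser: the names `StartTail` and `split_start_tail` are hypothetical, a "bare" token before `--` is assumed to be one that does not start with `-`, every token after the first `--` is treated as bare, and a `--timeout` with no following value is simply left unset (the real parser presumably reports an error).

```rust
#[derive(Debug, Default, PartialEq)]
struct StartTail {
    dry_run: bool,
    timeout: Option<String>,
    template: Option<String>,
    passthrough: Vec<String>,
}

/// Split the tail of `job start` (hypothetical sketch of the rules above).
fn split_start_tail(args: &[&str]) -> StartTail {
    let mut tail = StartTail::default();
    let mut seen_separator = false;
    let mut i = 0;
    while i < args.len() {
        let tok = args[i];
        i += 1;
        if !seen_separator {
            match tok {
                // The first `--` is dropped and ends option handling.
                "--" => {
                    seen_separator = true;
                    continue;
                }
                "-n" | "--dry-run" => {
                    tail.dry_run = true;
                    continue;
                }
                "--timeout" => {
                    // Assumption: the value is the next token.
                    if i < args.len() {
                        tail.timeout = Some(args[i].to_string());
                        i += 1;
                    }
                    continue;
                }
                _ => {}
            }
        }
        // After `--` every token is bare, even one that looks like a flag.
        let bare = seen_separator || !tok.starts_with('-');
        if bare && tail.template.is_none() {
            tail.template = Some(tok.to_string());
        } else {
            // Leftovers go to salloc in their original order.
            tail.passthrough.push(tok.to_string());
        }
    }
    tail
}
```

With `["--gres=gpu:1", "-n", "gpu", "--timeout", "30", "--", "--mem=8G"]` this sketch sets the dry-run flag, records a timeout of `30`, picks `gpu` as the template, and passes `--gres=gpu:1` and `--mem=8G` through.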

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
build_plan reads the Directory column of Sol's warning CSVs (missing
file = empty stage), dedupes across stages, and intersects with the
[keep] rules; only flagged directories are ever renewed. Enumeration is
an in-process walk with every ignore facility disabled so the file set
equals `find DIR -type f` — skipping hidden or git-ignored files would
silently under-protect them. Touch sets atime+mtime to now with
touch -c semantics: a vanished path is a silent skip and nothing is
ever created.
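
A minimal sketch of the `touch -c` behavior described above, using only the standard library (`std::fs::FileTimes`, available since Rust 1.75) rather than the `filetime` crate the port depends on. The function name `touch_no_create` is hypothetical, and opening the file read-only is an assumption of this sketch; it would fail on files the user cannot read.

```rust
use std::fs::{File, FileTimes};
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Set atime and mtime of `path` to `now` without ever creating it.
/// Returns Ok(true) when touched and Ok(false) when the path has vanished.
fn touch_no_create(path: &Path, now: SystemTime) -> io::Result<bool> {
    // File::open never creates a file, which gives the `-c` semantics.
    let file = match File::open(path) {
        Ok(f) => f,
        // A vanished path is a silent skip, not an error.
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(false),
        Err(e) => return Err(e),
    };
    let times = FileTimes::new().set_accessed(now).set_modified(now);
    file.set_times(times)?;
    Ok(true)
}
```

Passing the timestamp in as a parameter keeps the sketch deterministic; a caller would normally pass `SystemTime::now()`.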

-j runs a worker pool over one task queue holding both enumerate and
touch tasks; a huge directory shards into BATCH-sized touch tasks so the
whole run scales with -j, not the directory count. JSON plan/summary
documents cap inlined lists at JSON_LIST_CAP with exact counts and spill
the complete plan to a temp file. The ~/.solkeep fallback stays, with a
deprecation notice naming 1.0.0 as the removal version.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
main.rs builds the clap tree (job/jobs groups, hidden ls alias,
top-level jump, config show/edit/import-solkeep) with a raw pre-pass for
the pieces clap can't express: eager leading --version printing the bare
version string, leading --json, no-args group help on stdout with exit
2, and interception of `job start` so its tail reaches the hand-written
parser with the `--` separator intact. --json is global, so it is
accepted trailing on every leaf; on `job start` a non-leading --json is
salloc passthrough by design.

init writes the starter config at mode 0600 (interactive walkthrough
picks the shell and offers the ~/.solkeep import; non-interactive runs
write defaults with no prompts), and config import-solkeep performs the
validated, lossiness-checked migration. completions embeds static
bash/zsh/fish scripts from assets/ — the zsh script carries the
dual-mode footer so both eval/source and fpath/autoload installs work.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
tests/cli.rs runs the compiled solx in an isolated fake HOME with the
deterministic SLURM mocks committed under tests/mocks/bin (the crate's
tests are self-contained), asserting stdout, stderr, and exit codes for
the core flows: version, list, time, stop preview/refusal, the start
template/passthrough split, jump exec, keep planning and a real renewal
over stale files, config show key order, config edit argv handling,
init, the import-solkeep lossy refusal, and completions validation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
rust-ci runs fmt --check, clippy -D warnings, and the test suite on
every push/PR touching solx-rs/, with rust-cache keeping the dependency
graph warm. README covers build/install and the output contract;
DEVELOPMENT maps each Rust module to its Python counterpart and explains
the parity-first workflow (the golden matrix is the spec, completion
scripts are synced from the Python generator).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The embedded bash, zsh, and fish completion scripts now match the
golden-v050 `solx completions <shell>` output byte for byte. The zsh
script names its entry function _solx, completes per-subcommand flags
via _arguments, and adds _solx_job/_solx_config helpers for the nested
command groups; the embedded-asset test asserts the matching
`compdef _solx solx` footer.

Verified: cargo test (126 passed) and the parity matrix against
golden-v050 (67/67).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The command reference lives at docs/solx.md in the repo root, not under
the Python package directory, and rust-toolchain.toml selects the stable
channel rather than pinning a version. Also state the no-color rendering
as current behavior.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…r config errors

The keep matcher must make byte-identical include/exclude decisions to the
Python reference, which compiles [keep]/.solkeep rules with pathspec's
GitIgnoreSpec. The ignore-crate matcher diverged on real inputs: it expanded
{a,b} brace alternates (renewing directories the rules never selected — and
skipping a literal {} directory the user opted in), matched every path for a
stray '/' include line, matched unclosed-'[' patterns that git discards, and
dropped a flagged directory written with a trailing slash, silently leaving
opted-in data to age into the purge. src/gitwild.rs is a faithful port of
pathspec 1.1.1 (pattern translation + last-match/exact-over-ancestor
resolution), pinned by vectors generated from the reference implementation.

Config diagnostics now match the reference's plain (non-TTY) renderer, which
strips style-tag lookalikes such as [jobs.default]/[keep] from messages
(output::strip_markup), and TOML parse errors collapse to the one-line
'message (at line L, column C)' form everywhere solx reports them — every
solx diagnostic is a single stderr line.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… -j default

An existing warning CSV that can't be opened or decoded now exits 1 with one
line naming the file instead of planning zero directories — for the command
whose job is preventing scratch deletion, a read failure must never look
like 'nothing flagged'. A BOM-prefixed header keeps the BOM as part of the
first header cell's name (the reference CSV reader's behavior), so a BOM'd
Directory header yields no directories rather than a divergent plan.
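
To make the BOM case concrete, here is a deliberately naive sketch of the header lookup. It assumes an unquoted, comma-separated header line and a hypothetical helper name `directory_column`; the real code goes through a CSV reader. Because the comparison is exact, a header cell that still carries a leading U+FEFF does not match `Directory`.

```rust
/// Index of the `Directory` column in a header line, if present.
/// Naive split on commas; quoted cells are out of scope for this sketch.
fn directory_column(header_line: &str) -> Option<usize> {
    header_line.split(',').position(|cell| cell == "Directory")
}
```

Under these assumptions, `directory_column("User,Directory,Size")` returns `Some(1)`, while a BOM-prefixed `"\u{feff}Directory,Size"` returns `None`, which corresponds to the "yields no directories" outcome above.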

The full-plan spill goes through the tempfile crate: created 0600 (the
document enumerates the user's flagged scratch layout and lands in shared
/tmp), bounded creation retries instead of an unbounded loop that spins
forever on an unwritable temp dir, and surfaced write errors so a truncated
spill is never advertised as complete.

The -j default derives from ONLINE system CPUs (sysconf(_SC_NPROCESSORS_ONLN),
os.cpu_count semantics) rather than cgroup-aware available_parallelism, so a
run inside a Slurm allocation still defaults to the same worker count as the
reference — not a serial fallback on the exact nodes keep is meant for.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
shlex::try_join quotes any token containing '=' or '%', which broke stderr
byte-parity on every job start whose argv carries --gres/--mem/--cpus-style
tokens (the submitting: line, dry-run renders, and the salloc-timeout Argv
tail). A token is now quoted only when it contains a character outside
[A-Za-z0-9_@%+=:,./-], with single-quote wrapping and '"'"' for embedded
quotes — exactly shlex.join. Pinned with the gpu-template argv from the
parity goldens.
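
The quoting rule above can be sketched in a few lines. The names `shell_quote` and `shell_join` are hypothetical; the safe-character set and the `'"'"'` replacement come from the description above, and the empty-token case follows Python's `shlex.quote`, which returns `''`.

```rust
/// Quote one token the way Python's shlex.quote does.
fn shell_quote(token: &str) -> String {
    if token.is_empty() {
        return "''".to_string();
    }
    // Safe set: [A-Za-z0-9_@%+=:,./-]
    let safe = |c: char| c.is_ascii_alphanumeric() || "_@%+=:,./-".contains(c);
    if token.chars().all(safe) {
        return token.to_string();
    }
    // Wrap in single quotes; an embedded ' becomes '"'"'.
    format!("'{}'", token.replace('\'', "'\"'\"'"))
}

/// Join an argv into one display string, quoting only where needed.
fn shell_join(argv: &[&str]) -> String {
    argv.iter()
        .map(|t| shell_quote(t))
        .collect::<Vec<_>>()
        .join(" ")
}
```

With this rule, `--gres=gpu:a100:1` and `--mem=8G` stay unquoted, while a token such as `my job` is rendered as `'my job'`.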

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…back

The kernel hostname on Sol compute nodes is the short name (scc041), so a
missing/failing/hanging `hostname -a` plus the bare kernel-name fallback
locked the gate on a genuine Sol node. The fallback now mirrors Python
socket.getfqdn: forward-resolve the kernel hostname, reverse-resolve the
address (gethostbyaddr, whose aliases carry the .sol.rc.asu.edu form), and
take the first dotted name — falling back to the kernel hostname only when
resolution itself fails.
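
The name-selection step of that fallback can be sketched as a pure function, leaving the actual DNS calls out. `Resolved` and `pick_fqdn` are hypothetical names. As in Python's `socket.getfqdn`, the canonical name is checked before the aliases, and if resolution succeeds but no name contains a dot, the canonical name is returned.

```rust
/// Result of a successful reverse lookup (hypothetical shape).
struct Resolved {
    canonical: String,
    aliases: Vec<String>,
}

/// Pick the name to test against the Sol gate.
fn pick_fqdn(kernel: &str, resolved: Option<&Resolved>) -> String {
    match resolved {
        // Resolution itself failed: fall back to the kernel hostname.
        None => kernel.to_string(),
        // Otherwise take the first dotted name, canonical name first.
        Some(r) => std::iter::once(&r.canonical)
            .chain(r.aliases.iter())
            .find(|name| name.contains('.'))
            .unwrap_or(&r.canonical)
            .clone(),
    }
}
```

For a node whose kernel hostname is `scc041` and whose reverse lookup lists `scc041.sol.rc.asu.edu` as an alias, the sketch returns the dotted alias.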

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…t usage parity

- Only bare '--version' / 'version' print the version: the eager fast path
  is gone, so junk around either form is a clap usage error (exit 2), like
  'version bogus', '--bogus --version', and '--version --bogus'.
- 'help' is solx's own argument-less subcommand (clap's auto help subcommand
  is disabled), so 'solx help job' exits 2 instead of succeeding.
- 'job start --dry-run=VALUE' is a usage error (exit 2,
  "Option '--dry-run' does not take a value.") instead of being read as a
  template name; the tail parser also accepts the bare '-h' help token
  (outside '--', matching the rest of the surface).
- Group help renders with the binary name ('Usage: solx job ...'), and
  'job start --help' is bespoke text documenting -n/--dry-run, --timeout,
  TEMPLATE, and the salloc passthrough under the full 'solx job start'
  usage line. The Sol gate runs before the tail parse, in reference order.
- An unparseable $EDITOR is a runtime failure: exit 1 with one clean line.
- An invocation with _SOLX_COMPLETE set (the runtime-completion callback
  protocol installed completion scripts use) exits 0 silently instead of
  running a command.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The bash, zsh, and fish completion scripts embedded in the Rust binary
must match the output of the v0.5.0 Python CLI, which is the behavioral
spec for the port. Regenerated from the fixed branch A binary.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Split the README for two audiences: users install a prebuilt static
binary from a CI release (no cargo, Python, or uv on the box), while
contributors get a Toolchain on Sol section — sudo-free rustup
user-install, CARGO_TARGET_DIR on node-local storage to avoid NFS
build artifacts, a real-GET crates.io connectivity check (bare HEAD
probes 403), and the glibc-2.28-on-Sol vs musl-in-CI split.
DEVELOPMENT.md cross-references the README so the Sol notes have one
home.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a Highlights speedup table comparing the Rust build against the
v0.5.0 Python baseline (warm medians, NFS home), plus Changed/Added
entries for the rewrite and the prebuilt static-binary release.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a build job that compiles the statically linked
x86_64-unknown-linux-musl target and uploads it, so a reviewer can
download the binary from a PR and run it on Sol with no Rust toolchain
and no glibc-version dependency. DEVELOPMENT.md documents the native and
musl builds and fetching the artifact from a PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Rust rewrite becomes the only solx. Rename the crate solx-rs/ → solx/
and delete the Python package (CLI, tests, .pyz build, install.sh, uv
channel); bump to 1.0.0 (Cargo + SKILL.md).

- CI: fold Rust lint/test/build into ci.yml; release.yml now builds and
  publishes the static musl binary on a vX.Y.Z tag (verifies the tag
  against Cargo.toml + SKILL.md).
- keep: drop the implicit ~/.solkeep fallback — the config [keep] block is
  the only automatic keep-list source; `config import-solkeep` migrates an
  existing file and `--solkeep <file>` is an explicit per-run override.
- Docs: binary-only install (download + chmod) everywhere; CHANGELOG
  [1.0.0]; ROADMAP made forward-facing with the laptop-side promoted to the
  next focus; strip retired-tooling narration from the skill.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per the clean-slate cut, solx no longer touches a legacy ~/.solkeep at all.
The config [keep] block is the single keep-list source.

- Remove the `solx config import-solkeep` subcommand and the `solx keep
  --solkeep <file>` flag; `solx keep` reads only `[keep]` and errors
  (pointing at `solx config edit`) when it's absent. The `solx init`
  walkthrough no longer offers to import ~/.solkeep.
- Drop the now-dead config helpers (load_solkeep, import_solkeep,
  solkeep_is_order_sensitive, render_keep_block) and GitIgnoreSpec::empty;
  simplify starter_config_text.
- Regenerate the embedded bash/zsh/fish completion scripts (no
  import-solkeep / --solkeep).
- Tests: replace the fallback/import tests with a regression test that a
  ~/.solkeep on disk is ignored; rewrite the negation test to build rules
  from the include list.
- Parity matrix: drop the solkeep/import cases and fixtures; the run_case
  helper loses its HOME_SOLKEEP parameter.
- Eval mocks: the fake $HOME now ships a config.toml with a [keep] block
  instead of a ~/.solkeep.
- Docs: purge import-solkeep / --solkeep / migration guidance from the
  manual, skill, and references; the ROADMAP records the removal.

cargo fmt/clippy/test green (103 unit + 39 e2e); bash/zsh/fish completions
syntax-check clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…docs

Address review on PR #32:

- release.yml now runs `cargo test --locked` before building/publishing,
  so a `vX.Y.Z` tag can't ship a binary that skipped the suite (ci.yml
  only runs on main pushes/PRs, not tags). Use `--locked` for
  build/clippy/test in both workflows so CI fails on a stale Cargo.lock
  rather than silently updating it.
- Every documented install snippet now runs `mkdir -p ~/.local/bin`
  before the download — a fresh Sol account may not have the directory,
  which made `curl -fLo ~/.local/bin/solx` (and the `mv` variant) fail.
  README, docs/solx.md, SKILL.md, references/solx.md, solx/README.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The behavioral-parity matrix (evals/parity/) was cross-version migration
scaffolding — it diffed a new build against a captured golden of the prior
implementation. With Python retired there's no second implementation to
compare against, and the crate's own end-to-end + unit tests (which run in
CI) now lock behavior. Remove it; git history keeps it recoverable if a
future refactor wants the capture-and-diff approach again.

- Delete evals/parity/ (harness, fixtures, duplicate SLURM mocks).
- Repoint the "spec" references to solx/tests/cli.rs + unit vectors:
  solx/DEVELOPMENT.md (a behavior contract instead of "parity is the
  spec"), solx/README.md, docs/coverage.md, evals/README.md, CHANGELOG
  v1.0 entry.

Fresh-start docs pass: v1.0 is the starting point, so the user- and
contributor-facing docs no longer narrate the v0.x/Python lineage.

- ROADMAP: replace the version-by-version stages table and the v0.5.0
  Python latency deep-dive with a "What solx does today" overview; drop
  v0.4.0/v0.5.0 stamps from the design principles and confirmed decisions.
- coverage.md: reframe the header around the current Rust suite; drop
  "New in v0.4.0 / Updated for v0.5.0" provenance from the cells.
- solx/README.md + solx/DEVELOPMENT.md: present a native CLI, not a port.
- CHANGELOG keeps the history (per prior call); only dead cross-refs to
  the removed ROADMAP table were trimmed from the 0.5.0 entry.

cargo fmt/clippy/test green (103 unit + 39 e2e); user-facing docs carry no
version lineage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fixes eval #4 (it encoded the bug: expected -p public for a 4h GPU job) and adds #8 (30-min ablation -> htc), #9 (multi-day -> public/general), #10 (smoke test -> debug QOS on public/general). Adds an L3 l3_sbatch_test_only check that validates the recommended header against the live scheduler, catching invalid combos like -p htc -q debug that regex alone misses. Regexes hardened: canonical 4h forms, a100:1 vs a100:10, day and HH:MM:SS walls.
@Shu-Wan Shu-Wan merged commit f2534c6 into main Jun 11, 2026
3 checks passed
@Shu-Wan Shu-Wan deleted the v1.0-rust branch June 11, 2026 23:46