docs(research): /wizard feasibility — AI vs deterministic#67
Hongjiseung-ROK wants to merge 1 commit
Code Review
This pull request introduces a research document evaluating the feasibility of a /wizard command for automated server YAML configuration, proposing a deterministic-first approach for topology and scheduler detection. Reviewers suggested expanding environment variable checks for LSF and Torque, addressing Slurm version compatibility for JSON output, and refining scratch directory detection to avoid using home directories. A suggestion was also provided to include ControlPersist in the SSH configuration guidance for MFA-gated environments.
| - If the request already carries an SSH destination (`user@host`, SSH alias, or existing `SERVER.HOST`) and that destination is not local, select **Mode B**. | ||
| - Else continue. | ||
| 2. Run a local HPC-signature probe. | ||
| - Check for scheduler evidence: `env | egrep '^(SLURM_|PBS_|LSB_|SGE_)'`, `command -v sinfo`, `command -v qstat`, `command -v bqueues`, `command -v qconf`. |
| - Check for login-node evidence: `type module`, `hostname -f`, `command -v g16`, `command -v orca`, `command -v nciplot`. | ||
| - Check that at least one scheduler read-only query actually works: `sinfo --json`, `qstat -Q -f`, `bqueues -l`, or `qconf -sql`. |
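The environment half of the scheduler-evidence check in this hunk can be sketched as a small filter over `env` output. This is a minimal sketch, not the report's implementation: the function name `detect_scheduler_from_env` and its exit codes are hypothetical, and the extra `LSF_` prefix is an assumption based on LSF setup scripts exporting names such as `LSF_ENVDIR`. Torque gets no prefix of its own here because its jobs export `PBS_` names.

```shell
# Hypothetical helper: read NAME=value lines on stdin (e.g. from `env`) and
# print the scheduler family suggested by variable-name prefixes.
# Exit status: 0 = exactly one family seen, 1 = no evidence, 2 = ambiguous.
detect_scheduler_from_env() {
  awk '
    /^SLURM_/      { slurm = 1 }
    /^(LSB_|LSF_)/ { lsf = 1 }    # LSF_ prefix is an assumption, see lead-in
    /^SGE_/        { sge = 1 }
    /^PBS_/        { pbs = 1 }    # covers PBS and Torque job variables
    END {
      n = slurm + lsf + sge + pbs
      if (n == 0) exit 1
      if (n > 1)  exit 2
      if (slurm)    print "SLURM"
      else if (lsf) print "LSF"
      else if (sge) print "SGE"
      else          print "PBS"
    }
  '
}
```

A caller would run `env | detect_scheduler_from_env` and treat the result only as a hint; the hunk still requires that at least one read-only scheduler query actually works before the family is confirmed.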
| | `SERVER.NUM_THREADS` | Same as `NUM_CORES`; optionally `lscpu` for Mode A | N — v1 can tie threads to cores | Set equal to `NUM_CORES` | Ask only if site needs a different OpenMP default | | ||
| | `SERVER.SUBMIT_COMMAND` | Derived from confirmed scheduler family | N — this is a fixed mapping | `SLURM→sbatch`, `PBS→qsub`, `LSF→bsub < `, `SGE→qsub` | None needed unless maintainer wants overrides | | ||
| | `SERVER.PROJECT` | No reliable portable read-only probe; optional weak hints from `sacctmgr`, `qmgr`, `groups`, or sample env vars if visible | N — account/project discovery is site-policy data, not something the tool should guess | Leave commented or null | Ask user for allocation/account/project string | | ||
| | `SERVER.SCRATCH_DIR` | `printf '%s\n' "$SCRATCH" "$WORK" "$TMPDIR"`; `test -d ~/scratch -a -w ~/scratch`; `getent passwd "$USER" \| cut -d: -f6`; `df -h` | N — path presence and writability are deterministic | Use the first writable stable user scratch root; else `null` | Ask for scratch root path | |
`getent passwd "$USER" | cut -d: -f6` retrieves the user's home directory. In HPC environments, home directories are often restricted in size or performance and are distinct from the intended high-performance scratch space. It might be safer to treat this as a fallback rather than a primary scratch probe.
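One way to act on this comment is to compare every scratch candidate against the home directory before accepting it. The sketch below is an illustration only: `pick_scratch_dir` is a hypothetical name, and it excludes the home directory outright, which is a stricter reading than keeping it as a last-resort fallback.

```shell
# Hypothetical helper. Usage: pick_scratch_dir HOME_DIR CANDIDATE...
# Prints the first candidate that is an existing, writable directory and is
# not the home directory itself. Returns 1 if none qualifies, so the caller
# can render null and ask the user for a scratch root.
# Note: plain POSIX sh has no `local`, so these variables are global.
pick_scratch_dir() {
  home_dir=${1%/}
  shift
  for candidate in "$@"; do
    candidate=${candidate%/}
    [ -n "$candidate" ] || continue              # empty or unset variable
    [ "$candidate" != "$home_dir" ] || continue  # never accept $HOME as scratch
    if [ -d "$candidate" ] && [ -w "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

With the candidates from the table row, a call might look like `pick_scratch_dir "$HOME" "${SCRATCH:-}" "${WORK:-}" "${TMPDIR:-}" "$HOME/scratch"`.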
| | Unknown SSH host key (`StrictHostKeyChecking` would prompt) | Reject | Stop before the first remote probe; instruct user to trust the host out of band or rerun from a pre-established SSH session | | ||
| | TOFU/`accept-new` host key path already enabled in user SSH config | Warn-pause | Show that the first probe will mutate `known_hosts`; require explicit approval | | ||
| | SSH timeout / unreachable host | Reject | Abort probing; keep no partial YAML updates | | ||
| | MFA- or keyboard-interactive-gated SSH login | Reject | Non-interactive probe mode cannot satisfy MFA; instruct user to use ControlMaster, agent-backed auth, or run `/wizard` from the login node | |
For MFA-gated logins, suggesting the use of ControlPersist in addition to ControlMaster in the user's SSH config can help maintain the authenticated session across multiple probe commands, reducing the need for repeated MFA prompts.
Suggested change:
| | MFA- or keyboard-interactive-gated SSH login | Reject | Non-interactive probe mode cannot satisfy MFA; instruct user to use ControlMaster/ControlPersist, agent-backed auth, or run `/wizard` from the login node | |
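To illustrate the suggestion, a probe wrapper could reuse a master connection that the user has already authenticated interactively. This is a sketch under assumptions: the name `ssh_probe`, the `SSH_BIN` override (there only to make the wrapper testable), the 10-minute persistence window, and the socket path are all hypothetical. `BatchMode`, `ControlMaster`, `ControlPersist`, and `ControlPath` are standard OpenSSH client options.

```shell
# Hypothetical helper. Usage: ssh_probe HOST COMMAND [ARG...]
# Runs one read-only probe. BatchMode=yes makes ssh fail instead of prompting,
# so an MFA-gated host is rejected unless a master connection already exists.
ssh_probe() {
  host=$1
  shift
  "${SSH_BIN:-ssh}" \
    -o BatchMode=yes \
    -o ControlMaster=auto \
    -o ControlPersist=10m \
    -o ControlPath="$HOME/.ssh/cm-%C" \
    "$host" "$@"
}
```

The user would first open the master connection by hand with the same `ControlMaster`, `ControlPersist`, and `ControlPath` settings and complete MFA once; later calls such as `ssh_probe user@host sinfo --version` then multiplex over that socket until it expires.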
Summary
This PR adds `docs/research/wizard_feasibility.md`, a research-only feasibility report for `/wizard` server-YAML autofill across local-login-node and SSH topologies.
Verdict
`/wizard` is feasible with a deterministic-first design: scheduler detection, queue/node parsing, scratch discovery, topology selection, YAML rendering, and cache refresh should stay deterministic, while LLM use is justified only as a narrow tie-breaker for ambiguous human-formatted module output.
Field matrix top-line
The report includes a per-field source matrix covering every `SERVER`, `GAUSSIAN`, `ORCA`, and `NCIPLOT` field, with probe commands, AI-needed yes/no, deterministic fallbacks, and user-prompt fallbacks.
Recommended sub-PR sequence
`/wizard` setup intent + slash command
Notes
`bin/plan.md` is intentionally not linked because it does not exist for this task.