docs(research): /wizard feasibility — AI vs deterministic#67

Draft
Hongjiseung-ROK wants to merge 1 commit into main from docs/research-wizard-feasibility
@Hongjiseung-ROK (Owner)

Summary

This PR adds docs/research/wizard_feasibility.md, a research-only feasibility report for /wizard server-YAML autofill across local-login-node and SSH topologies.

Verdict

/wizard is feasible with a deterministic-first design: scheduler detection, queue/node parsing, scratch discovery, topology selection, YAML rendering, and cache refresh should stay deterministic, while LLM use is justified only as a narrow tie-breaker for ambiguous human-formatted module output.
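The deterministic-first rule in the verdict can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the report: the function name `resolve_field`, the `tie_breaker` callback (standing in for a narrow LLM call), and the three source labels are assumptions made for illustration only.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical callback type: receives the ambiguous candidates, returns one or None.
TieBreaker = Callable[[Sequence[str]], Optional[str]]


def resolve_field(
    candidates: Sequence[str],
    tie_breaker: Optional[TieBreaker] = None,
) -> Tuple[Optional[str], str]:
    """Return (value, source) for one YAML field.

    source is "deterministic", "tie_breaker", or "ask_user".
    """
    # Drop empty probe results and deduplicate while preserving probe order.
    unique = list(dict.fromkeys(c.strip() for c in candidates if c and c.strip()))
    if len(unique) == 1:
        return unique[0], "deterministic"
    if len(unique) > 1 and tie_breaker is not None:
        choice = tie_breaker(unique)
        # The tie-breaker may only pick a value that a probe actually produced.
        if choice in unique:
            return choice, "tie_breaker"
    # No evidence, or unresolved ambiguity: fall back to prompting the user.
    return None, "ask_user"
```

Constraining the tie-breaker to values that a probe already returned is one way to keep any AI involvement narrow: it can rank candidates but cannot introduce a value of its own.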

Field matrix top-line

The report includes a per-field source matrix covering every SERVER, GAUSSIAN, ORCA, and NCIPLOT field, with probe commands, AI-needed yes/no, deterministic fallbacks, and user-prompt fallbacks.

Recommended sub-PR sequence

  • W3a remote/local probe primitives
  • W3b scheduler parsers and normalization
  • W3c software-environment probes
  • W3d YAML render + validation
  • W3e /wizard setup intent + slash command
  • W3f HOST/topology finalization
  • W3g node-refresh sidecar cache

Notes

  • Research only; no code changes beyond this one markdown report.
  • bin/plan.md is intentionally not linked because it does not exist for this task.
  • Tests not run per task constraints.

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces a research document evaluating the feasibility of a /wizard command for automated server YAML configuration, proposing a deterministic-first approach for topology and scheduler detection. Reviewers suggested expanding environment variable checks for LSF and Torque, addressing Slurm version compatibility for JSON output, and refining scratch directory detection to avoid using home directories. A suggestion was also provided to include ControlPersist in the SSH configuration guidance for MFA-gated environments.

- If the request already carries an SSH destination (`user@host`, SSH alias, or existing `SERVER.HOST`) and that destination is not local, select **Mode B**.
- Else continue.
2. Run a local HPC-signature probe.
- Check for scheduler evidence: `env | egrep '^(SLURM_|PBS_|LSB_|SGE_)'`, `command -v sinfo`, `command -v qstat`, `command -v bqueues`, `command -v qconf`.

medium

The environment variable check for LSF should also include the `LSF_` prefix (e.g., `LSF_ENVDIR`), as it is frequently used alongside `LSB_`. Additionally, some PBS/Torque installations use `TORQUE_` prefixes.
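A sketch of what the widened prefix check might look like if done in Python rather than with `egrep`. The function name `scheduler_families` and the table `SCHEDULER_PREFIXES` are hypothetical, and grouping `TORQUE_` under the PBS family is an assumption based on the report's `PBS→qsub` mapping, not something the report states.

```python
import os
from typing import Mapping, Optional, Set

# Prefix-to-family table. LSF_ and TORQUE_ are the additions suggested above;
# mapping TORQUE_ to the PBS family is an assumption of this sketch.
SCHEDULER_PREFIXES = {
    "SLURM_": "SLURM",
    "PBS_": "PBS",
    "TORQUE_": "PBS",
    "LSB_": "LSF",
    "LSF_": "LSF",
    "SGE_": "SGE",
}


def scheduler_families(env: Optional[Mapping[str, str]] = None) -> Set[str]:
    """Return the scheduler families hinted at by environment variable names."""
    env = os.environ if env is None else env
    found = set()
    for name in env:
        for prefix, family in SCHEDULER_PREFIXES.items():
            if name.startswith(prefix):
                found.add(family)
    return found
```

A result with more than one family (or none) is only a hint; the report's separate requirement that a read-only scheduler query actually succeeds would still decide the outcome.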

2. Run a local HPC-signature probe.
- Check for scheduler evidence: `env | egrep '^(SLURM_|PBS_|LSB_|SGE_)'`, `command -v sinfo`, `command -v qstat`, `command -v bqueues`, `command -v qconf`.
- Check for login-node evidence: `type module`, `hostname -f`, `command -v g16`, `command -v orca`, `command -v nciplot`.
- Check that at least one scheduler read-only query actually works: `sinfo --json`, `qstat -Q -f`, `bqueues -l`, or `qconf -sql`.

medium

`sinfo --json` is only available in Slurm versions 20.11 and later. The report should state this version dependency so that the probe logic includes a robust text-parsing fallback for older clusters.
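One possible shape for such a fallback, as a sketch only. The names `probe_partitions`, `parse_sinfo_text`, and the injectable `run` callback are hypothetical. The text path assumes `sinfo -h -o "%P|%a|%l|%D"` (partition, availability, time limit, node count); the JSON payload is returned unparsed beyond `json.loads` because its schema is not described in this PR.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical runner: takes an argv list, returns (exit_code, stdout).
Runner = Callable[[List[str]], Tuple[int, str]]

JSON_CMD = ["sinfo", "--json"]
TEXT_CMD = ["sinfo", "-h", "-o", "%P|%a|%l|%D"]


def parse_sinfo_text(output: str) -> List[Dict[str, object]]:
    """Parse pipe-delimited sinfo rows; malformed lines are skipped."""
    rows = []
    for line in output.splitlines():
        parts = line.strip().split("|")
        if len(parts) != 4:
            continue
        name, avail, time_limit, nodes = parts
        rows.append({
            "partition": name.rstrip("*"),   # sinfo marks the default partition with '*'
            "default": name.endswith("*"),
            "available": avail == "up",
            "time_limit": time_limit,
            "nodes": int(nodes) if nodes.isdigit() else None,
        })
    return rows


def probe_partitions(run: Runner) -> Tuple[str, Optional[object]]:
    """Try the JSON probe first, then fall back to text parsing."""
    code, out = run(JSON_CMD)
    if code == 0:
        try:
            return "json", json.loads(out)
        except json.JSONDecodeError:
            pass  # Exit code 0 but not JSON: treat like an unsupported flag.
    code, out = run(TEXT_CMD)
    if code != 0:
        return "none", None
    return "text", parse_sinfo_text(out)
```

Injecting the runner keeps the parser testable without a cluster and lets the same logic serve both the local and SSH topologies.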

| `SERVER.NUM_THREADS` | Same as `NUM_CORES`; optionally `lscpu` for Mode A | N — v1 can tie threads to cores | Set equal to `NUM_CORES` | Ask only if site needs a different OpenMP default |
| `SERVER.SUBMIT_COMMAND` | Derived from confirmed scheduler family | N — this is a fixed mapping | `SLURM→sbatch`, `PBS→qsub`, `LSF→bsub < `, `SGE→qsub` | None needed unless maintainer wants overrides |
| `SERVER.PROJECT` | No reliable portable read-only probe; optional weak hints from `sacctmgr`, `qmgr`, `groups`, or sample env vars if visible | N — account/project discovery is site-policy data, not something the tool should guess | Leave commented or null | Ask user for allocation/account/project string |
| `SERVER.SCRATCH_DIR` | `printf '%s\n' "$SCRATCH" "$WORK" "$TMPDIR"`; `test -d ~/scratch -a -w ~/scratch`; `getent passwd "$USER" \| cut -d: -f6`; `df -h` | N — path presence and writability are deterministic | Use the first writable stable user scratch root; else `null` | Ask for scratch root path |

medium

`getent passwd "$USER" | cut -d: -f6` retrieves the user's home directory. In HPC environments, home directories are often restricted in size or performance and are distinct from the intended high-performance scratch space. It might be safer to treat this as a fallback rather than a primary scratch probe.
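A minimal sketch of how the selection could demote the home directory, assuming the candidates arrive in probe order (for example `$SCRATCH`, `$WORK`, `$TMPDIR`, `~/scratch`). The function name `pick_scratch`, the `allow_home_fallback` flag, and the injectable `usable` predicate are hypothetical.

```python
import os
from typing import Callable, Iterable, Optional


def _is_writable_dir(path: str) -> bool:
    return os.path.isdir(path) and os.access(path, os.W_OK)


def pick_scratch(
    candidates: Iterable[str],
    home: str,
    usable: Callable[[str], bool] = _is_writable_dir,
    allow_home_fallback: bool = False,
) -> Optional[str]:
    """Return the first usable candidate that is not the home directory itself."""
    home_norm = os.path.normpath(home)
    for raw in candidates:
        if not raw:
            continue  # Unset variables show up as empty strings.
        path = os.path.normpath(raw)
        if path == home_norm:
            continue  # Never treat $HOME as a primary scratch root.
        if usable(path):
            return path
    # Home is considered only as an explicit, opt-in last resort.
    if allow_home_fallback and usable(home_norm):
        return home_norm
    return None
```

Only an exact match with the home directory is skipped, so a path such as `~/scratch` still qualifies. Returning `None` matches the `null` fallback in the field matrix row, after which the wizard would ask for a scratch root path.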

| Unknown SSH host key (`StrictHostKeyChecking` would prompt) | Reject | Stop before the first remote probe; instruct user to trust the host out of band or rerun from a pre-established SSH session |
| TOFU/`accept-new` host key path already enabled in user SSH config | Warn-pause | Show that the first probe will mutate `known_hosts`; require explicit approval |
| SSH timeout / unreachable host | Reject | Abort probing; keep no partial YAML updates |
| MFA- or keyboard-interactive-gated SSH login | Reject | Non-interactive probe mode cannot satisfy MFA; instruct user to use ControlMaster, agent-backed auth, or run `/wizard` from the login node |

medium

For MFA-gated logins, suggesting the use of ControlPersist in addition to ControlMaster in the user's SSH config can help maintain the authenticated session across multiple probe commands, reducing the need for repeated MFA prompts.

Suggested change
- | MFA- or keyboard-interactive-gated SSH login | Reject | Non-interactive probe mode cannot satisfy MFA; instruct user to use ControlMaster, agent-backed auth, or run `/wizard` from the login node |
+ | MFA- or keyboard-interactive-gated SSH login | Reject | Non-interactive probe mode cannot satisfy MFA; instruct user to use ControlMaster/ControlPersist, agent-backed auth, or run `/wizard` from the login node |
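To show how connection sharing could apply to probe commands, here is a sketch that builds an `ssh` argument list using the standard OpenSSH options `BatchMode`, `ControlMaster`, `ControlPersist`, and `ControlPath`. The function name `ssh_probe_argv`, the 10-minute persistence window, and the socket path are assumptions; the sketch also assumes the user has already opened an authenticated master connection interactively with the same `ControlPath`.

```python
from typing import List


def ssh_probe_argv(destination: str, remote_command: str, persist: str = "10m") -> List[str]:
    """Build argv for one non-interactive probe over a shared SSH connection."""
    return [
        "ssh",
        "-o", "BatchMode=yes",               # Never prompt; fail instead of hanging on MFA.
        "-o", "ControlMaster=auto",          # Reuse a master connection if one exists.
        "-o", f"ControlPersist={persist}",   # Keep the master alive between probes.
        "-o", "ControlPath=~/.ssh/cm-%C",    # Hypothetical socket path; %C hashes the connection details.
        destination,
        remote_command,
    ]
```

For example, `ssh_probe_argv("user@login.example.org", "command -v sinfo")` yields an argument list suitable for `subprocess.run`. With `BatchMode=yes`, a probe that cannot reuse an authenticated master fails quickly, which fits the Reject handling in the table row above.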
