diff --git a/docs/research/hpc_os_compatibility.md b/docs/research/hpc_os_compatibility.md
new file mode 100644
index 000000000..329e174b5
--- /dev/null
+++ b/docs/research/hpc_os_compatibility.md
@@ -0,0 +1,1043 @@
+# chemsmart HPC OS compatibility audit
+- Report date: 2026-05-10
+- Auditor persona: HPC OS Management Specialist
+- Audit target: commit `20cbcdb9`
+- Current repo head in this worktree: `36f13c6e`
+- Scope: `chemsmart agent` multi-tool pipeline added in W1a/W1b/W2a/W2b
+- Constraint: no code changes; research only
+## Audit framing
+- I audited the code and packaging state at commit `20cbcdb9`, because that is
+the user-specified pipeline head.
+- I also read the current `AGENTS.md` for repo conventions.
+- `AGENTS.md` does not exist at commit `20cbcdb9`; I used the current file only
+as a workflow/conventions guide, not as evidence about the target pipeline.
+- Required local files reviewed:
+  - `pyproject.toml` at `20cbcdb9`
+  - `environment.yml` at `20cbcdb9`
+  - `README.md` at `20cbcdb9`
+  - `chemsmart/agent/providers.py` at `20cbcdb9`
+  - `chemsmart/agent/transport.py` at `20cbcdb9`
+  - `chemsmart/agent/cli.py` at `20cbcdb9`
+  - `chemsmart/agent/core.py` at `20cbcdb9`
+  - `chemsmart/agent/tools.py` at `20cbcdb9`
+  - `chemsmart/agent/tui/app.py` at `20cbcdb9`
+  - `chemsmart/agent/tui/styles.tcss` at `20cbcdb9`
+  - `chemsmart/agent/tui/widgets/header.py` at `20cbcdb9`
+  - `chemsmart/settings/server.py` at `20cbcdb9`
+  - `chemsmart/settings/submitters.py` at `20cbcdb9`
+  - current `AGENTS.md`
+- Bottom-line upfront:
+  - ABI-wise, the pipeline is broadly runnable on modern Linux login-node OSes
+in scope.
+  - Operationally, it does **not** run “without issue” on common shared HPC
+login nodes.
+  - The biggest problems are not GLIBC.
+  - The biggest problems are:
+    - Python 3.10+ is not the default system Python on EL8, EL9, or SLES 15.
+    - the documented credential flow is broken by a hard-coded absolute `api.env` path in
+      `providers.py`.
+    - `rich` is imported by the agent CLI but is not declared as a direct dependency in
+      `pyproject.toml`.
+    - `run_local` and long-lived Textual sessions are poor fits for shared login node
+      acceptable-use policies at NERSC, TACC, and OLCF.
+    - outbound HTTPS to `factchat-cloud.mindlogic.ai` is mandatory.
+    - `submit_hpc` over SSH assumes host aliases derived from the server YAML filename,
+      which is brittle on real clusters.
+## 1. Pipeline Dependency Audit
+### 1.1 What the audited pipeline actually needs
+From the audited commit, the agent stack is split across four layers.
+1. CLI and rendering layer.
+   - `click`
+   - `rich`
+2. validation and provider layer.
+   - `pydantic>=2`
+   - `python-dotenv>=1`
+   - `openai>=1`
+   - `anthropic>=0.40`
+3. chemistry/job layer.
+   - `ase==3.24.0`
+   - `numpy~=1.26.4`
+   - `scipy==1.15.2`
+   - `PyYAML`
+   - existing chemsmart Gaussian/ORCA code
+4. TUI extra.
+   - `textual>=0.80,<1.0`
+   - `watchdog>=3,<5`
+   - `pyperclip`
+### 1.2 Important packaging observations from the local audit
+- `pyproject.toml` at `20cbcdb9` declares:
+  - `requires-python = "~=3.10"`
+  - main deps include `anthropic`, `openai`, `python-dotenv`, `pydantic`
+  - TUI extra includes `textual`, `watchdog`, `pyperclip`
+- `environment.yml` pins `python=3.10` and includes `anthropic`, `openai`,
+`pydantic`, and `python-dotenv` in the conda dependency list.
+- The agent CLI imports `rich` at module import time in
+`chemsmart/agent/cli.py`.
+- `rich` is **not** declared in the base dependencies or the `agent-tui` extra.
+- Therefore:
+  - a minimal `pip install -e .` may fail to import `chemsmart agent` if `rich`
+is not already present from some unrelated package.
+  - a full `.[agent-tui]` install probably pulls `rich` transitively via
+`textual`, but that is accidental, not explicit.
+- On clusters, accidental transitive dependencies are a recurring support
+headache because module stacks, wheel caches, and offline mirrors are rarely identical
+across sites.
+### 1.3 Important runtime-path observations from the local audit
+- `providers.py` hard-codes:
+  - `_API_ENV_PATH = "/Users/hongjiseung/developer/chemsmart/api.env"`
+- `README.md` instructs the user to:
+  - `cp api.env.example api.env`
+- Those two things do not match.
+- On any HPC system, the documented default credential path is therefore broken
+unless the user does one of the following:
+  - exports `ai_api_key` directly in the shell environment, or
+  - creates the exact hard-coded path, which is unrealistic on shared systems.
+- This is not an OS ABI problem.
+- It is a portability problem that will show up on every site.
+### 1.4 File-system and session-path observations from the local audit
+- Agent session root defaults to:
+  - `~/.chemsmart/agent/sessions`
+- Server/user config roots live under:
+  - `~/.chemsmart/server`
+  - `~/.chemsmart/gaussian`
+  - `~/.chemsmart/orca`
+- `dry_run_input(job)` writes input files into `job.folder`.
+- `run_local(job)` writes:
+  - `