P-B: classical HCI/SCI baseline ε-sweep for ADR-006 G0 Pareto comparison #54

@thc1006

TL;DR

ADR-006 (docs/decisions/006-compactness-first-hi-nqs-sqd.md) sets a research-grade goal: produce a single (energy, ci_matrix_dim) point that strictly Pareto-dominates the HCI ε-sweep curve on at least one of {N2-CAS(10,20), N2-CAS(10,26)}. Making that comparison requires an HCI/SCI baseline curve, and this issue tracks the standalone work to produce it. The work is decoupled from ADR-006 P-A and P-C, so it can land alone, but it blocks G0/G1/G2/G6 of ADR-006 Tier 2.
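The dominance test behind G0 can be sketched as follows. This is one possible reading, not the definition from ADR-006: it assumes both axes are minimized, and it treats "dominates the curve" as "dominates at least one baseline point and is dominated by none". The names `Point`, `dominates`, and `beats_curve` are hypothetical.

```python
from typing import Iterable, NamedTuple


class Point(NamedTuple):
    energy: float        # total energy in Hartree (lower is better)
    ci_matrix_dim: int   # CI matrix dimension (smaller is better)


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q on both axes and strictly better on one."""
    no_worse = p.energy <= q.energy and p.ci_matrix_dim <= q.ci_matrix_dim
    strictly_better = p.energy < q.energy or p.ci_matrix_dim < q.ci_matrix_dim
    return no_worse and strictly_better


def beats_curve(candidate: Point, curve: Iterable[Point]) -> bool:
    """Assumed reading of G0: the candidate dominates at least one baseline
    point and no baseline point dominates the candidate."""
    baseline = list(curve)
    if not baseline:
        return False  # nothing to compare against
    wins = any(dominates(candidate, q) for q in baseline)
    loses = any(dominates(q, candidate) for q in baseline)
    return wins and not loses
```

A stricter variant would compare the candidate against an interpolated frontier rather than the discrete sweep points; which reading applies is for ADR-006 to settle.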

Why now

We are about to invest H200-hours scaling the NQS to 256/8/8 (Issue #49) and implementing ADR-006 P1 (compactness mode). Without an HCI baseline curve we cannot evaluate G0 (Pareto dominance) — the central research claim of ADR-006. This is a 2-day standalone job that unblocks the rest.

Deliverables

  • experiments/pipelines/baselines/run_hci_sweep.py — main script.
    • Loads integrals via qvartools.molecules.get_molecule(name) so the Pareto comparison uses the same h1e/h2e integrals as our HI-NQS-SQD pipeline.
    • Backends: pyscf.fci.SCI (default, always-available, Schriber-Evangelista), pyscf.fci.selected_ci_spin0 for closed-shell (auto-selected; ~2x speedup), shciscf direct-driver TBD.
    • Output CSV columns: molecule, backend, epsilon, n_dets, ci_matrix_dim_cartesian, e_var, e_pt2, e_total, error_to_ref_mha, wall_time_s, converged, n_cycles, n_orb, n_alpha, n_beta.
    • Reports BOTH n_dets (the number of selected determinants with nonzero coefficients) and ci_matrix_dim_cartesian (number of alpha strings x number of beta strings, an upper bound on n_dets) so the comparison against our pipeline's logged ci_matrix_dim is honest.
  • experiments/pipelines/baselines/qvartools_hci_one.sh — CPU-only SLURM template, 12h default, fail-fast preamble for missing pyscf/qvartools.
  • experiments/pipelines/baselines/submit_hci_sweep.py — submission script with per-tag time overrides (52Q->24h, 40Q->16h, validation cases->2h).
  • Code review — 2 rounds of adversarial review (B1/B2/S1/S2/S3/S4 fixes applied; final round: "CLEAN — ready to ship").
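The CSV schema and the two dimension measures listed above can be sketched with the standard library alone. This is not the code in run_hci_sweep.py; the helper names `cartesian_dim`, `full_ci_dim`, and `write_rows` are hypothetical. The full-CI bound is included only as a sanity ceiling and assumes CAS(10,20) means 10 electrons in 20 orbitals, which is consistent with the 40Q tag.

```python
import csv
from math import comb

# Column order as listed in the deliverables.
CSV_COLUMNS = [
    "molecule", "backend", "epsilon", "n_dets", "ci_matrix_dim_cartesian",
    "e_var", "e_pt2", "e_total", "error_to_ref_mha", "wall_time_s",
    "converged", "n_cycles", "n_orb", "n_alpha", "n_beta",
]


def cartesian_dim(n_alpha_strs: int, n_beta_strs: int) -> int:
    """Upper bound on n_dets: selected alpha strings times selected beta strings."""
    return n_alpha_strs * n_beta_strs


def full_ci_dim(n_orb: int, n_alpha: int, n_beta: int) -> int:
    """Full-CI determinant count, a ceiling on any selected-CI dimension."""
    return comb(n_orb, n_alpha) * comb(n_orb, n_beta)


def write_rows(handle, rows) -> None:
    """Write a header plus rows, rejecting rows that lack a schema column."""
    writer = csv.DictWriter(handle, fieldnames=CSV_COLUMNS)
    writer.writeheader()
    for row in rows:
        missing = set(CSV_COLUMNS) - set(row)
        if missing:
            raise ValueError(f"row is missing columns: {sorted(missing)}")
        writer.writerow(row)  # DictWriter itself raises on unknown keys
```

Under the stated assumption, `full_ci_dim(20, 5, 5)` is 15504 squared, i.e. 240,374,016, which puts a hard ceiling on both reported dimensions for the 40Q case.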

Sweep matrix

Tag        | Molecule      | epsilon grid                 | Walltime
HCI-LIH    | LiH (12Q)     | 1e-2..1e-6                   | 2h (validation)
HCI-BEH2   | BeH2 (14Q)    | 1e-2..1e-5                   | 2h (validation)
HCI-N2-40Q | N2-CAS(10,20) | 1e-3, 5e-4, 2e-4, 1e-4, 5e-5 | 16h
HCI-N2-52Q | N2-CAS(10,26) | 1e-3, 5e-4, 2e-4, 1e-4       | 24h
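The per-tag walltime overrides and the explicit N2 grids could be encoded roughly as below. This is a sketch and not the contents of submit_hci_sweep.py: the function names and the substring matching on "52Q"/"40Q" are assumptions. The LiH and BeH2 grids are omitted because only their endpoints are given here. The command is built but not executed.

```python
DEFAULT_HOURS = 12                         # default of the SLURM template
VALIDATION_TAGS = {"HCI-LIH", "HCI-BEH2"}  # small validation molecules

# Explicit epsilon grids from the sweep matrix (N2 cases only).
N2_EPSILONS = {
    "HCI-N2-40Q": (1e-3, 5e-4, 2e-4, 1e-4, 5e-5),
    "HCI-N2-52Q": (1e-3, 5e-4, 2e-4, 1e-4),
}


def walltime_hours(tag: str) -> int:
    """Apply the overrides 52Q->24h, 40Q->16h, validation->2h, else the default."""
    if "52Q" in tag:
        return 24
    if "40Q" in tag:
        return 16
    if tag in VALIDATION_TAGS:
        return 2
    return DEFAULT_HOURS


def slurm_time(hours: int) -> str:
    """Format whole hours in the HH:MM:SS form accepted by sbatch --time."""
    return f"{hours:02d}:00:00"


def sbatch_command(tag: str, script: str = "qvartools_hci_one.sh") -> list[str]:
    """Build (but do not run) the submission command for one sweep tag."""
    return [
        "sbatch",
        f"--job-name={tag}",
        f"--time={slurm_time(walltime_hours(tag))}",
        script,
    ]
```

Returning the argument list instead of calling `subprocess.run` keeps the mapping easy to dry-run before any jobs are queued.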

Strong-correlation cases (Cr2) are deliberately excluded from the first cut; single-reference N2 is the right test ground for the Pareto claim.

Acceptance criteria

  • All 4 sweep jobs produce hci_sweep.csv with at least 4 finite-energy rows each.
  • LiH/BeH2 validation: at smallest epsilon, error to literature FCI <= 0.5 mHa.
  • N2-CAS(10,20) and N2-CAS(10,26) curves committed to experiments/baselines/ for downstream Pareto plotting.
  • Brief note in ADR-006 G6 confirming P-B done.
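The first two criteria are mechanical enough to check with a short script. The sketch below assumes that "finite-energy" refers to the e_total column and that rows arrive as dictionaries, for example from `csv.DictReader`. The function names are hypothetical.

```python
import math


def finite_energy_rows(rows, column="e_total"):
    """Keep rows whose energy column parses to a finite float."""
    kept = []
    for row in rows:
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue  # missing or unparsable energy: not a usable row
        if math.isfinite(value):
            kept.append(row)
    return kept


def passes_row_count(rows, minimum=4):
    """Criterion 1: at least `minimum` finite-energy rows in the sweep."""
    return len(finite_energy_rows(rows)) >= minimum


def passes_validation(rows, tol_mha=0.5):
    """Criterion 2: at the smallest epsilon, |error to reference| <= tol_mha."""
    usable = finite_energy_rows(rows)
    if not usable:
        return False
    try:
        tightest = min(usable, key=lambda r: float(r["epsilon"]))
        return abs(float(tightest["error_to_ref_mha"])) <= tol_mha
    except (KeyError, TypeError, ValueError):
        return False  # epsilon or error column missing or unparsable
```

Rows that failed to converge but still carry a finite energy pass this filter; whether they should also be screened on the converged column is left open here.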

Out of scope

  • Direct-integrals SHCI via Dice (deferred until shciscf wrapper exposes it; SCI baseline is sufficient for G0 since both are in the same SCI family with similar Pareto curves).
  • epsilon-2 PT2 corrections (PySCF SCI does not expose them; deferred to a future SHCI-driven version).
  • Cr2 / strong-correlation cases (separate issue if needed).

References

  • ADR-006: docs/decisions/006-compactness-first-hi-nqs-sqd.md Goals, Validation, Hard prerequisites sections
  • Reinholdt et al., Critical Limitations in QSCI, JCTC 2025: arXiv:2501.07231. Establishes the 12x compactness gap baseline that we are claiming to beat.
  • Holmes et al., Heat-Bath CI, JCTC 2016: arXiv:1606.07453 — selection rule reference.
  • Schriber and Evangelista, Adaptive CI, JCP 2016 — what PySCF SCI implements.
