TL;DR
Each NQS+SCI method paper reports its own numbers on its own setup. To our knowledge, no paper systematically compares wall-time-to-accuracy across HI-NQS-SQD, GTNN-SCI, ADAPT-QSCI, ph-AFQMC+SQD, and NNQS-Transformer at a fixed compute budget on the same hardware. This is a workshop/benchmark paper opportunity (not a new-method paper).
What's open
The literature is full of "method X achieves accuracy Y" claims, but nobody plots the Pareto frontier at fixed compute. Typical reported results are not directly comparable:
- Method A: chem-acc in 4h on H100
- Method B: chem-acc in 1h on H200 with 4× memory
- Method C: 5 mHa off in 30min, chem-acc in 8h
Researchers have to reproduce each method themselves, often failing due to insufficient detail or missing dependencies.
Proposal: build the missing benchmark
Test molecules (N2 and Cr2 CAS series, fixed CASSCF orbitals, fixed reference):
- N2-CAS(10,12) — 24Q, Hilbert 627k
- N2-CAS(10,15) — 30Q, Hilbert 9M
- N2-CAS(10,20) — 40Q, Hilbert 240M
- N2-CAS(10,26) — 52Q, Hilbert 4.3B
- Cr2-CAS(12,18) — 36Q (multireference test)
- Cr2-CAS(12,26) — 52Q (harder multireference)
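The qubit counts and Hilbert-space sizes listed above follow from simple counting. A minimal sketch, assuming the Sz = 0 sector (equal numbers of alpha and beta electrons) and no point-group symmetry reduction; the function name `cas_size` and the `SYSTEMS` table are illustrative, not part of qvartools:

```python
from math import comb

def cas_size(n_elec, n_orb):
    """Qubit count and determinant count for CAS(n_elec, n_orb), assuming Sz = 0."""
    n_alpha = n_elec // 2            # assumption: equal alpha/beta split
    n_beta = n_elec - n_alpha
    n_qubits = 2 * n_orb             # one qubit per spin orbital
    # Choose occupied orbitals independently for each spin channel.
    n_dets = comb(n_orb, n_alpha) * comb(n_orb, n_beta)
    return n_qubits, n_dets

# (electrons, orbitals) for the proposed test systems
SYSTEMS = {
    "N2-CAS(10,12)": (10, 12),
    "N2-CAS(10,15)": (10, 15),
    "N2-CAS(10,20)": (10, 20),
    "N2-CAS(10,26)": (10, 26),
    "Cr2-CAS(12,18)": (12, 18),
    "Cr2-CAS(12,26)": (12, 26),
}

for name, (n_elec, n_orb) in SYSTEMS.items():
    n_qubits, n_dets = cas_size(n_elec, n_orb)
    print(f"{name}: {n_qubits}Q, {n_dets:,} determinants")
```

Spatial symmetry would shrink these counts, so they are upper bounds on the subspace any method has to search.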
Methods to compare (5 baselines + ours):
- HI-NQS-SQD (qvartools pipeline 010)
- ADAPT-QSCI (JCTC 4c00846)
- GTNN-SCI (JCTC 5c01429)
- NNQS-Transformer / QiankunNet (arXiv:2306.16705)
- ph-AFQMC + SQD trial (JCTC 5c01407)
- HCI gold reference (compactness baseline)
Fixed compute budget: e.g., 4h on H200 with 12 CPUs. Run each method until it terminates or exhausts the budget.
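One way to enforce the budget uniformly is to launch every method as a subprocess with a hard wall-clock timeout. The sketch below is an assumption about how the harness could look, not an existing qvartools feature; `run_with_budget` and the status strings are hypothetical names:

```python
import subprocess
import sys
import time

def run_with_budget(cmd, budget_s):
    """Run cmd with a hard wall-clock limit; return (status, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=budget_s)
        status = "terminated" if proc.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so the budget is hard.
        status = "budget-exhausted"
    return status, time.perf_counter() - start

if __name__ == "__main__":
    # Stand-in command; a real run would launch a method's driver script.
    status, elapsed = run_with_budget([sys.executable, "-c", "print('done')"], 10)
    print(status, f"{elapsed:.2f}s")
```

Because a killed process loses its in-memory state, each method would need to log its best energy and a timestamp to disk as it goes, so that a budget-exhausted run still yields a final energy.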
Metrics:
- Final E vs HCI gold standard
- Wall time to chem-acc (1.6 mHa)
- Subspace size at termination
- |c|² histogram tail thickness
- Sample efficiency (configs / unique-important-determinants)
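Two of these metrics can be pinned down in a few lines. The sketch below assumes each method logs a trace of (wall seconds, energy in Hartree) pairs and exposes its final CI coefficients. The tail-thickness definition used here (the fraction of normalized |c|² weight carried by determinants below a cutoff) is one possible choice, and the cutoff value is a placeholder:

```python
CHEM_ACC_HA = 1.6e-3  # chemical accuracy threshold in Hartree

def time_to_chem_acc(trace, e_ref, tol=CHEM_ACC_HA):
    """First wall time at which |E - e_ref| <= tol, or None if never reached.

    trace: iterable of (wall_seconds, energy_hartree), in chronological order.
    """
    for wall_s, energy in trace:
        if abs(energy - e_ref) <= tol:
            return wall_s
    return None

def tail_weight(coeffs, cutoff=1e-6):
    """Fraction of normalized |c|^2 weight on determinants with weight < cutoff."""
    weights = [abs(c) ** 2 for c in coeffs]
    total = sum(weights)
    return sum(w for w in weights if w / total < cutoff) / total

# Made-up trace for illustration only.
demo_trace = [(0, -1.000), (60, -1.002), (120, -1.004)]
print(time_to_chem_acc(demo_trace, e_ref=-1.005))
```

A stricter variant would require the error to stay below the threshold for the rest of the run, which matters for stochastic methods whose energy estimates fluctuate.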
Output:
- Pareto frontier plot per system size
- Failure mode analysis per method
- Reproducibility recipe (Docker image + scripts)
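The Pareto frontier itself is cheap to extract once every run is reduced to a (wall time, error) pair, with both quantities minimized. A minimal sketch; the labels and numbers in the demo are invented placeholders, not benchmark results:

```python
def pareto_front(results):
    """Return the non-dominated runs, sorted by wall time.

    results: iterable of (label, wall_hours, error_mha); lower is better for both.
    """
    front = []
    best_err = float("inf")
    # Sweep in order of increasing time; keep a run only if it beats
    # the error of every faster run.
    for label, hours, err in sorted(results, key=lambda r: (r[1], r[2])):
        if err < best_err:
            front.append((label, hours, err))
            best_err = err
    return front

# Placeholder data for illustration only.
demo = [("run-1", 4.0, 1.5), ("run-2", 1.0, 1.5), ("run-3", 0.5, 5.0), ("run-4", 8.0, 1.0)]
for label, hours, err in pareto_front(demo):
    print(f"{label}: {hours} h, {err} mHa")
```

In the demo, run-1 is dropped because run-2 reaches the same error in less time; the remaining three runs form the frontier that would be plotted per system size.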
Why this is publishable
- Workshop venues (e.g., NeurIPS ML4Sci, ICML SciML, IEEE QCE) actively want benchmark papers
- Reproducibility is a real problem in NQS: multiple implementations claim chem-acc, but independent attempts to reproduce them often fail
- Fills a clear gap: a literature search turned up no equivalent paper
Why this is NOT new method research
To be clear: this is engineering benchmarking, not algorithmic novelty. The target is a workshop venue, not a top-tier journal.
Effort: 3-6 months
- ~1 month: implement / wrap each baseline in qvartools
- ~1 month: run benchmarks on H200 cluster (nano4 access)
- ~1 month: analysis + plotting + figure generation
- ~1 month: writing + iterating
Risks
- Each baseline implementation might require significant porting work
- Authors of original methods might dispute "our reimplementation" — mitigate by using their official code where available
- Reviewers might say "not novel enough" — counter with "but rigorous benchmarking is the literature gap we identify"