research-direction: cross-method benchmark at fixed compute budget on N2/Cr2 active-space series (only realistic publishable angle)

## TL;DR
Each NQS+SCI method paper reports its own numbers on its own setup. **No paper systematically compares wall-time-to-accuracy across HI-NQS-SQD, GTNN-SCI, ADAPT-QSCI, ph-AFQMC+SQD, and NNQS-Transformer at a fixed compute budget on the same hardware.** This is an opportunity for a workshop/benchmark paper, not a new-method paper.

## What's open
The literature is full of "X method achieves Y accuracy" claims, but nobody plots the **Pareto frontier** at fixed compute. Reported results look like this (illustrative) and cannot be compared directly:
- Method A: chem-acc in 4h on H100
- Method B: chem-acc in 1h on H200 with 4× memory
- Method C: 5 mHa off in 30min, chem-acc in 8h

Researchers have to reproduce each method themselves, and often fail because of insufficient detail or missing dependencies.

## Proposal: build the missing benchmark
**Test molecules** (N2 and Cr2 CAS series, fixed CASSCF orbitals, fixed reference):
- N2-CAS(10,12) — 24Q, Hilbert 627k
- N2-CAS(10,15) — 30Q, Hilbert 9M
- N2-CAS(10,20) — 40Q, Hilbert 240M
- N2-CAS(10,26) — 52Q, Hilbert 4.3B
- Cr2-CAS(12,18) — 36Q (multireference test)
- Cr2-CAS(12,26) — 52Q (harder multireference)
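
The qubit counts and Hilbert-space sizes above can be cross-checked with a short sketch. It assumes two qubits per spatial orbital (one per spin orbital), a closed-shell Sz = 0 sector with equal numbers of alpha and beta electrons, and no reduction by point-group symmetry. The helper name `cas_size` is hypothetical and not part of qvartools.

```python
from math import comb

def cas_size(n_elec: int, n_orb: int) -> tuple[int, int]:
    """Return (qubits, determinants) for CAS(n_elec, n_orb).

    Assumptions: Sz = 0 (n_alpha == n_beta), one qubit per spin orbital,
    no point-group symmetry reduction.
    """
    n_alpha = n_beta = n_elec // 2
    qubits = 2 * n_orb
    determinants = comb(n_orb, n_alpha) * comb(n_orb, n_beta)
    return qubits, determinants

# Active spaces listed in the proposal: (electrons, orbitals)
systems = {
    "N2-CAS(10,12)": (10, 12),
    "N2-CAS(10,15)": (10, 15),
    "N2-CAS(10,20)": (10, 20),
    "N2-CAS(10,26)": (10, 26),
    "Cr2-CAS(12,18)": (12, 18),
    "Cr2-CAS(12,26)": (12, 26),
}

for name, (n_elec, n_orb) in systems.items():
    qubits, dets = cas_size(n_elec, n_orb)
    print(f"{name}: {qubits}Q, {dets:.3g} determinants")
```

Under these assumptions the N2 entries reproduce the sizes quoted above, and the two Cr2 spaces come out at roughly 3.4e8 and 5.3e10 determinants.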

**Methods to compare** (5 methods + HCI reference):
1. HI-NQS-SQD (qvartools pipeline 010)
2. ADAPT-QSCI ([JCTC 4c00846](https://pubs.acs.org/doi/10.1021/acs.jctc.4c00846))
3. GTNN-SCI ([JCTC 5c01429](https://pubs.acs.org/doi/10.1021/acs.jctc.5c01429))
4. NNQS-Transformer / QiankunNet ([arXiv:2306.16705](https://arxiv.org/abs/2306.16705))
5. ph-AFQMC + SQD trial ([JCTC 5c01407](https://pubs.acs.org/doi/10.1021/acs.jctc.5c01407))
6. HCI gold reference (compactness baseline)

**Fixed compute budget**: e.g., 4h on an H200 with 12 CPUs. Run each method until it terminates or exhausts the budget.
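
One way to enforce "terminate or exhaust the budget" uniformly is a thin harness around each method. The sketch below assumes every method can be wrapped as a Python iterator that yields its current energy estimate once per outer iteration; `run_with_budget` and the status strings are hypothetical names, not an existing qvartools API. The budget is only checked between iterations, so a single long iteration can overrun it.

```python
import time
from typing import Callable, Iterable

def run_with_budget(
    steps: Iterable[float],
    budget_s: float,
    clock: Callable[[], float] = time.perf_counter,
) -> tuple[list[tuple[float, float]], str]:
    """Consume energies from `steps` until the method stops or the budget is spent.

    Returns (trace, status), where trace is a list of (elapsed_seconds, energy)
    and status is "terminated" or "budget_exhausted".
    """
    start = clock()
    trace: list[tuple[float, float]] = []
    for energy in steps:
        elapsed = clock() - start
        trace.append((elapsed, energy))
        if elapsed >= budget_s:
            # Budget check happens after each completed iteration.
            return trace, "budget_exhausted"
    return trace, "terminated"
```

Recording the full `(elapsed, energy)` trace rather than only the final energy keeps the raw data needed for the time-to-accuracy metric and the Pareto plots.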

**Metrics**:
- Final E vs HCI gold standard
- Wall time to chem-acc (1.6 mHa)
- Subspace size at termination
- |c|² histogram tail thickness
- Sample efficiency (configs / unique-important-determinants)
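
As an illustration of the wall-time metric, the sketch below extracts the first time at which a run comes within chemical accuracy of the reference. It assumes each run is logged as a list of `(elapsed_seconds, energy_in_Ha)` pairs; the function name `time_to_accuracy` and the sample numbers are hypothetical. The absolute error is used because a non-variational method such as ph-AFQMC can land below the reference.

```python
from typing import Optional, Sequence

CHEM_ACC_HA = 1.6e-3  # chemical accuracy threshold used in this proposal (1.6 mHa)

def time_to_accuracy(
    trace: Sequence[tuple[float, float]],
    e_ref: float,
    tol: float = CHEM_ACC_HA,
) -> Optional[float]:
    """Return the first elapsed time with |E - e_ref| <= tol, or None if never reached."""
    for elapsed, energy in trace:
        if abs(energy - e_ref) <= tol:
            return elapsed
    return None

# Made-up trace for illustration only: (seconds, energy in Ha)
example_trace = [(10.0, -1.0000), (20.0, -1.0040), (30.0, -1.0049)]
print(time_to_accuracy(example_trace, e_ref=-1.0050))
```

A stricter variant could require the error to stay within tolerance for all later iterations, which matters for methods whose estimates fluctuate.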

**Output**:
- Pareto frontier plot per system size
- Failure mode analysis per method
- Reproducibility recipe (Docker image + scripts)
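
The Pareto frontier for one system size can be extracted from the per-run results before plotting. This is a minimal sketch, assuming each run is summarized as `(label, wall_time, abs_error)` with lower being better on both axes; `pareto_front` and the sample points are hypothetical.

```python
def pareto_front(points: list[tuple[str, float, float]]) -> list[tuple[str, float, float]]:
    """Return the non-dominated (label, wall_time, abs_error) points, sorted by wall time.

    A point is kept only if its error is strictly lower than that of every
    point that is at least as fast.
    """
    front = []
    best_error = float("inf")
    for label, wall_time, error in sorted(points, key=lambda p: (p[1], p[2])):
        if error < best_error:
            front.append((label, wall_time, error))
            best_error = error
    return front

# Made-up results for illustration only: (method, hours, error in mHa)
runs = [("A", 4.0, 1.0), ("B", 1.0, 1.5), ("C", 0.5, 5.0), ("D", 2.0, 3.0)]
print(pareto_front(runs))
```

In this made-up example run D is dominated by B (slower and less accurate), so only C, B, and A remain on the frontier.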

## Why this is publishable
1. **Workshop venues** (e.g., NeurIPS ML4Sci, ICML SciML, IEEE QCE) actively want benchmark papers
2. **Reproducibility crisis is real** in NQS: multiple implementations claim chem-acc, but nobody can reproduce them
3. **Fills a clear gap**: a single literature search turned up no equivalent paper

## Why this is NOT new method research
To be clear: this is engineering benchmarking, not algorithmic novelty. The target is a workshop venue, not a top-tier journal.

## Effort: 3-6 months
- ~1 month: implement / wrap each baseline in qvartools
- ~1 month: run benchmarks on H200 cluster (nano4 access)
- ~1 month: analysis + plotting + figure generation
- ~1 month: writing + iterating

## Risks
- Each baseline implementation might require significant porting work
- Authors of original methods might dispute "our reimplementation" — mitigate by using their official code where available
- Reviewers might say "not novel enough" — counter with "but rigorous benchmarking is the literature gap we identify"
