This repository contains the reference implementation of SMARTERS and provides a reproducible pipeline for:
- Simulating TERS images from Gaussian `.fchk` files
- Training an Attention U-Net model for molecular-structure mask prediction
- Evaluating trained checkpoints on held-out `.npz` datasets
    ├── configs/                    # YAML configs for hyperparameter search/training
    ├── model_checkpoints/          # Provided pretrained checkpoints
    ├── notebooks/                  # Inference notebook and notebook utilities
    ├── src/                        # Models, datasets, trainer, metrics, transforms
    ├── ters_img_simulator/         # TERS simulation pipeline (.fchk -> .npz)
    ├── evaluate_model.py           # Single-model evaluation entrypoint
    ├── hyperopt.py                 # Training + Optuna hyperparameter search entrypoint
    ├── requirements.txt            # Python dependency pins
    ├── run_evaluate.sh             # SLURM wrapper for evaluate_model.py
    └── train_parameter_search.sh   # SLURM wrapper for hyperopt.py
Install dependencies from `requirements.txt`:

    python -m venv .venv
    source .venv/bin/activate
    pip install --upgrade pip
    pip install -r requirements.txt

Note: `requirements.txt` currently contains two PyYAML pins. Depending on the resolver, this may cause an install conflict and should be harmonized in your local environment.
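Before installing, you can scan the requirements file for conflicting pins. The helper below is an illustrative sketch (not part of the repository) that reports any package pinned more than once:

```python
from collections import Counter

def duplicate_pins(requirements_text):
    """Return package names that appear more than once in a requirements file."""
    names = []
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Strip the version specifier, keeping only the package name
        for sep in ("==", ">=", "<=", "~=", "!=", "<", ">"):
            if sep in line:
                line = line.split(sep, 1)[0]
                break
        names.append(line.strip().lower())
    return [name for name, count in Counter(names).items() if count > 1]

# Example mirroring the PyYAML note above (versions are made up):
sample = "numpy==1.24.0\npyyaml==5.4.1\npyyaml==6.0\n"
print(duplicate_pins(sample))  # ['pyyaml']
```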
For most users, the fastest way to inspect model behavior is `notebooks/inference.ipynb`.

Run:

    jupyter lab notebooks/inference.ipynb

The notebook is the primary interactive entrypoint for single-molecule inference and visualization: the TERS image, the ground-truth mask, the predicted mask, and the IoU/Dice scores.
Training and evaluation expect directories of `.npz` files.

Required keys per `.npz`:

- `atom_pos`
- `atomic_numbers`
- `frequencies`
- `spectrums`
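A minimal sketch of writing and validating such a file with NumPy is shown below. The array shapes here are illustrative placeholders, not the shapes the simulator actually emits:

```python
import numpy as np

# Illustrative shapes only; the real shapes depend on the simulator output.
n_atoms, n_modes, grid = 12, 30, 64

np.savez(
    "example_molecule.npz",
    atom_pos=np.random.rand(n_atoms, 3),            # Cartesian atom coordinates
    atomic_numbers=np.random.randint(1, 9, n_atoms),
    frequencies=np.random.rand(n_modes),            # vibrational frequencies
    spectrums=np.random.rand(n_modes, grid, grid),  # simulated TERS images
)

# Sanity-check that a file carries all required keys before training:
required = {"atom_pos", "atomic_numbers", "frequencies", "spectrums"}
with np.load("example_molecule.npz") as data:
    missing = required - set(data.files)
    print("missing keys:", missing)  # empty set when the file is complete
```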
Typical split layout:

    <data_root>/
    ├── train/*.npz
    ├── val/*.npz
    └── test/*.npz
Dataset implementation: `src/datasets/ters_image_to_image_sh.py`
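Before launching a run, it can help to confirm each split directory actually contains data. A small standalone check (hypothetical helper, not part of the repository) might look like:

```python
from pathlib import Path

def summarize_splits(data_root):
    """Count .npz files in each expected split directory under data_root."""
    counts = {}
    for split in ("train", "val", "test"):
        split_dir = Path(data_root) / split
        # Missing directories count as zero rather than raising
        counts[split] = len(list(split_dir.glob("*.npz"))) if split_dir.is_dir() else 0
    return counts

print(summarize_splits("/path/to/data_root"))
```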
Canonical training command:

    python hyperopt.py --config configs/config_hypopt_all_val.yaml

With Weights & Biases logging:

    python hyperopt.py --config configs/config_hypopt_all_val.yaml --use_wandb

SLURM wrapper usage:

    export WANDB_API_KEY=<your_wandb_key>
    sbatch train_parameter_search.sh configs/config_hypopt_all_val.yaml

Training outputs are controlled by YAML keys such as `save_path` and `log_path`.
Canonical evaluation command:

    python evaluate_model.py \
        --model <path/to/model.pt> \
        --data <path/to/npz_dir> \
        --batch_size 32

SLURM wrapper usage:

    sbatch run_evaluate.sh <model_path> <data_path> [batch_size]

Example:

    sbatch run_evaluate.sh model_checkpoints/best_model_0.05.pt /path/to/val 32

`evaluate_model.py` computes global Accuracy, Precision, Recall, F1, IoU, and Dice via `src/metrics/metrics.py`.
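The actual implementation lives in `src/metrics/metrics.py`; as a rough sketch of how global IoU and Dice over binary masks are commonly computed (standard formulas, not the repository's code):

```python
import numpy as np

def iou_and_dice(pred, target):
    """Global IoU and Dice for binary masks (values in {0, 1})."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    # Convention: two empty masks count as a perfect match
    iou = intersection / union if union else 1.0
    dice = 2 * intersection / total if total else 1.0
    return float(iou), float(dice)

pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
print(iou_and_dice(pred, target))  # (0.5, 0.6666666666666666)
```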
Provided checkpoints are in `model_checkpoints/`:

- `best_model_0.05.pt`
- `best_model_0.1.pt`
- `best_model_0.5.pt`

Suffix convention:

- `0.05` means trained on the dataset variant with 0.05 RMSD
- `0.1` means trained on the dataset variant with 0.1 RMSD
- `0.5` means trained on the dataset variant with 0.5 RMSD
Example:

    python evaluate_model.py \
        --model model_checkpoints/best_model_0.05.pt \
        --data <path/to/npz_dir> \
        --batch_size 32

The simulator is documented in `ters_img_simulator/README.md`. Use the same Python environment defined in this top-level README (`pip install -r requirements.txt`), then follow simulator-specific usage and options in the simulator README.
- `notebooks/inference.ipynb`: single-molecule inference and visualization workflow
- `notebooks/utils/`: notebook utility modules (data reading, planarity, visualization)
If you use this repository, models, or generated data in your research, please cite:
    @misc{sethi2026automatedstructurediscoverytip,
          title={Automated structure discovery for Tip Enhanced Raman Spectroscopy},
          author={Harshit Sethi and Markus Junttila and Orlando J Silveira and Adam S Foster},
          year={2026},
          eprint={2602.19932},
          archivePrefix={arXiv},
          primaryClass={cond-mat.mtrl-sci},
          url={https://arxiv.org/abs/2602.19932},
    }