This repository provides installation and usage scripts for TRACE (arXiv:2506.09114).
```shell
conda env create -f environment.yml
conda activate trace-rag
```

Configure runtime paths via `.env`.
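The variable names below are illustrative placeholders, not the repository's actual keys (check the repo's `.env.example` or config loading code for the real ones):

```shell
# Hypothetical .env — adjust keys and paths to match the repo's expectations
DATA_DIR=./dataset
RESULTS_DIR=./results
WANDB_PROJECT=trace
```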
Download the dataset from Google Drive and unzip it into the `dataset/` directory.
The dataset for the project follows this structure:

```
dataset/
├── pretrain/
│   ├── train_data/
│   ├── val_data/
│   └── test_data/
├── forecasting/
│   ├── train.json
│   ├── val.json
│   └── test.json
└── retrieval/
    ├── train.parquet
    └── test.parquet
```
```
├── pretrain.py            # Stage 1
├── context_align.py       # Stage 2
├── forecast_finetune.py   # Optional for task-specific finetuning
├── demo.ipynb             # Embedding + retrieval demo
├── configs/
│   ├── pretrain.yaml
│   ├── align.yaml
│   └── finetune.yaml
└── src/
    ├── data/      # Dataset + dataloader
    ├── models/    # TS encoder / multimodal encoder / retriever
    ├── tasks/     # Training loops
    └── utils/     # Config / metrics / helpers
```
TRACE uses a two-stage training pipeline; an optional third stage performs task-specific finetuning.
- Stage 1: time-series pretraining
- Stage 2: time-series/text context alignment (embedding + retrieval)
- Stage 3 (optional): forecasting finetuning (with or without RAG)
```shell
CUDA_VISIBLE_DEVICES=0,1 torchrun \
    --nproc_per_node=2 \
    --master-port=<MASTER_PORT_STAGE1> \
    pretrain.py \
    --config configs/pretrain.yaml
```

After pretraining, record the run name.
Important:
- This run name is the key link to Stage 2.
- `context_align.py` uses `--pretraining_run_name` to locate and override model settings from `results/wandb_configs/<PRETRAIN_RUN_NAME>.yaml`.
- Pretraining checkpoints are expected under `results/model_checkpoints/<PRETRAIN_RUN_NAME>/`.
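Conceptually, the override behaves like a merge in which the model settings recorded at pretraining time take precedence over the Stage-2 defaults. A rough sketch of that behavior, assuming a simple shallow merge (the actual script may differ):

```python
def apply_pretrain_overrides(align_cfg: dict, pretrain_cfg: dict) -> dict:
    """Shallow merge: keys recorded at pretraining time win over align defaults."""
    merged = dict(align_cfg)      # start from the Stage-2 config
    merged.update(pretrain_cfg)   # pretraining run settings override
    return merged

# e.g., apply_pretrain_overrides({"d_model": 128, "lr": 1e-4}, {"d_model": 256})
# → {"d_model": 256, "lr": 1e-4}
```

This keeps the Stage-2 model architecture consistent with the checkpoint being loaded.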
```shell
CUDA_VISIBLE_DEVICES=0,1 torchrun \
    --nproc_per_node=2 \
    --master-port=<MASTER_PORT_STAGE2> \
    context_align.py \
    --config configs/align.yaml \
    --pretraining_run_name "<PRETRAIN_RUN_NAME>" \
    --cross_attend
```

Use the same `<PRETRAIN_RUN_NAME>` from Stage 1.
Without RAG:

```shell
CUDA_VISIBLE_DEVICES=0,1 torchrun \
    --nproc_per_node=2 \
    --master-port=<MASTER_PORT_FT_WO_RAG> \
    forecast_finetune.py \
    --config configs/finetune.yaml \
    --pretraining_run_name "<PRETRAIN_RUN_NAME>" \
    --ts_only
```

With RAG:

```shell
CUDA_VISIBLE_DEVICES=0,1 torchrun \
    --nproc_per_node=2 \
    --master-port=<MASTER_PORT_FT_W_RAG> \
    forecast_finetune.py \
    --config configs/finetune.yaml \
    --pretraining_run_name "<PRETRAIN_RUN_NAME>" \
    --top_k 1
```

Refer to `demo.ipynb` for generating the embedding bank and for cross-modal retrieval.
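At its core, cross-modal retrieval over an embedding bank is a nearest-neighbor search. A minimal cosine-similarity sketch, independent of the repo's actual code (which likely uses batched tensor ops):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def retrieve_top_k(query, bank, k=1):
    """Indices of the k bank entries most similar to query (cf. --top_k)."""
    order = sorted(range(len(bank)), key=lambda i: cosine(query, bank[i]),
                   reverse=True)
    return order[:k]

# e.g., retrieve_top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]], k=1) → [1]
```

With `--top_k 1`, only the single most similar retrieved context would be passed to the forecaster.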
If you find this work useful, please consider citing our paper:
```bibtex
@article{chen2025trace,
  title={TRACE: Grounding Time Series in Context for Multimodal Embedding and Retrieval},
  author={Chen, Jialin and Zhao, Ziyu and Nurbek, Gaukhar and Feng, Aosong and Maatouk, Ali and Tassiulas, Leandros and Gao, Yifeng and Ying, Rex},
  journal={arXiv preprint arXiv:2506.09114},
  year={2025}
}
```
