This project trains a machine learning model on noisy signal data and uses it to generate predictions. The steps below describe the required environment setup and dependencies, followed by instructions for data preparation, model training, and prediction.
Ensure the following tools are installed and configured in your environment:
- CUDA
- Apex
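Before going further, a short Python check can confirm that both requirements are visible from Python. This is a sketch, not part of the project. It assumes CUDA is reached through PyTorch (`torch.cuda.is_available()`) and that NVIDIA Apex is importable as the module `apex`. The function name `check_gpu_environment` is hypothetical, and the Apex check only confirms that the module can be found, not that it was built against a matching PyTorch/CUDA version.

```python
import importlib.util


def check_gpu_environment() -> dict:
    """Report whether PyTorch, a CUDA device, and NVIDIA Apex are available."""
    status = {"torch": False, "cuda": False, "apex": False}
    if importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so the check still runs without PyTorch

        status["torch"] = True
        status["cuda"] = bool(torch.cuda.is_available())
    # Apex is imported as `apex`; find_spec only checks that it can be located.
    status["apex"] = importlib.util.find_spec("apex") is not None
    return status


if __name__ == "__main__":
    for name, available in check_gpu_environment().items():
        print(f"{name}: {'OK' if available else 'missing'}")
```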
Install the following Python packages before running the scripts:
pip install timm numpy omegaconf pandas pyfstat pytorch_lightning scikit_learn torch tqdm wandb
Download the raw data (approximately 200 GB). This may take some time, so please be patient.
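Before starting the large download, it can help to confirm that the packages above are importable and that the target disk has enough free space. The sketch below assumes the data goes to the current directory (the source does not name a location). The helpers `missing_packages` and `has_free_space` and the constant `RAW_DATA_GB` are hypothetical. Note that an import name can differ from its pip name: `scikit_learn` is imported as `sklearn`.

```python
import importlib.util
import shutil

# pip package name -> module name used in `import` (they differ for scikit_learn).
REQUIRED_MODULES = {
    "timm": "timm",
    "numpy": "numpy",
    "omegaconf": "omegaconf",
    "pandas": "pandas",
    "pyfstat": "pyfstat",
    "pytorch_lightning": "pytorch_lightning",
    "scikit_learn": "sklearn",
    "torch": "torch",
    "tqdm": "tqdm",
    "wandb": "wandb",
}

RAW_DATA_GB = 200  # approximate size of the raw data download


def missing_packages() -> list:
    """Return the pip names of required packages whose modules cannot be found."""
    return [pkg for pkg, module in REQUIRED_MODULES.items()
            if importlib.util.find_spec(module) is None]


def has_free_space(path: str = ".", required_gb: float = RAW_DATA_GB) -> bool:
    """Check that the filesystem holding `path` has at least `required_gb` GB free."""
    return shutil.disk_usage(path).free >= required_gb * 10**9  # 1 GB = 10^9 bytes


if __name__ == "__main__":
    print("missing packages:", missing_packages() or "none")
    print("enough space for raw data:", has_free_space())
```

Leaving some headroom beyond 200 GB is prudent, since the later data-preparation steps may also write files to disk.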
Run the following script to generate signal images by adding noise to clean signals.
python scripts/simulate_signals.py resources/competition/timestamps.pkl
Generate random Gaussian background noise and combine it with pure signals.
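The general idea of combining Gaussian background noise with a pure signal can be illustrated with a toy NumPy example. This does not reproduce synthesize_external_psds.py. The sinusoidal signal, sampling rate, amplitude, and noise level are hypothetical. The spectrum is a plain periodogram, not whatever PSD estimate the script computes. The example shows how a periodic signal that is weak relative to the noise can still stand out as a peak in the power spectrum.

```python
import numpy as np


def add_gaussian_noise(clean: np.ndarray, noise_std: float, seed: int = 0) -> np.ndarray:
    """Return `clean` plus zero-mean Gaussian noise with standard deviation `noise_std`."""
    rng = np.random.default_rng(seed)
    return clean + rng.normal(loc=0.0, scale=noise_std, size=clean.shape)


def power_spectrum(x: np.ndarray) -> np.ndarray:
    """One-sided periodogram |rfft(x)|^2 / N (uncalibrated, for illustration only)."""
    return np.abs(np.fft.rfft(x)) ** 2 / x.size


# Toy data (hypothetical): a 50 Hz sinusoid sampled at 1 kHz for one second.
fs = 1000
t = np.arange(fs) / fs
clean = 0.5 * np.sin(2 * np.pi * 50 * t)
noisy = add_gaussian_noise(clean, noise_std=1.0)

freqs = np.fft.rfftfreq(noisy.size, d=1 / fs)
spectrum = power_spectrum(noisy)
peak_hz = freqs[np.argmax(spectrum)]  # the injected 50 Hz component is the largest peak
```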
python scripts/synthesize_external_psds.py resources/external/train/signals
Convert HDF5 data files to the required input format for the model.
python extract_psds_from_hdf5.py ../input/train/test_hdf5_directory
Train the machine learning model using the specified configuration file.
python src/train.py config/convnext_small_in22ft1k.yaml
Use the trained model to generate predictions.
python src/predict.py convnext_small_in22ft1k-6f6648-last.pt --use-flip-tta
This will produce submission1.csv.
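The --use-flip-tta option most likely enables flip-based test-time augmentation (TTA). In that technique, the model scores each input and a mirrored copy, and the two scores are combined. The sketch below shows this general technique with NumPy and does not reproduce predict.py. The flip axis, the plain averaging, and the names `predict_with_flip_tta` and `toy_model` are assumptions.

```python
from typing import Callable

import numpy as np


def predict_with_flip_tta(
    model: Callable[[np.ndarray], np.ndarray],
    batch: np.ndarray,
    axis: int = -1,
) -> np.ndarray:
    """Average model outputs on `batch` and on `batch` flipped along `axis`.

    `axis` is hypothetical: which dimension predict.py flips is not stated.
    """
    preds = model(batch)
    preds_flipped = model(np.flip(batch, axis=axis))
    return (preds + preds_flipped) / 2.0


# Toy "model" (hypothetical): score each sample by its top-left value.
def toy_model(x: np.ndarray) -> np.ndarray:
    return x[:, 0, 0]


batch = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)
tta_preds = predict_with_flip_tta(toy_model, batch)
```

If a model's output is unchanged by the flip, the averaged prediction equals the plain one, so TTA only matters when the two passes differ.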
Train a second model and generate additional predictions.
python src/g2net-augmentation.py
This will produce submission2.csv.
Combine the predictions from both models into the final submission.
python src/combine.py
This will generate the final submission.csv.
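The steps above do not say how combine.py merges the two files. A common approach is a weighted average of the predicted scores per sample, sketched below with the standard csv module. The column names `id` and `target`, the equal default weights, and the helper names are assumptions, so adjust them to match the actual submission format.

```python
import csv
from typing import Dict, Optional, Sequence


def read_submission(path: str, id_col: str = "id", target_col: str = "target") -> Dict[str, float]:
    """Load a submission CSV into a {sample id: score} mapping."""
    with open(path, newline="") as f:
        return {row[id_col]: float(row[target_col]) for row in csv.DictReader(f)}


def average_predictions(
    tables: Sequence[Dict[str, float]],
    weights: Optional[Sequence[float]] = None,
) -> Dict[str, float]:
    """Weighted average of scores per id; every table must cover the same ids."""
    if not tables:
        raise ValueError("need at least one submission")
    if weights is None:
        weights = [1.0] * len(tables)
    if len(weights) != len(tables):
        raise ValueError("need exactly one weight per submission")
    ids = list(tables[0])  # keep the row order of the first submission
    for table in tables[1:]:
        if set(table) != set(ids):
            raise ValueError("submissions do not cover the same ids")
    total = float(sum(weights))
    return {i: sum(w * t[i] for w, t in zip(weights, tables)) / total for i in ids}


def write_submission(path: str, preds: Dict[str, float],
                     id_col: str = "id", target_col: str = "target") -> None:
    """Write {sample id: score} back out as a two-column CSV."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([id_col, target_col])
        for key, value in preds.items():
            writer.writerow([key, value])
```

For example, `write_submission("submission.csv", average_predictions([read_submission("submission1.csv"), read_submission("submission2.csv")]))` averages the two model outputs with equal weight.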