Accompanying code for: Subsequence SDTW: Differentiable Alignment with Flexible Boundary Conditions
Johannes Zeitler (johannes.zeitler@audiolabs-erlangen.de)
International Audio Laboratories Erlangen
February 2026
This repository contains code to reproduce all experiments in the paper. The main notebooks are:
- train_strong.ipynb: training with strongly aligned targets
- train_SDTW_noMismatch.ipynb: training with standard SDTW and no boundary mismatch
- train_SDTW.ipynb: training with standard SDTW and boundary mismatch
- train_subSDTW.ipynb: training with subSDTW and boundary mismatch
- train_subSDTW-W.ipynb: training with weighted subSDTW and boundary mismatch
- eval.ipynb: compute evaluation metrics
The repository also contains the following files and folders:
- data/: Open-domain subset of the BPSD. This subset is not sufficient to reproduce the paper results, but it allows the code to run end to end. The audio is tuning-corrected to A4=440Hz and resampled to 16 kHz FLAC.
- dataset_weakLabels.py: provides a dataset class for weakly aligned score-audio pairs from the BPSD
- midi.py: some helper functions for MIDI parsing
- onsets_and_frames/: PyTorch onsets-and-frames implementation from https://github.com/jongwook/onsets-and-frames
- prepare_weak_targets.ipynb: pre-compute weak target representations in musical and physical time from the BPSD annotations.
- pretrained_model.pt: A transcriber pretrained on the MAESTRO dataset
- SDTW.py: standard SDTW
- subSDTW.py: subsequence SDTW without weight penalty
- subSDTW_W.py: subsequence SDTW with weight penalty
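
For orientation, the difference between the standard and the subsequence boundary conditions can be illustrated with a minimal pure-Python sketch of the textbook soft-DTW recursion. This is not the code in SDTW.py or subSDTW.py: the function names `softmin` and `soft_dtw`, the choice of relaxing the boundaries along the second (input) axis, and the handling of the first row are assumptions made for illustration only. The weight penalty of subSDTW_W.py is not covered here.

```python
import math


def softmin(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), ignoring inf entries."""
    finite = [v for v in values if v != math.inf]
    if not finite:
        return math.inf
    m = min(finite)  # shift by the minimum for numerical stability
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in finite))


def soft_dtw(cost, gamma=1.0, subsequence=False):
    """Soft-DTW value for a local cost matrix cost[n][m] (N target x M input frames).

    subsequence=False: the path must start at (0, 0) and end at (N-1, M-1).
    subsequence=True:  (illustrative assumption) the path may start at any
                       column of the first row and end at any column of the
                       last row, so the target aligns to a section of the input.
    """
    N, M = len(cost), len(cost[0])
    # Accumulated cost matrix with one padding row and one padding column.
    D = [[math.inf] * (M + 1) for _ in range(N + 1)]
    if not subsequence:
        D[0][0] = 0.0  # fixed start point
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            if subsequence and n == 1:
                # Free start: begin a new path here (cost 0 so far)
                # or continue horizontally within the first row.
                preds = [0.0, D[n][m - 1]]
            else:
                preds = [D[n - 1][m - 1], D[n][m - 1], D[n - 1][m]]
            D[n][m] = cost[n - 1][m - 1] + softmin(preds, gamma)
    if subsequence:
        # Free end: smoothed minimum over all end columns of the last row.
        return softmin(D[N][1:], gamma)
    return D[N][M]


# Toy example: a 2-frame target that matches only the middle of a 4-frame input.
cost = [[5.0, 0.0, 5.0, 5.0],
        [5.0, 5.0, 0.0, 5.0]]
standard = soft_dtw(cost, gamma=0.01)
relaxed = soft_dtw(cost, gamma=0.01, subsequence=True)
```

In this toy example, the standard variant is forced to pay for the mismatching first and last input frames, whereas the relaxed variant can skip them, so `relaxed` is much smaller than `standard`.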
To reduce the memory footprint of this repository, we do not include all training datasets. The MAESTRO (https://magenta.withgoogle.com/datasets/maestro) and BPSD (https://doi.org/10.5281/zenodo.10847702) datasets must be obtained separately. For the BPSD, we use an audio version that was corrected to A4=440Hz tuning.
If you use this code, please cite our paper:
Johannes Zeitler and Meinard Müller. Subsequence SDTW: Differentiable Alignment with Flexible Boundary Conditions. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, 2026.