A Python package to extract MS/MS spectra from Bruker TimsTOF .D folders and convert them to standard formats (MS2, MGF, and mzML).

```bash
pip install tdfextractor
```

tdfextractor provides three command-line tools for extracting spectra:

Extract MS2 format files (compatible with MS-GF+, Comet, etc.):

```bash
ms2-extractor /path/to/sample.d
# shorthand
ms2-ex
ms2-ex /path/to/sample.d --output custom_output.ms2 --min-spectra-intensity 100 --min-precursor-charge 2
ms2-ex /path/to/directory_with_multiple_d_folders --output /path/to/output_directory
```

Extract MGF format files:

```bash
mgf-extractor /path/to/sample.d
# shorthand
mgf-ex
mgf-ex /path/to/sample.d --casanovo  # Optimized for Casanovo de novo sequencing
mgf-ex /path/to/directory_with_multiple_d_folders --output /path/to/output_directory
```

Extract mzML format files (includes both MS1 and MS2 PASEF spectra):

```bash
mzml-extractor /path/to/sample.d
# shorthand
mzml-ex /path/to/sample.d
mzml-ex /path/to/sample.d --no-ms1  # MS2 spectra only
mzml-ex /path/to/sample.d --mz-compression zstd --intensity-encoding 32
mzml-ex /path/to/directory_with_multiple_d_folders --output /path/to/output_directory
```

All extractors support flexible output options:
- No output specified: Files are created within each .D folder with auto-generated names
- Specific file path: Use `-o filename.ms2` or `-o filename.mgf` for single .D folder processing
- Output directory: Use `-o /path/to/output_dir` for batch processing multiple .D folders
- Overwrite protection: Use `--overwrite` to replace existing output files

When processing multiple .D folders, the extractors will:
- Automatically find all .D folders in the specified directory
- Create output files with names matching the .D folder names
- Skip existing files unless `--overwrite` is specified
- Create the output directory if it doesn't exist

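
The batch behaviour listed above can be sketched in a few lines of Python. This is only an illustration, not the package's actual code: `find_d_folders` and `plan_outputs` are hypothetical names, and the sketch assumes that only the direct children of the given directory are searched.

```python
from pathlib import Path


def find_d_folders(root: Path) -> list[Path]:
    """Return the .d/.D folders directly inside root, sorted by name (hypothetical helper)."""
    return sorted(
        p for p in root.iterdir()
        if p.is_dir() and p.suffix.lower() == ".d"
    )


def plan_outputs(root: Path, out_dir: Path, ext: str, overwrite: bool = False) -> list[tuple[Path, Path]]:
    """Pair each .D folder with an output file named after it, skipping existing files."""
    out_dir.mkdir(parents=True, exist_ok=True)  # create the output directory if missing
    plan = []
    for d_folder in find_d_folders(root):
        target = out_dir / f"{d_folder.stem}.{ext}"  # sample.d -> sample.mgf
        if target.exists() and not overwrite:
            continue  # leave existing output untouched
        plan.append((d_folder, target))
    return plan
```

Calling `plan_outputs(Path("runs"), Path("out"), "mgf")` would return one `(folder, output_file)` pair per .D folder that still needs converting.
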
Both MS2 and MGF extractors share the same arguments, with only a few format-specific options:
| Argument | Type | Default | Description |
|---|---|---|---|
| `analysis_dir` | str | - | Path to the .D analysis directory or directory containing .D folders |
| `-o`, `--output` | str | `<analysis_dir_name>.<ext>` | Output file path or directory |
| `--remove-precursor` | flag | False | Remove precursor peaks from MS/MS spectra |
| `--precursor-peak-width` | float | 2.0 | Width around precursor m/z to remove (Da) |
| `--batch-size` | int | 100 | Batch size for processing spectra |
| `--top-n-peaks` | int | None | Keep only top N most intense peaks per spectrum |
| `--min-spectra-intensity` | float | None | Minimum intensity threshold for MS/MS peaks (absolute or 0.0-1.0 for percentage) |
| `--max-spectra-intensity` | float | None | Maximum intensity threshold for MS/MS peaks (absolute or 0.0-1.0 for percentage) |
| `--min-spectra-mz` | float | None | Minimum m/z filter for MS/MS peaks |
| `--max-spectra-mz` | float | None | Maximum m/z filter for MS/MS peaks |
| `--min-precursor-intensity` | float | None | Minimum precursor intensity filter |
| `--max-precursor-intensity` | float | None | Maximum precursor intensity filter |
| `--min-precursor-charge` | int | None | Minimum precursor charge state filter |
| `--max-precursor-charge` | int | None | Maximum precursor charge state filter |
| `--min-precursor-mz` | float | None | Minimum precursor m/z filter |
| `--max-precursor-mz` | float | None | Maximum precursor m/z filter |
| `--min-precursor-rt` | float | None | Minimum precursor retention time filter (seconds) |
| `--max-precursor-rt` | float | None | Maximum precursor retention time filter (seconds) |
| `--min-precursor-ccs` | float | None | Minimum precursor CCS filter |
| `--max-precursor-ccs` | float | None | Maximum precursor CCS filter |
| `--min-precursor-neutral-mass` | float | None | Minimum precursor neutral mass filter |
| `--max-precursor-neutral-mass` | float | None | Maximum precursor neutral mass filter |
| `--mz-precision` | int | 5 | Number of decimal places for m/z values |
| `--intensity-precision` | int | 0 | Number of decimal places for intensity values |
| `--keep-empty-spectra` | flag | False | Write empty spectra to output file |
| `--overwrite` | flag | False | Overwrite existing output files |
| `--workers` | int | 1 | Number of worker threads for processing multiple .d folders |
| `-v`, `--verbose` | flag | False | Enable verbose logging |

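
To make the peak-level options more concrete, here is a rough Python sketch of how precursor removal, an intensity threshold, and top-N selection could combine. It is not taken from the package. `filter_peaks` is a hypothetical function, and three points are assumptions: the precursor width is treated as a total window centred on the precursor m/z, a threshold between 0.0 and 1.0 is treated as a fraction of the most intense remaining peak, and the steps run in the order shown.

```python
def filter_peaks(peaks, precursor_mz=None, precursor_peak_width=2.0,
                 min_intensity=None, top_n=None):
    """Filter a list of (mz, intensity) tuples; returns the survivors sorted by m/z."""
    kept = list(peaks)

    # Precursor removal. ASSUMPTION: the width is the total window, so peaks
    # within width / 2 of the precursor m/z are dropped.
    if precursor_mz is not None:
        half = precursor_peak_width / 2.0
        kept = [(mz, inten) for mz, inten in kept if abs(mz - precursor_mz) > half]

    # Intensity threshold. ASSUMPTION: values in [0, 1] are a fraction of the
    # most intense remaining peak; anything larger is an absolute cutoff.
    if min_intensity is not None and kept:
        if 0.0 <= min_intensity <= 1.0:
            cutoff = min_intensity * max(inten for _, inten in kept)
        else:
            cutoff = min_intensity
        kept = [(mz, inten) for mz, inten in kept if inten >= cutoff]

    # Top-N selection by intensity.
    if top_n is not None:
        kept = sorted(kept, key=lambda p: p[1], reverse=True)[:top_n]

    return sorted(kept)
```

Under these assumptions, `min_intensity=0.05` keeps peaks at or above 5% of the most intense remaining peak, while `min_intensity=100` keeps peaks with an intensity of at least 100.
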
MS2 Extractor Only:

- `--ip2`: Use IP2 preset settings (sets min charge to 2, top 500 peaks)

MGF Extractor Only:

- `--casanovo`: Use Casanovo preset settings (enables precursor removal, top-150 peaks, min intensity 0.01, m/z range 50-2500, min charge 2)

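
Going only by the preset descriptions above, each preset presumably corresponds to a set of explicit flags from the shared arguments table. The Python sketch below spells out that mapping so a preset can be used as a starting point and then adjusted. The mapping is an assumption inferred from this README, not read from the package: in particular, "min intensity" is assumed to mean `--min-spectra-intensity` and "min charge" to mean `--min-precursor-charge`. `PRESETS` and `explicit_command` are hypothetical names.

```python
import shlex

# ASSUMED flag equivalents of the documented presets (inferred from the README text).
PRESETS = {
    "ip2": ["--min-precursor-charge", "2", "--top-n-peaks", "500"],
    "casanovo": [
        "--remove-precursor",
        "--top-n-peaks", "150",
        "--min-spectra-intensity", "0.01",
        "--min-spectra-mz", "50",
        "--max-spectra-mz", "2500",
        "--min-precursor-charge", "2",
    ],
}


def explicit_command(tool: str, analysis_dir: str, preset: str) -> str:
    """Build a shell command that lists the preset's assumed flags explicitly."""
    return shlex.join([tool, analysis_dir, *PRESETS[preset]])


print(explicit_command("mgf-ex", "/path/to/sample.d", "casanovo"))
```
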
mzML Extractor Only:
| Argument | Type | Default | Description |
|---|---|---|---|
| `--no-ms1` | flag | False | Skip MS1 spectra; write only MS2 PASEF spectra |
| `--mz-compression` | str | zlib | Compression for m/z arrays (none, zlib, zstd, numpress-linear, numpress-slof, numpress-pic) |
| `--intensity-compression` | str | zlib | Compression for intensity arrays |
| `--mobility-compression` | str | zlib | Compression for per-peak ion mobility arrays (MS1) |
| `--mz-encoding` | int | 64 | Bit width for m/z values (32 or 64) |
| `--intensity-encoding` | int | 32 | Bit width for intensity values (32 or 64) |
| `--centroid-noise-filter` | str | none | Noise filter before centroiding (none, mad, percentile, histogram, baseline, iterative_median) |
| `--centroid-mz-tolerance` | float | 8.0 | m/z tolerance for centroiding |
| `--centroid-mz-tolerance-type` | str | ppm | Unit for m/z tolerance (ppm or da) |
| `--centroid-im-tolerance` | float | 0.05 | Ion mobility tolerance for centroiding |
| `--centroid-im-tolerance-type` | str | relative | Unit for ion mobility tolerance (relative or absolute) |
| `--centroid-min-peaks` | int | 5 | Minimum raw peaks required to form a centroided peak |

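
For context on the encoding and compression options: mzML stores each binary array as little-endian floats, optionally zlib-compressed, then base64-encoded. The standard-library sketch below shows the effect of the bit width and of zlib on such an array. It illustrates the file format only and is not the package's writer; `encode_array` and `decode_array` are hypothetical helpers, and zstd and numpress are left out.

```python
import base64
import struct
import zlib


def _fmt(count: int, bits: int) -> str:
    """struct format for `count` little-endian floats of the given bit width."""
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    return f"<{count}{'d' if bits == 64 else 'f'}"


def encode_array(values, bits=64, compress=True) -> str:
    """Pack floats, optionally zlib-compress, and base64-encode them."""
    raw = struct.pack(_fmt(len(values), bits), *values)
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text: str, bits=64, compressed=True) -> list:
    """Reverse encode_array."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    return list(struct.unpack(_fmt(len(raw) // (bits // 8), bits), raw))
```

A 64-bit round trip returns the m/z values unchanged, whereas 32-bit encoding halves the uncompressed size at the cost of rounding to single precision, which is usually acceptable for intensities.
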
The `--workers` argument allows parallel processing of multiple .d folders:

```bash
# Process multiple .d folders with 4 worker threads
mgf-ex /path/to/directory_with_multiple_d_folders --workers 4
```

Note: Workers only affect processing when multiple .d folders are being processed simultaneously. Each worker processes one complete .d folder independently.
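
The one-folder-per-worker model described above can be pictured with a small thread-pool sketch. It is a conceptual illustration rather than the package's implementation: `convert_folder` is a hypothetical stand-in for converting one .d folder, and `convert_all` is likewise an assumed name.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_folder(d_folder: str) -> str:
    """Hypothetical stand-in for converting one complete .d folder."""
    return f"{d_folder} -> done"


def convert_all(d_folders, workers=1):
    """Hand each folder to a worker thread; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert_folder, d_folders))


print(convert_all(["a.d", "b.d", "c.d"], workers=4))
```

Because the unit of work is a whole folder, a run on a single .d folder gains nothing from extra workers: only one task exists to hand out.
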