A Python package for extracting data from Bruker timsTOF data files (.tdf and .tdf_bin). Includes a Numba-accelerated centroiding algorithm for efficient extraction of ion mobility data.
tdfpy provides a high-level Python API for reading Bruker timsTOF .d folders. It handles DDA, DIA, and PRM acquisition modes and exposes familiar Python objects — no need to think about raw PASEF frames or SQLite queries.
- DDA: iterate MS1 frames and precursors (MS2 spectra)
- DIA: iterate MS1 frames and DIA isolation windows
- Centroiding: Numba-accelerated peak merging across the m/z and ion mobility dimensions, returning `(N, 3)` arrays of `[m/z, intensity, 1/K0]`
- Lazy spectral access: frame metadata is loaded upfront; raw peak data is only read when you call `.peaks` or `.centroid()`
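Because centroids come back as plain NumPy arrays, ordinary indexing applies. The sketch below uses a small hand-made array as a hypothetical stand-in for `frame.centroid()` output (the values are illustrative, not real data) to show column unpacking and a 1/K0 range filter.

```python
import numpy as np

# Hypothetical stand-in for frame.centroid() output: rows are [m/z, intensity, 1/K0]
peaks = np.array([
    [445.12, 1200.0, 1.02],
    [512.30, 850.0, 1.10],
    [785.84, 5400.0, 1.25],
])

mz, intensity, ook0 = peaks.T  # unpack the three columns

# Keep centroids inside a 1/K0 window, then sort them by intensity (highest first)
in_window = peaks[(ook0 >= 1.0) & (ook0 <= 1.2)]
by_intensity = in_window[np.argsort(in_window[:, 1])[::-1]]
```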
```bash
pip install tdfpy
```

Requires Python 3.12+. The Bruker libtimsdata native library is bundled in the Linux wheel.
```python
from tdfpy import DDA, DIA, PRM

# DDA acquisition
with DDA("sample.d") as dda:
    for frame in dda.ms1:
        peaks = frame.centroid()  # shape (N, 3): [m/z, intensity, 1/K0]

    for precursor in dda.precursors:
        print(precursor.largest_peak_mz, precursor.charge)
        peaks = precursor.peaks  # centroided MS2 via Bruker's algorithm

# DIA acquisition
with DIA("sample.d") as dia:
    for frame in dia.ms1:
        peaks = frame.centroid()

    for window in dia.windows:
        print(window.isolation_mz, window.isolation_width)
        peaks = window.centroid()

# PRM acquisition
with PRM("sample.d") as prm:
    for target in prm.targets:
        print(target.monoisotopic_mz, target.charge)

    for transition in prm.transitions:
        print(transition.isolation_mz, transition.collision_energy)
        peaks = transition.peaks  # shape (N, 2): [m/z, intensity]
```

Frames, precursors, and windows can be accessed by ID or queried by m/z and retention time:
```python
with DDA("sample.d") as dda:
    frame = dda.ms1[1]             # by frame ID
    precursor = dda.precursors[1]  # by precursor ID

    # Query by m/z and RT window
    hits = dda.precursors.query(
        mz=1292.63,
        mz_tolerance=20.0,  # ppm
        rt=2400.0,          # seconds
        rt_tolerance=30.0,
    )
```

`frame.centroid()` and `window.centroid()` accept parameters to control the peak merging:
```python
peaks = frame.centroid(
    mz_tolerance=8,                # ppm (default)
    mz_tolerance_type="ppm",       # or "da"
    im_tolerance=0.1,              # relative (default); fraction of the 1/K0 value
    im_tolerance_type="relative",  # or "absolute"
    min_peaks=3,                   # minimum raw peaks to form a centroid
    noise_filter=None,             # optional: "mad", "percentile", "histogram", etc.
    ion_mobility_type="ook0",      # or "ccs" / "voltage"
)
```

The `noise_filter` option estimates a noise threshold from the intensity distribution and discards centroids below it. In practice this can be too aggressive: intensity-based methods such as `"mad"` cannot distinguish low-abundance real signal from noise, so they remove both.
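To see why an intensity-only threshold cannot separate weak real signal from noise, here is a minimal MAD threshold in NumPy. The formula (median plus `k` times the scaled median absolute deviation) and the function name `mad_threshold` are assumptions for illustration, not tdfpy's implementation of `"mad"`, and the intensities are made up.

```python
import numpy as np


def mad_threshold(intensity: np.ndarray, k: float = 3.0) -> float:
    """Hypothetical MAD-based noise threshold: median + k * scaled MAD."""
    median = np.median(intensity)
    mad = np.median(np.abs(intensity - median))
    # 1.4826 scales the MAD to a standard-deviation estimate for Gaussian noise
    return float(median + k * 1.4826 * mad)


# Illustrative centroid intensities: noise around 100, weak real peaks at 150 and 160,
# strong real peaks at 5000 and 8000.
intensity = np.array([90.0, 95.0, 100.0, 105.0, 110.0, 150.0, 160.0, 5000.0, 8000.0])
threshold = mad_threshold(intensity)
kept = intensity[intensity >= threshold]
```

Here the threshold lands near 199, so the two weak peaks at 150 and 160 are discarded along with the noise: the intensity values alone carry no information about whether a peak is real.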
A more reliable approach is to increase `min_peaks` instead. A centroid forms only when at least `min_peaks` raw peaks fall within the m/z and ion mobility window. Because electronic noise typically appears as an isolated singleton in a single scan, raising `min_peaks` to 4 or 5 removes it without penalising low-abundance peaks that appear consistently across scans.
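The effect of `min_peaks` can be illustrated with a toy greedy centroider. This is a simplified sketch, not tdfpy's `merge_peaks` algorithm: the seed-on-most-intense-peak scheme, the intensity-weighted averaging, and the function name `toy_centroid` are all assumptions. It uses a ppm m/z window and a relative 1/K0 window like the defaults above, and the data are made up: one ion seen in five scans plus one intense noise singleton.

```python
import numpy as np


def toy_centroid(mz: np.ndarray, intensity: np.ndarray, im: np.ndarray,
                 mz_ppm: float = 8.0, im_rel: float = 0.1,
                 min_peaks: int = 3) -> np.ndarray:
    """Greedy toy centroiding (illustration only, not tdfpy.merge_peaks).

    Seeds on the most intense unused raw peak, gathers unused peaks within
    mz_ppm of its m/z and within im_rel (a fraction) of its 1/K0, and emits
    a centroid only if at least min_peaks raw peaks were gathered.
    """
    order = np.argsort(intensity)[::-1]  # most intense first
    used = np.zeros(len(mz), dtype=bool)
    centroids = []
    for seed in order:
        if used[seed]:
            continue
        mz_tol = mz[seed] * mz_ppm * 1e-6
        im_tol = im[seed] * im_rel
        members = np.flatnonzero(
            ~used
            & (np.abs(mz - mz[seed]) <= mz_tol)
            & (np.abs(im - im[seed]) <= im_tol)
        )
        used[members] = True
        if len(members) < min_peaks:
            continue  # too few raw peaks: treated as noise and dropped
        w = intensity[members]
        centroids.append([
            np.average(mz[members], weights=w),  # intensity-weighted m/z
            w.sum(),                             # summed intensity
            np.average(im[members], weights=w),  # intensity-weighted 1/K0
        ])
    return np.array(centroids).reshape(-1, 3)


# One real ion near m/z 600.30 seen in five scans, plus a noise singleton at 750.5
mz = np.array([600.3000, 600.3010, 600.2995, 600.3005, 600.2990, 750.5000])
intensity = np.array([200.0, 180.0, 150.0, 120.0, 100.0, 400.0])
im = np.array([0.900, 0.902, 0.898, 0.904, 0.896, 1.100])

result = toy_centroid(mz, intensity, im, min_peaks=5)
```

With `min_peaks=5` only the repeated ion survives, even though the noise singleton is more intense than any of its individual raw peaks; with `min_peaks=1` the singleton would be kept as a second centroid.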
```python
# Prefer this over noise_filter for removing noise
peaks = frame.centroid(min_peaks=5)

# Use noise_filter only when you have a calibrated threshold or a specific method
# that suits your data, and verify it against a no-filter baseline first.
peaks = frame.centroid(noise_filter="iterative_median", min_peaks=3)
```

You can also call `merge_peaks` directly on your own arrays:
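```python
import numpy as np
from tdfpy import merge_peaks

# mz_array, intensity_array, ion_mobility_array: your own equal-length 1-D NumPy arrays
peaks = merge_peaks(mz_array, intensity_array, ion_mobility_array, mz_tolerance=10)
```

Full documentation at tacular-omics.github.io/tdfpy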