tdfpy

A Python package for extracting data from Bruker timsTOF data files (.tdf and .tdf_bin). Includes a Numba-accelerated centroiding algorithm for efficient extraction of ion mobility data.


Overview

tdfpy provides a high-level Python API for reading Bruker timsTOF .d folders. It handles DDA, DIA, and PRM acquisition modes and exposes familiar Python objects — no need to think about raw PASEF frames or SQLite queries.

  • DDA — iterate MS1 frames and precursors (MS2 spectra)
  • DIA — iterate MS1 frames and DIA isolation windows
  • Centroiding — Numba-accelerated peak merging across the m/z and ion mobility dimensions, returning (N, 3) arrays of [m/z, intensity, 1/K0]
  • Lazy spectral access — frame metadata is loaded upfront; raw peak data is only read when you call .peaks or .centroid()

Installation

pip install tdfpy

Requires Python 3.12 or newer. The Bruker libtimsdata native library is bundled with the Linux wheel.

Quick Start

from tdfpy import DDA, DIA, PRM

# DDA acquisition
with DDA("sample.d") as dda:
    for frame in dda.ms1:
        peaks = frame.centroid()  # shape (N, 3): [m/z, intensity, 1/K0]

    for precursor in dda.precursors:
        print(precursor.largest_peak_mz, precursor.charge)
        peaks = precursor.peaks  # centroided MS2 via Bruker's algorithm

# DIA acquisition
with DIA("sample.d") as dia:
    for frame in dia.ms1:
        peaks = frame.centroid()

    for window in dia.windows:
        print(window.isolation_mz, window.isolation_width)
        peaks = window.centroid()

# PRM acquisition
with PRM("sample.d") as prm:
    for target in prm.targets:
        print(target.monoisotopic_mz, target.charge)

    for transition in prm.transitions:
        print(transition.isolation_mz, transition.collision_energy)
        peaks = transition.peaks  # shape (N, 2): [m/z, intensity]

Lookups and Queries

Frames, precursors, and windows can be accessed by ID or queried by m/z and retention time:

with DDA("sample.d") as dda:
    frame = dda.ms1[1]           # by frame ID
    precursor = dda.precursors[1]  # by precursor ID

    # query by m/z and RT window
    hits = dda.precursors.query(
        mz=1292.63,
        mz_tolerance=20.0,   # ppm
        rt=2400.0,           # seconds
        rt_tolerance=30.0,
    )

Centroiding Options

frame.centroid() and window.centroid() accept parameters to control the peak merging:

peaks = frame.centroid(
    mz_tolerance=8,               # ppm (default)
    mz_tolerance_type="ppm",      # or "da"
    im_tolerance=0.1,             # relative (default); fraction of the 1/K0 value
    im_tolerance_type="relative", # or "absolute"
    min_peaks=3,                  # minimum raw peaks to form a centroid
    noise_filter=None,            # optional: "mad", "percentile", "histogram", etc.
    ion_mobility_type="ook0",     # or "ccs" / "voltage"
)
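Note that a relative im_tolerance scales with the 1/K0 value itself, whereas an absolute tolerance is a fixed window everywhere. A quick illustration (plain arithmetic, not part of the tdfpy API):

```python
def im_window(ook0: float, rel_tol: float) -> float:
    """Half-width of a relative ion-mobility window at a given 1/K0."""
    return ook0 * rel_tol

# At 1/K0 = 1.2, a relative tolerance of 0.1 spans +/- 0.12,
# while im_tolerance_type="absolute" would mean a fixed +/- 0.1 everywhere.
print(f"{im_window(1.2, 0.1):.2f}")  # → 0.12
```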

Noise filtering vs min_peaks

The noise_filter option estimates a noise threshold from the intensity distribution and discards centroids below it. In practice this can be too aggressive: intensity-based methods like "mad" cannot distinguish low-abundance real signal from noise, and will remove both.

A more reliable approach is to increase min_peaks instead. A centroid only forms when at least min_peaks raw peaks fall within the m/z and ion mobility window. Because electronic noise typically appears as a single isolated peak in one scan, raising min_peaks to 4 or 5 removes it without penalising low-abundance peaks that appear consistently across scans.

# Prefer this over noise_filter for removing noise
peaks = frame.centroid(min_peaks=5)

# Use noise_filter only when you have a calibrated threshold or a specific method
# that suits your data, and verify it against a no-filter baseline first.
peaks = frame.centroid(noise_filter="iterative_median", min_peaks=3)

You can also call merge_peaks directly on your own arrays:

from tdfpy import merge_peaks
import numpy as np

peaks = merge_peaks(mz_array, intensity_array, ion_mobility_array, mz_tolerance=10)
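For intuition, here is a minimal pure-NumPy sketch of the merging idea, deliberately simplified to one dimension (m/z only, no ion mobility) and not tdfpy's Numba implementation: sort by m/z, greedily group neighbouring peaks within the ppm tolerance, drop groups smaller than min_peaks, and report each group's intensity-weighted mean m/z with summed intensity.

```python
import numpy as np

def merge_1d(mz, intensity, ppm_tol=10.0, min_peaks=1):
    """Greedy 1-D merge: a peak joins the previous group if it lies
    within ppm_tol of the preceding peak; otherwise it starts a new group."""
    if len(mz) == 0:
        return np.empty((0, 2))
    order = np.argsort(mz)
    mz, intensity = mz[order], intensity[order]
    groups = [[0]]
    for i in range(1, len(mz)):
        # Gap to the previous peak, expressed in ppm of the current m/z.
        if (mz[i] - mz[i - 1]) / mz[i] * 1e6 <= ppm_tol:
            groups[-1].append(i)
        else:
            groups.append([i])
    out = []
    for g in groups:
        if len(g) < min_peaks:
            continue  # drop small groups, analogous to min_peaks noise removal
        w = intensity[g]
        out.append((np.average(mz[g], weights=w), w.sum()))
    return np.array(out)

mz = np.array([500.000, 500.002, 500.004, 600.1])
inten = np.array([10.0, 20.0, 10.0, 5.0])
# The three peaks near m/z 500 merge into one centroid at the
# intensity-weighted mean; the lone peak at 600.1 is dropped by min_peaks=2.
print(merge_1d(mz, inten, ppm_tol=10.0, min_peaks=2))
```

The real merge_peaks additionally groups along the ion mobility axis and returns the (N, 3) layout described above.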

Documentation

Full documentation at tacular-omics.github.io/tdfpy
