Add MZMine metabolomics vignette and re-export MZMinetoMSstatsFormat #211
Merged: tonywu1999 merged 4 commits into devel from MSstats/work/20260617_metabolomics_vignette on Jun 24, 2026
Commits
0952095 Add MZMine metabolomics vignette and re-export converter (swaraj-neu)
3369708 Add reference for MSI levels citation (swaraj-neu)
c0d11a3 Address PR reviews for the metabolomics Vignette (swaraj-neu)
51b7657 Address PR review follow-ups in Metabolomics vignette (swaraj-neu)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,189 @@ | ||
| --- | ||
| title: "MSstats: Metabolomics workflow with MZMine" | ||
| date: June 17th, 2026 | ||
| --- | ||
|
|
||
|
|
||
| ```{r style, echo = FALSE, results = 'asis'} | ||
| BiocStyle::markdown() | ||
| ``` | ||
|
|
||
| ```{r global_options, include=FALSE} | ||
| knitr::opts_chunk$set(fig.width=10, fig.height=7, warning=FALSE, message=FALSE) | ||
| options(width=110) | ||
| ``` | ||
|
|
||
| ```{=html} | ||
| <!-- | ||
| %\VignetteIndexEntry{MSstats: Metabolomics workflow with MZMine} | ||
| %\VignetteEngine{knitr::knitr} | ||
| --> | ||
| ``` | ||
| # __MSstats: Metabolomics workflow with MZMine__ | ||
|
|
||
| Author: Swaraj Patil | ||
|
|
||
| Date: June 17th, 2026 | ||
|
|
||
| ## __Introduction__ | ||
|
|
||
| `MSstats` supports differential analysis of metabolomics data acquired with | ||
| untargeted LC-MS workflows. This vignette walks through an end-to-end run: import | ||
| MZMine feature quantifications and library annotations, layer in SIRIUS | ||
| structure identifications, convert to the MSstats format, summarize features | ||
| into compound-level abundance, and test for differences between conditions. | ||
|
|
||
| Compound identification combines two evidence sources: | ||
|
|
||
| * __MZMine compound names__ come from MS/MS spectral-library matching and | ||
| correspond to MSI Level 2 putative identifications (Sumner et al., 2007). | ||
| * __SIRIUS names__ come from in-silico structure prediction and correspond to | ||
| MSI Level 3 identifications. The SIRIUS pass extends discovery coverage to | ||
| features the MZMine spectral library does not cover. | ||
|
|
||
| `MZMinetoMSstatsFormat` is re-exported from `MSstatsConvert`, so attaching | ||
| `MSstats` alone is enough to run the full workflow. | ||
|
|
||
| ## __1. Setup__ | ||
|
|
||
| ```{r setup} | ||
| library(MSstats) | ||
| library(data.table) | ||
| ``` | ||
|
|
||
| ## __2. Load example data__ | ||
|
|
||
| Example MZMine input, sample annotation, MZMine library annotations, and | ||
| SIRIUS structure identifications ship with `MSstatsConvert` and are loaded | ||
| via `system.file()`. | ||
|
|
||
| ```{r load-data} | ||
| input_path = system.file("tinytest/raw_data/MZMine/mzmine_input.csv", | ||
| package = "MSstatsConvert") | ||
| annotation_path = system.file("tinytest/raw_data/MZMine/annotation.csv", | ||
| package = "MSstatsConvert") | ||
| mzmine_ann_path = system.file("tinytest/raw_data/MZMine/mzmine_annotations.csv", | ||
| package = "MSstatsConvert") | ||
| sirius_path = system.file("tinytest/raw_data/MZMine/structure_identifications.tsv", | ||
| package = "MSstatsConvert") | ||
|
|
||
| mzmine_input = data.table::fread(input_path) | ||
| annotation = data.table::fread(annotation_path) | ||
| mzmine_annotations = data.table::fread(mzmine_ann_path) | ||
| sirius_annotations = data.table::fread(sirius_path) | ||
|
|
||
| head(mzmine_input, 5) | ||
| head(annotation) | ||
| head(mzmine_annotations) | ||
| head(sirius_annotations) | ||
| ``` | ||
|
|
||
| The MZMine feature table is wide: one row per feature, columns `row ID`, | ||
| `row m/z`, `row retention time`, and per-sample `"<run> Peak area"` columns. | ||
| The annotation table maps each MS run to its `Condition` and `BioReplicate`. | ||
| `mzmine_annotations` is the spectral-library match table | ||
| (`id`, `compound_name`, `score`, `adduct`); features with multiple library | ||
| hits resolve to the highest-scoring compound. `sirius_annotations` is | ||
| SIRIUS's `structure_identifications.tsv`; its `mappingFeatureId` joins to | ||
| `row ID` in the MZMine input. | ||
|
|
||
| ## __3. Convert with `MZMinetoMSstatsFormat`__ | ||
|
|
||
| ```{r convert, message = FALSE} | ||
| mzmine_msstats = MZMinetoMSstatsFormat( | ||
| input = mzmine_input, | ||
| annotation = annotation, | ||
| mzmine_annotations = mzmine_annotations, | ||
| sirius_annotations = sirius_annotations, | ||
| use_log_file = FALSE | ||
| ) | ||
| head(mzmine_msstats) | ||
| ``` | ||
|
|
||
| `ProteinName` is assigned per feature in priority order: (1) the | ||
| highest-scoring MZMine compound name when present, (2) the SIRIUS name when | ||
| MZMine has no match, (3) an `m/z_RT` fallback identifier for features that | ||
| neither source identified. Every feature is retained, so discovery coverage | ||
| is preserved at the cost of a larger multiple-testing burden in Section 5. | ||
| Although the column is named `ProteinName` for compatibility with the rest of | ||
| MSstats, here it holds the compound (analyte) name; a future release will | ||
| expose it as `Analyte` for metabolomics data. | ||
|
|
||
| ## __4. Summarize with `dataProcess`__ | ||
|
|
||
| ```{r summarize, message = FALSE} | ||
| summarized = dataProcess( | ||
| mzmine_msstats, | ||
| logTrans = 2, | ||
| normalization = "equalizeMedians", | ||
| featureSubset = "all", | ||
| summaryMethod = "TMP", | ||
| censoredInt = "NA", | ||
| MBimpute = TRUE, | ||
| use_log_file = FALSE | ||
| ) | ||
| head(summarized$FeatureLevelData) | ||
| head(summarized$ProteinLevelData) | ||
| ``` | ||
|
|
||
| The settings above mirror a typical discovery proteomics workflow: log2 transformation, | ||
| median-equalized normalization, all features used, and Tukey median polish | ||
| summarization. Model-based imputation is enabled (`MBimpute = TRUE`), but no | ||
| values are imputed in this small example. Caffeine is detected as two adducts | ||
| (`[M+H]+` on feature 1, `[M+Na]+` on feature 6) and is summarized into a single | ||
| compound-level abundance per run. | ||
|
|
||
| ## __5. Test for differences with `groupComparison`__ | ||
|
|
||
| We construct the contrast matrix by hand to show how a comparison is specified. | ||
| The two-condition design admits a single Treatment-vs-Control contrast: | ||
|
|
||
| ```{r contrast, message = FALSE} | ||
| # A contrast matrix has one row per comparison and one column per condition. | ||
| # Columns must match the condition levels (alphabetical here: Control, Treatment). | ||
| # The -1 / +1 pair selects the two groups being compared. | ||
| contrast_matrix = matrix(c(-1, 1), nrow = 1) | ||
| colnames(contrast_matrix) = c("Control", "Treatment") | ||
| rownames(contrast_matrix) = "Treatment vs Control" | ||
| contrast_matrix | ||
|
|
||
| comparison = groupComparison(contrast.matrix = contrast_matrix, | ||
| data = summarized, use_log_file = FALSE) | ||
| comparison$ComparisonResult | ||
| ``` | ||
|
|
||
| Each row of `ComparisonResult` is one compound (or `m/z_RT` fallback) tested | ||
| against the contrast. Columns of interest: `log2FC`, `pvalue`, and | ||
| `adj.pvalue`. The `issue` column flags compounds that could not be tested | ||
| normally, for example one missing from an entire condition; in this small | ||
|
Contributor (review comment): issue is
||
| fixture the `issue` column is `NA` for every compound shown. | ||
|
|
||
| ## __6. Visualization__ | ||
|
|
||
| Profile plots show feature-level intensities alongside the compound-level | ||
| summary. Caffeine is identified at two adducts in this dataset and is | ||
| summarized into a single compound -- the profile plot makes that aggregation | ||
| visible. | ||
|
|
||
| ```{r profile, fig.width = 8, fig.height = 5} | ||
| dataProcessPlots(summarized, | ||
| type = "ProfilePlot", | ||
| which.Protein = "Caffeine", | ||
| address = FALSE) | ||
| ``` | ||
|
|
||
| For a study-wide view of fold-change versus significance, pass the | ||
| `ComparisonResult` table from `groupComparison` to `groupComparisonPlots`. On a | ||
| four-sample fixture the volcano plot is sparse. | ||
|
|
||
| ```{r volcano, eval = FALSE} | ||
| groupComparisonPlots(data = comparison$ComparisonResult, | ||
| type = "VolcanoPlot", | ||
| address = FALSE) | ||
| ``` | ||
|
|
||
| ## __References__ | ||
|
|
||
| Sumner LW, Amberg A, Barrett D, et al. (2007). Proposed minimum reporting standards for | ||
| chemical analysis: Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative | ||
| (MSI). *Metabolomics* 3(3): 211-221. doi: [10.1007/s11306-007-0082-2](https://doi.org/10.1007/s11306-007-0082-2) | ||
|
|
||
| ```{r session} | ||
| sessionInfo() | ||
| ``` | ||
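The `ProteinName` priority rule described in Section 3 of the vignette can be illustrated with a small base-R sketch. This is not the `MSstatsConvert` implementation: the toy table, its column names (`mzmine_name`, `sirius_name`), the compound labels, the m/z and retention-time values, and the exact format of the `m/z_RT` string are all assumptions made for illustration.

```r
# Hypothetical per-feature table; nothing here is read from MSstatsConvert.
features = data.frame(
  feature_id  = 1:4,
  mz          = c(195.0877, 180.0634, 301.1410, 217.0696),
  rt          = c(3.21, 1.05, 7.80, 3.22),
  mzmine_name = c("Caffeine", NA, NA, "Caffeine"),   # spectral-library hits
  sirius_name = c("Caffeine", "Compound B", NA, NA), # in-silico predictions
  stringsAsFactors = FALSE
)

# Assumed fallback identifier: m/z and retention time joined by an underscore.
fallback = sprintf("%.4f_%.2f", features$mz, features$rt)

# Priority: (1) MZMine name, (2) SIRIUS name, (3) m/z_RT fallback.
features$ProteinName = ifelse(
  !is.na(features$mzmine_name), features$mzmine_name,
  ifelse(!is.na(features$sirius_name), features$sirius_name, fallback)
)

print(features[, c("feature_id", "ProteinName")])
```

In this toy table, features 1 and 4 both resolve to the same name, which is the situation in which two adducts of one compound are later summarized together; feature 3 has no name from either source and keeps its fallback identifier, so it is still carried into testing.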
Review comment: Should also mention that `ProteinName` refers to the compound name (and that this will be changed in the future to use `Analyte` as that column instead).
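Until such a change lands, one way to keep reports readable is to relabel a copy of the table for presentation while leaving the `ProteinName` column untouched in the object passed to MSstats functions. The sketch below is only an illustration in base R: the toy table and its values are made up, and the `Analyte` label simply follows the name mentioned in the comment above rather than any existing MSstats behaviour.

```r
# Hypothetical MSstats-format rows; intensities are invented for illustration.
msstats_like = data.frame(
  ProteinName = c("Caffeine", "Caffeine", "301.1410_7.80"),
  Run         = c("run1", "run2", "run1"),
  Intensity   = c(1.5e6, 1.7e6, 2.0e5),
  stringsAsFactors = FALSE
)

# Relabel a copy for reporting only; the original keeps `ProteinName`,
# which is the column name the vignette's workflow relies on.
report = msstats_like
names(report)[names(report) == "ProteinName"] = "Analyte"

print(names(msstats_like))
print(names(report))
```

Renaming a copy rather than the original avoids breaking downstream calls that still look for `ProteinName`.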