Add MZMine metabolomics vignette and re-export MZMinetoMSstatsFormat by swaraj-neu · Pull Request #211 · Vitek-Lab/MSstats

swaraj-neu · 2026-06-18T02:49:39Z

Motivation and Context

Please include relevant motivation and context of the problem along with a short summary of the solution.

Changes

Please provide a detailed bullet point list of your changes.

Testing

Please describe any unit tests you added or modified to verify your changes.

Checklist Before Requesting a Review

I have read the MSstats contributing guidelines
My changes generate no new warnings
Any dependent changes have been merged and published in downstream modules
I have run the devtools::document() command after my changes and committed the added files

Motivation and Context

MSstats needs a clear, end-to-end example for untargeted metabolomics workflows starting from MZMine (feature quantification with spectral-library compound names) and SIRIUS (structure-identification outputs). This PR makes the MZMine → MSstats conversion entry point (MZMinetoMSstatsFormat) directly available from the MSstats namespace and adds a vignette that demonstrates the full workflow: conversion, compound-level summarization, differential abundance testing, and visualization. The vignette uses the fixtures shipped with MSstatsConvert and explains how ProteinName is assigned when MZMine matches are missing.

Detailed Changes

NAMESPACE
- Exported MZMinetoMSstatsFormat.
- Added importFrom(MSstatsConvert, MZMinetoMSstatsFormat) so MSstats users can call the function without separately attaching MSstatsConvert.
R/converters.R
- Added the `@importFrom`/`@export` wiring that re-exports MSstatsConvert::MZMinetoMSstatsFormat from the MSstats package.
man/reexports.Rd
- Updated reexports documentation to include MZMinetoMSstatsFormat in the MSstatsConvert conversion functions list (new \alias{MZMinetoMSstatsFormat} and corresponding documentation entry).
vignettes/MSstatsMetabolomics.Rmd
- Added a new vignette (vignettes/MSstatsMetabolomics.Rmd) covering:
  - Loading example MZMine feature quantifications + sample/condition annotations + spectral-library match tables + SIRIUS structure-identification mappings from MSstatsConvert fixtures.
  - Running conversion via MZMinetoMSstatsFormat(..., use_log_file = FALSE).
  - Documenting ProteinName assignment precedence:
    - best MZMine spectral-library compound name
    - if unmatched, SIRIUS name
    - if still unavailable, m/z_RT fallback identifier
  - Running dataProcess for compound-level summarization with:
    - logTrans = 2
    - normalization = "equalizeMedians"
    - featureSubset = "all"
    - summaryMethod = "TMP"
    - censoredInt = "NA"
    - MBimpute = TRUE
    - use_log_file = FALSE
  - Building a Control vs Treatment contrast matrix and running groupComparison(use_log_file = FALSE), with explanation of result interpretation via key columns such as log2FC, pvalue, adj.pvalue, and issue (compounds not testable normally).
  - Demonstrating plotting with dataProcessPlots (ProfilePlot) and groupComparisonPlots (VolcanoPlot, with eval = FALSE).
- Note: the vignette, as described in the change set, does not include a dedicated “Lactate caveat” subsection; it only reports that issue is NA in the shown fixture.
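
The ProteinName precedence listed above (MZMine library name, then SIRIUS name, then m/z_RT fallback) can be illustrated with a small base R sketch. This is not the MSstatsConvert implementation: the data frame, its column names (`library_name`, `sirius_name`) and its values are made up for illustration and are not taken from the fixtures.

```r
# Hypothetical per-feature table; columns and values are illustrative only.
features <- data.frame(
  mz           = c(150.5, 200.25, 300.75),
  rt           = c(1.5, 2.5, 3.5),
  library_name = c("Compound A", NA, NA),
  sirius_name  = c("Compound A", "Compound B", NA),
  stringsAsFactors = FALSE
)

# Last-resort identifier built from m/z and retention time.
fallback_id <- paste(features$mz, features$rt, sep = "_")

# Apply the precedence: library name, then SIRIUS name, then m/z_RT.
features$ProteinName <- ifelse(
  !is.na(features$library_name), features$library_name,
  ifelse(!is.na(features$sirius_name), features$sirius_name, fallback_id)
)

print(features[, c("mz", "rt", "ProteinName")])
```

In this toy table the first feature keeps its library name, the second falls back to the SIRIUS name, and the third receives the m/z_RT identifier.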

Unit Tests

No unit tests were added or modified. The primary validation described is the runnable vignette workflow using included fixtures.

Coding Guidelines

No coding guideline violations were identified from the provided change set.

coderabbitai · 2026-06-18T02:49:53Z

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between c0d11a3 and 51b7657.

📒 Files selected for processing (1)

vignettes/MSstatsMetabolomics.Rmd

📝 Walkthrough

Walkthrough

MZMinetoMSstatsFormat from MSstatsConvert is added as a re-exported symbol in MSstats via NAMESPACE, R/converters.R, and man/reexports.Rd. A new vignette MSstatsMetabolomics.Rmd documents an end-to-end metabolomics workflow using MZMine and SIRIUS outputs through conversion, summarization, differential testing, and visualization.

Changes

MZMinetoMSstatsFormat re-export and metabolomics vignette

| Layer / File(s) | Summary |
| --- | --- |
| Re-export wiring for MZMinetoMSstatsFormat: `R/converters.R`, `NAMESPACE`, `man/reexports.Rd` | Adds `@export`/`@importFrom` binding in `R/converters.R`, the corresponding `export()` and `importFrom()` directives in `NAMESPACE`, and a new alias plus item link in `man/reexports.Rd`. |
| Vignette introduction and data setup: `vignettes/MSstatsMetabolomics.Rmd` | Vignette header, knitr configuration, and introduction to the LC-MS metabolomics workflow; loads example fixture tables (MZMine feature quantifications, sample annotation, spectral-library matches, SIRIUS identifications) from `MSstatsConvert`. |
| MZMinetoMSstatsFormat conversion: `vignettes/MSstatsMetabolomics.Rmd` | Runs `MZMinetoMSstatsFormat` to convert loaded data into MSstats format, explaining `ProteinName` assignment precedence and feature retention behavior. |
| Data summarization with dataProcess: `vignettes/MSstatsMetabolomics.Rmd` | Applies `dataProcess` with log2 transformation, equalizeMedians normalization, TMP summarization, and MBimpute; demonstrates compound-level aggregation and fixture-driven imputation behavior. |
| Differential testing with groupComparison: `vignettes/MSstatsMetabolomics.Rmd` | Constructs a Control-vs-Treatment contrast and runs `groupComparison`, describing result column interpretation with all tested compounds showing `issue` as `NA` in the fixture. |
| Visualization examples and references: `vignettes/MSstatsMetabolomics.Rmd` | Demonstrates profile and volcano plots using `dataProcessPlots` and `groupComparisonPlots`; includes MSI reporting standards references and session information. |
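
For readers unfamiliar with re-exports, the wiring summarized for `R/converters.R` usually follows the standard roxygen2 idiom sketched below. This is an assumption about the shape of the change, not a copy of the PR diff, and it only has meaning inside a package source tree where MSstatsConvert is installed and listed in `Imports`.

```r
# Sketch of a roxygen2 re-export block (assumed form, not the actual diff).
# Running devtools::document() on a package containing this block writes
# export() and importFrom() directives to NAMESPACE and adds the function
# to man/reexports.Rd.

#' @importFrom MSstatsConvert MZMinetoMSstatsFormat
#' @export
MSstatsConvert::MZMinetoMSstatsFormat
```

With this in place, a user who has attached only MSstats can call `MZMinetoMSstatsFormat()` directly.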

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant MSstatsConvert
  participant MSstats

  User->>MSstatsConvert: system.file() — retrieve MZMine, annotation, library, SIRIUS CSVs
  User->>MSstatsConvert: MZMinetoMSstatsFormat(mzmine, annotation, library, sirius)
  MSstatsConvert-->>User: MSstats-format data frame
  User->>MSstats: dataProcess(converted, logTrans=2, normalization="equalizeMedians", MBimpute=TRUE)
  MSstats-->>User: FeatureLevelData + ProteinLevelData
  User->>MSstats: groupComparison(contrast.matrix, summarized)
  MSstats-->>User: ComparisonResult (log2FC, pvalue, adj.pvalue, issue)
  User->>MSstats: dataProcessPlots(type="ProfilePlot")
  User->>MSstats: groupComparisonPlots(type="VolcanoPlot"), in a chunk with eval=FALSE

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A MZMine path now hops into view,
Re-exported with care, fresh and new.
The vignette unfolds, step by step it goes,
From features to proteins, the workflow flows.
With caffeine and lactate, the rabbit takes note —
MSstats for metabolites, worthy of quote! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The pull request description contains only the template with all sections left blank or unchecked, lacking any motivation, detailed changes list, testing information, or checklist completion. | Fill in all required sections: explain the metabolomics workflow being documented, list the specific changes (NAMESPACE, converters.R, reexports.Rd, new vignette), describe testing performed, and check all pre-review checklist items. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main changes: adding a metabolomics vignette and re-exporting a converter function, which matches the file alterations across NAMESPACE, R/converters.R, documentation, and the new vignette. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

github-actions · 2026-06-18T02:50:34Z

Failed to generate code suggestions for PR

coderabbitai

🧹 Nitpick comments (1)

vignettes/MSstatsMetabolomics.Rmd (1)
39-39: 💤 Low value

Consider adding a reference for the MSI levels citation.

Line 39 references "Sumner et al., 2007" for the MSI (Metabolomics Standards Initiative) identification levels. While informal citations are acceptable in vignettes, adding a brief reference or URL would help readers locate the source document if they want to learn more about the classification system.
📚 Optional addition

You could add a references section at the end:
## __References__

Sumner, L.W., Amberg, A., Barrett, D., et al. (2007). Proposed minimum reporting standards for chemical analysis. _Metabolomics_, 3, 211-221. https://doi.org/10.1007/s11306-007-0082-2
Or simply add the DOI inline:
-  correspond to MSI Level 2 putative identifications (Sumner et al., 2007).
+  correspond to MSI Level 2 putative identifications (Sumner et al., 2007, https://doi.org/10.1007/s11306-007-0082-2).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vignettes/MSstatsMetabolomics.Rmd` at line 39, Add a formal reference for the
Sumner et al., 2007 citation mentioned in relation to MSI Level 2 identification
levels. Either create a References section at the end of the vignette document
with the complete citation details (authors, year, title, journal, volume,
pages, and DOI), or add the DOI inline with the existing citation at the
location where MSI levels are mentioned. This will help readers locate the
source document and understand the classification system better.

📥 Commits

Reviewing files that changed from the base of the PR and between 5b07ebb and 0952095.

📒 Files selected for processing (4)

NAMESPACE
R/converters.R
man/reexports.Rd
vignettes/MSstatsMetabolomics.Rmd

tonywu1999 · 2026-06-22T19:43:44Z

+```
+# __MSstats: Metabolomics workflow with MZMine__
+
+Author: MSstats Team


You can put your name here

tonywu1999 · 2026-06-22T19:44:33Z

+  correspond to MSI Level 2 putative identifications (Sumner et al., 2007).
+* __SIRIUS names__ come from in-silico structure prediction and correspond to
+  MSI Level 3 identifications. The SIRIUS pass extends discovery coverage to
+  features the spectral library does not cover.


"features the MZMine spectral library does not cover"

tonywu1999 · 2026-06-22T19:46:37Z

+head(mzmine_msstats)
+```
+
+`ProteinName` is assigned per feature in priority order: (1) the


Should also mention that ProteinName refers to the compound name (and that this will be changed in the future to use Analyte as that column instead)

tonywu1999 · 2026-06-22T19:47:42Z

+### Lactate caveat
+
+Lactate (feature 3) is missing one of its four measurements in this fixture, so its
+differential result is unreliable. That value is dropped rather than estimated, so Lactate is
+tested on three points and its degrees of freedom fall to 1, against 2 for the fully measured
+compounds. With so little data the variance estimate is unstable, which is why Lactate shows a
+very small standard error, a large t-statistic, and the only small p-value in the table. Treat
+it as an artifact of the tiny example, not a real difference.


This should appear in the end after differential abundance analysis is performed (i.e. after groupComparison)

tonywu1999 · 2026-06-22T19:48:25Z

+head(summarized$ProteinLevelData)
+```
+
+The settings above mirror the `MSstatsWorkflow` vignette: log-2 transform,


I'd say they mirror a typical discovery proteomics workflow

tonywu1999 · 2026-06-22T19:49:07Z

+
+## __5. Test for differences with `groupComparison`__
+
+With two conditions in the design, a single Control-vs-Treatment contrast


I would construct the contrast matrix here rather than using "pairwise" just to show people how it's done.

tonywu1999 · 2026-06-22T19:49:45Z

+Each row of `ComparisonResult` is one compound (or `m/z_RT` fallback) tested
+against the contrast. Columns of interest: `log2FC`, `pvalue`, and
+`adj.pvalue`. The `issue` column flags compounds that could not be tested
+normally, for example one missing from an entire condition; in this small


You mean `issue` is NA for every compound shown?

tonywu1999 · 2026-06-22T19:50:06Z

+
+## __6. Visualization__
+
+Profile plots show feature-level intensities alongside the protein-level


compound-level summary, not protein-level summary.

tonywu1999 · 2026-06-22T19:50:52Z

+
+For a study-wide view of fold-change versus significance, pass the
+`groupComparison` result to `groupComparisonPlots`. On a four-sample fixture
+the volcano is sparse; on a real metabolomics dataset it is the standard


"on a real metabolomics dataset it is the standard summary plot": this can be removed.

tonywu1999 · 2026-06-23T12:37:15Z

+contrast_matrix = matrix(c(1, -1), nrow = 1)
+colnames(contrast_matrix) = c("Control", "Treatment")
+rownames(contrast_matrix) = "Control vs Treatment"


I'd use -1 for Control and 1 for Treatment. Then the comparison name is Treatment vs Control.
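
A minimal sketch of what this suggestion looks like, using base R only. It assumes the condition labels in the converted data are exactly `Control` and `Treatment`, since the column names of the contrast matrix are expected to match the condition names.

```r
# One row per comparison, one column per condition.
# Treatment gets +1 and Control gets -1, so a positive log2FC
# would indicate higher abundance in Treatment.
contrast_matrix <- matrix(c(-1, 1), nrow = 1)
colnames(contrast_matrix) <- c("Control", "Treatment")
rownames(contrast_matrix) <- "Treatment vs Control"

# The weights in each contrast row should sum to zero.
stopifnot(sum(contrast_matrix) == 0)
print(contrast_matrix)
```

Naming the row after the direction of the weights keeps the comparison label consistent with the sign of the reported fold change.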

tonywu1999 · 2026-06-23T12:44:18Z

+### Lactate caveat
+
+Lactate (feature 3) is missing one of its four measurements in this fixture, so its
+differential result is unreliable. That value is dropped rather than estimated, so Lactate is
+tested on three points and its degrees of freedom fall to 1, against 2 for the fully measured
+compounds. With so little data the variance estimate is unstable, which is why Lactate shows a
+very small standard error, a large t-statistic, and the only small p-value in the table. Treat
+it as an artifact of the tiny example, not a real difference.


Thinking about it, let's get rid of this for now, people will run their own datasets anyways so this caveat may confuse them more.

Add MZMine metabolomics vignette and re-export converter

0952095

swaraj-neu requested a review from tonywu1999 June 18, 2026 02:49

swaraj-neu self-assigned this Jun 18, 2026

coderabbitai Bot reviewed Jun 18, 2026

Add reference for MSI levels citation

3369708

tonywu1999 reviewed Jun 22, 2026

Address PR reviews for the metabolomics Vignette

c0d11a3

tonywu1999 reviewed Jun 23, 2026

Address PR review follow-ups in Metabolomics vignette

51b7657

tonywu1999 merged commit 87f788e into devel Jun 24, 2026
2 checks passed

tonywu1999 deleted the MSstats/work/20260617_metabolomics_vignette branch June 24, 2026 12:36

