Add MZMine metabolomics vignette and re-export MZMinetoMSstatsFormat #211

Merged
tonywu1999 merged 4 commits into devel from MSstats/work/20260617_metabolomics_vignette
Jun 24, 2026

swaraj-neu commented Jun 18, 2026

Contributor

Motivation and Context

Please include relevant motivation and context of the problem along with a short summary of the solution.

Changes

Please provide a detailed bullet point list of your changes.

Testing

Please describe any unit tests you added or modified to verify your changes.

Checklist Before Requesting a Review

  • I have read the MSstats contributing guidelines
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • I have run the devtools::document() command after my changes and committed the added files

Motivation and Context

MSstats needs a clear, end-to-end example for untargeted metabolomics workflows starting from MZMine (feature quantification with spectral-library compound names) and SIRIUS (structure-identification outputs). This PR makes the MZMine → MSstats conversion entry point (MZMinetoMSstatsFormat) directly available from the MSstats namespace and adds a vignette that demonstrates the full workflow: conversion, compound-level summarization, differential abundance testing, and visualization. The vignette runs on the fixtures provided by MSstatsConvert and explains how ProteinName is assigned when MZMine matches are missing.

Detailed Changes

  • NAMESPACE

    • Exported MZMinetoMSstatsFormat.
    • Added importFrom(MSstatsConvert, MZMinetoMSstatsFormat) so MSstats users can call the function without separately attaching MSstatsConvert.
  • R/converters.R

    • Added the import/re-export wiring (@importFrom and @export) so that MSstatsConvert::MZMinetoMSstatsFormat is re-exported from the MSstats package.
  • man/reexports.Rd

    • Updated reexports documentation to include MZMinetoMSstatsFormat in the MSstatsConvert conversion functions list (new \alias{MZMinetoMSstatsFormat} and corresponding documentation entry).
  • vignettes/MSstatsMetabolomics.Rmd

    • Added a new vignette (vignettes/MSstatsMetabolomics.Rmd) covering:
      • Loading example MZMine feature quantifications + sample/condition annotations + spectral-library match tables + SIRIUS structure-identification mappings from MSstatsConvert fixtures.
      • Running conversion via MZMinetoMSstatsFormat(..., use_log_file = FALSE).
      • Documenting ProteinName assignment precedence:
        • best MZMine spectral-library compound name
        • if unmatched, SIRIUS name
        • if still unavailable, m/z_RT fallback identifier
      • Running dataProcess for compound-level summarization with:
        • logTrans = 2
        • normalization = "equalizeMedians"
        • featureSubset = "all"
        • summaryMethod = "TMP"
        • censoredInt = "NA"
        • MBimpute = TRUE
        • use_log_file = FALSE
      • Building a Control vs Treatment contrast matrix and running groupComparison(use_log_file = FALSE), then explaining how to read the key result columns: log2FC, pvalue, adj.pvalue, and issue (which flags compounds that could not be tested normally).
      • Demonstrating plotting with dataProcessPlots (ProfilePlot) and groupComparisonPlots (VolcanoPlot, shown with eval = FALSE).
    • Note: the vignette, as described in the change set, does not include a dedicated “Lactate caveat” subsection; it only reports that issue is NA in the shown fixture.
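
The ProteinName precedence listed above (best MZMine spectral-library name, then SIRIUS name, then an m/z_RT fallback) can be sketched in base R. This only illustrates the rule and is not the code inside MZMinetoMSstatsFormat: the table, its column names (`library_name`, `sirius_name`, `mz`, `rt`), the helper `assign_compound_name`, the values, and the exact format of the fallback identifier are all assumptions made for the example.

```r
# Hypothetical per-feature table; column names and values are illustrative only.
features <- data.frame(
  mz = c(195.0877, 89.0244, 301.1412),
  rt = c(5.12, 1.03, 7.80),
  library_name = c("Caffeine", NA, NA),      # spectral-library match, if any
  sirius_name = c("Caffeine", "Lactate", NA), # SIRIUS identification, if any
  stringsAsFactors = FALSE
)

# Hypothetical helper: pick the first available name in priority order.
assign_compound_name <- function(library_name, sirius_name, mz, rt) {
  # Assumed fallback format: m/z and retention time joined by an underscore.
  fallback <- sprintf("%.4f_%.2f", mz, rt)
  ifelse(
    !is.na(library_name), library_name,
    ifelse(!is.na(sirius_name), sirius_name, fallback)
  )
}

features$ProteinName <- assign_compound_name(
  features$library_name, features$sirius_name, features$mz, features$rt
)

print(features[, c("mz", "rt", "ProteinName")])
```

In this toy table the first feature keeps its library name, the second falls back to the SIRIUS name, and the third receives the m/z_RT identifier because neither source names it.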

Unit Tests

  • No unit tests were added or modified. The primary validation described is the runnable vignette workflow using included fixtures.

Coding Guidelines

  • No coding guideline violations were identified from the provided change set.

swaraj-neu requested a review from tonywu1999 June 18, 2026 02:49
swaraj-neu self-assigned this Jun 18, 2026

coderabbitai Bot commented Jun 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8b83cff4-e3b7-4686-b20c-3c2be3490b79

📥 Commits

Reviewing files that changed from the base of the PR and between c0d11a3 and 51b7657.

📒 Files selected for processing (1)
  • vignettes/MSstatsMetabolomics.Rmd

Walkthrough

MZMinetoMSstatsFormat from MSstatsConvert is added as a re-exported symbol in MSstats via NAMESPACE, R/converters.R, and man/reexports.Rd. A new vignette MSstatsMetabolomics.Rmd documents an end-to-end metabolomics workflow using MZMine and SIRIUS outputs through conversion, summarization, differential testing, and visualization.

Changes

MZMinetoMSstatsFormat re-export and metabolomics vignette

| Layer | File(s) | Summary |
|---|---|---|
| Re-export wiring for MZMinetoMSstatsFormat | R/converters.R, NAMESPACE, man/reexports.Rd | Adds @export/@importFrom binding in R/converters.R, the corresponding export() and importFrom() directives in NAMESPACE, and a new alias plus item link in man/reexports.Rd. |
| Vignette introduction and data setup | vignettes/MSstatsMetabolomics.Rmd | Vignette header, knitr configuration, and introduction to the LC-MS metabolomics workflow; loads example fixture tables (MZMine feature quantifications, sample annotation, spectral-library matches, SIRIUS identifications) from MSstatsConvert. |
| MZMinetoMSstatsFormat conversion | vignettes/MSstatsMetabolomics.Rmd | Runs MZMinetoMSstatsFormat to convert loaded data into MSstats format, explaining ProteinName assignment precedence and feature retention behavior. |
| Data summarization with dataProcess | vignettes/MSstatsMetabolomics.Rmd | Applies dataProcess with log2 transformation, equalizeMedians normalization, TMP summarization, and MBimpute; demonstrates compound-level aggregation and fixture-driven imputation behavior. |
| Differential testing with groupComparison | vignettes/MSstatsMetabolomics.Rmd | Constructs a Control-vs-Treatment contrast and runs groupComparison, describing result column interpretation with all tested compounds showing issue as NA in the fixture. |
| Visualization examples and references | vignettes/MSstatsMetabolomics.Rmd | Demonstrates profile and volcano plots using dataProcessPlots and groupComparisonPlots; includes MSI reporting standards references and session information. |
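
As a small sketch of the contrast step summarized in the walkthrough above, a two-condition contrast matrix can be built by hand in base R. It uses the sign convention requested later in the review (-1 for Control, +1 for Treatment), so a positive value means higher abundance in Treatment. The group means are made-up numbers, and the assumption that the column names must match the condition labels in the sample annotation should be checked against the groupComparison documentation.

```r
# Condition labels; assumed to match the condition names in the annotation.
conditions <- c("Control", "Treatment")

# One row per comparison, one column per condition; coefficients sum to zero.
contrast_matrix <- matrix(
  c(-1, 1),
  nrow = 1,
  dimnames = list("Treatment vs Control", conditions)
)

# Hypothetical log2 group means, used only to show what the contrast computes.
group_means <- c(Control = 10, Treatment = 12)

# The contrast is the Treatment mean minus the Control mean.
estimated_log2fc <- drop(contrast_matrix %*% group_means[conditions])

print(contrast_matrix)
print(estimated_log2fc)
```

Naming the row "Treatment vs Control" keeps the comparison label consistent with the sign of the coefficients, which is the point of the reviewer's request.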

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant MSstatsConvert
  participant MSstats

  User->>MSstatsConvert: system.file() — retrieve MZMine, annotation, library, SIRIUS CSVs
  User->>MSstatsConvert: MZMinetoMSstatsFormat(mzmine, annotation, library, sirius)
  MSstatsConvert-->>User: MSstats-format data frame
  User->>MSstats: dataProcess(converted, logTrans=2, normalization="equalizeMedians", MBimpute=TRUE)
  MSstats-->>User: FeatureLevelData + ProteinLevelData
  User->>MSstats: groupComparison(contrast.matrix, summarized)
  MSstats-->>User: ComparisonResult (log2FC, pvalue, adj.pvalue, issue)
  User->>MSstats: dataProcessPlots(type="ProfilePlot")
  User->>MSstats: groupComparisonPlots(type="VolcanoPlot", eval=FALSE)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A MZMine path now hops into view,
Re-exported with care, fresh and new.
The vignette unfolds, step by step it goes,
From features to proteins, the workflow flows.
With caffeine and lactate, the rabbit takes note —
MSstats for metabolites, worthy of quote! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The pull request description contains only the template with all sections left blank or unchecked, lacking any motivation, detailed changes list, testing information, or checklist completion. | Fill in all required sections: explain the metabolomics workflow being documented, list the specific changes (NAMESPACE, converters.R, reexports.Rd, new vignette), describe testing performed, and check all pre-review checklist items. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main changes: adding a metabolomics vignette and re-exporting a converter function, which matches the file alterations across NAMESPACE, R/converters.R, documentation, and the new vignette. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@github-actions

Failed to generate code suggestions for PR

coderabbitai Bot left a comment

🧹 Nitpick comments (1)
vignettes/MSstatsMetabolomics.Rmd (1)

39-39: 💤 Low value

Consider adding a reference for the MSI levels citation.

Line 39 references "Sumner et al., 2007" for the MSI (Metabolomics Standards Initiative) identification levels. While informal citations are acceptable in vignettes, adding a brief reference or URL would help readers locate the source document if they want to learn more about the classification system.

📚 Optional addition

You could add a references section at the end:

## __References__

Sumner, L.W., Amberg, A., Barrett, D., et al. (2007). Proposed minimum reporting standards for chemical analysis. _Metabolomics_, 3, 211-221. https://doi.org/10.1007/s11306-007-0082-2

Or simply add the DOI inline:

-  correspond to MSI Level 2 putative identifications (Sumner et al., 2007).
+  correspond to MSI Level 2 putative identifications (Sumner et al., 2007, https://doi.org/10.1007/s11306-007-0082-2).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vignettes/MSstatsMetabolomics.Rmd` at line 39, Add a formal reference for the
Sumner et al., 2007 citation mentioned in relation to MSI Level 2 identification
levels. Either create a References section at the end of the vignette document
with the complete citation details (authors, year, title, journal, volume,
pages, and DOI), or add the DOI inline with the existing citation at the
location where MSI levels are mentioned. This will help readers locate the
source document and understand the classification system better.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 216978bb-d16b-4477-bf79-31e524d04a99

📥 Commits

Reviewing files that changed from the base of the PR and between 5b07ebb and 0952095.

📒 Files selected for processing (4)
  • NAMESPACE
  • R/converters.R
  • man/reexports.Rd
  • vignettes/MSstatsMetabolomics.Rmd

Comment thread vignettes/MSstatsMetabolomics.Rmd Outdated
```
# __MSstats: Metabolomics workflow with MZMine__

Author: MSstats Team

Contributor comment:

You can put your name here

Comment thread vignettes/MSstatsMetabolomics.Rmd Outdated
correspond to MSI Level 2 putative identifications (Sumner et al., 2007).
* __SIRIUS names__ come from in-silico structure prediction and correspond to
MSI Level 3 identifications. The SIRIUS pass extends discovery coverage to
features the spectral library does not cover.

Contributor comment:

"features the MZMine spectral library does not cover"

head(mzmine_msstats)
```

`ProteinName` is assigned per feature in priority order: (1) the

Contributor comment:

Should also mention that ProteinName refers to the compound name (and that this will be changed in the future to use Analyte as that column instead)

Comment thread vignettes/MSstatsMetabolomics.Rmd Outdated
Comment on lines +109 to +116
### Lactate caveat

Lactate (feature 3) is missing one of its four measurements in this fixture, so its
differential result is unreliable. That value is dropped rather than estimated, so Lactate is
tested on three points and its degrees of freedom fall to 1, against 2 for the fully measured
compounds. With so little data the variance estimate is unstable, which is why Lactate shows a
very small standard error, a large t-statistic, and the only small p-value in the table. Treat
it as an artifact of the tiny example, not a real difference.

Contributor comment:

This should appear in the end after differential abundance analysis is performed (i.e. after groupComparison)

Comment thread vignettes/MSstatsMetabolomics.Rmd Outdated
head(summarized$ProteinLevelData)
```

The settings above mirror the `MSstatsWorkflow` vignette: log-2 transform,

Contributor comment:

I'd say they mirror a typical discovery proteomics workflow

Comment thread vignettes/MSstatsMetabolomics.Rmd Outdated

## __5. Test for differences with `groupComparison`__

With two conditions in the design, a single Control-vs-Treatment contrast

Contributor comment:

I would construct the contrast matrix here rather than using "pairwise" just to show people how it's done.

Each row of `ComparisonResult` is one compound (or `m/z_RT` fallback) tested
against the contrast. Columns of interest: `log2FC`, `pvalue`, and
`adj.pvalue`. The `issue` column flags compounds that could not be tested
normally, for example one missing from an entire condition; in this small

Contributor comment:

issue is NA for every compound shown you mean?

Comment thread vignettes/MSstatsMetabolomics.Rmd Outdated

## __6. Visualization__

Profile plots show feature-level intensities alongside the protein-level

Contributor comment:

compound-level summary, not protein-level summary.

Comment thread vignettes/MSstatsMetabolomics.Rmd Outdated

For a study-wide view of fold-change versus significance, pass the
`groupComparison` result to `groupComparisonPlots`. On a four-sample fixture
the volcano is sparse; on a real metabolomics dataset it is the standard

Contributor comment:

on a real metabolomics dataset it is the standard summary plot - this can be removed.

Comment thread vignettes/MSstatsMetabolomics.Rmd Outdated
Comment on lines +142 to +144
contrast_matrix = matrix(c(1, -1), nrow = 1)
colnames(contrast_matrix) = c("Control", "Treatment")
rownames(contrast_matrix) = "Control vs Treatment"

Contributor comment:

I'd make -1 control and 1 for treatment. Then the comparison name is treatment vs control

Comment thread vignettes/MSstatsMetabolomics.Rmd Outdated
Comment on lines +158 to +165
### Lactate caveat

Lactate (feature 3) is missing one of its four measurements in this fixture, so its
differential result is unreliable. That value is dropped rather than estimated, so Lactate is
tested on three points and its degrees of freedom fall to 1, against 2 for the fully measured
compounds. With so little data the variance estimate is unstable, which is why Lactate shows a
very small standard error, a large t-statistic, and the only small p-value in the table. Treat
it as an artifact of the tiny example, not a real difference.

Contributor comment:

Thinking about it, let's get rid of this for now, people will run their own datasets anyways so this caveat may confuse them more.

tonywu1999 merged commit 87f788e into devel Jun 24, 2026
2 checks passed
tonywu1999 deleted the MSstats/work/20260617_metabolomics_vignette branch June 24, 2026 12:36