Add MZMine metabolomics vignette and re-export MZMinetoMSstatsFormat #211
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough
Changes
MZMinetoMSstatsFormat re-export and metabolomics vignette
Sequence Diagram(s)
sequenceDiagram
participant User
participant MSstatsConvert
participant MSstats
User->>MSstatsConvert: system.file() — retrieve MZMine, annotation, library, SIRIUS CSVs
User->>MSstatsConvert: MZMinetoMSstatsFormat(mzmine, annotation, library, sirius)
MSstatsConvert-->>User: MSstats-format data frame
User->>MSstats: dataProcess(converted, logTrans=2, normalization="equalizeMedians", MBimpute=TRUE)
MSstats-->>User: FeatureLevelData + ProteinLevelData
User->>MSstats: groupComparison(contrast.matrix, summarized)
MSstats-->>User: ComparisonResult (log2FC, pvalue, adj.pvalue, issue)
User->>MSstats: dataProcessPlots(type="ProfilePlot")
User->>MSstats: groupComparisonPlots(type="VolcanoPlot", eval=FALSE)
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Failed to generate code suggestions for PR
🧹 Nitpick comments (1)
vignettes/MSstatsMetabolomics.Rmd (1)
39-39: 💤 Low value. Consider adding a reference for the MSI levels citation.
Line 39 references "Sumner et al., 2007" for the MSI (Metabolomics Standards Initiative) identification levels. While informal citations are acceptable in vignettes, adding a brief reference or URL would help readers locate the source document if they want to learn more about the classification system.
📚 Optional addition
You could add a references section at the end:
    ## __References__
    Sumner, L.W., Amberg, A., Barrett, D., et al. (2007). Proposed minimum reporting standards for chemical analysis. _Metabolomics_, 3, 211-221. https://doi.org/10.1007/s11306-007-0082-2
Or simply add the DOI inline:
    - correspond to MSI Level 2 putative identifications (Sumner et al., 2007).
    + correspond to MSI Level 2 putative identifications (Sumner et al., 2007, https://doi.org/10.1007/s11306-007-0082-2).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@vignettes/MSstatsMetabolomics.Rmd` at line 39, add a formal reference for the Sumner et al., 2007 citation mentioned in relation to MSI Level 2 identification levels. Either create a References section at the end of the vignette document with the complete citation details (authors, year, title, journal, volume, pages, and DOI), or add the DOI inline with the existing citation at the location where MSI levels are mentioned. This will help readers locate the source document and understand the classification system better.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 216978bb-d16b-4477-bf79-31e524d04a99
📒 Files selected for processing (4)
NAMESPACE
R/converters.R
man/reexports.Rd
vignettes/MSstatsMetabolomics.Rmd
| ```
| # __MSstats: Metabolomics workflow with MZMine__
|
| Author: MSstats Team
You can put your name here
| correspond to MSI Level 2 putative identifications (Sumner et al., 2007).
| * __SIRIUS names__ come from in-silico structure prediction and correspond to
| MSI Level 3 identifications. The SIRIUS pass extends discovery coverage to
| features the spectral library does not cover.
"features the MZMine spectral library does not cover"
| head(mzmine_msstats)
| ```
|
| `ProteinName` is assigned per feature in priority order: (1) the
Should also mention that ProteinName refers to the compound name (and that this will be changed in the future to use Analyte as that column instead)
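To make the naming rule under discussion concrete, here is a minimal base-R sketch of a priority-order assignment. It is an illustration only, not the `MZMinetoMSstatsFormat` implementation: the column names `library_name`, `sirius_name`, `mz`, and `rt`, the feature values, the ordering (spectral-library name, then SIRIUS name, then `m/z_RT`), and the exact format of the fallback string are all assumptions made for this example.

```r
# Hypothetical feature table; all values and column names are illustrative.
features <- data.frame(
  mz = c(89.0244, 132.0302, 175.1190),
  rt = c(1.25, 2.10, 3.40),
  library_name = c("Lactate", NA, NA),   # assumed spectral-library match
  sirius_name = c(NA, "Aspartate", NA),  # assumed SIRIUS prediction
  stringsAsFactors = FALSE
)

# Assumed fallback identifier built from m/z and retention time.
fallback <- paste(features$mz, features$rt, sep = "_")

# Take the first available name in the assumed priority order.
features$ProteinName <- ifelse(
  !is.na(features$library_name), features$library_name,
  ifelse(!is.na(features$sirius_name), features$sirius_name, fallback)
)

print(features[, c("mz", "rt", "ProteinName")])
```

In this sketch `ProteinName` holds a compound name or a fallback identifier, never a protein, which is the point the vignette text should spell out.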
| ### Lactate caveat
|
| Lactate (feature 3) is missing one of its four measurements in this fixture, so its
| differential result is unreliable. That value is dropped rather than estimated, so Lactate is
| tested on three points and its degrees of freedom fall to 1, against 2 for the fully measured
| compounds. With so little data the variance estimate is unstable, which is why Lactate shows a
| very small standard error, a large t-statistic, and the only small p-value in the table. Treat
| it as an artifact of the tiny example, not a real difference.
This should appear in the end after differential abundance analysis is performed (i.e. after groupComparison)
| head(summarized$ProteinLevelData)
| ```
|
| The settings above mirror the `MSstatsWorkflow` vignette: log-2 transform,
I'd say they mirror a typical discovery proteomics workflow
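For readers unfamiliar with the settings referred to here, the following base-R sketch illustrates the idea behind a log-2 transform followed by median equalization across runs. It is a simplified illustration under assumed data, not the code MSstats runs for `normalization = "equalizeMedians"`; the intensity values and the feature and run names are made up.

```r
# Made-up raw intensities: three features measured in two runs.
intensity <- matrix(
  c(1000, 2000, 4000,
    2000, 4000, 8000),
  nrow = 3,
  dimnames = list(c("f1", "f2", "f3"), c("run1", "run2"))
)

# Log-2 transform (the role played by logTrans = 2).
log_int <- log2(intensity)

# Shift each run so that every run median equals a common target,
# here the median of the per-run medians.
run_medians <- apply(log_int, 2, median)
target <- median(run_medians)
normalized <- sweep(log_int, 2, run_medians - target)

print(apply(normalized, 2, median))
```

After the shift both runs share the same median, while the differences between features within a run are unchanged.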
|
| ## __5. Test for differences with `groupComparison`__
|
| With two conditions in the design, a single Control-vs-Treatment contrast
I would construct the contrast matrix here rather than using "pairwise" just to show people how it's done.
| Each row of `ComparisonResult` is one compound (or `m/z_RT` fallback) tested
| against the contrast. Columns of interest: `log2FC`, `pvalue`, and
| `adj.pvalue`. The `issue` column flags compounds that could not be tested
| normally, for example one missing from an entire condition; in this small
issue is NA for every compound shown you mean?
|
| ## __6. Visualization__
|
| Profile plots show feature-level intensities alongside the protein-level
compound-level summary, not protein-level summary.
|
| For a study-wide view of fold-change versus significance, pass the
| `groupComparison` result to `groupComparisonPlots`. On a four-sample fixture
| the volcano is sparse; on a real metabolomics dataset it is the standard
on a real metabolomics dataset it is the standard summary plot - this can be removed.
| contrast_matrix = matrix(c(1, -1), nrow = 1)
| colnames(contrast_matrix) = c("Control", "Treatment")
| rownames(contrast_matrix) = "Control vs Treatment"
I'd make -1 control and 1 for treatment. Then the comparison name is treatment vs control
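A minimal sketch of the suggested orientation, assuming the condition labels are exactly `Control` and `Treatment` as in the hunk under review. The `group_means` values are made up purely to show the sign convention and are not taken from the fixture.

```r
# Treatment-vs-Control contrast: Control = -1, Treatment = 1,
# so a positive estimate means higher abundance in Treatment.
contrast_matrix <- matrix(c(-1, 1), nrow = 1)
colnames(contrast_matrix) <- c("Control", "Treatment")
rownames(contrast_matrix) <- "Treatment vs Control"

# Hypothetical log2-scale group means, for illustration only.
group_means <- c(Control = 10, Treatment = 12)

# Applying the contrast gives Treatment minus Control.
estimate <- drop(contrast_matrix %*% group_means[colnames(contrast_matrix)])
print(estimate)
```

With this orientation the row name and the sign of the reported fold change agree, which makes the `groupComparison` output easier to read.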
| ### Lactate caveat
|
| Lactate (feature 3) is missing one of its four measurements in this fixture, so its
| differential result is unreliable. That value is dropped rather than estimated, so Lactate is
| tested on three points and its degrees of freedom fall to 1, against 2 for the fully measured
| compounds. With so little data the variance estimate is unstable, which is why Lactate shows a
| very small standard error, a large t-statistic, and the only small p-value in the table. Treat
| it as an artifact of the tiny example, not a real difference.
Thinking about it, let's get rid of this for now, people will run their own datasets anyways so this caveat may confuse them more.
Motivation and Context
MSstats needs a clear, end-to-end example for untargeted metabolomics workflows starting from MZMine (feature quantification with spectral-library compound names) and SIRIUS (structure-identification outputs). This PR makes the MZMine → MSstats conversion entry point (`MZMinetoMSstatsFormat`) directly available from the MSstats namespace and adds a vignette that demonstrates the full workflow: conversion, compound-level summarization, differential abundance testing, and visualization. The vignette uses the provided MSstatsConvert fixtures and explains how `ProteinName` is assigned when MZMine matches are missing.
Detailed Changes
- `NAMESPACE`
  - Exports `MZMinetoMSstatsFormat`.
  - Adds `importFrom(MSstatsConvert, MZMinetoMSstatsFormat)` so MSstats users can call the function without separately attaching `MSstatsConvert`.
- `R/converters.R`
  - `MZMinetoMSstatsFormat` is available under the `MSstatsConvert::MZMinetoMSstatsFormat` compatibility layer from within the MSstats package.
- `man/reexports.Rd`
  - Lists `MZMinetoMSstatsFormat` in the `MSstatsConvert` conversion functions list (new `\alias{MZMinetoMSstatsFormat}` and corresponding documentation entry).
- `vignettes/MSstatsMetabolomics.Rmd`
  - New vignette (`vignettes/MSstatsMetabolomics.Rmd`) covering:
    - Conversion with `MZMinetoMSstatsFormat(..., use_log_file = FALSE)`.
    - `ProteinName` assignment precedence, ending with the `m/z_RT` fallback identifier.
    - `dataProcess` for compound-level summarization with:
      - `logTrans = 2`
      - `normalization = "equalizeMedians"`
      - `featureSubset = "all"`
      - `summaryMethod = "TMP"`
      - `censoredInt = "NA"`
      - `MBimpute = TRUE`
      - `use_log_file = FALSE`
    - `groupComparison(use_log_file = FALSE)`, with explanation of result interpretation via key columns such as `log2FC`, `pvalue`, `adj.pvalue`, and `issue` (compounds not testable normally).
    - `dataProcessPlots` (ProfilePlot) and `groupComparisonPlots` (VolcanoPlot, with `eval = FALSE`).
    - A note that `issue` is `NA` in the shown fixture.
Unit Tests
Coding Guidelines