Copilot AI commented Jan 14, 2026

Adds two comprehensive vignettes demonstrating real-world meta-analysis workflows for gene expression studies in diabetic nephropathy and urothelial cancer.

Vignettes Added

vignettes/case-diabetic-nephropathy.Rmd - Meta-analysis workflow covering:

  • Multi-term search strategies via geo_search() (DN + DKD terminology)
  • Metadata database construction with geo_meta()
  • Quality filtering (sample size, platform type, study relevance)
  • Visualization of temporal distribution, sample sizes, and platform usage
  • Curated dataset preparation for downstream analysis
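
The quality-filtering step above can be sketched in base R on a mock metadata table. The column names (accession, n_samples, technology, title), the threshold of 10 samples, and the keyword pattern are all assumptions made for illustration; the fields returned by geo_meta() and the cut-offs used in the vignette may differ.

```r
# Mock metadata table. Column names and values are hypothetical stand-ins
# for whatever fields the real metadata database exposes.
meta <- data.frame(
  accession  = c("GSE_A", "GSE_B", "GSE_C", "GSE_D"),
  n_samples  = c(12L, 24L, 60L, 35L),
  technology = c("microarray", "microarray", "sequencing", "microarray"),
  title      = c("Diabetic nephropathy glomeruli",
                 "DKD tubulointerstitium",
                 "DN kidney biopsy RNA-seq",
                 "Unrelated liver study"),
  stringsAsFactors = FALSE
)

min_samples <- 10L  # assumed threshold, not taken from the vignette

# Relevance check: match either the spelled-out terms or the abbreviations
relevant <- grepl(
  "diabetic nephropathy|diabetic kidney|\\bDN\\b|\\bDKD\\b",
  meta$title, ignore.case = TRUE, perl = TRUE
)

keep <- meta$n_samples >= min_samples &
  meta$technology == "microarray" &
  relevant

meta_filtered <- meta[keep, , drop = FALSE]
meta_filtered$accession
```

Keeping each criterion as its own logical vector makes it easy to report how many series each filter removes.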

vignettes/case-urothelial-cancer.Rmd - Expression analysis workflow covering:

  • Combined bladder/urothelial cancer dataset discovery
  • Stringent filtering (≥20 samples, microarray only)
  • Expression matrix retrieval via geo_matrix()
  • Log2 transformation with log_trans()
  • Exploratory analysis (PCA, expression distributions, correlation heatmaps)
  • Cross-dataset integration considerations
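
For the PCA part of the exploratory analysis, a minimal sketch on simulated data might look like the following. The matrix is generated in place rather than taken from geo_matrix(), and the group shift is artificial; only stats::prcomp() from base R is used.

```r
set.seed(1)

# Simulated log2 expression matrix: 200 features (rows) x 6 samples (columns)
expr <- matrix(
  rnorm(200 * 6, mean = 8, sd = 1),
  nrow = 200,
  dimnames = list(paste0("probe", 1:200), paste0("S", 1:6))
)
# Shift half of the features in the last three samples to mimic a group effect
expr[1:100, 4:6] <- expr[1:100, 4:6] + 4

# Drop features with missing values or zero variance before scaling
ok <- stats::complete.cases(expr) & apply(expr, 1, stats::var) > 0

# prcomp() expects observations in rows, so transpose to samples x features
pca <- stats::prcomp(t(expr[ok, , drop = FALSE]), center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Tidy score table, ready to pass to a plotting function
scores <- data.frame(
  sample = rownames(pca$x),
  PC1 = pca$x[, 1],
  PC2 = pca$x[, 2]
)
```

The scores table can go straight into ggplot2, with var_explained supplying the axis labels.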

Key Patterns Demonstrated

# Multi-strategy search with deduplication
dn_search_terms <- c(
  "diabetic nephropathy[ALL] AND Homo sapiens[ORGN] AND GSE[ETYP]",
  "diabetic kidney disease[ALL] AND Homo sapiens[ORGN] AND GSE[ETYP]"
)
dn_gse <- unique(dplyr::bind_rows(lapply(dn_search_terms, geo_search)))

# Metadata database construction
dn_metadb <- geo_meta(dn_gse_filtered[["Series Accession"]], odir = "dn_metadb")

# Expression data download and transformation
expression_sets <- lapply(gse_ids, function(id) geo_matrix(id, odir = tempdir()))
expression_sets_log <- lapply(expression_sets, log_trans)
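
One caveat on the deduplication pattern: unique() on a data frame compares whole rows, so a series returned by two queries is only collapsed when every column matches. The sketch below uses mock tables with a hypothetical query column to show the difference; whether geo_search() returns any column that varies between queries is not stated here, so treat this as a defensive alternative rather than a required fix.

```r
# Two mock search results that overlap on one series. Only the
# "Series Accession" column name comes from the PR; "query" is hypothetical.
res_dn <- data.frame(
  `Series Accession` = c("GSE_A", "GSE_B"),
  query = "diabetic nephropathy",
  check.names = FALSE, stringsAsFactors = FALSE
)
res_dkd <- data.frame(
  `Series Accession` = c("GSE_B", "GSE_C"),
  query = "diabetic kidney disease",
  check.names = FALSE, stringsAsFactors = FALSE
)
combined <- rbind(res_dn, res_dkd)

# Whole-row comparison: the two GSE_B rows differ in `query`, so both survive
n_unique_rows <- nrow(unique(combined))

# Deduplicate on the accession alone, keeping the first occurrence
dedup <- combined[!duplicated(combined[["Series Accession"]]), , drop = FALSE]
dedup[["Series Accession"]]
```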

Implementation Notes

  • All download-dependent chunks use eval = FALSE to avoid build-time network access
  • Search chunks use cache = TRUE for efficiency
  • Relies only on packages declared under Suggests (ggplot2 and tidyr were added to DESCRIPTION for this purpose)
  • Backward compatible with older dplyr and ggplot2 releases: avoids if_any() (added in dplyr 1.0.4) and uses size rather than linewidth (introduced in ggplot2 3.4.0)
  • GSE accession numbers serve as a chronological proxy where actual submission dates are unavailable
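
The chronological proxy in the last note can be sketched as follows. The accession strings are arbitrary placeholders. The numeric part has to be extracted first, because sorting the strings directly compares them character by character.

```r
# Placeholder accession strings, chosen only to show the ordering problem
acc <- c("GSE100", "GSE25", "GSE3000")

# Strip the prefix and convert to integer so the ordering is numeric
gse_num <- as.integer(sub("^GSE", "", acc))

# Order series by accession number as a rough stand-in for submission order
acc_by_number <- acc[order(gse_num)]
acc_by_number
```

Accession numbers are assigned at registration, so this ordering is only approximate with respect to public release dates.
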
Original prompt

Summary

Add two comprehensive case study vignettes demonstrating real-world applications of the geokit package for gene expression meta-analysis.

Vignettes to Add

1. Diabetic Nephropathy Meta-Analysis (vignettes/case-diabetic-nephropathy.Rmd)

This vignette demonstrates a meta-analysis workflow for diabetic nephropathy gene expression datasets:

  • Search functionality: Use geo_search() with multiple query terms to identify DN datasets exclusively through the NCBI E-utilities interface
  • Build metadb: Fetch detailed metadata using geo_meta() to create a customized metadata database
  • Filter and categorize: Apply criteria for platform type, sample size, and study relevance
  • Visualization: Timeline of dataset submissions, sample size distributions, platform trends using ggplot2
  • Prepare for meta-analysis: Curate list of datasets suitable for downstream integration

Key code patterns to include:

# Multiple search strategies
dn_search_terms <- c(
  "diabetic nephropathy[ALL] AND Homo sapiens[ORGN] AND GSE[ETYP]",
  "diabetic kidney disease[ALL] AND Homo sapiens[ORGN] AND GSE[ETYP]"
)
dn_gse_list <- lapply(dn_search_terms, geo_search)
dn_gse <- unique(dplyr::bind_rows(dn_gse_list))

# Build metadb
dn_metadb <- geo_meta(dn_gse_filtered[["Series Accession"]], odir = "dn_metadb")

2. Urothelial Cancer Example (vignettes/case-urothelial-cancer.Rmd)

This vignette demonstrates extracting, filtering, downloading, and analyzing urothelial cancer expression data:

  • Dataset discovery: Search for bladder/urothelial cancer datasets
  • Filtering: Select datasets based on sample size (≥20), platform type (microarray), and relevance
  • Download: Retrieve expression matrices using geo_matrix()
  • Data integration: Log2 transformation, handling multiple datasets
  • Exploratory analysis:
    • Expression distribution boxplots
    • Principal Component Analysis (PCA)
    • Sample correlation heatmaps
    • Summary statistics
  • Visualization: All plots using ggplot2 with publication-quality formatting
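
As an illustration of the sample correlation heatmap item, the sketch below computes a correlation matrix from a simulated expression matrix and reshapes it to long format for geom_tile(). The data and column names (sample_x, sample_y, r) are made up for the example, and the plot is only built if ggplot2 is installed.

```r
set.seed(42)

# Simulated log2 expression matrix: 100 features x 4 samples
expr <- matrix(
  rnorm(100 * 4, mean = 8),
  nrow = 100,
  dimnames = list(NULL, paste0("S", 1:4))
)

# Pairwise Pearson correlation between samples (columns)
cor_mat <- stats::cor(expr, method = "pearson", use = "pairwise.complete.obs")

# Reshape to one row per sample pair for tile-based plotting
cor_long <- as.data.frame(as.table(cor_mat), stringsAsFactors = FALSE)
names(cor_long) <- c("sample_x", "sample_y", "r")

# Optional plot; skipped when ggplot2 is not available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(
    cor_long,
    ggplot2::aes(x = sample_x, y = sample_y, fill = r)
  ) +
    ggplot2::geom_tile() +
    ggplot2::theme_minimal()
}
```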

Key code patterns to include:

# Search and combine
uc_searches <- list(
  bladder = geo_search("bladder cancer[ALL] AND Homo sapiens[ORGN] AND GSE[ETYP]"),
  urothelial = geo_search("urothelial cancer[ALL] AND Homo sapiens[ORGN] AND GSE[ETYP]")
)
uc_gse <- unique(dplyr::bind_rows(uc_searches))

# Download and analyze
expression_sets <- lapply(demo_gse, function(gse_id) {
  geo_matrix(gse_id, odir = tempdir())
})

# Log transformation
expression_sets_log <- lapply(expression_sets, log_trans)
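
How log_trans() decides whether a matrix is already on the log scale is not described in this PR. As a stand-alone illustration, the sketch below uses a quantile-based check modeled on the one that appears in GEO2R-generated scripts. The function names needs_log2() and log2_if_needed() are hypothetical and are not part of geokit.

```r
# Hypothetical helper: guess whether values are still on the raw scale,
# using a quantile heuristic similar to the one in GEO2R-generated scripts.
needs_log2 <- function(x) {
  qx <- as.numeric(stats::quantile(
    x, c(0, 0.25, 0.5, 0.75, 0.99, 1), na.rm = TRUE
  ))
  (qx[5] > 100) || (qx[6] - qx[1] > 50 && qx[2] > 0)
}

# Hypothetical helper: transform only when the check says it is needed
log2_if_needed <- function(x) {
  if (needs_log2(x)) {
    x[x <= 0] <- NA  # non-positive intensities have no logarithm
    log2(x)
  } else {
    x
  }
}

raw <- matrix(c(120, 850, 15, 4300, 60, 2200), nrow = 3)
logged <- log2(raw)

needs_log2(raw)
needs_log2(logged)
```

Guarding the transform this way avoids log-transforming a dataset twice when series from different submitters are processed in one lapply() call.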

Technical Requirements

  1. Both vignettes should follow the existing vignette style in the repository (see vignettes/geometadb.Rmd for reference)
  2. Use standard R Markdown vignette headers with proper VignetteIndexEntry
  3. Include knitr::opts_chunk$set() for consistent formatting
  4. Use cache = TRUE for long-running chunks
  5. Use eval = FALSE for code that requires actual downloads in non-demo sections
  6. Include sessionInfo() at the end
  7. Dependencies: geokit, dplyr, tidyr, stringr, ggplot2, patchwork (for combining plots)

File Locations

  • vignettes/case-diabetic-nephropathy.Rmd
  • vignettes/case-urothelial-cancer.Rmd

This pull request was created from Copilot chat.


Copilot AI and others added 5 commits January 14, 2026 13:41
…nd urothelial cancer

Co-authored-by: ShixiangWang <25057508+ShixiangWang@users.noreply.github.com>
Co-authored-by: ShixiangWang <25057508+ShixiangWang@users.noreply.github.com>
…documentation, and simplify missing value calculation

Co-authored-by: ShixiangWang <25057508+ShixiangWang@users.noreply.github.com>
… GSE number as time proxy

Co-authored-by: ShixiangWang <25057508+ShixiangWang@users.noreply.github.com>
…l() and using size instead of linewidth

Co-authored-by: ShixiangWang <25057508+ShixiangWang@users.noreply.github.com>
Copilot AI changed the title [WIP] Add case study vignette for diabetic nephropathy meta-analysis Add diabetic nephropathy and urothelial cancer case study vignettes Jan 14, 2026
Copilot AI requested a review from ShixiangWang January 14, 2026 13:50
@ShixiangWang ShixiangWang marked this pull request as ready for review January 14, 2026 13:57
@ShixiangWang ShixiangWang requested a review from Yunuuuu January 14, 2026 13:57