Package: SpatialCellChat (CellChat v3)
Data: 10x Genomics Xenium, ~36,000 cells, ~5,000 genes
Environment: 32 CPUs, 125 GB RAM, 60 GB disk
Description
I am running SpatialCellChat on a Xenium region of interest (~36,000 cells, 5,000 genes) and finding that identifyOverExpressedGenes takes several hours to complete — if it completes at all.
I have set up parallel workers with the future package, but it is unclear whether identifyOverExpressedGenes in SpatialCellChat v3 actually dispatches work through future. In v2, this function does use future-based parallelism, but given the new per-cell resolution infrastructure in v3 and the added MERINGUE/ALRA steps, I'm not sure whether that is still the case.
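A quick, rough way to check this locally is to look for future calls in the function's own body. The sketch below is a minimal base-R helper; the name references_future is hypothetical and not part of SpatialCellChat. It only inspects the top-level body, so parallel work delegated to internal helpers is not detected and a FALSE result is a hint, not proof.

```r
# Hypothetical helper (not a SpatialCellChat function): TRUE if the function's
# own body mentions "future" (e.g. future::, future.apply::future_lapply).
# Calls made inside internal helper functions are NOT inspected.
references_future <- function(fn) {
  src <- deparse(body(fn))
  any(grepl("future", src, fixed = TRUE))
}

# Toy functions to illustrate the check
serial_fn   <- function(x) lapply(x, sqrt)
parallel_fn <- function(x) future.apply::future_lapply(x, sqrt)

references_future(serial_fn)    # FALSE
references_future(parallel_fn)  # TRUE
```

After library(SpatialCellChat), the same call can be applied to identifyOverExpressedGenes, and future::nbrOfWorkers() reports how many workers the current plan() provides.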
Steps to reproduce
```r
library(SpatialCellChat)
library(patchwork)
options(stringsAsFactors = FALSE)
# Inputs assumed to exist already: counts (genes x cells raw matrix), meta
# (data.frame of cell annotations), annotation_col (column name in meta),
# spatial.locs (cell coordinates) and spatial.factors
# Normalize counts (log1p library-size normalization)
# Skip if your matrix is already normalized
counts_norm <- normalizeData(counts)
# Create SpatialCellChat object directly from matrix, no Seurat needed
chat <- createSpatialCellChat(
  object = counts_norm,           # genes × cells normalized matrix
  meta = meta,                    # data.frame with cell annotations
  group.by = annotation_col,      # name of the annotation column in meta
  datatype = "spatial",
  coordinates = spatial.locs,
  spatial.factors = spatial.factors
)
CellChatDB <- CellChatDB.human
# Subset to protein-based signaling categories
CellChatDB.use <- subsetDB(
  CellChatDB,
  search = c("Secreted Signaling", "ECM-Receptor", "Cell-Cell Contact"),
  non_protein = FALSE
)
# Attach database to the CellChat object
chat@DB <- CellChatDB.use
# Subset to signaling genes only (required step)
chat <- subsetData(chat)
chat <- preProcessing(chat)
# Identify over-expressed ligands / receptors
chat <- identifyOverExpressedGenes(
  chat,
  selection.method = "moransi"  # takes hours with Moran's I; never completes with MERINGUE
)
```
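Until there is guidance on expected runtime, one rough way to estimate it is to time identifyOverExpressedGenes on a few random subsets of increasing size and fit a power law, runtime ≈ a · n^k, on the log-log scale. The sketch below assumes that power-law scaling roughly holds over the sampled range; estimate_scaling is a hypothetical helper, and the timings are synthetic, not measurements.

```r
# Hypothetical helper (not a SpatialCellChat function): fit log(runtime) against
# log(cell count), i.e. runtime ~ a * n^k, and extrapolate to a target size.
estimate_scaling <- function(n_cells, seconds, target_n) {
  fit <- lm(log(seconds) ~ log(n_cells))
  k <- unname(coef(fit)[2])
  pred <- exp(predict(fit, newdata = data.frame(n_cells = target_n)))
  list(exponent = k, predicted_seconds = unname(pred))
}

# Synthetic timings that grow quadratically (illustration only, not measurements)
n_sub <- c(2000, 4000, 8000)
secs  <- 1e-6 * n_sub^2
res <- estimate_scaling(n_sub, secs, target_n = 36000)
res$exponent           # ~2
res$predicted_seconds  # ~1296
```

Real timings could be collected with system.time(...)[["elapsed"]] around each call on a subsetted object. An exponent near 2 would be consistent with all-pairs work over cells.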
What I have tried
- MERINGUE for spatial neighborhood detection: the analysis never completed
- Moran's I as an alternative (spatial.factors parameter): completed once after several hours, but subsequent steps (computeCommunProb etc.) were also prohibitively slow
- Parallelization via future, set using plan("multisession", workers = 30). I see no evidence that identifyOverExpressedGenes picks up the parallel workers.
Questions
- Does identifyOverExpressedGenes in SpatialCellChat v3 support future-based parallelism? If so, is there anything specific to the spatial/per-cell mode that prevents it from being picked up?
- Is sketchData() (available in CellChat v2 for downsampling) compatible with SpatialCellChat v3? This seems like an important workaround for large imaging-based datasets, but it is not mentioned in the v3 tutorial.
- For imaging-based data at this scale (tens of thousands of cells from a single Xenium ROI), what is the expected runtime on typical hardware? Is 36k cells within the intended scope of SpatialCellChat v3?
- Is there a recommended per-cell-type downsampling strategy (e.g., via subset() in Seurat) that is compatible with the spatial coordinate requirements of v3?
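For reference, a Seurat-free variant can be done in base R: choose cell IDs per annotation group once, then use the same IDs to subset the count matrix, the metadata and the coordinates before calling createSpatialCellChat. This sketch assumes cell IDs are the column names of counts and the row names of both meta and spatial.locs; downsample_per_group and max_per_group are hypothetical names. Whether spatial.factors needs adjusting after subsetting is not known here, and thinning cells makes spatial neighborhoods sparser, which may change contact-dependent results.

```r
# Hypothetical helper: keep at most max_per_group cells per annotation group and
# return their IDs in the original order, so every input can be subset alike.
downsample_per_group <- function(meta, group_col, max_per_group, seed = 1) {
  set.seed(seed)
  ids <- split(rownames(meta), meta[[group_col]])
  keep <- unlist(lapply(ids, function(x) {
    if (length(x) <= max_per_group) x else sample(x, max_per_group)
  }), use.names = FALSE)
  rownames(meta)[rownames(meta) %in% keep]
}

# Toy inputs: 260 cells, 3 cell types, 2-D coordinates, 5 genes
meta <- data.frame(
  cell_type = rep(c("T", "B", "Tumor"), times = c(50, 10, 200)),
  row.names = paste0("cell", 1:260)
)
coords <- matrix(runif(520), ncol = 2,
                 dimnames = list(rownames(meta), c("x", "y")))
counts <- matrix(rpois(5 * 260, 1), nrow = 5,
                 dimnames = list(paste0("g", 1:5), rownames(meta)))

keep <- downsample_per_group(meta, "cell_type", max_per_group = 40)
meta_sub   <- meta[keep, , drop = FALSE]
coords_sub <- coords[keep, , drop = FALSE]
counts_sub <- counts[, keep, drop = FALSE]
table(meta_sub$cell_type)  # B 10, T 40, Tumor 40
```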
Additional context
- I did not encounter memory errors, suggesting RAM is not the bottleneck; the process appears to be CPU-bound
- This appears related to the per-cell resolution design of v3 — at 36k cells, the number of cell-cell pairs that need to be evaluated is orders of magnitude larger than in a cluster-level analysis
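The scale difference can be made concrete with simple arithmetic. The number of cell types is not stated above, so 20 is a purely illustrative assumption.

```r
n_cells  <- 36000
n_groups <- 20                # hypothetical number of annotated cell types

cell_pairs  <- n_cells^2      # ordered sender-receiver cell pairs: 1.296e9
group_pairs <- n_groups^2     # ordered sender-receiver group pairs: 400
ratio <- cell_pairs / group_pairs  # 3.24e6 times more pairs at cell level
```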