GreS is a spatial domain identification framework that incorporates gene-level semantic priors into spatial representation learning. GreS models spatial domain organization from three complementary perspectives: physical spatial proximity, transcriptomic similarity, and semantic similarity derived from gene function. To capture these relationships, GreS constructs a spatial graph, a feature graph, and a semantic graph, which are encoded by three parallel GCN branches and fused into a unified spot representation.
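The three-branch design can be illustrated with a small NumPy sketch. It makes several assumptions that GreS may not share: the standard symmetric GCN normalization D^{-1/2}(A + I)D^{-1/2}, a single ReLU layer per branch, and a softmax over per-branch scores as the branch-weighted fusion. GreS's actual layer sizes, activations and the way it computes fusion weights may differ. All helper names below are hypothetical.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (standard GCN form, assumed here)."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops so every degree is >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm: np.ndarray, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(a_norm @ x @ w, 0.0)


def fuse_branches(branch_embeddings, scores):
    """Hypothetical branch-weighted fusion: softmax(scores)-weighted sum of branch outputs."""
    scores = np.asarray(scores, dtype=float)
    weights = np.exp(scores - scores.max())       # numerically stable softmax
    weights /= weights.sum()
    fused = sum(w * z for w, z in zip(weights, branch_embeddings))
    return fused, weights


def random_graph(rng: np.random.Generator, n: int, p: float = 0.4) -> np.ndarray:
    """Random symmetric 0/1 adjacency without self-loops (toy stand-in for a real graph)."""
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
    return upper + upper.T


rng = np.random.default_rng(0)
n_spots, n_genes, hidden = 6, 8, 4
x = rng.random((n_spots, n_genes))                # toy spot-by-gene input
graphs = {name: random_graph(rng, n_spots) for name in ("spatial", "feature", "semantic")}

# A separate weight matrix per branch, mirroring three parallel GCN encoders.
branch_out = [
    gcn_layer(normalize_adjacency(adj), x, rng.normal(size=(n_genes, hidden)))
    for adj in graphs.values()
]
fused, weights = fuse_branches(branch_out, scores=[0.5, 0.2, 0.3])
print(fused.shape, weights.round(3))
```

In a trained model the branch scores would be learned rather than fixed; they are constants here only to keep the example deterministic.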
Key features:
- 🧠 Gene-Level Semantic Priors: Builds spot-level semantic descriptors by encoding gene descriptions and aggregating gene semantics according to each spot's expression profile.
- 🕸️ Three Complementary Graphs: Constructs a spatial graph (physical proximity), a feature graph (transcriptomic similarity), and a semantic graph (functional similarity).
- 🔀 Three Parallel GCN Branches: Encodes the three views with separate GCN encoders and integrates them via a branch-weighted fusion mechanism.
- 📉 ZINB Autoencoding: Reconstructs input gene expression with a zero-inflated negative binomial (ZINB) decoder to handle sparsity and noise.
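The ZINB reconstruction term can be made concrete. A common parameterization has the decoder predict a mean μ, a dispersion θ and a dropout probability π for every spot and gene. The sketch below computes the corresponding negative log-likelihood in NumPy under that assumption. It is an illustration, not GreS's decoder, and the function name is hypothetical.

```python
import math

import numpy as np

# Vectorized log-gamma built on the standard library (avoids a SciPy dependency).
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def zinb_nll(x, mu, theta, pi, eps=1e-10):
    """Mean negative log-likelihood of counts x under ZINB(mu, theta, pi).

    NB(x; mu, theta) = Gamma(x + theta) / (Gamma(theta) x!)
                       * (theta / (theta + mu))**theta * (mu / (theta + mu))**x
    ZINB(x)          = pi * [x == 0] + (1 - pi) * NB(x; mu, theta)
    """
    x, mu, theta, pi = (np.asarray(v, dtype=float) for v in (x, mu, theta, pi))
    log_nb = (
        _lgamma(x + theta) - _lgamma(theta) - _lgamma(x + 1.0)
        + theta * np.log(theta / (theta + mu) + eps)
        + x * np.log(mu / (theta + mu) + eps)
    )
    # A zero can come from the dropout component or from the NB component itself.
    log_zero = np.log(pi + (1.0 - pi) * np.power(theta / (theta + mu), theta) + eps)
    log_nonzero = np.log(1.0 - pi + eps) + log_nb
    return float(-np.mean(np.where(x < 0.5, log_zero, log_nonzero)))


counts = np.array([[0, 3, 1], [0, 0, 7]])
print(zinb_nll(counts, mu=np.full(counts.shape, 2.0), theta=1.5, pi=0.2))
```

The small eps only guards the logarithms against zero arguments. It shifts the likelihood negligibly.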
A runnable end-to-end walkthrough is also available in tutorial.ipynb.
We provide a requirements file for quick environment setup:
pip install -r environment/requirements_sc.txt
GreS requires pretrained gene embeddings and their vocabulary. Please download them from our Hugging Face repository and place them under embedding/text_embedd_large/:
GreS/
├── embedding/
│ └── text_embedd_large/
│ ├── pretrained_gene_embeddings.pt
│ └── vocab.json
└── ...
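Before preprocessing, it can help to confirm that the files are in place and that the dataset's genes are covered by the vocabulary, because genes missing from vocab.json presumably cannot contribute a semantic embedding. The sketch below assumes vocab.json is a JSON object keyed by gene symbol; a plain list of symbols also works. The real schema is not documented here, and both helper names are hypothetical.

```python
import json
from pathlib import Path

EMBED_DIR = Path("embedding/text_embedd_large")
EXPECTED_FILES = ("pretrained_gene_embeddings.pt", "vocab.json")


def missing_embedding_files(embed_dir=EMBED_DIR):
    """Return the expected embedding files that are not present in embed_dir."""
    return [name for name in EXPECTED_FILES if not (Path(embed_dir) / name).is_file()]


def vocab_coverage(vocab_path, genes):
    """Fraction of `genes` found in vocab.json (assumed: dict keyed by symbol, or a list)."""
    with open(vocab_path, encoding="utf-8") as fh:
        known = set(json.load(fh))  # dict -> its keys; list -> its elements
    genes = list(genes)
    return sum(g in known for g in genes) / len(genes) if genes else 0.0


if __name__ == "__main__":
    print("missing:", missing_embedding_files())
```

For example, vocab_coverage(EMBED_DIR / "vocab.json", adata.var_names) reports the share of an AnnData object's genes that have an embedding.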
Each dataset is identified by a dataset_id. Place the raw data under data/raw_h5ad/<dataset_id>/.
For 10x Visium data, the directory should contain:
- filtered_feature_bc_matrix.h5: raw gene expression counts.
- spatial/: spatial metadata (tissue positions, scale factors, images).
- metadata.tsv: per-spot annotations, including the ground-truth label column.
Example (DLPFC sample 151672):
data/raw_h5ad/151672/
├── filtered_feature_bc_matrix.h5
├── spatial/
└── metadata.tsv
An .h5ad file containing adata.X (raw counts), adata.obsm['spatial'], and a label column is also supported.
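You can run a quick pre-flight check on either input format before Step 1. The Visium check below only tests for the three entries listed above. The .h5ad check is duck-typed: it only needs .X, .obsm and .obs. It treats non-negative integer values in X as raw counts, expects 2-D coordinates, and defaults to the ground_truth label column. Function names are hypothetical, and generate_data.py may apply checks of its own.

```python
from pathlib import Path

import numpy as np

RAW_ROOT = Path("data/raw_h5ad")


def missing_visium_inputs(dataset_id, raw_root=RAW_ROOT):
    """List the expected 10x Visium entries missing from data/raw_h5ad/<dataset_id>/."""
    ds = Path(raw_root) / dataset_id
    checks = {
        "filtered_feature_bc_matrix.h5": (ds / "filtered_feature_bc_matrix.h5").is_file(),
        "spatial/": (ds / "spatial").is_dir(),
        "metadata.tsv": (ds / "metadata.tsv").is_file(),
    }
    return [name for name, ok in checks.items() if not ok]


def h5ad_problems(adata, label_column="ground_truth"):
    """Describe problems with an AnnData-like object used as direct input (empty list = OK)."""
    problems = []
    # Densify a sparse X for the checks (fine for a sketch, memory-heavy for big data).
    x = adata.X.toarray() if hasattr(adata.X, "toarray") else np.asarray(adata.X)
    if (x < 0).any() or not np.allclose(x, np.round(x)):
        problems.append("X does not look like raw counts")
    if "spatial" not in adata.obsm:
        problems.append("obsm['spatial'] is missing")
    elif np.asarray(adata.obsm["spatial"]).shape != (x.shape[0], 2):
        problems.append("obsm['spatial'] is not an (n_spots, 2) array")
    if label_column not in adata.obs:  # for a DataFrame, `in` checks column names
        problems.append(f"label column {label_column!r} not found in obs")
    return problems
```

With anndata installed, h5ad_problems(anndata.read_h5ad(path), "layer_guess_reordered") runs the check on a file on disk.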
The full pipeline has three steps. All commands are run from the project root, and we use DLPFC sample 151672 as an example.
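The commands given for the individual steps below can also be chained from Python. This is a convenience sketch, not a script shipped with GreS. It reuses only the flags documented in this README and runs each step with the current interpreter from the project root. It stops at the first step that exits with an error. The function names are hypothetical.

```python
import subprocess
import sys

EMBED_DIR = "embedding/text_embedd_large"


def build_commands(dataset_id, label_column="ground_truth"):
    """Argument lists for Steps 1-3, using only the flags documented in this README."""
    py = sys.executable
    return [
        [py, "preprocess/generate_data.py",
         "--dataset_id", dataset_id, "--label_column", label_column],
        [py, "preprocess/generate_raw_gene_concat_spot_embedding.py",
         "--dataset_id", dataset_id,
         "--embedding", f"{EMBED_DIR}/pretrained_gene_embeddings.pt",
         "--vocab", f"{EMBED_DIR}/vocab.json"],
        [py, "tools/train.py", "--dataset_id", dataset_id],
    ]


def run_pipeline(dataset_id, label_column="ground_truth", dry_run=False):
    """Run the steps in order; check=True raises on the first non-zero exit status."""
    for cmd in build_commands(dataset_id, label_column):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

run_pipeline("151672", "layer_guess_reordered") would reproduce the DLPFC example; dry_run=True only prints the commands.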
Step 1: Generate a graph-augmented data.h5ad from the raw data with preprocess/generate_data.py.
python preprocess/generate_data.py --dataset_id 151672 --label_column layer_guess_reordered

| Argument | Description | Default |
|---|---|---|
| --dataset_id | Dataset ID; locates the input and names the output | (required) |
| --label_column | Label column in metadata.tsv (DLPFC uses layer_guess_reordered) | ground_truth |
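This README does not specify how generate_data.py builds the graphs stored in data.h5ad. The sketch below assumes a common construction for illustration: a symmetric k-nearest-neighbour graph. It uses Euclidean distance on spot coordinates for a spatial graph and cosine distance on expression profiles for a feature graph. The same function could be applied to spot semantic embeddings for a semantic graph. The value of k and the helper name are hypothetical.

```python
import numpy as np


def knn_graph(features, k, metric="euclidean"):
    """Symmetric 0/1 k-nearest-neighbour adjacency (self-loops excluded)."""
    x = np.asarray(features, dtype=float)
    if metric == "cosine":
        unit = x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)
        dist = 1.0 - unit @ unit.T
    elif metric == "euclidean":
        sq = (x ** 2).sum(axis=1)
        dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0))
    else:
        raise ValueError(f"unsupported metric: {metric!r}")
    np.fill_diagonal(dist, np.inf)                  # a spot is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]    # k closest spots per row
    adj = np.zeros(dist.shape)
    adj[np.repeat(np.arange(x.shape[0]), k), neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)                   # keep an edge if either end chose it


# Toy spatial graph: four spots on a line, k = 1.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [10.0, 0.0]])
spatial_adj = knn_graph(coords, k=1)

# Toy feature graph on random count profiles with cosine distance, k = 2.
expr = np.random.default_rng(0).poisson(2.0, size=(4, 20))
feature_adj = knn_graph(expr, k=2, metric="cosine")
```

Symmetrizing with an elementwise maximum keeps the adjacency undirected, which the usual GCN normalization expects.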
Step 2: Build a per-spot semantic embedding from data.h5ad and the pretrained gene embeddings with preprocess/generate_raw_gene_concat_spot_embedding.py. Results are written to data/npys_grn_raw_concat/.
python preprocess/generate_raw_gene_concat_spot_embedding.py \
--dataset_id 151672 \
--embedding embedding/text_embedd_large/pretrained_gene_embeddings.pt \
--vocab embedding/text_embedd_large/vocab.json
This produces embeddings_<dataset_id>.npy (the spot semantic embeddings used during training), together with an _attribution.npz file and a _stats.json file.
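The exact aggregation rule in generate_raw_gene_concat_spot_embedding.py is not described here, and the script's own scheme may differ. One plausible reading of "aggregating gene semantics according to each spot's expression profile" is sketched below: average the pretrained embeddings of a spot's genes, weighted by its log1p-transformed counts. The sketch assumes vocab maps gene symbols to row indices of the embedding matrix. The weighting scheme and function name are hypothetical.

```python
import numpy as np


def spot_semantic_embeddings(counts, gene_names, vocab, gene_emb):
    """Expression-weighted mean of gene embeddings for every spot (hypothetical scheme).

    counts:     (n_spots, n_genes) raw counts
    gene_names: gene symbols for the columns of `counts`
    vocab:      dict gene symbol -> row index in `gene_emb` (assumed format)
    gene_emb:   (vocab_size, dim) pretrained gene embeddings
    """
    counts = np.asarray(counts, dtype=float)
    cols = [j for j, g in enumerate(gene_names) if g in vocab]   # genes with an embedding
    rows = [vocab[gene_names[j]] for j in cols]
    x = np.log1p(counts[:, cols])                                # damp very high counts
    totals = x.sum(axis=1, keepdims=True)
    # Per-spot weights sum to 1; spots with no matched expression get all-zero weights.
    weights = np.divide(x, totals, out=np.zeros_like(x), where=totals > 0)
    return weights @ np.asarray(gene_emb, dtype=float)[rows]


counts = np.array([[2, 5], [0, 3]])
emb = spot_semantic_embeddings(counts, ["A", "X"], {"A": 1}, np.array([[9.0, 9.0], [1.0, 2.0]]))
print(emb)
```

You can inspect the file written by Step 2 with np.load("data/npys_grn_raw_concat/embeddings_151672.npy"). One row per spot would be expected, though that layout is an assumption.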
Step 3: Train the model and cluster the spots with tools/train.py. Hyper-parameters are read from config/<config_name>.ini (default: DLPFC).
python tools/train.py --dataset_id 151672
Results are saved in data/results/<dataset_id>/:
- <dataset_id>.h5ad: AnnData with clustering results; cluster labels are stored in obs['idx'].
- <dataset_id>_clusters.png: spatial cluster plot.
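When ground-truth annotations are available, as in DLPFC, clustering quality is commonly summarized with the adjusted Rand index (ARI). The NumPy sketch below computes ARI from the contingency table using the same formula as scikit-learn's adjusted_rand_score. It assumes the label column is still present in the results AnnData, which this README does not state. The function name is hypothetical.

```python
import numpy as np


def _pairs(n):
    """Number of unordered pairs, n choose 2 (works elementwise on arrays)."""
    return n * (n - 1) / 2.0


def adjusted_rand_index(labels_true, labels_pred):
    """ARI between two flat labelings of the same spots."""
    t = np.unique(np.asarray(labels_true), return_inverse=True)[1]
    p = np.unique(np.asarray(labels_pred), return_inverse=True)[1]
    table = np.zeros((t.max() + 1, p.max() + 1))
    np.add.at(table, (t, p), 1)                     # contingency table of label pairs
    sum_cells = _pairs(table).sum()
    sum_rows = _pairs(table.sum(axis=1)).sum()
    sum_cols = _pairs(table.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / _pairs(t.size)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:                       # degenerate case, e.g. a single cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

For example, after adata = anndata.read_h5ad("data/results/151672/151672.h5ad"), adjusted_rand_index(adata.obs["layer_guess_reordered"], adata.obs["idx"]) compares the clusters with the DLPFC layers. Drop spots whose ground-truth label is missing first.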
GreS/
├── config/ # Configuration files
├── data/
│ ├── raw_h5ad/ # Input data, one folder per dataset_id
│ ├── generated/ # Preprocessing output (data.h5ad, graphs)
│ ├── npys_grn_raw_concat/ # Generated spot semantic embeddings
│ └── results/ # Training results (h5ad + cluster plots)
├── embedding/
│ └── text_embedd_large/ # Pretrained gene embeddings and vocabulary
├── preprocess/
│ ├── generate_data.py # Step 1: build data.h5ad + graphs
│ ├── generate_raw_gene_concat_spot_embedding.py # Step 2: build spot semantic embeddings
│ └── config.py
├── tools/
│ ├── model.py # GreS model architecture
│ ├── train.py # Step 3: training & clustering
│ └── utils.py
├── fig/ # Figure assets
├── tutorial.ipynb # End-to-end tutorial
└── README.md
