Submission for Global AI Hackathon'25 by Elucidata
This project tackles a real-world biomedical machine learning problem: predicting the spatial abundance of 35 cell types from high-resolution H&E-stained histology images using deep learning.
Spatial transcriptomics is powerful but expensive. Histology is cheap but coarse. The challenge is to bridge the two with AI — mapping visual signals to molecular insights.
- Provided via Kaggle (Elucidata Global AI Hackathon 2025)
- 6 training slides, 1 test slide
- For each slide:
- HE image (float32 RGB)
- ~2000 spatial transcriptomic spots
- Each spot has abundance data for 35 cell types (C1–C35)
- Visualized tissue coverage and spot distribution
- Mapped cell-type heatmaps over tissue
- Found biologically meaningful spatial correlations
- Previewed high-abundance patches per cell type
- ResNet18 pretrained on ImageNet
- Modified for 35-channel regression output
- Trained with MSE loss (can upgrade to Spearman rank loss)
- Extracted 224x224 patches per spot
- Load
.h5data and extract patches - Train CNN on spot-wise cell composition
- Predict on test slide S_7
- Export
submission.csvfor Kaggle
- Baseline model achieved %
- Strongest correlations with structural cell types
- Future work: try EfficientNet, ViT, and integrate multimodal embeddings
- Python, PyTorch, torchvision, h5py, pandas, matplotlib
- Jupyter Notebooks for reproducibility
Include screenshots:
- Spot overlay on slide
- Heatmaps of cell type abundance
- Correlation matrix
- Top-k patch previews
Includes a valid submission.csv for Kaggle.
Want to see the code? Check out: