CEll LIne OmicS processor: extracts omics data into integrated activity datasets for Boolean model calibration.
Used in the DrugLogics and TRAFIKK pipelines.
✅ Node Dictionary (node_HGNC_dict.csv)
- Network nodes mapped to HGNC gene symbols
- Enables consistent gene annotation
✅ Activity Master Matrix (activity_master_matrix.csv)
- All omics data sources combined (mutations, CNV, TF, expression) for each node in the network across cell lines
- Format: nodes × cell_lines__data_source
✅ Priority-Filtered Activity (activity_from_master.csv)
- One activity value per node and cell line, taken from the highest-priority source
- Ready for downstream analysis
✅ Per-Cell-Line Training Files (optional)
- DrugLogics- or Trafikk-compatible format
- Saved to `cellfiles_dir/` if specified
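
The relationship between the master matrix and the priority-filtered file can be illustrated with a minimal sketch. It is not the tool's implementation: the source labels (`mutation`, `cnv`, `tf`, `expression`), their priority order, the `node` column name, and the toy values are all assumptions made for illustration. Only the `cell_line__data_source` column convention comes from the description above.

```python
import csv
import io

# Hypothetical source labels, highest priority first (an assumption;
# the real labels and ordering used by the pipeline may differ).
PRIORITY = ["mutation", "cnv", "tf", "expression"]

# Toy stand-in for activity_master_matrix.csv; empty cells mean "no data".
# The "node" header is a hypothetical name for the row-label column.
MASTER_CSV = """node,A549__mutation,A549__expression,HT29__cnv,HT29__tf
TP53,1,-1,,1
KRAS,,1,-1,1
MYC,,,,-1
"""


def filter_by_priority(rows, priority=PRIORITY):
    """Return {node: {cell_line: value}}, keeping the highest-priority source."""
    rank = {source: i for i, source in enumerate(priority)}
    activity = {}
    for row in rows:
        best = {}  # cell_line -> (rank, value)
        for column, raw in row.items():
            if column == "node" or raw == "":
                continue  # skip the label column and missing values
            cell_line, source = column.rsplit("__", 1)
            if source not in rank:
                continue  # ignore sources without a known priority
            candidate = (rank[source], float(raw))
            if cell_line not in best or candidate[0] < best[cell_line][0]:
                best[cell_line] = candidate
        activity[row["node"]] = {cl: value for cl, (_, value) in best.items()}
    return activity


rows = list(csv.DictReader(io.StringIO(MASTER_CSV)))
activity = filter_by_priority(rows)
for node, values in activity.items():
    print(node, values)
```

In this toy input, TP53 in A549 has both a mutation and an expression value, so the mutation value is kept. MYC has no data for A549, so that node and cell line pair is simply absent from the result.
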

Install: `pip install -e .`

Run pipeline: `celios run --config config.yaml --verbose`

Get help: `celios --help`

Detailed documentation is in the following Markdown files:

| Document | Purpose |
|---|---|
| QUICKSTART.md | 5-minute quick reference with common commands |
| INSTALL.md | Installation guide, virtual environments, troubleshooting |
| USAGE.md | Full usage guide, advanced examples, and API reference |
| CONFIGURATION.md | Configuration reference with all options |
| OUTPUTS.md | Output file formats and interpretation |
| notebooks/ | Interactive Jupyter notebooks with examples |
- Step 1: Node dictionary generation from biological networks (SIF format)
- Step 2: Cell-line identifier resolution and tissue-aware organization
- Step 3: Multi-source omics integration (mutations, CNV, TF activity, expression)
- Step 4: Training file generation for the DrugLogics and Trafikk pipelines (the calibration file used by the Gitsbe module)
- Format support: Legacy CCLE (genes × SIDM) and 26Q1 (ModelID × genes) formats
- Configuration-driven: JSON or YAML configs for easy reproducibility
- Tissue organization: Optional per-tissue output structure for DrugLogics compatibility
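
As a rough illustration of Step 1, the sketch below collects node names from a small SIF network and pairs each with a gene symbol. It is only a sketch under stated assumptions, not the package's code: the override table, the fallback of using the node name itself as the symbol, and the `node,HGNC` column names are hypothetical. It also assumes whitespace-delimited SIF with no spaces inside node names.

```python
import csv
import io

# Toy network in SIF: "<source> <interaction> <target> [<target> ...]".
SIF_TEXT = """p53 -> CDKN1A MDM2
PTEN -| AKT
AKT -> MYC
"""

# Hypothetical, purely illustrative overrides for nodes whose names
# are not gene symbols themselves.
OVERRIDES = {"p53": "TP53", "AKT": "AKT1"}


def sif_nodes(text):
    """Return the sorted node names found in whitespace-delimited SIF text."""
    nodes = set()
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        nodes.add(fields[0])      # source node
        nodes.update(fields[2:])  # targets; fields[1] is the interaction type
    return sorted(nodes)


def node_dictionary(nodes, overrides):
    """Map each node to a symbol, defaulting to the node name (an assumption)."""
    return {node: overrides.get(node, node) for node in nodes}


nodes = sif_nodes(SIF_TEXT)
mapping = node_dictionary(nodes, OVERRIDES)

# Write the mapping as CSV text; the header names are hypothetical.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["node", "HGNC"])
for node, symbol in mapping.items():
    writer.writerow([node, symbol])
print(buffer.getvalue())
```

A real node dictionary would need curated or database-backed symbol lookup; the identity fallback here only keeps the example self-contained.
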