53 changes: 53 additions & 0 deletions .github/workflows/docs.yml
@@ -0,0 +1,53 @@
name: Docs

on:
  push:
    branches:
      - master
  workflow_dispatch:

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: github-pages
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6

      - uses: actions/setup-python@v6
        with:
          python-version: "3.x"

      - name: Configure Pages
        uses: actions/configure-pages@v6

      - name: Install docs dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install mkdocs mkdocs-material pymdown-extensions

      - name: Build site
        run: |
          mkdocs build --clean

      - name: Upload Pages artifact
        uses: actions/upload-pages-artifact@v5
        with:
          path: site

  deploy:
    runs-on: ubuntu-latest
    needs: build
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - id: deployment
        uses: actions/deploy-pages@v5
4 changes: 3 additions & 1 deletion README.md
@@ -2,6 +2,8 @@
[![License](https://img.shields.io/badge/License-BSD\%202--Clause-orange.svg)](https://opensource.org/licenses/BSD-2-Clause)
[![Build Status](https://app.travis-ci.com/AlexandrovLab/SigProfilerSimulator.svg?branch=master)](https://app.travis-ci.com/AlexandrovLab/SigProfilerSimulator)

<img src="docs/assets/images/SigProfilerSimulator.png" alt="SigProfilerSimulator" width="1000"/>

# SigProfilerSimulator
SigProfilerSimulator allows realistic simulations of mutational signatures in cancer genomes. The tool can simulate signatures of single base substitutions, double base substitutions, and insertions/deletions across the whole genome or user-defined regions. SigProfilerSimulator makes use of [SigProfilerMatrixGenerator](https://github.com/SigProfilerSuite/SigProfilerMatrixGenerator) and [SigProfilerPlotting](https://github.com/SigProfilerSuite/SigProfilerPlotting), seamlessly integrating with other tools in [SigProfilerSuite](https://github.com/SigProfilerSuite).

@@ -50,4 +52,4 @@ Released Jan 2011. Last updated March 2012. This genome was downloaded from ENSE
rn6 (Rnor_6.0) INSDC Assembly GCA_000001895.4, Jul 2014. Released Jun 2015. Last updated Jan 2017.
This genome was downloaded from ENSEMBL database version 96.6.

yeast (Saccharomyces cerevisiae S288C; assembly R64-2-1). Released Nov 2014.
yeast (Saccharomyces cerevisiae S288C; assembly R64-2-1). Released Nov 2014.
8 changes: 4 additions & 4 deletions ci.yml
@@ -21,10 +21,10 @@ jobs:
python-version: ['3.9', '3.14']

steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
uses: actions/setup-python@v6
with:
python-version: ${{ matrix.python-version }}

@@ -38,7 +38,7 @@ jobs:
python -m pip install --upgrade pip setuptools packaging

- name: Cache src directory
uses: actions/cache@v4
uses: actions/cache@v5
with:
path: ${{ github.workspace }}/src/
key: ${{ runner.os }}-src-grch37
74 changes: 74 additions & 0 deletions docs/1_installation.md
@@ -0,0 +1,74 @@
# Installation


----------


This section will help you set up the necessary software and packages required to run SigProfilerSimulator.

----------


## Prerequisites ##

- [Python][1] version >= 3.9
- [SigProfilerMatrixGenerator][2] with a downloaded reference genome
- Other dependencies are installed automatically during package installation

## Installation ##

SigProfilerSimulator can be executed on any Windows/macOS/Unix system. First follow the [SigProfilerMatrixGenerator][2] guide for installing `Python` and `pip`. Next, follow the instructions below for the latest stable release or the current GitHub version.

### Installation with `pip` ###

Install the latest `SigProfilerSimulator` PyPI version using `pip`:
```
$ pip install SigProfilerSimulator
```

To upgrade an existing installation to the most recent version:
```
$ pip install SigProfilerSimulator --upgrade
```
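To confirm which version `pip` actually installed (for example, after an upgrade), the standard library's `importlib.metadata` can report it without importing the package. This is a minimal sketch; `installed_version` is a hypothetical helper, and it assumes the installed distribution is named `SigProfilerSimulator`, as on PyPI above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "SigProfilerSimulator") -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # pip has no record of this distribution in the current environment
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "SigProfilerSimulator is not installed")
```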

### Install specific GitHub Release ###

First, download the [zip file][3] of the desired release, or clone the GitHub repository to get the current development version:
```
$ git clone https://github.com/SigProfilerSuite/SigProfilerSimulator.git
```

Next, enter the downloaded directory and install the package:
```
$ cd SigProfilerSimulator
$ pip install .
```

## Download Reference Genome ##

SigProfilerSimulator requires a reference genome to perform simulations. To install one or more reference genomes, use [SigProfilerMatrixGenerator][2].

The latest PyPI [SigProfilerMatrixGenerator][2] version is installed with SigProfilerSimulator by default. Install your desired reference genome either from the command line or from a Python terminal, as follows.

### Installation from command line ###

```
$ SigProfilerMatrixGenerator install GRCh37
```

### Installation from Python terminal ###

``` python
$ python
>>> from SigProfilerMatrixGenerator import install as genInstall
>>> genInstall.install('GRCh37', rsync=False, bash=True)
```

If you have a firewall on your server, you may need to install `rsync` and use the `rsync=True` parameter. If bash is not available, use `bash=False`.

For a full list of supported reference genomes, refer to the [Supported Genomes][4] section.

[1]: https://www.python.org/downloads
[2]: https://sigprofilersuite.github.io/SigProfilerMatrixGenerator/
[3]: https://github.com/SigProfilerSuite/SigProfilerSimulator/releases
[4]: https://sigprofilersuite.github.io/SigProfilerSimulator/6_supported_genomes.html
54 changes: 54 additions & 0 deletions docs/2_quick_start_example.md
@@ -0,0 +1,54 @@
# Quick Start Example


----------


This section provides a minimal example to get started with SigProfilerSimulator. The following example generates 100 simulations from a VCF input using the GRCh37 reference genome and the SBS-96 context.

----------

## Prerequisites ##

This tutorial requires that you have completed all steps in the [installation guide][1], specifically:

- Installed SigProfilerSimulator
- Downloaded the **GRCh37** reference genome using [SigProfilerMatrixGenerator][2]

## Input data ##

SigProfilerSimulator accepts four input file formats: VCF, MAF, simple text file, and ICGC format. Input files must be placed in an `input/` subdirectory within the project folder:

```
path/to/project/
└── input/
├── sample1.vcf
└── sample2.vcf
```
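Before launching a run, it can help to confirm that the `input/` folder exists and holds the files you expect. The helper below is hypothetical (`collect_input_files` is not part of SigProfilerSimulator); it only checks the folder layout described above and groups the files by extension, without validating their contents.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def collect_input_files(project_path: str) -> Dict[str, List[str]]:
    """Group the files in <project_path>/input/ by lowercase extension.

    Raises FileNotFoundError if the input/ subdirectory is missing.
    """
    input_dir = Path(project_path) / "input"
    if not input_dir.is_dir():
        raise FileNotFoundError(f"expected input directory: {input_dir}")
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(input_dir.iterdir()):
        if path.is_file():
            groups[path.suffix.lower()].append(path.name)
    return dict(groups)
```

For the layout above, `collect_input_files("path/to/project/")` would report both VCF files under the `.vcf` key.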

## Running SigProfilerSimulator ##

Start a Python interactive shell and import SigProfilerSimulator:

``` python
$ python
>>> from SigProfilerSimulator import SigProfilerSimulator as sigSim
```

Run the simulator on your data. **Note**: Update `"path/to/project/"` with the actual path to your project directory.

``` python
>>> sigSim.SigProfilerSimulator("my_project", "path/to/project/", "GRCh37",
...                              contexts=["96"], simulations=100, chrom_based=True)
```

After SigProfilerSimulator has finished, the simulated mutation files will be placed in the `output/` subdirectory of your project folder, organized by context and simulation number.
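The exact folder names under `output/` depend on the contexts and number of simulations requested. A small, hypothetical helper (`list_output_files` is not part of SigProfilerSimulator) that walks the whole `output/` tree avoids hard-coding that layout; it assumes only that results are written somewhere below `output/`.

```python
from pathlib import Path
from typing import List


def list_output_files(project_path: str) -> List[str]:
    """Return every file below <project_path>/output/, as sorted relative paths."""
    output_dir = Path(project_path) / "output"
    return sorted(
        p.relative_to(output_dir).as_posix()
        for p in output_dir.rglob("*")
        if p.is_file()
    )
```

For example, `for rel in list_output_files("path/to/project/"): print(rel)` prints one line per simulated file.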

## Additional Information ##

In the above example, unspecified parameters use their default values. All function arguments are described in detail in the [Using the Tool][3] section. For the full list of supported reference genomes, refer to the [Supported Genomes][4] section.

[1]: https://sigprofilersuite.github.io/SigProfilerSimulator/1_installation.html
[2]: https://sigprofilersuite.github.io/SigProfilerMatrixGenerator/
[3]: https://sigprofilersuite.github.io/SigProfilerSimulator/4_using_the_tool_input.html
[4]: https://sigprofilersuite.github.io/SigProfilerSimulator/6_supported_genomes.html
37 changes: 37 additions & 0 deletions docs/3_workflow.md
@@ -0,0 +1,37 @@
# Workflow


----------


This section describes the methodology used by SigProfilerSimulator to generate realistic simulations of somatic mutations.

----------

## Overview ##

SigProfilerSimulator takes real somatic mutations as input and produces simulated samples by randomly redistributing those mutations across the genome. The redistribution is performed in an unbiased manner, preserving three biological properties of the original data:

- **Sequence context** — each mutation is placed in a position with the same local nucleotide context as the original
- **Transcriptional strand bias** — the strand orientation of each mutation relative to the direction of transcription is maintained
- **Chromosomal mutation burden** — the number of mutations assigned to each chromosome reflects the same proportional distribution as the input

This approach ensures that simulated samples are realistic null hypothesis models of the original mutation landscape, suitable as background distributions for downstream statistical analyses.
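Two of these properties can be checked directly by comparing counts between an input sample and one of its simulations. The sketch below is hypothetical and works on a simplified representation in which each mutation is reduced to a `(chromosome, context)` pair; it compares total counts per chromosome and per context, and does not test transcriptional strand bias.

```python
from collections import Counter
from typing import Iterable, Tuple

# Simplified mutation record: (chromosome, mutational context), e.g. ("1", "A[C>T]G")
Mutation = Tuple[str, str]


def preserves_burden_and_context(original: Iterable[Mutation],
                                 simulated: Iterable[Mutation]) -> bool:
    """True if both samples have identical per-chromosome counts and
    identical counts per mutational context."""
    original = list(original)
    simulated = list(simulated)
    same_chrom = Counter(c for c, _ in original) == Counter(c for c, _ in simulated)
    same_context = Counter(x for _, x in original) == Counter(x for _, x in simulated)
    return same_chrom and same_context
```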

## Simulation Procedure ##

For each simulation, SigProfilerSimulator performs the following steps:

1. **Input parsing** — the input file (VCF, MAF, simple text, or ICGC format) is parsed and mutations are catalogued by sample, chromosome, and mutational context.

2. **Context distribution** — the genomic distribution of available positions for each mutation context is computed from the reference genome. If a BED file or exome restriction is provided, only the targeted regions are considered.

3. **Random placement** — each mutation is randomly assigned to a new position selected from the pool of positions sharing its original context. When `chrom_based=True`, the number of mutations per chromosome is preserved by sampling within each chromosome independently.

4. **Output generation** — simulated mutations are written to MAF or VCF files (one per simulation). Parallel execution across chromosomes accelerates this step for large genomes.
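Steps 2 and 3 can be illustrated with a deliberately simplified sketch. It is not SigProfilerSimulator's implementation: it uses plain trinucleotides on the forward strand only (no pyrimidine collapsing, reverse complements, or strand bias), samples with replacement, and ignores the `spacing`, `overlap`, and BED/exome options. The function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Index = Dict[str, Dict[str, List[int]]]


def index_trinucleotides(genome: Dict[str, str]) -> Index:
    """Step 2 (simplified): map chromosome -> trinucleotide -> 0-based centre positions."""
    index: Index = {}
    for chrom, seq in genome.items():
        by_context: Dict[str, List[int]] = defaultdict(list)
        for i in range(1, len(seq) - 1):
            by_context[seq[i - 1:i + 2].upper()].append(i)
        index[chrom] = dict(by_context)
    return index


def place_mutations(mutations: List[Tuple[str, str]],
                    index: Index,
                    rng: random.Random) -> List[Tuple[str, int, str]]:
    """Step 3 (simplified): draw each new position on the same chromosome,
    from sites whose reference trinucleotide matches the original context."""
    placed = []
    for chrom, context in mutations:
        candidates = index[chrom].get(context.upper())
        if not candidates:
            raise ValueError(f"no {context} site on chromosome {chrom}")
        placed.append((chrom, rng.choice(candidates), context))
    return placed
```

Passing a seeded `random.Random` makes a run reproducible, which is the same goal the tool's `seed_file` parameter serves.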

## Use With SigProfilerClusters ##

Simulated datasets produced by SigProfilerSimulator can be used directly as the background model in [SigProfilerClusters][1]. For this use case, set `chrom_based=True` to ensure per-chromosome normalization of mutation burden.

[1]: https://sigprofilersuite.github.io/SigProfilerClusters/
80 changes: 80 additions & 0 deletions docs/4_using_the_tool_input.md
@@ -0,0 +1,80 @@
# Using SigProfilerSimulator - Input


----------


This section describes SigProfilerSimulator's main function and all available parameters.

----------

## Function ##

The main function in SigProfilerSimulator is `SigProfilerSimulator`. It randomizes the position of each somatic mutation across the genome while preserving the sequence context, transcriptional strand bias, and chromosomal mutational burden of the input data.

### Input files ###

SigProfilerSimulator accepts four input file formats:

- **VCF** — one file per sample.
- **MAF** — standard Mutation Annotation Format file.
- **Simple text file** — tab-delimited plain text format as described in [SigProfilerMatrixGenerator][1].
- **ICGC format** — ICGC simple somatic mutation format.

Input files must be placed in an `input/` subdirectory within the project folder.

### Running the function ###

First, start a Python interactive shell and import SigProfilerSimulator:

``` python
$ python
>>> from SigProfilerSimulator import SigProfilerSimulator as sigSim
```

Then call the function with the required parameters:

``` python
>>> sigSim.SigProfilerSimulator(project, project_path, genome, contexts)
```

### Required parameters ###

| Parameter | Variable Type | Parameter Description |
|-----------|---------------|-----------------------|
| `project` | String | Unique name for the given project |
| `project_path` | String | Path to the project directory. The `input/` subfolder containing the mutation files must exist within this path |
| `genome` | String | Reference genome to use. Must be installed using [SigProfilerMatrixGenerator][1]. Supported genomes: GRCh37, GRCh38, mm9, mm10, rn6, yeast |
| `contexts` | List of Strings | Mutational contexts to simulate. Must be provided as a list (e.g., `["96"]`, `["96", "ID"]`). See the full list of supported contexts below |

### Optional parameters ###

| Parameter | Variable Type | Parameter Description |
|-----------|---------------|-----------------------|
| `simulations` | Integer | Number of simulations to generate. Default: `1` |
| `exome` | Boolean | Restrict simulations to exome regions. Default: `None` (whole genome) |
| `chrom_based` | Boolean | Normalize mutation burden on a per-chromosome basis. Recommended when using the output as background model for [SigProfilerClusters][2]. Default: `False` |
| `gender` | String | Determines whether the Y chromosome is included. Accepted values: `"female"` (default, Y excluded), `"male"` (Y included) |
| `bed_file` | String | Path to a BED file to restrict simulations to user-defined genomic regions. Default: `None` |
| `vcf` | Boolean | Output simulated mutations as VCF files. When `False`, output is in MAF format. Default: `False` |
| `seqInfo` | Boolean | Save the sequence context information for each simulated mutation. Default: `False` |
| `seed_file` | String | Path to a file containing seeds for reproducible simulations. Default: `None` |
| `noisePoisson` | Boolean | Add Poisson-distributed noise to the simulated mutations. Default: `False` |
| `noiseUniform` | Float | Add uniform noise to the simulated mutations. Default: `0` |
| `spacing` | Integer | Minimum spacing (in bp) enforced between simulated mutations. Default: `1` |
| `cushion` | Integer | Cushion (in bp) around the edges of BED file regions within which mutations will not be placed. Default: `100` |
| `overlap` | Boolean | Allow simulated mutations to overlap. Default: `False` |
| `updating` | Boolean | Update mutation types during simulation. Default: `False` |
| `region` | String | Restrict simulations to a single chromosome (e.g., `"1"`). Default: `None` |
| `mask` | String | Path to a mask file to exclude specific genomic regions from simulations. Default: `None` |
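The `bed_file` and `cushion` parameters work together. The sketch below writes a minimal three-column, tab-separated BED file (0-based start, exclusive end, no header) and shows the span that remains after trimming `cushion` bp from each region edge, following the description of `cushion` in the table above. Both helpers are hypothetical, and whether SigProfilerSimulator expects chromosome names with or without a `chr` prefix, or a header line, is an assumption to check against your own data.

```python
from pathlib import Path
from typing import Iterable, Optional, Tuple

Region = Tuple[str, int, int]  # (chromosome, 0-based start, exclusive end)


def write_bed(regions: Iterable[Region], path: str) -> Path:
    """Write a minimal three-column, tab-separated BED file with no header."""
    out = Path(path)
    out.write_text("".join(f"{c}\t{s}\t{e}\n" for c, s, e in regions))
    return out


def placeable_span(region: Region, cushion: int = 100) -> Optional[Region]:
    """Trim `cushion` bp from both edges of a region; None if nothing remains."""
    chrom, start, end = region
    inner_start, inner_end = start + cushion, end - cushion
    return (chrom, inner_start, inner_end) if inner_start < inner_end else None
```

With the default `cushion` of 100 bp, a region shorter than 200 bp leaves no room for simulated mutations under this interpretation. The path returned by `write_bed` can then be passed as `bed_file=str(path)`.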

### Supported contexts ###

| Mutation type | Accepted context values |
|---------------|------------------------|
| Single Base Substitutions (SBS) | `"6"`, `"24"`, `"96"`, `"288"`, `"384"`, `"1536"`, `"6144"` |
| Insertions and Deletions (ID) | `"ID"`, `"ID415"` |
| Double Base Substitutions (DBS) | `"DBS"`, `"DBS186"` |
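A common slip is passing `contexts="96"` instead of `contexts=["96"]`. The hypothetical helper below (`check_contexts` is not part of SigProfilerSimulator) validates the argument before the call, using only the values listed in the table above.

```python
from typing import Iterable, List

# Accepted values copied from the "Supported contexts" table.
SUPPORTED_CONTEXTS = {
    "SBS": {"6", "24", "96", "288", "384", "1536", "6144"},
    "ID": {"ID", "ID415"},
    "DBS": {"DBS", "DBS186"},
}
ALL_CONTEXTS = set().union(*SUPPORTED_CONTEXTS.values())


def check_contexts(contexts: Iterable[str]) -> List[str]:
    """Reject a bare string or any context not in the table; return a list."""
    if isinstance(contexts, str):
        raise TypeError('contexts must be a list such as ["96"], not a string')
    contexts = list(contexts)
    unknown = [c for c in contexts if c not in ALL_CONTEXTS]
    if unknown:
        raise ValueError(f"unsupported context(s): {unknown}")
    return contexts
```

Usage: `sigSim.SigProfilerSimulator(project, project_path, genome, check_contexts(["96", "ID"]))`.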

[1]: https://sigprofilersuite.github.io/SigProfilerMatrixGenerator/
[2]: https://sigprofilersuite.github.io/SigProfilerClusters/