Running on Mutational Data

Yo Akiyama edited this page Jun 24, 2021 · 5 revisions

Mutational Signatures

This page describes how to decompose mutational signatures with signatureanalyzer. For a comprehensive description of mutational signatures, their relevance, and references, see the Catalogue of Somatic Mutations in Cancer (COSMIC). The sections below cover the main considerations when running the method.


Objective Function

For mutational signatures, we assume that counts follow a Poisson distribution and use Févotte & Tan's derivation of a Poisson objective function for ARD-NMF. The objective function should therefore be left at its default value (poisson).

Use:

--objective poisson
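
To make the link between the Poisson assumption and the objective explicit, the sketch below writes out the negative log-likelihood. The notation is assumed here for illustration and is not taken from the signatureanalyzer source: V is the F × N count matrix (mutation contexts by samples), and W (F × K) and H (K × N) are the signature and activity matrices.

```latex
v_{fn} \sim \mathrm{Poisson}\big([WH]_{fn}\big)
\quad\Longrightarrow\quad
-\log p(V \mid W, H)
  = \sum_{f=1}^{F} \sum_{n=1}^{N}
    \Big( [WH]_{fn} - v_{fn} \log [WH]_{fn} + \log v_{fn}! \Big)

D_{\mathrm{KL}}(V \,\|\, WH)
  = \sum_{f=1}^{F} \sum_{n=1}^{N}
    \Big( v_{fn} \log \frac{v_{fn}}{[WH]_{fn}} - v_{fn} + [WH]_{fn} \Big)
```

The two expressions differ only by terms that do not depend on W or H, so minimizing the Poisson negative log-likelihood is equivalent to minimizing the generalized Kullback-Leibler divergence between V and WH.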

Human Genome (Hg) Build

Select which human genome build to use for mapping. Base contexts are built from a 2-bit representation of the genome build. The 2-bit files can be downloaded from UCSC:

  • hg38: wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit
  • hg19: wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit

Use:

--hg_build <PATH>/hg19.2bit
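
The download step can be wrapped in a small shell helper. This is only a sketch: hg_build_url is a hypothetical function name, not part of signatureanalyzer, and it simply reproduces the two UCSC URLs listed above.

```shell
# Hypothetical helper: print the UCSC download URL for a supported build.
hg_build_url() {
  case "$1" in
    hg19|hg38)
      printf 'http://hgdownload.soe.ucsc.edu/goldenPath/%s/bigZips/%s.2bit\n' "$1" "$1"
      ;;
    *)
      # Only the two builds named on this page are handled.
      echo "unsupported build: $1 (expected hg19 or hg38)" >&2
      return 1
      ;;
  esac
}

# Example (requires network access):
#   wget "$(hg_build_url hg19)"
```

The downloaded file can then be passed to --hg_build by its path.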

COSMIC Signatures

Signature Analyzer supports encoding of:

  • Single Base Substitution (SBS) Signatures (WGS: cosmic3, WES: cosmic3_exome)
  • Doublet Base Substitution (DBS) Signatures (DBS: cosmic3_DBS)
  • Small Insertion & Deletion (ID) Signatures (ID: cosmic3_ID)

PCAWG Signatures

Signature Analyzer supports encoding of:

  • 1536 Single Base Substitution (SBS) Signatures (pcawg_SBS)
  • Composite Signatures (1536 SBS: pcawg_COMPOSITE, 96 SBS: pcawg_COMPOSITE96)
  • SBS + ID Signatures (1536 SBS: pcawg_SBS_ID, 96 SBS: pcawg_SBS96_ID)
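
The 96 and 1536 in the reference names are the number of SBS channels: six pyrimidine-centered substitution classes combined with one flanking base on each side (6 × 4 × 4 = 96) or two flanking bases on each side (6 × 16 × 16 = 1536). The sketch below enumerates both sets. The function name sbs_contexts and the COSMIC-style A[C>T]G label format are assumptions for illustration; the labels signatureanalyzer uses internally may differ.

```shell
# Hypothetical helper: list SBS channels for a flank width of 1 (96 channels)
# or 2 (1536 channels). The body runs in a subshell, so its variables do not
# leak into the caller and "exit" only leaves the function.
sbs_contexts() (
  bases='A C G T'
  case "${1:-1}" in
    1) flanks=$bases ;;
    2) flanks=''
       for a in $bases; do
         for b in $bases; do flanks="$flanks $a$b"; done
       done ;;
    *) echo "flank width must be 1 or 2" >&2; exit 1 ;;
  esac
  # Six substitution classes, each written with a pyrimidine (C or T) reference base.
  for sub in 'C>A' 'C>G' 'C>T' 'T>A' 'T>C' 'T>G'; do
    for five in $flanks; do
      for three in $flanks; do
        printf '%s[%s]%s\n' "$five" "$sub" "$three"
      done
    done
  done
)
```

Running `sbs_contexts 1 | wc -l` and `sbs_contexts 2 | wc -l` gives a quick check of the two channel counts.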

POLE/POLE-exo* + MSI Signatures

Signature Analyzer supports encoding of:

  • 1536 Single Base Substitution + Indel* Signatures
  • 96 Single Base Substitution + Indel* Signatures

*Indel Features: INS1, … , INS>=4, DEL1, … , DEL>=4

Use:

--reference {cosmic2,cosmic3,cosmic3_exome,cosmic3_DBS,cosmic3_ID,cosmic3_TSB,
             pcawg_SBS,pcawg_COMPOSITE,pcawg_COMPOSITE96,pcawg_SBS_ID,pcawg_SBS96_ID,
             polymerase_msi,polymerase_msi96}
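
The reference names mix upper and lower case, so a typo is easy to make. A minimal pre-flight check, assuming a hypothetical helper named is_valid_reference (not part of signatureanalyzer), could compare a value against the choices listed above before a long run is started:

```shell
# Hypothetical pre-flight check: succeed only if $1 is one of the
# --reference choices listed on this page.
is_valid_reference() {
  case "$1" in
    cosmic2|cosmic3|cosmic3_exome|cosmic3_DBS|cosmic3_ID|cosmic3_TSB|\
    pcawg_SBS|pcawg_COMPOSITE|pcawg_COMPOSITE96|pcawg_SBS_ID|pcawg_SBS96_ID|\
    polymerase_msi|polymerase_msi96)
      return 0 ;;
    *)
      echo "unknown --reference value: $1" >&2
      return 1 ;;
  esac
}

# Example:
#   is_valid_reference cosmic3_exome && echo "ok"
```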

Prior on H & W

We generally impose an exponential (L1) prior on the W & H matrices for non-negative matrix factorization.

Use:

--prior_on_H L1 --prior_on_W L1
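
For reference, the exponential prior and the L1 penalty it induces can be written out as follows. This follows the ARD-NMF formulation of Tan & Févotte; the symbols (λ_k as the relevance weight of component k, F as the number of rows of W) are notation assumed here, and the exact parameterization inside signatureanalyzer may differ.

```latex
p(w_{fk} \mid \lambda_k)
  = \frac{1}{\lambda_k} \exp\!\left(-\frac{w_{fk}}{\lambda_k}\right),
\qquad w_{fk} \ge 0

-\sum_{f=1}^{F} \log p(w_{fk} \mid \lambda_k)
  = \frac{\lVert w_k \rVert_1}{\lambda_k} + F \log \lambda_k
```

The negative log-prior is the L1 norm of column k of W scaled by 1/λ_k, plus a term that does not depend on W, which is why the exponential prior is labelled L1. The same form applies to the rows of H.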

Running the method

This method may be run in two ways: from a .maf file or from a spectra file (.txt, .parquet, .txt.gz, .csv).

  • Mutation Annotation Format (.maf): for details on this format, see the reference on NCI's Genomic Data Commons website.
    • If this option is used, signatureanalyzer generates the spectra from the .maf according to the selected --reference option.
  • Spectra: use this option to supply pre-computed mutational spectra (e.g., 96-base context; see the COSMIC site or Generating Mutational Spectra).

Use:

signatureanalyzer -n 10 \
                  --reference cosmic3_exome \
                  --hg_build hg38.2bit \
                  --objective poisson \
                  --max_iter 30000 \
                  --prior_on_H L1 \
                  --prior_on_W L1 \
                  input.maf
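
Since a run with --max_iter 30000 can take a while, it may help to confirm that the input files exist before starting. The wrapper below is a sketch under stated assumptions: run_signatureanalyzer and the DRY_RUN variable are hypothetical names introduced here, and the function uses only the flags shown in the command above.

```shell
# Hypothetical wrapper: check inputs, then run (or, with DRY_RUN=1, print)
# the command shown above. Arguments: <input.maf> <genome.2bit> <reference>.
run_signatureanalyzer() {
  sa_maf=$1 sa_hg=$2 sa_ref=$3
  for sa_file in "$sa_maf" "$sa_hg"; do
    if [ ! -s "$sa_file" ]; then
      echo "missing or empty file: $sa_file" >&2
      return 1
    fi
  done
  # Reuse the function's positional parameters to hold the full command line.
  set -- signatureanalyzer -n 10 \
         --reference "$sa_ref" \
         --hg_build "$sa_hg" \
         --objective poisson \
         --max_iter 30000 \
         --prior_on_H L1 \
         --prior_on_W L1 \
         "$sa_maf"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # print the command instead of running it
  else
    "$@"
  fi
}

# Example:
#   DRY_RUN=1
#   run_signatureanalyzer input.maf hg38.2bit cosmic3_exome
```

With DRY_RUN=1 the function prints the command instead of executing it, which is a cheap way to check paths and the chosen reference first.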
