Running on Mutational Data
This page covers the decomposition of mutational signatures using signatureanalyzer. For a comprehensive description of mutational signatures, their relevance, and references, please see the Catalogue of Somatic Mutations in Cancer (COSMIC). The following is a reference for important considerations when running this method.
For mutational signatures, we assume a Poisson distribution of counts and use Févotte & Tan's derivation of a Poisson objective function for ARD-NMF. It is therefore important to keep the default value of the objective function (poisson).
Use:
--objective poisson
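To make the Poisson assumption concrete: maximizing a Poisson likelihood of the counts V given the reconstruction WH is equivalent, up to terms that do not depend on W and H, to minimizing the generalized Kullback-Leibler divergence between them. The sketch below computes only that data-fit term in plain Python. The function name and the list-of-lists layout are illustrative assumptions, not part of signatureanalyzer, and the prior terms that ARD-NMF adds are left out.

```python
import math


def poisson_objective(V, W, H):
    """Generalized KL divergence D(V || WH).

    Equals the Poisson negative log-likelihood of V given WH, up to
    terms that are constant in W and H. V, W, H are lists of lists;
    every entry of WH is assumed to be strictly positive.
    """
    n_rows, n_cols, rank = len(V), len(V[0]), len(H)
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Reconstructed count for cell (i, j)
            v_hat = sum(W[i][k] * H[k][j] for k in range(rank))
            v = V[i][j]
            # Convention: 0 * log(0) = 0, so skip the log term when v == 0
            if v > 0:
                total += v * math.log(v / v_hat)
            total += v_hat - v
    return total


# A perfect rank-1 reconstruction has zero divergence.
V = [[2.0, 4.0], [1.0, 2.0]]
W = [[2.0], [1.0]]
H = [[1.0, 2.0]]
print(poisson_objective(V, W, H))
```

The divergence is zero only when WH reproduces V exactly and grows as the reconstruction drifts from the observed counts.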
Select which human genome build to use for mapping. We build base contexts using a 2-bit representation of the genome build. These may be downloaded here:
- hg38:
  wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit
- hg19:
  wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit
Use:
--hg_build <PATH>/hg19.2bit
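Because the .2bit files are large, a truncated or mislabeled download is easy to miss. As a quick sanity check, the sketch below tests whether a file starts with the 2bit signature 0x1A412743, which the UCSC format stores in the byte order of the machine that wrote the file. The helper names are hypothetical and this is not something signatureanalyzer provides.

```python
import struct

TWOBIT_SIGNATURE = 0x1A412743  # magic number at the start of a .2bit file


def is_twobit_header(header):
    """Return True if the first 4 bytes match the 2bit signature.

    The signature may be stored little- or big-endian, so both are checked.
    """
    if len(header) < 4:
        return False
    little = struct.unpack("<I", header[:4])[0]
    big = struct.unpack(">I", header[:4])[0]
    return TWOBIT_SIGNATURE in (little, big)


def is_twobit_file(path):
    """Read the first 4 bytes of `path` and check the 2bit signature."""
    with open(path, "rb") as handle:
        return is_twobit_header(handle.read(4))
```

For example, `is_twobit_file("hg19.2bit")` should return True for an intact download. The check only covers the header and does not validate the rest of the file.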
For the COSMIC references, Signature Analyzer supports encoding of:
- Single Base Substitution (SBS) Signatures (WGS: cosmic3, WES: cosmic3_exome)
- Doublet Base Substitution (DBS) Signatures (DBS: cosmic3_DBS)
- Small Insertion & Deletion (ID) Signatures (ID: cosmic3_ID)
For the PCAWG references, Signature Analyzer supports encoding of:
- 1536 Single Base Substitution (SBS) Signatures (pcawg_SBS)
- Composite Signatures (1536 SBS: pcawg_COMPOSITE, 96 SBS: pcawg_COMPOSITE96)
- SBS + ID Signatures (1536 SBS: pcawg_SBS_ID, 96 SBS: pcawg_SBS96_ID)
Signature Analyzer also supports encoding of:
- 1536 Single Base Substitution + Indel* Signatures
- 96 Single Base Substitution + Indel* Signatures
*Indel Features: INS1, … , INS>=4, DEL1, … , DEL>=4
Use:
--reference {cosmic2,cosmic3,cosmic3_exome,cosmic3_DBS,cosmic3_ID,cosmic3_TSB,
pcawg_SBS,pcawg_COMPOSITE,pcawg_COMPOSITE96,pcawg_SBS_ID,pcawg_SBS96_ID,
polymerase_msi,polymerase_msi96}
We generally impose an exponential (L1) prior on the W & H matrices for non-negative matrix factorization.
Use:
--prior_on_H L1 --prior_on_W L1
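As a sketch of why the exponential prior is labeled L1, assume each column k of W has its own scale (relevance) parameter, written here as \lambda_k, and that W has F rows. This notation is an assumption for illustration and does not appear on this page. Under that assumption, the negative log of an exponential prior reduces to a weighted L1 norm of each column plus a term that depends only on the scales:

```latex
p(w_{fk} \mid \lambda_k) = \frac{1}{\lambda_k} \exp\!\left(-\frac{w_{fk}}{\lambda_k}\right), \qquad w_{fk} \ge 0

-\log p(W \mid \lambda)
  = \sum_{k} \frac{1}{\lambda_k} \sum_{f=1}^{F} w_{fk} + F \sum_{k} \log \lambda_k
  = \sum_{k} \frac{\lVert w_k \rVert_1}{\lambda_k} + F \sum_{k} \log \lambda_k
```

The same argument applies to the rows of H. Components whose scale shrinks toward zero are penalized heavily and are effectively pruned, which is the automatic relevance determination behavior.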
This method may be run in two ways: from a .maf file or from a spectra file (.txt, .parquet, .txt.gz, .csv).
- Mutation Annotation Format: for details on this format (.maf), please see the reference on NCI's Genomic Data Commons website
  - If this option is used, signatureanalyzer will generate a spectrum from the .maf based on which --reference option is selected
- Spectra: this option is provided for users who want to supply pre-computed mutational spectra (e.g., 96-base context; see the COSMIC site or Generating Mutational Spectra)
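To illustrate what a 96-base context spectrum contains, the sketch below maps a single base substitution and its trinucleotide context to the pyrimidine-normalized label used in COSMIC-style SBS96 catalogues (for example A[C>T]G) and tallies the labels. The function names are hypothetical, and the exact row labels and file layout that signatureanalyzer expects in a spectra file are not specified here, so treat this as a conceptual aid only.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs96_category(trinucleotide, alt):
    """Return the pyrimidine-normalized SBS96 label, e.g. 'A[C>T]G'.

    `trinucleotide` is the reference base with one flanking base on each
    side; `alt` is the mutated base on the same strand.
    """
    tri, alt = trinucleotide.upper(), alt.upper()
    if len(tri) != 3 or any(b not in COMPLEMENT for b in tri + alt):
        raise ValueError("expected a 3-base context and a single A/C/G/T alt")
    ref = tri[1]
    if alt == ref:
        raise ValueError("alt must differ from the reference base")
    if ref in "AG":
        # Purine reference: report the mutation on the opposite strand
        tri = reverse_complement(tri)
        alt = COMPLEMENT[alt]
        ref = tri[1]
    return f"{tri[0]}[{ref}>{alt}]{tri[2]}"


def sbs96_spectrum(mutations):
    """Count SBS96 labels over an iterable of (trinucleotide, alt) pairs."""
    return Counter(sbs96_category(tri, alt) for tri, alt in mutations)


# Both calls describe the same event seen from opposite strands.
print(sbs96_category("ACG", "T"))
print(sbs96_category("CGT", "A"))
```

Folding purine references onto the opposite strand is what reduces the 192 possible strand-specific substitutions in context to 96 categories.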
Use:
signatureanalyzer -n 10 \
--reference cosmic3_exome \
--hg_build hg38.2bit \
--objective poisson \
--max_iter 30000 \
--prior_on_H L1 \
--prior_on_W L1 \
input.maf
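When the same settings are applied to several inputs, it can help to assemble the command programmatically. The sketch below builds the argument list using only the flags shown above. The helper name and its defaults are illustrative assumptions, and the command is printed and not executed.

```python
import shlex


def build_command(input_path, reference="cosmic3_exome",
                  hg_build="hg38.2bit", n=10, max_iter=30000):
    """Assemble the signatureanalyzer call shown above as an argument list.

    `n` is passed through to -n exactly as in the example command.
    """
    return [
        "signatureanalyzer",
        "-n", str(n),
        "--reference", reference,
        "--hg_build", hg_build,
        "--objective", "poisson",
        "--max_iter", str(max_iter),
        "--prior_on_H", "L1",
        "--prior_on_W", "L1",
        input_path,
    ]


cmd = build_command("input.maf")
print(shlex.join(cmd))
# To run it, pass the list to subprocess.run(cmd, check=True).
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.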