Plot mutation density in positively selected genes as internal control #437

@efigb

These plots could be added as QC based on the omega results:

  • Plot number of mutations/mutation density in positively selected genes vs rest of genes for all regions, all mutation types and all samples
  • Plot number of mutations/mutation density in positively selected genes per sample

Note that this requires a list of driver genes computed by omega across all cohort samples. It could be taken from the genes listed in the deepCSA output `plots/selection_summary/all_samples.plots/all_samples.positive_selection_summary.pdf`.
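Since the summary is a PDF, the driver list would probably have to be supplied separately. A minimal sketch, assuming the genes are provided as plain-text lines with one gene symbol per line (this input format, the function name `read_gene_list`, and the example gene symbols are all hypothetical and not part of deepCSA):

```python
def read_gene_list(lines):
    """Parse gene symbols from an iterable of text lines.

    Blank lines and lines starting with '#' are skipped, and duplicates
    are dropped while keeping the first occurrence.
    """
    genes = []
    seen = set()
    for raw in lines:
        name = raw.strip()
        if not name or name.startswith("#"):
            continue
        if name not in seen:
            seen.add(name)
            genes.append(name)
    return genes


# Illustrative input only; in practice the lines would come from a file
# holding the genes reported as positively selected by omega.
example_lines = ["# drivers from omega (illustrative)", "TP53", "", "NOTCH1", "TP53"]
genes_pos_selection_cohort = read_gene_list(example_lines)
print(genes_pos_selection_cohort)
```

The resulting list can be used directly as `genes_pos_selection_cohort` in the plotting code.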

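# (1) Plot number of mutations/mutation density in positively selected genes vs other genes for all regions, all mutation types and all samples
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# mut_df is the original table from deepCSA output: `/mutdensity/all_mutdensities.tsv`
# genes_pos_selection_cohort is the list of positively selected genes in the cohort (see note above)
# Replace 'N_MUTS' with 'MUTDENSITY_MB' or 'MUTDENSITY_MB_ADJUSTED' to run the same analysis on mutation density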

# Flag cohort drivers, then apply filters to keep cohort-level, per-gene rows
mut_df['DRIVER_IN_COHORT'] = mut_df['GENE'].isin(genes_pos_selection_cohort).map({True: 'Yes', False: 'No'})
selection1 = mut_df[(mut_df['GENE'] != 'ALL_GENES') &
                    (mut_df['REGIONS'] == 'all') &
                    (mut_df['MUTTYPES'] == 'all_types') &
                    (mut_df['SAMPLE_ID'] == 'all_samples')][['SAMPLE_ID', 'GENE', 'DEPTH', 'N_MUTS', 'MUTDENSITY_MB', 'MUTDENSITY_MB_ADJUSTED', 'DRIVER_IN_COHORT']]

plt.figure(figsize=(5,5))
sns.boxplot(data=selection1, x='DRIVER_IN_COHORT', y='N_MUTS')
sns.stripplot(data=selection1, x='DRIVER_IN_COHORT', y='N_MUTS', color='orange', alpha=0.5)

# Add statistical annotation (two-sided Mann-Whitney U test, drivers vs other genes)
is_driver = selection1['DRIVER_IN_COHORT'] == 'Yes'
stat, p_value = stats.mannwhitneyu(selection1.loc[is_driver, 'N_MUTS'],
                                   selection1.loc[~is_driver, 'N_MUTS'],
                                   alternative='two-sided')
plt.text(0.5, selection1['N_MUTS'].max() * 0.95, f'p = {p_value:.3e}', ha='center', fontsize=12)
plt.title('All mutation types, all regions, all samples', fontsize=14)
plt.show()

# (2) Plot number of mutations/mutation density in positively selected genes per sample
# Here we add a filter to keep only the positively selected genes
# (requires the DRIVER_IN_COHORT column created in (1))
selection2 = mut_df[(mut_df['GENE'] != 'ALL_GENES') &
                    (mut_df['REGIONS'] == 'all') &
                    (mut_df['MUTTYPES'] == 'all_types') &
                    (mut_df['SAMPLE_ID'] != 'all_samples') &
                    (mut_df['DRIVER_IN_COHORT'] == 'Yes')][['SAMPLE_ID', 'GENE', 'DEPTH', 'N_MUTS', 'MUTDENSITY_MB', 'MUTDENSITY_MB_ADJUSTED', 'DRIVER_IN_COHORT']]

# Scale the figure width with the number of genes (at least 1 to avoid a zero-width figure)
plt.figure(figsize=(max(selection2['GENE'].nunique(), 1), 8))
# Sort so that genes appear on the x-axis by decreasing maximum N_MUTS
selection2 = selection2.sort_values(by='N_MUTS', ascending=False)
sns.boxplot(data=selection2, x='GENE', y='N_MUTS')
sns.stripplot(data=selection2, x='GENE', y='N_MUTS', color='orange', alpha=0.5)
plt.xticks(rotation=90)
plt.title('All mutation types, all regions, per sample', fontsize=14)
plt.show()
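The two snippets above differ only in the sample filter and in which metric column is plotted, so the selection step could be factored into one function. A rough sketch, assuming the column names used above; the helper name `select_metric` and its parameters are hypothetical, not an existing deepCSA function:

```python
import pandas as pd


def select_metric(mut_df, driver_genes, metric="N_MUTS", per_sample=False):
    """Filter the mutation density table and flag cohort drivers.

    Hypothetical helper: name and signature are illustrative only.
    metric can be 'N_MUTS', 'MUTDENSITY_MB' or 'MUTDENSITY_MB_ADJUSTED'.
    """
    if metric not in mut_df.columns:
        raise KeyError(f"column {metric!r} not found in mut_df")

    # Same filters as above: per-gene rows, all regions, all mutation types
    mask = (
        (mut_df["GENE"] != "ALL_GENES")
        & (mut_df["REGIONS"] == "all")
        & (mut_df["MUTTYPES"] == "all_types")
    )
    # Cohort-level rows for plot (1), individual samples for plot (2)
    is_cohort_row = mut_df["SAMPLE_ID"] == "all_samples"
    mask &= (~is_cohort_row) if per_sample else is_cohort_row

    out = mut_df.loc[mask, ["SAMPLE_ID", "GENE", metric]].copy()
    out["DRIVER_IN_COHORT"] = (
        out["GENE"].isin(driver_genes).map({True: "Yes", False: "No"})
    )
    if per_sample:
        # Plot (2) only shows the positively selected genes
        out = out[out["DRIVER_IN_COHORT"] == "Yes"]
    return out.reset_index(drop=True)
```

For example, `select_metric(mut_df, genes_pos_selection_cohort, metric='MUTDENSITY_MB')` would return the table for plot (1) on mutation density, and adding `per_sample=True` would return the one for plot (2).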

Labels

discuss (any potential implementation that requires discussion), plot (for plotting related issues)
