Note: this requires a list of driver genes called by omega across all cohort samples (`genes_pos_selection_cohort` in the code). It could be taken from the genes listed in the deepCSA output `plots/selection_summary/all_samples.plots/all_samples.positive_selection_summary.pdf`.
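Since that summary is a PDF, it cannot be loaded directly as a table, so the driver list has to be supplied separately. A minimal sketch, assuming a hypothetical plain-text file with one gene symbol per line (this file is not a deepCSA output; the helper name `parse_driver_genes` and the gene symbols in the example are illustrative only):

```python
def parse_driver_genes(lines):
    """Return a set of gene symbols from an iterable of text lines.

    Blank lines and lines starting with '#' are skipped.
    The one-gene-per-line layout is an assumption, not a deepCSA format.
    """
    genes = set()
    for line in lines:
        symbol = line.strip()
        if symbol and not symbol.startswith("#"):
            genes.add(symbol)
    return genes


# Illustrative input; in practice this could be an open file handle, e.g.
#   with open("drivers_cohort.txt") as fh:   # hypothetical file name
#       genes_pos_selection_cohort = parse_driver_genes(fh)
example_lines = ["# omega drivers, all_samples", "TP53", "", "NOTCH1 "]
genes_pos_selection_cohort = parse_driver_genes(example_lines)
```

Using a set keeps the membership check in the `DRIVER_IN_COHORT` flag fast even for long gene lists.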
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# (1) Plot number of mutations / mutation density in positively selected genes vs. other genes, for all regions, all mutation types and all samples
# mut_df is the original table from the deepCSA output `/mutdensity/all_mutdensities.tsv`
# To run the same analysis on mutation density, replace 'N_MUTS' with 'MUTDENSITY_MB' or 'MUTDENSITY_MB_ADJUSTED'
# genes_pos_selection_cohort is the list of genes under positive selection in the cohort (from omega)
mut_df['DRIVER_IN_COHORT'] = mut_df['GENE'].apply(lambda x: 'Yes' if x in genes_pos_selection_cohort else 'No')
# Apply filters to data: per-gene rows, all regions, all mutation types, cohort-level aggregate
selection1 = mut_df[(mut_df['GENE'] != 'ALL_GENES') &
                    (mut_df['REGIONS'] == 'all') &
                    (mut_df['MUTTYPES'] == 'all_types') &
                    (mut_df['SAMPLE_ID'] == 'all_samples')][['SAMPLE_ID', 'GENE', 'DEPTH', 'N_MUTS', 'MUTDENSITY_MB', 'MUTDENSITY_MB_ADJUSTED', 'DRIVER_IN_COHORT']]
plt.figure(figsize=(5, 5))
sns.boxplot(data=selection1, x='DRIVER_IN_COHORT', y='N_MUTS')
sns.stripplot(data=selection1, x='DRIVER_IN_COHORT', y='N_MUTS', color='orange', alpha=0.5)
# Add statistical annotation: two-sided Mann-Whitney U test, driver vs. non-driver genes
drivers = selection1.loc[selection1['DRIVER_IN_COHORT'] == 'Yes', 'N_MUTS']
non_drivers = selection1.loc[selection1['DRIVER_IN_COHORT'] == 'No', 'N_MUTS']
stat, p_value = stats.mannwhitneyu(drivers, non_drivers, alternative='two-sided')
plt.text(0.5, selection1['N_MUTS'].max() * 0.95, f'p = {p_value:.3e}', ha='center', fontsize=12)
plt.title('All mutation types and all regions, all samples', fontsize=14)
plt.show()
# (2) Plot number of mutations / mutation density in positively selected genes per sample
# Here we add a filter to keep only the positively selected genes, and use per-sample rows instead of the cohort aggregate
selection2 = mut_df[(mut_df['GENE'] != 'ALL_GENES') &
                    (mut_df['REGIONS'] == 'all') &
                    (mut_df['MUTTYPES'] == 'all_types') &
                    (mut_df['SAMPLE_ID'] != 'all_samples') &
                    (mut_df['DRIVER_IN_COHORT'] == 'Yes')][['SAMPLE_ID', 'GENE', 'DEPTH', 'N_MUTS', 'MUTDENSITY_MB', 'MUTDENSITY_MB_ADJUSTED', 'DRIVER_IN_COHORT']]
# Sort so that genes appear on the x axis in order of their highest per-sample value
selection2 = selection2.sort_values(by='N_MUTS', ascending=False)
# Scale the figure width with the number of genes (at least 1 to avoid a zero-width figure)
plt.figure(figsize=(max(selection2['GENE'].nunique(), 1), 8))
sns.boxplot(data=selection2, x='GENE', y='N_MUTS')
sns.stripplot(data=selection2, x='GENE', y='N_MUTS', color='orange', alpha=0.5)
plt.xticks(rotation=90)
plt.title('All mutation types and all regions, per sample', fontsize=14)
plt.show()
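The two selections differ only in the `SAMPLE_ID` condition and the driver-only filter, so they can be produced by one helper. A minimal sketch, assuming the column names used in the code; the function name `flag_and_filter` and the toy rows (including the `'SNV'` mutation type and all numeric values) are hypothetical and only serve to make the example runnable:

```python
import pandas as pd

COLUMNS = ['SAMPLE_ID', 'GENE', 'DEPTH', 'N_MUTS', 'MUTDENSITY_MB',
           'MUTDENSITY_MB_ADJUSTED', 'DRIVER_IN_COHORT']


def flag_and_filter(mut_df, driver_genes, per_sample=False):
    """Flag driver genes and return the rows used for plot (1) or plot (2)."""
    df = mut_df.copy()  # leave the caller's table untouched
    df['DRIVER_IN_COHORT'] = df['GENE'].isin(driver_genes).map({True: 'Yes', False: 'No'})
    mask = ((df['GENE'] != 'ALL_GENES') &
            (df['REGIONS'] == 'all') &
            (df['MUTTYPES'] == 'all_types'))
    if per_sample:
        # Plot (2): individual samples, positively selected genes only
        mask &= (df['SAMPLE_ID'] != 'all_samples') & (df['DRIVER_IN_COHORT'] == 'Yes')
    else:
        # Plot (1): cohort-level aggregate, all genes
        mask &= df['SAMPLE_ID'] == 'all_samples'
    return df.loc[mask, COLUMNS]


# Toy table with made-up values, only to exercise the filters
toy = pd.DataFrame(
    [('all_samples', 'TP53', 'all', 'all_types', 100, 50, 0.5, 0.4),
     ('all_samples', 'KRAS', 'all', 'all_types', 100, 5, 0.05, 0.04),
     ('all_samples', 'ALL_GENES', 'all', 'all_types', 200, 55, 0.3, 0.2),
     ('S1', 'TP53', 'all', 'all_types', 10, 7, 0.7, 0.6),
     ('S1', 'KRAS', 'all', 'all_types', 10, 1, 0.1, 0.1),
     ('S1', 'TP53', 'all', 'SNV', 10, 6, 0.6, 0.5)],
    columns=['SAMPLE_ID', 'GENE', 'REGIONS', 'MUTTYPES', 'DEPTH', 'N_MUTS',
             'MUTDENSITY_MB', 'MUTDENSITY_MB_ADJUSTED'])

cohort_rows = flag_and_filter(toy, {'TP53'})
sample_rows = flag_and_filter(toy, {'TP53'}, per_sample=True)
```

The returned tables can be passed to the same `sns.boxplot` / `sns.stripplot` calls, with `y` set to `'N_MUTS'` or to one of the mutation density columns.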
These plots could be added as QC plots for omega.