Current Status: Functional annotation steps (Pfam, DeepTMHMM) are currently running on the full dataset. Downstream isoform switch metrics are being integrated.
Hürthle cell carcinoma (HCC) is a subtype of thyroid cancer, accounting for 3-5% of all thyroid malignancies. HCC is characterised by an abundance of malfunctioning mitochondria and poor response to radioiodine therapy. While prior studies have documented mitochondrial complex I DNA mutations and metabolomic vulnerabilities in HCC, the transcriptomic landscape remains largely unexplored. No studies to date have specifically characterised isoform switching events in HCC.
The analysis is performed on an NCBI GEO dataset and explores the expression profiles of different isoforms in HCC tissues. The original study explored the metabolomic profiles of HCC and identified mitochondrial complex I loss, together with lipid peroxide stress, as a vulnerability in HCC. This analysis, performed on a subset of samples, explores isoform profiles in HCC, identifies major genes undergoing functional isoform switching, and highlights alternative splicing mechanisms that may drive the pathogenesis of the disease.
- Quantify the expression of transcripts between normal & HCC tissues.
- Identify transcript isoforms in HCC tissues.
- Predict functional consequences & alternative splicing event mechanisms in HCC tissues.
The dataset for this analysis was obtained from NCBI GEO (accession ID GSE228870).
- Data Import
  - Download the paired-end raw reads (FASTQ files) associated with the NCBI GEO series.
  - Tool: fasterq-dump
- Initial QC
  - Check the quality of the raw reads, including per-base sequence quality, adapter content and GC content.
  - Tool: fastqc
- QC
  - All-in-one preprocessing to remove low-quality and over-represented sequences and trim adapters.
  - Tool: fastp
- Quantification
  - Mapping & quantification of reads against a reference transcriptome (GRCh38).
  - Tool: Salmon
- Post-Alignment QC
  - Check the mapping quality & mapping rate of reads against the reference transcriptome.
  - Tool: MultiQC
- Isoform Switch Analysis
  - Isoform switches in tumor samples were analyzed along with their functional consequences, including nonsense-mediated decay (NMD) sensitivity, intron retention and the coding potential of each isoform.
  - Tool: IsoformSwitchAnalyzeR
- Visualization
  - Statistically significant switches, alternative splicing events and consequence summaries for different genes were visualized.
  - Tool: IsoformSwitchAnalyzeR
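
The command-line steps of the workflow can be sketched as a small Python helper that assembles one command per tool for a single sample. This is a minimal sketch, not the exact commands used in this analysis: the run accession `SRRXXXXXXX`, the directory names (`raw`, `trimmed`, `qc_raw`, `qc_trimmed`, `quant`, `salmon_index`, `multiqc_report`) and the thread count are placeholders, and a Salmon index is assumed to have been built beforehand. The helper only builds and prints the commands; it does not run them.

```python
import shlex


def build_commands(run_id, index_dir="salmon_index", threads=8):
    """Return one argument list per command-line step for a single paired-end sample.

    All directory names are hypothetical placeholders chosen for this sketch.
    """
    r1, r2 = f"raw/{run_id}_1.fastq", f"raw/{run_id}_2.fastq"
    t1, t2 = f"trimmed/{run_id}_1.fastq.gz", f"trimmed/{run_id}_2.fastq.gz"
    t = str(threads)
    return [
        # Data import: write paired-end reads as separate _1/_2 FASTQ files
        ["fasterq-dump", run_id, "--split-files", "-e", t, "-O", "raw"],
        # Initial QC on the raw reads (the output directory must already exist)
        ["fastqc", r1, r2, "-t", t, "-o", "qc_raw"],
        # QC: adapter detection for paired-end data, quality filtering, reports
        ["fastp", "-i", r1, "-I", r2, "-o", t1, "-O", t2,
         "--detect_adapter_for_pe", "-w", t,
         "-j", f"qc_trimmed/{run_id}.fastp.json",
         "-h", f"qc_trimmed/{run_id}.fastp.html"],
        # Quantification against a prebuilt transcriptome index
        ["salmon", "quant", "-i", index_dir, "-l", "A", "-1", t1, "-2", t2,
         "--validateMappings", "-p", t, "-o", f"quant/{run_id}"],
        # Post-alignment QC: aggregate the reports from the steps above
        ["multiqc", "qc_raw", "qc_trimmed", "quant", "-o", "multiqc_report"],
    ]


if __name__ == "__main__":
    # SRRXXXXXXX is a placeholder, not a run from GSE228870.
    for cmd in build_commands("SRRXXXXXXX"):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed and the output directories exist. The isoform switch analysis and visualization steps run in R and are not covered by this sketch.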
Why Salmon over HISAT2 + featureCounts?
Isoform-level quantification requires transcript-level resolution. HISAT2 + featureCounts is optimised for gene-level count matrices and would have required an additional assembly step (e.g. StringTie) to obtain transcript-level estimates and recover novel isoforms, which was outside the scope of this analysis. Salmon's lightweight mapping approach quantifies reads directly against the reference transcriptome at transcript resolution, is computationally efficient, and produces output that integrates directly with IsoformSwitchAnalyzeR. --validateMappings was enabled so that candidate mappings are scored by selective alignment and spurious mappings are discarded, which improves mapping accuracy.
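To illustrate what transcript-level output looks like, the sketch below reads a Salmon `quant.sf` table, which is tab-separated with the columns `Name`, `Length`, `EffectiveLength`, `TPM` and `NumReads`. The two transcripts and their values are made up for the example, and `read_quant_sf` is a hypothetical helper name. In the actual workflow IsoformSwitchAnalyzeR imports these files itself.

```python
import csv
import io


def read_quant_sf(handle):
    """Return {transcript_name: (tpm, num_reads)} from a Salmon quant.sf table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["Name"]: (float(row["TPM"]), float(row["NumReads"]))
        for row in reader
    }


# Hypothetical two-transcript table using Salmon's column layout.
demo = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx_A\t1500\t1320.5\t12.5\t340.0\n"
    "tx_B\t900\t720.5\t3.0\t45.0\n"
)
quant = read_quant_sf(demo)
for name, (tpm, reads) in quant.items():
    print(f"{name}\tTPM={tpm}\tNumReads={reads}")
```

For a real sample, the same function can be called on `open("quant/<sample>/quant.sf")`, where the path depends on the `-o` directory given to `salmon quant`.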
Why GENCODE v44 as reference?
GENCODE v44 provides a comprehensive human transcript annotation built on the GRCh38.p14 genome assembly, together with a matched GTF file required for transcript-level quantification. A comprehensive annotation is critical for isoform analysis, because isoforms that are absent from the reference cannot be quantified.
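One practical consequence of using GENCODE is that transcript FASTA headers are pipe-delimited and carry the gene ID and name alongside the transcript ID, so the identifiers in the quantification output must be reconciled with those in the GTF. Salmon's `--gencode` indexing option handles this by keeping only the text before the first `|`. The sketch below shows the header layout using a hypothetical helper, `parse_gencode_header`. The header itself uses placeholder IDs and is not a real GENCODE v44 entry.

```python
def parse_gencode_header(header):
    """Split a pipe-delimited GENCODE transcript FASTA header into named fields."""
    fields = header.lstrip(">").rstrip("|").split("|")
    return {
        "transcript_id": fields[0],
        "gene_id": fields[1],
        "transcript_name": fields[4],
        "gene_name": fields[5],
        # Biotype is the last field; coding entries carry UTR/CDS fields before it.
        "biotype": fields[-1],
    }


# Illustrative header with placeholder identifiers.
header = (
    ">ENST00000000001.1|ENSG00000000001.1|-|-|GENE1-201|GENE1|1200|"
    "UTR5:1-100|CDS:101-1000|UTR3:1001-1200|protein_coding|"
)
record = parse_gencode_header(header)

# Transcript-to-gene mapping of the kind needed to group isoforms by gene.
tx2gene = {record["transcript_id"]: record["gene_id"]}
print(tx2gene)
```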
Why isoform-level analysis over standard DEG?
Standard DEG analysis (e.g. DESeq2 on gene-level counts) would not capture scenarios where total gene expression is stable but isoform usage shifts — a pattern that can have profound functional consequences. DIXDC1 is a direct example from this dataset: gene-level analysis would have missed it entirely.
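The idea can be made concrete with the isoform fraction (IF = isoform expression / parent gene expression) and its difference between conditions (dIF), the quantities IsoformSwitchAnalyzeR reports. The sketch below uses made-up TPM values for a hypothetical two-isoform gene. They are not taken from GSE228870 and do not represent DIXDC1, and no statistical testing is performed.

```python
def isoform_fractions(tpm_by_isoform):
    """IF = isoform expression / total expression of the parent gene."""
    gene_total = sum(tpm_by_isoform.values())
    if gene_total == 0:
        return {iso: 0.0 for iso in tpm_by_isoform}
    return {iso: tpm / gene_total for iso, tpm in tpm_by_isoform.items()}


def delta_if(condition_1, condition_2):
    """dIF = IF in condition 2 minus IF in condition 1, per isoform."""
    if_1 = isoform_fractions(condition_1)
    if_2 = isoform_fractions(condition_2)
    isoforms = sorted(set(if_1) | set(if_2))
    return {iso: if_2.get(iso, 0.0) - if_1.get(iso, 0.0) for iso in isoforms}


# Hypothetical mean TPMs for one gene with two isoforms.
normal = {"isoform_1": 80.0, "isoform_2": 20.0}
tumor = {"isoform_1": 25.0, "isoform_2": 75.0}

# Total gene expression is 100 TPM in both conditions.
gene_level_change = sum(tumor.values()) - sum(normal.values())
switch = delta_if(normal, tumor)

print(f"gene-level TPM change: {gene_level_change:.1f}")
for iso, dif in switch.items():
    print(f"{iso}: dIF = {dif:+.2f}")
```

In this toy case the gene-level total is identical in both conditions, so a gene-level test has nothing to detect, while isoform_1 falls from an IF of 0.80 to 0.25 and isoform_2 rises from 0.20 to 0.75.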
This section will be updated once the analysis is complete.