vvXranjan/audio-sentiment-analysis


Audio Sentiment Analysis

An end-to-end audio sentiment analysis pipeline using Whisper for speech-to-text and transformer-based sentiment classification.

Pipeline

Audio → Whisper (STT) → Sentiment Classification
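The two-stage flow above can be sketched as follows. This is a minimal illustration, not the repository's code. Each stage is passed in as a plain function, so the orchestration logic can be tested without downloading models. `build_hf_stages` wires both stages to Hugging Face `transformers` pipelines. The Whisper checkpoint `openai/whisper-small` is an assumption, because the README does not name the Whisper size used. Passing a file path to the ASR pipeline typically requires `ffmpeg` to decode the audio.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Stage signatures: audio path -> transcript, transcript -> {"label": ..., "score": ...}
Transcriber = Callable[[str], str]
Classifier = Callable[[str], Dict[str, object]]


@dataclass
class ClipSentiment:
    transcript: str
    label: str
    score: float


def analyze_clip(audio_path: str, transcribe: Transcriber, classify: Classifier) -> ClipSentiment:
    """Run speech-to-text, then classify the sentiment of the transcript."""
    transcript = transcribe(audio_path).strip()
    prediction = classify(transcript)
    return ClipSentiment(transcript, str(prediction["label"]), float(prediction["score"]))


def build_hf_stages(
    whisper_model: str = "openai/whisper-small",  # assumed checkpoint; not stated in the README
    sentiment_model: str = "cardiffnlp/twitter-roberta-base-sentiment-latest",
) -> Tuple[Transcriber, Classifier]:
    """Wrap Hugging Face pipelines as the two stages (models download on first use)."""
    from transformers import pipeline  # lazy import: analyze_clip itself needs no transformers

    asr = pipeline("automatic-speech-recognition", model=whisper_model)
    clf = pipeline("sentiment-analysis", model=sentiment_model)

    def transcribe(path: str) -> str:
        return asr(path)["text"]

    def classify(text: str) -> Dict[str, object]:
        # A single input string yields a one-element list of {"label", "score"} dicts.
        return clf(text, truncation=True)[0]

    return transcribe, classify
```

Usage, with a hypothetical file name: `transcribe, classify = build_hf_stages()` followed by `analyze_clip("example.wav", transcribe, classify)`. Keeping the stages injectable also makes it easy to benchmark a different classifier against the same transcripts.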

Baseline Model

  • cardiffnlp/twitter-roberta-base-sentiment-latest

Baseline Performance

  • Accuracy: 0.967
  • Macro F1: 0.967
  • Avg Latency: 2.83 sec/clip
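The baseline figures above are standard classification metrics. Below is a hedged, dependency-free sketch of how accuracy, macro F1, and mean per-clip latency can be computed. The repository's own evaluation script may do this differently, for example with scikit-learn. `predict` stands for any function that maps a clip to a label, such as the full Whisper plus classifier pipeline. Macro F1 averages per-class F1 without weighting, so a rare sentiment class counts as much as a common one.

```python
import time
from typing import Callable, List, Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of clips whose predicted label matches the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 for any label that appears.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def timed_predictions(predict: Callable[[str], str], clips: Sequence[str]) -> Tuple[List[str], float]:
    """Run predict on each clip; return the predictions and mean wall-clock seconds per clip."""
    if not clips:
        raise ValueError("clips must be non-empty")
    preds: List[str] = []
    total = 0.0
    for clip in clips:
        start = time.perf_counter()
        preds.append(predict(clip))
        total += time.perf_counter() - start
    return preds, total / len(clips)
```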

Fine-Tuning Results

  • Validation Accuracy: 0.923
  • Validation Macro F1: 0.915
  • Test Accuracy: 1.000
  • Test Macro F1: 1.000

Notes

The fine-tuned model scored perfectly on the current test split, but its lower validation scores (accuracy 0.923, macro F1 0.915) suggest possible overfitting. With a dataset this small, the perfect test result is unlikely to be a reliable estimate of real-world performance. Larger real-world audio datasets are needed before making stronger generalization claims.
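The README does not state the split sizes. The Wilson score interval helps show why a perfect score on a small test split is weak evidence, because it gives a confidence range for an observed accuracy. The sketch below is illustrative only, and the split size in the example is hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for an observed accuracy of correct/n (z=1.96 gives ~95%)."""
    if n <= 0 or not 0 <= correct <= n:
        raise ValueError("need n > 0 and 0 <= correct <= n")
    p = correct / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point round-off at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)
```

For a hypothetical 20-clip test set, `wilson_interval(20, 20)` gives roughly (0.84, 1.00). The true accuracy could therefore plausibly sit below the 0.923 validation accuracy, even though the model scored 1.000 on the test split.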

Repository Structure

  • data/ — cleaned dataset and train/val/test splits
  • outputs/ — leaderboard and evaluation summaries
  • reports/ — model selection summary
  • src/ — data generation, evaluation, and fine-tuning scripts
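The README does not say how the train/val/test splits in `data/` were produced. A common approach for a small labelled dataset is a stratified split, which keeps each sentiment class in roughly the same proportion across all three splits. The sketch below is hypothetical. It assumes an 80/10/10 ratio and a fixed seed, and `stratified_split` is not a function from the repository.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    items: Sequence[str],
    labels: Sequence[str],
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),  # assumed train/val/test ratio
    seed: int = 42,  # assumed seed, fixed for reproducible splits
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle each class separately, then cut it into train/val/test by ratio."""
    if len(items) != len(labels):
        raise ValueError("items and labels must have equal length")
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    by_label: Dict[str, List[str]] = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)

    rng = random.Random(seed)
    train: List[str] = []
    val: List[str] = []
    test: List[str] = []
    for label in sorted(by_label):  # sorted so the result does not depend on input order of classes
        group = list(by_label[label])
        rng.shuffle(group)
        # Flooring keeps the counts within the group size; leftovers go to test.
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test
```

With very few clips per class, the validation and test splits end up with only one or two examples per class. That is consistent with the noisy validation and test scores reported above.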

Run

python src/fine_tune_cardiff_roberta.py
