This project performs Sentiment Analysis on Twitter data using Natural Language Processing (NLP) and a Naive Bayes Classifier to classify tweets into Positive and Negative sentiments.
- Built a sentiment classifier for tweets
- Removed neutral tweets for better performance
- Applied text preprocessing and feature extraction
- Visualized frequent words using WordCloud
- Python
- NumPy, Pandas
- NLTK (Natural Language Toolkit)
- Scikit-learn
- Matplotlib
- WordCloud
- Removed neutral tweets
- Cleaned text:
  - Removed URLs, mentions (@), hashtags (#)
  - Removed stopwords
  - Converted text to lowercase
  - Tokenization
- Generated WordClouds for:
  - Positive tweets
  - Negative tweets
- Extracted words using `nltk.FreqDist`
- Created feature set using `contains(word)`
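
The cleaning and feature-extraction steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names (`clean_tweet`, `build_word_features`, `extract_features`), the tiny inline stopword set, and the regex tokenizer are all assumptions made so the example runs without downloading NLTK corpora. It also assumes hashtags are dropped entirely rather than just stripped of the `#` symbol.

```python
import re

import nltk

# Tiny illustrative stopword list (an assumption). The project presumably uses
# NLTK's English stopword corpus, which needs nltk.download("stopwords").
STOPWORDS = {"a", "an", "the", "is", "it", "this", "to", "and", "of", "i", "so"}


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, mentions and hashtags, drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)       # remove mentions and hashtags
    tokens = re.findall(r"[a-z']+", text)      # simple regex tokenizer
    return [t for t in tokens if t not in STOPWORDS]


def build_word_features(tokenized_tweets):
    """Collect the vocabulary with nltk.FreqDist, most frequent words first."""
    freq = nltk.FreqDist(w for tokens in tokenized_tweets for w in tokens)
    return [word for word, _ in freq.most_common()]


def extract_features(tokens, word_features):
    """Map a tokenized tweet to boolean 'contains(word)' features."""
    present = set(tokens)
    return {f"contains({w})": (w in present) for w in word_features}


# Hypothetical example tweets, for illustration only.
tweets = ["I love this phone! http://t.co/x #happy", "@bob this is so bad"]
tokenized = [clean_tweet(t) for t in tweets]
word_features = build_word_features(tokenized)
features = extract_features(tokenized[0], word_features)
print(tokenized)
print(features)
```

Boolean `contains(word)` features record only presence, so a word that appears three times in a tweet counts the same as one that appears once.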
- Algorithm: Naive Bayes (NLTK)
- Trained on processed tweet dataset
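
As a rough sketch of the training step, the snippet below fits NLTK's `NaiveBayesClassifier` on a handful of hand-made, already-tokenized tweets. The toy data, the label strings, and the `extract_features` helper are assumptions for illustration; the real project trains on the processed tweet dataset.

```python
import nltk


def extract_features(tokens, word_features):
    """Boolean 'contains(word)' features for one tokenized tweet."""
    present = set(tokens)
    return {f"contains({w})": (w in present) for w in word_features}


# Toy, hand-made training data (hypothetical).
train = [
    (["love", "great", "phone"], "Positive"),
    (["great", "day"], "Positive"),
    (["hate", "bad", "service"], "Negative"),
    (["bad", "awful", "day"], "Negative"),
]

# Vocabulary over the training tweets, sorted for reproducibility.
word_features = sorted({w for tokens, _ in train for w in tokens})
training_set = [
    (extract_features(tokens, word_features), label) for tokens, label in train
]

classifier = nltk.NaiveBayesClassifier.train(training_set)

pos_pred = classifier.classify(extract_features(["love", "great"], word_features))
neg_pred = classifier.classify(extract_features(["bad", "awful"], word_features))
print(pos_pred, neg_pred)
```

On a trained model, `classifier.show_most_informative_features()` is a quick way to see which `contains(word)` features drive the predictions.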
- Training Size: 7449 tweets
- Negative Tweets: 651 / 674 correct (~96.6%)
- Positive Tweets: 51 / 176 correct (~29.0%)
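
The per-class figures can be reproduced directly from the reported counts. The short calculation below also derives a balanced accuracy (the mean of the two per-class accuracies), which is not reported in the original results and is added here only as an illustrative summary; it assumes the counts come from a held-out test set.

```python
# Counts as reported in the results above.
neg_correct, neg_total = 651, 674
pos_correct, pos_total = 51, 176

neg_acc = neg_correct / neg_total
pos_acc = pos_correct / pos_total

# Balanced accuracy weights both classes equally, regardless of their size.
balanced_acc = (neg_acc + pos_acc) / 2

print(f"Negative accuracy: {neg_acc:.1%}")      # 96.6%
print(f"Positive accuracy: {pos_acc:.1%}")      # 29.0%
print(f"Balanced accuracy: {balanced_acc:.1%}")  # 62.8%
```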
- High accuracy for Negative tweets
- Low accuracy for Positive tweets
- Possible reasons:
  - Class imbalance
  - Limited feature extraction
  - Simple model
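
To put the class-imbalance point in numbers, the sketch below compares the overall accuracy implied by the reported counts with a trivial baseline that labels every tweet Negative. It assumes the 674 negative and 176 positive tweets form the whole evaluation set.

```python
# Evaluation counts as reported in the results.
neg_correct, neg_total = 651, 674
pos_correct, pos_total = 51, 176

total = neg_total + pos_total
overall_acc = (neg_correct + pos_correct) / total

# A baseline that always predicts "Negative" gets every negative tweet right
# and every positive tweet wrong.
majority_baseline = neg_total / total
positive_share = pos_total / total

print(f"Positive share of evaluation set: {positive_share:.1%}")  # 20.7%
print(f"Overall accuracy:                 {overall_acc:.1%}")     # 82.6%
print(f"Always-Negative baseline:         {majority_baseline:.1%}")  # 79.3%
```

Under that assumption the classifier beats the always-Negative baseline by only about three percentage points, which is consistent with the model leaning heavily toward the majority class.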
- Use advanced models:
  - Logistic Regression
  - SVM
  - Deep Learning (LSTM)
- Use TF-IDF / Word Embeddings
- Balance dataset
- Deploy using Streamlit
- Add real-time Twitter API
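
One possible starting point for the first three ideas is a scikit-learn pipeline that combines TF-IDF features with a class-weighted Logistic Regression. This is a sketch of a proposed improvement, not something the project currently implements; the toy texts and labels are hypothetical, and `class_weight="balanced"` is just one of several ways to address imbalance (resampling is another).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, hand-made data (hypothetical), deliberately imbalanced.
texts = [
    "terrible service, never again",
    "awful phone, bad battery",
    "bad day and awful weather",
    "hate this terrible app",
    "love this great phone",
    "great day, love it",
]
labels = ["Negative", "Negative", "Negative", "Negative", "Positive", "Positive"]

model = make_pipeline(
    # TF-IDF replaces the boolean contains(word) features with weighted counts.
    TfidfVectorizer(lowercase=True, stop_words="english"),
    # class_weight="balanced" reweights classes inversely to their frequency.
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(texts, labels)

preds = model.predict(["what a great phone", "awful service"])
print(list(preds))
```

Whether this actually improves Positive-class accuracy would need to be checked on the real dataset with a held-out split.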
git clone https://github.com/Mehtab161/Twitter-Sentiment-Analysis.git
cd Twitter-Sentiment-Analysis
pip install -r requirements.txt
jupyter notebook

See requirements.txt
Mehtab Khan
If you like this project, give it a ⭐ on GitHub!