GitHub - Harshit-collab104/IMDB_Sentiment-Analysis: Binary sentiment analysis on the IMDB dataset using Logistic Regression and SVM.

1. IMDB Sentiment Classification

This project compares Logistic Regression and a Support Vector Machine (SVM) for classifying IMDB movie reviews as "positive" or "negative".
The models are evaluated with the following metrics:
1. Accuracy
2. Precision
3. Recall
4. F1-score
5. Average Precision Score
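
As a rough illustration of what these five metrics measure, the sketch below computes them in plain Python for a binary task. The function names `binary_metrics` and `average_precision` are hypothetical and are not taken from this repository, and the average precision helper assumes there are no tied scores. The labels and scores are made-up toy values.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive example.

    Assumption: scores contain no ties.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy values for illustration only (not results from this project).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.7, 0.1]

metrics = binary_metrics(y_true, y_pred)
ap = average_precision(y_true, scores)
```

Accuracy, precision, recall and F1 use the hard predictions, while average precision uses the ranking induced by the model's scores.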

2. Dataset

Source: IMDB Sentiment Dataset (50,000 labeled movie reviews)
Each review is labeled "positive" or "negative" (binary classification)
Dataset format: CSV file with two columns:
1. 'review' -> The text of the movie review
2. 'sentiment' -> Either "positive" or "negative"
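
A minimal sketch of reading a file in this two-column format, using only the standard library. The helper name `load_reviews` and the 1/0 label encoding are assumptions made for illustration. The example reads from an in-memory sample rather than the real dataset file.

```python
import csv
import io

# Assumed encoding: 1 for "positive", 0 for "negative".
LABELS = {"positive": 1, "negative": 0}


def load_reviews(handle):
    """Read 'review' and 'sentiment' columns from an open CSV file handle."""
    texts, labels = [], []
    for row in csv.DictReader(handle):
        texts.append(row["review"])
        labels.append(LABELS[row["sentiment"].strip().lower()])
    return texts, labels


# Two made-up rows standing in for the real CSV file.
sample_csv = io.StringIO(
    "review,sentiment\n"
    '"A wonderful, moving film.",positive\n'
    '"Dull plot and wooden acting.",negative\n'
)
texts, labels = load_reviews(sample_csv)
```

For the real dataset, the same function could be given a handle from `open(path, newline="", encoding="utf-8")`, where `path` points at the CSV file.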

3. Installation and Setup

Follow these steps to set up the project locally:

1. Clone the repository

git clone https://github.com/Harshit-collab104/IMDB_Sentiment-Analysis
cd <your-repo-folder>
Replace <your-repo-folder> with the path to the folder where you cloned the repository.

2. Create a virtual environment

# On Windows
python -m venv venv
venv\Scripts\activate

# On macOS/Linux
python3 -m venv venv
source venv/bin/activate

3. Install dependencies

pip install --upgrade pip
pip install -r requirements.txt

4. Run the main file

python main.py

4. Approach Used

Data preprocessing:
- Removed HTML tags, punctuation, and stop words
- Converted text to lowercase
- Applied TF-IDF vectorization
Models used:
- Logistic Regression
- Linear SVM
Evaluation metrics:
- Accuracy
- Precision
- Recall
- F1-score
- Average Precision
Visualization:
- Confusion matrices for both models
- Bar chart comparing performance metrics
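
The preprocessing and modeling steps above can be sketched with scikit-learn roughly as follows. This is an illustrative sketch, not the code in `main.py` or `src`: the helper names `clean_text` and `build_models`, the use of scikit-learn's built-in English stop-word list, the hyperparameters, and the tiny training set are all assumptions.

```python
import html
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def clean_text(text):
    """Strip HTML tags and punctuation, lowercase, and collapse whitespace."""
    text = html.unescape(text)
    text = re.sub(r"<[^>]+>", " ", text)      # drop tags such as <br />
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # drop punctuation
    return " ".join(text.split())


def build_models():
    """Return one TF-IDF pipeline per classifier (hypothetical settings)."""
    def vectorizer():
        # Stop words are removed by the vectorizer after clean_text has run.
        return TfidfVectorizer(preprocessor=clean_text, stop_words="english")

    return {
        "Logistic Regression": make_pipeline(
            vectorizer(), LogisticRegression(max_iter=1000)),
        "Linear SVM": make_pipeline(vectorizer(), LinearSVC()),
    }


# Made-up toy reviews; 1 = positive, 0 = negative.
train_texts = [
    "Loved every minute, brilliant acting!",
    "A great story with superb performances.",
    "Wonderful film, truly moving.<br />Highly recommended.",
    "Terrible pacing and a boring plot.",
    "Awful script, wooden acting.",
    "Dull, predictable and far too long.",
]
train_labels = [1, 1, 1, 0, 0, 0]

predictions = {}
for name, model in build_models().items():
    model.fit(train_texts, train_labels)
    predictions[name] = model.predict(
        ["A truly great film<br />with superb acting!"])
```

Wrapping the vectorizer and classifier in one pipeline means the TF-IDF vocabulary is learned only from the training split, which avoids leaking test data into the features.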

5. Results

Across all evaluation metrics, Logistic Regression outperformed Linear SVM on the IMDB sentiment dataset.

The table below shows the performance of both models on the IMDB sentiment dataset:

Model	Accuracy	Precision	Recall	F1-score	Average Precision
Logistic Regression	0.8953	0.8834	0.9127	0.8978	0.9602
Linear SVM	0.8865	0.8824	0.8938	0.8881	0.9543

Precision difference (Logistic Regression - Linear SVM): 0.8834 - 0.8824 = 0.0010, i.e. 0.10 percentage points
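
The per-metric gaps can be recomputed directly from the table values. The short sketch below does this, and also checks that each reported F1-score matches the harmonic mean of the reported precision and recall. The variable names are chosen for illustration only.

```python
# Values copied from the results table above.
results = {
    "Logistic Regression": {
        "Accuracy": 0.8953, "Precision": 0.8834, "Recall": 0.9127,
        "F1-score": 0.8978, "Average Precision": 0.9602,
    },
    "Linear SVM": {
        "Accuracy": 0.8865, "Precision": 0.8824, "Recall": 0.8938,
        "F1-score": 0.8881, "Average Precision": 0.9543,
    },
}

# Gap per metric: Logistic Regression minus Linear SVM.
gaps = {
    metric: round(results["Logistic Regression"][metric]
                  - results["Linear SVM"][metric], 4)
    for metric in results["Logistic Regression"]
}

# F1 recomputed as the harmonic mean of precision and recall.
recomputed_f1 = {
    model: round(2 * row["Precision"] * row["Recall"]
                 / (row["Precision"] + row["Recall"]), 4)
    for model, row in results.items()
}

for metric, gap in gaps.items():
    print(f"{metric}: {gap:+.4f} ({gap * 100:.2f} percentage points)")
```

The precision gap is the smallest of the five, and the recall gap is the largest.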

6. Visualizations

Confusion Matrix (Logistic Regression & SVM)
Training vs Testing Accuracy
Metrics Comparison (Accuracy, Precision, Recall, F1, Average Precision)
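
As one possible way to draw the metrics comparison chart, the sketch below plots the values from the results table as grouped bars with matplotlib. It is not the plotting code used in this repository. The figure size, axis limits, and the choice to render into an in-memory buffer are assumptions.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Values copied from the results table.
metric_names = ["Accuracy", "Precision", "Recall", "F1-score",
                "Average Precision"]
log_reg = [0.8953, 0.8834, 0.9127, 0.8978, 0.9602]
linear_svm = [0.8865, 0.8824, 0.8938, 0.8881, 0.9543]

positions = list(range(len(metric_names)))
width = 0.38

fig, ax = plt.subplots(figsize=(8, 4))
# Two bars per metric, shifted left and right of the tick position.
ax.bar([p - width / 2 for p in positions], log_reg, width,
       label="Logistic Regression")
ax.bar([p + width / 2 for p in positions], linear_svm, width,
       label="Linear SVM")
ax.set_xticks(positions)
ax.set_xticklabels(metric_names)
ax.set_ylim(0.85, 1.0)  # zoomed in, since all scores lie between 0.88 and 0.97
ax.set_ylabel("Score")
ax.set_title("Logistic Regression vs Linear SVM on IMDB reviews")
ax.legend()
fig.tight_layout()

# Render to memory; pass a file path instead to write an image to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Because the y-axis is truncated at 0.85, the bars exaggerate the differences between the models; the actual gaps are all under 2 percentage points.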

7. Conclusion

Both Logistic Regression and Linear SVM are effective for sentiment classification of text.
TF-IDF features combined with linear models work well for high-dimensional text classification tasks.
On every evaluation metric (accuracy, precision, recall, F1-score, and average precision), Logistic Regression scores higher than Linear SVM. The gap is smallest in precision (0.8834 vs 0.8824, about 0.10 percentage points) and largest in recall (0.9127 vs 0.8938, about 1.9 percentage points). This suggests that Logistic Regression captures the largely linear separability of the TF-IDF features slightly better on this dataset, giving somewhat more reliable predictions when distinguishing positive from negative reviews.
