As online communities grow, it is important to understand what separates toxic from non-toxic comments. This project analyzes Reddit comments, focusing on toxicity detection, account age (in years), and subreddit patterns, to uncover what drives negative interactions.
We aimed to answer the following key questions:
- Do toxic comments appear more in certain subreddits?
- Are older accounts less likely to post toxic comments?
- What words are most linked to toxic behavior?
- How balanced is the dataset between toxic and non-toxic comments?

| Description | Source |
|---|---|
| Raw dataset of Reddit comments | reddit.com |
- 🐍 Python Libraries: pandas, numpy, matplotlib, etc.
- 🧹 Text Cleaning: lowercasing text, removing the username column, and replacing account_age_days with account_age_years
- 🔎 Features Extracted: subreddit, comment_text, account_age (years), toxicity label
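
The cleaning steps listed above could look roughly like the following pandas sketch. It is illustrative rather than the project's actual code: the column names `username`, `comment_text`, `account_age_days`, and `toxic`, the helper `clean_comments`, the 365.25-day year, and the sample rows are all assumptions made for this example.

```python
import pandas as pd

# Assumption: the README does not state which divisor was used for the conversion.
DAYS_PER_YEAR = 365.25


def clean_comments(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df; the input frame is left unchanged."""
    out = df.copy()
    # Step 1: lowercase the comment text.
    out["comment_text"] = out["comment_text"].str.lower()
    # Step 2: remove the username column.
    out = out.drop(columns=["username"])
    # Step 3: replace account_age_days with account_age_years.
    out["account_age_years"] = out["account_age_days"] / DAYS_PER_YEAR
    out = out.drop(columns=["account_age_days"])
    return out


# Made-up sample rows, only to show the shape of the data.
raw = pd.DataFrame(
    {
        "username": ["user_a", "user_b"],
        "subreddit": ["askreddit", "news"],
        "comment_text": ["Great POINT, thanks!", "This Is NONSENSE"],
        "account_age_days": [730.5, 1461.0],
        "toxic": [0, 1],
    }
)

cleaned = clean_comments(raw)
print(cleaned)
```

Working on a copy keeps the raw frame available for comparison, which helps when checking that no rows were lost during cleaning.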
- [Leen Binmueqal]
- [Ghalia Alkhaldie]
- [Rana Alnagashy]
- [Juri Alghamdi]
- [Aryam Almutairi]
Supervised by: Dr. [Abeer Aldayel]
