We train on roughly 25,000 annotated tweets from Hugging Face, each labeled hate speech, offensive, or neither. We start with a TF-IDF + Logistic Regression baseline, then fine-tune DistilBERT. The text is cleaned and lightly augmented. For evaluation we report class-balanced F1, precision, and recall, along with per-class reports.
pyenv local 3.11.9 # most stable
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip setuptools wheel
pip install -r requirements.txt
source .venv/bin/activate
git pull
git add --all
git commit -m "Commit message"
git pull # just in case
git push -u origin main
During preprocessing we created two cleaned versions of the train/val/test splits, one for TF-IDF and one for BERT. We removed only true noise (links, HTML artifacts, inconsistent spacing) and kept the stop words because they are important for sentiment analysis.
Here is how the data was handled:
- Inputs: `src/data/splits/{train,val,test}.csv` with columns `text`, `label`
- Outputs:
  - `src/data/processed_light/{train,val,test}.csv` for TF-IDF + Logistic Regression
  - `src/data/processed_bert/{train,val,test}.csv` for BERT
- Each output file keeps the original `text`, adds `clean_text`, and preserves `label`.

Here are the cleaning details:
For TF-IDF + Logistic Regression (`processed_light`):
- Replace `@username` with `@user` (keeps the mention signal)
- Replace URLs with `[URL]`, then lowercase (so it appears as `[url]`)
- Strip HTML entities
- Lowercase all text
- Collapse repeated whitespace
- We do not remove stopwords, pronouns, negations, contractions, or punctuation.

For BERT (`processed_bert`):
- Replace `@username` with `@user`
- Replace URLs with `[URL]` (case is preserved)
- Strip HTML entities
- Collapse repeated whitespace
- We keep the original casing and punctuation for BERT.
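The cleaning steps listed above can be sketched with a few regular expressions. This is a minimal illustration, not the code in `src/preprocessing.py`: the function names `clean_bert` and `clean_light`, the regex patterns, and the choice to replace HTML entities with a space are all assumptions made for the example.

```python
import re

# Hypothetical patterns; the repository's actual rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
ENTITY_RE = re.compile(r"&(?:#\d+|#x[0-9a-fA-F]+|\w+);")
WHITESPACE_RE = re.compile(r"\s+")


def clean_bert(text: str) -> str:
    """BERT variant: casing and punctuation are left untouched."""
    text = URL_RE.sub("[URL]", text)       # replace links with a placeholder
    text = MENTION_RE.sub("@user", text)   # keep the mention signal, drop the name
    text = ENTITY_RE.sub(" ", text)        # strip HTML entities such as &amp;
    return WHITESPACE_RE.sub(" ", text).strip()


def clean_light(text: str) -> str:
    """TF-IDF variant: same steps, then lowercase (so [URL] becomes [url])."""
    return clean_bert(text).lower()


example = "RT @JohnDoe: Check   this &amp; that http://t.co/abc"
bert_text = clean_bert(example)
light_text = clean_light(example)
```

Sharing one function between the two variants keeps them in sync; the only difference is the final lowercasing step.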
As a result of our baseline analysis, we saved two models: a majority-class baseline and TF-IDF + Logistic Regression.
Location: models/majority/
Files:
- `majority.json` - contains the majority class (always predicts class 1, 'offensive')
- `metrics.txt` - contains performance metrics
How to load:
import json
with open('models/majority/majority.json', 'r') as f:
    model_info = json.load(f)
majority_class = model_info['majority_class']
Location: models/logreg/
Files:
- `tfidf.joblib` - TF-IDF vectorizer (saved as object, not dictionary)
- `logreg.joblib` - Logistic Regression classifier (saved as object, not dictionary)

Because both files store objects rather than dictionaries, their contents cannot be read with the `.get()` method.
Here is how to load:
import joblib
vectorizer = joblib.load('models/logreg/tfidf.joblib')
classifier = joblib.load('models/logreg/logreg.joblib')
text = "some text here"
text_vector = vectorizer.transform([text])
prediction = classifier.predict(text_vector)[0]
- Columns: `text`, `label`, `clean_text`, `prediction`
  - `label`: True label (0/1/2)
  - `prediction`: Always 1 (offensive)
- Columns: `text`, `true_label`, `predicted_label`
  - `true_label`: Actual class (0/1/2)
  - `predicted_label`: Model prediction (0/1/2)
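A prediction file with `true_label` and `predicted_label` columns can be summarized per class using only the standard library. The sketch below is illustrative: `per_class_report` is a hypothetical helper, and the rows are made up and read from an in-memory string rather than from a file in `predictions/`. The id-to-name mapping is the one used throughout this README (0 = hate_speech, 1 = offensive, 2 = neither).

```python
import csv
import io
from collections import Counter

ID2LABEL = {0: "hate_speech", 1: "offensive", 2: "neither"}


def per_class_report(rows):
    """Return {label_name: (precision, recall)} from true/predicted label pairs."""
    tp, pred_count, true_count = Counter(), Counter(), Counter()
    for row in rows:
        y_true, y_pred = int(row["true_label"]), int(row["predicted_label"])
        true_count[y_true] += 1
        pred_count[y_pred] += 1
        if y_true == y_pred:
            tp[y_true] += 1
    report = {}
    for class_id, name in ID2LABEL.items():
        # Guard against classes that were never predicted or never present.
        precision = tp[class_id] / pred_count[class_id] if pred_count[class_id] else 0.0
        recall = tp[class_id] / true_count[class_id] if true_count[class_id] else 0.0
        report[name] = (precision, recall)
    return report


# Made-up rows standing in for a real predictions CSV.
sample = io.StringIO(
    "text,true_label,predicted_label\n"
    "a,1,1\n"
    "b,1,1\n"
    "c,0,1\n"
    "d,2,2\n"
)
report = per_class_report(csv.DictReader(sample))
```

To run it on a real file, pass `csv.DictReader(open(path, newline=""))` instead of the in-memory sample.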
We fine-tune a BERT classifier on the same tweet splits as the baselines, then repeat training on an augmented version of the training set to mitigate class imbalance.
Train/val/test sizes: 17,347 / 3,718 / 3,718 (clean). After augmentation (+2× for hate speech, +1× for neither), train becomes 22,263 while val/test stay the same. Class shares shift from ~77/17/6 (offensive/neither/hate) to ~60/26/14.
Weighted cross-entropy (scikit-learn "balanced" style) to counter imbalance:
- Clean train weights ≈ `[5.7766, 0.4305, 1.9843]` for `[hate, offensive, neither]`.
- Augmented train weights ≈ `[2.4712, 0.5525, 1.2733]`.
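The "balanced" weighting follows the formula `n_samples / (n_classes * count_c)`. The sketch below reproduces the weights above in plain Python. Note the assumption: the per-class counts are not stated in this README; they were back-derived from the reported weights and the train-set sizes, so treat them as illustrative.

```python
def balanced_weights(counts):
    """weight_c = n_samples / (n_classes * count_c), the 'balanced' heuristic."""
    n_samples = sum(counts)
    n_classes = len(counts)
    return [n_samples / (n_classes * c) for c in counts]


# Order: [hate, offensive, neither].
# ASSUMPTION: counts inferred from the reported weights, not read from the data.
clean_counts = [1001, 13432, 2914]  # sums to 17,347

# Augmentation adds 2x extra hate-speech and 1x extra neither examples.
augmented_counts = [clean_counts[0] * 3, clean_counts[1], clean_counts[2] * 2]

clean_weights = balanced_weights(clean_counts)
augmented_weights = balanced_weights(augmented_counts)
```

Under these counts the augmented set sums to 22,263, matching the train size reported above, which is a useful sanity check on the inferred numbers.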
BERT encoder with a linear head, max length 128, AdamW (lr 3e-5, wd 0.01, warmup 10%), 4 epochs, early stopping on val macro-F1 (patience 2), fp16 enabled. Per-device batch size 16 with grad-accum 2 → effective batch 32.
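To make the batch and warmup arithmetic concrete, here is a small sketch that records the hyperparameters above in a dataclass and derives the effective batch size and step counts. `TrainConfig` is a hypothetical container, not the repository's training config, and the step counts assume a single device and a full run with no early stopping, so they are an upper bound.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    max_length: int = 128
    learning_rate: float = 3e-5
    weight_decay: float = 0.01
    warmup_ratio: float = 0.10
    epochs: int = 4
    per_device_batch_size: int = 16
    grad_accum_steps: int = 2
    early_stopping_patience: int = 2
    fp16: bool = True

    @property
    def effective_batch_size(self) -> int:
        # Assumes one device; multiply by the device count otherwise.
        return self.per_device_batch_size * self.grad_accum_steps

    def total_optimizer_steps(self, n_train: int) -> int:
        steps_per_epoch = math.ceil(n_train / self.effective_batch_size)
        return steps_per_epoch * self.epochs

    def warmup_steps(self, n_train: int) -> int:
        return int(self.total_optimizer_steps(n_train) * self.warmup_ratio)


config = TrainConfig()
clean_total = config.total_optimizer_steps(17_347)
clean_warmup = config.warmup_steps(17_347)
augmented_total = config.total_optimizer_steps(22_263)
```

Training frameworks may round the number of steps per epoch slightly differently when gradient accumulation is used, so the exact figures in a real run can differ by a few steps.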
On the clean test set, the augmented model slightly improves overall scores and trades a bit of hate-speech recall for higher precision, which raises hate-speech F1 overall. It also improves offensive recall. Under perturbation, hate-speech recall for the augmented model is unchanged (clean vs. perturbed), while its precision and F1 dip modestly.
| Model | Accuracy | Macro F1 | Weighted F1 |
|---|---|---|---|
| BERT | 0.902 | 0.772 | 0.907 |
| Aug-BERT | 0.911 | 0.783 | 0.913 |
Original BERT
| Class | Precision | Recall | F1 |
|---|---|---|---|
| Hate speech | 0.414 | 0.570 | 0.479 |
| Offensive | 0.953 | 0.929 | 0.941 |
| Neither | 0.901 | 0.888 | 0.894 |
Aug-BERT
| Class | Precision | Recall | F1 |
|---|---|---|---|
| Hate speech | 0.474 | 0.547 | 0.508 |
| Offensive | 0.950 | 0.944 | 0.947 |
| Neither | 0.903 | 0.883 | 0.893 |
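As a consistency check, per-class F1 and macro F1 can be recomputed from the precision and recall values in the tables above. The sketch uses only the rounded table values, so results agree with the reported scores to about three decimals rather than exactly.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class) -> float:
    """Unweighted mean of per-class F1 scores."""
    return sum(f1(p, r) for p, r in per_class) / len(per_class)


# (precision, recall) in the order: hate speech, offensive, neither.
bert = [(0.414, 0.570), (0.953, 0.929), (0.901, 0.888)]
aug_bert = [(0.474, 0.547), (0.950, 0.944), (0.903, 0.883)]

bert_macro = macro_f1(bert)
aug_macro = macro_f1(aug_bert)
hate_f1_gain = f1(*aug_bert[0]) - f1(*bert[0])
```

Because macro F1 weights all three classes equally, the hate-speech gain accounts for most of the macro-F1 improvement even though that class is the smallest.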
Our main observations:
- Hate speech: precision went up from 0.414 to 0.474, recall decreased from 0.570 to 0.547, F1 climbed from 0.479 to 0.508.
- Offensive: recall increased from 0.929 to 0.944 (precision roughly unchanged).
- Overall we saw small but consistent gains in Accuracy and Macro F1.
On 100 perturbed test samples (character noise / synonym swap), macro-F1 deltas are small (about −0.015 for Aug-BERT, about −0.005 for BERT). For Aug-BERT, hate-speech recall is identical on clean and perturbed data (0.547), while precision drops by approximately 0.05 and F1 by approximately 0.03.
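For readers unfamiliar with character-level perturbation, the sketch below shows one simple form of it: swapping adjacent letters at random. This is an assumption about what "char noise" could look like, not the implementation in `src/data_perturbation.py`; the function name, `rate`, and `seed` parameters are hypothetical.

```python
import random


def swap_adjacent_chars(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Swap neighbouring letters with probability `rate` per eligible position."""
    rng = random.Random(seed)  # local RNG keeps the perturbation reproducible
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so each character moves at most once
        else:
            i += 1
    return "".join(chars)


original = "this is some example tweet text"
perturbed = swap_adjacent_chars(original, rate=0.3, seed=42)
```

Swaps preserve length and the multiset of characters, so the perturbed text stays readable to a human while changing the token sequence the model sees.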
Files:
- Metrics CSVs and predictions:
  - `results/bert_original/{clean,perturbed}/metrics.csv`
  - `results/bert_original/{clean,perturbed}/test_predictions.csv`
  - `results/bert_augmented/{clean,perturbed}/metrics.csv`
  - `results/bert_augmented/{clean,perturbed}/test_predictions.csv`
- Confusion matrices & per-class plots are saved alongside the metrics under `results/`.
- 0: `hate_speech`
- 1: `offensive`
- 2: `neither`
.
├── LICENSE
├── README.md
├── models
│   ├── bert_augmented
│   │   ├── best_model
│   │   │   ├── config.json
│   │   │   ├── special_tokens_map.json
│   │   │   ├── tokenizer_config.json
│   │   │   └── vocab.txt
│   │   ├── class_weights.json
│   │   ├── dataset_info.json
│   │   ├── training_config.json
│   │   └── training_history.json
│   ├── bert_original
│   │   ├── best_model
│   │   │   ├── config.json
│   │   │   ├── special_tokens_map.json
│   │   │   ├── tokenizer_config.json
│   │   │   └── vocab.txt
│   │   ├── class_weights.json
│   │   ├── dataset_info.json
│   │   ├── training_config.json
│   │   └── training_history.json
│   ├── logreg
│   │   ├── logreg.joblib
│   │   └── tfidf.joblib
│   └── majority
│       ├── majority.json
│       └── metrics.txt
├── notebooks
│   ├── bert_train_eval.ipynb
│   ├── bert_training.ipynb
│   ├── bert_training_fixed.ipynb
│   ├── eval_visualisation.ipynb
│   ├── perturbation_tests.ipynb
│   ├── stage1_evaluation.ipynb
│   └── {01_eda.ipynb}
├── predictions
│   ├── bert
│   │   ├── bert_augmented
│   │   │   ├── metrics.csv
│   │   │   ├── test_predictions.csv
│   │   │   └── val_predictions.csv
│   │   └── bert_original
│   │       ├── metrics.csv
│   │       ├── test_predictions.csv
│   │       └── val_predictions.csv
│   ├── logreg
│   │   ├── metrics.txt
│   │   ├── test_predictions.csv
│   │   └── val_predictions.csv
│   └── majority
│       ├── majority_baseline_test_predictions.csv
│       └── majority_baseline_val_predictions.csv
├── requirements.txt
├── results
│   ├── bert_augmented
│   │   ├── clean
│   │   │   └── test_predictions.csv
│   │   └── perturbed
│   │       └── test_predictions.csv
│   ├── bert_eval_confusion_matrices.png
│   ├── bert_original
│   │   ├── clean
│   │   │   └── test_predictions.csv
│   │   └── perturbed
│   │       └── test_predictions.csv
│   ├── general_evals.png
│   ├── per_class_comparison.png
│   ├── stage1_confusion_matrices.png
│   ├── stage1_overall_metrics.csv
│   └── stage1_per_class_comparison.png
├── run_phase1.py
└── src
    ├── __init__.py
    ├── __pycache__
    │   ├── __init__.cpython-311.pyc
    │   └── config.cpython-311.pyc
    ├── config.py
    ├── data
    │   ├── bad-words.txt
    │   ├── data_comparison.py
    │   ├── processed
    │   ├── processed_bert
    │   │   ├── test.csv
    │   │   ├── test_perturbation.csv
    │   │   ├── train.csv
    │   │   ├── train_augmented.csv
    │   │   └── val.csv
    │   ├── processed_light
    │   │   ├── test.csv
    │   │   ├── train.csv
    │   │   └── val.csv
    │   ├── raw
    │   │   └── hate_speech.csv
    │   └── splits
    │       ├── test.csv
    │       ├── train.csv
    │       └── val.csv
    ├── data_augmentation.py
    ├── data_loader.py
    ├── data_perturbation.py
    ├── majority_baseline.py
    ├── preprocessing.py
    └── tf_idf_baseline.py