This project implements a multimodal hateful meme detection system that uses CLIP (ViT-B/32) to encode both the image and the text of each meme. A lightweight neural classifier is trained on top of the CLIP embeddings to classify each meme as Hateful or Not Hateful.
Facebook Hateful Memes Dataset
Train: train.jsonl
Validation: dev.jsonl
Test: test.jsonl (no labels, inference only)
For fast experimentation, a filtered subset (~900 samples) was used. All JSON files were filtered to keep only entries whose image files exist, which avoids broken samples.
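The filtering step could look roughly like the sketch below. It assumes each JSONL record stores a relative image path under an `img` key (as in the public release of the dataset); the function name `filter_jsonl` and its arguments are hypothetical and not taken from this repository.

```python
import json
from pathlib import Path


def filter_jsonl(jsonl_path, image_root, out_path):
    """Write only the records whose image file exists; return how many were kept."""
    kept = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Assumption: "img" holds a path relative to image_root, e.g. "img/12345.png".
            if (Path(image_root) / record["img"]).is_file():
                kept.append(record)
    with open(out_path, "w", encoding="utf-8") as f:
        for record in kept:
            f.write(json.dumps(record) + "\n")
    return len(kept)
```

Running it once per split (train, dev, test) would give three filtered files that only reference images present on disk.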
Backbone:
CLIP ViT-B/32 (Frozen Feature Extractor)
Pipeline:
Image → clip.encode_image
Text → clip.encode_text
Feature Normalisation
Concatenation (Image + Text)
Linear(1024 → 256) + ReLU
Linear(256 → 2)
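As a minimal sketch of the pipeline above, the classifier head might be written as follows. It assumes the image and text features have already been produced by `clip.encode_image` and `clip.encode_text` under `torch.no_grad()` (ViT-B/32 yields 512-dimensional embeddings, hence 1024 after concatenation). The class name `MemeClassifier` is hypothetical, and L2 normalisation is an assumption, since the pipeline only says "Feature Normalisation".

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MemeClassifier(nn.Module):
    """Small MLP head on top of frozen CLIP image and text embeddings."""

    def __init__(self, embed_dim=512, hidden_dim=256, num_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * embed_dim, hidden_dim),  # Linear(1024 -> 256)
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),    # Linear(256 -> 2)
        )

    def forward(self, image_features, text_features):
        # Assumption: L2-normalise each modality before fusing.
        image_features = F.normalize(image_features.float(), dim=-1)
        text_features = F.normalize(text_features.float(), dim=-1)
        fused = torch.cat([image_features, text_features], dim=-1)
        return self.head(fused)  # raw logits, shape (batch, num_classes)
```

The `.float()` casts are there because CLIP can return half-precision features on GPU, while the head is kept in full precision.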
Training Setup
Loss: Cross-Entropy Loss
Optimiser: AdamW
Acceleration: CUDA (GPU)
Stability: Gradient Clipping
Metrics: Accuracy, Precision, Recall, F1-score
CLIP Backbone: Frozen for stability
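The setup above can be sketched as one training epoch plus a metrics helper. This is an illustration under stated assumptions, not the project's actual training script: the function names, the `max_grad_norm=1.0` clipping threshold, and the batch format (pre-computed image features, text features, labels) are all hypothetical. The metrics treat label 1 (Hateful) as the positive class.

```python
import torch
import torch.nn as nn


def train_one_epoch(model, batches, optimizer, max_grad_norm=1.0, device="cpu"):
    """Run one epoch over (image_features, text_features, labels) batches."""
    criterion = nn.CrossEntropyLoss()
    model.to(device)
    model.train()
    total_loss, seen = 0.0, 0
    for image_features, text_features, labels in batches:
        image_features = image_features.to(device)
        text_features = text_features.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        logits = model(image_features, text_features)
        loss = criterion(logits, labels)
        loss.backward()
        # Gradient clipping; the threshold is an assumed value.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()

        total_loss += loss.item() * labels.size(0)
        seen += labels.size(0)
    return total_loss / max(seen, 1)


def binary_metrics(preds, labels):
    """Accuracy, precision, recall and F1 with class 1 as the positive class."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

An optimiser such as `torch.optim.AdamW(model.parameters(), lr=1e-3)` would be passed in as `optimizer`; the learning rate here is only an example value. Passing `device="cuda"` moves the model and batches to the GPU when one is available.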
Results (Subset)
Training Accuracy: Improved steadily
Validation Accuracy & F1: Improved across epochs
Inference Demo (Human Evaluation)
A demo script is included to:
Randomly pick a meme from test.jsonl
Display the image + text
Show the model's prediction (Hateful / Not Hateful)
This allows visual inspection of model behaviour without needing labels.
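A rough outline of such a demo is sketched below. The names `load_jsonl`, `demo_once`, and `predict_fn` are hypothetical stand-ins: `predict_fn` represents whatever wraps CLIP and the trained classifier and returns a class index. The sketch assumes records carry `img` and `text` keys and that index 1 means Hateful. It prints the image path rather than rendering the image, which in a notebook would typically be done with PIL or matplotlib.

```python
import json
import random

CLASS_NAMES = ["Not Hateful", "Hateful"]  # assumption: index 1 = hateful


def load_jsonl(path):
    """Read a JSONL file into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def demo_once(test_jsonl, predict_fn, rng=None):
    """Pick one random test meme, run the model, and print the result."""
    rng = rng or random.Random()
    sample = rng.choice(load_jsonl(test_jsonl))
    # predict_fn(image_path, text) -> int class index (hypothetical interface).
    label = CLASS_NAMES[predict_fn(sample["img"], sample["text"])]
    print(f"Image: {sample['img']}")
    print(f"Text: {sample['text']}")
    print(f"Prediction: {label}")
    return sample, label
```

Passing a seeded `random.Random(0)` as `rng` makes the pick reproducible, which can help when comparing model versions on the same memes.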
Limitations
Trained on a small subset of data
CLIP backbone not fine-tuned yet
Dataset is class-imbalanced
Test set has no labels (dev set used for evaluation)
Future Work (Work in Progress)
Train on the full dataset
Fine-tune last layers of CLIP
Handle class imbalance with weighted loss
Add error analysis + confusion matrix
Ensemble Moderation System (Planned):
Combine CLIP classifier + Vision-Language LLM (BLIP-2 / LLaVA)
Add rule-based heuristics for sensitive symbols & protected groups
Fuse predictions using a meta-classifier / decision logic
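For the planned weighted loss, one common option is to weight each class inversely to its frequency in the training labels. The sketch below assumes that scheme (weight = total / (num_classes × class count)); the project has not committed to it, and the helper name `make_weighted_loss` is hypothetical.

```python
from collections import Counter

import torch
import torch.nn as nn


def make_weighted_loss(labels, num_classes=2):
    """Build a CrossEntropyLoss with inverse-frequency class weights."""
    counts = Counter(labels)
    total = len(labels)
    # Rarer classes get larger weights; classes with no samples get weight 0.
    weights = [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]
    weight_tensor = torch.tensor(weights, dtype=torch.float32)
    return nn.CrossEntropyLoss(weight=weight_tensor), weights
```

With three Not Hateful samples and one Hateful sample, for instance, the Hateful class would receive a weight three times larger than the Not Hateful class.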
1. Download and unzip the dataset
2. Update the paths for the JSON files and image folders
3. Train the model
4. Run the inference demo on test samples
OpenAI CLIP
Facebook Hateful Memes Dataset
PyTorch, Google Colab, Kaggle