Hateful-Meme-Detection


This project implements a multimodal hateful meme detection system using CLIP (ViT-B/32) to encode both images and text. A lightweight neural classifier is trained on top of the concatenated CLIP embeddings to label each meme as Hateful or Not Hateful.

Goal: Build a practical baseline for automated hateful content moderation.

Dataset

Facebook Hateful Memes Dataset

🟒 Train: train.jsonl

πŸ”΅ Validation: dev.jsonl

⚫ Test: test.jsonl (no labels, inference only)

⚡ For fast experimentation, a filtered subset (~900 samples) was used. Each JSONL file was filtered to keep only samples whose image files exist on disk, avoiding broken entries.
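The filtering step above can be sketched as follows (a minimal example, not the repository's actual script; the `img` field matches the dataset's JSONL schema, while the paths are placeholders):

```python
import json
from pathlib import Path

def filter_jsonl(jsonl_path, img_dir, out_path):
    """Keep only samples whose image file actually exists on disk."""
    img_dir = Path(img_dir)
    kept = []
    with open(jsonl_path) as f:
        for line in f:
            sample = json.loads(line)
            # each record has an "img" field like "img/42953.png"
            if (img_dir / Path(sample["img"]).name).exists():
                kept.append(sample)
    with open(out_path, "w") as f:
        for sample in kept:
            f.write(json.dumps(sample) + "\n")
    return len(kept)
```

Running this once per split (`train.jsonl`, `dev.jsonl`, `test.jsonl`) yields the cleaned subsets.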

Model Architecture

Backbone:

πŸ”₯ CLIP ViT-B/32 (Frozen Feature Extractor)

Pipeline:

πŸ–ΌοΈ Image β†’ clip.encode_image

πŸ“ Text β†’ clip.encode_text

πŸ“ Feature Normalisation

πŸ”— Concatenation (Image + Text)

Classifier Head:

Linear(1024 β†’ 256) + ReLU

Linear(256 β†’ 2)
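A minimal sketch of the head in PyTorch (the class name is illustrative; it assumes 512-dimensional ViT-B/32 features per modality, produced elsewhere by `clip.encode_image` / `clip.encode_text`):

```python
import torch
import torch.nn as nn

class MemeClassifier(nn.Module):
    """Normalise, concatenate, and classify CLIP image+text features."""
    def __init__(self, clip_dim=512):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(clip_dim * 2, 256),  # 1024 -> 256
            nn.ReLU(),
            nn.Linear(256, 2),             # 256 -> 2 logits
        )

    def forward(self, img_feat, txt_feat):
        # L2-normalise each modality before concatenation
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
        return self.head(torch.cat([img_feat, txt_feat], dim=-1))
```

Keeping CLIP frozen means only these two linear layers are trained.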

βš™οΈ Training Setup

πŸ§ͺ Loss: Cross-Entropy Loss

πŸš€ Optimiser: AdamW

πŸ–₯️ Acceleration: CUDA (GPU)

🧯 Stability: Gradient Clipping

πŸ“Š Metrics: Accuracy, Precision, Recall, F1-score

🧠 CLIP Backbone: Frozen for stability
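The setup above might look roughly like this in code (a sketch, not the repository's actual loop; it assumes the loader yields precomputed, concatenated CLIP features plus labels, and that the model maps a feature batch to 2-way logits):

```python
import torch
import torch.nn as nn

def train_epoch(model, loader, optimizer, device="cpu", max_norm=1.0):
    """One epoch: cross-entropy loss, AdamW step, gradient clipping."""
    criterion = nn.CrossEntropyLoss()
    model.train()
    total, correct = 0, 0
    for features, labels in loader:
        features, labels = features.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(features)
        loss = criterion(logits, labels)
        loss.backward()
        # clip gradients for stability before the optimiser step
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
        optimizer.step()
        correct += (logits.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    return correct / total
```

On a GPU, `device="cuda"` moves each batch onto the accelerator; precision, recall, and F1 can be computed afterwards from the collected predictions.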

πŸ“ˆ Results (Subset)

βœ… Training Accuracy: Improved steadily

πŸ“Š Validation Accuracy & F1: Improved across epochs

Interpretation: Model learns meaningful multimodal patterns

⚠️ Note: Scores are lower due to small dataset size and frozen CLIP

πŸ–ΌοΈ Inference Demo (Human Evaluation)

A demo script is included to:

🎲 Randomly pick a meme from test.jsonl

πŸ‘€ Display the image + text

πŸ€– Show the model’s prediction (Hateful / Not Hateful)

πŸ§ͺ This allows visual inspection of model behaviour without needing labels.
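A demo of this shape could be sketched as below. It is hypothetical, not the repository's script: the CLIP model, its `preprocess` transform, and `clip.tokenize` are passed in as arguments (in practice they come from `clip.load("ViT-B/32")`), and the label-index-to-name mapping is an assumption.

```python
import json
import random
import torch
from PIL import Image

@torch.no_grad()
def run_demo(classifier, clip_model, preprocess, tokenize,
             jsonl_path, img_dir, device="cpu"):
    """Pick a random meme from a .jsonl split and print the prediction."""
    samples = [json.loads(line) for line in open(jsonl_path)]
    s = random.choice(samples)
    img_path = f"{img_dir}/{s['img'].split('/')[-1]}"
    image = preprocess(Image.open(img_path)).unsqueeze(0).to(device)
    tokens = tokenize([s["text"]]).to(device)
    img_feat = clip_model.encode_image(image).float()
    txt_feat = clip_model.encode_text(tokens).float()
    pred = classifier(img_feat, txt_feat).argmax(dim=1).item()
    verdict = "Hateful" if pred == 1 else "Not Hateful"  # assumed mapping
    print(f"{s['text']!r} -> {verdict}")
    return verdict
```

Displaying the image (e.g. with matplotlib) alongside the printed text completes the visual inspection.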

Limitations

πŸ“‰ Trained on a small subset of data

🧊 CLIP backbone not fine-tuned yet

βš–οΈ Dataset is class-imbalanced

πŸ§ͺ Test set has no labels (dev set used for evaluation)

🚧 Future Work (Work in Progress)

πŸ“ˆ Train on the full dataset

πŸ”“ Fine-tune last layers of CLIP

βš–οΈ Handle class imbalance with weighted loss

πŸ” Add error analysis + confusion matrix

🧩 Ensemble Moderation System (Planned):

Combine CLIP classifier + Vision-Language LLM (BLIP-2 / LLaVA)

Add rule-based heuristics for sensitive symbols & protected groups

Fuse predictions using a meta-classifier / decision logic
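Of the items above, the weighted loss is straightforward to add. One common approach (a sketch, not yet part of the project) weights each class inversely to its frequency in the training labels:

```python
import torch
import torch.nn as nn

def weighted_ce(labels, num_classes=2):
    """Build a CrossEntropyLoss whose per-class weights are inversely
    proportional to class frequency in `labels`."""
    counts = torch.bincount(labels, minlength=num_classes).float()
    # weight_c = N / (num_classes * count_c); clamp avoids divide-by-zero
    weights = counts.sum() / (num_classes * counts.clamp(min=1))
    return nn.CrossEntropyLoss(weight=weights)
```

The returned criterion drops into the training loop in place of the unweighted `nn.CrossEntropyLoss()`.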

▢️ How to Run

1️⃣ Download & unzip the dataset

2️⃣ Update the paths to the JSON files and image folders

3️⃣ Train the model

4️⃣ Run the inference demo on test samples

Acknowledgements

πŸ”— OpenAI CLIP

πŸ“š Facebook Hateful Memes Dataset

βš™οΈ PyTorch, Google Colab, Kaggle
