This project trains a BERT-based model to classify news headlines into categories such as Politics, Sports, and Crime, using a labeled dataset (news_dataset.csv).
Please note that some of the requirements may need to be modified to fit your system.

    pip install -r requirements.txt

The project expects a trained model and tokenizer in the bert_news_model/ folder.
To train the model locally:

    python train_bert.py

This will save:
- bert_news_model/ → trained BERT model
- label_encoder.pkl → label encoder
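
To make the role of the saved label encoder concrete, here is a minimal sketch of a label-to-index mapping being built, pickled, and loaded back. It does not reproduce train_bert.py, whose internals are not shown here; the helper names (`build_label_mapping`, `save_mapping`, `load_mapping`) and the demo file name are hypothetical, and a plain dictionary stands in for whatever encoder object the script actually pickles.

```python
import os
import pickle
import tempfile


def build_label_mapping(labels):
    """Map each distinct label to an integer index, in sorted order."""
    return {label: idx for idx, label in enumerate(sorted(set(labels)))}


def save_mapping(mapping, path):
    """Pickle the mapping to disk (hypothetical stand-in for label_encoder.pkl)."""
    with open(path, "wb") as f:
        pickle.dump(mapping, f)


def load_mapping(path):
    """Load a previously pickled mapping."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Demo with made-up labels, written to a temporary location.
mapping = build_label_mapping(["Sports", "Crime", "Politics", "Crime"])
demo_path = os.path.join(tempfile.gettempdir(), "label_mapping_demo.pkl")
save_mapping(mapping, demo_path)
print(load_mapping(demo_path))  # {'Crime': 0, 'Politics': 1, 'Sports': 2}
```

At prediction time the mapping would be inverted so that the model's output index can be turned back into a category name.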
To test the model on a sample headline:

    python predict_bert.py

- train_bert.py: Fine-tunes BERT using your news dataset.
- predict_bert.py: Loads the model and classifies a test headline.
- news_dataset.csv: Your labeled training data (must have text and label columns).
- bert_news_model/: Output folder for the trained model.
- label_encoder.pkl: Stores label-to-index mapping.
- suppress_warnings.py: Silences TF/PyTorch startup logs.
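
Since the dataset must have text and label columns, it can help to check the CSV header before starting a long training run. The following is a small sketch using only the standard library; the function name `missing_columns` is hypothetical and not part of this project, and the inline sample data is made up for illustration.

```python
import csv
import io

REQUIRED_COLUMNS = ("text", "label")


def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])  # empty file -> empty header
    present = {name.strip() for name in header}
    return [col for col in required if col not in present]


# Demo on an in-memory sample instead of the real dataset.
sample = io.StringIO("text,label\nGoogle released a new tool for developers.,Technology\n")
print(missing_columns(sample))  # []
```

To check the real file, open news_dataset.csv with `open(..., newline="", encoding="utf-8")` and pass the file object to the function; an empty list means both columns are present.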
Summary: Google released a new tool for developers.
Predicted Category: Technology
Summary: A man was arrested in Toronto after a stabbing incident.
Predicted Category: Crime
- Requires a GPU for faster training (torch.cuda should be available).
- Compatible with Windows, Linux, or WSL environments.
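
A quick way to see whether PyTorch can use a GPU before training is sketched below. The helper `pick_device` is hypothetical and not part of this project; it relies on `torch.cuda.is_available()` and falls back to the CPU when no GPU is visible or PyTorch is not installed.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so there is nothing to check
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Training would run on: {pick_device()}")
```

If this reports cpu, training should still work but will likely be much slower.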