NLP Text Classification for Cybercrime Reporting

Project Overview

This project develops NLP models that classify text descriptions of cybercrimes into categories and subcategories. The models range from a traditional machine learning baseline (Logistic Regression) to a fine-tuned transformer model (BERT).

Data Description

The training dataset consists of text descriptions of cybercrimes, each labeled with a category and a subcategory. The key columns are:

category: The main category of the crime.
sub_category: The subcategory providing additional context.
crimeaditionalinfo: The raw text description for analysis.
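
As a rough illustration of how these columns might be read, the sketch below parses a few rows with Python's standard `csv` module and counts the examples per category. The sample rows, the category names in them, and the `load_rows` helper are invented for illustration; only the three column names come from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; only the column names come from the dataset.
SAMPLE_CSV = """category,sub_category,crimeaditionalinfo
Financial Fraud,UPI Fraud,Money was debited from my account without an OTP.
Financial Fraud,Card Fraud,Someone used my debit card details online.
Social Media Crime,Fake Profile,A fake profile is using my photos.
"""


def load_rows(handle):
    """Read rows as dicts and drop those with an empty description."""
    reader = csv.DictReader(handle)
    return [row for row in reader if (row.get("crimeaditionalinfo") or "").strip()]


rows = load_rows(io.StringIO(SAMPLE_CSV))
category_counts = Counter(row["category"] for row in rows)
print(category_counts.most_common())
```

To read a real file, the same helper could be passed an open file handle instead of the in-memory sample.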

Preprocessing

Text preprocessing includes:

Converting text to lowercase.
Removing special characters and punctuation.
Tokenization.
Removing stop words.
Stemming or lemmatization.

The preprocessed data is stored in data/cleaned_text.csv.
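
The preprocessing steps listed above can be sketched in plain Python as follows. This is a minimal illustration, not necessarily the code used in the notebook: the stop-word list is a tiny hand-picked sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer or lemmatizer. Both names, and `preprocess` itself, are assumptions made for the example.

```python
import re

# Small illustrative stop-word list; a real pipeline would likely use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "was", "to", "of", "and", "in", "my", "from", "i"}


def naive_stem(token):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, strip punctuation, tokenize, drop stop words, and stem."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # remove special characters and punctuation
    tokens = text.split()  # simple whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


print(preprocess("Money was DEBITED from my account, without an OTP!"))
```

The function returns a list of tokens; joining them with spaces gives a cleaned string that could be written to a CSV column.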

Modeling

Various models were trained, including:

Logistic Regression: Baseline model for classification.
BERT: Pretrained transformer fine-tuned for the classification task, aiming for higher accuracy than the baseline.
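
A baseline of the kind described above might look like the following scikit-learn sketch. The TF-IDF features, the `max_iter` setting, and the toy texts and labels are all assumptions made for illustration; the README does not say which features or hyperparameters the project uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy examples; the real data comes from the training dataset.
texts = [
    "money debited from my bank account without otp",
    "unknown transaction on my credit card",
    "fake profile created using my photos",
    "someone hacked my social media account",
]
labels = ["financial_fraud", "financial_fraud", "social_media", "social_media"]

# TF-IDF features feeding a logistic regression classifier (assumed setup).
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
baseline.fit(texts, labels)

predictions = baseline.predict(["money withdrawn from my bank account"])
print(predictions)
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary fitted on the training text only, which avoids leaking test data into the features.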

Evaluation Metrics

The models were evaluated using:

Accuracy
Precision, Recall, F1-Score
Confusion Matrix
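
To make these metrics concrete, here is a small pure-Python sketch that computes them from true and predicted labels. The helper names and the toy labels are made up for the example; in practice a library such as scikit-learn would usually be used instead.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true, y_pred, label):
    """Precision, recall, and F1 for one label treated as the positive class."""
    matrix = confusion_matrix(y_true, y_pred)
    tp = matrix[(label, label)]
    fp = sum(n for (t, p), n in matrix.items() if p == label and t != label)
    fn = sum(n for (t, p), n in matrix.items() if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for illustration only.
y_true = ["fraud", "fraud", "hacking", "hacking", "fraud"]
y_pred = ["fraud", "hacking", "hacking", "hacking", "fraud"]

print(accuracy(y_true, y_pred))
print(per_class_scores(y_true, y_pred, "fraud"))
print(confusion_matrix(y_true, y_pred))
```

Per-class scores matter here because accuracy alone can hide poor performance on rare categories.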

Installation

Clone the repository:

```
git clone https://github.com/yourusername/your-project-name.git
cd your-project-name
```

Install the required packages:
```
pip install -r requirements.txt
```
