Hate speech is no longer generated only by individuals; it is now also produced by AI systems. We propose to explore the differences between human- and AI-generated hate speech and to develop a model capable of distinguishing between the two. Our research questions are:
- Can we accurately classify whether implicit hate speech is generated by humans or by AI?
- Which linguistic features are indicative of human- vs. AI-generated hate speech?
- Can open-source LLMs help us improve classification accuracy (e.g., by generating additional training data to address the class-imbalance problem)?
- Can a similarity metric tell us how closely AI-generated data resembles human-written data?
- How should we sample the data? Down-sampling throws away data, but a forced 50/50 hate/non-hate split is also problematic because it does not reflect real-world hate distributions.
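For the similarity question above, one simple lexical baseline is cosine similarity over bag-of-words token counts; in practice, sentence embeddings would likely be more informative. The sketch below is illustrative only (the function name and whitespace tokenization are our own choices, not from any specific library):

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words token counts of two texts.

    Returns a value in [0, 1]: 1.0 for identical token distributions,
    0.0 when the texts share no tokens.
    """
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    # Dot product over the shared vocabulary.
    dot = sum(a[tok] * b[tok] for tok in set(a) & set(b))
    # Product of the two count-vector norms.
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Averaging this score over sampled human/AI text pairs would give a crude first estimate of how lexically similar the two sources are.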
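For the sampling question, one alternative to down-sampling or a forced 50/50 split is to keep the natural label distribution and instead weight the loss by inverse class frequency (this mirrors scikit-learn's "balanced" heuristic). A minimal sketch, with an illustrative function name of our own:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency.

    Rare classes get large weights, common classes small ones, so a
    weighted loss can preserve the real-world hate/non-hate skew
    without discarding majority-class examples.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}
```

With 90 non-hate and 10 hate examples, the hate class receives weight 5.0 and non-hate roughly 0.56, so each class contributes equally to the loss in expectation.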
We will fine-tune pre-trained models for classification and use open-source LLMs for data generation, drawing on the following datasets:
- ToxiGen: a dataset of AI-generated toxic and hate speech targeting 13 groups, with 27.5k human-validated rows (Hartvigsen et al., 2022).
- Implicit Hate: the ElSherief et al. (2021) dataset of 22,056 tweets from prominent U.S. extremist groups, of which 6,346 are labeled as containing implicit hate speech, with fine-grained annotations for each message and its implications.
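Because ToxiGen is machine-generated while the Implicit Hate corpus is human-written, one straightforward way to build the human-vs-AI training set is to merge the two sources under an origin label. A minimal sketch, assuming the texts have already been loaded into Python lists (the function name and label convention are our own):

```python
import random

def build_origin_dataset(human_texts, ai_texts, seed=0):
    """Merge human-written and AI-generated examples into one shuffled
    list of (text, label) pairs for origin classification.

    Label convention (ours): 0 = human-written, 1 = AI-generated.
    A fixed seed keeps the shuffle reproducible across runs.
    """
    data = [(text, 0) for text in human_texts] + \
           [(text, 1) for text in ai_texts]
    random.Random(seed).shuffle(data)
    return data
```

In the actual pipeline, the human texts would come from the Implicit Hate tweets and the AI texts from ToxiGen rows; any per-example metadata (target group, implied statement) could be carried along for the linguistic-feature analysis.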
References:
- ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection (Hartvigsen et al., ACL 2022)
- Latent Hatred: A Benchmark for Understanding Implicit Hate Speech (ElSherief et al., EMNLP 2021)