GitHub - nbashyal/debiasing_LLM

This project explores ways to measure and mitigate bias in GPT-2 language models. It uses the GPT-2 model and several bias measurement techniques, such as the Regard score from Hugging Face and the WEAT score for embeddings. It also runs BIG-bench tasks to evaluate the model's performance.
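As an illustration of the embedding-based measurement mentioned above, here is a minimal sketch of the WEAT effect size using only the Python standard library. The function names and the toy 2-D vectors are hypothetical and are not taken from the project's notebooks; real use would pass GPT-2 embedding vectors for the target sets X, Y and attribute sets A, B. The sample standard deviation is an assumption, since implementations differ on this choice.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean similarity of w to set A minus mean similarity to set B."""
    return mean([cosine(w, a) for a in A]) - mean([cosine(w, b) for b in B])


def weat_effect_size(X, Y, A, B):
    """Difference of mean associations of X and Y, scaled by the
    sample standard deviation of associations over X and Y combined."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy embeddings, for illustration only.
X = [[1.0, 0.1], [0.9, 0.2]]   # target set 1
Y = [[0.1, 1.0], [0.2, 0.9]]   # target set 2
A = [[1.0, 0.0]]               # attribute set 1
B = [[0.0, 1.0]]               # attribute set 2

effect = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {effect:.3f}")
```

A positive value means X is more strongly associated with A (and Y with B) than the reverse; swapping A and B flips the sign.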

The main focus of the project is to implement different debiasing techniques to mitigate the bias present in the model. Two techniques have been identified for this purpose:

Fine-tuning GPT-2 with transformer architecture changes: This technique freezes one or more layers of the GPT-2 model during fine-tuning. The frozen layers keep their pretrained weights and still take part in the forward pass, but only the remaining layers are updated on the fine-tuning dataset. Restricting which layers adapt in this way may help reduce bias in the output.
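The layer-freezing idea can be sketched with PyTorch and the Hugging Face `transformers` library. This is a minimal sketch under assumptions, not the project's implementation: the helper name `freeze_blocks` is hypothetical, the choice of blocks 0 and 1 is arbitrary, and a tiny randomly initialized `GPT2Config` stands in for the pretrained model so the example runs without downloading weights.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel


def freeze_blocks(model, block_indices):
    """Disable gradient updates for the given transformer blocks."""
    for idx in block_indices:
        for param in model.transformer.h[idx].parameters():
            param.requires_grad = False
    return model


# Tiny random model as a stand-in for pretrained GPT-2 (assumption).
config = GPT2Config(vocab_size=100, n_positions=32, n_embd=32, n_layer=4, n_head=2)
model = GPT2LMHeadModel(config)

freeze_blocks(model, [0, 1])  # arbitrary choice of layers to freeze

# Hand only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=5e-5)

# One illustrative training step on random token ids.
input_ids = torch.randint(0, config.vocab_size, (2, 16))
loss = model(input_ids=input_ids, labels=input_ids).loss
loss.backward()
optimizer.step()
```

With the real model, `GPT2LMHeadModel.from_pretrained("gpt2")` would replace the random initialization; which blocks to freeze is an experimental choice.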

Masking bias tokens in the feedforward layers: We will investigate masking techniques that identify and modify the updates made in the feedforward layers of the GPT-2 model. Specifically, we will de-weight the contribution of bias tokens to these updates to reduce their impact on the model's predictions. To achieve this, we will first identify the tokens associated with biased language using existing bias measurement techniques. Then, we will apply a mask that zeroes out the gradients for these tokens in the feedforward layers, which removes their contribution to the weight updates. By de-weighting the biased tokens in this way, we expect to mitigate the effects of bias in the GPT-2 model and improve its overall performance.
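One possible way to realize the gradient masking described above is with PyTorch hooks on each block's feedforward (MLP) module. This is a sketch under assumptions, not the project's code: the helper names `add_mlp_gradient_masks` and `masked_backward` and the token ids in `bias_token_ids` are hypothetical, and a tiny randomly initialized model stands in for pretrained GPT-2.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel


def add_mlp_gradient_masks(model, keep_mask):
    """Zero the gradient arriving at each block's MLP output at masked positions.

    keep_mask: float tensor of shape (batch, seq_len); 1.0 keeps a position,
    0.0 removes its contribution to the feedforward weight gradients.
    """
    def forward_hook(module, inputs, output):
        # MLP output has shape (batch, seq_len, hidden); broadcast the mask.
        if output.requires_grad:
            output.register_hook(lambda grad: grad * keep_mask.unsqueeze(-1))

    return [block.mlp.register_forward_hook(forward_hook)
            for block in model.transformer.h]


def masked_backward(model, input_ids, bias_token_ids):
    """Run one forward/backward pass with bias-token gradients masked."""
    keep_mask = (~torch.isin(input_ids, bias_token_ids)).float()
    handles = add_mlp_gradient_masks(model, keep_mask)
    try:
        loss = model(input_ids=input_ids, labels=input_ids).loss
        loss.backward()
    finally:
        for handle in handles:
            handle.remove()  # leave the model without hooks afterwards
    return loss.detach()


torch.manual_seed(0)
config = GPT2Config(vocab_size=100, n_positions=32, n_embd=32, n_layer=2, n_head=2)
model = GPT2LMHeadModel(config)  # random stand-in for pretrained GPT-2

bias_token_ids = torch.tensor([7, 42])  # hypothetical ids flagged as biased
input_ids = torch.randint(0, config.vocab_size, (2, 16))
loss = masked_backward(model, input_ids, bias_token_ids)
```

Note that masking at the MLP output also stops gradient from passing through the MLP branch to earlier layers at those positions; the residual and attention paths are left untouched. Whether that is the desired behavior is a design choice to evaluate experimentally.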

This project will be updated with the results of the experiments conducted using these debiasing techniques. The project code is available on GitHub and contributions from the community are welcome.

Repository files:

Big_Bench_test.ipynb
HF_bias_evaluations.ipynb
README.md
WEAT_score_gender.ipynb
