LP-as-a-Judge

Experiments on the use of linear classifier heads for LLM-as-a-judge tasks.

Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes

Sharan Maiya, Yinhong Liu, Ramit Debnath, Anna Korhonen

PAPER

This repository reproduces the experiments in our paper, presented at ACL 2025.

Abstract

Large Language Models (LLMs) are often used as automated judges to evaluate text, but their effectiveness can be hindered by various unintentional biases. We propose using linear classifying probes, trained by leveraging differences between contrasting pairs of prompts, to directly access LLMs’ latent knowledge and extract more accurate preferences. Through extensive experiments using models of varying size from four different families and six diverse datasets assessing text quality evaluation and common sense reasoning, we demonstrate that both supervised and unsupervised probing approaches consistently outperform traditional generation-based judgement while maintaining similar computational costs. These probes generalise under domain shifts and can even outperform finetuned LLM evaluators with the same training data size. Our results suggest linear probing offers an accurate, robust and computationally efficient approach for LLM-as-judge tasks while providing interpretable insights into how models encode judgement-relevant knowledge.
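The probing idea in the abstract can be illustrated with a minimal, self-contained sketch. This is not the repository's implementation. It uses synthetic activations in place of real LLM hidden states, and the names make_synthetic_pairs, train_logistic_probe and unsupervised_pca_probe are hypothetical. Each item is represented by the difference between the activations of two contrasting prompts (for example, the same comparison completed with opposite answers). The supervised probe is a logistic regression on those differences. The unsupervised variant is a simple PCA-based stand-in that takes the top principal component of the differences; the paper's unsupervised method may use a different objective.

```python
import numpy as np


def make_synthetic_pairs(n=400, dim=64, seed=0):
    """Hypothetical stand-in for LLM hidden states of contrasting prompt pairs.

    h_pos / h_neg play the role of activations for the same judgement prompt
    completed with opposite answers; a hidden direction encodes the label.
    """
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    labels = rng.integers(0, 2, size=n)                  # 1 = first response preferred
    base = rng.normal(size=(n, dim))                     # content shared by both prompts
    signal = np.where(labels == 1, 1.0, -1.0)[:, None] * direction
    h_pos = base + signal + 0.5 * rng.normal(size=(n, dim))
    h_neg = base - signal + 0.5 * rng.normal(size=(n, dim))
    return h_pos, h_neg, labels


def train_logistic_probe(x, y, lr=0.1, steps=500, l2=1e-3):
    """Supervised linear probe: L2-regularised logistic regression by gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))           # predicted P(label = 1)
        w -= lr * (x.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def unsupervised_pca_probe(x):
    """Label-free direction: top principal component of the centred differences."""
    centred = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[0]


h_pos, h_neg, y = make_synthetic_pairs()
x = h_pos - h_neg                                        # contrast-pair difference features (shared content cancels)
tr, te = slice(0, 300), slice(300, None)

w, b = train_logistic_probe(x[tr], y[tr])
sup_acc = np.mean(((x[te] @ w + b) > 0) == y[te])

v = unsupervised_pca_probe(x[tr])
mu = x[tr].mean(axis=0)
if np.mean((((x[tr] - mu) @ v) > 0) == y[tr]) < 0.5:    # PCA sign is arbitrary: orient it
    v = -v
unsup_acc = np.mean((((x[te] - mu) @ v) > 0) == y[te])
print(f"supervised probe accuracy: {sup_acc:.2f}")
print(f"unsupervised probe accuracy: {unsup_acc:.2f}")
```

Because the sign of a principal component is arbitrary, the sketch orients it using the training labels; only the direction itself is found without labels. A strictly unsupervised setup would need another way to fix the sign, such as a handful of labelled examples.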

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

@inproceedings{maiya-etal-2025-improving,
    title = "Improving Preference Extraction In {LLM}s By Identifying Latent Knowledge Through Classifying Probes",
    author = "Maiya, Sharan  and
      Liu, Yinhong  and
      Debnath, Ramit  and
      Korhonen, Anna",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.444/",
    pages = "9061--9081",
    ISBN = "979-8-89176-251-0",
}

Funding

This work was supported by the UKRI Centre for Doctoral Training in Application of Artificial Intelligence to the study of Environmental Risks [EP/S022961/1].

Contact

For any queries or information, contact Sharan Maiya.
