LP-as-a-Judge

Experiments on the use of linear classifier heads for LLM-as-a-judge tasks.

Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes

Sharan Maiya, Yinhong Liu, Ramit Debnath, Anna Korhonen

License: MIT

This repository reproduces the experiments in our paper, presented at ACL 2025.

Abstract

Large Language Models (LLMs) are often used as automated judges to evaluate text, but their effectiveness can be hindered by various unintentional biases. We propose using linear classifying probes, trained by leveraging differences between contrasting pairs of prompts, to directly access LLMs’ latent knowledge and extract more accurate preferences. Through extensive experiments using models of varying size from four different families and six diverse datasets assessing text quality evaluation and common sense reasoning, we demonstrate that both supervised and unsupervised probing approaches consistently outperform traditional generation-based judgement while maintaining similar computational costs. These probes generalise under domain shifts and can even outperform finetuned LLM evaluators with the same training data size. Our results suggest linear probing offers an accurate, robust and computationally efficient approach for LLM-as-judge tasks while providing interpretable insights into how models encode judgement-relevant knowledge.
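
As a rough illustration of the supervised probing idea described in the abstract, the sketch below fits a logistic-regression probe on the difference between the two activation vectors of each contrast pair. It is not the code in this repository: the activations are synthetic NumPy arrays standing in for LLM hidden states, and `make_synthetic_pairs`, `train_probe`, and `accuracy` are hypothetical names chosen for this example.

```python
import numpy as np


def make_synthetic_pairs(n=400, dim=32, seed=0):
    """Fake activations for n contrast pairs (ASSUMPTION: stand-in for LLM hidden states)."""
    rng = np.random.default_rng(seed)
    # Hypothetical "preference" direction hidden in activation space.
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    labels = rng.integers(0, 2, size=n)          # 1 if the first prompt of the pair is preferred
    signs = (2 * labels - 1)[:, None]            # map {0, 1} to {-1, +1}
    shared = rng.normal(size=(n, dim))           # content common to both prompts of a pair
    h_pos = shared + signs * direction + 0.5 * rng.normal(size=(n, dim))
    h_neg = shared - signs * direction + 0.5 * rng.normal(size=(n, dim))
    return h_pos, h_neg, labels


def train_probe(diffs, labels, lr=0.1, epochs=500):
    """Fit a linear probe (logistic regression) with plain gradient descent."""
    n, dim = diffs.shape
    w = np.zeros(dim)
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(diffs @ w + b)))
        error = probs - labels                   # gradient of the log-loss w.r.t. the logits
        w -= lr * (diffs.T @ error) / n
        b -= lr * error.mean()
    return w, b


def accuracy(w, b, diffs, labels):
    """Fraction of pairs whose predicted preference matches the label."""
    preds = (diffs @ w + b > 0).astype(int)
    return float((preds == labels).mean())


h_pos, h_neg, labels = make_synthetic_pairs()
diffs = h_pos - h_neg                            # subtracting cancels the shared content
w, b = train_probe(diffs[:300], labels[:300])    # first 300 pairs for training
test_acc = accuracy(w, b, diffs[300:], labels[300:])
print(f"held-out accuracy on synthetic pairs: {test_acc:.3f}")
```

Taking the difference within each pair removes whatever the two prompts have in common, so the probe only has to separate the component that varies with the preference label. With a real model, `h_pos` and `h_neg` would be replaced by hidden states extracted for the two prompts of each pair.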


License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

@inproceedings{maiya-etal-2025-improving,
    title = "Improving Preference Extraction In {LLM}s By Identifying Latent Knowledge Through Classifying Probes",
    author = "Maiya, Sharan  and
      Liu, Yinhong  and
      Debnath, Ramit  and
      Korhonen, Anna",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.444/",
    pages = "9061--9081",
    ISBN = "979-8-89176-251-0",
}

Funding

This work was supported by the UKRI Centre for Doctoral Training in Application of Artificial Intelligence to the study of Environmental Risks [EP/S022961/1].

Contact

For any queries or information, contact Sharan Maiya.
