Experiments on the use of linear classifier heads for LLM-as-a-judge tasks.
This repository reproduces the experiments in our paper, presented at ACL 2025.
Large Language Models (LLMs) are often used as automated judges to evaluate text, but their effectiveness can be hindered by various unintentional biases. We propose using linear classifying probes, trained by leveraging differences between contrasting pairs of prompts, to directly access LLMs’ latent knowledge and extract more accurate preferences. Through extensive experiments using models of varying size from four different families and six diverse datasets assessing text quality evaluation and common sense reasoning, we demonstrate that both supervised and unsupervised probing approaches consistently outperform traditional generation-based judgement while maintaining similar computational costs. These probes generalise under domain shifts and can even outperform finetuned LLM evaluators with the same training data size. Our results suggest linear probing offers an accurate, robust and computationally efficient approach for LLM-as-judge tasks while providing interpretable insights into how models encode judgement-relevant knowledge.
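The supervised variant of the idea above can be sketched in a few lines. This is a minimal illustration, not the code in this repository or the method as implemented in the paper: the hidden states are synthetic stand-ins generated with NumPy rather than real LLM activations, the probe is a plain logistic regression trained by gradient descent, and every name (`make_synthetic_pairs`, `train_probe`, `predict`, `run_demo`) and every hyperparameter is hypothetical. It assumes each example yields one activation vector per contrasting prompt, and that the probe is trained on the difference between the two.

```python
import numpy as np


def make_synthetic_pairs(n, direction, rng, noise=1.0):
    """Synthetic stand-in for hidden states of two contrasting prompts.

    label == 1 means the first prompt of the pair states the correct judgement.
    """
    dim = direction.shape[0]
    labels = rng.integers(0, 2, size=n)
    signs = 2.0 * labels - 1.0                # +1 or -1 per example
    shared = rng.normal(size=(n, dim))        # content common to both prompts
    offset = 0.5 * signs[:, None] * direction
    h_first = shared + offset + noise * rng.normal(size=(n, dim))
    h_second = shared - offset + noise * rng.normal(size=(n, dim))
    return h_first, h_second, labels


def train_probe(diffs, labels, lr=0.1, steps=500):
    """Fit a logistic-regression probe on activation differences."""
    w = np.zeros(diffs.shape[1])
    b = 0.0
    for _ in range(steps):
        logits = np.clip(diffs @ w + b, -30.0, 30.0)  # avoid overflow in exp
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels                 # gradient of the log loss w.r.t. logits
        w -= lr * (diffs.T @ grad) / len(labels)
        b -= lr * grad.mean()
    return w, b


def predict(diffs, w, b):
    """Return 1 where the probe prefers the first prompt of the pair."""
    return (diffs @ w + b > 0).astype(int)


def run_demo(seed=0, dim=32, n_train=400, n_test=200):
    rng = np.random.default_rng(seed)
    # Hypothetical "judgement" direction hidden in the activations.
    direction = rng.normal(size=dim)
    direction *= 4.0 / np.linalg.norm(direction)

    tr_first, tr_second, tr_labels = make_synthetic_pairs(n_train, direction, rng)
    te_first, te_second, te_labels = make_synthetic_pairs(n_test, direction, rng)

    # Subtracting the pair cancels the shared content and keeps the contrast.
    w, b = train_probe(tr_first - tr_second, tr_labels)
    preds = predict(te_first - te_second, w, b)
    return float((preds == te_labels).mean())


if __name__ == "__main__":
    print(f"held-out probe accuracy: {run_demo():.3f}")
```

With real models, the two arrays would instead hold activations extracted from the LLM for each prompt in a contrasting pair; which layer and token position to read from is a design choice this sketch does not address.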
This project is licensed under the MIT License - see the LICENSE file for details.
@inproceedings{maiya-etal-2025-improving,
title = "Improving Preference Extraction In {LLM}s By Identifying Latent Knowledge Through Classifying Probes",
author = "Maiya, Sharan and
Liu, Yinhong and
Debnath, Ramit and
Korhonen, Anna",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.444/",
pages = "9061--9081",
ISBN = "979-8-89176-251-0",
}
This work was supported by the UKRI Centre for Doctoral Training in Application of Artificial Intelligence to the study of Environmental Risks [EP/S022961/1].
For any queries or information, contact Sharan Maiya.