
๐Ÿ—บ๏ธ Roadmap: Limestone LLM Judge Engineย #2

@fdefitte

Limestone Roadmap

This issue tracks the development roadmap for Limestone, our engine for building reliable LLM-as-a-judge evaluators.

✅ Completed

  • Initial project setup and repository structure
  • Comprehensive README covering expert sessions and structured evaluators
  • Contributing guidelines and Apache 2.0 license
  • Discussions enabled for community feedback

🚧 In Progress (Q1 2026)

  • Expert session framework - Capture and analyze expert feedback
  • Criteria extraction - Auto-extract structured criteria from sessions
  • Basic structured evaluators - Core judge building blocks
  • Reliability testing - Consistency and agreement validation

🔮 Planned (Q2 2026)

  • Alignment datasets - Generate expert-aligned training data
  • Judge optimization - Automatic prompt tuning and few-shot learning
  • Stress testing - Adversarial examples and robustness checks
  • Integration APIs - Connect with Cobalt, LangSmith, Braintrust

🚀 Future

  • Expert interface - Web UI for session management and review
  • Advanced analytics - Bias detection, calibration, drift monitoring
  • Fine-tuning support - Custom model alignment workflows
  • Enterprise features - Audit trails, compliance, multi-tenancy

Reliability Goals

Our north star: 100% reliable LLM judges

  • 95%+ agreement with expert consensus
  • <5% score drift over 30 days
  • Consistent scoring across identical inputs
  • Transparent reasoning and confidence estimates
  • Validated performance on adversarial examples
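The measurable goals above can be checked with a few simple metrics. The sketch below is a minimal illustration, not Limestone's implementation; the function names and metric definitions are assumptions. Agreement is taken as the fraction of items where the judge's label matches the experts' majority label, drift as the relative change in mean score on a fixed reference set between two runs (for example, 30 days apart), and consistency as the fraction of inputs whose repeated judge runs all return the same score.

```python
from collections import Counter
from statistics import fmean

# Hypothetical helpers: the metric definitions are illustrative assumptions,
# not Limestone's actual API.

def expert_consensus(labels: list[str]) -> str:
    """Majority label among expert annotations; ties go to the first-seen label."""
    return Counter(labels).most_common(1)[0][0]


def agreement_rate(judge_labels: list[str], expert_labels: list[list[str]]) -> float:
    """Fraction of items where the judge's label matches the expert consensus."""
    if not judge_labels or len(judge_labels) != len(expert_labels):
        raise ValueError("judge_labels and expert_labels must be non-empty and equal length")
    matches = sum(j == expert_consensus(e) for j, e in zip(judge_labels, expert_labels))
    return matches / len(judge_labels)


def score_drift(baseline: list[float], current: list[float]) -> float:
    """Relative change in mean score between two runs on the same reference set."""
    base = fmean(baseline)
    if base == 0:
        raise ValueError("baseline mean must be non-zero")
    return abs(fmean(current) - base) / abs(base)


def consistency_rate(repeated_scores: list[list[float]]) -> float:
    """Fraction of inputs whose repeated judge runs all produced the same score."""
    return sum(len(set(runs)) == 1 for runs in repeated_scores) / len(repeated_scores)


def meets_goals(agreement: float, drift: float) -> bool:
    """Compare against the stated targets: 95%+ agreement and <5% drift."""
    return agreement >= 0.95 and drift < 0.05


if __name__ == "__main__":
    judge = ["pass", "fail", "pass", "pass"]
    experts = [["pass", "pass", "fail"], ["fail", "fail", "pass"],
               ["pass", "fail", "fail"], ["pass", "pass", "pass"]]
    agreement = agreement_rate(judge, experts)       # 0.75 (item 3 disagrees)
    drift = score_drift([4, 3, 5, 4], [4, 3, 4, 4])  # 0.0625 (mean 4.0 -> 3.75)
    consistency = consistency_rate([[4, 4, 4], [3, 3, 2], [5, 5, 5]])
    print(agreement, drift, round(consistency, 3), meets_goals(agreement, drift))
```

Exact-match agreement is the simplest choice; a chance-corrected statistic such as Cohen's or Fleiss' kappa may be more informative when the label distribution is skewed.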

Community Priorities

Vote on features or suggest new approaches in our discussions!

Especially interested in:

  • Domain expertise contributions (code, content, support, etc.)
  • Evaluation science methodologies
  • Real-world reliability requirements

Want to contribute? Check out our Contributing Guide

Questions? Join the discussion or reach out on Discord
