Limestone Roadmap
This issue tracks the development roadmap for Limestone, our engine for building reliable LLM-as-a-judge evaluators.
✅ Completed
🚧 In Progress (Q1 2026)
🔮 Planned (Q2 2026)
Future
Reliability Goals
Our north star: 100% reliable LLM judges. The concrete targets we are working toward:
- 95%+ agreement with expert consensus
- <5% score drift over 30 days
- Consistent scoring across identical inputs
- Transparent reasoning and confidence estimates
- Validated performance on adversarial examples
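To make the first three targets concrete, here is a minimal sketch of how they could be checked. It is not Limestone's API: the function names (`agreement_rate`, `score_drift`, `is_consistent`) are hypothetical, and the roadmap does not define how drift is measured, so the sketch assumes drift means the relative change in mean score between two runs on the same items.

```python
from statistics import mean


def agreement_rate(judge_labels, expert_labels):
    """Fraction of items where the judge's label equals the expert consensus label."""
    if len(judge_labels) != len(expert_labels) or not judge_labels:
        raise ValueError("label lists must be non-empty and the same length")
    matches = sum(j == e for j, e in zip(judge_labels, expert_labels))
    return matches / len(judge_labels)


def score_drift(baseline_scores, later_scores):
    """Relative change in mean score between two runs (assumed drift definition)."""
    base = mean(baseline_scores)
    if base == 0:
        raise ValueError("baseline mean is zero; relative drift is undefined")
    return abs(mean(later_scores) - base) / abs(base)


def is_consistent(repeated_scores):
    """True if repeated runs on one identical input all produced the same score."""
    return len(set(repeated_scores)) <= 1


# Illustrative data only; not real evaluation results.
judge = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
expert = [1, 0, 1, 1, 0, 1, 1, 1, 0, 0]
day_0 = [4, 3, 5, 4]
day_30 = [4, 3, 5, 3]

meets_agreement = agreement_rate(judge, expert) >= 0.95  # target: 95%+
meets_drift = score_drift(day_0, day_30) < 0.05          # target: <5% over 30 days
meets_consistency = is_consistent([4, 4, 4])             # target: identical scores
```

Exact-match agreement is the simplest choice for categorical labels. For graded scores, a chance-corrected statistic such as Cohen's kappa may be a better fit, but the roadmap does not say which one is intended.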
Community Priorities
Vote on features or suggest new approaches in our discussions!
We're especially interested in:
- Domain expertise contributions (code, content, support, etc.)
- Evaluation science methodologies
- Real-world reliability requirements
Want to contribute? Check out our Contributing Guide
Questions? Join the discussion or reach out on Discord