🗺️ Roadmap: Limestone LLM Judge Engine #2

# Limestone Roadmap

This issue tracks the development roadmap for Limestone, our engine for building reliable LLM-as-a-judge evaluators.

## ✅ Completed

- [x] Initial project setup and repository structure
- [x] Comprehensive README covering expert sessions and structured evaluators
- [x] Contributing guidelines and Apache 2.0 license
- [x] Discussions enabled for community feedback

## 🚧 In Progress (Q1 2026)

- [ ] **Expert session framework** - Capture and analyze expert feedback
- [ ] **Criteria extraction** - Auto-extract structured criteria from sessions
- [ ] **Basic structured evaluators** - Core judge building blocks
- [ ] **Reliability testing** - Consistency and agreement validation

## 🔮 Planned (Q2 2026)

- [ ] **Alignment datasets** - Generate expert-aligned training data
- [ ] **Judge optimization** - Automatic prompt tuning and few-shot learning
- [ ] **Stress testing** - Adversarial examples and robustness checks
- [ ] **Integration APIs** - Connect with Cobalt, LangSmith, Braintrust

## 🚀 Future

- [ ] **Expert interface** - Web UI for session management and review
- [ ] **Advanced analytics** - Bias detection, calibration, drift monitoring
- [ ] **Fine-tuning support** - Custom model alignment workflows
- [ ] **Enterprise features** - Audit trails, compliance, multi-tenant

## Reliability Goals

Our north star: **100% reliable LLM judges**. The measurable targets on the way there:

- 95%+ agreement with expert consensus
- <5% score drift over 30 days
- Consistent scoring across identical inputs
- Transparent reasoning and confidence estimates
- Validated performance on adversarial examples
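
The first three goals above can be made concrete with a small sketch. This is illustrative only: the roadmap does not yet define how agreement, drift, or consistency are measured, so the definitions below (exact-match agreement, relative change in mean score, and max-min spread across repeated runs) are assumptions, and the function names are hypothetical rather than part of any Limestone API.

```python
from statistics import mean


def agreement_rate(judge_labels, expert_labels):
    """Fraction of items where the judge label equals the expert consensus label."""
    if not judge_labels or len(judge_labels) != len(expert_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(j == e for j, e in zip(judge_labels, expert_labels))
    return matches / len(judge_labels)


def relative_drift(baseline_scores, current_scores):
    """Relative change in mean score between two runs over the same inputs.

    Assumption: drift is measured on the mean score; other definitions
    (per-item deltas, distribution distances) are equally plausible.
    """
    base = mean(baseline_scores)
    if base == 0:
        raise ValueError("baseline mean is zero; relative drift is undefined")
    return abs(mean(current_scores) - base) / abs(base)


def is_consistent(repeated_scores, tolerance=0.0):
    """True if repeated runs on one identical input stay within `tolerance`."""
    return max(repeated_scores) - min(repeated_scores) <= tolerance


# Hypothetical data: four items labeled by the judge and by expert consensus.
judge = ["pass", "fail", "pass", "pass"]
expert = ["pass", "fail", "fail", "pass"]
agreement = agreement_rate(judge, expert)

# Hypothetical scores for the same inputs, collected 30 days apart.
drift = relative_drift([4.0, 3.0, 5.0], [4.0, 3.0, 4.7])

print(f"agreement: {agreement:.0%} (target: 95%+)")
print(f"30-day drift: {drift:.1%} (target: <5%)")
print("consistent:", is_consistent([4, 4, 4]))
```

Exact-match agreement is the simplest choice here; a chance-corrected statistic such as Cohen's kappa may be a better fit when label distributions are skewed.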

## Community Priorities

Vote on features or suggest new approaches in our [discussions](https://github.com/basalt-ai/limestone/discussions)!

We are especially interested in:
- Domain expertise contributions (code, content, support, etc.)
- Evaluation science methodologies
- Real-world reliability requirements

---

**Want to contribute?** Check out our [Contributing Guide](https://github.com/basalt-ai/limestone/blob/main/CONTRIBUTING.md).

**Questions?** Join the discussion or reach out on [Discord](https://discord.gg/yW2RyZKY).
