ai-eval

Here are 7 public repositories matching this topic:

Mike-E-Log / ai-eval-atlas

A self-taught engineer's structured map of the AI evaluation field.

awesome-list atlas llm-evaluation ai-eval

Updated May 22, 2026

kasimmj / claude-code-test-runner

🧪 Evaluation framework for testing Claude Code skills at scale. Run regression suites across model versions.

testing pytest regression-testing claude ai-testing anthropic evals llm-evaluation claude-code ai-eval

Updated May 22, 2026

ianfh0 / deduce

Daily puzzle for AI agents.

nextjs ai-agents anthropic daily-puzzle ai-eval

Updated Apr 15, 2026
TypeScript

klausners / prompt-optimizer

Config-driven CLI that runs promptfoo evals, identifies low-scoring prompts, rewrites them via Claude API, and re-evaluates.

cli automation claude llm prompt-engineering llm-eval prompt-optimization promptfoo ai-eval

Updated Mar 26, 2026
TypeScript

KarmaEnchanter / mental-health-llm-eval

Open evaluation harness for mental health LLM responses. 5 clinically-grounded rubrics, LLM-as-judge with bias controls, crisis-detection routing to 988 protocols.

psychology cbt ai-safety conversational-ai clinical-ai cohen-kappa ollama llm-evaluation llm-as-judge mental-health-ai ai-eval inter-rater-reliability eval-harness lifeline-988 open-source-eval

Updated May 24, 2026
Python

Mike-E-Log / ai-eval-toolkit

Eval toolkit for LLM-as-judge calibration: Cohen's kappa, Kendall's tau, regression gates.

python mcp calibration kappa cohens-kappa inter-rater-agreement kendall-tau evals llm-evaluation llm-as-judge mt-bench ai-eval

Updated May 22, 2026
Python
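The ai-eval-toolkit entry names two agreement statistics commonly used to check an LLM judge against human labels. The sketch below is a minimal, standard-library illustration of those statistics, not the toolkit's own code; the function names `cohens_kappa` and `kendall_tau_a` and the sample labels are hypothetical. Cohen's kappa compares observed agreement on categorical verdicts with the agreement expected by chance; Kendall's tau-a compares the rank order of numeric scores and assumes no ties (SciPy's `scipy.stats.kendalltau` defaults to the tie-corrected tau-b instead).

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs. Assumes no tied scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # A pair is concordant if both score lists order items i and j the same way.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical example: judge verdicts vs. human verdicts on six responses.
judge = ["pass", "pass", "fail", "pass", "fail", "fail"]
human = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(cohens_kappa(judge, human))  # observed 4/6, chance 0.5, so kappa is about 0.333

# Hypothetical example: judge scores vs. human scores on five responses.
print(kendall_tau_a([1, 2, 3, 4, 5], [1, 3, 2, 4, 5]))  # 9 concordant, 1 discordant pair: 0.8
```

In a calibration workflow, a threshold on either statistic could act as a simple regression gate, though the toolkit's actual gating logic is not described on this page.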

Mike-E-Log / learn-ai-eval

The Eval Codex: a Claude-tutored AI-eval learning engine. Build eval expertise through guided practice.

learning tutorial spaced-repetition claude ai-engineering ai-evaluation anthropic llm-eval llm-evaluation llm-as-judge learning-engine ai-eval

Updated May 22, 2026
HTML
