llmtesting

Here are 9 public repositories matching this topic...

sazed5055 / llmtest

pytest for LLM apps - Test for grounding failures, prompt injection, safety violations, and regressions

python testing machine-learning ai ci-cd pytest developer-tools quality-assurance claude qa-automation llm prompt-engineering chatgpt ai-testing anthropic llm-testing llm-validation llmtesting ai-agent-testing

Updated Mar 30, 2026
Python

avi350751 / test-llm-with-deepeval

A hands-on exploration of DeepEval, an open-source framework for evaluating and red-teaming large language models (LLMs). This repository documents my journey of testing, benchmarking, and improving LLM reliability using custom prompts, metrics, and pipelines.

evals deepeval llmtesting

Updated Nov 2, 2025
Jupyter Notebook

avi350751 / bfsi-red-team

Red teaming a banking and finance LLM assistant.

yaml cybersecurity redteam promptfoo aitesting llmtesting

Updated Nov 19, 2025

Yahya123-hub / LLM-Automation-Testing

pytest automation-testing llmtesting groqllama

Updated Nov 8, 2025
Python

avi350751 / promptfoo-cicd

Integrating promptfoo into CI/CD pipelines to automatically evaluate prompts, test for security vulnerabilities, and ensure quality before deployment.

promptfoo llmtesting

Updated Oct 24, 2025

avi350751 / autogen-playground

This repo is my playground for experimenting with AutoGen and using it to converse, build pipelines, and do LLM testing.

mcp multiagent autogen llmtesting

Updated Oct 30, 2025
Python

airtasystems / AIRTA

AIRTA is an open source, production-ready AI Risk Testing Agent. Point it at your chatbot, copilot, or API; build structured compliance tests from rubrics such as the EU AI Act and OECD; then assess every response against regulatory mandates.

ai oecd aisafety aitesting nistairmf euaiact llmtesting aicompliance

Updated May 28, 2026
Python

airtasystems / airta-red-team

Red-team security testing for LLMs and LLM-driven applications, built for red teams and white hats. Generate adversarial suites from security playbooks (OWASP LLM, OWASP Agent, MITRE ATLAS, jailbreak, multimodal/file-upload), execute them against live targets via browser UI or HTTP API, and assess risk.

redteam aisafety redteam-tools aisecurity aitesting llmtesting aicompliance

Updated May 28, 2026
Python

Yahya123-hub / LLM-QA-Evaluation-Framework-with-Promptfoo-and-Deepeval

ai-qa ai-testing promptfoo deepeval llmtesting

Updated Jun 2, 2026
Python
