sycophancy-detection

Here are 7 public repositories matching this topic...

EvXata / deepeval-bcg

Detects structural prompt weaknesses before they affect production outputs — improving the quality, consistency, and reliability of every future generation across the pipeline.

Updated May 22, 2026
Python

mellington194 / lacp-specification

Lightweight Agentic Communication Protocol (LACP) & Tribunal Consensus Specification. A Layer 7 application protocol and governance standard for secure, type-safe, and consensus-driven coordination between autonomous AI agents. Includes formal JSON schemas for Context and Vote objects used in the Tribunal sycophancy detection system.

multi-agent-systems ai-safety consensus-protocol ai-governance defensive-publication sycophancy-detection

Updated Apr 29, 2026

mishi93999 / seatbelt

Responsible AI auditing for LLMs and SLMs

fairness ai-safety bias-detection responsible-ai llm eu-ai-act ai-audit sycophancy-detection

Updated Apr 13, 2026
Python

MariusOpincariu-Phd / socio-technical-alignment-diagnostic

A prototype for detecting sycophancy and power dynamics in educational platforms using cross-modal sentiment analysis.

ai-safety computational-social-science ethics-in-ai societal-impact sycophancy-detection

Updated Apr 21, 2026
Python

Khushi-Dhargawe / LLM-Governance-Evaluation

Critical LLM evaluation across 3 structured experiments · Sycophancy detection · 4Ds AI governance framework · Python evaluation pipeline · UCC MSc Business Analytics

python critical-thinking ucc business-analytics ai-ethics prompt-engineering ai-governance llm-evaluation sycophancy-detection

Updated Jun 1, 2026
Python

synaptiai / lucid

Open-source epistemic audit for your Claude conversation history. Applies eight published AI-safety research frameworks (Spiral-Bench, Sharma sycophancy, SycEval, BeliefShift, ITP, MedTrust-RAG) and produces citation-validated HTML reports.

ai-safety claude cli-tool conversation-analysis hackathon-project ai-alignment memory-audit anthropic llm-evaluation sycophancy-detection belief-drift

Updated Apr 26, 2026
Python

rishi-more-2003 / post-training-failure-evals

Evaluation harness for detecting reward hacking, sycophancy, verbosity bias, and false-confidence failures in post-trained language models

pytorch calibration alignment ai-safety large-language-models rlhf llm-evaluation llm-as-judge direct-preference-optimization post-training-analysis reward-hacking sycophancy-detection

Updated May 30, 2026
Python

