This repository contains supporting materials for the preprint on debate-protocol design in multi-agent LLM systems. The study compares three debate protocols under matched prompting and decoding conditions to examine how protocol design affects peer-reference behavior, argument diversity, and consensus formation.
This is a support repository for reproducing the main results reported in the preprint, not the full private development workspace. The preprint PDF itself is not stored in this repository.
- `data/`: the dataset used in the study
- `scripts/`: core scripts for running the primary protocol comparison, validating the judge, aggregating runs, and plotting aggregate results
- `demo_streamlit/`: a lightweight interactive demo for illustrating the debate protocols
Deliberately excluded from this repository:

- local virtual environments
- recovery logs and chat exports
- large intermediate run folders
- trained adapters and other heavy artifacts
- private or local-only scratch files
If you want the smallest clean release, publish:
- `data/data.csv`
- `scripts/rq1_quick_experiment.py`
- `scripts/aggregate_rq1_runs.py`
- `scripts/plot_rq1_aggregate.py`
- `scripts/validate_judge_model.py`
- `scripts/judge_validation_examples.json`
- `demo_streamlit/`
You can add the rest later if needed.
Install the script dependencies:
```bash
pip install -r requirements.txt
```

Validate the judge model:
```bash
python scripts/validate_judge_model.py \
  --examples scripts/judge_validation_examples.json \
  --judge-model mistral:latest \
  --out outputs/judge_validation.json
```

Run the primary comparison:
```bash
python scripts/rq1_quick_experiment.py \
  --data data/data.csv
```

Aggregate the run folders:
```bash
python scripts/aggregate_rq1_runs.py \
  --runs-root outputs \
  --out outputs/rq1_aggregate.json \
  --data data/data.csv
```

Plot the aggregate results:
```bash
python scripts/plot_rq1_aggregate.py \
  --input outputs/rq1_aggregate.json \
  --output outputs/rq1_aggregate.png
```

- The Streamlit demo is intentionally simplified. It is useful for intuition and communication, but it does not reproduce the full experimental workflow reported in the preprint.
- Aggregate result files can be added later if you want a snapshot of the exact plotted outputs, but they are not required for a clean first public release.
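As a rough illustration of the aggregation step, the sketch below merges per-run JSON files under a runs root into a single summary. This is a conceptual sketch only: the run-folder layout, the `results.json` filename, and the `accuracy` field are assumptions for illustration, and the real format is defined by `scripts/aggregate_rq1_runs.py`.

```python
import json
import tempfile
from pathlib import Path


def aggregate_runs(runs_root: Path) -> dict:
    """Collect every results.json directly under runs_root/<run>/ and
    average one metric across runs.

    The layout and the 'accuracy' field are illustrative assumptions;
    aggregate_rq1_runs.py defines the actual run format.
    """
    per_run = []
    for results_file in sorted(runs_root.glob("*/results.json")):
        with open(results_file) as f:
            per_run.append(json.load(f))
    accuracies = [run["accuracy"] for run in per_run]
    return {
        "n_runs": len(per_run),
        "mean_accuracy": sum(accuracies) / len(accuracies) if accuracies else None,
    }


# Tiny demo with two fabricated run folders in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for i, acc in enumerate([0.5, 1.0]):
        run_dir = root / f"run_{i}"
        run_dir.mkdir()
        (run_dir / "results.json").write_text(json.dumps({"accuracy": acc}))
    summary = aggregate_runs(root)
    print(summary)  # → {'n_runs': 2, 'mean_accuracy': 0.75}
```

The real script also takes `--data data/data.csv`, so it presumably joins run outputs back against the dataset; that step is omitted here.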
Open items:

- add a repo-level `requirements.txt` or `pyproject.toml`
- decide whether the Streamlit demo should live in this repo under `demo/` or in a separate promotion-focused repo
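A repo-level `requirements.txt` could start from the dependencies the scripts and demo imply. The packages below are assumptions inferred from the described functionality (Streamlit demo, CSV data, plotting), not a verified list; check them against the actual imports before publishing:

```text
streamlit    # demo_streamlit/ interactive demo
pandas       # loading data/data.csv
matplotlib   # plot_rq1_aggregate.py figures
```

Pin versions once the set of packages is confirmed.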