This repository contains supporting materials for the preprint on debate-protocol design in multi-agent LLM systems. The study compares three debate protocols under matched prompting and decoding conditions to examine how protocol design affects peer-reference behavior, argument diversity, and consensus formation.
This is a support repository for reproducing the main results reported in the preprint, not the full private development workspace. The preprint PDF itself is not stored in this repository.
- `data/`: the dataset used in the study
- `scripts/`: core scripts for running the primary protocol comparison, validating the judge, aggregating runs, and plotting aggregate results
- `demo_streamlit/`: a lightweight interactive demo for illustrating the debate protocols
Deliberately excluded from this repository:

- local virtual environments
- recovery logs and chat exports
- large intermediate run folders
- trained adapters and other heavy artifacts
- private or local-only scratch files
If you want the smallest clean release, publish:
- `data/data.csv`
- `scripts/rq1_quick_experiment.py`
- `scripts/aggregate_rq1_runs.py`
- `scripts/plot_rq1_aggregate.py`
- `scripts/validate_judge_model.py`
- `scripts/judge_validation_examples.json`
- `demo_streamlit/`
You can add the rest later if needed.
Install the script dependencies:
```bash
pip install -r requirements.txt
```

Validate the judge model:
```bash
python scripts/validate_judge_model.py \
  --examples scripts/judge_validation_examples.json \
  --judge-model mistral:latest \
  --out outputs/judge_validation.json
```

Run the primary comparison:
```bash
python scripts/rq1_quick_experiment.py \
  --data data/data.csv
```

Aggregate the run folders:
```bash
python scripts/aggregate_rq1_runs.py \
  --runs-root outputs \
  --out outputs/rq1_aggregate.json \
  --data data/data.csv
```

Plot the aggregate results:
```bash
python scripts/plot_rq1_aggregate.py \
  --input outputs/rq1_aggregate.json \
  --output outputs/rq1_aggregate.png
```

- The Streamlit demo is intentionally simplified. It is useful for intuition and communication, but it does not reproduce the full experimental workflow reported in the preprint.
- Aggregate result files can be added later if you want a snapshot of the exact plotted outputs, but they are not required for a clean first public release.
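As a rough illustration of the aggregation step, the sketch below merges per-run JSON files under a runs root into a single summary. This is a conceptual sketch only: the run-folder layout, the `results.json` filename, and the `accuracy` field are assumptions for illustration, and the real format is defined by `scripts/aggregate_rq1_runs.py`.

```python
import json
import tempfile
from pathlib import Path


def aggregate_runs(runs_root: Path) -> dict:
    """Collect every results.json directly under runs_root/<run>/ and
    average one metric across runs.

    The layout and the 'accuracy' field are illustrative assumptions;
    aggregate_rq1_runs.py defines the actual run format.
    """
    per_run = []
    for results_file in sorted(runs_root.glob("*/results.json")):
        with open(results_file) as f:
            per_run.append(json.load(f))
    accuracies = [run["accuracy"] for run in per_run]
    return {
        "n_runs": len(per_run),
        "mean_accuracy": sum(accuracies) / len(accuracies) if accuracies else None,
    }


# Tiny demo with two fabricated run folders in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for i, acc in enumerate([0.5, 1.0]):
        run_dir = root / f"run_{i}"
        run_dir.mkdir()
        (run_dir / "results.json").write_text(json.dumps({"accuracy": acc}))
    summary = aggregate_runs(root)
    print(summary)  # → {'n_runs': 2, 'mean_accuracy': 0.75}
```

The real script also takes `--data data/data.csv`, so it presumably joins run outputs back against the dataset; that step is omitted here.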
Open items:

- add a repo-level `requirements.txt` or `pyproject.toml`
- decide whether the Streamlit demo should live in this repo under `demo/` or in a separate promotion-focused repo
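A repo-level `requirements.txt` could start from the dependencies the scripts and demo imply. The packages below are assumptions inferred from the described functionality (Streamlit demo, CSV data, plotting), not a verified list; check them against the actual imports before publishing:

```text
streamlit    # demo_streamlit/ interactive demo
pandas       # loading data/data.csv
matplotlib   # plot_rq1_aggregate.py figures
```

Pin versions once the set of packages is confirmed.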