A reproducible demonstration that the Collapse Index (CI) detects brittleness that standard benchmarks miss.
| Metric | Value | Notes |
|---|---|---|
| Model | DistilBERT-SST2 | HuggingFace public model |
| Benchmark Accuracy | 90%+ | SST-2 validation set |
| CI Score | 0.275 | Moderate instability (0-1 scale) |
| AUC (CI) | 0.698 | CI predicts flips reliably |
| AUC (Confidence) | 0.515 | Confidence barely predicts flips |
| ΔAUC | +0.182 | CI improves AUC over confidence by 0.18 (absolute) |
| Flip Rate | 42.8% | 214/500 base cases flip |
| High-Conf Errors | 35 | Model >90% confident but wrong |
| Dataset Size | 2,000 rows | 500 base examples × 4 variants each (original + 3 perturbations) |
Standard benchmarks say: "Ship it! 90%+ accuracy."
Reality under perturbations: Nearly half of predictions silently flip when users make typos or rephrase naturally.
Why CI matters: Confidence scores barely predict brittleness (AUC 0.515). Collapse Index catches it reliably (AUC 0.698).
🚨 Silent failures: 13 of the 35 high-confidence errors are cases where the model is >90% confident but CI detects collapse (CI ≤ 0.45). These bypass confidence-based monitoring and cause real user harm.
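For readers who want to reproduce the ΔAUC comparison, the sketch below shows one way to score it with scikit-learn. The `ci_scores`, `confidences`, and `flipped` arrays are hypothetical inputs (the public CSV does not include per-case CI scores), and the sign convention follows this README's thresholds, where lower CI indicates collapse.

```python
# Sketch: score how well CI vs. raw confidence predict perturbation flips.
# `ci_scores`, `confidences`, and `flipped` are hypothetical aligned arrays,
# one entry per base case; the public CSV does not include CI scores.
import numpy as np
from sklearn.metrics import roc_auc_score

def compare_auc(ci_scores, confidences, flipped):
    """Return (AUC of CI, AUC of confidence, delta) for predicting flips.

    Per this README's thresholds, lower CI signals collapse, so CI is negated
    to turn it into a flip-risk score; low confidence is likewise treated as
    higher flip risk.
    """
    auc_ci = roc_auc_score(flipped, -np.asarray(ci_scores, dtype=float))
    auc_conf = roc_auc_score(flipped, -np.asarray(confidences, dtype=float))
    return auc_ci, auc_conf, auc_ci - auc_conf
```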
- Base: 500 examples from SST-2 validation set (binary sentiment classification)
- Perturbations: 3 variants per base using:
- Character-level typos (keyboard distance)
- Synonym substitution (WordNet)
- Natural paraphrasing patterns
- Total: 2,000 rows (500 × 4 variants)
- Format: CSV with columns `id,variant_id,text,true_label,pred_label,confidence` (see the sketch below)
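The headline flip rate and high-confidence error count can be re-derived from these columns alone. Below is a minimal sketch using pandas, under two assumptions not spelled out above: `variant_id == 0` marks the unperturbed base text, and a base case counts as a flip when any perturbed variant's `pred_label` differs from the base prediction.

```python
# Sketch: recompute flip rate and high-confidence errors from sst2_ci_demo.csv.
# Assumptions (not guaranteed by this README): variant_id == 0 is the original
# text, and a base case flips if any perturbed variant changes pred_label.
import pandas as pd

df = pd.read_csv("sst2_ci_demo.csv")

# Prediction on the original text for each base case.
base_pred = df[df["variant_id"] == 0].set_index("id")["pred_label"]
perturbed = df[df["variant_id"] != 0].copy()
perturbed["base_pred"] = perturbed["id"].map(base_pred)

# A base case "flips" if any perturbed variant disagrees with the base prediction.
flips = (perturbed["pred_label"] != perturbed["base_pred"]).groupby(perturbed["id"]).any()
print(f"Flip rate: {flips.mean():.1%} ({flips.sum()}/{len(flips)} base cases)")

# High-confidence errors: model >90% confident but wrong.
high_conf_errors = df[(df["confidence"] > 0.90) & (df["pred_label"] != df["true_label"])]
print(f"High-confidence errors: {len(high_conf_errors)}")
```

Depending on how the generation script numbers variants and whether high-confidence errors are counted per row or per base case, the exact counts may differ slightly from the table above.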
`pip install -r requirements.txt`

The `sst2_ci_demo.csv` is included, but you can regenerate it:
`python generate_sst2_demo.py`

This will:
- Download SST-2 validation set (500 examples)
- Generate 3 perturbations per example
- Run DistilBERT-SST2 inference on all 2,000 rows
- Save to `sst2_ci_demo.csv`
Takes ~3-5 minutes on CPU.
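For orientation, here is a rough sketch of the kind of loop `generate_sst2_demo.py` runs, built on the public `datasets` and `transformers` libraries. The keyboard-neighbor table and single-typo perturbation are illustrative stand-ins for the script's actual perturbation logic, which also covers synonym substitution and paraphrasing.

```python
# Illustrative sketch of the regeneration loop (not the actual
# generate_sst2_demo.py implementation).
import random

import pandas as pd
from datasets import load_dataset
from transformers import pipeline

# Tiny keyboard-neighbor map used for illustrative character-level typos.
NEIGHBORS = {"a": "qs", "e": "wr", "i": "uo", "o": "ip", "s": "ad", "t": "ry"}

def typo(text: str, rng: random.Random) -> str:
    """Replace one character with a keyboard neighbor, if any qualify."""
    chars = list(text)
    candidates = [i for i, c in enumerate(chars) if c.lower() in NEIGHBORS]
    if candidates:
        i = rng.choice(candidates)
        chars[i] = rng.choice(NEIGHBORS[chars[i].lower()])
    return "".join(chars)

rng = random.Random(42)
sst2 = load_dataset("glue", "sst2", split="validation[:500]")
clf = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

rows = []
for idx, example in enumerate(sst2):
    # Variant 0 is the original text; the real script adds three perturbations.
    variants = [example["sentence"], typo(example["sentence"], rng)]
    for variant_id, text in enumerate(variants):
        pred = clf(text)[0]  # e.g. {"label": "POSITIVE", "score": 0.97}
        rows.append({
            "id": idx,
            "variant_id": variant_id,
            "text": text,
            "true_label": example["label"],
            "pred_label": 1 if pred["label"] == "POSITIVE" else 0,
            "confidence": pred["score"],
        })

pd.DataFrame(rows).to_csv("sst2_ci_demo.csv", index=False)
```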
Validate flip rate and accuracy independently:
`python validate_metrics.py`

This verifies metrics that don't require the full CI pipeline.
For full CI scoring, request an evaluation from Collapse Index Labs.
- `README.md` - This file
- `requirements.txt` - Python dependencies
- `generate_sst2_demo.py` - Dataset generation script
- `sst2_ci_demo.csv` - Full 2,000-row dataset with predictions
- Full Analysis: collapseindex.org/evals.html#validation
- Collapse Index Labs: collapseindex.org
- Model Used: huggingface.co/distilbert-base-uncased-finetuned-sst-2-english
If you use this validation dataset in your research:
@misc{ci-sst2-validation,
title={Collapse Index: SST-2 Public Validation},
author={Kwon, Alex},
year={2025},
url={https://github.com/collapseindex/ci-sst2},
note={Collapse Index Labs}
}

Author: Alex Kwon (collapseindex.org) · ORCID: 0009-0002-2566-5538
Please also cite the original SST-2 dataset:
@inproceedings{socher2013recursive,
title={Recursive deep models for semantic compositionality over a sentiment treebank},
author={Socher, Richard and Perelygin, Alex and Wu, Jean and Chuang, Jason and Manning, Christopher D and Ng, Andrew Y and Potts, Christopher},
booktitle={Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing},
pages={1631--1642},
year={2013}
}

- This Repository: MIT License (code and methodology)
- SST-2 Dataset: Available via HuggingFace Datasets (cite original paper above)
- DistilBERT Model: Apache 2.0
Copyright © 2025 Collapse Index Labs - Alex Kwon. All rights reserved.
Questions? Email ask@collapseindex.org