This project is a small AutoML-style NLP pipeline for binary fact checking. It
trains classifiers that decide whether a claim is supported by a piece of
evidence.
The current dataset is intentionally small and local: data.csv contains 30
rows with balanced labels:
- 1: supported / true claim-evidence pair
- 0: unsupported / false claim-evidence pair

Project structure:

.
├── data.csv # Input dataset with claim, evidence, and label columns
├── features.py # Text feature extractors
├── models.py # Model factory for supported classifiers
├── run.py # Training, Optuna search, evaluation, and experiment saving
├── requirements.txt # Python dependencies
├── Makefile # Setup and run shortcuts
└── experiments/ # Saved trial configs and F1 scores
run.py performs the full experiment workflow:
- Load data.csv.
- Split the data into train and test sets with an 80/20 split.
- Use Optuna to run 10 trials.
- For each trial, choose one feature extractor:
  - v1: TfidfVectorizer
  - v2: CountVectorizer with unigram and bigram features
- For each trial, choose one model:
  - logreg: LogisticRegression
  - rf: RandomForestClassifier
- Train the selected model and evaluate it with F1 score.
- Save each trial under experiments/exp_<trial_id>/.
Each experiment folder contains:
- config.json: feature choice, model choice, and hyperparameters
- score.json: F1 score for that trial
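A minimal sketch of the per-trial saving step. The helper name save_trial is hypothetical, and so is the file layout inside each JSON file: it assumes config.json holds the trial's sampled parameters (for example Optuna's trial.params) and score.json stores the F1 value under a key named "f1".

```python
import json
from pathlib import Path


def save_trial(root, trial_id, config, f1):
    """Hypothetical helper: write config.json and score.json for one trial."""
    exp_dir = Path(root) / f"exp_{trial_id}"
    exp_dir.mkdir(parents=True, exist_ok=True)
    (exp_dir / "config.json").write_text(json.dumps(config, indent=2))
    # "f1" is an assumed key name, not confirmed by the project.
    (exp_dir / "score.json").write_text(json.dumps({"f1": f1}, indent=2))
    return exp_dir
```

Inside an Optuna objective this could be called as `save_trial("experiments", trial.number, trial.params, score)` just before returning the score.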
The recorded experiments currently report an F1 score of 0.8 for all 10
trials.
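For a binary task, F1 for the positive class reduces to 2·TP / (2·TP + FP + FN). The stdlib sketch below assumes label 1 is the positive class (the default for scikit-learn's f1_score) and returns 0.0 when there are no true positives. An 80/20 split of 30 rows leaves only 6 test rows, so only a handful of F1 values are reachable, which is one plausible reason many trials land on the same score. The labels in the example are illustrative, not the project's actual test split.

```python
def f1_binary(y_true, y_pred, pos_label=1):
    """F1 for the positive class, computed from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)
    if tp == 0:
        return 0.0
    # Equivalent to 2 * precision * recall / (precision + recall).
    return 2 * tp / (2 * tp + fp + fn)


# Illustrative 6-row split: 2 true positives, 0 false positives, 1 false negative.
print(f1_binary([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0]))  # → 0.8
```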
Create a virtual environment and install dependencies:

    make setup

This runs:

    python3 -m venv .venv
    .venv/bin/pip install --upgrade pip
    .venv/bin/pip install -r requirements.txt

Run the AutoML search:

    make run

Or run the script directly:

    .venv/bin/python run.py

The script prints the label distribution, runs the Optuna study, saves trial
outputs to experiments/, and prints the best parameter set and best F1 score.
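To inspect saved results after a run without re-running the study, a small stdlib sketch can scan the experiment folders. It assumes score.json stores the F1 value under a key named "f1" (hypothetical; check the actual files), and best_saved_trial is an illustrative name. Ties keep the first folder in sorted name order.

```python
import json
from pathlib import Path


def best_saved_trial(root="experiments"):
    """Return (experiment dir, config, f1) for the highest-scoring saved trial, or None."""
    best = None
    for score_path in sorted(Path(root).glob("exp_*/score.json")):
        f1 = json.loads(score_path.read_text())["f1"]  # assumed key name
        if best is None or f1 > best[2]:
            config = json.loads((score_path.parent / "config.json").read_text())
            best = (score_path.parent, config, f1)
    return best
```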
data.csv must contain these columns:

    claim,evidence,label

Example:

    "Paris is capital of France","Paris is the capital city of France",1
    "Apple is a fruit","Apple Inc makes phones",0

The script expects at least two labels and at least two samples per class.
Main libraries:
- pandas
- scikit-learn
- optuna
- numpy
mlflow and joblib are listed in requirements.txt, but the current scripts
do not use them yet.
- The dataset is very small, so the recorded scores should be treated as a demonstration result rather than a reliable benchmark.
- train_test_split uses random_state=42, but it does not currently stratify by label.
- New feature extractors can be added in features.py and registered in the FEATURES dictionary in run.py.
- New models can be added in models.py and included in the Optuna objective in run.py.
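The two extension points in the notes above can be sketched together. scikit-learn's train_test_split accepts a stratify argument that preserves the label proportions in both splits; on 30 balanced rows an 80/20 stratified split gives 3 rows of each label in the test set. The FEATURES layout and the "v3" character n-gram extractor are hypothetical examples, not entries that exist in the project.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical registry layout; "v3" is an illustrative new extractor.
FEATURES = {
    "v1": lambda: TfidfVectorizer(),
    "v2": lambda: CountVectorizer(ngram_range=(1, 2)),
    "v3": lambda: TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
}


def stratified_split(texts, labels):
    # Same seed and ratio as the current split, but label proportions are preserved.
    return train_test_split(
        texts, labels, test_size=0.2, random_state=42, stratify=labels
    )
```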