Fine tuning Mistral models to solve the Contract Understanding Atticus Dataset (CUAD) benchmark
- setup project using uv
- download CUAD under data/CUAD_v1
- setup Mistral API key as env variable
- run generate_dataset.py to create fine-tuning datasets and upload to Mistral
- use the helpers in utils.py (or the UI) to launch a job
- run_inference.py to generate predictions
- evaluate.py to evaluate the results
Tested on MacOS using python 3.12
| model | task | precision | recall |
|---|---|---|---|
| baseline | easy | 50% | 51% |
| fine-tuned | easy | 89% | 86% |
| baseline | hard | 16% | 16% |
| fine-tuned | hard | 29% | 29% |