InterFLOPBench

Floating-point error benchmarks used in numerical software analysis research. The repository contains a collection of small C programs that produce a variety of common floating-point anomalies. Each benchmark includes a metadata.json file with input datasets and the expected error category so that tools and LLMs can be exercised and evaluated.


Introduction

This repository aggregates small C programs exhibiting common numerical issues: cancellation, overflow, underflow, NaN generation, division by zero, and error-prone comparisons. The full list of error classes is given under "Categories of errors" below.

Repository layout

examples/                 # per-benchmark directories
  01_archimedes_func1/
    bench.c
    metadata.json
  02_archimedes_func2/
  ...
select_benchmark.py       # helper script (see Usage)
README.md                 # this file

Categories of errors

Benchmarks are grouped into the following error classes (each referenced by number or name):

  1. Cancellation
  2. Overflow
  3. Underflow
  4. NaN
  5. Division by zero
  6. Comparison
  7. No error

These classes correspond to the values of the errors field in each dataset entry of metadata.json.


Usage

Several helper scripts allow you to explore the repository without writing custom code.

List all categories:

python3 select_benchmark_upgrade.py --list

Show inputs for a particular category (by name or number):

python3 select_benchmark_upgrade.py --category overflow
python3 select_benchmark_upgrade.py --category 3

Change the examples directory from the default (useful for testing):

python3 select_benchmark_upgrade.py --category cancellation --dir mybenchmarks

All scripts support -h/--help for additional options.


Adding a benchmark

To contribute a new case:

  1. Create a new NN_description/ directory under examples.
  2. Add a bench.c containing the tiny C program exhibiting the issue.
  3. Add a metadata.json describing one or more input vectors and listing the appropriate errors category strings (see "Categories of errors" above).

Examples in the repo can be used for guidance.
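The steps above produce a layout like the following metadata.json sketch. Only the errors field is documented in this README; the other field names here (description, inputs, values) are hypothetical, so copy the exact schema from an existing benchmark directory:

```json
{
  "description": "hypothetical example: cancellation in a difference of close values",
  "inputs": [
    { "values": [1e16, 1.0], "errors": ["cancellation"] },
    { "values": [2.0, 1.0],  "errors": ["no error"] }
  ]
}
```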

Citing InterFLOPBench

If you use InterFLOPBench in your research or projects, please cite the following paper:

Lisa Taldir, Muhammad Ahmad Saeed, David Defour, Pablo de Oliveira Castro, Eric Petit. Benchmarking Large Language Models on Floating-Point Error Classification. 2026. ⟨hal-05560550⟩
