Add LM regression for parameter optimization in Rust by g-bauer · Pull Request #350 · feos-org/feos

g-bauer · 2026-03-20T01:33:26Z

This PR adds parameter optimization utilities to feos-core and exposes Regressor objects to Python.

Edit: See current implementation examples and status in comment below.

… methods

…city datasets.

…structured dataset caches, added examples.

g-bauer · 2026-04-28T11:56:23Z

Update to latest changes:

Datasets (pure component)

VaporPressureDataset
LiquidDensityDataset
EquilibriumLiquidDensityDataset
EnthalpyOfVaporizationDataset
ResidualIsobaricHeatCapacityDataset

Datasets (binary mixture)

BubblePointDataset
DewPointDataset

Regressor (parameter optimisation object and utilities)

PureRegressor<T> and BinaryRegressor<T> — generic over any ParametersAD<N> implementation
DynRegressor — object-safe trait for runtime model dispatch (used by Python bindings)
Per-dataset weights
Choice of residual function: relative_difference (default), log_difference, difference
Choice of loss function: L2 (default) or Huber (IRLS)
Strategy for non-converged evaluations: Ignore (zero residual + Jacobian) or Penalty (fixed dimensionless penalty)

Datasets support construction from numpy arrays, CSV files, or record structs.
Not all rust objects are exposed as Python objects - very similar to our current setup in other APIs.
E.g. residual functions are defined through strings.

Todos

Parameter optimization has its own ParameterOptimizationError. We should include this into FeosError as variant. Currently we convert these into Python errors.
There is probably some boilerplate that can be solved with macros for the dataset constructors.

Open Decisions

I gated things behind ndarray and rayon. Any changes/cleaning up in that regard?
Should we expose more statistical quantities (parameter covariance matrix, Fisher information matrix, leverage, optimality criteria), or leave this to users? The unscaled Jacobian and residuals are accessible via predict().
I changed the ParametersAD trait quite a bit — parameter_names() and differentiable_parameters() are now separate, which resolves the "parameter order and indexing" synchronisation problem. We can discuss whether that interface is appropriate. Might be a future PR to reduce friction in the API.
Also in ParametersAD: non-differentiable parameters (e.g. association sites) are now declared via differentiable_parameters() and are rejected at construction time if passed to fit. Is that overkill?
Cleanup: I added docstrings that should be moved to proper theory docs.

AI usage

Lots of documentation and tests were generated by Claude. I changed a lot manually but we should discuss whether we are OK with LLM generated and manually checked/patched code and docstrings.
Basically the whole Python bindings were written by Claude (single shot in about 40 seconds). I thoroughly checked the code and changed things I found weird but it's ~95% Claude. I'd like to discuss whether that's something we are OK with and how/if we should go about this in future PRs (declaration of usage).

Usage Python

import feos
import pandas as pd

# Construct datasets from numpy arrays
df_vp = pd.read_csv("vapor_pressure.csv")
df_rho = pd.read_csv("liquid_density.csv")

vp = feos.VaporPressureDataset(
    df_vp.temperature_k.values,
    df_vp.vapor_pressure_pa.values,
)
rho = feos.LiquidDensityDataset(
    df_rho.temperature_k.values,
    df_rho.pressure_pa.values,
    df_rho.liquid_density_kmol_m3.values,
)

# or from CSV:
vp = feos.VaporPressureDataset.from_csv("vapor_pressure.csv")
dh = feos.EnthalpyOfVaporizationDataset.from_csv("dh_vap.csv")

# Define regressor
regressor = feos.PureRegressor(
    model=feos.EquationOfStateAD.PcSaftNonAssoc,
    datasets=[vp, rho, dh],
    params={"m": 2.0, "sigma": 3.0, "epsilon_k": 300.0, "mu": 0.0},
    fit=["m", "sigma", "epsilon_k"],
    residual="relative_difference",  # default
    weights=[1.0, 1.0, 2.0],        # optional per-dataset weights
)

# Run fit with custom configuration
result = regressor.fit(
    config=feos.RegressorConfig(
        patience=100,
        ftol=1e-12,
        xtol=1e-12,
        gtol=1e-12,
        stepbound=0.1,
        strategy=feos.NonConvergenceStrategy.penalty(10.0),  # default
    ),
    loss=feos.LossFunction.huber(0.1),  # optional; default is L2
)

# Print results
print(result)
print(result.all_parameters())

# Evaluate datasets — no argument uses the fitted parameters
dfs = [pd.DataFrame(d) for d in regressor.evaluate_datasets()]
# or pass explicit parameters
dfs = [pd.DataFrame(d) for d in regressor.evaluate_datasets(result.all_parameters())]

# Binary mixture example
bp = feos.BubblePointDataset.from_csv("bubble_point.csv")
bin_reg = feos.BinaryRegressor(
    model=feos.EquationOfStateAD.PcSaftNonAssoc,
    datasets=[bp],
    params={"m1": 2.0, ..., "k_ij": 0.01},
    fit=["k_ij"],
)
bin_result = bin_reg.fit()

Output (250 vapor pressures and 250 liquid densities for hexane):

RegressorResult(converged=true, n_eval=8, elapsed=2.8ms, aad=[vapor pressure: 6.65%, liquid density: 1.02%])
{'m': 2.9455258525936583, 'sigma': 3.8549088883947196, 'epsilon_k': 242.1332276164755, 'mu': 0.0}

Usage Rust

use feos::core::parameter_optimization::*;
use feos::pcsaft::PcSaftPure;
use std::collections::HashMap;
use std::path::Path;

fn main() {
    let vp = VaporPressureDataset::from_csv(Path::new("vapor_pressure.csv"))
        .expect("failed to read vapor pressure CSV");
    let rho = LiquidDensityDataset::from_csv(Path::new("liquid_density.csv"))
        .expect("failed to read liquid density CSV");

    let datasets = vec![
        PureDataset::VaporPressure(vp),
        PureDataset::LiquidDensity(rho),
    ];

    let params = HashMap::from([
        ("m".to_string(), 2.0),
        ("sigma".to_string(), 3.0),
        ("epsilon_k".to_string(), 300.0),
        ("mu".to_string(), 0.0),
    ]);

    let regressor = PureRegressor::<PcSaftPure<f64, 4>>::new(datasets, params, &["m", "sigma", "epsilon_k"])
        .expect("failed to build regressor");

    let config = RegressorConfig {
        patience: 100,
        ftol: 1e-12,
        xtol: 1e-12,
        gtol: 1e-12,
        stepbound: 0.1,
        strategy: NonConvergenceStrategy::Penalty(10.0),
        ..Default::default()
    };

    let (regressor, report) = regressor.fit(config);
    let result = regressor.to_result(&report);

    println!("converged:        {}", result.converged);
    println!("reason:           {}", result.termination_reason);
    println!("n_evaluations:    {}", result.n_evaluations);
    println!("aad_per_dataset:  {:?}", result.aad_per_dataset);
    println!("all_parameters:   {:?}", result.all_parameters());

    let ds_results = regressor.evaluate_datasets(&result.optimal_params);
}

Output:

converged:        true
reason:           Converged { ftol: true, xtol: false }
n_evaluations:    8
aad_per_dataset:  [Some(6.645721326146425), Some(1.0224205499118824)]
all_parameters:   {"m": 2.9455..., "sigma": 3.8549..., "epsilon_k": 242.133..., "mu": 0.0}

g-bauer added 4 commits April 1, 2026 12:03

Rebase on development

c543b2d

Added heat of vaporization and residual heat capacity datasets and AD…

3300c67

… methods

Added python bindings for heat of vaporization and isobaric heat capa…

2721b2c

…city datasets.

Fixed some bugs, generalised loss and residual function application, …

fe81486

…structured dataset caches, added examples.

g-bauer force-pushed the parameter_optimization branch from f69ecfe to fe81486 Compare April 27, 2026 20:12

Fixed tests

da7e2b0

g-bauer changed the base branch from main to development April 28, 2026 11:18

g-bauer marked this pull request as ready for review April 28, 2026 11:24

g-bauer requested a review from prehner April 28, 2026 11:25

prehner mentioned this pull request Apr 28, 2026

Breaking changes for Release v0.10.0 #339

Draft

g-bauer mentioned this pull request May 19, 2026

Datasets and properties #358

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add LM regression for parameter optimization in Rust#350

Add LM regression for parameter optimization in Rust#350
g-bauer wants to merge 5 commits into
developmentfrom
parameter_optimization

g-bauer commented Mar 20, 2026 •

edited

Loading

Uh oh!

g-bauer commented Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

g-bauer commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

g-bauer commented Apr 28, 2026

Todos

Open Decisions

AI usage

Usage Python

Usage Rust

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

g-bauer commented Mar 20, 2026 •

edited

Loading