Skip to content

Add LM regression for parameter optimization in Rust#350

Open
g-bauer wants to merge 5 commits into
developmentfrom
parameter_optimization
Open

Add LM regression for parameter optimization in Rust#350
g-bauer wants to merge 5 commits into
developmentfrom
parameter_optimization

Conversation

@g-bauer
Copy link
Copy Markdown
Contributor

@g-bauer g-bauer commented Mar 20, 2026

This PR adds parameter optimization utilities to feos-core and exposes Regressor objects to Python.

Edit: See current implementation examples and status in comment below.

@g-bauer g-bauer force-pushed the parameter_optimization branch from f69ecfe to fe81486 Compare April 27, 2026 20:12
@g-bauer g-bauer changed the base branch from main to development April 28, 2026 11:18
@g-bauer g-bauer marked this pull request as ready for review April 28, 2026 11:24
@g-bauer g-bauer requested a review from prehner April 28, 2026 11:25
@g-bauer
Copy link
Copy Markdown
Contributor Author

g-bauer commented Apr 28, 2026

Update to latest changes:

Datasets (pure component)

  • VaporPressureDataset
  • LiquidDensityDataset
  • EquilibriumLiquidDensityDataset
  • EnthalpyOfVaporizationDataset
  • ResidualIsobaricHeatCapacityDataset

Datasets (binary mixture)

  • BubblePointDataset
  • DewPointDataset

Regressor (parameter optimisation object and utilities)

  • PureRegressor<T> and BinaryRegressor<T> — generic over any ParametersAD<N> implementation
  • DynRegressor — object-safe trait for runtime model dispatch (used by Python bindings)
  • Per-dataset weights
  • Choice of residual function: relative_difference (default), log_difference, difference
  • Choice of loss function: L2 (default) or Huber (IRLS)
  • Strategy for non-converged evaluations: Ignore (zero residual + Jacobian) or Penalty (fixed dimensionless penalty)

Datasets support construction from numpy arrays, CSV files, or record structs.
Not all rust objects are exposed as Python objects - very similar to our current setup in other APIs.
E.g. residual functions are defined through strings.

Todos

  • Parameter optimization has its own ParameterOptimizationError. We should include this into FeosError as variant. Currently we convert these into Python errors.
  • There is probably some boilerplate that can be solved with macros for the dataset constructors.

Open Decisions

  • I gated things behind ndarray and rayon. Any changes/cleaning up in that regard?
  • Should we expose more statistical quantities (parameter covariance matrix, Fisher information matrix, leverage, optimality criteria), or leave this to users? The unscaled Jacobian and residuals are accessible via predict().
  • I changed the ParametersAD trait quite a bit — parameter_names() and differentiable_parameters() are now separate, which resolves the "parameter order and indexing" synchronisation problem. We can discuss whether that interface is appropriate. Might be a future PR to reduce friction in the API.
  • Also in ParametersAD: non-differentiable parameters (e.g. association sites) are now declared via differentiable_parameters() and are rejected at construction time if passed to fit. Is that overkill?
  • Cleanup: I added docstrings that should be moved to proper theory docs.

AI usage

  • Lots of documentation and tests were generated by Claude. I changed a lot manually but we should discuss whether we are OK with LLM generated and manually checked/patched code and docstrings.
  • Basically the whole Python bindings were written by Claude (single shot in about 40 seconds). I thoroughly checked the code and changed things I found weird but it's ~95% Claude. I'd like to discuss whether that's something we are OK with and how/if we should go about this in future PRs (declaration of usage).

Usage Python

import feos
import pandas as pd

# Construct datasets from numpy arrays
df_vp = pd.read_csv("vapor_pressure.csv")
df_rho = pd.read_csv("liquid_density.csv")

vp = feos.VaporPressureDataset(
    df_vp.temperature_k.values,
    df_vp.vapor_pressure_pa.values,
)
rho = feos.LiquidDensityDataset(
    df_rho.temperature_k.values,
    df_rho.pressure_pa.values,
    df_rho.liquid_density_kmol_m3.values,
)

# or from CSV:
vp = feos.VaporPressureDataset.from_csv("vapor_pressure.csv")
dh = feos.EnthalpyOfVaporizationDataset.from_csv("dh_vap.csv")

# Define regressor
regressor = feos.PureRegressor(
    model=feos.EquationOfStateAD.PcSaftNonAssoc,
    datasets=[vp, rho, dh],
    params={"m": 2.0, "sigma": 3.0, "epsilon_k": 300.0, "mu": 0.0},
    fit=["m", "sigma", "epsilon_k"],
    residual="relative_difference",  # default
    weights=[1.0, 1.0, 2.0],        # optional per-dataset weights
)

# Run fit with custom configuration
result = regressor.fit(
    config=feos.RegressorConfig(
        patience=100,
        ftol=1e-12,
        xtol=1e-12,
        gtol=1e-12,
        stepbound=0.1,
        strategy=feos.NonConvergenceStrategy.penalty(10.0),  # default
    ),
    loss=feos.LossFunction.huber(0.1),  # optional; default is L2
)

# Print results
print(result)
print(result.all_parameters())

# Evaluate datasets — no argument uses the fitted parameters
dfs = [pd.DataFrame(d) for d in regressor.evaluate_datasets()]
# or pass explicit parameters
dfs = [pd.DataFrame(d) for d in regressor.evaluate_datasets(result.all_parameters())]

# Binary mixture example
bp = feos.BubblePointDataset.from_csv("bubble_point.csv")
bin_reg = feos.BinaryRegressor(
    model=feos.EquationOfStateAD.PcSaftNonAssoc,
    datasets=[bp],
    params={"m1": 2.0, ..., "k_ij": 0.01},
    fit=["k_ij"],
)
bin_result = bin_reg.fit()

Output (250 vapor pressures and 250 liquid densities for hexane):

RegressorResult(converged=true, n_eval=8, elapsed=2.8ms, aad=[vapor pressure: 6.65%, liquid density: 1.02%])
{'m': 2.9455258525936583, 'sigma': 3.8549088883947196, 'epsilon_k': 242.1332276164755, 'mu': 0.0}

Usage Rust

use feos::core::parameter_optimization::*;
use feos::pcsaft::PcSaftPure;
use std::collections::HashMap;
use std::path::Path;

fn main() {
    let vp = VaporPressureDataset::from_csv(Path::new("vapor_pressure.csv"))
        .expect("failed to read vapor pressure CSV");
    let rho = LiquidDensityDataset::from_csv(Path::new("liquid_density.csv"))
        .expect("failed to read liquid density CSV");

    let datasets = vec![
        PureDataset::VaporPressure(vp),
        PureDataset::LiquidDensity(rho),
    ];

    let params = HashMap::from([
        ("m".to_string(), 2.0),
        ("sigma".to_string(), 3.0),
        ("epsilon_k".to_string(), 300.0),
        ("mu".to_string(), 0.0),
    ]);

    let regressor = PureRegressor::<PcSaftPure<f64, 4>>::new(datasets, params, &["m", "sigma", "epsilon_k"])
        .expect("failed to build regressor");

    let config = RegressorConfig {
        patience: 100,
        ftol: 1e-12,
        xtol: 1e-12,
        gtol: 1e-12,
        stepbound: 0.1,
        strategy: NonConvergenceStrategy::Penalty(10.0),
        ..Default::default()
    };

    let (regressor, report) = regressor.fit(config);
    let result = regressor.to_result(&report);

    println!("converged:        {}", result.converged);
    println!("reason:           {}", result.termination_reason);
    println!("n_evaluations:    {}", result.n_evaluations);
    println!("aad_per_dataset:  {:?}", result.aad_per_dataset);
    println!("all_parameters:   {:?}", result.all_parameters());

    let ds_results = regressor.evaluate_datasets(&result.optimal_params);
}

Output:

converged:        true
reason:           Converged { ftol: true, xtol: false }
n_evaluations:    8
aad_per_dataset:  [Some(6.645721326146425), Some(1.0224205499118824)]
all_parameters:   {"m": 2.9455..., "sigma": 3.8549..., "epsilon_k": 242.133..., "mu": 0.0}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant