
Add calibration package checkpoint to unified_calibration pipeline #534

@baogorek


Context

unified_calibration.py's run_calibration() currently builds the matrix and fits weights in one monolithic call. As a result, any change to the fitting hyperparameters (lambda, lr, epochs) requires re-running the expensive clone-by-clone simulation that builds the sparse matrix.

Proposal

Save a calibration package artifact to disk as a natural intermediate step within run_calibration(), before calling fit_l0_weights. The package should contain everything needed to run fitting independently:

  • X_sparse (the calibration matrix)
  • targets_df
  • initial_weights
  • household_id_mapping
  • cd_household_indices
  • target_names

This is the same interface already used in the Kaggle-based calibration workflow (pickle dict), just produced automatically by the pipeline.
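A minimal sketch of what saving and loading such a package might look like, assuming a plain pickle dict keyed by the six names listed above. The function names `save_calibration_package` and `load_calibration_package` are hypothetical and not part of the existing codebase; the temp-file-then-rename step is an added precaution, not something the proposal requires.

```python
import os
import pickle
import tempfile
from pathlib import Path

# Keys taken from the proposal; everything else here is illustrative.
REQUIRED_KEYS = (
    "X_sparse",
    "targets_df",
    "initial_weights",
    "household_id_mapping",
    "cd_household_indices",
    "target_names",
)


def _check_keys(package):
    """Raise KeyError if any key named in the proposal is absent."""
    missing = [k for k in REQUIRED_KEYS if k not in package]
    if missing:
        raise KeyError(f"calibration package is missing keys: {missing}")


def save_calibration_package(path, package):
    """Hypothetical helper: pickle the package dict to `path`."""
    _check_keys(package)
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename, so a crash
    # mid-write does not leave a truncated package at the final path.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(package, f, protocol=pickle.HIGHEST_PROTOCOL)
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return path


def load_calibration_package(path):
    """Hypothetical helper: load a package and verify it is complete."""
    with open(path, "rb") as f:
        package = pickle.load(f)
    _check_keys(package)
    return package
```

Because `pickle.load` can execute arbitrary code, a package downloaded from elsewhere should only be loaded if its source is trusted.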

CLI changes

  • Add --build-only flag: stop after saving the package, skip fitting
  • Add --package-path flag: load a pre-built package and skip matrix construction
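The two flags could be wired up roughly as follows. This is a sketch only: the flag names come from the proposal, but `build_parser`, `plan_stages`, and the stage labels are hypothetical, and rejecting the combination of both flags is an assumption (with both set there would be nothing left to run).

```python
import argparse


def build_parser():
    """Hypothetical parser exposing the two proposed flags."""
    parser = argparse.ArgumentParser(description="Unified calibration (sketch)")
    parser.add_argument(
        "--build-only",
        action="store_true",
        help="Stop after saving the calibration package; skip fitting.",
    )
    parser.add_argument(
        "--package-path",
        default=None,
        help="Load a pre-built package and skip matrix construction.",
    )
    return parser


def plan_stages(args):
    """Return the ordered stage labels implied by the parsed flags."""
    if args.build_only and args.package_path:
        # Assumption: treat this as a user error rather than a no-op.
        raise ValueError("--build-only and --package-path are mutually exclusive")
    if args.package_path:
        return ["load_package", "fit"]
    if args.build_only:
        return ["build_matrix", "save_package"]
    # Default: the automated end-to-end pipeline, with the checkpoint saved.
    return ["build_matrix", "save_package", "fit"]
```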

Why this matters

  1. Iteration speed: Tweak hyperparams without rebuilding the matrix
  2. Resilience: If the GPU fitting crashes, you don't redo the matrix
  3. Flexibility: The package can be downloaded and used in Kaggle, Modal, or locally
  4. Modal runner: remote_calibration_runner.py can split into CPU (build) and GPU (fit) functions, with the package as the handoff artifact

Related

Implementation notes

  • ~20-30 lines added to run_calibration() in unified_calibration.py
  • Package format: pickle dict (matching existing Kaggle workflow) or npz + parquet
  • The automated end-to-end pipeline remains the default; --build-only is opt-in
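To illustrate where the checkpoint would sit, here is a rough sketch of the control flow with the build and fit steps passed in as callables. The function `run_calibration_sketch` and its parameters are hypothetical stand-ins; the real run_calibration() signature and the fit_l0_weights arguments are not shown in this issue and are not assumed here.

```python
import pickle
from pathlib import Path


def run_calibration_sketch(
    build_package,       # callable returning the package dict (expensive)
    fit_weights,         # callable(package, **fit_kwargs) -> weights
    package_path,
    build_only=False,
    load_existing=False,
    **fit_kwargs,
):
    """Hypothetical flow: build (or load) the package, checkpoint, then fit."""
    package_path = Path(package_path)
    if load_existing:
        # Equivalent of --package-path: skip matrix construction entirely.
        with open(package_path, "rb") as f:
            package = pickle.load(f)
    else:
        package = build_package()
        # Checkpoint before fitting so a fitting crash does not lose the matrix.
        package_path.parent.mkdir(parents=True, exist_ok=True)
        with open(package_path, "wb") as f:
            pickle.dump(package, f)
    if build_only:
        # Equivalent of --build-only: stop once the package is on disk.
        return None
    return fit_weights(package, **fit_kwargs)
```

With this shape, a second run that changes only the fitting hyperparameters can pass `load_existing=True` and never call `build_package` again.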
