Context
`unified_calibration.py`'s `run_calibration()` currently builds the matrix and fits weights in one monolithic call. This means any change to fitting hyperparams (lambda, lr, epochs) requires re-running the expensive clone-by-clone simulation that builds the sparse matrix.
Proposal
Save a calibration package artifact to disk as a natural intermediate step within `run_calibration()`, before calling `fit_l0_weights`. The package should contain everything needed to run fitting independently:
- `X_sparse` (the calibration matrix)
- `targets_df`
- `initial_weights`
- `household_id_mapping`
- `cd_household_indices`
- `target_names`
This is the same interface already used in the Kaggle-based calibration workflow (pickle dict), just produced automatically by the pipeline.
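
A minimal sketch of the save step, assuming a pickle-dict package; the helper name `save_calibration_package`, the output path, and the exact call site are illustrative, not the real code:

```python
import pickle
from pathlib import Path


def save_calibration_package(
    path,
    X_sparse,
    targets_df,
    initial_weights,
    household_id_mapping,
    cd_household_indices,
    target_names,
):
    """Persist everything the fitting stage needs as a single pickle dict."""
    package = {
        "X_sparse": X_sparse,  # sparse calibration matrix
        "targets_df": targets_df,  # calibration targets
        "initial_weights": initial_weights,
        "household_id_mapping": household_id_mapping,
        "cd_household_indices": cd_household_indices,
        "target_names": target_names,
    }
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(package, f)
    return path
```

Inside `run_calibration()`, this would be called right after the matrix build and immediately before `fit_l0_weights`, so the fit stage only ever depends on the package.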
CLI changes
- Add `--build-only` flag: stop after saving the package, skip fitting
- Add `--package-path` flag: load a pre-built package and skip matrix construction
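
A hedged sketch of how the two flags could be wired, assuming an argparse-style CLI; `build_calibration_package` is a hypothetical stand-in for the existing matrix-building code, and the `fit_l0_weights` call signature is assumed:

```python
import argparse
import pickle

parser = argparse.ArgumentParser(description="Unified calibration")
parser.add_argument(
    "--build-only",
    action="store_true",
    help="Stop after saving the calibration package; skip fitting",
)
parser.add_argument(
    "--package-path",
    default=None,
    help="Load a pre-built calibration package and skip matrix construction",
)
args = parser.parse_args()

if args.package_path:
    # Skip the expensive clone-by-clone build entirely.
    with open(args.package_path, "rb") as f:
        package = pickle.load(f)
else:
    package = build_calibration_package()  # hypothetical: existing build code
    save_calibration_package("calibration_package.pkl", **package)

if not args.build_only:
    fit_l0_weights(package)  # existing fitting entry point; signature assumed
```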
Why this matters
- Iteration speed: Tweak hyperparams without rebuilding the matrix
- Resilience: If the GPU fitting crashes, you don't redo the matrix
- Flexibility: The package can be downloaded and used in Kaggle, Modal, or locally
- Modal runner: `remote_calibration_runner.py` can split into CPU (build) and GPU (fit) functions, with the package as the handoff artifact (sketch below)
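
For the Modal split, a rough sketch of the two functions and the handoff (the app/image setup, resource sizes, and helper names are assumptions; the real `remote_calibration_runner.py` will differ):

```python
import modal

app = modal.App("unified-calibration")
image = modal.Image.debian_slim().pip_install("numpy", "scipy", "pandas", "torch")


@app.function(image=image, cpu=8, memory=32768, timeout=4 * 3600)
def build_package() -> bytes:
    """CPU-only: run the clone-by-clone build and return the pickled package."""
    import pickle

    package = build_calibration_package()  # hypothetical build helper
    return pickle.dumps(package)


@app.function(image=image, gpu="A100", timeout=2 * 3600)
def fit(package_bytes: bytes) -> bytes:
    """GPU: unpickle the handoff artifact and run only the fit."""
    import pickle

    package = pickle.loads(package_bytes)
    weights = fit_l0_weights(package)  # signature assumed
    return pickle.dumps(weights)


@app.local_entrypoint()
def main():
    weights_bytes = fit.remote(build_package.remote())
```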
Related
- Add calibration log to Github build artifacts #310 (calibration log as build artifact — the log is a natural output of the fit stage)
- PR Add census-block-first calibration pipeline (from PR #516) #531 (census-block calibration pipeline, where this design was discussed)
Implementation notes
- ~20-30 lines added to `run_calibration()` in `unified_calibration.py`
- Package format: pickle dict (matching the existing Kaggle workflow) or npz + parquet (see the sketch below)
- The automated end-to-end pipeline remains the default; `--build-only` is opt-in
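
If the npz + parquet format is preferred over pickle, a hypothetical writer could look like the following (assuming `household_id_mapping` is a plain dict and `cd_household_indices` is an integer array; adjust to the real types):

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp


def save_package_npz_parquet(prefix, package):
    """Hypothetical alternative to the pickle dict: arrays as npz, tables as parquet."""
    sp.save_npz(f"{prefix}_X_sparse.npz", package["X_sparse"].tocsr())
    np.savez_compressed(
        f"{prefix}_arrays.npz",
        initial_weights=package["initial_weights"],
        cd_household_indices=package["cd_household_indices"],
        target_names=np.asarray(package["target_names"]),
    )
    package["targets_df"].to_parquet(f"{prefix}_targets.parquet")
    pd.DataFrame(
        {
            "household_id": list(package["household_id_mapping"].keys()),
            "row_index": list(package["household_id_mapping"].values()),
        }
    ).to_parquet(f"{prefix}_household_id_mapping.parquet")
```

The pickle dict keeps compatibility with the existing Kaggle workflow, so it is the natural default; npz + parquet avoids pickle's version coupling at the cost of a few more files.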