Skip to content

#122 Correction Package Refactor — Complete Redesign#133

Open
mmaclay wants to merge 16 commits into
mainfrom
122-refactor-correction-package-epic
Open

#122 Correction Package Refactor — Complete Redesign#133
mmaclay wants to merge 16 commits into
mainfrom
122-refactor-correction-package-epic

Conversation

@mmaclay
Copy link
Copy Markdown
Collaborator

@mmaclay mmaclay commented Mar 13, 2026

Copilot Summary:

This pull request introduces several improvements and refactors to the curryer.correction package, focusing on clarifying the public API, enhancing configuration and validation, and improving the geolocation error statistics pipeline. The changes include a comprehensive docstring and explicit exports in the package __init__.py, new and updated configuration dataclasses, a new Pydantic-based RegridConfig for grid handling, and a significant refactor and rename of the geolocation error statistics module to be more generic and mission-agnostic.

API and Documentation Improvements

  • Added a comprehensive module-level docstring to curryer/correction/__init__.py, describing the purpose, sub-module layout, and public API of the correction package. Explicitly re-exported key classes and functions at the package level and updated the __all__ list for clear public API exposure.

Configuration and Data Structures

  • Introduced PSFSamplingConfig and RegridConfig dataclasses in data_structures.py:
    • PSFSamplingConfig provides configuration for PSF sampling during image matching.
    • RegridConfig is a new Pydantic model for specifying output grid parameters, with field and model validators for consistency and error checking. [1] [2]
  • Updated imports to include Pydantic and removed unused Protocols.

Data I/O and Validation

  • Refactored dataio.py to clarify its role as a provider of validation helpers and S3 utilities, removed unused Protocol interface definitions, and updated documentation for clarity. [1] [2] [3]
  • Improved the example for science output validation to use a more generic CSV loading example.

Geolocation Error Statistics Refactor

  • Renamed geolocation_error_stats.py to error_stats.py and refactored it to be mission-agnostic:
    • Updated docstrings and configuration class to clarify that pass/fail evaluation is not part of the stats processor.
    • Added compute_percent_below for custom threshold queries.
    • Made ErrorStatsConfig the new configuration class, with improved variable name mapping and documentation.
    • Used a single source of truth for the WGS84 Earth radius via curryer.compute.constants.
    • Improved parameter and return documentation throughout. [1] [2] [3]

These changes collectively make the correction pipeline more modular, user-friendly, and robust, while clarifying the public API and improving configuration validation.

References:
[1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11]


The is the PR for epic #122 - and it will track changes from the sub-issues.

Correction Package Refactor — Complete Redesign

Top-level tracking issue for a complete refactor and architectural redesign of curryer.correction.

Goals

  • Move to Pydantic config models; remove ad-hoc dataclasses
  • Remove user-supplied loader protocols; internalize data loading as config-driven
  • Split correction.py monolith into focused, testable modules
  • Add key deliverables: verification mode & GCP regridding tool
  • Lay groundwork for future CurryerConfig unifying all of curryer

PR Groupings & Sub-Issues

PR 1: Foundation — Naming & Module Split

  • Rename GeolocationConfig disambiguations & error_stats.py
  • Break up correction.py monolith into focused modules

PR 2: Config Redesign — Pydantic, Internalize Loading, Search Strategies

  • Introduce Pydantic CorrectionConfig with typed sub-models
  • Remove loader protocols — internalize data loading
  • Add SearchStrategy enum for deterministic parameter sweeps

PR 3: New Features — Verification & GCP Regridding

  • Add verification module
  • Add gcp_regrid module

PR 4: Test Cleanup & Future Planning

  • Restructure test_correction/
  • [ ] Tracking: extend pydantic config pattern to all of curryer

Dependency Graph

PR 1 (Foundation)
├─ Renames (unblocks everything)
└─ Module split (unblocks everything)
    │
    ▼
PR 2 (Config & Loading)
├─ Pydantic config (unblocks loader removal + search strategies)
├─ Remove protocols (depends on pydantic config)
└─ Search strategies (depends on pydantic config)
    │
    ▼
PR 3 (Features) — can start in parallel with PR 2 by stubbing config
├─ Verification module
└─ GCP regridding module
    │
    ▼
PR 4 (Cleanup)
├─ Test restructuring
└─ Future config tracking

Context

  • No external users of the correction package yet — breaking changes are acceptable
  • Pydantic introduced here will serve as the prototype for a future curryer-wide config system
  • CLARREO-specific code stays in tests/examples only; main library must be mission-agnostic

@mmaclay mmaclay linked an issue Mar 13, 2026 that may be closed by this pull request
9 tasks
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 13, 2026

Codecov Report

❌ Patch coverage is 81.77384% with 1009 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.32%. Comparing base (996cc37) to head (539ce06).

Files with missing lines Patch % Lines
curryer/correction/verification.py 60.08% 162 Missing and 24 partials ⚠️
curryer/correction/pipeline.py 39.86% 153 Missing and 19 partials ⚠️
curryer/correction/image_io.py 59.52% 112 Missing and 24 partials ⚠️
curryer/correction/parameters.py 61.79% 76 Missing and 26 partials ⚠️
curryer/correction/regrid.py 67.52% 64 Missing and 12 partials ⚠️
curryer/correction/config.py 81.15% 34 Missing and 25 partials ⚠️
curryer/correction/kernel_ops.py 51.57% 39 Missing and 7 partials ⚠️
tests/test_correction/clarreo/_pipeline_helpers.py 66.66% 39 Missing and 3 partials ⚠️
tests/test_correction/_synthetic_helpers.py 61.25% 30 Missing and 1 partial ⚠️
tests/test_correction/test_pipeline.py 81.34% 25 Missing ⚠️
... and 17 more

❌ Your project check has failed because the head coverage (87.11%) is below the target coverage (95.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #133      +/-   ##
==========================================
+ Coverage   73.62%   77.32%   +3.69%     
==========================================
  Files          67       94      +27     
  Lines       10655    13432    +2777     
  Branches     1204     1407     +203     
==========================================
+ Hits         7845    10386    +2541     
- Misses       2343     2525     +182     
- Partials      467      521      +54     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot AI and others added 8 commits April 9, 2026 11:14
…py (#132)

* Initial plan
* Rename GeolocationConfig classes and geolocation_error_stats.py
* Convert Google-style docstrings to numpy style in error_stats.py
* ruff
* breaking up modules

* correction shim for backwards compatibility

* configuration refactor

* descope pipeline to shorten

* show exports in init

* type fix
…135)

* add pydantic

* Introduce Pydantic models for configuration with typed sub-models

* Add unit tests for Pydantic-based correction configuration models

* Refactor: Remove legacy alias handling and update parameter access in configuration
* Refactor dataio module: Update documentation and rename data-loader protocol types

* Refactor CorrectionConfig: Simplify data loading process and enhance validation

* Refactor clarreo_data_loaders: Remove loader protocols and transition to config-driven data loading

* Refactor config: Introduce DataConfig for config-driven data loading and remove legacy loader protocols

* Refactor correction.py: Remove legacy loader functions and integrate DataConfig for improved data handling

* Refactor dataio.py: Remove loader protocols and update documentation for validation helpers

* Refactor image_match.py: Remove ImageMatchingFunc protocol and update output validation

* Refactor image_match.py: Remove ImageMatchingFunc protocol and update output validation

* Refactor pipeline.py: Remove mission-specific loader functions and implement internal file loading for telemetry and science data

* Refactor test_config.py: Add tests for DataConfig and remove legacy loader checks

* Refactor test_correction.py: Replace loader functions with DataConfig for file-based loading

* Refactor test_pairing.py: Remove redundant validation tests for pairing output

* Add CLARREO preprocessing script for telemetry and science data

* Refactor clarreo_data_loaders.py: Remove GCPLoader protocol and implement telemetry and science data loading functions

* load instead of open

* Refactor config.py: Remove GCP-related fields and clarify time field documentation

* Refactor correction.py: Add _resolve_gcp_pairs function to enhance data processing

* Refactor pipeline.py: Add _resolve_gcp_pairs function for GCP key validation

* Refactor test_config.py: Remove GCP-related assertions and simplify DataConfig tests

* Refactor test_correction.py: Remove 'corrected_timestamp' field from DataConfig instances
…137)

* Add SearchStrategy enum for deterministic parameter sweeps

* Add SearchStrategy enum and validation for correction analysis

* Add support for multiple search strategies in parameter set generation

* Add unit tests for parameter-set generation strategies

* Refactor search strategy validation to improve error messaging consistency

* Remove unused conversion functions for sigma to radians and seconds

* Refactor parameter set comparison in tests for consistency and clarity

* check parameter datatype

* Fix formatting in parameters.py for improved readability

* Update curryer/correction/parameters.py

* Update curryer/correction/parameters.py

* Update tests/test_correction/test_parameters.py

* Update tests/test_correction/test_parameters.py

* Add max_grid_sets parameter to limit GRID_SEARCH materialization

* Add search strategy enum for deterministic parameter sweeps and enforce max_grid_sets limit

* Add tests for max_grid_sets enforcement in GRID_SEARCH strategy
* cherry-pick regridding files from MM-104

* add regrid module and update imports in __init__.py

* add RegridConfig model with validation for GCP chip regridding parameters

* deprecate MATLAB file loading utilities in image_match; redirect to curryer.correction.image_io

* update import for load_image_grid_from_mat to use image_io module

* update import for integrated_image_match to use image_io module

* apply rng seed

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* remove cubic

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* check interpolation method

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* user helper for check point in cell

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* refactor test cases to use default_rng for random data generation

* fix: add missing newline for code readability

* test gcp NetCDF and HDF5 support

* refactor: remove unused tolerance variable in cell check

* Update curryer/correction/data_structures.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* vectorize regard

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* relax tolerance on grid boundaries

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* valid pixels only

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* refactor: improve documentation and streamline code in GCP regridding

* fix: update history attribute formatting in image_io.py

* docs: add gcp_regridding.md to contents

* fix: improve boundary handling in regrid.py to prevent NaN fill values

* feat: add example scripts for regridding GCP chips to NetCDF format

* fix: update variable names in image_io.py for consistency with GCP standards

* fix: enhance variable loading in image_io.py for compatibility with multiple naming conventions

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…rection loop (#138)

* add verification module for geolocation compliance checks

* add verification module for geolocation compliance checks

* add unit tests for verification module and its components

* handle HDF5 file loading in image_io.py when HDF4 library is unavailable

* improve HDF file loading in image_io.py to handle errors and fallback between HDF4 and HDF5

* add validation function and update verify parameters to include optional work_dir

* update verification tests to include work_dir parameter in verify function calls

* add minimal example for production verification workflow on geolocated observations

* add example script for weekly verification of geolocated observations

* refactor verification module to enhance key attribute handling and improve dataset aggregation logic

* refactor verification module to enhance key attribute handling and improve dataset aggregation logic

* enhance verification module to improve JSON serialization and summary output formatting
…for CLARREO (#146)

* remove tests

* add integration tests package

* refactor pytest configuration for test_correction to improve maintainability

* refactor test_dataio.py to use pytest and improve test structure for maintainability

* refactor test_image_match.py to use pytest and improve test structure for maintainability

* refactor test_pairing.py to use pytest and improve test structure for maintainability

* add __init__.py for CLARREO-specific correction tests

* add image-matching and pipeline runner helpers for CLARREO integration tests

* add synthetic data generation helpers for testing pipeline and e2e scenarios

* add unit tests for kernel operations in test_kernel_ops.py

* add tests for pipeline functions in test_pipeline.py

* add tests for results_io functions in test_results_io.py

* copy config

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* copy config

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/test_correction/clarreo/_image_match_helpers.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Fix root_dir duplication and clarreo_cfg mutation in test files

Agent-Logs-Url: https://github.com/lasp/curryer/sessions/6ba9aa43-35f1-476a-a4dc-ad8eae24511b

Co-authored-by: mmaclay <21048535+mmaclay@users.noreply.github.com>

* Strengthen test_generate_clarreo_config_json and fix JSON round-trip bugs in load_config_from_json

Agent-Logs-Url: https://github.com/lasp/curryer/sessions/bbf5eb75-f1b0-46f8-af24-f2758fd30858

Co-authored-by: mmaclay <21048535+mmaclay@users.noreply.github.com>

* Refactor assertions in test_clarreo_config.py for clarity and maintainability; streamline field retrieval in config.py

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
@mmaclay mmaclay force-pushed the 122-refactor-correction-package-epic branch from 0a09d1c to d6bb926 Compare April 9, 2026 17:15
mmaclay and others added 2 commits April 10, 2026 14:57
* phase 1 api correction

* phase 2 api correction

* phase 3 api correction

* phase 4 api correction

* phase 5 api correction

* add test_error_stats

* Update curryer/correction/config.py

* Update curryer/correction/pipeline.py

* Update curryer/correction/pipeline.py

* Fix Windows temp file locking in _download_from_s3 and remove private _log_pairing_summary from package exports

* Update tests/test_correction/test_verification.py

* Fix inputs type hint in run_correction to use Sequence union type

* ruff

* Enhance documentation for dataio and io modules; clarify S3 URI support in pipeline
* remove unnecessary clarreo scripts

* Add correction examples including CLARREO configuration files and scripts

* Update clarreo_config and clarreo_data_loaders to clarify test fixture purpose and provide user-facing documentation references

* Add example configuration JSON for correction settings and geolocation

* Add SPICE boresight and rotation matrix retrieval for error statistics

* remove duplicate example verification

* rename regrid_gcp_chips script for improved organization

* point to correction user guide

* Enhance pairing function to support both .mat and NetCDF files for L1A observations and GCP chips

* Add function to load ImageGrid from NetCDF and refactor related code

* refactor: replace load_gcp_chip_from_netcdf with load_image_grid_from_netcdf in tests

* update example script to use load_image_grid_from_netcdf for GCP loading

* Add correction package user guide with workflows and configuration details

* Update README.md to enhance clarity and detail in correction package examples

* Refactor image matching helper functions with image io updates

* Fix import path for load_image_grid_from_mat in _pipeline_helpers.py

* Fix exception type in image loading test to expect KeyError for missing band_data

* Add tests for load_named_image_grid and handle NetCDF loading edge cases

* Refactor image loading functions to improve clarity and consistency

* Refactor image loading functions for improved clarity and consistency

* Refactor image loading functions for improved clarity and consistency

* Refactor image I/O imports and update regrid script to use save_image_grid function

* Update examples/correction/README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update curryer/correction/image_io.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update curryer/correction/image_io.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update curryer/correction/verification.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update examples/correction/clarreo_config.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: update gcp_regridding.md to use load_image_grid instead of load_image_grid_from_netcdf

Agent-Logs-Url: https://github.com/lasp/curryer/sessions/701a0618-9092-41b0-b37b-5e0bd2c50e42

Co-authored-by: mmaclay <21048535+mmaclay@users.noreply.github.com>

* Clarify preprocessing instructions and telemetry file requirements in user guide

* Resolve file paths for image loading functions to support S3 URIs

* Clarify error handling for image matching override in verification.py

* Update examples/correction/README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: update README.md example to use load_image_grid instead of load_image_grid_from_netcdf

Agent-Logs-Url: https://github.com/lasp/curryer/sessions/60284a19-74c8-45aa-9ab1-56668a149466

Co-authored-by: mmaclay <21048535+mmaclay@users.noreply.github.com>

* Resolve S3 URIs to local paths in image I/O functions

* Enhance image matching documentation and mid-frame epoch handling in pipeline.py

* Refactor image grid function names for clarity in Netcdf operations

* Refactor verification tests to raise ValueError for missing los_file/psf_file and add spatial pairing function for GCP chips

* Enhance one-off compliance check documentation with in-memory geolocated data example

* Refactor correction pipeline to improve module dependencies and maintainability

- Moved image matching and aggregation functions from pipeline.py to verification.py to align with the correct dependency direction.
- Updated imports in pipeline.py to reflect the new structure.
- Added backward compatibility for existing function calls in pipeline.py.

* Refactor verification imports and add tests for geolocated data handling

* Add function to pair geolocated datasets with GCP files

* Add geolocated_to_image_grid function to convert geolocated xr.Dataset to ImageGrid

* Add geolocation functions to pipeline and module exports

* Refactor correction module to include image I/O and verification functionalities

* Refactor imports in __init__.py to streamline pipeline and verification modules

* Update Correction Package User Guide and README for clarity and consistency

* Rename spice_path_handling.md for improved clarity and organization

* Update user guide, moving GCP chip regridding instructions and path shortening

* remote outdated reference

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* s3 path fix

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* fix: preserve s3:// URIs in gcp_pairs mode by using str() instead of Path(str())

Agent-Logs-Url: https://github.com/lasp/curryer/sessions/df7b0e5b-d6cd-427c-8d19-1f98dfa92b04

Co-authored-by: mmaclay <21048535+mmaclay@users.noreply.github.com>

* fix: add calibration paths to clarreo_config.json and support them in load_config_from_json; remove outdated NotImplementedError troubleshooting entry

Agent-Logs-Url: https://github.com/lasp/curryer/sessions/f3fd0742-0ca9-4c12-9d9e-a56dc833d0ae

Co-authored-by: mmaclay <21048535+mmaclay@users.noreply.github.com>

* Refactor error message formatting for unrecognized file extensions in image_io.py

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
@mmaclay mmaclay self-assigned this May 5, 2026
…update project version to 0.4.0b1 in pyproject.toml
@mmaclay mmaclay marked this pull request as ready for review May 5, 2026 14:46
Copilot AI review requested due to automatic review settings May 5, 2026 14:46
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR completes a broad refactor/redesign of curryer.correction, introducing a clearer public API surface, new configuration/data models (including Pydantic), expanded NetCDF result/structured-result support, and new documentation/examples/tests around verification and GCP-chip regridding.

Changes:

  • Added/updated correction package modules for structured results, NetCDF I/O, unified path resolution (incl. optional S3), pairing improvements, and PSF sampling config separation.
  • Added substantial pytest coverage for pipeline helpers, regridding, IO utilities, kernel ops, and CLARREO integration fixtures/helpers.
  • Added user-facing docs/examples for correction/verification and GCP chip regridding; bumped package version and dependencies (Pydantic v2).

Reviewed changes

Copilot reviewed 57 out of 60 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/test_correction/test_results_io.py New unit tests for NetCDF structure + checkpoint save/load/cleanup.
tests/test_correction/test_regrid.py New unit tests for regridding primitives and end-to-end regrid flow.
tests/test_correction/test_pipeline.py New tests for pipeline helper functions and position-column extraction behavior.
tests/test_correction/test_pairing.py Migrates pairing tests from unittest to pytest and simplifies fixtures/helpers.
tests/test_correction/test_kernel_ops.py New tests for offset application and calibration loading; adds extra-marked dynamic kernel test.
tests/test_correction/test_io.py New tests for resolve_path including S3 download behaviors and cleanup.
tests/test_correction/conftest.py Simplifies fixtures and centralizes sys.path manipulation for CLARREO test helpers.
tests/test_correction/clarreo/test_clarreo_e2e.py Adds CLARREO end-to-end integration tests covering upstream/downstream workflows.
tests/test_correction/clarreo/test_clarreo_dataio.py Adds AWS-credential-gated integration tests for S3 NetCDF discovery helpers.
tests/test_correction/clarreo/test_clarreo_config.py Adds tests for CLARREO config generation + JSON round-trip validation.
tests/test_correction/clarreo/conftest.py Adds session fixtures for CLARREO test data directories.
tests/test_correction/clarreo/clarreo_data_loaders.py Adds CLARREO preprocessing helpers as test fixtures; keeps loader-shim behavior.
tests/test_correction/clarreo/clarreo_config.py Updates CLARREO test fixture config factory docs/behavior to match config-driven loading.
tests/test_correction/clarreo/_pipeline_helpers.py Adds reusable upstream/downstream pipeline runners for integration tests.
tests/test_correction/clarreo/_image_match_helpers.py Adds discovery + error-variation helpers for deterministic image-match integration testing.
tests/test_correction/clarreo/init.py Marks CLARREO correction tests as a package.
tests/test_correction/clarreo_data_loaders.py Removes old top-level CLARREO loader module in favor of clarreo/ package.
tests/test_correction/_synthetic_helpers.py Adds synthetic pairing/matching + spacecraft-state generators for upstream testing.
tests/integration/init.py Adds integration test package marker.
pyproject.toml Bumps version to 0.4.0b1, updates authors, adds pydantic>=2.0, sets Beta classifier.
examples/correction/regrid_gcp_chips.py Adds a CLI-like batch regridding script for HDF GCP chips to NetCDF.
examples/correction/README.md Adds examples index and quickstart guidance for correction/verification/regridding.
examples/correction/example_run_correction.py Adds a correction-loop template script (intended dry-run when prerequisites missing).
examples/correction/example_config.json Adds a generic JSON template for correction configuration.
examples/correction/clarreo_config.py Adds a user-facing CLARREO config factory using public API models.
examples/correction/clarreo_config.json Adds a loadable CLARREO JSON config example.
docs/source/users.md Updates headings and adds correction-package entry-point documentation + links.
docs/source/spice_path_handling.md Adds detailed documentation for SPICE path-shortening strategies and configuration.
docs/source/gcp_regridding.md Adds end-user documentation for GCP chip regridding workflows and CLI usage.
docs/source/contents.rst Updates docs to include correction user guide and adjusts acknowledgement filename.
docs/source/changelog.md Adds 0.4.0b1 changelog entry summarizing the redesign and breaking changes.
curryer/correction/results.py Introduces structured Pydantic result models and summary-table builder for correction runs.
curryer/correction/results_io.py Adds config-driven NetCDF initialization, checkpointing, load/resume, and final save helpers.
curryer/correction/psf.py Switches motion convolution config typing from GeolocationConfig to PSFSamplingConfig.
curryer/correction/pairing.py Removes old Protocol interface, expands file pairing to .mat + NetCDF, adds in-memory dataset pairing helper.
curryer/correction/kernel_ops.py Centralizes offset application and kernel generation utilities with improved logging.
curryer/correction/io.py Adds unified local/S3 path resolution with temp-file tracking and atexit cleanup.
curryer/correction/image_match.py Removes Protocol + legacy mat-loading utilities; uses PSFSamplingConfig and keeps output validation.
curryer/correction/dataio.py Removes loader Protocols and clarifies module focus on validation + S3 utilities.
curryer/correction/data_structures.py Introduces PSFSamplingConfig rename and new Pydantic RegridConfig with validation.
curryer/correction/init.py Adds package-level docstring, explicit re-exports, and a curated __all__ public API.
Comments suppressed due to low confidence (1)

curryer/correction/data_structures.py:125

  • PSFSamplingConfig.motion_convolution_step_m is computed from the class default gcp_step_m (30.0) at definition time, so if a caller overrides gcp_step_m the motion_convolution_step_m default will not update accordingly. Consider making motion_convolution_step_m optional and computing it in post_init when not provided (or use a default_factory).
    gcp_step_m: float = 30.0
    motion_convolution_step_m: float = gcp_step_m / 20.0
    psf_lat_sample_dist_deg: float = 2.4397105613972e-05
    psf_lon_sample_dist_deg: float = 2.8737038710207e-05

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread docs/source/gcp_regridding.md Outdated
Comment thread examples/correction/regrid_gcp_chips.py
Comment thread examples/correction/example_run_correction.py Outdated
Comment thread examples/correction/example_run_correction.py Outdated
mmaclay and others added 2 commits May 5, 2026 08:56
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot finished work on behalf of mmaclay May 5, 2026 15:04
@mmaclay mmaclay requested review from BStoneLASP and maxinelasp May 5, 2026 17:22
@mmaclay mmaclay requested a review from medley56 May 14, 2026 16:23
Copy link
Copy Markdown
Member

@medley56 medley56 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without really understanding all the algorithms, I mostly reviewed for style and factoring. My overarching question is which of these functions actually need to be exposed to users? If we're trying to reduce the amount of stuff users have to actually know about, it might be wise to only expose the API that you want them to actually use, rather than exporting everything in the nested correction/* module.

I'm approving. Feel free to re-request a review if you want me to look again.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really a question of convention, but do you really want to export all the underlying functions and classes at the correction module level? I tend to shy away from this unless there is a clear user-facing API reason to do so. If this is the Curryer standard practice, then it's fine, but otherwise worth thinking about.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look like an end-to-end test. It looks like it's testing each piece separately. No?


def _make_client(self, content: str = "downloaded content") -> MagicMock:
"""Return a mock S3 client that writes *content* to the target path."""
mock_client = MagicMock()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get better mocking of AWS resources using the moto library.

@@ -0,0 +1,580 @@
"""Tests for ``curryer.correction.results`` and Prompt-5 additions.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like an LLM left a reference to your prompt in here. I'd recommend running a check over the changes to see if there are any other artifacts left over from prompting.

Comment thread pyproject.toml
[project]
name = "lasp-curryer"
version = "0.3.2"
version = "0.4.0b1"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the version you want to push?

from curryer.correction import correction
from curryer.correction.config import DataConfig

logger = logging.getLogger(__name__)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this logger being used and I'd recommend against logging in tests anyway, unless you are actually examining the content of the logs as part of the test.

import numpy as np
import pandas as pd

logger = logging.getLogger(__name__)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logger I see actually being used, but again, unless you actually care about logging contents as part of the test, I personally avoid logging in tests. You might want to see where else you're instantiating loggers in test modules to make sure you really want them.

Comment on lines +4 to +5
This module is a **test infrastructure helper**. It is not part of the
public API and is not intended as a user-facing example.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused about this module. The top line says it's a fixture but these just look like functions? Examples?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually write this kind of thing as a pytest fixture that returns a function, but maybe that's overkill for these helper functions. Just be careful about import paths when using them. It's a little confusing/fragile if your tests are doing imports from helper modules that live in unexpected places.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it contains a lot of Configs but they don't live in config.py. Are these a substantially different domain? Should this be data_structure_config.py?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[EPIC] Correction Package Refactor / Redesign

4 participants