More CLI subsetting options#186
Conversation
|
@mattw-nws Do you have any thoughts on how we could better arrange the cli generally? I'd like to add these options but am a little hesitant to add more things to the command line. It worked ok when there was one model type and one forcing source, but it's starting to get to the point where --help is information overload. I'll probably start with these updates:
Then experiment with some of this stuff that I'm less sure aboutI also find myself always adding -sfr and they're not very descriptive like that so I was thinking just having one main entrypoint that does everything, then separate commands for when you want to explicitly just do one step. we can make the all in one command track changes and only rerun steps when needed. # does all data preparation, subset, forcings, config generation
ngiab-prep gage-10154200 --start ... --end ... --model ...
# to just subset
ngiab-prep subset gage-10154200
# just forcings
ngiab-prep force gage-10154200 --start ... --end ...I could add something like venv sourcing where you can have a working folder and then it can figure out some arguments based on whats already in the folder ngiab-prep gage-10154200 --start ... --end ... --model cfe
ngiab-prep source gage-10154200
ngiab-prep config --model lstm
ngiab-prep config --model dhbv2To quickly swap between different models or forcings. Some of that feels useful to me but I can't really tell until I try it for a long time and I probably don't use it the same way as the majority of people using this tool. |
There was a problem hiding this comment.
Pull request overview
This PR extends the ngiab_data_cli subsetting interface to support selecting multiple input features, reading feature IDs from a file, and selecting catchments via a lat/lon bounding box (with selectable spatial operation).
Changes:
- Allow
-i/--input_featureto accept multiple IDs and optionally treat a single argument as a file containing IDs. - Add
--boundsand--bounds-operationto select catchments by bounding rectangle usingintersectsorwithin. - Update CLI validation and execution flow to pass multiple selected features into upstream/subset operations.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
modules/ngiab_data_cli/arguments.py |
Updates CLI flags to accept multiple feature IDs and adds bounds-based selection options. |
modules/ngiab_data_cli/__main__.py |
Implements multi-feature parsing, bounds-to-catchments selection, and routes results into existing processing/subset logic. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def validate_input(args: argparse.Namespace) -> Tuple[str, str]: | ||
| """Validate input arguments.""" | ||
|
|
||
| feature_name = None | ||
| feature_names = [] | ||
| output_folder = None |
| if args.bounds: | ||
| if args.gage: | ||
| # Mainly to avoid confusing error message when falling through into args.input_feature block | ||
| raise ValueError("Cannot use both --bounds and --gage options at the same time.") |
| # look at the prefix for autodetection, if -g or -l is used then there is no prefix | ||
| if len(input_feature.split("-")) > 1: | ||
| prefix = input_feature.split("-")[0] | ||
| if prefix.lower() == "gage": | ||
| is_gage = True | ||
| elif prefix.lower() == "wb": | ||
| logging.warning("Waterbody IDs are no longer supported!") | ||
| logging.warning(f"Automatically converting {input_feature} to catid") | ||
| time.sleep(2) |
| if not args.output_name: | ||
| args.output_name = f"vpu-{args.vpu}" | ||
| validate_output_dir() | ||
| return args.vpu, args.output_name |
| if args.input_feature: | ||
| input_feature = args.input_feature.replace("_", "-") | ||
|
|
||
| if args.gage and not input_feature.startswith("gage-"): | ||
| input_feature = "gage-" + input_feature | ||
|
|
||
| # look at the prefix for autodetection, if -g or -l is used then there is no prefix | ||
| if len(input_feature.split("-")) > 1: | ||
| prefix = input_feature.split("-")[0] | ||
| if prefix.lower() == "gage": | ||
| args.gage = True | ||
| elif prefix.lower() == "wb": | ||
| logging.warning("Waterbody IDs are no longer supported!") | ||
| logging.warning(f"Automatically converting {input_feature} to catid") | ||
| time.sleep(2) | ||
|
|
||
| # always add or replace the prefix with cat if it is not a lat lon or gage | ||
| if not args.latlon and not args.gage: | ||
| input_feature = "cat-" + input_feature.split("-")[-1] | ||
|
|
||
| if args.latlon and args.gage: | ||
| raise ValueError("Cannot use both --latlon and --gage options at the same time.") | ||
|
|
||
| if args.latlon: | ||
| validate_hydrofabric() | ||
| feature_name = get_cat_id_from_lat_lon(input_feature) | ||
| logging.info(f"Found {feature_name} from {input_feature}") | ||
| elif args.gage: | ||
| validate_hydrofabric() | ||
| feature_name = get_cat_from_gage_id(input_feature) | ||
| logging.info(f"Found {feature_name} from {input_feature}") | ||
| else: | ||
| feature_name = input_feature | ||
|
|
||
| if len(args.input_feature) == 1 and Path(args.input_feature[0]).is_file(): |
| group.add_argument( | ||
| "-i", | ||
| "--input_feature", | ||
| "--input_file", | ||
| type=str, | ||
| help="ID of feature to subset, providing a prefix will automatically convert to catid, \n e.g. cat-5173 or gage-01646500 or wb-1234", | ||
| nargs='+', | ||
| help="ID of feature to subset, providing a prefix will automatically convert to catid, \n e.g. cat-5173 or gage-01646500 or wb-1234. \ | ||
| May include multiple space-separated IDs as arguments. \ | ||
| If exactly one argument is provided and the argument is an existing file, the file will be read and each line must contain a feature ID.", | ||
| ) |
| "--bounds", | ||
| type=str, | ||
| nargs=2, | ||
| help="Two lat,lon corners defining a bounding rectangle inside which to select features" |
Additional Subset Selection Options
This is somewhat draft to gage interest/input. They target the CLI, but these options could be added to the GUI as well.
These options may not all make 100% sense by themselves, but I am working up to something; the goal is to cut a small, non-hydrologically-independent domain out and use https://github.com/mattw-nws/CopyCat for filling the boundary conditions. Integration code for that is coming a little later, but I have a PoC working here now.
Allows subset selection based on new criteria
-idirectly on the CLI-iincluding a list of IDs, one per lineHow it was fixed / implemented
Basically extended existing options and functions. I believe the interaction with the existing options is sensible. Some additional documentation is probably needed.
Testing / Screenshots