Skip to content

More CLI subsetting options#186

Draft
mattw-nws wants to merge 3 commits intoCIROH-UA:mainfrom
mattw-nws:flexible-cli-selection
Draft

More CLI subsetting options#186
mattw-nws wants to merge 3 commits intoCIROH-UA:mainfrom
mattw-nws:flexible-cli-selection

Conversation

@mattw-nws
Copy link
Copy Markdown
Contributor

@mattw-nws mattw-nws commented Nov 24, 2025

Additional Subset Selection Options

This is somewhat draft to gage interest/input. They target the CLI, but these options could be added to the GUI as well.

These options may not all make 100% sense by themselves, but I am working up to something; the goal is to cut a small, non-hydrologically-independent domain out and use https://github.com/mattw-nws/CopyCat for filling the boundary conditions. Integration code for that is coming a little later, but I have a PoC working here now.

Allows subset selection based on new criteria

  1. Specify multiple feature ID options to -i directly on the CLI
  2. Provide a file path to -i including a list of IDs, one per line
  3. Provide a lat,lon bounding box (pair of coordinates), selecting all catchments either intersecting or fully within the box (code is there to support an arbitrary polygon, e.g. political boundary or HUC at a later date)

How it was fixed / implemented

Basically extended existing options and functions. I believe the interaction with the existing options is sensible. Some additional documentation is probably needed.

Testing / Screenshots

python -m ngiab_data_cli -sfr --output_root=~/ngiab_test --output_name=KY20220726_in --start 2022-07-26 --end 2022-07-28 -i cat-814110 cat-814111
python -m ngiab_data_cli -sfr --output_root=~/ngiab_test --output_name=KY20220726 --start 2022-07-26 --end 2022-07-28 --bounds 36.82,-84.156 37.972,-82.199 --bounds-operation within

@JoshCu
Copy link
Copy Markdown
Member

JoshCu commented Mar 25, 2026

@mattw-nws Do you have any thoughts on how we could better arrange the cli generally? I'd like to add these options but am a little hesitant to add more things to the command line.

It worked ok when there was one model type and one forcing source, but it's starting to get to the point where --help is information overload.
I was thinking about redoing it and moving it to typer instead of argparse to add some interactivity and grouping.

I'll probably start with these updates:

  • group all the models under a --model arg
  • make dates interactive so if you don't add them it will prompt you if needed
  • group all these subset options
Then experiment with some of this stuff that I'm less sure about

I also find myself always adding -sfr and they're not very descriptive like that so I was thinking just having one main entrypoint that does everything, then separate commands for when you want to explicitly just do one step. we can make the all in one command track changes and only rerun steps when needed.
e.g.

# does all data preparation, subset, forcings, config generation
ngiab-prep gage-10154200 --start ... --end ... --model ...
# to just subset
ngiab-prep subset gage-10154200
# just forcings
ngiab-prep force gage-10154200 --start ... --end ...

I could add something like venv sourcing where you can have a working folder and then it can figure out some arguments based on whats already in the folder
e.g.

ngiab-prep gage-10154200 --start ... --end ... --model cfe
ngiab-prep source gage-10154200
ngiab-prep config --model lstm
ngiab-prep config --model dhbv2

To quickly swap between different models or forcings.

Some of that feels useful to me but I can't really tell until I try it for a long time and I probably don't use it the same way as the majority of people using this tool.
It would be great to get your perspective, even if it's just that I'm overthinking this and lots of options are fine :)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the ngiab_data_cli subsetting interface to support selecting multiple input features, reading feature IDs from a file, and selecting catchments via a lat/lon bounding box (with selectable spatial operation).

Changes:

  • Allow -i/--input_feature to accept multiple IDs and optionally treat a single argument as a file containing IDs.
  • Add --bounds and --bounds-operation to select catchments by bounding rectangle using intersects or within.
  • Update CLI validation and execution flow to pass multiple selected features into upstream/subset operations.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
modules/ngiab_data_cli/arguments.py Updates CLI flags to accept multiple feature IDs and adds bounds-based selection options.
modules/ngiab_data_cli/__main__.py Implements multi-feature parsing, bounds-to-catchments selection, and routes results into existing processing/subset logic.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 32 to 36
def validate_input(args: argparse.Namespace) -> Tuple[str, str]:
"""Validate input arguments."""

feature_name = None
feature_names = []
output_folder = None
if args.bounds:
if args.gage:
# Mainly to avoid confusing error message when falling through into args.input_feature block
raise ValueError("Cannot use both --bounds and --gage options at the same time.")
Comment on lines +67 to +75
# look at the prefix for autodetection, if -g or -l is used then there is no prefix
if len(input_feature.split("-")) > 1:
prefix = input_feature.split("-")[0]
if prefix.lower() == "gage":
is_gage = True
elif prefix.lower() == "wb":
logging.warning("Waterbody IDs are no longer supported!")
logging.warning(f"Automatically converting {input_feature} to catid")
time.sleep(2)
if not args.output_name:
args.output_name = f"vpu-{args.vpu}"
validate_output_dir()
return args.vpu, args.output_name
Comment on lines 51 to +55
if args.input_feature:
input_feature = args.input_feature.replace("_", "-")

if args.gage and not input_feature.startswith("gage-"):
input_feature = "gage-" + input_feature

# look at the prefix for autodetection, if -g or -l is used then there is no prefix
if len(input_feature.split("-")) > 1:
prefix = input_feature.split("-")[0]
if prefix.lower() == "gage":
args.gage = True
elif prefix.lower() == "wb":
logging.warning("Waterbody IDs are no longer supported!")
logging.warning(f"Automatically converting {input_feature} to catid")
time.sleep(2)

# always add or replace the prefix with cat if it is not a lat lon or gage
if not args.latlon and not args.gage:
input_feature = "cat-" + input_feature.split("-")[-1]

if args.latlon and args.gage:
raise ValueError("Cannot use both --latlon and --gage options at the same time.")

if args.latlon:
validate_hydrofabric()
feature_name = get_cat_id_from_lat_lon(input_feature)
logging.info(f"Found {feature_name} from {input_feature}")
elif args.gage:
validate_hydrofabric()
feature_name = get_cat_from_gage_id(input_feature)
logging.info(f"Found {feature_name} from {input_feature}")
else:
feature_name = input_feature

if len(args.input_feature) == 1 and Path(args.input_feature[0]).is_file():
Comment on lines 22 to 31
group.add_argument(
"-i",
"--input_feature",
"--input_file",
type=str,
help="ID of feature to subset, providing a prefix will automatically convert to catid, \n e.g. cat-5173 or gage-01646500 or wb-1234",
nargs='+',
help="ID of feature to subset, providing a prefix will automatically convert to catid, \n e.g. cat-5173 or gage-01646500 or wb-1234. \
May include multiple space-separated IDs as arguments. \
If exactly one argument is provided and the argument is an existing file, the file will be read and each line must contain a feature ID.",
)
"--bounds",
type=str,
nargs=2,
help="Two lat,lon corners defining a bounding rectangle inside which to select features"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants