639 allow the user to define input and output file names#734
Open
Description
This feature allows users to supply two YAML files: input_settings.yml and results_settings.yml. The file input_settings.yml lets users specify the names of their input files as well as the paths and names of the input folders. A new function, configure_input_settings, merges these names with the default names and adds the resulting dictionary to the setup dictionary.
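The merge of user-supplied names with defaults can be sketched as follows. This is only an illustration in Python, not the Julia code in this PR: the function name merge_input_names, the "InputNames" key, and the default entries are all hypothetical placeholders.

```python
# Hypothetical defaults; the real keys and default names live in GenX.
DEFAULT_INPUT_NAMES = {
    "system_folder": "system",
    "demand": "Demand_data.csv",
    "fuel": "Fuels_data.csv",
}


def merge_input_names(setup, user_names):
    """Return a copy of `setup` with user names layered over the defaults.

    `user_names` stands in for the parsed contents of input_settings.yml
    and may be None or empty when the user supplies no overrides.
    """
    merged = {**DEFAULT_INPUT_NAMES, **(user_names or {})}
    new_setup = dict(setup)  # leave the caller's dictionary untouched
    new_setup["InputNames"] = merged  # hypothetical key name
    return new_setup


setup = merge_input_names({"Solver": "HiGHS"}, {"demand": "my_demand.parquet"})
print(setup["InputNames"])
```

Any name the user does not override falls back to its default, so an input_settings.yml only needs to list the entries that differ.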
The function load_dataframe(), which is called to open all files in GenX, has been changed to use DuckDB (per Greg's suggestion). DuckDB can open CSV, Parquet, and JSON files, and these can also be compressed (e.g. .gz), so users can now provide input files in any of those formats.
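As a rough illustration of how a file extension can be mapped to a DuckDB reader, the Python sketch below builds the SQL string only and does not run DuckDB. It is an assumption about one possible approach, not the logic in load_dataframe(). The helper name duckdb_read_query is hypothetical; read_csv_auto, read_json_auto, and read_parquet are DuckDB table functions.

```python
# Map supported extensions to DuckDB table functions.
READERS = {
    ".csv": "read_csv_auto",
    ".csv.gz": "read_csv_auto",
    ".json": "read_json_auto",
    ".json.gz": "read_json_auto",
    ".parquet": "read_parquet",
}


def duckdb_read_query(path):
    """Build a DuckDB query that reads `path`, chosen by file extension."""
    for ext, reader in READERS.items():
        if path.endswith(ext):
            escaped = path.replace("'", "''")  # escape quotes for SQL
            return f"SELECT * FROM {reader}('{escaped}')"
    raise ValueError(f"unsupported input file type: {path}")


print(duckdb_read_query("system/Demand_data.csv.gz"))
```

The resulting string could then be passed to a DuckDB connection. DuckDB can also infer the format when a file path is queried directly, so an explicit mapping like this is optional.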
The file results_settings.yml can contain the desired names of the results files. Names in the YAML file can be entered with or without a file extension. In genx_settings.yml, two new keys can be added: ResultsFileType and ResultsCompressionType, both of which default to "auto_detect". Both keys are used by the function write_output_files().
The function write_output_files() has been added to write_outputs.jl. It uses DuckDB to save files in a specified file type, which can be .csv, .csv.gz, .parquet, .json, or .json.gz. If the file type is set to "auto_detect", the function checks whether the file name contains an extension; if no extension is present, .csv is used. If the file type is set explicitly (e.g. .parquet) but that extension is not present in the file name, the extension is added. A compression type can also be specified: .gz for CSV and JSON files, and -snappy or -zstd for Parquet files. The compression type can also be set to "auto_detect".
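The file type rules above can be sketched as a small name-resolution helper. This Python version is a minimal illustration under stated assumptions, not the Julia implementation in write_outputs.jl. The name resolve_output_name is hypothetical, compression handling is omitted, and when an explicit file type conflicts with an extension already in the name, the sketch simply appends the requested extension, since the description does not say how that case is handled.

```python
KNOWN_EXTENSIONS = (".csv.gz", ".json.gz", ".csv", ".json", ".parquet")


def resolve_output_name(name, filetype="auto_detect"):
    """Return (filename, filetype) for a results file name."""
    if filetype == "auto_detect":
        # Use the extension already present in the name, if any.
        for ext in KNOWN_EXTENSIONS:
            if name.endswith(ext):
                return name, ext
        # No recognised extension: fall back to CSV.
        return name + ".csv", ".csv"
    if name.endswith(filetype):
        return name, filetype
    # Explicit file type not present in the name: add it.
    return name + filetype, filetype


for args in [("capacity",), ("capacity.json.gz",), ("capacity", ".parquet")]:
    print(args, "->", resolve_output_name(*args))
```

Returning the resolved file type alongside the name lets the caller choose the matching DuckDB export format without parsing the name a second time.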
The goal is for write_output_files() to replace all instances in which CSV.write is currently used. This is a work in progress: so far, only some of the CSV.write calls in GenX have been replaced.
An example file, 10_three_zones_define_input, contains the aforementioned YAML files.
Edit 9/11/24: Multistage inputs can now also be defined using input_settings.yml. The structure is slightly different: indentation is used to create a separate subdictionary for each input stage (see 6_three_zones_w_multistage for an example YAML file). Multistage results file names can also be changed using results_settings.yml, with the same file structure as in the single-stage case. I deleted 10_three_zones_define_input because it was identical to 1_three_zones, and instead added input_settings.yml and results_settings.yml to 1_three_zones. The function write_output_files() now replaces CSV.write() in almost all instances. The documentation has also been updated to reflect the new capabilities.
Notes from GenX Meeting 9/12
Side note, not brought up in the meeting: the results files specific to multistage (capacities_multi_stage, etc.) have not been tested with write_output_files(), but the code is present and commented out.
What type of PR is this? (check all applicable)
Related Tickets & Documents
Issue #639
Checklist
How this can be tested
Test functions are still being written. For now, testing can be done by altering the input and results YAML files in example 10 (per the 9/11/24 edit above, these files now live in 1_three_zones) and checking that the expected results follow.
Post-approval checklist for GenX core developers
After the PR is approved