Parameterised workflow
This document provides a template for running HydroBOT from a parameters file, as we might do when batch processing. As such, it typically wouldn't be run through a notebook, but would be called with Rscript. That sort of setup can go in a lot of different directions depending on the use case, and we assume a user would be familiar with shell scripting and the particular idiosyncrasies of their batching system. Here, we demonstrate how to set up the parameter file and use it, and leave it to the user to build the script that gets called with Rscript from the command line or as part of an external process.
We can use a default params file, a second params file that tweaks those defaults, and params included in the Quarto header yaml. All of these options are used here.
Load the package
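Attach the package:

library(HydroBOT)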
Structure of params files and arguments
The run_hydrobot_params function takes four arguments: yamlpath, a path to a yaml params file; passed_args, which can come from the command line; list_args; and defaults, which is another yaml file. This two-yaml approach lets us set most of the params in common across all runs, and modify only a subset with the yamlpath file or passed_args.
In all cases, the arguments end up in a list with two top-level items: ewr and aggregation, within which items can be added with names matching the arguments to [prep_run_save_ewrs()] and [read_and_agg()], respectively. This gives full control over those functions.
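As an illustration (values here are drawn from the examples later in this document, not required settings), the assembled list has this shape:

# Sketch of the assembled argument list: two top-level items,
# each holding arguments for the corresponding function
list(
  ewr = list(                  # arguments to prep_run_save_ewrs()
    output_parent_dir = "toolkit_project",
    model_format = "IQQM - netcdf"
  ),
  aggregation = list(          # arguments to read_and_agg()
    type = "achievement",
    aggCols = "ewr_achieved"
  )
)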
The package comes with a set of default parameters in system.file('yml/default_params.yml', package = 'HydroBOT'). Users can (and should), however, create their own default yaml params file to set a standard set of defaults for a given project. See that file for the basic structure.

The params.yml file (or any other name, passed to yamlpath), passed_args, and list_args can then be used to modify the default values. The idea is that only a small subset of those defaults would be modified for a particular run.
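As a sketch (values taken from the examples below), a run-specific params.yml might override only a couple of entries and inherit everything else from the defaults:

ewr:
  # only change where this run writes its outputs
  output_parent_dir: 'toolkit_project'
aggregation:
  # and where the aggregator saves
  savepath: 'path/to/aggs'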
In general, it is best to specify everything as characters, logicals, or NULL. If there is a situation where that isn't possible (bespoke spatial data, for example), it is possible to specify the aggsequence and funsequence with an R script. To do that, change the aggregation_def entry of the aggregation list to the path to that R script. For an example, see system.file('yml/params.R', package = 'HydroBOT').
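As a rough sketch of such a script (the object names, spatial file, and aggregation function here are illustrative assumptions, not the packaged example), it only needs to define aggsequence and funsequence as R objects:

# Illustrative aggregation definition script, e.g. agg_params.R.
# The spatial file, level name, and function are assumptions for demonstration.
library(sf)

# Bespoke spatial data that cannot be expressed as characters in yaml
custom_regions <- sf::read_sf('toolkit_project/spatial/custom_regions.shp')

# Sequence of aggregation levels, including the bespoke spatial layer
aggsequence <- list(custom_regions = custom_regions)

# Matching sequence of aggregation functions, one entry per level
funsequence <- list(c('ArithmeticMean'))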
Finally, [run_hydrobot_params()] ingests the paths to these files (or passed command-line strings or lists), turns their params into R arguments, and runs HydroBOT. The arguments overwrite each other, so list_args has the highest precedence, followed by passed_args, then yamlpath, and finally defaults.
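For example (a sketch; the namehistory values are purely illustrative), the same entry supplied at several levels resolves to the list_args value:

# Precedence sketch: when the same entry appears in several places,
# list_args wins, then passed_args, then yamlpath, then defaults
run_hydrobot_params(
  yamlpath = file.path("workflows", "params.yml"),            # the yaml file might set namehistory: FALSE
  passed_args = "aggregation:\n namehistory: TRUE",           # overrides the yaml file
  list_args = list(aggregation = list(namehistory = FALSE))   # overrides both; this value is used
)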
At present we do not provide yaml param options for the comparer. This is possible, but the possibilities are a bit too wide open. It is likely the user will want to explore the output, rather than generate parameterised output, though that may change in future.
Parameters
This section provides a look at the parameters being set in the various params files or passed in.
There are a number of parameters to set, mirroring those set in the notebook-driven runs of HydroBOT, e.g. running while saving steps.
Here, we provide example yaml that may appear in the files at defaults or yamlpath.
Additional parameters
Specify the aggregation sequence in R and pass the path to that file.
aggregation:
  # aggregation sequences (need to be defined in R)
  aggregation_def: 'toolkit_project/agg_params.R'
Directories
Input and output directories
ewr:
  # Outer directory for scenario
  output_parent_dir: 'toolkit_project'
  # Preexisting data
  # Hydrographs (expected to exist already)
  hydro_dir: NULL
aggregation:
  # outputs of aggregator
  savepath: 'path/to/aggs'
Normally output_parent_dir should point somewhere external (though keeping it inside or alongside the hydrograph data is a good idea). Setting the output directories to NULL expects (in the case of hydro_dir) or builds (for savepath) a standard toolkit directory structure, with output_parent_dir as the outer directory, holding hydrographs, aggregator_output, and module_output subdirectories.
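That default layout looks roughly like this (roles noted for orientation):

toolkit_project/
  hydrographs/          # input hydrographs (hydro_dir)
  module_output/        # module (EWR tool) results
  aggregator_output/    # aggregated outputs (savepath)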
Module arguments
Currently, the only module is the EWR tool. Any argument to [prep_run_save_ewrs()] can be passed. Some examples are
ewr:
  # Model type
  model_format: 'IQQM - netcdf'
  # output and return
  outputType:
    - summary
  returnType: none
Aggregation settings
Any arguments to read_and_agg() can be passed. Some examples are
aggregation:
  # What to aggregate
  type: achievement
  # Aggregation settings
  groupers: scenario
  aggCols: ewr_achieved
  namehistory: FALSE
  keepAllPolys: TRUE
Run HydroBOT
These examples are set not to evaluate in normal use, but they illustrate different ways of passing the parameters.
This runs the toolkit using a yaml parameter file that modifies the default provided with HydroBOT.
run_hydrobot_params(yamlpath = file.path("workflows", "params.yml"))
! Unmatched links in causal network
• 11 from env_obj to Specific_goal
! Unmatched links in causal network
• 7 from Objective to target_5_year_2024
Getting the yaml right when passing arguments as text is tricky for more than one argument, but it can be useful, for example for command-line use. Here, we demonstrate changing output_parent_dir and namehistory, noting that the number of spaces and newlines is critical for it to work. In practice, this would need some tweaking to use with Rscript and extract the string from commandArgs().
run_hydrobot_params(
yamlpath = file.path("workflows", "params.yml"),
passed_args = "ewr:\n output_parent_dir: 'hydrobot_scenarios'\naggregation:\n namehistory: TRUE"
)
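As a sketch of that tweaking (the wrapper file name and argument handling are assumptions), a minimal Rscript wrapper might look like:

# Hypothetical wrapper script, e.g. run_params.R, called as
#   Rscript run_params.R <yaml-formatted override string>
library(HydroBOT)

args <- commandArgs(trailingOnly = TRUE)

# Assume the first trailing argument holds the yaml overrides; depending on the
# shell, literal "\n" sequences may need converting to real newlines first
overrides <- if (length(args) > 0) gsub("\\\\n", "\n", args[1]) else NULL

run_hydrobot_params(
  yamlpath = file.path("workflows", "params.yml"),
  passed_args = overrides
)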
We can also pass arguments in a list from R, which has somewhat easier syntax. Here, we use it to run only a subset of the scenarios and put the outputs in the same directory, as well as to specify the aggregation with an R file.
run_hydrobot_params(list_args = list(
ewr = list(
output_parent_dir = "hydrobot_scenarios/hydrographs/base",
hydro_dir = "hydrobot_scenarios/hydrographs/base"
),
aggregation = list(
aggregation_def = "workflows/params.R",
auto_ewr_PU = TRUE
)
))
And finally, if the params are included in the parameters section of a Quarto notebook, they should get parsed. Quarto with R puts these in a list called params, so we could just pass that. Unfortunately, the params list in Quarto isn't full-featured yaml: it can't handle nested lists, so this does not currently work (though it may work with !r syntax in Rmarkdown). This is unlikely to be very useful.
run_hydrobot_params(list_args = params)
Replication
The prep_run_save_ewrs and read_and_agg functions save metadata yaml files that are fully-specified parameters files. Thus, to replicate runs, we can run from the final yaml (after the aggregator), as it includes all preceding steps.
run_hydrobot_params(yamlpath = "hydrobot_scenarios/aggregator_output/agg_metadata.yml")