WERP_toolkit_demo

This document provides a template for running the toolkit from a parameters file, as we might do when batch processing. In practice, such a run typically would not go through a notebook but would be called with Rscript. That setup could go in many directions depending on the use case, so for now we just demonstrate how to set up the parameter file and use it, and leave it to the user to build the script that gets called with Rscript from the command line or as part of an external process.

It is also possible to put the parameters in the yaml header of a Quarto notebook (which we also do here, as an example). We provide examples of using the parameterised system in several ways below, including from external params.yml files, passed parameters, and the list Quarto generates in parameterised notebooks.

Load the package

library(werptoolkitr)

Structure of params files and arguments

The run_toolkit_params function takes four arguments: yamlpath, a path to a yaml params file; passed_args, text that can come from the command line; list_args, a list of already-parsed parameters (such as the params list in a Quarto notebook); and defaults, a path to another yaml file. The defaults file lets us set most of the params, and then modify only a subset with the yamlpath file, passed_args, or list_args.

The package comes with a set of default parameters in system.file('yml/default_params.yml', package = 'werptoolkitr'). Users can however create their own default yaml params file to set a standard set of defaults for a given project. See this file for available parameters.
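To see which parameters are available, the default file can be read into a list and inspected. This is a minimal sketch that assumes the yaml package is installed; the name default_params is only illustrative.

```r
# Locate the default parameter file shipped with the package.
# system.file() returns "" if the package or file cannot be found.
default_path <- system.file('yml/default_params.yml', package = 'werptoolkitr')

if (nzchar(default_path) && requireNamespace('yaml', quietly = TRUE)) {
  # Parse the yaml into a named R list
  default_params <- yaml::read_yaml(default_path)
  # Names of the parameters that can be overridden
  print(names(default_params))
} else {
  message('Default params file not found; is werptoolkitr installed?')
}
```

The same approach works for a project-specific defaults file; only the path changes.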

The file passed to yamlpath (params.yml here, though any name works), passed_args, and list_args can then be used to modify the default values. The idea is that only a small subset of those defaults would be modified for a particular run.

The params.R file (pointed to with the yaml parameter aggregation_def) is necessary because aggregation specification needs R objects or syntax.

Finally, werptoolkitr::run_toolkit_params() ingests paths to these files (or arguments passed from the command line or as lists), turns their params into R arguments, and runs the toolkit.

The arguments overwrite each other, so list_args has highest precedence, followed by passed_args, yamlpath, and finally defaults.
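The precedence order can be illustrated with plain lists in base R. This is only a sketch of the overwrite behaviour, not the package's implementation, and all values are placeholders rather than real parameter settings.

```r
# Placeholder values standing in for parsed parameters from each source
defaults    <- list(climate = 'from defaults', aggType = 'from defaults', returnType = 'from defaults')
from_yaml   <- list(climate = 'from yamlpath', aggType = 'from yamlpath')
passed_args <- list(climate = 'from passed_args')
list_args   <- list(returnType = 'from list_args')

# Apply the lowest-precedence list first; each later list overwrites matching names
merged <- Reduce(utils::modifyList, list(defaults, from_yaml, passed_args, list_args))

str(merged)
```

Here climate ends up with the passed_args value, aggType with the yamlpath value, and returnType with the list_args value, matching the order described above.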

Note: no comparer

At present we do not provide yaml param options for the comparer. This is possible in future, but the possibilities are a bit too wide open at present. It is likely the user will want to explore the output, rather than generate parameterised output, at least until some specific uses are settled on.

Parameters

This section provides a look at the parameters being set in the various params files or passed in.

There are a number of parameters to set, mirroring those set in the notebook-driven runs of the toolkit, e.g. running while saving steps.

Here, we provide example yaml that may appear in the files at defaults or yamlpath.

Additional parameters

Specify the aggregation sequence in R and pass the path to that file.

# aggregation sequences (need to be defined in R)
aggregation_def: 'toolkit_project/agg_params.R'

Directories

Input and output directories

# Outer directory for scenario
scenario_dir: 'toolkit_project'

# Preexisting data
# Hydrographs (expected to exist already)
hydro_dir: NULL

# Generated data
# EWR outputs (will be created here in controller, read from here in aggregator)
ewr_results: NULL

# outputs of aggregator. There may be multiple modules
# NULL doesn't save it, but holds in memory.
agg_results: NULL

Normally scenario_dir should point somewhere external (though keeping it inside or alongside the hydrograph data is a good idea). Here, though, I’m generating test data, so I’m keeping it in the repo.

Setting the output directories to NULL builds a standard toolkit directory structure, with scenario_dir as the outer directory, holding hydrographs, aggregator_output, and module_output subdirectories.
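As a rough illustration of that layout, the paths can be built from scenario_dir in base R. This only sketches the three top-level subdirectories named above; any nesting the toolkit creates below them (for example per-module folders) is not shown and the variable names are illustrative.

```r
# Outer directory, matching the example yaml above
scenario_dir <- 'toolkit_project'

# Top-level subdirectories of the standard structure
subdirs <- c('hydrographs', 'module_output', 'aggregator_output')

# Full paths as they would appear relative to the working directory
standard_dirs <- file.path(scenario_dir, subdirs)
print(standard_dirs)
```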

Module arguments

Currently, just the EWR tool. More could be exposed here, but this is typically all we need.

# Model type
model_format: 'IQQM - NSW 10,000 years'

# Climate
climate: 'Standard - 1911 to 2018 climate categorisation'

# output and return
outputType:
  - summary

returnType: none

Aggregation settings

These are additional arguments to read_and_agg.

# What to aggregate
aggType: summary

# Aggregation settings
agg_groups: scenario
agg_var: ewr_achieved
aggReturn: FALSE
namehistory: FALSE
keepAllPolys: TRUE

Run the toolkit

These calls actually run the toolkit, so they are set not to evaluate in normal use. Only one of the notebooks for this website actually runs the controller: the full toolkit with saving steps.

run_toolkit_params(yamlpath = file.path('full_toolkit', 'params.yml'))

Passing arguments as text is tricky for more than one argument, but it is useful for command-line use, for example. Here, we demonstrate changing the climate sequence. In practice, this would need some tweaking to use with Rscript and extract the string from commandArgs().

if (params$REBUILD_DATA) {
  run_toolkit_params(yamlpath = file.path('full_toolkit', 'params.yml'),
                     passed_args = "climate: 'NSW 10,000 year climate sequence'")
}
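One way the Rscript version might look is sketched below. The script name run_toolkit.R is hypothetical, and joining several arguments with newlines is an assumption about how passed_args would need to receive more than one value; it has not been checked against the package.

```r
# Hypothetical script, e.g. saved as run_toolkit.R and called as
#   Rscript run_toolkit.R "climate: 'NSW 10,000 year climate sequence'"

# Keep only the arguments supplied after the script name
cli_args <- commandArgs(trailingOnly = TRUE)

# Collapse the arguments into a single yaml string, one key per line
# (assumption: passed_args accepts multi-line yaml text)
passed <- if (length(cli_args) > 0) paste(cli_args, collapse = '\n') else NULL

# Only run if something was passed and the package is available
if (!is.null(passed) && requireNamespace('werptoolkitr', quietly = TRUE)) {
  werptoolkitr::run_toolkit_params(
    yamlpath = file.path('full_toolkit', 'params.yml'),
    passed_args = passed
  )
}
```

Quoting the whole yaml string on the command line keeps it as a single element of commandArgs(), which avoids splitting values that contain spaces.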

And finally, if the params are included in the parameters section of a Quarto notebook, they are already parsed, so instead of reading them from yaml, we can use that list as if it were an already read-in params.yml. Quarto with R puts these in a list called params, so we can just pass that.

if (params$REBUILD_DATA) {
  run_toolkit_params(list_args = params)
}