Scenario controller in detail

Author

Galen Holt

Load the package

library(werptoolkitr)

The controller primarily sets the paths to scenarios, calls the modules, and saves the output and metadata. In normal use, the series of steps below is wrapped. Once we have the directory and set any other needed parameters (e.g. point at a config file), we should just click go, and auto-generate the folder structure, run the ewr, and output the results. I’m stepping through that here so we can see what’s happening and where we need to tweak as formats change; this document is intended to expose some of the inner workings of the black box. Wrapped versions of the controller alone and the whole toolkit are available to illustrate this more normal use.

Setup

Set paths

We need to set the path to this demonstration. This should all be in a single outer directory project_dir, and there should be an inner directory with the input data /hydrographs. These would typically point to external shared directories. For this simple example though, we put the data inside the repo to make it self contained. The saved data goes to project_dir/module_output automatically. The /hydrographs subdirectory could be made automatic as well, but I’m waiting for the input data format to firm up.

project_dir = file.path('more_scenarios')
hydro_dir = file.path(project_dir, 'hydrographs')

Format

We need to pass the data format to the downstream modules so they can parse the data. Currently the demo csvs are created in a format that parses like IQQM, and the netcdf will be. The EWR tool (the only current module) has three options currently 1) 'Bigmod - MDBA', 2) 'IQQM - NSW 10,000 years', and 3) 'Source - NSW (res.csv)'. I’m exposing this for the example, but we can auto-set this to the IQQM default in normal use.

# Options
# 'Bigmod - MDBA'
# 'IQQM - NSW 10,000 years'
# 'Source - NSW (res.csv)'

model_format = 'IQQM - NSW 10,000 years'

Climate info

Like the format, allowance and climate are arguments to the EWR tool, so I set them here to be clear what we’re doing, but in general they would be set by default.

MINT <- (100 - 0)/100
MAXT <- (100 + 0 )/100
DUR <- (100 - 0 )/100
DRAW <- (100 -0 )/100

# A named list in R becomes a dict in python
allowance <- list('minThreshold' = MINT, 'maxThreshold' = MAXT, 'duration' = DUR, 'drawdown' = DRAW)

climate <- 'Standard - 1911 to 2018 climate categorisation'

Processing internals

Above, we’ve been setting variables that should be set by default to expose what they are. Now we move into the processing step, which is typically wrapped into a single function, but here we pull off the wrapping to step through the processing sequence. All of this is typically hidden in prep_run_save_ewrs, as in the wrapped example, but I’m exposing the steps here for easier viewing and testing, especially as the formats change.

Set up output directories

We get the information about the gauges and filepaths project_dir as a dict with make_scenario_info. The scenarios need to be in separate directories inside /hydro_dir, but the files in those directories could come in multiple arrangements. Currently, we allow multiple csvs of single-gauge hydrographs or single csvs of multiple-gauge hydrographs. Once the netcdf format settles down, we will include parsing that. If there are multiple gauges within each csv, they enter the dict as a ‘gauge’ value. If not, the ‘gauge’ value is just the filename, so these need to be gauge numbers. For this example, we have single csvs with multiple gauges.

The output directory and subdirs for scenarios is created by make_output_dir, which also returns that outer directory location. The EWR tool needs the paths to the gauge data as a list, so paths_gauges_R just unfolds the dict to give that. There will likely be continued changes to these functions as we settle on standard directory structures and data formats. These functions are currently all python wrapped by R for historical reasons, but as they are just directory creation and querying could be easily re-written in R to tighten up the package.

# Gives file locations as a dict- 
sceneinfodict <- make_scenario_info_R(hydro_dir)
Warning in normalizePath(path.expand(path), winslash, mustWork):
path[1]="C:/Users/galen/Documents/code/WERP/demo_pkg_updates/.venv/Scripts":
The system cannot find the path specified
# make the output directory structure
outpath <- make_output_dir_R(project_dir, sceneinfodict)
# unfold the sceneinfodict to make it easy to get the lists of paths and gauges
# everyhydro = paths_gauges(sceneinfodict)[0]

# The inner level turned into characters in R and needs to stay as a list. This is annoying but needed *only in R*
for (i in 1:length(sceneinfodict)) {
  for (j in 1:2) {
    sceneinfodict[[i]][[j]] <- as.list(sceneinfodict[[i]][[j]])
  }
}

# more list-making to work from R-Python
everyhydro <- as.list(paths_gauges_R(sceneinfodict)[[1]])

Run the ewr tool

Now we run the ewr tool with the parameters given and save the output.

There’s an issue with outputType = 'annual' in the version of the EWR tool this was built with. Until I update and test the new EWR tool, skip the annual data. There are still a number of messages printed by the EWR code, about pulling the data and gauge dependencies. Those are useful when using the code, but I’ll suppress printing them here since there are so many.

This is not actually run here for speed- the same thing is done in a notebook for the full toolkit.

# To make a list in python, need to have unnamed lists in R
if (!params$REBUILD_DATA) {
  outputType <- list('none')
}
if (params$REBUILD_DATA) {
  outputType <- list('summary', 'all')
}


ewr_out <- run_save_ewrs_R(everyhydro, outpath, model_format = model_format, allowance = allowance, climate = climate, outputType = outputType, datesuffix = FALSE, returnType = list('summary', 'all'))

Briefly, we can see that that has returned dataframes from the EWR (with some leftover pandas datetime environments that we could clean up if we wanted to use this in-memory). Typically, though, we just save this out. Because we do not actually run the chunk above for rendering this site, these dataframes are not available here.

names(ewr_out)
str(ewr_out$summary)
str(ewr_out$all)

Next steps

This now has the EWR outputs saved into project_dir/module_output/EWR and available for further processing with the aggregator.