Scenario controller internals

library(HydroBOT)

The controller primarily sets the paths to scenarios, calls the modules, and saves the output and metadata. In normal use, the series of steps below is wrapped with prep_run_save_ewrs, as shown here and here, which allows cleaner automation and saves the metadata. I’m stepping through the internal parts of prep_run_save_ewrs here to expose some of the inner workings of the black box and show more clearly what happens when it runs. For normal use, see the wrapped versions of the controller alone and in a combined workflow.

This document focuses on the parts of the internal flow that help explain what is happening, and on those that can be useful to run ad hoc, e.g. when troubleshooting, to see what HydroBOT is doing at a more granular level. It skips over quite a lot of the error-checking, safeguarding, formatting, parallelisation structure, and metadata saving that happens within prep_run_save_ewrs.

Setup

Set paths

We need to identify the path to the hydrographs and set up directories for outputs. In use, the hydrograph paths would typically point to external shared directories. The cleanest, default situation is for everything to be in a single outer directory, project_dir, with an inner directory /hydrographs holding the input data.

Tip

Within /hydrographs, scenarios should be kept in separate folders, i.e. one directory per scenario, each containing the files for its gauges (or for all gauges within a catchment, basin, etc.) (see here). This allows cleaner scenario structures and parallelisation. Any given run needs all the locations within a scenario, but scenarios should run separately (possibly in parallel) because outcomes (e.g. EWRs, fish performance) cannot logically depend on other scenarios representing other hydrological sequences or climates. A common situation that is much more cumbersome is to have the directory structure reflect gauges or other spatial units, with one file per scenario inside each. It is worth restructuring your files if this is the case.

It also works to point to a single scenario, e.g. /hydrographs/scenario1, as might be the case if HydroBOT runs off the end of a hydrology model that generates that scenario. This allows both targeting single scenarios for HydroBOT analysis and batching hydrology and HydroBOT together. By default, the saved data goes to project_dir/module_output automatically, though this can be changed; see the output_parent_dir and output_subdir arguments.

project_dir <- file.path("hydrobot_scenarios")
hydro_dir <- file.path(project_dir, "hydrographs")
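To make the recommended layout concrete, the sketch below builds a throwaway copy of it in a temporary directory using only base R, then lists what each scenario folder contains. The scenario names (base, down4) and gauge file names are made up for illustration; they are not files shipped with HydroBOT.

```r
# Build a throwaway project with the recommended layout:
# project_dir/hydrographs/<scenario>/<gauge>.csv
demo_project <- file.path(tempdir(), "hydrobot_layout_demo")
demo_hydro <- file.path(demo_project, "hydrographs")

# Hypothetical scenario and gauge names, for illustration only
demo_scenarios <- c("base", "down4")
demo_gauges <- c("gauge_a.csv", "gauge_b.csv")

for (scenario in demo_scenarios) {
  scenario_dir <- file.path(demo_hydro, scenario)
  dir.create(scenario_dir, recursive = TRUE, showWarnings = FALSE)
  # Empty placeholder files stand in for real hydrographs
  file.create(file.path(scenario_dir, demo_gauges))
}

# One folder per scenario, each holding every location for that scenario
scenario_dirs <- list.dirs(demo_hydro, full.names = FALSE, recursive = FALSE)
files_per_scenario <- lapply(file.path(demo_hydro, scenario_dirs), list.files)
names(files_per_scenario) <- scenario_dirs
files_per_scenario
```

If your own data is organised the other way round (one folder per gauge, one file per scenario), a listing like this is a quick way to confirm the restructure worked.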

Format

We need to pass the data format to the downstream modules so they can parse the data. Currently the demo csvs are created in a format that parses like Standard time-series, and the demo netcdfs parse in the bespoke IQQM - netcdf format. Any available option in the EWR tool will work, see ?prep_run_save_ewrs.

We also set the output type from the EWR tool. The ‘yearly’ output is needed in most cases, but any option is available; see ?prep_run_save_ewrs. It must be a list in order to pass correctly to Python.

model_format <- "Standard time-series"
outputType <- list("yearly", "summary")
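The list requirement is easiest to see with a single output type. Under reticulate's default conversion rules, a length-one character vector is passed to Python as a plain string rather than a list, which is presumably why a list is asked for here. The sketch below only shows the R side of that distinction; single_type is a name introduced for illustration.

```r
outputType <- list("yearly", "summary")

# A list keeps each output type as its own element,
# even when only one type is requested
single_type <- list("yearly")
is.list(single_type)
length(single_type)

# R-side functions that expect a character vector can flatten the list
unlist(outputType)
```

This is why outputType is wrapped in unlist() when it is passed to R functions further down, while the list itself goes to the Python side.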

Processing internals

All of this is typically hidden in prep_run_save_ewrs, as in the wrapped example, but I’m exposing the steps here for easier viewing and because some internal functions can be useful for troubleshooting, e.g. find_scenario_paths().

Set up output directories

We get the information about the gauges and filepaths under project_dir with find_scenario_paths. The names of the resulting list of paths are the names of the scenarios. Note that including a scenarios argument to prep_run_save_ewrs() overrides this, so the list can be passed in explicitly if there is no good way to parse scenarios and file names (e.g. to run a subset of scenarios spread across several different directories).

# get the paths to all the hydrographs
hydro_paths <- find_scenario_paths(hydro_dir, type = "csv")
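If the scenarios need to be specified by hand, a named list of paths can be assembled with base R. The sketch below is one way to do that under assumed file names; it is not how find_scenario_paths is implemented, and whether the list elements should be files or directories is best checked against what find_scenario_paths returns for your data. The names demo_hydro and manual_paths, and the scenario and gauge names, are hypothetical.

```r
# Throwaway layout with one placeholder csv per (hypothetical) scenario
demo_hydro <- file.path(tempdir(), "hydrobot_paths_demo", "hydrographs")
for (scenario in c("base", "down4")) {
  dir.create(file.path(demo_hydro, scenario),
    recursive = TRUE, showWarnings = FALSE
  )
  file.create(file.path(demo_hydro, scenario, "gauge_a.csv"))
}

# Find every csv and name each path by its parent (scenario) directory
csv_files <- list.files(demo_hydro,
  pattern = "\\.csv$",
  recursive = TRUE, full.names = TRUE
)
manual_paths <- as.list(csv_files)
names(manual_paths) <- basename(dirname(csv_files))

# The list names are the scenario names
names(manual_paths)

# Subsetting by name picks out the scenarios to run
manual_paths["base"]
```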

The output directory and the subdirectories for each scenario are created by make_output_dir, which also returns the location of that outer directory. Note that output_subdir allows subdirectories. This can be useful for running different analyses on the same set of hydrographs.

# set up the output directory
output_path <- make_output_dir(
  parent_dir = project_dir,
  scenarios = names(hydro_paths),
  module_name = "EWR",
  subdir = "example",
  ewr_outtypes = unlist(outputType)
)

This directory machinery is what makes the file_search and fill_missing arguments possible. They allow running only a subset of files, or running only the files that are missing if, for example, a long run crashed. See ?prep_run_save_ewrs and the main controller page.
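The idea behind filling in missing runs can be sketched in base R: compare the scenarios that should have output against the scenario folders that actually contain files. This is an illustration of the concept only, not HydroBOT's implementation; the folder layout inside the output directory, the file names, and the scenario names are all assumptions.

```r
# Hypothetical output layout: <output>/<scenario>/yearly/<file>.csv
demo_out <- file.path(tempdir(), "hydrobot_fill_demo", "module_output", "EWR")
scenarios <- c("base", "down4", "up4")

# Pretend a long run crashed after finishing the first two scenarios
for (s in scenarios[1:2]) {
  dir.create(file.path(demo_out, s, "yearly"),
    recursive = TRUE, showWarnings = FALSE
  )
  file.create(file.path(demo_out, s, "yearly", paste0(s, "_yearly.csv")))
}
# The third scenario has its directory but no saved output
dir.create(file.path(demo_out, "up4", "yearly"),
  recursive = TRUE, showWarnings = FALSE
)

# A scenario counts as done if its folder contains any files
has_output <- vapply(
  scenarios,
  function(s) length(list.files(file.path(demo_out, s), recursive = TRUE)) > 0,
  logical(1)
)

# Only these would need to be run again
still_to_run <- scenarios[!has_output]
still_to_run
```

Because the scenario directories are created up front, an empty directory is a reliable signal that a scenario has not finished, which is what makes this kind of check possible.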

Run the EWR tool

Now we run the EWR tool with the parameters given and save the output. The EWR tool is written in Python, and HydroBOT provides some linking Python functions. This is all handled internally by HydroBOT, but here we need to import those functions to demonstrate. The function is designed to be looped over scenarios (in parallel with rparallel = TRUE), so here we only run the first scenario.

We have already seen the outputType argument: it determines the saved outputs and so matters for the directory setup. The returnType argument determines what gets returned to the active R session.

controller_functions <- reticulate::import_from_path("controller_functions",
  path = system.file("python",
    package = "HydroBOT"
  ),
  delay_load = TRUE
)

ewr_out <- controller_functions$run_save_ewrs(
  hydro_paths[[1]],
  output_path,
  model_format,
  outputType = outputType,
  returnType = list("summary"),
  scenario_name = names(hydro_paths)[1],
  scenarios_from = "directory"
)

Briefly, we can see that this has returned dataframes from the EWR tool. Typically, though, we just save the output rather than returning it.

ewr_out$summary

Without running prep_run_save_ewrs, we have not saved the metadata.

Next steps

The EWR outputs are now saved in project_dir/module_output/EWR and are available for further processing with the aggregator.