Getting started

Installing HydroBOT

The {HydroBOT} package needs to be installed to provide all the functions used here. It also provides necessary data for the causal network relationships and canonical shapefiles that have already been prepared.

Use your favourite installer:

devtools::install_github("galenholt/HydroBOT")

renv::install("galenholt/HydroBOT")

pak::pkg_install("galenholt/HydroBOT")

Python dependencies

HydroBOT depends on the py-ewr Python package. In most cases, users do not need to do anything; {reticulate} will install that package automatically on first use. If more control over Python environments is desired, users will need to set up an environment containing py-ewr. There is a pyproject.toml file in this repo and the HydroBOT repo that should allow you to build an environment with poetry; for more detail, see the developer notes. The template repository provides installation templates that handle installing poetry, pyenv, py-ewr, and HydroBOT itself on a range of specific systems (contact the authors if needed).
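
If you want to manage the Python environment yourself rather than letting {reticulate} create one, a minimal sketch using {reticulate} directly might look like the following (the environment name "hydrobot-env" is just an example; adapt it to your setup):

# Create a dedicated virtual environment (example name) and install py-ewr into it
reticulate::virtualenv_create("hydrobot-env")
reticulate::py_install("py-ewr", envname = "hydrobot-env")
# Point reticulate at that environment before loading HydroBOT
reticulate::use_virtualenv("hydrobot-env", required = TRUE)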

Example workflow

The basic workflow is to point to the hydrographs and run the modules in the controller, then scale the outcomes in the aggregator, and synthesise outputs with the comparer. More details are available at those links, and several different workflows are demonstrated as well; a simple example follows.

A common way to run HydroBOT is to use a single document (i.e. not running the Controller, Aggregator, and Comparer as separate notebooks), but to save the outputs at each step. This allows analyses to be restarted if necessary and is particularly useful for large jobs, especially when batching over many scenarios. That said, in practice, once batching takes more than several minutes per stage, it often makes the most sense to split into separate notebooks or scripts for each step.

Retaining everything in memory is a simple flip of a switch, demonstrated in its own doc, which parallels this one nearly exactly except that it does not specify an outputType in prep_run_save_ewrs() or a savepath in read_and_agg(). Saving is generally how workflows are run for large projects, so saving is the approach used in this getting-started example.
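
As a minimal sketch of the in-memory variant (using hydro_dir and project_dir as defined in the Directories section below, and otherwise keeping the same arguments), the controller call simply drops outputType, and read_and_agg() similarly drops savepath:

# No outputType, so the controller writes nothing to disk; results are only returned in memory
ewr_out <- prep_run_save_ewrs(
  hydro_dir = hydro_dir,
  output_parent_dir = project_dir,
  returnType = list("yearly")
)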

Load the package
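
Load {HydroBOT} as you would any other R package; this also makes its example data and shapefiles available.

library(HydroBOT)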

Directories

Here, we will use the example hydrographs that come with HydroBOT.

Normally project_dir and hydro_dir would point somewhere external (and typically, having hydro_dir inside project_dir will make life easier).

hydro_dir <- system.file("extdata/testsmall/hydrographs", package = "HydroBOT")

project_dir <- file.path("test_dir")

# Generated data
# EWR outputs (will be created here in controller, read from here in aggregator)
ewr_results <- file.path(project_dir, "module_output", "EWR")

# Outputs of the aggregator. There may be multiple modules
agg_results <- file.path(project_dir, "aggregator_output")

We need the ‘yearly’ EWR outputs for HydroBOT processing, though we can ask the EWR tool to return any of its output options.

Note

If this is your first time using HydroBOT, and you haven’t set up a Python environment, this might take a while as {reticulate} does it for you. This is required because the EWR tool is in Python.

ewr_out <- prep_run_save_ewrs(
  hydro_dir = hydro_dir,
  output_parent_dir = project_dir,
  outputType = list("yearly"),
  returnType = list("yearly")
)

Aggregation

We need to define an aggregation sequence, which specifies the steps along space, time, and theme dimensions, as well as a matching sequence of aggregation functions at each step. The example here is a reasonable default for many situations, but should be considered carefully. Please see the aggregator section for a detailed treatment of the syntax and capabilities of these sequences.

For this example, we say an EWR passes in step 2 if any of its ‘versions’ pass. If we instead think the ‘versions’ are separate EWRs for different purposes, we might use ‘ArithmeticMean’ in step 2.

aggseq <- list(
  all_time = "all_time",
  ewr_code = c("ewr_code_timing", "ewr_code"),
  env_obj = c("ewr_code", "env_obj"),
  sdl_units = sdl_units,
  Target = c("env_obj", "Target"),
  mdb = basin,
  target_5_year_2024 = c("Target", "target_5_year_2024")
)

funseq <- list(
  all_time = "ArithmeticMean",
  ewr_code = "CompensatingFactor",
  env_obj = "ArithmeticMean",
  sdl_units = "ArithmeticMean",
  Target = "ArithmeticMean",
  mdb = "SpatialWeightedMean",
  target_5_year_2024 = "ArithmeticMean"
)

Run the aggregation. Use the auto_ewr_PU shortcut to auto-specify some needed grouping for EWR outputs (see here for more detail).

Read the messages. Most of them note that it is more explicit to avoid auto_ewr_PU = TRUE and set the grouping and joining directly, but there is also a warning that the causal network is missing links.

aggout <- read_and_agg(
  datpath = ewr_results,
  type = "achievement",
  geopath = bom_basin_gauges,
  causalpath = causal_ewr,
  groupers = "scenario",
  aggCols = "ewr_achieved",
  auto_ewr_PU = TRUE,
  aggsequence = aggseq,
  funsequence = funseq,
  saveintermediate = TRUE,
  namehistory = FALSE,
  keepAllPolys = FALSE,
  returnList = TRUE,
  add_max = FALSE,
  savepath = agg_results
)
ℹ EWR outputs auto-grouped
• Done automatically because `auto_ewr_PU = TRUE`
• EWRs should be grouped by `SWSDLName`, `planning_unit_name`, and `gauge` until aggregated to larger spatial areas.
• Rows will collapse otherwise, silently aggregating over the wrong dimension
• Best to explicitly use `group_until` in `multi_aggregate()` or `read_and_agg()`.
ℹ EWR outputs auto-grouped
• Done automatically because `auto_ewr_PU = TRUE`
• EWRs should be grouped by `SWSDLName`, `planning_unit_name`, and `gauge` until aggregated to larger spatial areas.
• Rows will collapse otherwise, silently aggregating over the wrong dimension
• Best to explicitly use `group_until` in `multi_aggregate()` or `read_and_agg()`.
ℹ EWR outputs auto-grouped
• Done automatically because `auto_ewr_PU = TRUE`
• EWRs should be grouped by `SWSDLName`, `planning_unit_name`, and `gauge` until aggregated to larger spatial areas.
• Rows will collapse otherwise, silently aggregating over the wrong dimension
• Best to explicitly use `group_until` in `multi_aggregate()` or `read_and_agg()`.
ℹ EWR gauges joined to larger units pseudo-spatially.
• Done automatically because `auto_ewr_PU = TRUE`
• Non-spatial join needed because gauges may inform areas they are not within
• Best to explicitly use `pseudo_spatial = 'sdl_units'` in `multi_aggregate()` or `read_and_agg()`.

! Unmatched links in causal network
• 4 from Target to target_5_year_2024
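
Because saveintermediate = TRUE and returnList = TRUE, aggout is a list holding the data at each step of the aggregation sequence (we use aggout$sdl_units and aggout$env_obj below). A quick way to check what came back:

names(aggout)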

Comparer

Now we can make a couple of quick plots to see what we’ve made. For more detail about plotting options and controls, see the comparer.

Maps and spatial scaling

map_example <- aggout$sdl_units |>
  dplyr::filter(env_obj == "NF1") |> # Need to reduce dimensionality
  plot_outcomes(
    outcome_col = "ewr_achieved",
    plot_type = "map",
    colorset = "ewr_achieved",
    pal_list = list("scico::lapaz"),
    pal_direction = -1,
    facet_col = "scenario",
    facet_row = "env_obj",
    sceneorder = c("down4", "base", "up4"),
    underlay_list = "basin"
  ) +
  ggplot2::theme(legend.position = "bottom")

map_example

Bars: SDL units and scenarios

SDL unit differences across all environmental objectives

catchcompare <- aggout$env_obj |>
  plot_outcomes(
    outcome_col = "ewr_achieved",
    colorset = "SWSDLName",
    pal_list = list("calecopal::lake"),
    sceneorder = c("down4", "base", "up4"),
    position = "dodge"
  )

catchcompare