Run full toolkit in memory

This document provides a template for running through the toolkit in a single document, retaining everything in-memory (no intermediate saving). Intermediate saving is a very simple flip of a switch, demoed in its own doc.

Load the package

library(werptoolkitr)
library(sf)

Structure

To run the toolkit, we need to provide paths to directories for input data and output data, as well as arguments for the aggregation.

One option is to do that in a parameters file, and then treat this as a parameterised notebook.

The other option is to have this be the parameterising file, so we can have a bit more text around the parameterisations. Not sure which makes more sense, but they’re not mutually exclusive, and the answer likely depends on whether we’re working interactively or want to fire off 1,000 runs.

Parameters

Directories

Input and output directories

Use the scenario_example/ directory created to capture a very simple demonstration case of 46 gauges in three catchments for 10 years.

Normally scenario_dir should point somewhere external (though keeping it inside or alongside the hydrograph data is a good idea.). But here, I’m generating test data, so I’m keeping it in the repo. I will probably change that when I move to Azure.

# Outer directory for scenario
project_dir = file.path('more_scenarios')

# Preexisting data
# Hydrographs (expected to exist already)
hydro_dir = file.path(project_dir, 'hydrographs')

# Generated data
# EWR outputs (will be created here in controller, read from here in aggregator)
ewr_results <- file.path(project_dir, 'module_output', 'EWR')

# outputs of aggregator. There may be multiple modules
agg_results <- file.path(project_dir, 'aggregator_output')

Controller

We use the default IQQM model format and climate categorisations, though those could be passed here as well (see controller).

Control output and return

To determine what to save and what to return to the active session, use outputType and returnType, respectively. Each of them can take a list of any of 'none', 'summary', 'annual', 'all'. For this demonstration I’ll not save anything and return summary to the active session.

outputType <- list('summary')
returnType <- list('summary') # list('summary', 'all')

Aggregator

To keep this simple, we use one aggregation list and the read_and_agg wrapper to only have to pass paths. See the more detailed documents for the different ways to specify those aggregation lists.

What to aggregate

The aggregator needs to know which set of EWR outputs to use (to navigate the directory or list structure). It should accept multiple types, but that’s not well tested, so for now just use one.

aggType <- 'summary'

We need to tell it the variable to aggregate, and any grouping variables other than the themes and spatial groups. Typically, scenario will be a grouper, but there may be others.

agg_groups <- 'scenario'
agg_var <- 'ewr_achieved'

Do we want it to return to the active session? For this demo, I’m keeping everything in the session, so set to TRUE.

aggReturn <- TRUE

How to aggregate

Fundamentally, the aggregator needs paths and two lists

  • sequence of aggregations

  • sequence of aggregation functions (can be multiple per step)

Here, I’m using an interleaved list of theme and spatial aggregations (see the detailed docs for more explanation), and applying only a single aggregation function at each step for simplicity. Those steps are specified a range of different ways to give a small taste of the flexibility here, but see the spatial and theme docs for more examples.

aggseq <- list(ewr_code = c('ewr_code_timing', 'ewr_code'),
               env_obj =  c('ewr_code', "env_obj"),
               resource_plan = resource_plan_areas,
               Specific_goal = c('env_obj', "Specific_goal"),
               catchment = cewo_valleys,
               Objective = c('Specific_goal', 'Objective'),
               mdb = basin,
               target_5_year_2024 = c('Objective', 'target_5_year_2024'))


funseq <- list(c('CompensatingFactor'),
               c('ArithmeticMean'),
               c('ArithmeticMean'),
               c('ArithmeticMean'),
               rlang::quo(list(wm = ~weighted.mean(., w = area, 
                                        na.rm = TRUE))),
               c('ArithmeticMean'),
               
               rlang::quo(list(wm = ~weighted.mean(., w = area, 
                                    na.rm = TRUE))),
               c('ArithmeticMean'))

Run the toolkit

Controller

This is not actually run here for speed- the same thing is done in a notebook for the full toolkit saving steps.

  ewr_out <- prep_run_save_ewrs_R(scenario_dir = hydro_dir, 
                                  output_dir = project_dir, 
                                  outputType = outputType,
                                  returnType = returnType)

Aggregator

Because the chunk above is not run, the needed EWR outputs are not available, but would be if it were run.

aggout <- read_and_agg(datpath = ewr_results, 
             type = aggType,
             geopath = bom_basin_gauges,
             causalpath = causal_ewr,
             groupers = agg_groups,
             aggCols = agg_var,
             aggsequence = aggseq,
             funsequence = funseq,
             saveintermediate = TRUE,
             namehistory = FALSE,
             keepAllPolys = TRUE,
             returnList = aggReturn,
             savepath = agg_results)

Quick check

Plotting will be developed in the comparer, this is just a quick check to see that there is data. Code borrowed from theme x space demo.

But only if we returned the output- since the chunks above are not run, we skip this as well.

if (aggReturn) {

  # Scenario data
scenarios <- jsonlite::read_json(file.path(hydro_dir, 
                                           'scenario_metadata.json')) |> 
  tibble::as_tibble() |> 
  tidyr::unnest(cols = everything())

  # plot
aggout$catchment %>% 
  dplyr::filter(Specific_goal == 'All recorded fish species') %>%
  left_join(scenarios, by = c('scenario' = 'scenario_name')) %>% 
  ggplot2::ggplot() +
  ggplot2::geom_sf(data = basin) +
  ggplot2::geom_sf(ggplot2::aes(fill = ewr_achieved)) +
  ggplot2::facet_grid(.~forcats::fct_reorder(scenario, delta))
}