Creating example scenarios

Author

Galen Holt

Overview

HydroBOT proper begins with hydrographs as inputs. The creation of those hydrographs, and particularly their modification to create scenarios is therefore typically a step that occurs prior to the use of HydroBOT. For the demonstrations here, however, we generate some example scenarios from historical hydrographs. For the purposes of capacity demonstration we simply multiply short hydrographs and put them in a standard HydroBOT input format here.

For this website, we use two sets of scenarios

  1. The small set that comes with hydrobot itself, useful for examples that run the modules and aggregation steps

  2. A larger set with more interesting structure, used for demonstrating the comparer. See information about them

This document generates that second set, which is mainly used to demonstrate the Comparer. Most other demonstrations just use the smaller set that comes with HydroBOT itself.

HydroBOT relevance

The creation of flow scenarios is not part of HydroBOT proper. Instead, HydroBOT expects to ingest hydrographs and then handles the ongoing response models, aggregation, and analyses.

Process

We pull a limited set of gauges for a limited time period to keep this dataset small. Primarily, we identify a set of gauges in two catchments, pull them for a short time period, and adjust them multiplicatively and additively to create modified scenarios, with the original data serving as the baseline scenario. Along the way, we examine the data in various ways to visualise what we’re doing and where.

Paths and other data

Set the data directory. These should usually point to external shared directories. For this simple example though, we put the data inside the repo to make it self contained.

scenario_dir <- "more_scenarios"
hydro_dir <- file.path(scenario_dir, "hydrographs")

Spatial datasets

We use spatial datasets provided by {HydroBOT}, which provides a standard set. These are visualised in a separate notebook. Relevant to this scenario creation, we are interested in the gauges, (HydroBOT::bom_basin_gauges) since this is what were contained in the EWR tool. We use the sdl_units dataset to obtain a subset of gauges for these simple scenarios.

Subset the gauges

We need multiple catchments for reasonable demonstration of spatial aggregation, so let’s use the Macquarie-Castlereagh, Namoi, Lachlan.

catch_demo <- sdl_units |>
  dplyr::filter(SWSDLName %in% c("Macquarie-Castlereagh", "Lachlan", "Namoi"))

Get relevant gauges

Cut the bom_basin_gauges from the whole country to just those four catchments

demo_gauges <- st_intersection(bom_basin_gauges, catch_demo)
Warning: attribute variables are assumed to be spatially constant throughout
all geometries

How many are there?

demo_gauges |> nrow()
[1] 295

That’s a fair number, but they won’t all be in the EWR.

Extract their names

To feed to the gauge puller, we need their gauge numbers.

gaugenums <- demo_gauges$gauge

Find those relevant to HydroBOT

We have the list of gauges, but now we need to cut the list down to those in the EWR tool. There’s not any point in pulling gauges that do not appear in the response model (here, the EWR tool).

ewrs_in_pyewr <- get_ewr_table()

What are those gauges, and which are in both the ewr and the desired catchments?

ewrgauges <- ewrs_in_pyewr$Gauge |> unique()
ewr_demo_gauges <- gaugenums[gaugenums %in% ewrgauges]
length(ewr_demo_gauges)
[1] 47

47 isn’t too many.

Need to categorise them so we know what to pull. Let’s cut to just flow gauges for the demo for simplicity- scaling levels is less clear.

gauges_to_pull <- ewrs_in_pyewr |>
  filter(Gauge %in% ewr_demo_gauges & FlowLevelVolume == "F") |>
  dplyr::select(Gauge) |>
  unique() |>
  pull()

Get all the gauge data

Now we have a list of gauges, we need their hydrographs. We need a reasonable time span to account for temporal variation, but not too long- this is a simple case. Let’s choose 10 years.

starttime <- lubridate::ymd(20000701)
endtime <- lubridate::ymd(20210630)

Pull the data with hydrogauge

if (params$REBUILD_DATA) {
  demo_levs <- hydrogauge::get_ts_traces(
    portal = "NSW",
    site_list = gauges_to_pull,
    var_list = 141, # This is flow
    start_time = starttime,
    end_time = endtime
  )
}

This forcing should not be necessary, but some azure systems have old GDAL, which can’t read WKT2. Need to fix, but in the meantime, force with the crs number.

if (grepl("npd-dat", Sys.info()["nodename"])) {
  st_crs(basin) <- 4283
  st_crs(catch_demo) <- 4283
  st_crs(demo_gauges) <- 4283
}

Map the gauges

Looks reasonable. Probably overkill for testing, but can do a cut down version too.

(ggplot() +
  geom_sf(data = basin, fill = "lightsteelblue") +
  geom_sf(data = catch_demo, mapping = aes(fill = SWSDLName)) +
  geom_sf(data = demo_gauges, color = "black") +
  scale_fill_brewer(type = "qual", palette = 8))

Make test scenarios

To generate simple scenarios that cover a range of flow conditions, we multiply and divide the baseline data by scaling factors, yielding symmetric changes up and down in relative flow. We also add ‘adaptation’ options as additions of flow during Sept-Dec.

Make the data readable by EWR tool

The EWR tool has the capacity to pull gauge data directly, but because we have modified these data, they need to enter the EWR tool through scenario_handling, and so need to have a format the EWR tool can parse as output from one of the scenario generating tools it uses (IQQM, Source, etc). The current approach is to use Standard time-series as the type, which means we have to give the data the proper structure for that format.

This requries having a date column Date and other columns with gauge numbers for names, with ‘_flow’ or ‘_level’ after them. The EWR tool can have multiple gauges, with each having its own column. HydroBOT expects directories for each scenario, with any number of csvs of hydrographs inside, (e.g. it could be one csv with columns for each gauge, or each csv could have a single gauge), which is handled by find_scenario_paths internally to the “Scenario controller” part of HydroBOT. Here, we save a single csv with many gauges for each scenario.

Build scenarios and data structure

Set up the data structure

if (params$REBUILD_DATA) {
  demo_flow <- demo_levs |>
    dplyr::mutate(Date = lubridate::date(time)) |>
    # append the gauge type
    dplyr::mutate(site = paste0(site, "_flow")) |>
    dplyr::select(Date, value, site) |>
    tidyr::pivot_wider(
      id_cols = Date,
      names_from = site,
      values_from = value
    )
}

Now we want to set up the ‘scenarios’ as multiplications and additions, as well as the directory structure.

multipliers <- c(1.1, 1.25, 1.5, 2)

scenemults <- c(1 / rev(multipliers), 1, multipliers)
climate_code <- LETTERS[1:length(scenemults)]

scenenames <- c(
  paste0("climatedown", as.character(rev(multipliers))),
  "climatebase",
  paste0("climateup", as.character(multipliers))
) |>
  stringr::str_replace("\\.", "_") # deals with the 1.x

adaptadds <- c(0, 250, 6500, 12000)
adapt_code <- 1:length(adaptadds)

adapttime <- 8 # months = Sep,Oct,Nov,Dec

sceneaddsnames <- c(paste0("adapt", as.character(adaptadds))) |>
  stringr::str_replace("\\.", "_")

scenenames <- paste0(
  rep(scenenames,
    times = length(unique(adaptadds))
  ),
  stringr::str_sort(
    rep(sceneaddsnames,
      times = length(unique(scenemults))
    ),
    numeric = TRUE
  )
)

scenemults <- rep(scenemults, times = length(unique(sceneaddsnames)))
climate_code <- rep(climate_code, times = length(unique(sceneaddsnames)))

sceneadds <- sort(rep(adaptadds, times = length(unique(scenemults))))
adapt_code <- sort(rep(adapt_code, times = length(unique(scenemults))))
sceneadapttimes <- rep(adapttime, times = length(scenenames))

scenario_code <- paste0(climate_code, adapt_code)


# Add in the MAX scenario
scenenames <- c(scenenames, "MAX")
scenemults <- c(scenemults, 0)
sceneadds <- c(sceneadds, 500000)
sceneadapttimes <- c(sceneadapttimes, 0)
climate_code <- c(climate_code, "MAX")
adapt_code <- c(adapt_code, Inf)
scenario_code <- c(scenario_code, "MAX")

check <- data.frame(scenenames = scenenames)
check$scenemults <- scenemults
check$sceneadds <- sceneadds
check$sceneadapttimes <- sceneadapttimes

# the full scenarios
for (x in scenenames) {
  scenedir <- file.path(hydro_dir, x)
  if (!dir.exists(scenedir)) {
    dir.create(scenedir, recursive = TRUE)
  }
}

Create clean dataframes to save. Could be fancy with a function and a purr, but we’re just saving, so a simple loop should be fine.

HydroBOT does not write scenario metadata, since the scenarios will come from elsewhere. Instead, in an ideal world, scenarios will come with metadata. Here, since we’re creating the scenarios, make some metadata here.

if (params$REBUILD_DATA) {
  for (i in 1:length(scenenames)) {
    demo_flow |>
      dplyr::mutate(across(-Date, \(x) x * scenemults[i])) |>
      dplyr::mutate(across(-Date, \(x) ifelse(lubridate::month(Date) > sceneadapttimes[i],
        x + sceneadds[i], x
      ))) |>
      readr::write_csv(file.path(
        hydro_dir, scenenames[i],
        paste0(scenenames[i], ".csv")
      ))
    # make JSON metadata
    jsonlite::write_json(
      list(
        scenario_name = scenenames[i],
        flow_multiplier = scenemults[i],
        flow_addition = sceneadds[i],
        flow_addition_time = sceneadapttimes[i]
      ),
      path = file.path(hydro_dir, scenenames[i], "metadata.json")
    )
  }
}

That set of hydrographs can now be used as starting data for a demonstration of HydroBOT proper.

Make some metadata for all scenarios too (both yaml and json).

scenario_meta <- list(
  scenario = scenenames,
  flow_multiplier = scenemults,
  flow_addition = sceneadds,
  flow_addition_time = sceneadapttimes,
  climate_code = climate_code,
  adapt_code = adapt_code,
  scenario_code = scenario_code
)

# I don't know the format we'll be using, but this works to create yaml metadata
yaml::write_yaml(scenario_meta,
  file = file.path(hydro_dir, "scenario_metadata.yml")
)
# and this does the same with JSON
jsonlite::write_json(scenario_meta,
  path = file.path(hydro_dir, "scenario_metadata.json")
)