Hydrographs

library(HydroBOT)
library(ggplot2)
library(dplyr)

Overview

This notebook provides descriptive plots of the inputs to HydroBOT (hydrographs), as they are often necessary for understanding the outputs. It also shows how plot_outcomes() is not restricted only to module outcomes, but is instead flexible enough to use nearly any dataframe with quantitative values. Though we plot the hydrographs here, this is the same approach as any other line plot. Though these are plots of the inputs, they do still allow various sorts of comparisons, whether simple visual comparisons or calculated differences, relative flows, etc.

Demonstration setup

As usual, we need paths to the data, in this case the hydrographs.

project_dir <- file.path("more_scenarios")
hydro_dir <- file.path(project_dir, "hydrographs")

Get scenario metadata.

scenarios <- yaml::read_yaml(file.path(hydro_dir, "scenario_metadata.yml")) |>
  tibble::as_tibble()

We have a lot of hydrographs, so for this demonstration, we will often use a subset of gauges and scenarios.

gauges_to_plot <- c("412002", "419001", "422028", "421001")
scenarios_to_plot <- c("climatedown2adapt0", "climatebaseadapt0", "climateup2adapt0")

scenarios <- scenarios |>
  filter(scenario %in% scenarios_to_plot)

Standard scenario appearance

We will typically have a consistent look for the scenarios across the project, with a logical ordering and standard colours. Such standard colours are not included in the {HydroBOT} package because they are project/analysis- specific, but they could be set at project-level, e.g. in the .Rprofile, if desired.

Here, we use the special arguments refvals and refcols to make a colour palette from a standard {paletter} option (“ggsci::nrc_npg”) while setting a specific level to a specified value. We will use the codes (see scenario definitions) rather than the names to make plots readable.

There is a sceneorder argument to plot_outcomes() that lets us explicitly set the order of the scenarios. However, it is typically easiest to simply make the scenarios a factor, though we use the sceneorder argument here. It operates only on a column named ‘scenario’, though, so if other columns need to be ordered they should be made factors before feeding to plot_outcomes().

sceneorder <- forcats::fct_reorder(
  scenarios$scenario,
  scenarios$flow_multiplier
)

scene_pal <- make_pal(unique(scenarios$climate_code),
  palette = "ggsci::nrc_npg",
  refvals = "E", refcols = "black"
)

scene_pal

<colors>
black #E64B35FF #4DBBD5FF

We will also make a standard set of gauge colours

gauge_pal <- make_pal(unique(gauges_to_plot),
  palette = "nationalparkcolors::Denali"
)
gauge_pal

<colors>
#73979DFF #DADCD7FF #43200EFF #E16509FF

Plotting hydrographs

We first read in the hydrographs, with a bit of standardised data processing. The read_hydro function knows about the standard data organisation format, and so pulls hydrographs from all scenarios. The hydrographs are typically daily and so large. For simplicity here, we use the scenariofilter argument (which just does a regex on the filenames) to select a subset.

scenehydros <- read_hydro(hydro_dir,
  long = TRUE, format = "csv",
  scenariofilter = scenarios_to_plot
)

scenehydros <- scenehydros |>
  # these have EWR-specific _flow after the gauge name
  mutate(gauge = stringr::str_remove_all(gauge, "_[a-z]*")) |>
  # because gauges are not their own files, we have to filter them on read-in
  filter(gauge %in% gauges_to_plot) |>
  left_join(scenarios)

Joining with `by = join_by(scenario)`

Now, we can make a simple plot using the standard format and colours

plot_outcomes(scenehydros,
  outcome_col = "flow",
  outcome_lab = "Flow (ML/day)",
  x_col = "Date",
  colorset = "gauge",
  color_lab = "Gauge ID:",
  pal_list = gauge_pal,
  facet_row = "climate_code",
  facet_col = "gauge",
)

We can access the scales argument from ggplot2::facet_* and trans arguments of of ggplot2::scale_y_continuous if we want to change the look.

plot_outcomes(scenehydros,
  outcome_col = "flow",
  outcome_lab = "Flow (ML/day)",
  x_col = "Date",
  colorset = "gauge",
  color_lab = "Gauge ID:",
  pal_list = gauge_pal,
  facet_row = "climate_code",
  facet_col = "gauge",
  transy = "log10",
  scales = "free_y"
)

Baselining

All plotting functions provide the ability to set a base level and calculate changes in the other levels. First, we’ll show how that works under the hood, and then the more usual way directly in the plot_outcomes() call.

How it works

The baseline can be either a scenario name or a scalar. We could potentially use something else like historical daily means, but that is not currently implemented, largely because it hasn’t yet been necessary- one of the scenarios is almost always an obvious baseline.

Internally, the plot_outcome() function calls baseline_compare(), but baseline_compare() can be used externally as well (and is actually quite useful in a number of places to baseline long data).

For example, we might want to calculate the difference between the modified scenarios and the base scenario. To do this, we simply pass in the data, the columns defining the groups compare_col, and the base_lev, that is, the value in compare_col that is the baseline situation. We also need to tell it which column contains the values, and the function comp_fun we want to use for the comparison. Here, we ask for the difference to the baseline provided by the base scenario.

In short, this gets the difference of the 0.5x and 2x scenario compared to the historical. We use difference here because these are simple multiplicative scenarios, and so the relative comparison would be trivially 0.5 or 2. In general, though, relative change is often more appropriate.

dif_flow <- baseline_compare(scenehydros,
  compare_col = "scenario",
  base_lev = "climatebaseadapt0",
  values_col = "flow",
  comp_fun = difference,
  group_cols = c("Date", "gauge")
)

Now we can feed the dif_flow dataframe to plot_outcomes() directly. We see that, as expected, base is now a flat line at 0, while the 2x and 0.5x go up and down, respectively.

plot_outcomes(dif_flow,
  outcome_col = "difference_flow",
  outcome_lab = "Change in flow (ML/day)",
  x_col = "Date",
  colorset = "climate_code",
  color_lab = "Scenario",
  pal_list = scene_pal,
  # facet_row = 'climate_code',
  facet_col = "gauge"
)

Internal to plot_outcomes()

In practice, we would typically let the baselining happen internally to the plot_outcomes() function so we aren’t carrying around and keeping track of a bunch of altered dataframes. To replicate the above plot, we only have to feed plot_outcomes() the base_lev and comp_fun arguments in base_list. If we want to do something more unusual with the baselining, we will need to do it externally. We can though specify a custom function of x and y, here the extremely contrived sqrtdif.

Note

We would get a warning here about not being explicit about the group_cols argument to the baselining function if it were not included, which just groups by everything non-numeric if that argument is not given.

This automatically labels with a long but meaningful name.

sqrtdif <- function(x, y) {
  sqrt(abs(x - y))
}

plot_outcomes(scenehydros,
  outcome_col = "flow",
  x_col = "Date",
  colorset = "gauge",
  color_lab = "Gauge ID:",
  pal_list = gauge_pal,
  base_list = list(
    base_lev = "climatebaseadapt0",
    values_col = "flow",
    comp_fun = "sqrtdif",
    group_cols = c("Date", "gauge")
  ),
  facet_col = "climate_code"
)

One way this could be very useful is for lagged or windowed operations, which we do here very simply by getting the difference to the baseline one day previous. The use in practice would need to be developed carefully to address specific questions.

lagdif <- function(x, y) {
  x - lag(y)
}

We can change the outcome_lab as well, though to make sure information isn’t lost, the baselining is appended.

lagplot <- plot_outcomes(scenehydros,
  outcome_col = "flow",
  outcome_lab = "Lagged flow difference",
  x_col = "Date",
  colorset = "gauge",
  color_lab = "Gauge ID:",
  pal_list = gauge_pal,
  base_list = list(
    base_lev = "climatebaseadapt0",
    values_col = "flow",
    comp_fun = "lagdif",
    group_cols = c("Date", "gauge")
  ),
  facet_col = "climate_code"
)

lagplot

Because this is a ggplot object, we can change that y-label if we really don’t want the explanation.

lagplot + labs(y = "Lag 1 flow difference")

Netcdfs

Briefly, the read_hydro() function can also read .nc files, but is a bit more constrained as to their format. As a quick demonstration with the data provided with HydroBOT, we can read them in and plot.

nchydros <- read_hydro(system.file("extdata/ncdfexample/nchydros", package = "HydroBOT"),
  long = TRUE, format = "nc"
)

plot_outcomes(nchydros,
  outcome_col = "flow",
  outcome_lab = "Flow (ML/day)",
  x_col = "Date",
  colorset = "gauge",
  color_lab = "Gauge ID:",
  pal_list = "MoMAColors::Warhol",
  facet_row = "scenario",
  facet_col = "gauge",
)