Comparer overview

The Comparer has two components: underlying functions and structure to perform comparisons and other analyses, and plotting capabilities to produce standardised plots that cover the most important data visualisations.

Most importantly for the plots implemented through the plot_outcomes() function, the design philosophy is that an outcome to be plotted should be declared, and then, however that outcome is presented (maps, bars, lines, etc.), the necessary data transformations happen inside plot_outcomes(). This greatly enhances testing and rigour while reducing the errors associated with copy-pasting or inadvertently mis-specifying steps. Moreover, plot_outcomes() is dimensionally aware: it throws warnings when data is being silently overplotted and would therefore produce misleading results, which also alerts the user to forgotten dimensions.

There is quite a lot of flexibility built into all of the Comparer, because different uses and different questions will require different outputs, whether that means different scales of analysis, different types of plots, or different numerical comparisons. Not only will these differ within projects, but the act of finding an ‘ideal’ set of plots for any given project is necessarily iterative, and so the flexibility here gives the user a great deal of control over that process.

While this is called the ‘Comparer’ and most plots use the function plot_outcomes(), it also contains other functionality related to analysis generally, and can produce plots that do not include comparisons or outcomes, e.g. hydrographs to simply illustrate historical flows.

Nearly all plots of outcomes are made with plot_outcomes(), including bars, lines (including timeseries), heatmaps, and maps. This is because, at their foundation, they are all plotting a quantitative outcome with grouping of some sort. The data preparation is the same across all of them, as are many of the arguments to ggplot().

Nearly all plots (with the current exception of the causal networks) are made internally with ggplot2 and return ggplot2 objects, which can easily be further modified. The plot functions here wrap ggplot2 to standardise appearance and data preparation and to ensure dimensions are handled appropriately. Though it can be annoying not to use ggplot() directly to make the plots, one major advantage of the plotting functions here is that any data changes made to prepare the data for a given plot are not preserved. This makes it far easier to keep the data clean, know what the data is, and avoid accidental overwriting or mislabelling of data. Further, the internal data manipulation remains the same whether the outcome is plotted as a y-axis, colour, etc., or the plot type changes.

Note

As should be clear, the intention here is not to define a small set of plots that are made every time, but instead to provide functions that allow a user to safely adjust plots to meet a variety of needs. That said, if a project does mature towards a standard set of plots, these functions could readily be incorporated into a dashboard or Shiny site. There is clear opportunity for reactivity with nearly all plots, allowing a user to select plot types and any filtering (especially for networks, spatial units, etc.), and then produce the plot.

Important

Unlike calls to bare ggplot, data-variable names in arguments should generally be characters (e.g. when specifying the column name for the outcome_col). While the use of data masking in the tidyverse can be incredibly useful for interactive work, it is cumbersome for programming and difficult to make stable. Supporting it has therefore not been a high priority here.
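The following is a minimal sketch of why character column names are convenient when programming. The helper summarise_outcome() and the toy data are hypothetical and are not part of HydroBOT; only the pattern of passing the column name as a string mirrors the outcome_col convention described above. It assumes {dplyr} is installed.

```r
library(dplyr)

# Hypothetical helper (not a HydroBOT function). The outcome and grouping
# columns arrive as character strings, so they can be stored in variables,
# looped over, or read from a configuration file.
summarise_outcome <- function(data, outcome_col, group_col) {
  data |>
    group_by(across(all_of(group_col))) |>
    summarise(mean_outcome = mean(.data[[outcome_col]]), .groups = "drop")
}

# Toy data with made-up scenario names and values
toy <- data.frame(
  scenario = c("A1", "A1", "E1", "E1"),
  ewr_achieved = c(0.2, 0.4, 0.6, 0.8)
)

result <- summarise_outcome(toy,
  outcome_col = "ewr_achieved",
  group_col = "scenario"
)
result
```

Because the column names are ordinary strings, the same call works unchanged inside loops or other functions, which is harder to make stable with bare (data-masked) column names.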

Comparisons

Nearly all analyses will be comparisons of some kind, and so the Comparer provides the capacity to produce plots comparing scenarios, time, space, and themes, as well as aggregation sequences. It can make these comparisons visually (e.g. plotting raw values next to each other), but can also calculate comparison values and plot those. There are a few primary sorts of plots, all of which focus on plotting quantitative outcomes.

Finally, we provide an example using these approaches to compare aggregation choices.

Standardization

The plot_outcomes() function allows the user to specify theming and colour controls to maintain a standard look and calculation structure. More generally, {HydroBOT} provides some of this theming functionality with theme_hydrobot() and the ability to generate and manipulate colour palettes. These are described below. It is perhaps most important to know that these functions are also exposed to the user, and so these themes and standard approaches to appearance and baselining can be used for one-off ad-hoc plots as well. In any particular project, a user should set up relevant, consistent, and standard colour palettes, which can be used throughout and passed to plot_outcomes().

While plot_outcomes() automates much of the processing, keeps the environment clean, and reduces errors in data management and plotting, we will sometimes just want to throw together a quick ggplot call, where the theming and colour control will still come in handy. An intermediate approach is to call plot_prep(), which automates much of the colouring and data manipulation (including applying baseline_compare()), and then make ad-hoc plots with the resulting dataframe.

Theme

HydroBOT provides the theme_hydrobot() ggplot theme that we use to get a consistent look, but other themes can always be used post-hoc. Additional theme arguments can be passed to it if we want to change any of the other arguments in ggplot2::theme() on a per-plot basis. By default, theme_hydrobot() is applied when making the plots inside plot_outcomes(), though it can be applied to any ggplot object.
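As an illustration of the general pattern, the sketch below builds a small theme wrapper that accepts extra ggplot2::theme() arguments and adds it to a plot. The function theme_project() is a hypothetical stand-in written for this example; it is not theme_hydrobot() and makes no claim about how that function is implemented. It assumes {ggplot2} is installed.

```r
library(ggplot2)

# Hypothetical project theme (NOT theme_hydrobot()): a base theme plus some
# fixed tweaks, with any further theme() arguments passed through `...`
theme_project <- function(...) {
  theme_bw() +
    theme(strip.background = element_blank(), ...)
}

# The wrapper is added to a ggplot object like any other theme, and extra
# theme() arguments can be adjusted on a per-plot basis
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme_project(legend.position = "bottom")
```

Since the result is an ordinary ggplot object, a different theme can still be added afterwards to override the look.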

Colour

HydroBOT does not enforce a standard set of colours. Instead, it provides the tools a user needs to achieve colour standardisation for a particular project or plot type, because colour sets will change between scenarios and projects and there are too many possibilities for what we might plot. It is generally good practice to enforce palettes within projects. In general, colours can either be specified manually (usually with the help of make_pal() to generate named colour objects) or with {paletteer} palettes, which offer a wide range of options with a standard interface and the ability to choose palettes by name. A good reference for the available palettes is here, and demonstrations of colour specification are throughout the examples, particularly in the bar plots.
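To make the idea of a named colour object concrete, here is a base-R sketch of a project-level palette keyed by scenario name. It deliberately does not call make_pal() or {paletteer}; the scenario names are made up, and grDevices::hcl.colors() is used only as a stand-in source of colours.

```r
# Hypothetical scenario levels; a real project would take these from its data
scenario_levels <- c("A1", "E1", "I1")

# A named character vector of colours: names are the levels, values the colours
scenario_pal <- stats::setNames(
  grDevices::hcl.colors(length(scenario_levels), palette = "Viridis"),
  scenario_levels
)

# Colours are looked up by name, so a plot that shows only a subset of
# scenarios still uses the same colour for each one
subset_pal <- scenario_pal[c("I1", "A1")]
```

A vector like this can be handed to ggplot2::scale_fill_manual(values = scenario_pal) in ad-hoc plots, which keeps each scenario's colour stable across figures.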

In some cases, we can set multiple levels of colours based on different palettes, which can be a useful way to indicate grouping variables. This is available everywhere, but is best demonstrated in the bar plots and causal plots. Though it achieves a different purpose, there is also the ability to set separate colour palettes for different spatial scales in the same map.

Tip

Despite being written primarily for Australian uses, the functions here use the American ‘color’ spelling for arguments. Where functions (e.g. plot_data_prep(), as_colors()) return dataframes or other objects indicating colour, those also use American spelling. This is because colour handling and plotting have a number of external dependencies that use and expect American spelling.

Internal calculations and structure

While plots are the typical outputs of the Comparer, it also has a set of useful functions for preparing data, including calculating values relative to a baseline (baseline_compare()) using either the default functions difference and relative or any user-supplied function.
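The idea of baselining can be sketched in base R as follows. This is an illustrative stand-in, not HydroBOT's baseline_compare(): the helper compare_to_baseline(), its arguments, and the toy data are all hypothetical, and the two comparison functions are simply defined here in the spirit of the defaults named above.

```r
# Simple comparison functions: value minus baseline, and value over baseline
difference <- function(x, y) x - y
relative <- function(x, y) x / y

# Hypothetical helper (NOT baseline_compare()): join each row to the baseline
# scenario's value for the same group, then apply the comparison function
compare_to_baseline <- function(data, value_col, scenario_col, group_col,
                                base_level, FUN = difference) {
  base <- data[data[[scenario_col]] == base_level, c(group_col, value_col)]
  names(base)[names(base) == value_col] <- "baseline_value"
  out <- merge(data, base, by = group_col)
  out$compared <- FUN(out[[value_col]], out$baseline_value)
  out
}

# Toy data: two made-up scenarios at two made-up gauges
toy <- data.frame(
  scenario = rep(c("E1", "I1"), each = 2),
  gauge = rep(c("g1", "g2"), times = 2),
  outcome = c(10, 20, 15, 40)
)

rel <- compare_to_baseline(toy, "outcome", "scenario", "gauge",
  base_level = "E1", FUN = relative
)
rel
```

Any function of two vectors can be supplied as FUN, which is the same flexibility the text describes for user-supplied comparison functions.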

There is an internal function plot_prep() that does all the data prep, including applying baseline_compare(), finding colours, and setting established scenario orders. This keeps plots and the data processing consistent, and dramatically reduces the error-prone copy-pasting of data processing with minor changes for different plots. Instead, we can almost always feed the plotting functions the same set of clean data straight out of the aggregator, and just change the arguments to the plot functions.

Baselining is available as a standalone function (baseline_compare()) and can be done automatically in the plot_outcomes() (and plot_prep()) functions. This capacity is demonstrated in all the plot examples, but in most detail in the hydrographs.

One critical issue, particularly with complex data, is being unaware of silently overplotted values. The plot_outcomes() function has internal checks that the number of rows of data matches the number of unique combinations of the dimensions on which the data is plotted (axes, facets, colours, linetype, etc.). This prevents things like plotting a map of env_obj data facetted only by scenario, where each fill would represent outcomes for all env_obj, which is meaningless but very easy to do. The exception is that points are allowed to overplot, though we can use the position = 'position_jitter' argument to avoid that, as is typical with ggplot().
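A rough base-R sketch of this kind of check is shown below. It is not the check used inside plot_outcomes(); the function check_overplot() and the toy data are hypothetical, and the logic is only the simplest version of the idea: if the data has more rows than unique combinations of the plotted dimensions, some values must land on top of each other.

```r
# Hypothetical check (NOT the one inside plot_outcomes())
check_overplot <- function(data, plot_dims) {
  n_combos <- nrow(unique(data[, plot_dims, drop = FALSE]))
  if (n_combos < nrow(data)) {
    warning(
      nrow(data), " rows but only ", n_combos,
      " unique combinations of: ", paste(plot_dims, collapse = ", "),
      ". Some values would be overplotted.",
      call. = FALSE
    )
    return(invisible(FALSE))
  }
  invisible(TRUE)
}

# Toy data: every combination of scenario, polygon, and env_obj (made-up names)
toy <- expand.grid(
  scenario = c("A1", "E1"),
  polygon = c("p1", "p2"),
  env_obj = c("EF1", "NF1"),
  stringsAsFactors = FALSE
)
toy$outcome <- seq_len(nrow(toy))

# All dimensions accounted for: passes silently
ok_all_dims <- check_overplot(toy, c("scenario", "polygon", "env_obj"))

# env_obj forgotten: each scenario-polygon cell holds two rows, so this warns
# (the warning is suppressed here only to keep the example output quiet)
missing_dim <- suppressWarnings(check_overplot(toy, c("scenario", "polygon")))
```

The second call mirrors the map example above, where faceting only by scenario would leave env_obj unaccounted for.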

Scenario information

The ‘scenarios’ used here for examples are a factorial combination of multiplicative and additive changes to flow, based on historical hydrographs. We use a more complex set for the Comparer examples than for the Controller and Aggregator in order to have something more interesting to plot.

In an ideal world, scenario metadata would be auto-acquired from the directory defining the hydrographs. In practice, that’s rarely available, but we can do it here for the example scenarios.

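# Directory holding the example scenarios and their hydrographs
project_dir <- file.path("more_scenarios")
hydro_dir <- file.path(project_dir, "hydrographs")

# Read the scenario metadata stored alongside the hydrographs
scenarios <- yaml::read_yaml(file.path(hydro_dir, "scenario_metadata.yml")) |>
  tibble::as_tibble()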

To scale flow, we apply nine flow multipliers, ranging from 0.5 to 2.0, to the historical hydrographs (Table 1). We refer to these as ‘climate’ scenarios, reflecting a common representation where entire hydrographs might shift to represent climate change. To achieve pulsed change for each of the ‘climate’ scenarios, we apply four flow additions: 1) no addition (baseline), 2) addition of 250 ML/d, 3) addition of 6500 ML/d, and 4) addition of 12000 ML/d (Table 1). These additional flows are added throughout the period of September to December. We refer to these scenarios as ‘climate adaptations’ because management options are often available in the form of altering water availability for short time periods through mechanisms like water releases, though the options here do not represent proposed actions. These scenarios should not be interpreted as potential climate impacts or adaptations, but instead as different ways flows might change (multiplicative or additive) and different magnitudes of change.

adapt_scenes <- scenarios |>
  dplyr::filter(scenario != "MAX") |>
  dplyr::mutate(flow_addition = as.integer(flow_addition)) |>
  dplyr::select(
    `Adaptation code` = adapt_code,
    `Flow addition (ML/d)` = flow_addition
  ) |>
  dplyr::distinct()

climate_scenes <- scenarios |>
  dplyr::filter(scenario != "MAX") |>
  dplyr::select(
    `Climate code` = climate_code,
    `Flow multiplier` = flow_multiplier
  ) |>
  dplyr::distinct()

# Pad the adaptation table with NA rows so it has as many rows as the climate
# table and the two can be bound side by side
n_pad <- nrow(climate_scenes) - nrow(adapt_scenes)
adapt_scenes <- adapt_scenes |>
  dplyr::bind_rows(tibble::tibble(
    `Adaptation code` = rep(NA, n_pad),
    `Flow addition (ML/d)` = rep(NA, n_pad)
  ))

# Combine the two tables and format the result as Table 1
climate_scenes |>
  dplyr::mutate(`Flow multiplier` = signif(`Flow multiplier`, 2)) |>
  dplyr::bind_cols(adapt_scenes) |>
  flextable::flextable() |>
  flextable::font(fontname = "Calibri") |>
  flextable::fontsize(size = 10, part = "all") |>
  flextable::set_table_properties(layout = "autofit", width = 1) |>
  flextable::vline(j = 2)
Table 1: Demonstration scenarios are a factorial combination of ‘climate’ (scaled flow) and ‘adaptation’ (pulsed additions).

Climate code   Flow multiplier   Adaptation code   Flow addition (ML/d)
A              0.50              1                 0
B              0.67              2                 250
C              0.80              3                 6,500
D              0.91              4                 12,000
E              1.00
F              1.10
G              1.20
H              1.50
I              2.00
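
The factorial structure in Table 1 can be sketched in base R as below. The values are copied from the table (where the multipliers are rounded to two significant figures), so this is an approximation of the real metadata rather than a replacement for it. The combined scenario label, the helper apply_scenario(), and the choice to scale first and then add the pulse are assumptions made for illustration only.

```r
# Values as shown in Table 1 (multipliers there are rounded)
climate <- data.frame(
  climate_code = LETTERS[1:9],
  flow_multiplier = c(0.50, 0.67, 0.80, 0.91, 1.00, 1.10, 1.20, 1.50, 2.00)
)
adapt <- data.frame(
  adapt_code = 1:4,
  flow_addition = c(0, 250, 6500, 12000)
)

# Factorial combination: every climate scenario crossed with every adaptation
scenario_grid <- merge(climate, adapt, by = NULL)

# Hypothetical label combining the two codes (an assumption for this sketch)
scenario_grid$label <- paste0(
  scenario_grid$climate_code,
  scenario_grid$adapt_code
)

# Hypothetical helper: scale the whole hydrograph, then add the pulse only in
# September to December. The order of operations is assumed, not confirmed.
apply_scenario <- function(flow, month, multiplier, addition) {
  scaled <- flow * multiplier
  scaled + ifelse(month %in% 9:12, addition, 0)
}

# Toy flows (ML/d) for days in August, September, and December
toy_result <- apply_scenario(
  flow = c(100, 100, 100),
  month = c(8, 9, 12),
  multiplier = 2,
  addition = 250
)
```

With nine multipliers and four additions, the grid has 36 rows, one per scenario in the factorial design.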