Bar plots

library(HydroBOT)
library(ggplot2)
library(dplyr)
library(sf)

Overview

This notebook provides examples of creating bar plots, i.e. plots with a quantitative y-axis for the outcome and a qualitative x-axis. The x-axis is often, but not always, the scenarios. We also demonstrate using colour, and different colour palettes, to include additional information such as spatial unit or type of response.

For a quantitative x-axis, we would typically use line plots.

Demonstration setup

As usual, we need paths to the data. We use the ‘more scenarios’ examples for all the plots, with processing as in the website workflow.

project_dir <- file.path("more_scenarios")
hydro_dir <- file.path(project_dir, "hydrographs")
agg_dir <- file.path(project_dir, "aggregator_output")

Read in the data

We read in the example data we will use for all plots.

agged_data <- readRDS(file.path(agg_dir, "achievement_aggregated.rds"))

That has all the steps in the aggregation, but most of the plots here will only use a subset to demonstrate.

To make visualisation easier, the SDL units data is given a grouping column that puts the many env_obj variables in groups defined by their first two letters, e.g. EF for Ecosystem Function. These correspond to the ‘Target’ level, but it can be useful to have the two groupings together for some examples.

If we had used multiple aggregation functions at any step, we should filter down to the one we want here, but we only used one for this example.

For simplicity, we will only look at a small selection of the scenarios (multiplicative changes of 0.5, 1, and 2). Thus, we make two small dataframes for our primary examples.

scenarios_to_plot <- c("climatedown2adapt0", "climatebaseadapt0", "climateup2adapt0")

scenarios <- yaml::read_yaml(file.path(hydro_dir, "scenario_metadata.yml")) |>
  tibble::as_tibble()

basin_to_plot <- agged_data$mdb |>
  dplyr::filter(scenario %in% scenarios_to_plot) |>
  dplyr::left_join(scenarios, by = "scenario")

# Create a grouping variable
obj_sdl_to_plot <- agged_data$sdl_units |>
  dplyr::filter(scenario %in% scenarios_to_plot) |>
  dplyr::mutate(env_group = stringr::str_extract(env_obj, "^[A-Z]+")) |>
  dplyr::arrange(env_group, env_obj) |>
  dplyr::left_join(scenarios, by = "scenario")
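
As a quick check of the grouping step, the pattern "^[A-Z]+" keeps only the leading capital letters of each code. The sketch below uses base R rather than stringr, and the codes vector is made up for illustration rather than taken from the data; the regular expression is the same one used above.

```r
# Hypothetical env_obj-style codes, for illustration only
codes <- c("EF1", "EF2", "NV4a", "WB2")

# Extract the leading run of capital letters from each code.
# Note: regmatches() drops non-matching elements, whereas
# stringr::str_extract() would return NA for them.
env_group <- regmatches(codes, regexpr("^[A-Z]+", codes))

env_group
```

Codes with a lowercase suffix, such as "NV4a", still fall in the two-letter group because the match stops at the first character that is not a capital letter.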

Standard scenario appearance

We will typically have a consistent look for the scenarios across the project, with a logical ordering and standard colours. Such standard colours are not included in the {HydroBOT} package because they are project- or analysis-specific, but they could be set at the project level, e.g. in the .Rprofile, if desired.

Here, we use the special arguments refvals and refcols to make a colour palette from a standard {paletteer} option (“ggsci::nrc_npg”) while setting a specific level to a specified colour. We will use the codes (see scenario definitions) rather than the names to make plots readable.

There is a sceneorder argument to plot_outcomes() that lets us explicitly set the order of the scenarios. It is typically easiest to simply make the scenario column a factor, though we use the sceneorder argument here. The argument operates only on a column named ‘scenario’, so any other columns that need ordering should be made factors before being passed to plot_outcomes().

sceneorder <- forcats::fct_reorder(
  basin_to_plot$scenario,
  basin_to_plot$flow_multiplier
)
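
To see what this ordering produces, here is a minimal base R sketch of the same idea: a factor whose levels are sorted by a numeric column. The scenario names and multipliers are hypothetical, and base factor() is used here as a stand-in for forcats::fct_reorder().

```r
# Hypothetical scenario names and multipliers, for illustration only
toy <- data.frame(
  scenario = c("up2", "base", "down2"),
  flow_multiplier = c(2, 1, 0.5)
)

# Order the factor levels by the numeric multiplier
toy_order <- factor(
  toy$scenario,
  levels = toy$scenario[order(toy$flow_multiplier)]
)

# The values are unchanged; only the level order (and so plot order) differs
levels(toy_order)
```

The values themselves are untouched, so the factor can be used wherever the original character column was.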

scene_pal <- make_pal(unique(basin_to_plot$climate_code),
  palette = "ggsci::nrc_npg",
  refvals = "E", refcols = "black"
)

scene_pal

<colors>
black #E64B35FF #4DBBD5FF
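
Conceptually, the result is a named set of colours in which the reference level is pinned to a fixed colour and the remaining levels draw from the palette. The base R sketch below only mimics that idea. It is not how make_pal() is implemented, and the level names and the use of grDevices::hcl.colors() are assumptions for illustration.

```r
# Hypothetical scenario codes; "E" plays the role of the reference level
lvls <- c("A", "E", "I")
refval <- "E"
refcol <- "black"

# Draw palette colours only for the non-reference levels
nonref <- setdiff(lvls, refval)
pal <- stats::setNames(grDevices::hcl.colors(length(nonref), "Viridis"), nonref)

# Pin the reference level to its fixed colour and restore the original order
pal <- c(pal, stats::setNames(refcol, refval))[lvls]

pal
```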

Make bar plots

Scenario fills

Basin scale

We can make plots looking at how scenarios differ for each of the outcome categories in the simple case of a single outcome. Because there is only one facetting variable, we use facet_wrapper to wrap its panels.

The colorset argument is the column that determines colour, while the pal_list defines those colours, here as a named colors object, but as we see below it can also be palette names.

plot_outcomes(basin_to_plot,
  outcome_col = "ewr_achieved",
  x_col = "climate_code",
  facet_wrapper = "Target",
  colorset = "climate_code",
  pal_list = scene_pal,
  sceneorder = sceneorder
)

HydroBOT retains the axis names as-is from the incoming dataframe, as they provide the true meaning of each value. But we can change them, either inside the plot_outcomes() function (here for y) or post-hoc with ggplot2::labs() (here for x). We can also set the sceneorder with a character vector if that’s easier than setting up a factor or if we want to change the order for some reason. Because the outputs of plot_outcomes() are just ggplot objects, changing the labels outside the function can be very useful: we can first check that each axis is in fact what we think it is, and only then give it clean labels.

plot_outcomes(basin_to_plot,
  outcome_col = "ewr_achieved",
  y_lab = "Proportion Objectives\nAchieved",
  x_col = "climate_code",
  color_lab = "Scenario",
  facet_wrapper = "Target",
  colorset = "climate_code",
  pal_list = scene_pal,
  sceneorder = c("climateup2adapt0", "climatebaseadapt0", "climatedown2adapt0")
) +
  labs(x = "Scenario")
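
The check-then-relabel pattern is ordinary ggplot2 behaviour and can be sketched without HydroBOT. The data below is hypothetical, and plain geom_col() stands in for plot_outcomes().

```r
library(ggplot2)

# Hypothetical data, for illustration only
toy <- data.frame(
  climate_code = c("A", "E", "I"),
  ewr_achieved = c(0.2, 0.5, 0.7)
)

# Default axis titles are the column names, which helps confirm what is plotted
p <- ggplot(toy, aes(x = climate_code, y = ewr_achieved)) +
  geom_col()

# Once checked, replace them with readable labels
p_clean <- p + labs(x = "Scenario", y = "Proportion Objectives\nAchieved")
```

Keeping p and p_clean as separate objects makes it easy to compare the raw and relabelled versions.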

Another approach is to put other groupings on the x-axis, and colour by scenario.

plot_outcomes(basin_to_plot,
  outcome_col = "ewr_achieved",
  x_col = "Target",
  colorset = "climate_code",
  pal_list = scene_pal,
  sceneorder = sceneorder
)

We can pass position = 'dodge' to use dodged bars for clearer comparisons, particularly accentuating the variation in sensitivity of the different outcomes to the scenarios.

plot_outcomes(basin_to_plot,
  outcome_col = "ewr_achieved",
  x_col = "Target",
  colorset = "climate_code",
  pal_list = scene_pal,
  sceneorder = sceneorder,
  position = "dodge"
)
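
The difference between stacked and dodged bars can be illustrated in plain ggplot2. This sketch uses hypothetical data and geom_col() directly, as an analogue of the position argument rather than a description of how plot_outcomes() passes it on.

```r
library(ggplot2)

# Hypothetical outcomes for two targets under two scenarios
toy <- data.frame(
  Target = rep(c("Fish", "Birds"), each = 2),
  climate_code = rep(c("A", "E"), times = 2),
  ewr_achieved = c(0.2, 0.4, 0.3, 0.6)
)

base_plot <- ggplot(toy, aes(x = Target, y = ewr_achieved, fill = climate_code))

stacked <- base_plot + geom_col(position = "stack")
dodged <- base_plot + geom_col(position = "dodge")

# Stacked bar tops are sums within each Target; dodged bars keep individual values
stacked_top <- max(layer_data(stacked)$ymax)
dodged_top <- max(layer_data(dodged)$ymax)
```

With stacking, the bar height reflects the total across scenarios, so differences between scenarios are read from segment sizes. With dodging, each scenario has its own bar and can be compared directly.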

SDL units

We can use the aggregation step of env_obj and SDL units to demonstrate plotting that not only addresses the outcomes for scenarios, but how they differ across space.

First, we look at how the different scenarios perform for the Ecosystem Function objectives in each SDL unit. We also use the ggplot2 functionality to remove the x-axis label, since it is redundant with the colour.

Tip

Often when we have multiple dimensions, we’ll want to do a simple filter to relevant subsets of the data for readability. When this filtering occurs, it is almost always a good idea to do it on the fly (as here), so that each figure starts from the full dataset and we avoid errors from losing track of earlier data manipulations.

obj_sdl_to_plot |>
  filter(grepl("^EF", env_obj)) |>
  plot_outcomes(
    outcome_col = "ewr_achieved",
    facet_col = "env_obj",
    facet_row = "SWSDLName",
    colorset = "climate_code",
    pal_list = scene_pal,
    sceneorder = sceneorder
  ) +
  theme(axis.text.x = element_blank(), axis.title.x = element_blank())

We address a few ways to handle groups of outcome variables. One of the simplest is to facet these plots by those groups, with all the outcomes in the group getting their own bars. This puts the theme levels on x and colours by scenario, with the groups accentuated by facets. These can be stacked (position = 'stack', the default) or dodged (demonstrated here).

dodgefacet <- obj_sdl_to_plot |>
  plot_outcomes(
    outcome_col = "ewr_achieved",
    x_col = "env_obj",
    colorset = "climate_code",
    facet_row = "SWSDLName",
    facet_col = "env_group",
    scales = "free_x",
    pal_list = scene_pal,
    sceneorder = sceneorder,
    position = "dodge"
  )

dodgefacet + theme(legend.position = "bottom") +
  labs(x = "Environmental Objective")

Colours from outcomes

Rather than facetting, we can stack each of the outcome categories (here, Target groups). To do this, we simply change the colorset to “Target”. The default x_col is ‘scenario’, so scenarios remain on the x-axis here, though it is generally better to specify x_col explicitly, particularly when a column such as ‘climate_code’ gives a better description. We also change the pal_list to a {paletteer} name, providing the palette from which to choose colours for each Target. If we wanted to retain these colours across the project, we would define a palette for the Targets as we did above for scenarios.

This yields a more compact plot (Figure 1) that shows overall outcomes without as much duplication. The stacking here of the env_obj outcomes in each group while colouring them all the same is itself a sort of simple aggregation. The approach next with position = "dodge" is generally better (Figure 2).

plot_outcomes(basin_to_plot,
  outcome_col = "ewr_achieved",
  colorset = "Target",
  pal_list = list("scico::tokyo"),
  sceneorder = sceneorder
) +
  guides(fill = guide_legend(ncol = 2)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5))

In another example of colours from other information in the data, when we have multiple spatial units we might colour by them instead of colouring by outcome category. Here, we show how to colour by SDL unit (SWSDLName) instead of env_obj. We also explicitly use x_col to use climate code and not the full names.

plot_outcomes(obj_sdl_to_plot,
  outcome_col = "ewr_achieved",
  x_col = "climate_code",
  colorset = "SWSDLName",
  pal_list = list("ggsci::default_jama"),
  sceneorder = sceneorder
)

That shows that while all SDL units are affected by the changes in the flow, the Lachlan is less sensitive.

We can also use position = 'dodge' to have side-by-side bars instead of stacked (Figure 3). Note that in this case, where we’re colouring by SDL unit but there are many env_obj values, those env_obj values no longer stack, and so we have to manually stack them by calculating their sum. This would not be the case if we were colouring by individual rows (env_obj); see examples of that below.

obj_sdl_to_plot |>
  group_by(SWSDLName, scenario) |>
  summarise(ewr_achieved = sum(ewr_achieved, na.rm = TRUE)) |>
  ungroup() |>
  plot_outcomes(
    outcome_col = "ewr_achieved",
    colorset = "SWSDLName",
    pal_list = list("ggsci::default_jama"),
    sceneorder = sceneorder,
    position = "dodge"
  )
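
The manual stacking step is just a grouped sum. A minimal base R sketch with hypothetical values shows what the summarise() call above produces; aggregate() is used here only as a dependency-free analogue of group_by() and summarise().

```r
# Hypothetical values, for illustration only
toy <- data.frame(
  SWSDLName = c("Unit 1", "Unit 1", "Unit 2", "Unit 2"),
  scenario = c("base", "base", "base", "base"),
  env_obj = c("EF1", "NV1", "EF1", "NV1"),
  ewr_achieved = c(0.5, 0.25, 1, 0.5)
)

# One row per SDL unit and scenario: the height a stacked bar would have had
summed <- aggregate(ewr_achieved ~ SWSDLName + scenario, data = toy, FUN = sum)

summed
```

Each resulting row is the total a stacked bar would have reached, so the dodged bars remain comparable to the stacked version.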

If we have multiple levels of groupings, we can colour by the groups directly if we don’t care what the individual env_objs are doing between them. This is very similar to the plots colouring by SDL unit above.

plot_outcomes(obj_sdl_to_plot,
  outcome_col = "ewr_achieved",
  x_col = "climate_code",
  colorset = "env_group",
  pal_list = list("scico::berlin"),
  facet_col = "SWSDLName",
  sceneorder = sceneorder
) +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5))

Grouped colours

HydroBOT can assign different colour palettes to different sets of outcomes, yielding what is essentially another axis on which we can plot information. We use this same ability across a number of plot types, particularly causal networks. For example, we might categorise the env_obj outcomes into the larger scale groups (e.g. ‘NF’, ‘EF’, etc). We can then assign each of these a separate palette, and so the individual env_objs get different colours chosen from different palettes.

Achieving this requires specifying two columns. The colorset, as above, is the column that determines colour. The colorgroups column specifies the groupings of those colorset values, and so which palette to use. Thus, the pal_list needs to be either length 1 (everything gets the same palette) or have one entry per colorgroup, i.e. length(unique(data$colorgroups)). Note also that each colorset value must belong to exactly one colorgroup. A colorset value cannot appear in several colorgroups, because it must get its colour from the single palette defined by the colorgroup it is in.
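
These two constraints are easy to check before plotting. The base R sketch below is one possible pre-check on hypothetical data; it is not part of HydroBOT, and the object names are made up for illustration.

```r
# Hypothetical colorset (env_obj) and colorgroups (env_group) columns
toy <- data.frame(
  env_obj = c("EF1", "EF2", "NV1", "EF1"),
  env_group = c("EF", "EF", "NV", "EF")
)
pals <- list(EF = "grDevices::Purp", NV = "grDevices::Burg")

# Each colorset value must belong to exactly one colorgroup
groups_per_obj <- tapply(toy$env_group, toy$env_obj, function(g) length(unique(g)))
mapping_ok <- all(groups_per_obj == 1)

# The palette list is either length 1 or has one palette per colorgroup
n_groups <- length(unique(toy$env_group))
length_ok <- length(pals) %in% c(1, n_groups)

c(mapping_ok = mapping_ok, length_ok = length_ok)
```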

We demonstrate with env_obj variables mapped to larger environmental groups, making it easier to see at a glance the sorts of environmental objectives that are more or less affected, while also allowing views of the individual environmental objectives. Here we use facet_col and facet_row to ensure the SDL units don’t wrap around. We made the env_group column when we chose the data initially.

# Create a palette list
env_pals <- list(
  EF = "grDevices::Purp",
  NF = "grDevices::Mint",
  NV = "grDevices::Burg",
  OS = "grDevices::Blues",
  WB = "grDevices::Peach"
)

# Facet by SDL unit and pass colorgroups so each group takes its own palette
obj_sdl_to_plot |>
  plot_outcomes(
    outcome_col = "ewr_achieved",
    x_col = "climate_code",
    colorgroups = "env_group",
    colorset = "env_obj",
    pal_list = env_pals,
    facet_col = "SWSDLName",
    facet_row = "."
  ) +
  theme(legend.key.size = unit(0.5, "cm"))

Adding facetting by those groups can make the plot easier to read if the goal is to focus on changes within groups, at the cost of more panels.

obj_sdl_to_plot |>
  plot_outcomes(
    outcome_col = "ewr_achieved",
    x_col = "climate_code",
    colorgroups = "env_group",
    colorset = "env_obj",
    pal_list = env_pals,
    facet_col = "SWSDLName",
    facet_row = "env_group"
  ) +
  theme(legend.key.size = unit(0.5, "cm"))

We could also place those bars side by side instead of stacking them, but that likely makes more sense when there are fewer categories than here. We again use position = 'dodge', but now we don’t need to sum because each row (env_obj) already gets its own bar. Because these are just ggplot objects, we can adjust the legend with theme(), here shrinking the keys. This gets very crowded as scenarios are added, which is one reason the data here is limited to three.

obj_sdl_to_plot |>
  plot_outcomes(
    outcome_col = "ewr_achieved",
    x_col = "climate_code",
    colorgroups = "env_group",
    colorset = "env_obj",
    pal_list = env_pals,
    facet_col = "SWSDLName",
    facet_row = "env_group",
    position = "dodge"
  ) +
  theme(legend.key.size = unit(0.5, "cm"))

Another approach to groups of outcomes, without the colours explicitly grouped, is to not use colorgroups, but instead just facet by the group and give every colorset value a colour from the same palette. Depending on the palette chosen and the breaks, this can be quicker, but will not accentuate groups as well.

obj_sdl_to_plot |>
  plot_outcomes(
    outcome_col = "ewr_achieved",
    colorgroups = NULL,
    colorset = "env_obj",
    pal_list = list("scico::berlin"),
    facet_row = "SWSDLName",
    facet_col = "env_group",
    scales = "free_x"
  ) +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5))

These plots are interesting, but in typical use, the plots above using facets for the groups or colouring by the groups themselves are likely to be easier to read, unless we really are interested in this level of granularity. Whatever approach we choose for a given plot, accentuating the differences between outcome groups can be a powerful interpretation tool.

Manual colour definition

Though the above examples using {paletteer} palettes are the easiest way to specify colouring, we don’t have to let the palettes auto-choose colours, and can instead pass colors objects, just as we do for scenarios. This can be particularly useful when we have a small number of groups and want to control which colour goes with which group (defining too many colours by hand is cumbersome; that’s what palettes are for). Just as with scenarios, we use make_pal(). Here, we use 'scico::lisbon' as the base, but define several ‘reference’ values manually. This demonstration uses includeRef = TRUE, so the palette is chosen for all levels and the refs then replace the corresponding palette values, rather than the palette being chosen only for the levels that remain after removing the refs. This tends to yield a better spread of colours and, with returnUnref = TRUE, lets us sometimes use the reference colours and sometimes not. For example, maybe we want to sometimes really accentuate ecosystem function and native vegetation, but not in all plots.

First, we create the palettes with and without the (garish) ref values.

obj_pal <- make_pal(
  levels = unique(obj_sdl_to_plot$env_group),
  palette = "scico::lisbon",
  refvals = c("EF", "NV"), refcols = c("purple", "orange"), includeRef = TRUE, returnUnref = TRUE
)
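
The idea of keeping two versions of one palette can be sketched in base R. This is only an illustration of the replace-after-choosing behaviour described above, not the make_pal() implementation; grDevices::hcl.colors() and the "Viridis" palette are stand-ins chosen for this example.

```r
# Group codes as used for env_group; reference colours as in the call above
lvls <- c("EF", "NF", "NV", "OS", "WB")
refs <- c(EF = "purple", NV = "orange")

# Assign a palette colour to every level first (the unaccentuated version)
unref <- stats::setNames(grDevices::hcl.colors(length(lvls), "Viridis"), lvls)

# Then overwrite only the reference levels (the accentuated version)
withref <- unref
withref[names(refs)] <- refs

# Non-reference levels keep identical colours in both versions
same_nonref <- identical(
  withref[c("NF", "OS", "WB")],
  unref[c("NF", "OS", "WB")]
)
same_nonref
```

Because the non-reference colours match across the two versions, plots made with either one stay visually consistent for the groups that are not being highlighted.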

Then we can create an accentuated plot sometimes, if, perhaps, we want to highlight how EF performed.

plot_outcomes(obj_sdl_to_plot,
  outcome_col = "ewr_achieved",
  x_col = "climate_code",
  colorset = "env_group",
  pal_list = obj_pal$refcols,
  facet_col = "SWSDLName",
  facet_row = ".",
  sceneorder = sceneorder
) +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5))

But for other plots maybe we don’t want that accentuation, and we can use the unrefcols to retain the standard colouring. Note that the ‘NF’, ‘OS’, and ‘WB’ colours are unchanged.

plot_outcomes(obj_sdl_to_plot,
  outcome_col = "ewr_achieved",
  x_col = "climate_code",
  colorset = "env_group",
  pal_list = obj_pal$unrefcols,
  facet_col = "SWSDLName",
  facet_row = ".",
  sceneorder = sceneorder
) +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5))