Bar plots
Overview
This notebook provides examples of creating bar plots, i.e. plots with a quantitative y-axis for the outcome and a qualitative x-axis. The x-axis is often, but not always, the scenarios. We also demonstrate how to use colour and different colour palettes to include additional information, such as spatial unit or type of response.
For a quantitative x-axis, we would typically use line plots.
Demonstration setup
As usual, we need paths to the data. We use the ‘more scenarios’ examples for all the plots, with processing as in the website workflow.
Read in the data
We read in the example data we will use for all plots.
That has all the steps in the aggregation, but most of the plots here will only use a subset to demonstrate.
To make visualisation easier, the SDL units data is given a grouping column that puts the many env_obj variables in groups defined by their first two letters, e.g. EF for Ecosystem Function. These correspond to the ‘Target’ level, but it can be useful to have the two groupings together for some examples.
If we had used multiple aggregation functions at any step, we should filter down to the one we want here, but we only used one for this example.
For simplicity, we will only look at a small selection of the scenarios (multiplicative changes of 0.5, 1, and 2). Thus, we make two small dataframes for our primary examples.
scenarios_to_plot <- c("climatedown2adapt0", "climatebaseadapt0", "climateup2adapt0")
scenarios <- yaml::read_yaml(file.path(hydro_dir, "scenario_metadata.yml")) |>
tibble::as_tibble()
basin_to_plot <- agged_data$mdb |>
dplyr::filter(scenario %in% scenarios_to_plot) |>
dplyr::left_join(scenarios, by = "scenario")
# Create a grouping variable
obj_sdl_to_plot <- agged_data$sdl_units |>
dplyr::filter(scenario %in% scenarios_to_plot) |>
dplyr::mutate(env_group = stringr::str_extract(env_obj, "^[A-Z]+")) |>
dplyr::arrange(env_group, env_obj) |>
dplyr::left_join(scenarios, by = "scenario")
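To make the grouping step concrete, here is a minimal base-R sketch of what the regular expression "^[A-Z]+" extracts. The env_obj codes are made up for illustration and are not read from the HydroBOT data. The base function regmatches() stands in for stringr::str_extract(); the two give the same result only when every value matches the pattern, as is the case here.

```r
# Hypothetical env_obj codes, for illustration only
env_obj <- c("EF1", "EF2", "NF3", "NV4a", "WB2")

# Keep the leading run of capital letters from each code
env_group <- regmatches(env_obj, regexpr("^[A-Z]+", env_obj))

toy <- data.frame(env_obj = env_obj, env_group = env_group)
print(toy)
```

Because the pattern takes every leading capital letter rather than a fixed two characters, a code with a longer prefix would form its own group.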
Standard scenario appearance
We will typically have a consistent look for the scenarios across the project, with a logical ordering and standard colours. Such standard colours are not included in the {HydroBOT} package because they are project- or analysis-specific, but they could be set at project level, e.g. in the .Rprofile, if desired.
Here, we use the special arguments refvals and refcols to make a colour palette from a standard {paletteer} option (“ggsci::nrc_npg”) while setting a specific level to a specified colour (here, “E” to black). We will use the codes (see scenario definitions) rather than the names to make plots readable.
There is a sceneorder argument to plot_outcomes() that lets us explicitly set the order of the scenarios. It is typically easiest to simply make the scenarios a factor, though we use the sceneorder argument here. It operates only on a column named ‘scenario’, so if other columns need to be ordered they should be made factors before being passed to plot_outcomes().
sceneorder <- forcats::fct_reorder(
basin_to_plot$scenario,
basin_to_plot$flow_multiplier
)
scene_pal <- make_pal(unique(basin_to_plot$climate_code),
palette = "ggsci::nrc_npg",
refvals = "E", refcols = "black"
)
scene_pal
<colors>
black #E64B35FF #4DBBD5FF
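The ordering logic can be checked on a toy vector. The sketch below uses base R's stats::reorder() with the median, which is also the default summary function in forcats::fct_reorder(). The data frame is a made-up stand-in for basin_to_plot, and the multipliers are assumed to match the scenario names.

```r
# Toy stand-in for basin_to_plot: several rows per scenario (values assumed)
toy <- data.frame(
  scenario = rep(
    c("climateup2adapt0", "climatedown2adapt0", "climatebaseadapt0"),
    each = 2
  ),
  flow_multiplier = rep(c(2, 0.5, 1), each = 2)
)

# Order the scenario levels by the median multiplier within each scenario
toy_order <- stats::reorder(
  factor(toy$scenario),
  toy$flow_multiplier,
  FUN = median
)

levels(toy_order)
```

Without the reordering, the factor levels would be alphabetical, which puts the baseline scenario first rather than in the middle.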
Make bar plots
Scenario fills
Basin scale
We can make plots looking at how scenarios differ for each of the outcome categories for a simple case of only one outcome. This uses facet_wrapper to just wrap the single facet axis.
The colorset argument is the column that determines colour, while the pal_list defines those colours, here as a named colors object, but as we see below it can also be palette names.
plot_outcomes(basin_to_plot,
outcome_col = "ewr_achieved",
x_col = "climate_code",
facet_wrapper = "Target",
colorset = "climate_code",
pal_list = scene_pal,
sceneorder = sceneorder
)
HydroBOT retains the axis names as-is from the incoming dataframe, as they provide the true meaning of each value. But we can change them, either inside the plot_outcomes() function (here for y and the colour legend) or post-hoc with ggplot2::labs() (here for x). We can also set the sceneorder with a character vector if that’s easier than setting up a factor or if we want to change the order for some reason. Because the outputs of plot_outcomes() are just ggplot objects, changing the labels outside the function can be very useful for checking that each axis is in fact what we think it is before giving it clean labels.
plot_outcomes(basin_to_plot,
outcome_col = "ewr_achieved",
y_lab = "Proportion Objectives\nAchieved",
x_col = "climate_code",
color_lab = "Scenario",
facet_wrapper = "Target",
colorset = "climate_code",
pal_list = scene_pal,
sceneorder = c("climateup2adapt0", "climatebaseadapt0", "climatedown2adapt0")
) +
labs(x = "Scenario")
Another approach is to put other groupings on the x-axis, and colour by scenario.
plot_outcomes(basin_to_plot,
outcome_col = "ewr_achieved",
x_col = "Target",
colorset = "climate_code",
pal_list = scene_pal,
sceneorder = sceneorder
)
We can pass position = 'dodge' to use dodged bars for clearer comparisons, particularly accentuating the variation in sensitivity of the different outcomes to the scenarios.
plot_outcomes(basin_to_plot,
outcome_col = "ewr_achieved",
x_col = "Target",
colorset = "climate_code",
pal_list = scene_pal,
sceneorder = sceneorder,
position = "dodge"
)
SDL units
We can use the aggregation step of env_obj and SDL units to demonstrate plotting that not only addresses the outcomes for scenarios, but how they differ across space.
First, we look at how the different scenarios perform for the Ecosystem Function objectives in each SDL unit. We also use ggplot2 functionality to remove the x-axis label, since it is redundant with the colour.
Often when we have multiple dimensions, we’ll want to do a simple filter to relevant subsets of the data for readability. When this filtering occurs, it is almost always a good idea to do it on the fly (as here), to avoid errors associated with losing track of data manipulations and instead start each figure with the full dataset.
obj_sdl_to_plot |>
filter(grepl("^EF", env_obj)) |>
plot_outcomes(
outcome_col = "ewr_achieved",
facet_col = "env_obj",
facet_row = "SWSDLName",
colorset = "climate_code",
pal_list = scene_pal,
sceneorder = sceneorder
) +
theme(axis.text.x = element_blank(), axis.title.x = element_blank())
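The filter-at-point-of-use pattern can be illustrated with base R alone. In this sketch the data frame is a made-up stand-in for obj_sdl_to_plot, and nrow() takes the place of the plotting call, so it shows only that the full data is left untouched for the next figure.

```r
# Toy stand-in for obj_sdl_to_plot (codes and values are made up)
toy <- data.frame(
  env_obj = c("EF1", "EF2", "NF1", "WB3"),
  ewr_achieved = c(0.5, 0.75, 0.25, 1)
)

# Filter inline and pass the result straight on; `toy` is never overwritten
n_ef <- toy |>
  subset(grepl("^EF", env_obj)) |>
  nrow()

n_ef
nrow(toy)
```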
There are a few ways to handle groups of outcome variables. One of the simplest is to facet these plots by those groups, with all the outcomes in the group getting their own bars. This puts the theme levels (here, env_obj) on x and colours by scenario, with the groups accentuated by facets. These can be stacked (position = 'stack', the default) or dodged (demonstrated here).
dodgefacet <- obj_sdl_to_plot |>
plot_outcomes(
outcome_col = "ewr_achieved",
x_col = "env_obj",
colorset = "climate_code",
facet_row = "SWSDLName",
facet_col = "env_group",
scales = "free_x",
pal_list = scene_pal,
sceneorder = sceneorder,
position = "dodge"
)
dodgefacet + theme(legend.position = "bottom") +
labs(x = "Environmental Objective")
Colours from outcomes
Rather than faceting, we can stack each of the outcome categories (here, Target groups). To do this, we simply change the colorset to “Target” instead of ‘scenario’. The default x_col is scenario, so it remains on x, but it is generally better to specify it explicitly, especially when a column such as ‘climate_code’ is a better description. We also change the pal_list to a {paletteer} name, providing the palette from which to choose colours for each Target. If we wanted to retain these colours across the project, we would define a palette for the Targets as we did above for scenarios.
This yields a more compact plot (Figure 1) that shows overall outcomes without as much duplication. The stacking here of the env_obj outcomes in each group while colouring them all the same is itself a sort of simple aggregation. The approach next with position = "dodge" is generally better (Figure 2).
plot_outcomes(basin_to_plot,
outcome_col = "ewr_achieved",
colorset = "Target",
pal_list = list("scico::tokyo"),
sceneorder = sceneorder
) +
guides(fill = guide_legend(ncol = 2)) +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5))
In another example of colours from other information in the data, when we have multiple spatial units we might colour by them instead of colouring by outcome category. Here, we show how to colour by SDL unit (SWSDLName) instead of env_obj. We also explicitly use x_col to use the climate code and not the full names.
plot_outcomes(obj_sdl_to_plot,
outcome_col = "ewr_achieved",
x_col = "climate_code",
colorset = "SWSDLName",
pal_list = list("ggsci::default_jama"),
sceneorder = sceneorder
)
That shows that while all SDL units are affected by the changes in the flow, the Lachlan is less sensitive.
We can also use position = 'dodge' to have side-by-side bars instead of stacked (Figure 3). Note that in this case, where we’re colouring by SDL unit but there are many env_obj values, those env_obj values no longer stack, and so we have to stack them manually by calculating their sum. This would not be the case if we were colouring by individual rows (env_obj); see examples of that below.
obj_sdl_to_plot |>
group_by(SWSDLName, scenario) |>
summarise(ewr_achieved = sum(ewr_achieved, na.rm = TRUE)) |>
ungroup() |>
plot_outcomes(
outcome_col = "ewr_achieved",
colorset = "SWSDLName",
pal_list = list("ggsci::default_jama"),
sceneorder = sceneorder,
position = "dodge"
)
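The manual stacking step is just a grouped sum. The base-R sketch below uses aggregate() on made-up values to show the collapse from several env_obj rows to one row per SDL unit and scenario; the scenario labels and numbers are assumptions for illustration, not HydroBOT output.

```r
# Toy stand-in for obj_sdl_to_plot: two env_obj rows per unit and scenario
toy <- data.frame(
  SWSDLName = rep(c("Lachlan", "Namoi"), each = 4),
  scenario = rep(rep(c("base", "up2"), each = 2), times = 2),
  env_obj = rep(c("EF1", "NF1"), times = 4),
  ewr_achieved = c(0.5, 0.25, 0.75, 0.25, 0.25, 0.25, 0.5, 0.5)
)

# One row per SDL unit and scenario: the height a stacked bar would have
summed <- aggregate(ewr_achieved ~ SWSDLName + scenario, data = toy, FUN = sum)
print(summed)
```

With one row per bar, dodging places the SDL units side by side at the same total heights that stacking would have produced.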
If we have multiple levels of groupings, we can colour by the groups directly if we don’t care what the individual env_obj values are doing within them. This is very similar to the plots colouring by SDL unit above.
plot_outcomes(obj_sdl_to_plot,
outcome_col = "ewr_achieved",
x_col = "climate_code",
colorset = "env_group",
pal_list = list("scico::berlin"),
facet_col = "SWSDLName",
sceneorder = sceneorder
) +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5))
Grouped colours
HydroBOT can assign different colour palettes to different sets of outcomes, yielding what is essentially another axis on which we can plot information. We use this same ability across a number of plot types, particularly causal networks. For example, we might categorise the env_obj outcomes into the larger scale groups (e.g. ‘NF’, ‘EF’, etc). We can then assign each of these a separate palette, and so the individual env_obj values get different colours chosen from different palettes.
Achieving this requires specifying two columns. The colorset, as above, is the column that determines colour. The colorgroups column specifies the groupings of those colorset values, and so which palette to use. Thus, the pal_list needs to be either length 1 (everything gets the same palette) or length(unique(data$colorgroups)). Note also that each colorset value must belong to exactly one colorgroups value. This cannot be a one-to-many mapping because each colorset value must get a colour from the single palette defined by the group it is in.
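Both constraints can be checked before plotting. The helpers below are hypothetical and not part of HydroBOT; they are a base-R sketch of the two rules just described, run on made-up data.

```r
# Toy data: each env_obj (colorset) should sit in one env_group (colorgroups)
toy <- data.frame(
  env_obj = c("EF1", "EF2", "NF1", "NF1"),
  env_group = c("EF", "EF", "NF", "NF")
)

# Hypothetical helper: TRUE if no colorset value spans several groups
one_group_each <- function(set, groups) {
  n_groups <- tapply(groups, set, function(g) length(unique(g)))
  all(n_groups == 1)
}

# Hypothetical helper: one palette in total, or one palette per group
pal_ok <- function(pal_list, groups) {
  length(pal_list) %in% c(1, length(unique(groups)))
}

one_group_each(toy$env_obj, toy$env_group)

# A value that appears in two groups breaks the first rule
bad <- rbind(toy, data.frame(env_obj = "EF1", env_group = "NF"))
one_group_each(bad$env_obj, bad$env_group)

pal_ok(list(EF = "grDevices::Purp", NF = "grDevices::Mint"), toy$env_group)
```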
We demonstrate with env_obj variables mapped to larger environmental groups, making it easier to see at a glance the sorts of environmental objectives that are more or less affected, while also allowing views of the individual environmental objectives. Here we use facet_col and facet_row to ensure the SDL units don’t wrap around. We made the env_group column when we chose the data initially.
# Create a palette list
env_pals <- list(
EF = "grDevices::Purp",
NF = "grDevices::Mint",
NV = "grDevices::Burg",
OS = "grDevices::Blues",
WB = "grDevices::Peach"
)
# Facet by SDL unit and pass colorgroups so each group gets its own palette
obj_sdl_to_plot |>
plot_outcomes(
outcome_col = "ewr_achieved",
x_col = "climate_code",
colorgroups = "env_group",
colorset = "env_obj",
pal_list = env_pals,
facet_col = "SWSDLName",
facet_row = "."
) +
theme(legend.key.size = unit(0.5, "cm"))
Adding faceting by those groups can make that easier to read if the goal is to focus on changes within groups, at the cost of more panels.
obj_sdl_to_plot |>
plot_outcomes(
outcome_col = "ewr_achieved",
x_col = "climate_code",
colorgroups = "env_group",
colorset = "env_obj",
pal_list = env_pals,
facet_col = "SWSDLName",
facet_row = "env_group"
) +
theme(legend.key.size = unit(0.5, "cm"))
We could also split those bars sideways instead of stacking them, but that likely makes more sense if there are fewer categories than here. We again use position = 'dodge', but now we don’t need to sum because each env_obj is already its own row and so gets its own bar. Because these are just ggplot objects, we can adjust the legend with theme(), here shrinking the keys. This gets very crowded with the full set of scenarios; the scenariofilter argument can cut it to just a few, though here the data was already limited to three scenarios when we built obj_sdl_to_plot.
obj_sdl_to_plot |>
plot_outcomes(
outcome_col = "ewr_achieved",
x_col = "climate_code",
colorgroups = "env_group",
colorset = "env_obj",
pal_list = env_pals,
facet_col = "SWSDLName",
facet_row = "env_group",
position = "dodge"
) +
theme(legend.key.size = unit(0.5, "cm"))
Another approach to groups of outcomes without the colours explicitly grouped is to not use colorgroups, but instead just facet by the group and give every colorset value a colour from the same palette. Depending on the palette chosen and the breaks, this can be quicker, but will not accentuate groups as well.
obj_sdl_to_plot |>
plot_outcomes(
outcome_col = "ewr_achieved",
colorgroups = NULL,
colorset = "env_obj",
pal_list = list("scico::berlin"),
facet_row = "SWSDLName",
facet_col = "env_group",
scales = "free_x"
) +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5))
These plots are interesting, but in typical use, the plots above using facets for the groups or colouring by the groups themselves are likely to be easier to read, unless we really are interested in this level of granularity. Whatever approach we choose for a given plot, accentuating the differences between outcome groups can be a powerful interpretation tool.
Manual colour definition
Though the above examples using {paletteer} palettes are the easiest way to specify colouring, we don’t have to let the palettes auto-choose colours, and can instead pass colors objects, just as we do for scenarios. This can be particularly useful with small numbers of groups (defining too many colours is cumbersome; that’s what palettes are for) when we want to control which is which. Just as with scenarios, we use make_pal(). Here, we will use 'scico::berlin' as the base, but define several ‘reference’ values manually. This demonstration uses includeRef = TRUE so we replace the palette values with the refs, rather than choosing the other colours from the set of values with refs removed. This tends to yield a better spread of colours (and, if we also use returnUnref, lets us sometimes use the ref colours and sometimes not). For example, maybe we want to sometimes really accentuate ecosystem function and native vegetation, but not in all plots.
First, we create the palettes with and without the (garish) ref values.
Then we can create an accentuated plot sometimes, if, perhaps, we want to highlight how EF performed.
plot_outcomes(obj_sdl_to_plot,
outcome_col = "ewr_achieved",
x_col = "climate_code",
colorset = "env_group",
pal_list = obj_pal$refcols,
facet_col = "SWSDLName",
facet_row = ".",
sceneorder = sceneorder
) +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5))
But for other plots maybe we don’t want that accentuation, and we can use the unrefcols to retain the standard colouring. Note that the ‘NF’, ‘OS’, and ‘WB’ colours are unchanged.
plot_outcomes(obj_sdl_to_plot,
outcome_col = "ewr_achieved",
x_col = "climate_code",
colorset = "env_group",
pal_list = obj_pal$unrefcols,
facet_col = "SWSDLName",
facet_row = ".",
sceneorder = sceneorder
) +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5))
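The relationship between the two palettes used above can be sketched in base R. The helper below is hypothetical and is not HydroBOT's make_pal(); it only mirrors the behaviour described in the text, where a palette is built for all levels and the reference levels are then overwritten. The base palette ("Viridis" from grDevices::hcl.colors()) and the reference colours are assumptions chosen for illustration.

```r
# Hypothetical helper, not HydroBOT's make_pal(): build a palette for all
# levels, then overwrite the reference levels with fixed colours
sketch_ref_pal <- function(levels, refvals, refcols) {
  unrefcols <- stats::setNames(
    grDevices::hcl.colors(length(levels), "Viridis"),
    levels
  )
  with_refs <- unrefcols
  with_refs[refvals] <- refcols
  list(refcols = with_refs, unrefcols = unrefcols)
}

toy_pal <- sketch_ref_pal(
  levels = c("EF", "NF", "NV", "OS", "WB"),
  refvals = c("EF", "NV"),
  refcols = c("purple", "orange") # assumed colours, for illustration
)

# Levels without a reference colour are identical in both palettes
same <- toy_pal$refcols == toy_pal$unrefcols
same[c("NF", "OS", "WB")]
```

Because the non-reference levels keep their colours, plots made with either palette stay visually comparable, and only the accentuated groups change.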