Line plots (quantitative x)
Overview
This notebook provides examples of creating line plots, e.g. plots with one quantitative y-axis for outcome, and a quantitative x-axis. The x-axis is typically a quantitative representation of the scenario, e.g. a temperature increase, or for the demonstration here, the flow multiplier. This can be a very powerful analysis, as we can actually investigate numerical changes between scenarios, identifying potential nonlinearities, predictions, and interpolation.
These plots also have the ability to use colour and different colour palettes to include additional information, including spatial unit and type of response. These settings are dealt with more completely in the bar_plots, with much of the mechanics the same for lines, e.g. the use of colorgroups
and a list for pal_list
.
For a qualitative x, we would typically use bar plots.
Demonstration setup
As usual, we need paths to the data. We use the ‘more scenarios’ examples for all the plots, with processing as in the website workflow.
Read in the data
We read in the example data we will use for all plots.
That has all the steps in the aggregation, but most of the plots here will only use a subset to demonstrate.
To make visualisation easier, the SDL units data is given a grouping column that puts the many env_obj
variables in groups defined by their first two letters, e.g. EF
for Ecosystem Function. These correspond to the ‘Target’ level, but it can be useful to have the two groupings together for some examples.
If we had used multiple aggregation functions at any step, we should filter down to the one we want here, but we only used one for this example.
For simplicity here, we will only look at a small selection of the scenarios (multiplicative changes of 0.5,1, and 2). Thus, we make two small dataframes for our primary examples here.
scenarios_to_plot <- c("climatedown2adapt0", "climatebaseadapt0", "climateup2adapt0")
scenarios <- yaml::read_yaml(file.path(hydro_dir, "scenario_metadata.yml")) |>
tibble::as_tibble()
basin_to_plot <- agged_data$mdb |>
dplyr::filter(scenario %in% scenarios_to_plot) |>
dplyr::left_join(scenarios, by = "scenario")
# Create a grouping variable
obj_sdl_to_plot <- agged_data$sdl_units |>
dplyr::filter(scenario %in% scenarios_to_plot) |>
dplyr::mutate(env_group = stringr::str_extract(env_obj, "^[A-Z]+")) |>
dplyr::arrange(env_group, env_obj) |>
dplyr::left_join(scenarios, by = "scenario")
Standard scenario appearance
We will typically have a consistent look for the scenarios across the project, with a logical ordering and standard colours. Such standard colours are not included in the {HydroBOT} package because they are project/analysis- specific, but they could be set at project-level, e.g. in the .Rprofile
, if desired.
Here, we use the special arguments refvals
and refcols
to make a colour palette from a standard {paletter} option (“ggsci::nrc_npg”) while setting a specific level to a specified value. We will use the codes (see scenario definitions) rather than the names to make plots readable.
There is a sceneorder
argument to plot_outcomes()
that lets us explicitly set the order of the scenarios. However, it is typically easiest to simply make the scenarios a factor, though we use the sceneorder
argument here. It operates only on a column named ‘scenario’, though, so if other columns need to be ordered they should be made factors before feeding to plot_outcomes()
.
sceneorder <- forcats::fct_reorder(
scenarios$scenario,
scenarios$flow_multiplier
)
scene_pal <- make_pal(unique(scenarios$climate_code),
palette = "ggsci::nrc_npg",
refvals = "E", refcols = "black"
)
scene_pal
<colors>
black #E64B35FF #4DBBD5FF #00A087FF #3C5488FF #F39B7FFF #8491B4FF #91D1C2FF #DC0000FF #7E6148FF
Make line plots
We make two sorts of line plots- either straight lines through all the data points, which shows exactly what the results are, and ‘smoothed’ lines, which are fit to the data in some way to summarise a group of outputs. These can yield smoothed curves, but can also be linear regressions or other fits available from ggplot2::geom_smooth
.
Lines through all data
A simple plot would be to look at all the outcomes, separated by colour (though we ignore the added water scenarios). Even this simple plot is quite informative- we can see that the env_obj
outcomes are differently sensitive to both decreases and increases in flow, and that this differs across space.
sdl_line <- obj_sdl_to_plot |>
filter(adapt_code == 1) |>
plot_outcomes(
outcome_col = "ewr_achieved",
x_col = "flow_multiplier",
colorgroups = NULL,
colorset = "env_obj",
pal_list = list("scico::berlin"),
facet_row = "SWSDLName"
)
sdl_line
Separate values within groups
We might not care so much about individual outcomes, but about their groupings, and we can plot those in colour for each env_obj, according to the larger grouping it is part of by changing colorset = 'env_group'
. We need to use point_group
here to separate out the points for each env_obj
, since several will belong to each env_group
.
This plot also demonstrates the use of some additional arguments. We’re also using transx
to log the x-axis, which is particularly appropriate for the multiplicative flow scaling in this demonstration. We also log the y-axis with transoutcome
since we’re using a base_list
to baseline (with comp_fun = relative
to look at the multiplicative shift in each env_obj to baseline). We’re using various *_lab
arguments to adjust the labelling.
The group_cols
argument in base_list
is needed to specify unique rows for the baselining, and so must be chosen to provide appropriate groupings (unique levels) for that baselining. For example, if we are baselining scenarios, we need to include every column that might vary within scenarios. This might differ from point_group
, which only determines the visual grouping with group
in ggplot2::aes()
.
Scientifically, one important thing to note here is that the range on y (0-10) is much greater than the range on x (0.3 - 3), and so (unsurprisingly), some outcomes are disproportionately impacted by flow. Other outcome values are less than the relative shift in flow, and so there are others that are disproportionately insensitive. These disproportionate responses also depend on whether flows decrease or increase- they are not symmetric.
sdl_line_options <- obj_sdl_to_plot |>
plot_outcomes(
outcome_col = "ewr_achieved",
x_col = "flow_multiplier",
y_lab = "Proportion met",
x_lab = "Change in flow",
transx = "log10",
transoutcome = "log10",
color_lab = "Environmental\ngroup",
colorset = "env_group",
pal_list = list("scico::berlin"),
point_group = "env_obj",
facet_row = "SWSDLName",
sceneorder = sceneorder,
base_list = list(
base_lev = "climatebaseadapt0",
comp_fun = "relative",
group_cols = c("env_obj", "polyID")
)
)
sdl_line_options
We can also give the groups different palettes, as demonstrated more completely in the bar plots and causal networks. Now, we don’t need point_group
anymore, since the colours are assigned to the unique env_obj
s.
# Create a palette list
grouplist <- list(
EF = "grDevices::Purp",
NF = "grDevices::Mint",
NV = "grDevices::Burg",
OS = "grDevices::Blues",
WB = "grDevices::Peach"
)
sdl_line_groups <- obj_sdl_to_plot |>
plot_outcomes(
outcome_col = "ewr_achieved",
x_col = "flow_multiplier",
y_lab = "Proportion met",
x_lab = "Change in flow",
transx = "log10",
transoutcome = "log10",
color_lab = "Environmental\ngroup",
colorgroup = "env_group",
colorset = "env_obj",
pal_list = grouplist,
facet_row = "SWSDLName",
base_list = list(
base_lev = "climatebaseadapt0",
comp_fun = "relative",
group_cols = c("env_obj", "polyID")
)
)
sdl_line_groups
That’s fairly complex, so we can facet it, as we did with the bars to make the individual env_obj
s easier to see.
sdl_line_groups_facet <- obj_sdl_to_plot |>
plot_outcomes(
outcome_col = "ewr_achieved",
x_col = "flow_multiplier",
y_lab = "Proportion met",
x_lab = "Change in flow",
transx = "log10",
transoutcome = "log10",
color_lab = "Environmental\ngroup",
colorgroup = "env_group",
colorset = "env_obj",
pal_list = grouplist,
facet_row = "SWSDLName",
facet_col = "env_group",
base_list = list(
base_lev = "climatebaseadapt0",
comp_fun = "relative",
group_cols = c("env_obj", "polyID")
)
)
sdl_line_groups_facet
The above is typically how we would go about this facetting, but it is worth reiterating that these are just ggplots, and so we can post-hoc add facetting. Using the version with only spatial facetting (Figure 1), we can add the env_group
facet on, matching Figure 2. Note that we re-build all the facets here, due to the specification of ggplot2::facet_grid
.
sdl_line_groups + facet_grid(SWSDLName ~ env_group)
As with the bar plots, we can colour by any column we want, and the spatial units is a logical choice. We again use point_group
, since multiple env_obj
rows are mapped to each colour (spatial unit). The overplotting gets unreadable here and so I’ve retained some facetting, but the best combination of colours, row facets, and column facets will be project dependent and also depend on the number of values being plotted. Another solution is to summarise the data with a smoother- see Section 4.2.
sdl_line_sdl <- obj_sdl_to_plot |>
# filter(env_group == "EF") |>
plot_outcomes(
outcome_col = "ewr_achieved",
x_col = "flow_multiplier",
y_lab = "Proportion met",
x_lab = "Change in flow",
transx = "log10",
transoutcome = "log10",
color_lab = "SDL unit",
colorset = "SWSDLName",
pal_list = list("ggsci::default_jama"),
point_group = "env_obj",
facet_col = "env_group",
base_list = list(
base_lev = "climatebaseadapt0",
comp_fun = "relative",
group_cols = c("env_obj", "polyID")
)
)
sdl_line_sdl
env_obj
.
Smoothing (fit lines)
We can use smoothing to fit lines through multiple points, e.g. if we want to group data in some way- maybe use it to put a line through the colour groups and ignore individual levels. We demonstrate here using them to illustrate unique outcomes, as well as more typical uses as lines of best fit that aggregate over a number of outcomes.
To get smoothed lines, we use smooth_arglist = list()
. By default, (just an empty list()
), that produces a loess fit (as with ggplot2::geom_smooth
, but we can also pass arguments to geom_smooth()
in smooth_arglist
, and so allow things like lm
and glm
fits.
Unique points
Fitting lines through unique points at each scenario level is a bit contrived vs. just plotting lines as above, but it can be useful if we want to accentuate nonlinear relationships. Linear fits are possible too, though these are typically less useful.
With unique points, this just fits a single curved line through each env_obj
. Recapitulating the above, we colour here from SDL unit.
sdl_smooth_sdl <- obj_sdl_to_plot |>
plot_outcomes(
outcome_col = "ewr_achieved",
x_col = "flow_multiplier",
y_lab = "Proportion met",
x_lab = "Change in flow",
transx = "log10",
color_lab = "Catchment",
colorgroups = NULL,
colorset = "SWSDLName",
point_group = "env_obj",
pal_list = list("ggsci::default_jama"),
facet_row = "env_group",
base_list = list(
base_lev = "climatebaseadapt0",
comp_fun = "relative",
group_cols = c("env_obj", "polyID")
),
smooth_arglist = list()
)
sdl_smooth_sdl
And we can do the same for environmental groupings. This makes much more sense as an analysis if we fit the line through the group, as we do in Section 4.2.2. We use zero_adjust = 'auto'
to adjust values off zero for the log transform of y.
sdl_smooth_groups <- obj_sdl_to_plot |>
plot_outcomes(
outcome_col = "ewr_achieved",
x_col = "flow_multiplier",
y_lab = "Proportion met",
x_lab = "Change in flow",
transx = "log10",
transoutcome = "log10",
color_lab = "Environmental grouping",
colorgroups = NULL,
colorset = "env_group",
point_group = "env_obj",
pal_list = list("scico::berlin"),
facet_row = "env_group",
facet_col = "SWSDLName",
base_list = list(
base_lev = "climatebaseadapt0",
comp_fun = "relative",
group_cols = c("env_obj", "polyID")
),
smooth_arglist = list(),
zero_adjust = "auto"
)
sdl_smooth_groups
Using method = 'lm'
in smooth_arglist
yields a linear fit (and any other method
accepted by ggplot2::geom_smooth()
should work). It does not recapitulate the simple lines above, however, because it fits the line through all the scenario data points, rather than simply joining them together. I have passed se = FALSE
to ggplot2::geom_smooth()
here because with unique groups the standard errors are enormous. Again, this is a bit contrived unless we want to look for a trend for each outcome at this level. A more typical way of looking at things would be to look at the trend across lots of values within some group, Section 4.2.2.
obj_sdl_to_plot |>
plot_outcomes(
outcome_col = "ewr_achieved",
x_col = "flow_multiplier",
y_lab = "Proportion met",
x_lab = "Change in flow",
transx = "log10",
color_lab = "Environmental grouping",
colorset = "env_group",
point_group = "env_obj",
pal_list = list("scico::berlin"),
facet_row = "env_group",
facet_col = "SWSDLName",
base_list = list(
base_lev = "climatebaseadapt0",
comp_fun = "relative",
group_cols = c("env_obj", "polyID")
),
smooth_arglist = list(method = "lm", se = FALSE)
)
Fit multiple points
Fitting lines is most often associated with things like regression and loess smoothing, where we use it to aggregate over a number of datapoints to find the line of best fit. We can do that here, simply by not having all points accounted for across the facetting, point_group
, and colorset
.
group_cols
in base_list
should still include unique values, because group_cols
determines the baselining (e.g. what gets compared), not the plot groupings.
One example would be to perform the same analysis as in Figure 3, but instead of plotting each point, fit a line to show the mean change within each SDL unit. We’ve pulled env_obj
out of point_group
, but left it in group_cols
, because we still want each env_obj
baselined with itself, not to the mean of env_group
. Now, we can look at all the env_groups
, because there are far fewer lines and so the overplotting isn’t an issue.
We use a small add_eps
to avoid zeros and allow all data to be relativised and plotted.
sdl_fit_sdl <- obj_sdl_to_plot |>
plot_outcomes(
outcome_col = "ewr_achieved",
x_col = "flow_multiplier",
y_lab = "Proportion met",
x_lab = "Change in flow",
transx = "log10",
transoutcome = "log10",
color_lab = "SDL unit",
colorset = "SWSDLName",
pal_list = list("ggsci::default_jama"),
facet_wrapper = "env_group",
base_list = list(
base_lev = "climatebaseadapt0",
comp_fun = "relative",
add_eps = "auto",
group_cols = c("env_obj", "polyID")
),
smooth_arglist = list()
)
sdl_fit_sdl
env_obj
.
We can make a very similar plot, looking at the environmental groups, a smooth fit of Figure 1 . We use a position
argument (which passes to {ggplot2}, and so has the same syntax) to see overplotted points, and an add_eps
to avoid zeros to relativise and plot all the data.
sdl_fit_groups <- obj_sdl_to_plot |>
plot_outcomes(
outcome_col = "ewr_achieved",
x_col = "flow_multiplier",
y_lab = "Proportion met",
x_lab = "Change in flow",
transx = "log10",
transoutcome = "log10",
color_lab = "Environmental\ngroup",
colorset = "env_group",
pal_list = list("scico::berlin"),
facet_row = "SWSDLName",
facet_col = ".",
base_list = list(
base_lev = "climatebaseadapt0",
comp_fun = "relative",
add_eps = "auto",
group_cols = c("env_obj", "polyID")
),
smooth_arglist = list(),
position = position_jitter(width = 0.01, height = 0)
)
sdl_fit_groups
env_obj
.
As we saw above, we can use method = 'lm'
to plot a regression, though in general we do not expect these relationships to be linear, and mathematically characterising them will be a complex task that is not the purview of plotting.
A linear fit of the SDL units ( Figure 6 ) is one example of how this might work. It is useful to know here that deviations from a 1:1 line on logged axes as here means that the outcomes are responding disproportionately more (steeper) or less (shallower) than the underlying changes to flow.
sdl_lm_sdl <- obj_sdl_to_plot |>
plot_outcomes(
outcome_col = "ewr_achieved",
x_col = "flow_multiplier",
y_lab = "Proportion met",
x_lab = "Change in flow",
transx = "log10",
transoutcome = "log10",
color_lab = "SDL unit",
colorset = "SWSDLName",
pal_list = list("ggsci::default_jama"),
facet_wrapper = "env_group",
base_list = list(
base_lev = "climatebaseadapt0",
comp_fun = "relative",
add_eps = "auto",
group_cols = c("env_obj", "polyID")
),
smooth_arglist = list(method = "lm")
)
sdl_lm_sdl
env_obj
.