library(werptoolkitr)
library(dplyr)
Causal Network Plotting
Plots of causal networks
This document demonstrates and discusses the causal network plotting code. There are many reasons to present causal networks, each of which likely will require different characteristic plots. This develops some of the foundational work needed to allow a flexible range of those uses. Causal networks are inherently complex, and there are different aspects we might want to accentuate, and some capability we need to prune/focus them. Thus, {werptoolkitr} provides a number of helper functions and plotting options.
Why plot causal networks?
Show the structure of the network, reduce black box
Present results information reflected in node size/colour, edge colour/width, etc
Capacity requirements
Filtering the parts of the network plotted
Controlling plot attributes/aesthetics
Ability to incorporate toolkit outputs
Flexibility depending on particular goals
- Detailed, broad, area of focus, etc
Get the network relationships
First we need the relationships defining the network, which are extracted from tables in {werptoolkitr} and provided as the dataset werptoolkitr::causal_ewrs
.
Process data for network plots
Causal networks need an edges dataframe specifying all pairwise connections, and a nodes dataframe specifying the nodes and their attributes. We make those here (and not in the data cleaning stage) for a couple reasons.
The full set of possible links is massively factorial, and so we want to choose only those useful for the needs of a given analysis.
- Analyses may differ depending on network detail, spatial resolution, or use in the toolkit outside the network (e.g. theme aggregation)
A key feature of the network isn’t just the existence of connections, but their directionality, and so we want to specify that explicitly.
We want to have the general ability to make edges and nodes available for other uses, e.g. aggregations
- {werptoolkitr} exports
make_edges
andmake_nodes
, along with some other network-manipulation functions
- {werptoolkitr} exports
To build the nodes and edges for a specific plot or set of plots, we first build a dataframe of edges, and then extract nodes.
Construct edge dataframe
The edges
dataframe contains all pairwise links between nodes in from
and to
columns. To get that, we need to pass it dataframes specifying links. There will usually be multiple datasets, reflecting the differing scales of the nodes (as in the causal_ewrs
list provided by {werptoolkitr}). These dataframes may include many columns of potential nodes, e.g. they might provide the mapping for several steps in the network. Thus, we need to provide the node columns we actually want to map and their directionality- what are the ‘from-to’ pairings. We may want to filter to only some subset of nodes; for example we may only be interested in the environmental objectives related to waterbirds. Further, we will likely want to filter by geography, currently possible by either gauge or planning unit (deprecated).
As an example, we can make the relationships present at gauge 409025 linking EWRs to environmental objectives, environmental objectives to specific goals, Specific goals to Targets, and environmental objectives to 5-year targets. There are many more possible connections to include in the fromtos
, which to include will depend on the questions being asked. I’ve just chosen these for a quick demo.
<- make_edges(dflist = causal_ewr,
edges fromtos = list(c('ewr_code', 'env_obj'),
c('env_obj', 'Specific_goal'),
c('Specific_goal', 'Target'),
c('env_obj', 'target_5_year_2024')),
gaugefilter = '409025')
edges
There’s also the opportunity to filter the specific nodes to include with fromfilter
and tofilter
. This allows things like filtering the particular nodes within those node categories (e.g. env_obj
s related to waterbirds). However, it is typically better to use find_related_nodes
after creation of the network, as that does network-aware filtering.
Although we can specify defaults, this function is also reasonably generic and so can be used far beyond whatever defaults we set- it only depends on WERP-specific things in that the spatial filtering happens on gauge
(and those are cross-referenced). For a particular set of analyses, we would typically set default fromtos
lists, most relevantly in the Aggregator and Comparer. If we are producing plots for illustrating the network, there may be ad-hoc adjustments to that list.
Construct node dataframe
The node dataframe defines the ‘boxes’. The simplest way to make it is to extract it from the edges. Basically, we just grab all the nodes that are in either the from
or to
columns of the edges
df. The make_nodes
function does a bit more than just get unique node values from the edges
df, it also attaches a column specifying the node order, reflecting their sequence in the causal network. There is a default sequence specified for WERP EWRs, but others can be specified with the typeorder
argument. We expect that new default sequences will need to be created when new sorts of relationships come online.
<- make_nodes(edges)
nodes # look at that for demo
nodes
We can now create a minimal plot before digging back in to demo some options under the hood.
Plots
Plotting takes these dataframes, adds some attributes, and generates the plots. We use make_causal_plot
for this, which is mostly a wrapper and cleanup around functions to do final filtering of the data (find_related_nodes
), attach attributes/aesthetics (node_plot_atts
and causal_colors_general)
, and construct the plot itself using DiagrammeR
. There are also some switches around saving and printing.
We will look under the hood at those later (Plot setup- more options and next steps)- there’s quite a bit of functionality there to do things like isolate relevant parts of the network, colour within node groups or by the outcomes of the toolkit (e.g. change in relationships between scenarios or native fish populations).
First, let’s just make a default plot. By default it saves the network to an object so we can modify/use it later. We can turn that off (returnnetwork = FALSE
), along with switches for whether to render in place and/or save as png and pdf (save = TRUE
). If render = TRUE
, it renders in place just fine when using as a notebook, but doesn’t get picked up when the quarto renders to html (and presumably pdf etc). So for these demonstrations, we’ll use a two-step process of returning the network and then rendering it.
<- make_causal_plot(nodes, edges, render = FALSE)
default_network ::render_graph(default_network) DiagrammeR
That’s clearly a bit much, even just for a single gauge. One thing we could do would be to just drop the 5-year nodes, (now piping the network straight to render_graph
to avoid clutter). We do this drop simply here by filtering the nodes
dataframe. Without those nodes available, the relevant parts of the network just get dropped.
make_causal_plot(filter(nodes, NodeType != 'target_5_year_2024'),
render = FALSE) %>%
edges, ::render_graph() DiagrammeR
Node-relevant network
One thing we’re fairly likely to want to do is ask about the connections that relate to a node or a small set of nodes. To do that, we need to be able to traverse the network upstream and downstream, using the focalnodes
argument, which calls the find_related_nodes
function. This is a more complete network restructuring than just filtering a target level, as we did above, because it traces the full network and only returns nodes at any level that relate to the targets. Now we can include the 5-year targets again because we’ve reduce the nodes. Note that the focalnodes
don’t have to be related to each other or at the same level- find_related_nodes
prunes the network to all connections involving all the focalnodes
.
make_causal_plot(nodes, edges,
focalnodes = c('NF4', 'Sloanes froglet'), render = FALSE) %>%
::render_graph() DiagrammeR
Plot setup- more options and next steps
The above is running with defaults, but there’s quite a bit more capacity to change what is plotted and the look of the graphs. Primarily, I’ve focused on development related to how we’ll want to feed the outputs of the toolkit to the the network (e.g. shifts in the relationships or values of the nodes, such as fewer birds- see for example theme aggregation and overview presentation. That could be done with colour, node size, or edge penwidth.
Colour can be specified differently than is done by default above, such as coloring the nodes within the node groups by outcome, or assigning different node groups different color palettes, following the same idea as the generic colorgroups
and colorset
arguments in all the plotting functions. There’s an obvious step of making it interactive/dynamic, but that hasn’t been implemented yet.
Colour to indicate a value
Edges
Edges we want to be able to have colour in a column, as it would be if it came in as results from the toolkit. For example, we might want to colour the edges by change between scenarios, or strength of relationships. Down the track we could similarly alter penwidth as well.
As a demonstration, let’s add a value
column to edges
as a mock-up of the toolkit outputs and plot according to that. I’ll also use a continuous palette here rather than the default, since this is now a continuous variable. I’ll use a smaller network to make things visible. Note that above we removed the 5-year targets from the nodes
df, and here we use edges
since we’re already modifying it. Either approach can drop a set of nodes, though it’s generally easier to use the nodes since nodes can appear in either the from
or to
of the edges.
<- edges %>%
edgewithvals filter(totype != 'target_5_year_2024') %>%
mutate(value = rnorm(n()))
make_causal_plot(nodes,
edgewithvals,focalnodes = c('NF4', 'Sloanes froglet'),
edge_pal = list(value = 'viridis::plasma'),
edge_colorset = 'value', render = FALSE) %>%
::render_graph() DiagrammeR
Nodes (and single-colour edges)
We can also colour the nodes by results. Here I’ve also set the edges just to a single color - feeding edge_pal
or node_pal
a single character value specifying a colour or a character vector of length nrow
of the relevant dataframe will just insert those values and bypass the palettes. Again, I start by dummying up some ‘toolkit results’ in a value
column. Examples where these values do come out of EWR results is in the theme aggregation notebook and the overview presentation.
<- nodes %>%
nodewithvals filter(NodeType != 'target_5_year_2024') %>%
mutate(value = rnorm(n()))
make_causal_plot(nodewithvals,
edges,focalnodes = c('NF4', 'Sloanes froglet'),
edge_pal = 'black',
node_pal = list(value = 'scico::oslo'),
node_colorset = 'value', render = FALSE) %>%
::render_graph() DiagrammeR
Colour within node groups
We might want to use different colour palettes within the different node groups, but colour the nodes themselves within them. To do that, we set the *_pal
arguments as named lists of palettes, and also pass *_colorgroups
arguments so it knows how to split the data into those palettes. This parallels the use in plot_outcomes
for other plot types, where colorgroups
is the groups that get the palette, while colorset
are the individual units within each group that receive colors from the respective palette. Here, we demonstrate with nodes and plot the whole network so we can see what’s happening.
First, set the list of palettes. This could be set by default for the project, along with other default colors.
= list(ewr_code = 'viridis::mako',
node_list_c env_obj = 'viridis::plasma',
Specific_goal = 'scico::oslo',
Target = 'scico::hawaii',
target_5_year_2024 = 'scico::lisbon')
Then, make the network
make_causal_plot(nodes, edges,
edge_pal = 'black',
node_pal = node_list_c,
node_colorgroups = 'NodeType',
node_colorset = 'Name',render = FALSE) %>%
::render_graph() DiagrammeR
Groupings within node groups
It’s possible but gets rapidly bespoke to break those up into more discrete chunks. I’ve had a crude go at establishing a default though, where I grouped EWRs and environmental objectives by their main group, and lumped all the targets by year (though only including 5-year in this example). That default can be accessed by passing 'werp'
to as the node_colorset
. This is an end-run that builds a new column to use as colorset
, defined according to some defaults. The same thing could be done externally by creating a new column to define the colorset
outside the function and then making that column colorset
.
In that case, we might want to use a different set of palettes,
<- list(ewr_code = 'viridis::mako',
node_list_g env_obj = 'ggthemes::excel_Green',
Specific_goal = 'scico::oslo',
Target = 'calecopal::superbloom3',
target_5_year_2024 = 'calecopal::eschscholzia')
This lets us set the within-node_colorgroups palettes. The node_colorset = 'werp'
just creates a new column in the data that gives rows values we want (e.g. the first two letters of the ewr_codes), which are then used to choose colors from that particular palette. It’s a way to have fewer colors (and more meaningful colors) within the colorgroups.
make_causal_plot(nodes, edges,
edge_pal = 'black',
node_pal = node_list_g,
node_colorgroups = 'NodeType',
node_colorset = 'werp', render = FALSE) %>%
::render_graph() DiagrammeR
Directions from here
There’s clearly a lot more that could be done here. The nature of the output and the way I’ve set it up are really aimed at being interactive and usable to investigate the network.
The setup here is actually quite similar to the other plot outputs. In all cases, there is much opportunity to adjust the look of the plots to target different uses, but also the ability to establish a consistent default (and look). These plots lend themselves to both notebooks (as here), web, and interactive interfaces (Shiny, observablejs, etc) to investigate the networks themselves or use them to plot results. That gives flexibility to get at whatever the particular question is. Though causal networks are not a typical way to present outputs, here we see that they can be incredibly powerful for not only showing the relationships, but understanding how they change under different scenarios.