Causal Network Plotting

Author

Galen Holt

library(werptoolkitr)
library(dplyr)

Plots of causal networks

This document demonstrates and discusses the causal network plotting code. There are many reasons to present causal networks, each of which likely will require different characteristic plots. This develops some of the foundational work needed to allow a flexible range of those uses. Causal networks are inherently complex, and there are different aspects we might want to accentuate, and some capability we need to prune/focus them. Thus, {werptoolkitr} provides a number of helper functions and plotting options.

Why plot causal networks?
- Show the structure of the network, reduce black box
- Present results information reflected in node size/colour, edge colour/width, etc
  - See theme aggregation
Capacity requirements
- Filtering the parts of the network plotted
- Controlling plot attributes/aesthetics
- Ability to incorporate toolkit outputs
- Flexibility depending on particular goals
  - Detailed, broad, area of focus, etc

Get the network relationships

First we need the relationships defining the network, which are extracted from tables in {werptoolkitr} and provided as the dataset werptoolkitr::causal_ewrs.

Process data for network plots

Causal networks need an edges dataframe specifying all pairwise connections, and a nodes dataframe specifying the nodes and their attributes. We make those here (and not in the data cleaning stage) for a couple reasons.

The full set of possible links is massively factorial, and so we want to choose only those useful for the needs of a given analysis.
1. Analyses may differ depending on network detail, spatial resolution, or use in the toolkit outside the network (e.g. theme aggregation)
A key feature of the network isn’t just the existence of connections, but their directionality, and so we want to specify that explicitly.
We want to have the general ability to make edges and nodes available for other uses, e.g. aggregations
1. {werptoolkitr} exports make_edges and make_nodes, along with some other network-manipulation functions

To build the nodes and edges for a specific plot or set of plots, we first build a dataframe of edges, and then extract nodes.

Construct edge dataframe

The edges dataframe contains all pairwise links between nodes in from and to columns. To get that, we need to pass it dataframes specifying links. There will usually be multiple datasets, reflecting the differing scales of the nodes (as in the causal_ewrs list provided by {werptoolkitr}). These dataframes may include many columns of potential nodes, e.g. they might provide the mapping for several steps in the network. Thus, we need to provide the node columns we actually want to map and their directionality- what are the ‘from-to’ pairings. We may want to filter to only some subset of nodes; for example we may only be interested in the environmental objectives related to waterbirds. Further, we will likely want to filter by geography, currently possible by either gauge or planning unit (deprecated).

As an example, we can make the relationships present at gauge 409025 linking EWRs to environmental objectives, environmental objectives to specific goals, Specific goals to Targets, and environmental objectives to 5-year targets. There are many more possible connections to include in the fromtos, which to include will depend on the questions being asked. I’ve just chosen these for a quick demo.

edges <- make_edges(dflist = causal_ewr, 
               fromtos = list(c('ewr_code', 'env_obj'), 
                              c('env_obj', 'Specific_goal'), 
                              c('Specific_goal', 'Target'), 
                              c('env_obj', 'target_5_year_2024')),
               gaugefilter = '409025')

edges

There’s also the opportunity to filter the specific nodes to include with fromfilter and tofilter. This allows things like filtering the particular nodes within those node categories (e.g. env_objs related to waterbirds). However, it is typically better to use find_related_nodes after creation of the network, as that does network-aware filtering.

Although we can specify defaults, this function is also reasonably generic and so can be used far beyond whatever defaults we set- it only depends on WERP-specific things in that the spatial filtering happens on gauge (and those are cross-referenced). For a particular set of analyses, we would typically set default fromtos lists, most relevantly in the Aggregator and Comparer. If we are producing plots for illustrating the network, there may be ad-hoc adjustments to that list.

Construct node dataframe

The node dataframe defines the ‘boxes’. The simplest way to make it is to extract it from the edges. Basically, we just grab all the nodes that are in either the from or to columns of the edges df. The make_nodes function does a bit more than just get unique node values from the edges df, it also attaches a column specifying the node order, reflecting their sequence in the causal network. There is a default sequence specified for WERP EWRs, but others can be specified with the typeorder argument. We expect that new default sequences will need to be created when new sorts of relationships come online.

nodes <- make_nodes(edges)
# look at that for demo
nodes

We can now create a minimal plot before digging back in to demo some options under the hood.

Plots

Plotting takes these dataframes, adds some attributes, and generates the plots. We use make_causal_plot for this, which is mostly a wrapper and cleanup around functions to do final filtering of the data (find_related_nodes), attach attributes/aesthetics (node_plot_atts and causal_colors_general), and construct the plot itself using DiagrammeR. There are also some switches around saving and printing.

We will look under the hood at those later (Plot setup- more options and next steps)- there’s quite a bit of functionality there to do things like isolate relevant parts of the network, colour within node groups or by the outcomes of the toolkit (e.g. change in relationships between scenarios or native fish populations).

First, let’s just make a default plot. By default it saves the network to an object so we can modify/use it later. We can turn that off (returnnetwork = FALSE), along with switches for whether to render in place and/or save as png and pdf (save = TRUE). If render = TRUE, it renders in place just fine when using as a notebook, but doesn’t get picked up when the quarto renders to html (and presumably pdf etc). So for these demonstrations, we’ll use a two-step process of returning the network and then rendering it.

default_network <- make_causal_plot(nodes, edges, render = FALSE)
DiagrammeR::render_graph(default_network)

That’s clearly a bit much, even just for a single gauge. One thing we could do would be to just drop the 5-year nodes, (now piping the network straight to render_graph to avoid clutter). We do this drop simply here by filtering the nodes dataframe. Without those nodes available, the relevant parts of the network just get dropped.

make_causal_plot(filter(nodes, NodeType != 'target_5_year_2024'),
                 edges, render = FALSE) %>% 
  DiagrammeR::render_graph()

Node-relevant network

One thing we’re fairly likely to want to do is ask about the connections that relate to a node or a small set of nodes. To do that, we need to be able to traverse the network upstream and downstream, using the focalnodes argument, which calls the find_related_nodes function. This is a more complete network restructuring than just filtering a target level, as we did above, because it traces the full network and only returns nodes at any level that relate to the targets. Now we can include the 5-year targets again because we’ve reduce the nodes. Note that the focalnodes don’t have to be related to each other or at the same level- find_related_nodes prunes the network to all connections involving all the focalnodes.

make_causal_plot(nodes, edges, 
                 focalnodes = c('NF4', 'Sloanes froglet'), render = FALSE) %>% 
  DiagrammeR::render_graph()

Plot setup- more options and next steps

The above is running with defaults, but there’s quite a bit more capacity to change what is plotted and the look of the graphs. Primarily, I’ve focused on development related to how we’ll want to feed the outputs of the toolkit to the the network (e.g. shifts in the relationships or values of the nodes, such as fewer birds- see for example theme aggregation and overview presentation. That could be done with colour, node size, or edge penwidth.

Colour can be specified differently than is done by default above, such as coloring the nodes within the node groups by outcome, or assigning different node groups different color palettes, following the same idea as the generic colorgroups and colorset arguments in all the plotting functions. There’s an obvious step of making it interactive/dynamic, but that hasn’t been implemented yet.

Colour to indicate a value

Edges

Edges we want to be able to have colour in a column, as it would be if it came in as results from the toolkit. For example, we might want to colour the edges by change between scenarios, or strength of relationships. Down the track we could similarly alter penwidth as well.

As a demonstration, let’s add a value column to edges as a mock-up of the toolkit outputs and plot according to that. I’ll also use a continuous palette here rather than the default, since this is now a continuous variable. I’ll use a smaller network to make things visible. Note that above we removed the 5-year targets from the nodes df, and here we use edges since we’re already modifying it. Either approach can drop a set of nodes, though it’s generally easier to use the nodes since nodes can appear in either the from or to of the edges.

edgewithvals <- edges %>% 
  filter(totype != 'target_5_year_2024') %>% 
  mutate(value = rnorm(n()))

make_causal_plot(nodes,
                 edgewithvals,
                 focalnodes = c('NF4', 'Sloanes froglet'),
                 edge_pal = list(value = 'viridis::plasma'),
                 edge_colorset = 'value', render = FALSE) %>% 
  DiagrammeR::render_graph()

Nodes (and single-colour edges)

We can also colour the nodes by results. Here I’ve also set the edges just to a single color - feeding edge_pal or node_pal a single character value specifying a colour or a character vector of length nrow of the relevant dataframe will just insert those values and bypass the palettes. Again, I start by dummying up some ‘toolkit results’ in a value column. Examples where these values do come out of EWR results is in the theme aggregation notebook and the overview presentation.

nodewithvals <- nodes %>% 
  filter(NodeType != 'target_5_year_2024') %>% 
  mutate(value = rnorm(n()))

make_causal_plot(nodewithvals,
                 edges,
                 focalnodes = c('NF4', 'Sloanes froglet'),
                 edge_pal = 'black',
                 node_pal = list(value = 'scico::oslo'),
                 node_colorset = 'value', render = FALSE) %>% 
  DiagrammeR::render_graph()

Colour within node groups

We might want to use different colour palettes within the different node groups, but colour the nodes themselves within them. To do that, we set the *_pal arguments as named lists of palettes, and also pass *_colorgroups arguments so it knows how to split the data into those palettes. This parallels the use in plot_outcomes for other plot types, where colorgroups is the groups that get the palette, while colorset are the individual units within each group that receive colors from the respective palette. Here, we demonstrate with nodes and plot the whole network so we can see what’s happening.

First, set the list of palettes. This could be set by default for the project, along with other default colors.

node_list_c = list(ewr_code = 'viridis::mako', 
                   env_obj = 'viridis::plasma', 
                   Specific_goal = 'scico::oslo', 
                   Target = 'scico::hawaii', 
                   target_5_year_2024 = 'scico::lisbon')

Then, make the network

make_causal_plot(nodes, edges, 
                 edge_pal = 'black',
                 node_pal = node_list_c,
                 node_colorgroups = 'NodeType',
                 node_colorset = 'Name',render = FALSE) %>% 
  DiagrammeR::render_graph()

Groupings within node groups

It’s possible but gets rapidly bespoke to break those up into more discrete chunks. I’ve had a crude go at establishing a default though, where I grouped EWRs and environmental objectives by their main group, and lumped all the targets by year (though only including 5-year in this example). That default can be accessed by passing 'werp' to as the node_colorset. This is an end-run that builds a new column to use as colorset, defined according to some defaults. The same thing could be done externally by creating a new column to define the colorset outside the function and then making that column colorset.

In that case, we might want to use a different set of palettes,

node_list_g <- list(ewr_code = 'viridis::mako', 
                    env_obj = 'ggthemes::excel_Green', 
                    Specific_goal = 'scico::oslo', 
                    Target = 'calecopal::superbloom3', 
                    target_5_year_2024 = 'calecopal::eschscholzia')

This lets us set the within-node_colorgroups palettes. The node_colorset = 'werp' just creates a new column in the data that gives rows values we want (e.g. the first two letters of the ewr_codes), which are then used to choose colors from that particular palette. It’s a way to have fewer colors (and more meaningful colors) within the colorgroups.

make_causal_plot(nodes, edges, 
                 edge_pal = 'black',
                 node_pal = node_list_g,
                 node_colorgroups = 'NodeType',
                 node_colorset = 'werp', render = FALSE) %>% 
  DiagrammeR::render_graph()

Directions from here

There’s clearly a lot more that could be done here. The nature of the output and the way I’ve set it up are really aimed at being interactive and usable to investigate the network.

The setup here is actually quite similar to the other plot outputs. In all cases, there is much opportunity to adjust the look of the plots to target different uses, but also the ability to establish a consistent default (and look). These plots lend themselves to both notebooks (as here), web, and interactive interfaces (Shiny, observablejs, etc) to investigate the networks themselves or use them to plot results. That gives flexibility to get at whatever the particular question is. Though causal networks are not a typical way to present outputs, here we see that they can be incredibly powerful for not only showing the relationships, but understanding how they change under different scenarios.