I’m pleased to announce that ggdag 0.2.0 is now on CRAN! ggdag links the dagitty package, which contains powerful algorithms for analyzing causal DAGs, with the unlimited flexibility of ggplot2. ggdag converts dagitty objects to a tidy DAG data structure, which allows you to both analyze your DAG and plot it easily in ggplot2. Let’s look at an example: a causal diagram of the effect of smoking on cardiac arrest.
# packages used by the examples in this post
library(ggdag)
library(ggplot2)

smoking_ca_dag <- dagify(
  cardiacarrest ~ cholesterol,
  cholesterol ~ smoking + weight,
  smoking ~ unhealthy,
  weight ~ unhealthy,
  labels = c(
    "cardiacarrest" = "Cardiac\n Arrest",
    "smoking" = "Smoking",
    "cholesterol" = "Cholesterol",
    "unhealthy" = "Unhealthy\n Lifestyle",
    "weight" = "Weight"
  ),
  latent = "unhealthy",
  exposure = "smoking",
  outcome = "cardiacarrest"
) %>%
  tidy_dagitty()

smoking_ca_dag
# A DAG with 5 nodes and 5 edges
#
# Exposure: smoking
# Outcome: cardiacarrest
# Latent Variable: unhealthy
#
# A tibble: 6 × 9
  name              x     y direction to             xend   yend circular label
  <chr>         <dbl> <dbl> <fct>     <chr>         <dbl>  <dbl> <lgl>    <chr>
1 cardiacarrest -4.45 0.709 <NA>      <NA>          NA    NA     FALSE    "Card…
2 cholesterol   -3.31 1.41  ->        cardiacarrest -4.45  0.709 FALSE    "Chol…
3 smoking       -2.75 2.59  ->        cholesterol   -3.31  1.41  FALSE    "Smok…
4 unhealthy     -1.59 2.47  ->        smoking       -2.75  2.59  FALSE    "Unhe…
5 unhealthy     -1.59 2.47  ->        weight        -2.00  1.38  FALSE    "Unhe…
6 weight        -2.00 1.38  ->        cholesterol   -3.31  1.41  FALSE    "Weig…
The tidy DAG structure looks like a tibble. ggdag 0.2.0 also prints some information about the DAG at the top.
ggdag(smoking_ca_dag, text = FALSE, use_labels = "label")
Here, smoking does increase the risk of cardiac arrest, but the relationship is also confounded by an unmeasured variable, a tendency towards an unhealthy lifestyle. That means that there are two open paths from smoking to cardiac arrest: the causal path through cholesterol and the backdoor path through weight. (This DAG is probably not quite right, because smoking also affects weight, but we’ll leave it as is for demonstration purposes.)
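One way to check the two open paths is to ask dagitty directly. The sketch below rebuilds the same DAG in dagitty's own syntax so that it runs on its own; the object names `dag`, `open_paths`, and `adjusted` are illustrative and not part of the original example.

```r
library(dagitty)

# The smoking DAG from above, written in dagitty syntax
dag <- dagitty("dag {
  smoking [exposure]
  cardiacarrest [outcome]
  unhealthy [latent]
  cholesterol -> cardiacarrest
  smoking -> cholesterol
  weight -> cholesterol
  unhealthy -> smoking
  unhealthy -> weight
}")

# List every path between exposure and outcome, and whether each is open
open_paths <- paths(dag, from = "smoking", to = "cardiacarrest")
open_paths$paths
open_paths$open

# Conditioning on weight (a measured variable on the backdoor path)
# and re-checking which paths remain open
adjusted <- paths(dag, from = "smoking", to = "cardiacarrest", Z = "weight")
adjusted$open
```

With nothing adjusted for, both paths should be reported as open. Because `unhealthy` is latent, `weight` is the only measured variable on the backdoor path, so conditioning on it is what closes that path while leaving the causal path through cholesterol open.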
If you used ggdag 0.1.0, you may notice a big difference here: ggdag plots now look a lot more like base ggplot2 plots. While this has been the case in the development version for some time, one of the bigger mistakes in the initial release of ggdag was too much out-of-box customization. ggdag now does a much better job getting out of the way of ggplot2’s incredible system for aesthetics and themes. Let’s analyze the paths in the smoking DAG but take advantage of tools from ggplot2 to customize the plot.
ggdag_paths(smoking_ca_dag, text = FALSE, use_labels = "label", shadow = TRUE) +
  theme_dag(base_size = 14) +
  theme(legend.position = "none", strip.text = element_blank()) +
  # set node aesthetics
  scale_color_manual(values = "#0072B2", na.value = "grey80") +
  # set label aesthetics
  scale_fill_manual(values = "#0072B2", na.value = "grey80") +
  # set arrow aesthetics
  ggraph::scale_edge_color_manual(values = "#0072B2", na.value = "grey80") +
  ggtitle("Open paths from smoking to cardiac arrest")
There are also many new themes available, each of which is prefixed with
What else is new?
This release ensures compatibility with ggraph 2.0.0 and also fixes a number of bugs (see the news section of the pkgdown site). In addition to better support for ggplot2 aesthetic functions, ggdag also now has better support for working directly in tidygraph/ggraph. ggraph is essential to ggdag’s geoms, but you might prefer to work with the full toolkit from that package.
library(tidygraph)
library(ggraph)

tblgraph_dag <- as_tbl_graph(smoking_ca_dag)
tblgraph_dag
# A tbl_graph: 5 nodes and 5 edges
#
# A directed acyclic simple graph with 1 component
#
# A tibble: 5 × 1
  name
  <chr>
1 cholesterol
2 smoking
3 unhealthy
4 weight
5 cardiacarrest
#
# A tibble: 5 × 9
   from    to     x     y direction  xend  yend circular label
  <int> <int> <dbl> <dbl> <chr>     <dbl> <dbl> <lgl>    <chr>
1     1     5 -3.31  1.41 ->        -4.45 0.709 FALSE    "Cholesterol"
2     2     1 -2.75  2.59 ->        -3.31 1.41  FALSE    "Smoking"
3     3     2 -1.59  2.47 ->        -2.75 2.59  FALSE    "Unhealthy\n Lifestyle"
# ℹ 2 more rows
tblgraph_dag %>%
  ggraph() +
  geom_node_text(aes(label = name)) +
  geom_edge_link(
    aes(
      start_cap = label_rect(node1.name),
      end_cap = label_rect(node2.name)
    ),
    arrow = arrow()
  ) +
  theme_graph()
tidygraph is designed to work with network data rather than causal diagrams, so many of the features are not as useful for causal DAGs as the algorithms from dagitty. However, tidygraph and ggraph have many tools for manipulating network-like data that are very powerful.
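As a small illustration of that kind of manipulation, here is a sketch that counts each node's parents and children with tidygraph verbs. To stay self-contained it rebuilds the smoking DAG from a plain edge list instead of reusing `tblgraph_dag`; the names `edges`, `dag_graph`, and `node_summary` are made up for this example.

```r
library(tidygraph)
library(dplyr)

# Edge list for the smoking DAG (parent -> child)
edges <- data.frame(
  from = c("cholesterol", "smoking", "weight", "unhealthy", "unhealthy"),
  to   = c("cardiacarrest", "cholesterol", "cholesterol", "smoking", "weight")
)

dag_graph <- as_tbl_graph(edges, directed = TRUE)

# Work on the node table and add in-degree and out-degree columns
node_summary <- dag_graph %>%
  activate(nodes) %>%
  mutate(
    n_parents  = centrality_degree(mode = "in"),
    n_children = centrality_degree(mode = "out")
  ) %>%
  as_tibble()

node_summary
```

Degree counts like these are generic network summaries rather than causal quantities, which is the trade-off described above: handy for exploring the graph, but no substitute for dagitty's path and adjustment algorithms.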
Miss the old look?
A lot has changed in the look of ggdag, but the old style hasn’t gone away. You can set the old theme with theme_dag_gray() and get the stylized nodes with geom_dag_node() (instead of the default node geom) or with the stylized argument in the quick plotting functions.
ggdag(confounder_triangle(), stylized = TRUE) + theme_dag_gray()