Posts

The annual meeting of the Society for Epidemiologic Research (SER) took place June 18-21. The past two years, I’ve collected Twitter data (2018, 2019). The data were collected with the excellent rtweet package, and the data collection code was based on related code by Mike Kearney, the author of rtweet.

Setup

# for everything else :)
library(tidyverse)
# for tidy eval
library(rlang)
# for labeling tweets in plots
library(ggrepel)
# for network graphs
library(ggraph)
library(tidygraph)
# for text analysis
library(tidytext)

Since the data were collected over several days, I’m going to read the saved data straight from GitHub.

CONTINUE READING

I’m pleased to announce the CRAN release of partition 0.1.0. partition is a fast and flexible data reduction framework that minimizes information loss and creates interpretable clusters. partition uses agglomerative clustering: it starts from the ground up, matching pairs of variables and assessing the amount of information that would be explained by their reduction. If the information is above a user-specified threshold, the data are reduced. This type of reduction is particularly useful in very redundant data, such as high-resolution genetic data.
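To make the idea concrete, here is a toy sketch of a single agglomerative step in base R. It is not the partition package's API or algorithm: `reduce_once()` is a hypothetical helper, and "information" is approximated here by the smallest squared correlation between each variable in the pair and the pair's row mean, which is an assumption made purely for illustration.

```r
# Toy sketch of one threshold-based agglomerative reduction step.
# NOT the partition package's implementation; reduce_once() and the
# information measure below are illustrative assumptions.
reduce_once <- function(df, threshold = 0.6) {
  # find the most strongly correlated pair of variables
  cors <- abs(cor(df))
  diag(cors) <- 0
  idx <- which(cors == max(cors), arr.ind = TRUE)[1, ]
  pair <- sort(colnames(df)[idx])

  # candidate reduction: the row mean of the pair
  reduced <- rowMeans(df[, pair])

  # stand-in for "information explained": the smaller squared correlation
  # between each original variable and the reduced variable
  info <- min(cor(df[, pair], reduced)^2)

  # leave the data untouched if the reduction loses too much information
  if (info < threshold) return(df)

  # otherwise replace the pair with the reduced variable
  out <- df[, setdiff(colnames(df), pair), drop = FALSE]
  out[[paste(pair, collapse = "_")]] <- reduced
  out
}

# simulated data: a and b are nearly redundant, c is independent noise
set.seed(1)
a <- rnorm(100)
df <- data.frame(a = a, b = a + rnorm(100, sd = 0.1), c = rnorm(100))

reduced_df <- reduce_once(df, threshold = 0.6)
names(reduced_df)
```

Calling `reduce_once()` repeatedly until the data stop changing would give the bottom-up behavior described above, under the same simplifying assumptions.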

CONTINUE READING

TL;DR: Why should I use here?

- The here package makes it easier to use sub-directories within projects
- It’s robust to other ways people open and run your code
- Like its base R cousin, file.path(), it writes paths safely across operating systems

Like a lot of people, when I learned R, I was taught to put setwd() and rm(list = ls()) at the beginning of scripts. Getting rid of any leftovers in the environment and setting the working directory so I can use relative paths made sense to me.
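As a minimal sketch of the path-building point in the TL;DR: `file.path()` joins path components relative to wherever the code is run, while `here::here()` builds the path from the project root. The directory and file names below are hypothetical, and the `here` call is guarded in case the package is not installed.

```r
# Base R: file.path() joins path components with "/" on every platform.
# "data/raw/survey.csv" is a hypothetical file used only for illustration.
rel_path <- file.path("data", "raw", "survey.csv")
rel_path

# here::here() builds the same kind of path, but anchored at the project
# root rather than the current working directory.
if (requireNamespace("here", quietly = TRUE)) {
  root_path <- here::here("data", "raw", "survey.csv")
  root_path
}
```

Because the second path does not depend on the working directory, no `setwd()` call is needed at the top of the script.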

CONTINUE READING

Last week, I presented ggdag at JSM in Vancouver. As you can imagine, I had a lot of conversations with people about DAGs, confounding, colliders, and all the types of bias that can arise in research. One strange type of bias came up a couple of times that I don’t see discussed very often: measuring either the effect you are studying (x) or a variable along a confounding pathway (z) incorrectly can make it appear as if there is an interaction between x and z, even if there isn’t one.

CONTINUE READING

I’m pleased to announce the release of ggdag 0.1.0 on CRAN! ggdag uses the powerful dagitty package to create and analyze structural causal models and plot them using ggplot2 and ggraph in a tidy, consistent, and easy manner. You can use dagitty objects directly in ggdag, but ggdag also includes wrappers to make DAGs using a more R-like syntax:

# install.packages("ggdag")
library(ggdag)
dag <- dagify(y ~ x + z, x ~ z) %>% tidy_dagitty()
dag
## # A tibble: 4 x 8
##   name      x     y direction to     xend  yend circular
##   <chr> <dbl> <dbl> <fct>     <chr> <dbl> <dbl> <lgl>
## 1 x 3.

CONTINUE READING

Update with markovifyR

Thanks to Maëlle Salmon, who referred me to this post by Julia Silge and Nick Larsen, I explored doing this using the markovifyR package, and the results are unbelievable. See the bottom of the post for an updated batch of sonnets!

Original post

I recently saw Katie Jolly’s post, in which she produced Rupi Kaur-style poems using Markov chains in R. I absolutely loved it, so I decided to try it with Shakespeare’s 154 sonnets using her post as a skeleton.

CONTINUE READING