Stochastic Shakespeare: Sonnets Produced by Markov Chains in R

Update with markovifyR

Thanks to Maëlle Salmon, who pointed me to a post by Julia Silge and Nick Larsen, I tried this again with the markovifyR package, and the results are unbelievable. See the bottom of the post for an updated batch of sonnets!

Original post

I recently saw Katie Jolly’s post, in which she produced Rupi Kaur-style poems using Markov Chains in R. I absolutely loved it, so I decided to try it with Shakespeare’s 154 sonnets, using her post as a skeleton.

Downloading and cleaning the sonnets

In addition to markovchain and tidyverse, I’m going to use the gutenbergr package to download the sonnets.

library(gutenbergr)
library(tidyverse) 
library(markovchain) 
shakespeare <- gutenberg_works(title == "Shakespeare's Sonnets") %>% 
  pull(gutenberg_id) %>% 
  gutenberg_download(verbose = FALSE)

shakespeare
## # A tibble: 2,625 x 2
##    gutenberg_id text                                          
##           <int> <chr>                                         
##  1         1041 THE SONNETS                                   
##  2         1041 ""                                            
##  3         1041 by William Shakespeare                        
##  4         1041 ""                                            
##  5         1041 ""                                            
##  6         1041 ""                                            
##  7         1041 ""                                            
##  8         1041 "  I"                                         
##  9         1041 ""                                            
## 10         1041 "  From fairest creatures we desire increase,"
## # … with 2,615 more rows

Because the sonnets come from gutenbergr, they’re already in a nice format to work with. I just need to do a little cleanup: like Katie, I remove the punctuation, but I also have to clear out the sonnet titles (Roman numerals) and some title information.

#  a little helper operator: the negation of %in%
`%not_in%` <- function(lhs, rhs) {
  !(lhs %in% rhs)
}

#  replace double dashes with spaces, strip punctuation (keeping apostrophes)
#  and the sonnets' Roman numerals, lowercase, drop title lines,
#  and split into a vector of words
bills_words <- shakespeare %>% 
  mutate(text = text %>% 
    str_replace_all("--", " ") %>% 
    str_replace_all("[^[:alnum:][:space:]']", "") %>% 
    str_trim() %>% 
    #  numerals are still uppercase here, before str_to_lower()
    str_replace_all("^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$", 
                    "") %>% 
    str_to_lower()) %>% 
  filter(text %not_in% c("the sonnets", "by william shakespeare", "", " ")) %>% 
  pull(text) %>% 
  #  split on runs of whitespace so doubled spaces don't create empty "words"
  str_split("\\s+") %>% 
  unlist() 
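Two details of that cleaning step are easy to check on their own. The sketch below uses base R's grepl() and strsplit() (instead of stringr, so it runs without extra packages) on a few hand-picked strings rather than the real Gutenberg text. It shows that the Roman-numeral pattern only matches uppercase numerals, which is why it has to run before str_to_lower(), and that splitting on a single space, rather than on runs of whitespace, leaves an empty string wherever two spaces sit side by side.

```r
# The Roman-numeral pattern from the cleaning step
roman_pattern <- "^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"

# Hand-picked test strings: numeral titles, a verse line, and a lowercase "i"
test_lines <- c("I", "XIV", "CLIV", "From fairest creatures we desire increase", "i")
is_title <- grepl(roman_pattern, test_lines)
print(is_title)
# [1]  TRUE  TRUE  TRUE FALSE FALSE

# A single-space split keeps an empty string between doubled spaces;
# splitting on runs of whitespace does not
single_split <- strsplit("thy  glass", " ")[[1]]
run_split <- strsplit("thy  glass", "\\s+")[[1]]
print(single_split)
print(run_split)
```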

I’m also going to extract the punctuation and tally how often each mark appears, so I can sample end-of-line punctuation when I assemble the sonnets later.

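#  pull out every punctuation mark (apostrophes excluded, as above)
punctuation <- shakespeare %>% 
  pull(text) %>% 
  str_extract_all("[^[:alnum:][:space:]']") %>% 
  unlist()

#  relative frequency of each mark, ignoring dashes and parentheses
punctuation_probs <- punctuation[punctuation %not_in% c("-", "(", ")")] %>% 
  table() %>% 
  prop.table()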
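To make the sampling step concrete, here is the same tally on the first quatrain of Sonnet 1, done in base R with regmatches() and gregexpr() in place of stringr. The resulting proportions play the role of punctuation_probs: they are the weights sample() uses when picking end-of-line punctuation.

```r
# First quatrain of Sonnet 1, as plain strings
quatrain <- c("From fairest creatures we desire increase,",
              "That thereby beauty's rose might never die,",
              "But as the riper should by time decease,",
              "His tender heir might bear his memory:")

# Pull out every character that is not a letter, digit, space, or apostrophe
marks <- unlist(regmatches(quatrain, gregexpr("[^[:alnum:][:space:]']", quatrain)))

# Proportion of each mark: three commas and one colon
mark_probs <- prop.table(table(marks))
print(mark_probs)

# Draw end-of-line punctuation using those proportions as weights
set.seed(1)
drawn <- sample(names(mark_probs), size = 8, replace = TRUE, prob = mark_probs)
print(drawn)
```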

Fit the Markov Chain

Now fit the Markov Chain with the vector of words.

#  fit a Markov Chain
sonnet_chain <- markovchainFit(bills_words)
cat(markovchainSequence(n = 10, markovchain = sonnet_chain$estimate), sep = " ")

with her audit by time whose shadow since i cannot
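markovchainFit() uses maximum-likelihood estimation by default, and for a first-order chain that estimate is just the row-normalised counts of which word follows which. Below is a base-R sketch of that idea on two lines of Sonnet 18 standing in for bills_words. It illustrates the estimate, not the markovchain package's own implementation, and generate() is a hypothetical helper. Because bills_words is one long vector, the real fit also counts the step from the last word of one line to the first word of the next.

```r
# Toy word vector standing in for bills_words (two lines of Sonnet 18)
words <- c("shall", "i", "compare", "thee", "to", "a", "summer's", "day",
           "thou", "art", "more", "lovely", "and", "more", "temperate")

# Pair each word with the word that follows it and count the pairs
from <- head(words, -1)
to <- tail(words, -1)
states <- sort(unique(words))
counts <- table(factor(from, levels = states), factor(to, levels = states))

# Divide each row by its total to get transition probabilities.
# "temperate" is never followed by anything, so its row stays all zeros.
trans <- unclass(counts) / pmax(rowSums(counts), 1)

# Hypothetical helper: walk the chain from a start word, stopping early
# if the current word has no observed successor
generate <- function(start, n) {
  out <- start
  while (length(out) < n) {
    probs <- trans[tail(out, 1), ]
    if (sum(probs) == 0) break
    out <- c(out, sample(states, size = 1, prob = probs))
  }
  out
}

trans["more", c("lovely", "temperate")]
generate("shall", 6)
```

Every word in "shall i compare thee to a" has exactly one observed successor, so that walk is deterministic; from "more" the chain splits 50/50 between "lovely" and "temperate".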

Finally, here are a couple of functions that piece lines together so they look like a sonnet. They use walk() from purrr to print the lines, since printing is a side effect. No, they’re not actually iambic pentameter :(

write_a_line <- function(n_lines = 1) {
  walk(1:n_lines, function(.x) {
    #  put together a line of more or less average length (6 to 9 words)
    line <- markovchainSequence(n = sample(6:9, 1), 
                                markovchain = sonnet_chain$estimate) %>% 
      paste(collapse = " ")
    
    #  end the last line with a period; otherwise sample end-of-line
    #  punctuation based on how often each mark occurs in the sonnets
    end_punctuation <- ifelse(.x == n_lines, ".", 
                              sample(names(punctuation_probs), 
                                     size = 1, 
                                     prob = punctuation_probs))
    cat(paste0(line, end_punctuation, "  \n"))
  })
}

psuedosonnet <- function() {
  #  three quatrains, each followed by a blank line
  walk(1:3, function(.x) {
    write_a_line(4)
    cat("  \n")
  })
  
  #  and a closing couplet
  write_a_line(2)
}
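Because write_a_line() and psuedosonnet() print as they go, their output is hard to reuse or test. If you’d rather keep the sonnet as a value, here is a sketch of a variant that returns a 14-line character vector: three quatrains and a couplet, with a period closing each stanza. To keep it self-contained, make_line() is a hypothetical stand-in that draws random words from a tiny vocabulary instead of calling markovchainSequence(), and the end marks are a fixed set rather than the sampled punctuation_probs.

```r
# Hypothetical stand-in for a Markov-generated line: 6 to 9 random words
toy_vocab <- c("love", "time", "thy", "beauty", "doth", "fair", "eyes", "summer")
make_line <- function() {
  paste(sample(toy_vocab, sample(6:9, 1), replace = TRUE), collapse = " ")
}

# Build a sonnet as a character vector: three quatrains and a closing
# couplet; every stanza ends with a period, other lines get a random mark
build_sonnet <- function(stanza_sizes = c(4, 4, 4, 2),
                         end_marks = c(",", ";", ":")) {
  stanzas <- lapply(stanza_sizes, function(size) {
    body <- vapply(seq_len(size), function(i) make_line(), character(1))
    ends <- c(sample(end_marks, size - 1, replace = TRUE), ".")
    paste0(body, ends)
  })
  unlist(stanzas)
}

set.seed(154)
sonnet <- build_sonnet()
cat(sonnet, sep = "\n")
```

Returning the lines keeps generation separate from layout, so the stanza breaks can be added back only when printing.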

Generating the sonnets

Let’s try it out.

Pseudosonnet 1:

set.seed(154)
psuedosonnet()

wing and folly doctorlike controlling skill who knows the,
again assur’d and given grace but when all date:
shook three till my mind these blenches,
in sweetest things deem’d not so dear delight.

the mouths of love swearing in these,
and soon to woe compar’d with;
at your bounty cherish she is ’greeing and yet,
by lies buried age and simple savour pitiful thrivers.

am now with intelligence as the stage presenteth nought,
which it for me suffering my?
days making lascivious grace is not thy days are:
expired for myself and tombs of my.

right my heart another gay why dost,
of my use your self thou.

Pseudosonnet 2:

psuedosonnet()

past reason hunted and checked with thee;
or changes right or at least so strong,
the ashes of glass his prescriptions,
him to decay lest eyes be false borrowed face.

and all kinds of love her blood that,
o none but in these offices so.
and his spring within the view!
to weigh how thy book of.

by a tomb of public means which,
or who moving points on to?
can yet do see my state and age unbred?
muse brings forth the orient when.

thy hair the world away to show thee:
one by day when in love things.

Pseudosonnet 3:

psuedosonnet()

or else receiv’st not fade die but.
grief to mend to stay whilst;
reeks i hold thee devouring time that,
with you pattern of such day my love happy.

hate me last so bright who leaves thy beauty.
each friend and your true that one that,
whilst like a vengeful canker blooms?
and therefore my bad thus to.

should i not counted fair whose,
state but day arising from hands to;
of flower with his presence grace impiety,
only herald to the beast that plea.

eyelids open wide in posterity thou send’st from,
nor shall have err’d and kept unused.

Pseudosonnet 4:

psuedosonnet()

that pine to my home of thine own,
love being crown’d with that man’s;
what need’st thou usurer why of this thy love’s,
with sovereign cure i question make him that i.

to the ear confounds him have drawn,
fight and beauty should my muse and strangely.
liv’d alone stands but one so,
whilst i alone o in the lines that brightness.

creature the lease dost wake elsewhere but me;
evil tempteth my love that which still,
i am attainted that in the thing to.
it thy sweet deaths be but those.

that our appetite which he is hanging still:
inhabit on death to make one.

Pseudosonnet 5:

psuedosonnet()

and confounds in me not my?
alchemy to all my comfort now i send,
the pleasure thine own desert and no hatred in,
with disdain lest guilty goddess go since.

seem right myself i’ll fight after you,
wane so love and simple truth.
ah do dispense you like of time despite,
to store harsh featureless and by.

my reason is daily new lo,
love thee i’ll run and in themselves forsake me;
thou through windows to play as ’twixt a,
me both and to love of.

still my bonds in disgrace with newer:
morrow to slavery my side against the weary.

Update

Alright, this time I’m going to try it with the markovifyR package. I’m basically doing the same cleaning as above, but this time I’ll feed entire lines, punctuation and all, into the Markov model. The markovify_text() function also accepts start words, so I thought it might look good to sample 100 starting words from the model and construct the lines from there.

library(markovifyR)
#  same as above, but maintain as sentences and keep punctuation
bills_sentences <- shakespeare %>% 
  mutate(text = text %>% 
    str_trim() %>% 
    str_replace_all("--", " ") %>% 
    str_replace_all("^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$", 
                    "") %>% 
    str_to_lower()) %>% 
  filter(text %not_in% c("the sonnets", "by william shakespeare", "", " "))

#  fit the Markov Chain
markovify_model <-
  generate_markovify_model(
    input_text = bills_sentences$text,
    markov_state_size = 2L,
    max_overlap_total = 25,
    max_overlap_ratio = .85
  )
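markov_state_size = 2L means each state is the previous two words, so the next word depends on a pair rather than a single word. Here is a base-R sketch of that idea on three lines of Sonnet 18, lowercased with punctuation kept like bills_sentences. It is not markovify’s implementation, and successors and walk_chain() are hypothetical names. With this little text every pair has exactly one continuation, so the walk simply replays a source line; as I understand it, that is the situation the max_overlap_total and max_overlap_ratio settings guard against, by rejecting generated sentences that overlap too much with the original.

```r
# Three lines of Sonnet 18, lowercased with punctuation kept
corpus <- c("shall i compare thee to a summer's day?",
            "thou art more lovely and more temperate:",
            "rough winds do shake the darling buds of may,")

# Record every (two-word state -> next word) pair, line by line
pairs <- do.call(rbind, lapply(strsplit(corpus, " "), function(w) {
  if (length(w) < 3) return(NULL)
  idx <- seq_len(length(w) - 2)
  data.frame(state = paste(w[idx], w[idx + 1]), next_word = w[idx + 2],
             stringsAsFactors = FALSE)
}))

# Group the observed next words by state; this list is the whole "model"
successors <- split(pairs$next_word, pairs$state)

# Hypothetical helper: start from two words and keep sampling a next word
# for the current two-word state until no continuation was observed
walk_chain <- function(w1, w2, max_words = 12) {
  out <- c(w1, w2)
  while (length(out) < max_words) {
    nexts <- successors[[paste(tail(out, 2), collapse = " ")]]
    if (is.null(nexts)) break
    out <- c(out, nexts[sample.int(length(nexts), 1)])
  }
  paste(out, collapse = " ")
}

walk_chain("thou", "art")
```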

#  generate a sonnet
markovify_sonnet <- function() {
  lines <- markovify_text(
      markov_model = markovify_model,
      maximum_sentence_length = 75,
      output_column_name = 'sonnet_line',
      count = 50,
      tries = 1000, 
      start_words = sample(generate_start_words(markovify_model)$wordStart, 100),
      only_distinct = TRUE,
      return_message = FALSE) %>% 
    #  keep lines of 6 to 9 words, then pick 14 of them at random
    filter(str_count(sonnet_line, "\\w+") > 5 & str_count(sonnet_line, "\\w+") < 10) %>% 
    slice(sample(1:n(), 14)) %>% 
    mutate(id = 1:n()) %>% 
    select(id, sonnet_line) 
  
  #  make sure the last line ends a sentence: drop a trailing comma, colon,
  #  or semicolon, then add a period unless it already ends in . ! or ?
  last_line <- str_replace(lines$sonnet_line[14], "[,;:]$", "")
  if (!str_detect(last_line, "[.!?]$")) last_line <- paste0(last_line, ".")
  lines$sonnet_line[14] <- last_line
  
  #  print in a sonnet-like format
  walk(1:14, function(.x) {
    cat(lines$sonnet_line[.x], " \n")
    
    #  add a blank line after every four lines
    if (.x %% 4 == 0) cat("\n") 
  })
}
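One fragile spot in markovify_sonnet(): if fewer than 14 of the generated lines survive the length filter, slice(sample(1:n(), 14)) stops with an error, because sample() can’t draw 14 items without replacement from a smaller pool. Here is a base-R sketch of the filter-then-sample step that checks for that case first. pick_sonnet_lines() is a hypothetical helper, and counting runs of word characters mirrors the str_count(sonnet_line, "\\w+") filter.

```r
# Hypothetical helper: keep candidate lines with min_words to max_words words
# and draw n_lines of them at random, with a clear error if too few qualify
pick_sonnet_lines <- function(candidates, n_lines = 14,
                              min_words = 6, max_words = 9) {
  # count runs of word characters, like str_count(x, "\\w+")
  n_words <- lengths(regmatches(candidates,
                                gregexpr("\\w+", candidates, perl = TRUE)))
  keep <- candidates[n_words >= min_words & n_words <= max_words]
  if (length(keep) < n_lines) {
    stop("only ", length(keep), " lines passed the length filter; ",
         "generate more candidates and try again")
  }
  keep[sample.int(length(keep), n_lines)]
}

# Toy candidates: 20 seven-word lines and 5 three-word lines
long_lines <- paste("line", 1:20, "of the sonnet goes here")
short_lines <- paste("too short", 1:5)
set.seed(14)
chosen <- pick_sonnet_lines(c(long_lines, short_lines))
```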

Markovify Sonnet 1:

markovify_sonnet()

borne on the ashes of his spring;
then, beauteous niggard, why dost thou use
stealing away the treasure of his great verse,
for through the painter and hath stell’d,

in thy glass and she with me,
who, in despite of space i would be forgot,
accuse me thus: that i in your decay
of him, i’ll live in doubt,

mad in pursuit and in my heart.
grant, if thou survive my well-contented day,
stealing away the treasure of his living hue?
feeding on that which he toil’d:

there lives more life in one of thine,
spending again what is most evident.

Markovify Sonnet 2:

markovify_sonnet()

nor taste, nor smell, desire to be won,
unlearned in the least of them my life decay;
although thou steal thy sweet self too cruel:
say that thou none lov’st is most dear,

too base of thee hast left behind,
will be true despite thy scythe and crooked knife.
these offices, so oft as mine,
may still seem love to thee,

deserves the travail of a conquer’d woe;
although thou steal thy sweet self grow’st.
come in the distraction of this excess,
drawn after you, you pattern of all his growth

be, as thy sweet will making addition thus.
is more than i have scanted all.

Markovify Sonnet 3:

markovify_sonnet()

poor soul, the centre of my lovers gone,
what hast thou this becoming of their faces,
who lead thee in their body’s force,
want nothing that the thought of hearts shouldst owe.

eat up thy charge? is this thy golden time.
stealing away the treasure of thy lusty days;
unlearned in the praise thereof spends all his might,
poor soul, the centre of my heart;

for you in me thou lov’st those
deserves the travail of a former child!
when as thy sweet self resemble,
the prey of worms, my body that he may

hate of my lameness, and i are one;
mad in pursuit and in my verse

Markovify Sonnet 4:

markovify_sonnet()

speak of the thing she would have stay;
within the level of your love.
oaths of thy love, though much, is not so?
nay, if thou thy sins are;

looking on thee in such sort,
still losing when i took my way,
kind is my love, you know,
beshrew that heart that makes my heart to sway?

thus have i slept in your decay
hers by thy picture in my purpose bred,
both find each other, and i desperate now approve
love’s eye is my invention spent,

some fresher stamp of the dead, which now appear
who all their praises are but prophecies

Markovify Sonnet 5:

markovify_sonnet()

but that which governs me to thee resort.
or whether shall i live, supposing thou art old,
advantage on the rarities of nature’s truth,
hast thou, the master mistress of my harmful deeds,

thou best of dearest, and mine eye awake:
although thou steal thee all thy glory live.
ah! yet doth beauty like a makeless wife;
nay, if you were once unkind befriends me now,

whereto all bonds do tie me day by day,
are vanishing, or vanished out of their faces,
presume not on thy soft cheek for complexion dwells
if thou dost seek to have years told:

although i swear it to myself i do,
lo! in the world’s fresh ornament.

Well, call me Shockedspeare.

Exit, pursued by a bear

Malcolm Barrett
PhD Student in Epidemiology

I am an R developer and a PhD student in Epidemiology at the University of Southern California. My work in public health has spanned on-ground clinical education and research for clinical and cohort studies. Previously, I was an intern at RStudio, and I served two years in AmeriCorps at federally qualified health centers in Michigan and New York City.