Macroanalysis: Potential and a Pitfall

In 1995, Vassar professor Donald Foster used stylometrics to assert that the previously anonymous “A Funeral Elegy” was written by William Shakespeare.[1] In May 2002, an article by Gilles D. Monsarrat in The Review of English Studies refuted Foster’s claim, using close reading to conclude that the poem’s author was another English dramatist, John Ford.[2] Upon reviewing Monsarrat’s article, Foster recanted his original claim in June 2002, noting, “I know good evidence when I see it and I predict that Monsarrat will win the day.”[3] Matthew Jockers refers to this incident, this “kerfuffle,” as a public failure of computing in the humanities, yet it frames the argument he proceeds to make in his book, Macroanalysis: Digital Methods and Literary History.[4]

In my roles as both student and teacher, I used close reading, the “…careful observation, the sustained, concentrated reading of text,” to decipher meaning.[5] Jockers, an Associate Dean and Professor of English at the University of Nebraska, attempts to introduce fellow literary historians to the possibilities of distant reading, or macroanalysis: the process of understanding literature not by studying particular texts but by aggregating and analyzing large amounts of data.[6] To be clear, Jockers never advocates the primacy of one method over another; rather, he suggests a “blended approach,” hoping that the new and different evidence macroanalysis produces will help literary historians broaden their understanding of the larger contexts in which individual works of literature exist.[7]

The rest of Macroanalysis is devoted to illustrating how this new and different evidence is produced, with Jockers using a database of Irish American literature as his corpus. As a historian interested in digital methods (but lacking experience working with them), I focused more on tracking the strategies Jockers applied to his corpus than on the results those strategies produced. In a straightforward manner, Jockers introduces readers to text analysis, topic modeling, measuring lexical richness, quantitative authorship attribution, linear regression analysis, and probabilistic latent semantic indexing. The names of these strategies sound daunting, and readers’ eyes may glaze over when they first come across them, but the manner in which they are employed is easily comprehended. Having studied statistics and used SYSTAT in the past, I was able to follow the sections on F-tests, p-values, and correlation coefficients easily.[8]

I suspect that those who come to read Macroanalysis are already “members of the choir,” so to speak. For those interested in digital methods and the possibilities they may hold for their work, this text does a great job of explaining a range of available digital methods and the evidence each can yield. As I read, there were several moments when I considered integrating these methods into my future historical research. That said, I am not necessarily convinced that the book succeeds in swaying skeptics within Jockers’s discipline. Jockers is writing with an audience of fellow literary scholars in mind, and he is careful to make no attempt to convince them to abandon close reading. Much of the evidence produced by applying digital methods throughout the text does raise questions for future research and interpretive work. Yet, for all his straightforward explanation of the selection, use, and results of digital methods, Jockers never sets his colleagues on a course toward using these methods themselves. What do next steps look like for a seasoned literary scholar who reads Macroanalysis and is interested in attempting text analysis? How do they create a corpus? What should they do if they are on a campus that lacks resources for humanities computing? These are questions that, if addressed, could go a long way toward expanding the ranks of digital literary scholars.

