“Gender and Big Data: Finding or Making Stereotypes?”
CHPDC Annual Lecture
Professor of English and Director of the Initiative for Digital Humanities, Media, and Culture, Texas A&M University
Friday, April 15, at 4:00 p.m., Bunge Room, 4207 Helen C. White Hall
Reception with light refreshments to follow.
In his book Macroanalysis, Matthew Jockers argues that we have reached a “tipping point.” Now that so much digital data is available, we can apply the techniques and methodologies developed for exploring big data: text mining, topic modeling, machine learning, named entity recognition, etc. Two problems confront digital literary historians of women writers who wish to apply these methodologies. First, the number of women writers who published works before 1800 in Britain and America, as well as the number of their publications that have been preserved, is small compared to the corresponding numbers for men, a problem compounded by how few works by early modern women writers are currently being digitized: roughly 4% of the 307,000 volumes in Early English Books Online and Eighteenth Century Collections Online were written by women. Second, many of the data analysts currently comparing what they call “female writing” to “male writing” propagate rather than interrogate stereotypes about women and women writers. Sociologists have worked on such problems, and in this talk I will outline some of their strategies and discuss how literary critics who wish to perform macroanalysis might make use of them. Data scientists in the commercial world have worked on the problem of representing minorities “fairly” even when they are represented by a small sample. Thanks to the robust history of feminist theory and criticism, we have the means for generating vocabularies, taxonomies, and ontologies for semantic searching and supervised topic modeling that differ from those generated through big-data techniques that naïvely privilege historically oppressive discourses. In addition, the need to shift from quantitative to qualitative analysis (and back again) is heightened when analyzing textual data produced by minorities. I argue that, once again, the concern for social justice enhances intellectual work by effectively demonstrating the inadequacies of claiming “new” discoveries based upon “statistical significance” alone.
DHRN is a part of the Borghesi-Mellon Interdisciplinary Workshops in the Humanities, sponsored by the Center for the Humanities at the University of Wisconsin-Madison, with support from Nancy and David Borghesi and the Andrew W. Mellon Foundation.