Machine Learning Powers Early Modern Text Study

University of Chicago Press Journals

In the last two decades, mass digitization has dramatically changed the landscape of scholarly research. The ability to search digital transcriptions of sources for specific keywords saves valuable time, and scholars are no longer confined to archives and libraries if they wish to comb through a text. However, with the spread of digital transcriptions come new concerns surrounding the labor required to enable such accessibility. A new article in The Sixteenth Century Journal suggests methods for researchers to obtain transcriptions of digitized early modern sources while also avoiding unethical labor practices.

"Unlocking the Digitized Archive of Early Modern Print: The Automatic Transcription of Early Modern Printed Books," by authors Serena Strecker and Kimberly Lifton, begins with a brief history of the two kinds of software used to produce transcriptions. Optical Character Recognition (OCR) software has proven itself well-suited to transcribing late 19th-century and 20th-century works, but the irregularities common in early modern print render OCR inadequate for reliable transcription of these sources.

Instead, early modern scholars have turned to Handwritten Text Recognition (HTR) technology. Transkribus, the leading HTR software, allows users either to use publicly available transcription models or to train their own. Strecker and Lifton compare various HTR models on a selection of pages from four 16th-century exempla collections. In doing so, they highlight how Transkribus lets a scholar create, in five basic steps, a purpose-built transcription model tailored to the source they wish to study.

Using Transkribus's public models, researchers can generate the training data necessary to train their own highly accurate models. This process, the authors argue, makes it "no longer necessary nor desirable" to rely on outsourced labor, such as the labor of graduate students or workers in the Global South.

"With the accurate and automated transcription of early modern print no longer a goal but a reality, the field of early modern studies must consider what combination of human labor and machine learning technology will be accepted, supported, and will ultimately shape the future of research," the authors conclude. "Only by insisting on ethical labor practices can scholars avoid either exacerbating inequities within the academic hierarchy or perpetuating the lasting inequalities of colonialism."


The Sixteenth Century Journal (SCJ) publishes research and inquiry related to the sixteenth century broadly defined (1450-1650) in all fields and all world regions. The international readership and authorship of the SCJ include leaders in their fields as well as early career scholars. As its subtitle, The Journal of Early Modern Studies, indicates, the SCJ is an interdisciplinary journal, with articles in history, art history, literature, religious studies, gender studies, the history of science, music, material culture, and many other fields.
