Degree Name
MS (Master of Science)
Program
Computer and Information Sciences
Date of Award
12-2025
Committee Chair or Co-Chairs
Ahmed AL Doulat
Committee Members
Brian Bennett, Chelsie Dubay
Abstract
This thesis explores the cultural influence of historical events on English-language fiction published between 1820 and 1929. Using a corpus of 30,256 digitized books from Project Gutenberg, Latent Dirichlet Allocation (LDA) topic modeling was applied to identify recurring themes across eleven decades. The study sought to determine whether historically significant events could be detected within fictional narratives. One clear instance emerged: Napoleon Bonaparte and the Napoleonic Wars appeared explicitly in the 1820s corpus. Beyond this, several thematic patterns were observed that plausibly align with the events and cultural shifts of their time, such as maritime language in the 1840s, national identity in the 1880s, and youth-oriented dialogue in the early 20th century. The findings highlight both the potential and the limitations of culturomic approaches to fiction. This research therefore functions as a proof of concept, demonstrating how open-source tools and large digital corpora can be leveraged to study cultural history through literature.
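The abstract does not say how books were assigned to the eleven decades (the 1820s through the 1920s). The sketch below shows one way the decade partitioning might be done. It assumes that each book carries a single known publication year. The `(title, year)` record format and the `decade_of` and `bin_by_decade` helpers are hypothetical and do not come from the thesis.

```python
from collections import defaultdict


def decade_of(year: int) -> int:
    """Map a year to the first year of its decade, e.g. 1847 -> 1840."""
    return (year // 10) * 10


def bin_by_decade(books, start=1820, end=1929):
    """Group book titles into decade buckets.

    `books` is an iterable of hypothetical (title, publication_year) pairs;
    years outside [start, end] are skipped. Returns a dict ordered by decade.
    """
    bins = defaultdict(list)
    for title, year in books:
        if start <= year <= end:
            bins[decade_of(year)].append(title)
    return dict(sorted(bins.items()))


# The 1820-1929 window spans eleven decade buckets: 1820, 1830, ..., 1920.
DECADES = list(range(1820, 1930, 10))
```

Each decade's bucket could then be tokenized and passed to an LDA model, for example Gensim's. The abstract does not state whether the study trained one model per decade or a single model over the whole corpus.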
Document Type
Thesis - unrestricted
Recommended Citation
Freeman, Michael A., "Topic Modeling and Culturomic Analysis of 30,000 Books Over 100 Years Using Gensim" (2025). Electronic Theses and Dissertations. Paper 4641. https://dc.etsu.edu/etd/4641
Copyright
Copyright by the authors.
Included in
Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons, Data Science Commons, Numerical Analysis and Scientific Computing Commons, Other Computer Sciences Commons, Programming Languages and Compilers Commons, Theory and Algorithms Commons