Honors Program
University Honors
Date of Award
5-2025
Thesis Professor(s)
JeanMarie Hendrickson
Thesis Professor Department
Mathematics and Statistics
Thesis Reader(s)
Michael Garrett
Abstract
Machine learning is a method that uses statistical algorithms to identify patterns in data and make predictions from them. This study applies machine learning techniques to data on Major League Baseball (MLB) teams from 1998 to 2024. It has two goals: determining which factors most strongly influence a team's likelihood of reaching the postseason, and accurately predicting which teams do and do not qualify. Data exploration and unsupervised learning methods were used to uncover underlying patterns in team performance metrics and to identify potential contributors to team success. In particular, hierarchical and K-means clustering were used to group similarly performing teams and to identify variables with a strong influence on performance. Several supervised learning methods were then used to build predictive models. The dataset was randomly divided into a training set, used to fit the models, and a test set, used to evaluate their accuracy. Logistic regression, K-nearest neighbors (KNN) classification, linear discriminant analysis (LDA), non-linear functions, decision trees, bagging, and random forest models were constructed to classify teams and to evaluate the importance of the various predictors. Results indicate that ERA+ (park- and league-adjusted earned run average), runs allowed, OPS+ (park- and league-adjusted on-base plus slugging), and runs scored are the most influential contributors to postseason qualification. This underscores the importance of balancing offensive and defensive performance. A random forest model was selected and used to predict which teams would reach the 2025 postseason, based on early-season data. By identifying key performance indicators, this research aims to offer insight into the critical contributors to reaching the MLB postseason.
This research also demonstrates the value of machine learning in sports analytics and highlights how different methods can handle complex data and support strategic decision-making and forecasting in professional baseball.
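The train/test workflow described in the abstract can be illustrated with a minimal, dependency-free Python sketch. This is not the thesis's code, and the data are synthetic. A hypothetical helper, `make_synthetic_seasons`, draws ERA+ and OPS+ values around the league average of 100. It labels a team-season as a postseason qualifier under an invented rule (ERA+ + OPS+ > 215). The function names, the 25% test fraction, and k = 3 are likewise illustrative assumptions. The sketch randomly splits the team-seasons, fits a K-nearest-neighbors classifier (one of the methods listed above) on the training portion, and reports accuracy on the held-out test portion.

```python
import math
import random


def make_synthetic_seasons(n=200, seed=1):
    """Hypothetical toy data: ((era_plus, ops_plus), made_postseason) tuples."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        era_plus = rng.gauss(100, 12)  # 100 = league average
        ops_plus = rng.gauss(100, 10)
        # Invented labeling rule for illustration only; not derived from MLB data.
        made_postseason = 1 if era_plus + ops_plus > 215 else 0
        rows.append(((era_plus, ops_plus), made_postseason))
    return rows


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows at random and split it into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, features, k=3):
    """Majority vote among the k training rows nearest in Euclidean distance."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], features))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, test, k=3):
    """Fraction of test rows whose label the KNN classifier predicts correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    seasons = make_synthetic_seasons()
    train, test = train_test_split(seasons, test_fraction=0.25, seed=42)
    print(f"train={len(train)} test={len(test)} accuracy={accuracy(train, test):.2f}")
```

ERA+ and OPS+ are both scaled so that 100 is league average, so raw Euclidean distance is reasonable in this toy setting. Features on different scales, such as raw runs scored alongside ERA+, would normally be standardized before applying KNN.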
Publisher
East Tennessee State University
Document Type
Honors Thesis - Open Access
Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.
Recommended Citation
Foster, Chase S., "A Machine Learning Analysis of Factors Leading to Major League Baseball Postseason Berths" (2025). Undergraduate Honors Theses. Paper 833. https://dc.etsu.edu/honors/833
Copyright
Copyright by the authors.