Degree Name
MS (Master of Science)
Program
Mathematical Sciences
Date of Award
5-2020
Committee Chair or Co-Chairs
Christina Nicole Lewis
Committee Members
Robert M. Price Jr, JeanMarie L. Hendrickson
Abstract
We compare multiple imputation methods for categorical variables using the MICE package in R. Starting from a complete data set, we delete values at several levels of missingness and evaluate each imputation method at each level. Logistic regression imputation and linear discriminant analysis (LDA) are used for binary variables, multinomial logit imputation and LDA for nominal variables, and ordered logit imputation and LDA for ordinal variables. After imputation, the regression coefficients, percent deviation index (PDI) values, and relative frequency tables were computed for each imputed data set at each level of missingness and compared with those of the corresponding complete data set. Logistic regression imputation outperformed LDA for binary variables, and LDA outperformed both multinomial logit imputation and ordered logit imputation for nominal and ordinal variables. Simulations were run to confirm the validity of the results.
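The evaluation design described in the abstract has three steps: delete values from a complete variable at a chosen missingness rate, impute, then compare the imputed data with the complete data. The sketch below illustrates only the first and last steps, under stated assumptions. It does not reproduce the thesis code or the MICE package. In R's mice package, the four techniques named above would correspond to the built-in methods `logreg`, `lda`, `polyreg`, and `polr`. Here the imputation step is a hypothetical mode-fill placeholder (`impute_mode_placeholder`) that is not one of the compared methods. Missingness is assumed to be missing completely at random (MCAR). The `percent_deviation` formula is an assumed definition of PDI, not necessarily the one used in the thesis.

```python
import random
from collections import Counter


def mask_mcar(values, rate, seed=0):
    """Return a copy of `values` with round(rate * n) entries set to None,
    chosen completely at random (MCAR is an assumption of this sketch)."""
    rng = random.Random(seed)
    n_missing = round(rate * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def impute_mode_placeholder(values):
    """Hypothetical placeholder single imputation: fill None with the observed
    mode. NOT one of the methods compared in the thesis; it only gives the
    comparison step something to evaluate."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def relative_frequencies(values):
    """Relative frequency table of a categorical column (None is ignored)."""
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    total = len(observed)
    return {level: counts[level] / total for level in sorted(counts)}


def percent_deviation(estimate, reference):
    """Assumed PDI definition: absolute deviation of an estimate from its
    complete-data value, as a percentage of the complete-data value."""
    return 100.0 * abs(estimate - reference) / abs(reference)


if __name__ == "__main__":
    complete = ["yes"] * 60 + ["no"] * 40  # hypothetical binary variable
    reference = relative_frequencies(complete)
    for rate in (0.10, 0.30, 0.50):
        imputed = impute_mode_placeholder(mask_mcar(complete, rate, seed=42))
        freq = relative_frequencies(imputed)
        pdi = percent_deviation(freq["yes"], reference["yes"])
        print(f"{rate:.0%} missing: {freq}  PDI(yes) = {pdi:.1f}%")
```

To compare real methods, the placeholder would be replaced by each imputation method in turn. The same frequency and deviation comparison would then be repeated for every imputed data set at every missingness level.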
Document Type
Dissertation - unrestricted
Recommended Citation
Miranda, Samantha, "Investigation of Multiple Imputation Methods for Categorical Variables" (2020). Electronic Theses and Dissertations. Paper 3722. https://dc.etsu.edu/etd/3722
Copyright
Copyright by the authors.