Degree Name

MS (Master of Science)

Program

Mathematical Sciences

Date of Award

5-2025

Committee Chair or Co-Chairs

Michele Joyner

Committee Members

Jeff Knisley, Mostafa Zahed

Abstract

This thesis explores the privacy-utility trade-off in synthetic data generation using the Correlated Attribute Mode of DataSynthesizer, which employs Bayesian networks to model attribute dependencies. It focuses on integrating differential privacy mechanisms, particularly the Laplace mechanism, to inject controlled noise into the synthetic data and enhance privacy protection. As organizations struggle to reconcile data-driven decision-making with privacy regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), synthetic data offers a solution: artificial datasets that preserve the statistical properties of the original data while protecting privacy. This research investigates how different values of the differential privacy parameter epsilon affect the similarity between real and synthetic data, and what the dataset-specific implications are for model performance. Using Jensen-Shannon divergence to measure distribution similarity and Spearman rank correlation to assess feature importance preservation, this study provides guidelines for selecting optimal privacy settings across different datasets. These findings help organizations leverage synthetic data without compromising confidentiality or analytical reliability.
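The quantities named in the abstract can be illustrated with a minimal sketch. It is not the thesis's code and does not use DataSynthesizer's API: the function names, the example counts, the epsilon grid, and the sensitivity of 1 (add-or-remove-one-record neighboring datasets) are all assumptions made for illustration. It adds Laplace noise with scale sensitivity/epsilon to a histogram, measures the Jensen-Shannon divergence between the original and noisy distributions, and computes a Spearman rank correlation between two hypothetical feature importance vectors.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_distribution(counts, epsilon, rng, sensitivity=1.0):
    """Add Laplace noise to histogram counts, clip negatives, renormalise.

    sensitivity=1.0 is an assumption (one record added or removed).
    """
    scale = sensitivity / epsilon
    noisy = [max(c + laplace_noise(scale, rng), 0.0) for c in counts]
    total = sum(noisy)
    if total == 0.0:
        # Everything was clipped to zero: fall back to a uniform distribution.
        return [1.0 / len(counts)] * len(counts)
    return [v / total for v in noisy]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical category counts for a single attribute.
counts = [120, 45, 30, 5]
original = [c / sum(counts) for c in counts]
rng = random.Random(0)

for epsilon in (0.01, 0.1, 1.0, 10.0):
    synthetic = noisy_distribution(counts, epsilon, rng)
    print(f"epsilon={epsilon}: JSD={js_divergence(original, synthetic):.4f}")

# Hypothetical feature importances from models trained on real vs synthetic data.
print(spearman([0.40, 0.30, 0.20, 0.10], [0.35, 0.33, 0.12, 0.20]))
```

Smaller epsilon means a larger noise scale, so on average the noisy distribution drifts further from the original. A single random draw like the one above need not show a strictly monotone trend, so averaging over repeated runs would be needed before drawing conclusions.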

Document Type

Thesis - restricted

Copyright

Copyright by the authors.
