Using Machine Learning to Predict Heavy Drinking During Outpatient Alcohol Treatment

BACKGROUND: Accurate clinical prediction supports the effective treatment of alcohol use disorder (AUD) and other psychiatric disorders. Traditional statistical techniques have identified patient characteristics associated with treatment outcomes. However, less work has focused on systematically leveraging these associations to create optimal predictive models. The current study demonstrates how machine learning can be used to predict clinical outcomes in people completing outpatient AUD treatment.

METHOD: We used data from the COMBINE multisite clinical trial (n = 1383) to develop and test predictive models. We identified three priority prediction targets: (1) heavy drinking during the first month of treatment, (2) heavy drinking during the last month of treatment, and (3) heavy drinking between weekly/bi-weekly sessions. Models were generated using the random forest algorithm. We used "leave sites out" partitioning to externally validate the models in trial sites that were not included in model training. Stratified model development was used to test for sex differences in the relative importance of predictive features.

RESULTS: Models predicting heavy alcohol use during the first and last months of treatment showed internal cross-validation area under the curve (AUC) scores ranging from 0.67 to 0.74. AUC was comparable in the external validation using data from held-out sites (AUC range = 0.69 to 0.72). The model predicting between-session heavy drinking showed strong classification accuracy in internal cross-validation (AUC = 0.89) and external test samples (AUC range = 0.80 to 0.87). Stratified analyses showed substantial sex differences in optimal feature sets.

CONCLUSION: Machine learning techniques can predict alcohol treatment outcomes using routinely collected clinical data. This technique has the potential to greatly improve clinical prediction accuracy without requiring expensive or invasive assessment methods. More research is needed to understand how best to deploy these models.
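
The "leave sites out" validation and AUC scoring described in the abstract can be illustrated with a minimal sketch. This is not the study's code. It assumes one site is held out per fold, uses small made-up toy values rather than COMBINE data, and replaces the random forest with fixed placeholder scores. The names leave_one_site_out and auc are hypothetical helpers introduced only for this illustration.

```python
def leave_one_site_out(site_ids):
    """Yield (site, train_indices, test_indices), holding out one site per fold.

    Assumption: one site is held out at a time; the abstract does not
    state how many sites were held out per fold.
    """
    for held_out in sorted(set(site_ids)):
        train = [i for i, s in enumerate(site_ids) if s != held_out]
        test = [i for i, s in enumerate(site_ids) if s == held_out]
        yield held_out, train, test


def auc(labels, scores):
    """Area under the ROC curve: the share of (positive, negative) pairs
    in which the positive case gets the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values, invented purely for illustration (not trial data).
site_ids = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"]
labels = [1, 0, 1, 0, 1, 0, 1, 1, 0, 0]   # 1 = heavy drinking observed
scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.7, 0.8, 0.3, 0.5, 0.1]  # placeholder risk scores

site_auc = {}
for site, train_idx, test_idx in leave_one_site_out(site_ids):
    # A real pipeline would fit the model on train_idx here and score
    # test_idx with it; this sketch reuses the fixed placeholder scores.
    site_auc[site] = auc([labels[i] for i in test_idx],
                         [scores[i] for i in test_idx])
```

The point of the split is that every participant from the held-out site is excluded from training, so the AUC for that site reflects performance in a setting the model has not seen. In practice, a library such as scikit-learn offers comparable building blocks (RandomForestClassifier, LeaveOneGroupOut, roc_auc_score); whether the study used them is not stated in the abstract.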