Improving Identification of Fall-Related Injuries in Ambulatory Care Using Statistical Text Mining

Objectives. We determined whether statistical text mining (STM) can identify fall-related injuries in electronic health record (EHR) documents, and how training STM models on documents from a single facility versus multiple facilities affects their performance.

Methods. We obtained fiscal year 2007 records from Veterans Health Administration (VHA) ambulatory care clinics in the southeastern United States and Puerto Rico, yielding 26 010 documents for 1652 veterans treated for fall-related injury and 1341 matched controls. We used an STM model to predict fall-related injuries at the visit and patient levels and compared these predictions with a reference standard based on chart review.

Results. STM models trained on data from a single facility achieved accuracy of 87.5% and 87.1%, F-measure of 87.0% and 90.9%, sensitivity of 92.1% and 94.1%, and specificity of 83.6% and 77.8% at the visit and patient levels, respectively. Models trained on data from multiple facilities performed almost identically.

Conclusions. STM has the potential to improve identification of fall-related injuries in the VHA, providing a model for wider application in the evolving national EHR system.
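The four reported measures all derive from a 2×2 confusion matrix that compares STM predictions with the chart-review reference standard. The Python sketch below shows the standard definitions. It assumes that "F-measure" means the balanced F1 score (β = 1), which the abstract does not state. The `Confusion` class, the `metrics` function, and the counts in the example are hypothetical illustrations, not the authors' code or study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing predictions with a chart-review reference standard."""
    tp: int  # predicted fall-related injury; reference standard agrees
    fp: int  # predicted injury; reference standard says no injury
    fn: int  # predicted no injury; reference standard says injury
    tn: int  # predicted no injury; reference standard agrees


def _ratio(num: int, den: int) -> float:
    # Return 0.0 rather than raising ZeroDivisionError on an empty denominator.
    return num / den if den else 0.0


def metrics(c: Confusion, beta: float = 1.0) -> dict:
    """Standard classification measures; beta=1.0 gives F1 (an assumption here)."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # recall: share of true cases found
    specificity = _ratio(c.tn, c.tn + c.fp)  # share of non-cases correctly rejected
    precision = _ratio(c.tp, c.tp + c.fp)    # positive predictive value
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)
    # F-beta: weighted harmonic mean of precision and sensitivity.
    b2 = beta * beta
    denom = b2 * precision + sensitivity
    f_measure = (1 + b2) * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "f_measure": f_measure,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
    }


if __name__ == "__main__":
    # Hypothetical visit-level counts, chosen only to illustrate the arithmetic.
    example = Confusion(tp=90, fp=20, fn=10, tn=80)
    for name, value in metrics(example).items():
        print(f"{name}: {value:.3f}")
```

With these made-up counts, the example prints accuracy 0.850, F-measure 0.857, sensitivity 0.900, specificity 0.800, and precision 0.818. F-measure depends on precision, which the abstract does not report, so the published figures cannot be fully reconstructed from sensitivity and specificity alone. The sketch also leaves open how visit-level predictions are rolled up to the patient level, because the abstract does not describe that rule.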