Using Clinician Text Notes in Electronic Medical Record Data to Validate Transgender-Related Diagnosis Codes

Objective: Transgender individuals are vulnerable to negative health risks and outcomes, but research remains limited because data sources such as electronic medical records (EMRs) lack standardized collection of gender identity information. Most EMRs do not include the gold standard of self-identified gender identity, but the International Classification of Diseases (ICD) includes diagnostic codes indicating transgender-related clinical services. However, it is unclear whether these codes can indicate transgender status. The objective of this study was to determine the extent to which patients' clinician notes in EMRs contained transgender-related terms that could corroborate ICD-coded transgender identity. Methods: Data are from the US Department of Veterans Affairs Corporate Data Warehouse. Transgender patients were defined by the presence of ICD-9 and ICD-10 codes associated with transgender-related clinical services, and a 3:1 comparison group of nontransgender patients was drawn. Patients' clinician text notes were extracted and searched for transgender-related words and phrases. Results: Among 7560 patients defined as transgender based on ICD codes, the search algorithm identified 6753 (89.3%) with transgender-related terms. Among 22 072 patients defined as nontransgender (without ICD codes), 246 (1.1%) had transgender-related terms; after review, 11 of these patients were identified as transgender, suggesting a 0.05% false negative rate. Conclusions: ICD-defined transgender status can facilitate health services research when self-identified gender identity data are not available in EMRs.
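
The note search and the reported percentages can be illustrated with a minimal Python sketch. It is not the study's algorithm: the term list, the function names, and the case-insensitive matching are assumptions made for illustration, since the abstract does not list the actual search terms. Only the counts come from the abstract.

```python
import re

# Hypothetical term list for illustration only; the abstract does not
# give the words and phrases the study actually searched for.
TERMS = ["transgender", "transsexual", "gender identity disorder", "gender dysphoria"]
PATTERN = re.compile("|".join(re.escape(t) for t in TERMS), re.IGNORECASE)


def has_transgender_term(notes):
    """Return True if any of a patient's notes contains at least one term."""
    return any(PATTERN.search(note) for note in notes)


def percent(numerator, denominator):
    """Percentage rounded to two decimal places."""
    return round(100 * numerator / denominator, 2)


# Counts as reported in the abstract.
corroborated = percent(6753, 7560)         # ICD-coded patients with matching terms
flagged_comparison = percent(246, 22072)   # comparison patients with matching terms
false_negative = percent(11, 22072)        # comparison patients confirmed on review

print(corroborated, flagged_comparison, false_negative)
```

The false negative rate here uses the full comparison group (22 072) as the denominator, which is how the abstract's 0.05% figure is obtained.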