Implementing a Spatial Smoothing Algorithm to Help Identify a Lung Cancer Belt in the United States
Disease mapping is used to identify high-risk areas, inform resource allocation and generate hypotheses. The stroke and diabetes belts in the U.S. have encouraged public dialogue and spurred research. Lung cancer is the leading cause of U.S. cancer mortality, accounting for 158,135 deaths in 2010, compared with 129,180 from cerebrovascular disease and 68,905 from diabetes mellitus. If a distinct pattern of high lung cancer mortality exists, defining it could increase public awareness of the disease and facilitate investigation of its determinants. To begin our inquiry, we generated a county-level map and observed an area of high lung cancer mortality, primarily in the Southeast. However, variability in county rates, likely due to small populations, made patterns difficult to discern. Spatial smoothing can clarify such obscured patterns. We downloaded county lung cancer mortality rates, population sizes and death counts. Because concurrent lung cancer incidence and mortality rates were nearly equivalent, mortality was used as a proxy for risk. After downloading county population centroids with latitudes and longitudes, we implemented a median-based, weighted, two-dimensional smoothing algorithm that enhances spatial patterns by borrowing strength from neighboring counties. The algorithm selected sets of three proximate centroids, each forming a “triple” anchored by the centroid of the county to be smoothed. The number of nearest neighbor (NN) counties was set to NN=10, and the number of triples (NTR) for each county to NTR=(2/3)*NN (about 6.7, rounded to seven), producing seven approximately collinear triples per county, each with a center angle ≥135°. Median rates for the top and bottom 50% of neighbor counties were calculated and weighted by 1/SE, creating a “window”: if the original rate lay between the two medians, or if the county population was sufficiently large, the rate was not smoothed. If the original rate fell outside the window, it was adjusted according to the corresponding neighbor median (the upper median for rates above the window, the lower median for rates below it).
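The steps above appear to resemble the weighted "head-banging" family of median smoothers. The following is a minimal Python sketch of one reading of them, not the authors' implementation. Assumptions, all hypothetical: crude rates with a Poisson standard error (the original rates may be age-adjusted), an equirectangular approximation for distances and angles, triples chosen as the NTR most nearly collinear neighbor pairs, a `LARGE_POP` cutoff standing in for "sufficiently large", counties with no qualifying triple left unchanged, out-of-window rates replaced by the nearer median, and synchronous updates across the ten iterations.

```python
import math

NN = 10                                  # nearest-neighbor counties (abstract)
NTR = math.ceil(2 * NN / 3)              # triples per county: ceil(6.67) = 7
MIN_COS = math.cos(math.radians(135))    # center angle >= 135 deg <=> cos <= this
ITERATIONS = 10                          # passes over all counties (abstract)
LARGE_POP = 1_000_000                    # hypothetical "sufficiently large" cutoff


def rate_and_weight(deaths, population):
    """Crude rate per 100,000 and weight 1/SE, using a Poisson SE (assumption)."""
    rate = deaths / population * 100_000
    se = math.sqrt(max(deaths, 1)) / population * 100_000
    return rate, 1.0 / se


def offset(a, b):
    """Approximate planar offset (degrees) from centroid a to b, each (lat, lon)."""
    dx = (b[1] - a[1]) * math.cos(math.radians(a[0]))
    dy = b[0] - a[0]
    return dx, dy


def triples_for(i, centroids):
    """Up to NTR neighbor pairs (j, k) forming a center angle >= 135 deg at i."""
    c = centroids[i]
    others = sorted((j for j in range(len(centroids)) if j != i),
                    key=lambda j: math.hypot(*offset(c, centroids[j])))
    nbrs = others[:NN]
    candidates = []
    for a in range(len(nbrs)):
        for b in range(a + 1, len(nbrs)):
            u, v = offset(c, centroids[nbrs[a]]), offset(c, centroids[nbrs[b]])
            norm = math.hypot(*u) * math.hypot(*v)
            if norm == 0:
                continue  # coincident centroids define no angle
            cos_t = (u[0] * v[0] + u[1] * v[1]) / norm
            if cos_t <= MIN_COS:
                candidates.append((cos_t, nbrs[a], nbrs[b]))
    candidates.sort()  # most nearly collinear (cosine closest to -1) first
    return [(j, k) for _, j, k in candidates[:NTR]]


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total."""
    pairs = sorted(zip(values, weights))
    half, cum = sum(weights) / 2, 0.0
    for v, w in pairs:
        cum += w
        if cum >= half:
            return v
    return pairs[-1][0]


def smooth_rates(rates, weights, populations, centroids):
    """Return smoothed rates after ITERATIONS synchronous passes."""
    triples = [triples_for(i, centroids) for i in range(len(rates))]
    for _ in range(ITERATIONS):
        new = list(rates)
        for i, tri in enumerate(triples):
            if populations[i] >= LARGE_POP or not tri:
                continue  # large or isolated counties keep their current rate
            lows, low_w, highs, high_w = [], [], [], []
            for j, k in tri:
                lo, hi = (j, k) if rates[j] <= rates[k] else (k, j)
                lows.append(rates[lo])
                low_w.append(weights[lo])
                highs.append(rates[hi])
                high_w.append(weights[hi])
            low = weighted_median(lows, low_w)
            high = weighted_median(highs, high_w)
            low, high = min(low, high), max(low, high)
            # Inside the window: unchanged; outside: moved to the nearer median.
            new[i] = min(max(rates[i], low), high)
        rates = new
    return rates
```

On a small grid of counties sharing one rate, with a single outlier in the middle, this sketch pulls the outlier to its neighbors' level and leaves the other counties unchanged. The pairwise neighbor search is quadratic in the number of counties, which is slow but workable in pure Python for roughly 3,000 U.S. counties.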
Ten iterations of this process were conducted for each county. Smoothed rates were imported into ArcGIS and joined to a U.S. counties layer. Contiguous counties in or near the Southeast with smoothed rates above 64 per 100,000 were defined as one class. We observed a cluster of high lung cancer mortality comprising 724 counties and forming an arc not evident in the unsmoothed data. This area, which we define as the lung cancer belt, included nearly all of Arkansas, Kentucky and Tennessee, and portions of 16 other states. Heavily affected regions included much of the Ohio Valley, Central Appalachia, the Tennessee Valley, the Ozarks, the Mississippi Delta and the northern Gulf Coast. Smoking, a modifiable behavior, causes the majority of lung cancer deaths and is the single leading preventable cause of death in the United States. Lung cancer mortality rates presented at the state level obscure differences within states. The lung cancer belt may provide a tool for identifying areas in greatest need of resources. National survey data could be used to determine demographic, socioeconomic and behavioral differences between the lung cancer belt and the rest of the nation.
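The classification step (contiguous counties with smoothed rates above 64 per 100,000) was done in ArcGIS. As a hedged illustration outside ArcGIS, the same grouping can be sketched in Python as connected components of a county adjacency graph restricted to high-rate counties. The county IDs and the `adjacency` mapping are hypothetical inputs (one possible source is the Census Bureau's county adjacency file), and the sketch does not model the "in or near the Southeast" judgment.

```python
from collections import deque

THRESHOLD = 64.0  # smoothed deaths per 100,000, from the abstract


def high_rate_clusters(rates, adjacency, threshold=THRESHOLD):
    """Group counties whose smoothed rate exceeds `threshold` into contiguous
    clusters (connected components of the county adjacency graph).

    rates:     dict mapping a county ID (e.g. a FIPS code) to its smoothed rate
    adjacency: dict mapping a county ID to the IDs of counties bordering it
    Returns a list of sets of county IDs, largest cluster first.
    """
    high = {cid for cid, r in rates.items() if r > threshold}
    seen, clusters = set(), []
    for start in sorted(high):
        if start in seen:
            continue
        # Breadth-first search over bordering high-rate counties only.
        seen.add(start)
        cluster, queue = {start}, deque([start])
        while queue:
            cur = queue.popleft()
            for nb in adjacency.get(cur, ()):
                if nb in high and nb not in seen:
                    seen.add(nb)
                    cluster.add(nb)
                    queue.append(nb)
        clusters.append(cluster)
    clusters.sort(key=len, reverse=True)
    return clusters
```

Under this sketch, the largest returned cluster would correspond to a candidate "belt"; a rate of exactly 64 is excluded because the abstract specifies rates above 64.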
Johnson City, TN
Blackley, David; Zheng, Shimin; and Ketchum, Winn. 2012. Implementing a Spatial Smoothing Algorithm to Help Identify a Lung Cancer Belt in the United States. Oral presentation. Appalachian Student Research Forum, Johnson City, TN. http://www.etsu.edu/studentresearch/2012/documents/2012_ProgramBook.pdf