Chapter 1 Geospatial health
1.1 Geospatial health data
Health data provide information to identify public health problems and respond appropriately when they occur. This information is crucial to prevent and control a variety of health conditions such as infectious diseases, non-communicable diseases, injuries, and health-related behaviors. The process of analysis and interpretation of health data encompasses a broad variety of system designs, analytic methods, modes of presentation, and interpretive uses (Lee et al. 2010). In general, descriptive methods are the basis of routine reporting of surveillance data. These focus on the observed patterns in the data and might also seek to compare the relative occurrence of health outcomes in different subgroups. More specialized hypotheses are explored using inferential methods, which aim to draw statistical conclusions about health patterns or outcomes.
The increased availability of georeferenced health information, population data, and satellite imagery of environmental factors that influence disease activity levels, together with the development of geographic information systems (GIS) and software for geocoding addresses, has facilitated the investigation of spatial and spatio-temporal variation in disease. John Snow’s cholera-outbreak investigation in London in 1854 provides one of the most famous examples of spatial analysis. Snow used a map to illustrate how cholera deaths appeared to be clustered around a public water pump. The assessment of the spatial pattern of the cholera cases was important in identifying the source of the infection and gave support to the theory of cholera transmission through drinking water (Snow 1857).
There is a wide range of spatial and spatio-temporal methods for disease surveillance, including methods for disease mapping, clustering, and geographic correlation studies. Many of these methods may be used to highlight areas of high risk (Moraga and Lawson 2012), identify risk factors (Hagan et al. 2016), assess spatial variations in temporal trends (Moraga and Kulldorff 2016), quantify the excess of disease risk close to a putative source (Wakefield and Morris 2001), and detect outbreaks early (Polonsky et al. 2019; Moraga et al. 2018).
1.2 Disease mapping
The mapping of disease risk has a long history in public health surveillance. Disease maps provide a rapid visual summary of spatial information and allow the identification of patterns that may be missed in tabular presentations (Elliott and Wartenberg 2004). Such maps are crucial for describing the spatial and temporal variation of the disease, identifying areas of unusually high risk, formulating etiological hypotheses, measuring inequalities, and allowing better resource allocation.
Disease risk estimates are based on information on the observed disease cases, the number of individuals at risk, and possibly also covariate information such as demographic and environmental factors. Bayesian hierarchical models are used to describe the variability in the response variable as a function of risk factor covariates and random effects that account for unexplained variation. Bayesian modeling provides a flexible and robust approach that makes it possible to account for the effects of explanatory variables, accommodate spatial and spatio-temporal correlation, and formally express the uncertainty in the risk estimates (Moraga 2018a). Bayesian inference can be implemented via Markov chain Monte Carlo (MCMC) methods or by using Integrated Nested Laplace Approximations (INLA), a computationally efficient alternative to MCMC designed for latent Gaussian models (Lindgren and Rue 2015).
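To make the structure of such a model concrete, the following is a sketch of one common formulation for areal count data. It is an illustration under stated assumptions, not necessarily the specification used elsewhere in this book. All symbols are introduced here for illustration: Y_i is the observed number of cases in area i, E_i the expected number of cases given the population at risk, θ_i the relative risk, x_i a vector of covariates with coefficients β, u_i a spatially structured random effect, and v_i an unstructured random effect.

```latex
\begin{align*}
Y_i \mid \theta_i &\sim \text{Poisson}(E_i \, \theta_i), \quad i = 1, \ldots, n, \\
\log(\theta_i) &= \boldsymbol{x}_i^{\top} \boldsymbol{\beta} + u_i + v_i .
\end{align*}
```

Under this formulation, θ_i greater than 1 indicates that area i has more cases than expected, and the posterior distribution of θ_i supplies the formal expression of uncertainty mentioned above.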
Health data are often obtained by aggregating point data over subareas of the study region, such as counties or provinces, for reasons such as patient confidentiality. Often, disease risk models aim to obtain low-variance estimates of disease risk within the same areas where data are available. One limitation of this approach is that disease risk maps obtained at this resolution are unable to show how risk varies within areas, which makes it difficult to target health interventions and direct resources where they are most needed. A better approach is to use point data and build models that exploit correlation between nearby data points and include high spatial resolution covariates to produce disease risk estimates on a continuous surface (Diggle et al. 2013; Moraga et al. 2017). Maps obtained with this type of model offer high spatial resolution estimates with which to more precisely implement public health programs where they can have the greatest impact.
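The aggregation step described above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the cited references: the case coordinates, the 1 x 1 grid cells standing in for administrative areas, the population figures, and the helper names `cell_of`, `aggregate_cases`, and `standardized_incidence_ratios` are all hypothetical. Expected counts are assumed to come from a single overall rate applied to each cell's population.

```python
from collections import Counter

# Hypothetical case locations (x, y) in arbitrary map units; not real data.
cases = [(0.2, 0.3), (0.8, 0.1), (1.4, 0.6), (1.7, 0.9), (1.1, 0.2), (0.5, 1.5)]

# Hypothetical population at risk in each 1 x 1 grid cell, keyed by (column, row).
population = {(0, 0): 1000, (1, 0): 500, (0, 1): 800, (1, 1): 700}


def cell_of(point, cell_size=1.0):
    """Return the (column, row) index of the grid cell containing a point."""
    x, y = point
    return (int(x // cell_size), int(y // cell_size))


def aggregate_cases(points, cell_size=1.0):
    """Count cases per grid cell, discarding the exact locations."""
    return Counter(cell_of(p, cell_size) for p in points)


def standardized_incidence_ratios(counts, population):
    """Observed / expected cases per cell, with expected counts from the overall rate."""
    overall_rate = sum(counts.values()) / sum(population.values())
    return {
        cell: counts.get(cell, 0) / (overall_rate * pop)
        for cell, pop in population.items()
    }


counts = aggregate_cases(cases)
sir = standardized_incidence_ratios(counts, population)
for cell in sorted(sir):
    print(cell, counts.get(cell, 0), round(sir[cell], 2))
```

Once the cases are reduced to one count per cell, every location inside a cell shares the same ratio, which is the loss of within-area detail that point-level models aim to avoid.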
1.3 Communication of results
The goal of health surveillance is not merely to collect data for analysis, but to guide public health policy and action to control and prevent diseases. A key aspect of surveillance practice is, therefore, the proper and timely dissemination of information to those responsible for disease prevention and control. Depending on the circumstances, these may include health agencies, governments, private organizations, potentially exposed individuals, and many others.
The R software provides excellent tools that greatly facilitate effective communication with collaborators, decision-makers, and the general public, and these should be used consistently and thoughtfully to respond quickly to the population’s health needs. R offers visualization packages such as leaflet (Cheng, Karambelkar, and Xie 2018) for making interactive maps, dygraphs (Vanderkam et al. 2018) for plotting time series, and DT (Xie 2018a) for displaying data tables. Moreover, findings can be easily included in reproducible reports generated with R Markdown (Allaire et al. 2018), interactive dashboards using flexdashboard (Borges and Allaire 2017), and interactive web applications built with Shiny (Chang et al. 2018). These tools provide important information on which to base action, and careful interpretation of their output allows public health officers to allocate resources efficiently and target populations for education or preventive programs.
Lee, Lisa M., Steven M. Teutsch, Stephen B. Thacker, and Michael E. St. Louis. 2010. Principles and Practice of Public Health Surveillance. 3rd ed. New York: Oxford University Press.
Snow, John. 1857. “Cholera, and the Water Supply in the South Districts of London.” British Medical Journal 1 (42): 864–65.
Moraga, Paula, and Andrew B. Lawson. 2012. “Gaussian component mixtures and CAR models in Bayesian disease mapping.” Computational Statistics & Data Analysis 56 (6): 1417–33. https://doi.org/10.1016/j.csda.2011.11.011.
Hagan, José E., Paula Moraga, Federico Costa, Nicolas Capian, Guilherme S. Ribeiro, Elsio A. Wunder Jr., Ridalva D. M. Felzemburgh, et al. 2016. “Spatio-temporal determinants of urban leptospirosis transmission: Four-year prospective cohort study of slum residents in Brazil.” Public Library of Science: Neglected Tropical Diseases 10 (1): e0004275. https://doi.org/10.1371/journal.pntd.0004275.
Moraga, Paula, and Martin Kulldorff. 2016. “Detection of spatial variations in temporal trends with a quadratic function.” Statistical Methods in Medical Research 25 (4): 1422–37. https://doi.org/10.1177/0962280213485312.
Wakefield, John C., and Sarah E. Morris. 2001. “The Bayesian Modeling of Disease Risk in Relation to a Point Source.” Journal of the American Statistical Association 96 (453): 77–91. https://doi.org/10.1198/016214501750332992.
Polonsky, Jonathan A., Amrish Baidjoe, Zhian N. Kamvar, Anne Cori, Kara Durski, W. John Edmunds, Rosalind M. Eggo, et al. 2019. “Outbreak analytics: a developing data science for informing the response to emerging pathogens.” Philosophical Transactions B 374: 20180276. https://doi.org/10.1098/rstb.2018.0276.
Moraga, Paula, Ilaria Dorigatti, Zhian N. Kamvar, Pawel Piatkowski, Salla E. Toikkanen, VP Nagraj, Christl A. Donnelly, and Thibaut Jombart. 2018. “epiflows: an R package for risk assessment of travel-related spread of disease.” F1000Research 7: 1374. https://doi.org/10.12688/f1000research.16032.1.
Elliott, Paul, and Daniel Wartenberg. 2004. “Spatial epidemiology: Current approaches and future challenges.” Environmental Health Perspectives 112 (9): 998–1006. https://doi.org/10.1289/ehp.6735.
Moraga, Paula. 2018a. “Small Area Disease Risk Estimation and Visualization Using R.” The R Journal 10 (1): 495–506. https://journal.r-project.org/archive/2018/RJ-2018-036/index.html.
Lindgren, Finn, and Håvard Rue. 2015. “Bayesian Spatial Modelling with R-INLA.” Journal of Statistical Software 63. https://doi.org/10.18637/jss.v063.i19.
Diggle, Peter J., Paula Moraga, Barry Rowlingson, and Benjamin M. Taylor. 2013. “Spatial and Spatio-Temporal Log-Gaussian Cox Processes: Extending the Geostatistical Paradigm.” Statistical Science 28 (4): 542–63. https://doi.org/10.1214/13-STS441.
Moraga, Paula, Susanna Cramb, Kerrie Mengersen, and Marcello Pagano. 2017. “A geostatistical model for combined analysis of point-level and area-level data using INLA and SPDE.” Spatial Statistics 21: 27–41. https://doi.org/10.1016/j.spasta.2017.04.006.
Vanderkam, Dan, JJ Allaire, Jonathan Owen, Daniel Gromer, and Benoit Thieurmel. 2018. Dygraphs: Interface to ’Dygraphs’ Interactive Time Series Charting Library. https://CRAN.R-project.org/package=dygraphs.
Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, and Winston Chang. 2018. Rmarkdown: Dynamic Documents for R. https://CRAN.R-project.org/package=rmarkdown.
Borges, Barbara, and JJ Allaire. 2017. Flexdashboard: R Markdown Format for Flexible Dashboards. https://CRAN.R-project.org/package=flexdashboard.
Chang, Winston, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson. 2018. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.