# Chapter 2 Spatial data and R packages for mapping

In this chapter we describe the basic characteristics and provide examples of spatial data including areal, geostatistical and point patterns. Then we introduce the geopraphic and projected coordinate reference systems (CRS) that are used for spatial data representation, and show how to use R to set CRS and transform data to different projections. Then we describe the data storage format called shapefile to store geospatial data. Finally, we present several examples that show R packages useful to create static and interactive maps including ggplot2 (Wickham, Chang, et al. 2018), leaflet (Cheng, Karambelkar, and Xie 2018), mapview (Appelhans et al. 2018) and tmap (Tennekes 2018). Examples show how to define color palettes, create legends, use background maps, plot different geometries, and save maps as HTML files or as static images.

## 2.1 Types of spatial data

A spatial process in $$d=2$$ dimensions is denoted as

$\{ Z(\boldsymbol{s}): \boldsymbol{s} \in D \subset \mathbb{R}^d\}.$

Here, $$Z$$ denotes the attribute we observe, for example, the number of sudden infant deaths or the level of rainfall, and $$\boldsymbol{s}$$ refers to the location of the observation. Cressie (1993) distinguishes three basic types of spatial data through characteristics of the domain $$D$$, namely, areal data, geostatistical data, and point patterns.

### 2.1.1 Areal data

In areal (or lattice) data the domain $$D$$ is fixed (of regular or irregular shape) and partitioned into a finite number of areal units with well-defined boundaries. Examples of areal data are attributes collected by ZIP code, census tract, or remotely sensed data reported by pixels.

An example of areal data is shown in Figure 2.1 which depicts the number of sudden infant deaths in each of the counties of North Carolina, USA, in 1974 (Pebesma 2018). Here the region of interest (North Carolina) has been partitioned into a finite number of subregions (counties) at which outcomes have been aggregated. Using data about population and other covariates we could obtain death risk estimates within each county. Other examples of areal data are presented in Moraga and Lawson (2012) where Bayesian hierarchical models are used to estimate the relative risk of low birth weight in Georgia, USA, in year 2000, and Moraga and Kulldorff (2016) where spatial variations in temporal trends are assessed to detect unusual cervical cancer trends in white women in USA over the period 1969 to 1995.

library(sf)
library(ggplot2)
library(viridis)
nc <- st_read(system.file("shape/nc.shp", package = "sf"),
quiet = TRUE
)
ggplot(data = nc, aes(fill = SID74)) + geom_sf() +
scale_fill_viridis() + theme_bw() FIGURE 2.1: Sudden infant deaths in North Carolina, 1974.

### 2.1.2 Geostatistical data

In geostatistical data the domain $$D$$ is a continuous fixed set. By continuous we mean that $$\boldsymbol{s}$$ varies continuously over $$D$$ and therefore $$Z(\boldsymbol{s})$$ can be observed everywhere within $$D$$. By fixed we mean that the points in $$D$$ are non-stochastic. It is important to note that the continuity only refers to the domain, and the attribute $$Z$$ can be continuous or discrete. Examples of this type of data are air pollution or rainfall values measured at several monitoring stations.

Figure 2.2 shows the average rainfall for the period May-June (dry-season) over different years collected at 143 recording stations throughout Paraná state, Brazil (Ribeiro Jr and Diggle 2016). These data represent rainfall measurements obtained at specific stations and using model-based geostatistics we could predict rainfall at unsampled sites. Another example of geostatistical data is shown in Moraga et al. (2015). Here prevalence values of lymphatic filariasis are obtained from surveys conducted at several villages in sub-Saharan Africa. Authors use a geostatistical model to predict the disease risk at unobserved locations and construct a spatially continuous risk surface.

library(geoR)
ggplot(data.frame(cbind(parana$coords, Rainfall = parana$data)))+
geom_point(aes(east, north, color = Rainfall), size = 2) +
coord_fixed(ratio = 1) +
scale_color_gradient(low = "blue", high = "orange") +
d_new$UTMy <- coordinates(d_new)[, 2] ## 2.3 Shapefiles Geographic data can be represented using a data storage format called shapefile that stores the location, shape, and attributes of geographic features such as points, lines and polygons. A shapefile is not a unique file, but consists of a collection of related files that have different extensions and a common name and are stored in the same directory. A shapefile has three mandatory files with extensions .shp, .shx, and .dbf: • .shp: contains the geometry data, • .shx: is a positional index of the geometry data that allows to seek forwards and backwards the .shp file, • .dbf: stores the attributes for each shape. Other files that can form a shapefile are the following: • .prj: plain text file describing the projection, • .sbn and .sbx: spatial index of the geometry data, • .shp.xml: geospatial metadata in XML format. Thus, when working with shapefiles it is not enough to obtain the .shp file that contains the geometry data, all the other supporting files are also required. In R, we can read shapefiles using the readOGR() function of the rgdal package, or also the function st_read() of the sf package. For example, we can read the shapefile of North Carolina which is stored in the sf package with readOGR() as follows: # name of the shapefile of North Carolina of the sf package nameshp <- system.file("shape/nc.shp", package = "sf") # read shapefile with readOGR() library(rgdal) map <- readOGR(nameshp, verbose = FALSE) class(map) ##  "SpatialPolygonsDataFrame" ## attr(,"package") ##  "sp" head(map@data) ## AREA PERIMETER CNTY_ CNTY_ID NAME FIPS ## 0 0.114 1.442 1825 1825 Ashe 37009 ## 1 0.061 1.231 1827 1827 Alleghany 37005 ## 2 0.143 1.630 1828 1828 Surry 37171 ## 3 0.070 2.968 1831 1831 Currituck 37053 ## 4 0.153 2.206 1832 1832 Northampton 37131 ## 5 0.097 1.670 1833 1833 Hertford 37091 ## FIPSNO CRESS_ID BIR74 SID74 NWBIR74 BIR79 SID79 ## 0 37009 5 1091 1 10 1364 0 ## 1 37005 3 487 0 10 542 3 ## 2 37171 86 3188 5 208 3616 6 ## 3 37053 27 508 1 123 830 2 ## 4 37131 66 1421 9 1066 1606 3 ## 5 37091 46 1452 7 954 1838 5 ## NWBIR79 ## 0 19 ## 1 12 ## 2 260 ## 3 145 ## 4 1197 ## 5 1237 plot(map) FIGURE 2.6: Map of North Carolina imported with the rgdal package. We can also read the map with st_read() as follows: # read shapefile with st_read() library(sf) map <- st_read(nameshp, quiet = TRUE) class(map) ##  "sf" "data.frame" head(map) ## Simple feature collection with 6 features and 14 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -81.74 ymin: 36.07 xmax: -75.77 ymax: 36.59 ## epsg (SRID): 4267 ## proj4string: +proj=longlat +datum=NAD27 +no_defs ## AREA PERIMETER CNTY_ CNTY_ID NAME FIPS ## 1 0.114 1.442 1825 1825 Ashe 37009 ## 2 0.061 1.231 1827 1827 Alleghany 37005 ## 3 0.143 1.630 1828 1828 Surry 37171 ## 4 0.070 2.968 1831 1831 Currituck 37053 ## 5 0.153 2.206 1832 1832 Northampton 37131 ## 6 0.097 1.670 1833 1833 Hertford 37091 ## FIPSNO CRESS_ID BIR74 SID74 NWBIR74 BIR79 SID79 ## 1 37009 5 1091 1 10 1364 0 ## 2 37005 3 487 0 10 542 3 ## 3 37171 86 3188 5 208 3616 6 ## 4 37053 27 508 1 123 830 2 ## 5 37131 66 1421 9 1066 1606 3 ## 6 37091 46 1452 7 954 1838 5 ## NWBIR79 geometry ## 1 19 MULTIPOLYGON (((-81.47 36.2... ## 2 12 MULTIPOLYGON (((-81.24 36.3... ## 3 260 MULTIPOLYGON (((-80.46 36.2... ## 4 145 MULTIPOLYGON (((-76.01 36.3... ## 5 1197 MULTIPOLYGON (((-77.22 36.2... ## 6 1237 MULTIPOLYGON (((-76.75 36.2... plot(map) FIGURE 2.7: Map of North Carolina imported with the sf package. ## 2.4 Making maps with R Maps are very useful to convey geospatial information. Here, we present simple examples that demonstrate the use of some of the packages that are commonly used for mapping in R, namely, ggplot2, leaflet, mapview, and tmap. In the rest of the book, we show how to create more complex maps to visualize the results of several applications using the ggplot2 and leaflet packages. ### 2.4.1ggplot2 ggplot2 (https://ggplot2.tidyverse.org/) is a package to create graphics based on the grammar of graphics. This means we can create a plot using the ggplot() function and the following elements: 1. Data we want to visualize, 2. Geometric shapes that represent the data such as points or bars. Shapes are specified with geom_*() functions. For example, geom_point() is used for points and geom_histogram() is used for histograms, 3. Aesthetics of the geometric objects. aes() is used to map variables in the data to the visual properties of the objects such as color, size, shape, and position. 4. Optional elements such as scales, titles, labels, legends, and themes. We can create maps by using the geom_sf() function and providing a simple feature (sf) object. Note that if the data available is a spatial object of class SpatialPolygonsDataFrame, we can easily convert it to a simple feature object of class sf with the st_as_sf() function of the sf package. For example, we can create a map of sudden infant deaths in North Carolina in 1974 (SID74) as follows (Figure 2.8): library(ggplot2) map <- st_as_sf(map) ggplot(map) + geom_sf(aes(fill = SID74)) + theme_bw() FIGURE 2.8: Map of sudden infant deaths in North Carolina, 1974, created with ggplot2. In ggplot() the default color scale for discrete variables is scale_*_hue(). Here * indicates color (to color features such as points and lines) or fill (to color polygons or histograms). We can change the default scale by using scale_*_grey() which uses grey colors, scale_*_brewer() which uses colors of the RColorBrewer package (Neuwirth 2014), and scale_*_viridis(discrete = TRUE) which uses colors of the viridis package (Garnier 2018). We can also manually define our own set of colors with scale_*_manual(). Note that this function has a logical argument called drop to decide whether to keep unusued factor levels in the scale. Color scales for continuous variables can be specified with scale_*_gradient() which creates a sequential gradient between two colors (low-high), scale_*_gradient2() which creates a diverging color gradient (low-mid-high), and scale_*_gradientn() which creates a gradient between n colors. We can also use scale_*_distiller() and scale_*_viridis() to use colors of the packages RColorBrewer and viridis, respectively. We can plot the map of SID74 with the viridis scale as follows (Figure 2.8): library(viridis) map <- st_as_sf(map) ggplot(map) + geom_sf(aes(fill = SID74)) + scale_fill_viridis() + theme_bw() FIGURE 2.9: Map of sudden infant deaths in North Carolina, 1974, created with ggplot2 and viridis scale. To save a plot produced with ggplot2 we can use the ggsave() function. Alternatively, we can save the plot by specifying a device driver (e.g. png, pdf), printing the plot, and then shutting down the device with dev.off(): png("plot.png") ggplot(map) + geom_sf(aes(fill = SID74)) + scale_fill_viridis() + theme_bw() dev.off() Moreover, packages gganimate (Pedersen and Robinson 2019) and plotly (Sievert et al. 2018) can be used in combination with ggplot2 to create animated and interactive plots, respectively. ### 2.4.2leaflet Leaflet is a very popular open-source JavaScript library for interactive maps. The R package leaflet (https://rstudio.github.io/leaflet/) makes it easy to integrate and control Leaflet maps in R. We can create a map using leaflet by calling the leaflet() function, and then adding layers to the map by using layer functions. For example, we can use addTiles() to add a background map, addPolygons() to add polygons, and addLegend() to add a legend. We can use a variety of background maps. Examples can be seen in the leaflet providers website. A map of SID74 with a color scale given by the palette "YlOrRd" of the RColorBrewer package can be created as follows. First, we transform map which has projection given by EPSG code 4267 to projection with EPSG code 4326 which is the projection required by leaflet. We do this by using the st_transform() function of the sf package. st_crs(map) ## Coordinate Reference System: ## EPSG: 4267 ## proj4string: "+proj=longlat +datum=NAD27 +no_defs" map <- st_transform(map, 4326) Then we create a color palette using colorNumeric(), and plot the map using the leaflet(), addTiles(), and addPolygons() functions specifying the color of the polygons border (color) and the polygons (fillColor), the opacity (fillOpacity), and the legend (Figure 2.10). library(leaflet) pal <- colorNumeric("YlOrRd", domain = map$SID74)

leaflet(map) %>%
addTiles() %>%
addPolygons(
color = "white", fillColor = ~ pal(SID74),
fillOpacity = 1
) %>%
addLegend(pal = pal, values = ~SID74, opacity = 1)

FIGURE 2.10: Map of sudden infant deaths in North Carolina, 1974, created with leaflet.

To save the map created to an HTML file, we can use the saveWidget() function of the htmlwidgets package (Vaidyanathan et al. 2018). If we wish to save an image of the map, we first save it as an HTML file with saveWidget() and then capture a static version of the HTML using the webshot() function of the webshot package (Chang 2017).

### 2.4.3mapview

The mapview package (https://r-spatial.github.io/mapview/) allows to very quickly create interactive visualizations to investigate both the spatial geometries and the variables in the data. For example, we can create a map showing SID74 by just using the mapview() function with arguments the map object and the variable we want to show (zcol = "SID74") (Figure ??). This map is interactive and by clicking in each of the counties we can see popups with the information of the other variables in the data.

library(mapview)
mapview(map, zcol = "SID74")

mapview is very convenient to very quickly inspect spatial data, but maps created can also be customized by adding elements such as legends and background maps. Moreover, we can create visualizations that show multiple layers and incorporate synchronization. For example, we can create a map with background map "CartoDB.DarkMatter" and color from the palette "YlOrRd" of the RColorBrewer package as follows (Figure ??):

library(RColorBrewer)
pal <- colorRampPalette(brewer.pal(9, "YlOrRd"))
mapview(map,
zcol = "SID74",
map.types = "CartoDB.DarkMatter",
col.regions = pal
)

We can also use the sync() function to produce a lattice like view of multiple synchronized maps created with mapview or leaflet. For example, we can create maps of the variables SID74 and SID79 with synchronized zoom and pan by using sync() with arguments the maps corresponding to each year (Figure 2.11).

m74 <- mapview(map, zcol = "SID74")
m79 <- mapview(map, zcol = "SID79")
m <- sync(m74, m79)
m FIGURE 2.11: Snapshot of synchronized maps of sudden infant deaths in North Carolina in 1974 and 1979.

We can save maps created with mapview in the same manner as maps created with leaflet (using saveWidget() and webshot()). Alternatively, maps can be saved with the mapshot() function as an HTML file or as a PNG, PDF, or JPEG image.

### 2.4.4tmap

The tmap package is used to generate thematic maps with great flexibility. Maps are created by using the tm_shape() function and adding layers with a tm_*() function. In addition, we can create static or interactive maps by setting tmap_mode("plot") and tmap_mode("view"), respectively. For example, an interactive map of SID74 can be created as follows:

library(tmap)
tmap_mode("view")
tm_shape(map) + tm_polygons("SID74")

This package package also allows to create visualizations with multiple shapes and layers, and specify different styles. To save maps created with tmap we can use the tmap_save() function where we need to specify the name of the HTML file (view mode) or image (plot mode). Additional information about tmap can be seen in the package vignette.

### References

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, and Kara Woo. 2018. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.

Cheng, Joe, Bhaskar Karambelkar, and Yihui Xie. 2018. Leaflet: Create Interactive Web Maps with the Javascript ’Leaflet’ Library. https://CRAN.R-project.org/package=leaflet.

Appelhans, Tim, Florian Detsch, Christoph Reudenbach, and Stefan Woellauer. 2018. Mapview: Interactive Viewing of Spatial Data in R. https://CRAN.R-project.org/package=mapview.

Tennekes, Martijn. 2018. Tmap: Thematic Maps. https://CRAN.R-project.org/package=tmap.

Cressie, Noel A. C. 1993. Statistics for Spatial Data. New York: John Wiley & Sons.

Pebesma, Edzer. 2018. Sf: Simple Features for R. https://CRAN.R-project.org/package=sf.

Moraga, Paula, and Andrew B. Lawson. 2012. “Gaussian component mixtures and CAR models in Bayesian disease mapping.” Computational Statistics & Data Analysis 56 (6): 1417–33. doi:10.1016/j.csda.2011.11.011.

Moraga, Paula, and Kulldorff. 2016. “Detection of spatial variations in temporal trends with a quadratic function.” Statistical Methods for Medical Research 25 (4): 1422–37. https://doi.org/10.1177/0962280213485312.

Ribeiro Jr, Paulo J., and Peter J. Diggle. 2016. GeoR: Analysis of Geostatistical Data. https://CRAN.R-project.org/package=geoR.

Moraga, Paula, Jorge Cano, Rebecca F. Baggaley, John O. Gyapong, Sammy M. Njenga, Birgit Nikolay, Emmanuel Davies, et al. 2015. “Modelling the distribution and transmission intensity of lymphatic filariasis in sub-Saharan Africa prior to scaling up interventions: integrated use of geostatistical and mathematical modelling.” Public Library of Science: Neglected Tropical Diseases 8: 560. https://doi.org/10.1186/s13071-015-1166-x.

Li, Peter. 2018. Cholera: Amend, Augment and Aid Analysis of John Snow’s Cholera Map. https://CRAN.R-project.org/package=cholera.

Moraga, Paula, and Francisco Montes. 2011. “Detection of spatial disease clusters with LISA functions.” Statistics in Medicine 30: 1057–71. https://doi.org/10.1002/sim.4160.

Bivand, Roger, Tim Keitt, and Barry Rowlingson. 2018. Rgdal: Bindings for the ’Geospatial’ Data Abstraction Library. https://CRAN.R-project.org/package=rgdal.

Neuwirth, Erich. 2014. RColorBrewer: ColorBrewer Palettes. https://CRAN.R-project.org/package=RColorBrewer.

Garnier, Simon. 2018. Viridis: Default Color Maps from ’Matplotlib’. https://CRAN.R-project.org/package=viridis.

Pedersen, Thomas Lin, and David Robinson. 2019. Gganimate: A Grammar of Animated Graphics. https://CRAN.R-project.org/package=gganimate.

Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2018. Plotly: Create Interactive Web Graphics via ’Plotly.js’. https://CRAN.R-project.org/package=plotly.

Vaidyanathan, Ramnath, Yihui Xie, JJ Allaire, Joe Cheng, and Kenton Russell. 2018. Htmlwidgets: HTML Widgets for R. https://CRAN.R-project.org/package=htmlwidgets.

Chang, Winston. 2017. Webshot: Take Screenshots of Web Pages. https://CRAN.R-project.org/package=webshot.