Spatial data is usually represented in two different ways:
We can represent spatial objects in different dimensions:
Besides having the location of an object, we can include other characteristics such as the name, id, temperature recorded, number of animals in the farm, etc…
We use raster data to represent continuous values in a field. Raster are just a grid where each cell has a value and in a grid. The resolution of a raster just represent the size of the cells from the grid. We use raster data to represent values such as altitude, temperature, among other continuous values.
In this tutorial we will introduce to spatial data manipulation in R.
There are two main formats to manipulate spatial data in R:
- SpatialDataFrame from the sp
package: This is was the first format introduced in R for spatial data manipulation, therefore, this package has a lot of dependencies (packages that uses this format to do other functions) i.e raster
, spdep
, spstat
.
- Simple features from the package sf
: THis is a more recently developed package, this package was developed to be more intuitive and friendly with other packages such as dplyr
. The problem with this package is that since its more recent, some packages doesn’t support this format.
Working with both formats has its advantages, for spatial data manipulation sf
is more intuitive and powerful, but for spatial analysis sp
is more robust.
Here we will use mostly the sf
package, but there will be times that we will need to switch between formats.
The package STNet
was developed specifically for this workshop. All the data and some functions we will use are contained in the package.
The installation of this package is done from github, so we will need to install the packagedevtools
to access the STNet
package.
# If devtools is not installed we need to install it
install.packages("devtools")
# once installed we can use the following function to install STNet
devtools::install_github("jpablo91/STNet")
# Loading the libraries
library(STNet)
library(sf)
library(dplyr)
# Loading the data from the package
data("PRRS")
data("SwinePrem")
# Loading the spatial data from the package
Io <- st_read(system.file("data/Io.shp", package = "STNet"))
## Reading layer `Io' from data source
## `/Library/Frameworks/R.framework/Versions/4.1/Resources/library/STNet/data/Io.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 99 features and 2 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -96.63567 ymin: 40.37454 xmax: -90.13931 ymax: 43.50465
## Geodetic CRS: WGS 84
The st_read()
function automatically shows some information about our shapefile, but we can see more details when printing the object into the console.
Io
## Simple feature collection with 99 features and 2 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -96.63567 ymin: 40.37454 xmax: -90.13931 ymax: 43.50465
## Geodetic CRS: WGS 84
## First 10 features:
## NAME_1 NAME_2 geometry
## 1 Iowa Adair POLYGON ((-94.24274 41.5030...
## 2 Iowa Adams POLYGON ((-94.4728 41.1566,...
## 3 Iowa Allamakee POLYGON ((-91.20658 43.4250...
## 4 Iowa Appanoose POLYGON ((-92.71506 40.5902...
## 5 Iowa Audubon POLYGON ((-94.74762 41.8609...
## 6 Iowa Benton POLYGON ((-91.83191 42.2986...
## 7 Iowa Black Hawk POLYGON ((-92.29958 42.2974...
## 8 Iowa Boone POLYGON ((-93.70088 42.2079...
## 9 Iowa Bremer POLYGON ((-92.08252 42.9061...
## 10 Iowa Buchanan POLYGON ((-91.60603 42.6437...
The output shows:
- geometry type
: The type of shapefile (either point data, lines or polygons). - dimension
Dimensions used in the data.
- bbox
: The extent of our data.
- epsg (SRID)
: The projection in the EPSG format (which is a standardized code to describe the projection).
- proj4string
: The projection in proj4string format.
- And the first 10 features.
The sf
objects are basically a data.frame with extra information about geometry, projection and CRS. We can ask for the geometry only using the $
operator or the function st_geometry()
and then show it in a plot.
plot(Io$geometry)
We can also extract only the data frame without geometry using the function data.frame()
:
data.frame(Io) %>%
head() # We use this function to see the first 6 only
## NAME_1 NAME_2 geometry
## 1 Iowa Adair POLYGON ((-94.24274 41.5030...
## 2 Iowa Adams POLYGON ((-94.4728 41.1566,...
## 3 Iowa Allamakee POLYGON ((-91.20658 43.4250...
## 4 Iowa Appanoose POLYGON ((-92.71506 40.5902...
## 5 Iowa Audubon POLYGON ((-94.74762 41.8609...
## 6 Iowa Benton POLYGON ((-91.83191 42.2986...
First we obtain the prevalence per farm:
PRRS_S <- PRRS %>%
group_by(id) %>%
summarise(N = n(), Cases = sum(Result)) %>% # Get the number of samples and positives
mutate(Prevalence = Cases/N) # Estimate an apparent prevalence.
Now we join this new values with the locations of the farms (SwinePrem):
SwinePrem <- SwinePrem %>%
left_join(PRRS_S, by = "id")
We can use the function st_as_sf()
to transform the data.frame
to sf
. For this we will need to specify the data frame, coordinates and CRS.
Nodes <- SwinePrem %>%
st_as_sf(coords = c("long", "lat"), # The names of the coordinates in our data
crs = st_crs(Io)) # The CRS we will use.
Our spatial objects are not projected, which means that are we are representing the data in a planar scale without considering the earth curvature, something only a flat earther will do. The impact of the projection in our data will be associated with the size of our study area. In smaller areas the projection wont have a big impact, but as our study are increases the projection will have a bigger impact when calculating distances.
We can use the function st_transform()
to set a projection to our data.
Iop <- Io %>%
st_transform(st_crs("+init=EPSG:26975"))
Nodesp <- Nodes %>%
st_transform(st_crs(Iop))
If we observe our map (projected and not projected) side by side, we can notice that there is a slight curvature on the straight lines such as the north and south borders.
par(mfrow = c(1,2), mar = c(0, 0, 0, 0))
plot(Io$geometry)
plot(Iop$geometry)
Now that we have our data correctly projected, we can improve the visual aspect of our map.
We can visualize multiple spatial objects in the same map adding several layers to our plot. We use the argument add = T
in the layers we want to add to our base plot.
plot(Io$geometry) # Base map, first layer
plot(Nodes["Prevalence"], pch = 16, add = T) # Second layer
Using the information of the apparent prevalence by farm we can change the size of the point and color them by type of farm. We use the function rescale()
from the scales
package to rescale the points between 0.5 - 1.5 according to their apparent prevalence.
We can also add a legend that shows what they simbology means in our map. For this we use the function legend()
, which will take multiple arguments such as the location, names to display, among others.
# We use the function rainbow to obtain a palette
colpal <- rainbow(length(levels(Nodes$farm_type)), # Number of colors needed
s = 0.6) # saturation for the colors
# Visualize the map
## background map
plot(Iop$geometry, col = "grey90", main = "Farm distribution and its prevalence")
## First layer
plot(Nodesp$geometry, # Spatial points object
pch = 16, # Type of point
cex = scales::rescale(Nodesp$Prevalence, to = c(0.5, 1.5)), # Size
col = colpal[Nodesp$farm_type], # Color for the points
add = T) # Add as a second layer for previous plot
## Legend
legend("topright", # Position of the legend
legend = levels(Nodesp$farm_type), # Categories to show
pch = 16, # Type of point
col = colpal) # Colors for the points
We can aggregate the data at county level to create a choropleth map. First we will join the farm information with the county using a spatial joint with the function st_join()
. This function will create duplicates of our counties since we have multiple farms per county, so then we will sum all the cases and number of observations using group_by()
and summarise()
, and calculate the prevalence at county level (instead of farm level).
Iop <- Iop %>%
st_join(Nodesp) %>% # the data we are joining with
group_by(NAME_2) %>% # group by county
summarise_at(vars(N, Cases), .funs = ~sum(.,na.rm = T)) %>% # apply the sum function to the variables sN and Cases
mutate(Prevalence = Cases/N) # get the apparent prevalence at county level
plot(Iop["Prevalence"])
There are other libraries that can be used to create maps in R, the base R functions for graphics can be very flexible, but require more knowledge about the functions in R. There ar other options such as tmap
and ggplot
that allow us to use more intuitive functions to create maps.
tmap
has several options for mapping and with very similar syntax to the tidyverse packages (dplyr
, ggplot2
). We can create several maps and arrange them into a layout. For example, we can specify a predefined color palette for our map without the need of telling R how many colors we need.
library(tmap)
First we create a map of the farms and their apparent prevalence
tm_shape(Iop) + # base layer
tm_polygons(col = "grey80") + # color of the base layer
tm_shape(Nodesp) + # second layer
# Options for second layer
tm_symbols("Prevalence", # Name of the variable we will use
size = 0.5, # Size of the points
legend.hist = T) + # Add a histogram
# Layout options
tm_layout(legend.outside = T, # We want the legend outside the box
frame = F, # remove frame
legend.hist.width = 3) # with of histogram
### 3.1.1 Interactive maps
One of the nice things about tmap is that we can ver easy convert our static map into something interactive. We use the function tmap_mode()
to switch between the interactive mode (‘view’) to the static mode (‘plot’). Now we will create a choropleth map like the one we did before, but add it as an interactive map layer.
tmap_mode('view')
## tmap mode set to interactive viewing
tm_shape(Iop) +
tm_polygons("Prevalence", palette = "-RdYlBu", alpha = 0.5) +
tm_layout(legend.outside = T, frame = F)
ggplot2
is one of the most popular libraries for creating figures in R, we can add sf objects using the function geom_sf()
library(ggplot2)
p <- ggplot() + # first we call the ggplot() function
geom_sf(data = Iop, # we specify our data
aes(fill = Prevalence)) + # we use aes() to add the variables that we want to plot
theme_void()
p
Similarly to tmap
, we can also convert our map into an interactive map using the library plotly
# make sure you have it installed
# install.packages("plotly")
plotly::ggplotly(p)
ggplot and tmap have more personalization options that will not be covered in this tutorial. For more plotting information i recommend the R Graph Gallery which has several examples and code for making different types of plots
One of the easiest ways to get data from other countries or regions is downloading directly from R using the raster
library. For example, we will get a shapefile for Denmark. The data will be downloaded in your computer and store in your project directory. Once downloaded you can just load it directly from your computer without internet connectiom with the function read_sf
.
DK <- raster::getData(name = 'GADM', # this is the data source
level = 1, # Level of administration
country = 'DK') %>% # country code
st_as_sf() # Now transform the data to sf
# the dataset can be quite detailed, so we can simplify it to make it easier to plot
DK <- rmapshaper::ms_simplify(DK)
## Registered S3 method overwritten by 'geojsonlint':
## method from
## print.location dplyr
# Plot the map
ggplot() +
geom_sf(data = DK)