In this lab we will review concepts of community detection and how to represent our network in a map.
First we will load the data and the libraries we will be using. Remember that on the previous lab we exported a data set at the end, we will continue using this objects.
# libraries we will use.
library(dplyr) # Data manipulation
library(sf) # Spatial data manipulation
library(tidygraph) # Network manipulation and analysis
library(purrr) # for data manipulation
library(ggraph) # for network visualization
library(ggpubr) # for arranging plots
# If you are starting a new session, load the files and libraries again
# Load the data for nodes
node <- STNet::SwinePrem %>% # load the data from the STNet library
mutate(id = as.character(id)) # change the id to variable to character
# Load the data for edges
edge <- STNet::SwineMov %>%
mutate(id_orig = as.character(id_orig), id_dest = as.character(id_dest))
# create the network object
net <- as_tbl_graph(edge) %N>%
left_join(node, by = c('name' = 'id')) # Now we join to the node data frame to include other variables
In this section we will use different algorithms to identify communities in our network.
# First we need to simplify the network
c <- net %E>% # we call our network and activate the edges
mutate(N = as.integer(1)) %>% # create a variable for the number of movements (each row is 1 movement)
convert(to_simple) %E>% # now we will convert it to a simple network
mutate(weight = map_int(.orig_data, ~.x %>% pull(N) %>% sum())) %N>% # We have to sum all the repeated movements
mutate(walktrap = factor(group_walktrap(weights = weight)), # use the walktrap algorithm for community detection
infomap = factor(group_infomap(weights = weight))) # use the infomap for community detection
Then we will create an empty list to fill with plots and compare the different algorithms.
# Create the empty list
CP <- list()
# Make a plot for the edges only
pc <- c %>% # our simplified network
ggraph(layout = 'nicely') + # call the ggraph function
geom_edge_link() + # add the edges
theme(legend.position = 'bottom') # set the legend position to bottom
CP[['Infomap']] <- pc + # We call our plot with only the edges
geom_node_point(aes(col = infomap), size = 2) + # we add the nodes
labs(title = 'Infomap') # title of our plot
CP[['Walktrap']] <- pc +
geom_node_point(aes(col = walktrap), size = 2) +
labs(title = 'Walktrap')
# We arrange our plots in a single figure
ggarrange(plotlist = CP)
## Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
We can select specific communities to show by filtering by the newly created community variable.
c %N>%
filter(walktrap == '1') %>% # filter for community 1
ggraph(layout = 'nicely') + # call the ggraph function
geom_edge_link() + # add the edges
geom_node_point(aes(col = walktrap), size = 2) + # we add the nodes
theme(legend.position = 'bottom') + # set the legend position to bottom
labs(title = 'Subset of the nework for the community 1')
Exercise: Now increase the number of steps in the walk trap algorithm, what happens when we increase the number of steps? What do you think would be the optimal number of steps for this example
Now we will use the network we created and the spatial location of
our farms to see the movements on a map.
We will be using the sf
package to manipulate the spatial
objects, and the ggplot2
package for visualization.
In the STNet
package there is a spatial polygons data,
which includes the counties in the state of Iowa.
# Loading the packages
library(sf) # Package for spatial objects
library(ggplot2) # package for plots
# We load the spatial object from the package STNet
iowa <- st_read(system.file("data/Io.shp", package = "STNet"))
## Reading layer `Io' from data source
## `/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/STNet/data/Io.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 99 features and 2 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -96.63567 ymin: 40.37454 xmax: -90.13931 ymax: 43.50465
## Geodetic CRS: WGS 84
# plot map using sf
plot(iowa$geometry)
Next we will transform the nodes as a spatial points object, for this
we use the function st_to_sf()
and we need to specify the
names of the columns that have the spatial coordinates.
NodeSp <- net %N>% # This is our node data.frame
mutate(indegree = centrality_degree(mode = 'in')) %>% # calculate the indegree
data.frame() %>%
st_as_sf(coords = c("long", "lat"), # Variables for the coordinates
crs = st_crs(iowa)) # This is the CRS we are using
One of the nice things of ggplot is that we can create a map and store it in an object and later we can keep adding layers to this map. So first we will create a map of the state.
map <- ggplot() +
geom_sf(data = iowa, # name of the spatial dataset
color="grey20", # color of the shape border
fill="white", # fill of the shape
size=0.4) + # width of the border
theme_void() # This is a theme form ggplot
Once we have the base map of the state, we can add the spatial points data we created previously. We can specify the size of the points using a variable.
map + geom_sf(data = NodeSp, # name of our data
aes(color = farm_type, # we color the nodes by farm type
size = indegree)) +
ggtitle("Farms and their indegree") # the title of our plot
Exercise: Make the same plot, but make the size of the nodes relative to outdegree
We can also add the connection between the nodes represented by euclidean distance (fancy word for straight line). But first we will need to add the coordinates to the edge data using joins:
# First we need to include the information of origins and destination
edge <- edge %>%
left_join(node[c("id", "lat", "long")], by = c("id_orig" = "id")) %>%
rename(O_Lat = lat, O_Long = long) %>%
left_join(node[c("id", "lat", "long")], by = c("id_dest" = "id")) %>%
rename(D_Lat = lat, D_Long = long)
In the previous lab we calculated the euclidean distance between each pair of farms involved in a movement. Here we will visualize those movements.
# The function geom_segment adds a straight line between two coordinates:
map +
geom_segment(data=edge,
aes(x=O_Long, y=O_Lat, # this is where the line starts
xend=D_Long, yend=D_Lat)) # this is where it ends
# We can add the information of the type of movement to change the color of the line and the number of animals for the transparency
map +
geom_segment(data=edge,
aes(x=O_Long, y=O_Lat,
xend=D_Long, yend=D_Lat,
color=type_orig,
alpha = pigs.moved))
Now we will add both the farm locations and the direction of the movements between the farms on a map.
#plot nodes & edges - add both commands geom_segment and geom_point#
map +
geom_segment(data=edge,
aes(x=O_Long, y=O_Lat,
xend=D_Long, yend=D_Lat,
alpha = pigs.moved),
show.legend=F) +
geom_sf(data = NodeSp,
aes(color = farm_type,
size = indegree), show.legend = "point")
Sometimes we are interested in a particular type of movements. We can
subset this using the dplyr functions such as filter()
. In
the next plot we will select only the movements that comes from sow
farms.
# plot movements from sow farms only
map +
edge %>%
filter(type_orig == "sow farm") %>%
geom_segment(data = .,
aes(x=O_Long, y=O_Lat,
xend=D_Long, yend=D_Lat,
color = type_orig), show.legend = F) +
geom_sf(data = NodeSp,
aes(color = farm_type), size=3, show.legend = "point")
We can be even more specific and filter the movements from sow farms
that are directed to GDU.
We will also add at the end the function ggplotly()
from
the package plotly
to obtain a map were we can zoom and
hover over some features to obtain more information.
# We store the map of movements between GDU to sow farm
m <- map +
edge %>%
filter(type_orig == 'GDU', type_dest == "sow farm") %>%
geom_segment(data = ., aes(x=O_Long, y=O_Lat,
xend=D_Long, yend=D_Lat,
color = type_orig), show.legend = F) +
geom_sf(data = NodeSp,
aes(color = farm_type),
size=3, show.legend = "point") +
ggtitle("GDU to Sow farm Movments")
# We use the function from plotly to transform our map into n interactive map.
library(plotly)
ggplotly(m)
If you are interested in more about interactive maps
plotly
also has option for using background maps, but for
this you need to get a public Mapbox access token, which is
free, but requires registration. Some great resources for more
information:
Here I provide an example of the kind of maps you can get using mapbox, but this This code will not run unless you use your own API Key
# Sys.setenv('MAPBOX_TOKEN' = yourKey)
plot_mapbox() %>%
add_segments(
data = group_by(edge, id_orig, id_dest),
x = ~O_Long, xend = ~D_Long,
y = ~O_Lat, yend = ~D_Lat,
alpha = 0.1, size = I(1), hoverinfo = "none"
) %>%
add_markers(data = node,
x = ~long, y = ~lat, text = ~id,
split = ~farm_type,
hoverinfo = "text"
) %>%
layout(
mapbox = list(
style = 'open-street-map',
zoom = 6,
center = list(lat = 42, lon = -93)
))
This lab has been developed with contributions from: Jose Pablo Gomez-Vazquez,
Jerome
Baron, and Beatriz
Martinez-Lopez.
Feel free to use these training materials for your own research and
teaching. When using the materials we would appreciate using the proper
credits. If you would be interested in a training session, please
contact: jpgo@ucdavis.edu