In this lab we will create our first network objects, compute network statistics, and visualize the data.
# Libraries we will use:
library(dplyr) # for data manipulation
library(ggplot2) # For making figures
library(ggraph) # For visualization of the networks
library(igraph) # for network analysis
library(tidygraph) # tidyverse friendly network analysis
# Loading the data from the STNet package ------------
# Load the data for nodes
node <- STNet::SwinePrem %>% # load the data from the STNet library
  mutate(id = as.character(id)) # convert the id variable to character
# Load the data for edges
edge <- STNet::SwineMov %>%
mutate(id_orig = as.character(id_orig), id_dest = as.character(id_dest))
The data sets we will be using are node, which includes the information of the farms, and edge, which includes the information for the movements between the farms.
We use the function as_tbl_graph() to create the network from a data.frame. Its main argument is x, which takes the contacts data; the contacts are assumed to be directed. If we print the result, we will see an object of the class tbl_graph, which we explain below:
net <- as_tbl_graph(edge)
net
## # A tbl_graph: 40 nodes and 1611 edges
## #
## # A directed multigraph with 1 component
## #
## # A tibble: 40 × 1
## name
## <chr>
## 1 17
## 2 12
## 3 14
## 4 11
## 5 7
## 6 9
## # ℹ 34 more rows
## #
## # A tibble: 1,611 × 6
## from to date pigs.moved type_orig type_dest
## <int> <int> <chr> <int> <chr> <chr>
## 1 1 7 8/20/15 160 finisher sow farm
## 2 1 7 8/20/15 76 finisher sow farm
## 3 1 3 9/11/15 155 finisher nursery
## # ℹ 1,608 more rows
Printing the object shows the number of nodes and edges: our network has 40 nodes and 1611 edges. We can also see the attributes of the nodes (in this case only the name) and of the edges (in this example from, to, date, pigs.moved, type_orig, and type_dest).
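As a small illustration of the same idea, the sketch below builds a tbl_graph from toy data with tbl_graph(), which takes the node and edge tables together. The farms, farm types, and counts are made up for illustration and are not part of the STNet data; node_key = "id" tells tidygraph which node column the character ids in the edge table refer to.

```r
library(tidygraph)

# Hypothetical toy data (not from STNet): three farms and three movements
toy_nodes <- data.frame(
  id = c("a", "b", "c"),
  farm_type = c("sow farm", "nursery", "finisher")
)
toy_edges <- data.frame(
  from = c("a", "a", "b"),
  to = c("b", "c", "c"),
  pigs.moved = c(100, 50, 80)
)

# Build the network from both tables at once; the character ids in from/to
# are matched against the node column named in node_key
toy_net <- tbl_graph(nodes = toy_nodes, edges = toy_edges,
                     directed = TRUE, node_key = "id")

# Printing shows the node table first and the edge table second
toy_net
```

Building the network this way attaches the node attributes from the start, instead of joining them afterwards.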
We can treat this object like two data frames that are joined by a key or identification variable. Whenever we want to access one of the data frames to modify it, we can use either the function activate() or, when using pipes, %N>% to call the nodes or %E>% to call the edges. In the next example we will add the rest of the node information to our tbl_graph:
net <- net %N>% # <- Notice we are including 'N' inside our pipe to specify we want to access the nodes
left_join(node, by = c('name' = 'id')) # Now we join to the node data frame to include other variables
# We can ask for the nodes data specifically to see the changes
net %N>%
data.frame()
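To make the relationship between activate() and the %N>% pipe concrete, here is a minimal sketch on toy data. The objects toy_net and toy_node are hypothetical and only serve to show that both forms give the same node table.

```r
library(dplyr)
library(tidygraph)

# Hypothetical toy network and node table (not from STNet)
toy_net <- as_tbl_graph(data.frame(
  from = c("1", "1", "2"),
  to = c("2", "3", "3")
))
toy_node <- data.frame(
  id = c("1", "2", "3"),
  farm_type = c("sow farm", "nursery", "finisher")
)

# Option 1: activate the nodes explicitly, then join
with_activate <- toy_net %>%
  activate(nodes) %>%
  left_join(toy_node, by = c("name" = "id"))

# Option 2: use the %N>% pipe, which activates the nodes for us
with_pipe <- toy_net %N>%
  left_join(toy_node, by = c("name" = "id"))

# Both approaches should produce the same node table
all.equal(as_tibble(with_activate), as_tibble(with_pipe))
```

The same pattern applies to the edges with activate(edges) and %E>%.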
We can use other network centrality measures such as outdegree, closeness, and betweenness, among others. In the following code chunk, we will calculate several centrality measures:
net <- net %>% # This is our network data
  mutate(outdegree = centrality_degree(mode = 'out', loops = FALSE), # calculate the outdegree
         closeness = centrality_closeness(), # calculate the closeness
         betweenness = centrality_betweenness(), # calculate betweenness
         Nbs = neighborhood.size(graph = .) # size of the order-1 neighborhood (the count includes the node itself)
  )
# let's have a look at our network with the new variables
net
## # A tbl_graph: 40 nodes and 1611 edges
## #
## # A directed multigraph with 1 component
## #
## # A tibble: 40 × 9
## name name.y lat long farm_type outdegree closeness betweenness Nbs
## <chr> <fct> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 17 US Farm Lea… 41.6 -93.6 finisher 68 0.04 220. 20
## 2 12 Uncle Bill'… 41.7 -92.7 nursery 5 0.0233 46.1 10
## 3 14 Western Iow… 42.4 -96.3 nursery 15 0.0345 64.7 9
## 4 11 Loess Hills… 41.7 -95.9 nursery 1 0.0217 0 2
## 5 7 Kloubec Koi 41.8 -91.8 GDU 3 0.0217 25.2 9
## 6 9 Kroul Farms 41.9 -91.5 nursery 6 0.0270 0 4
## # ℹ 34 more rows
## #
## # A tibble: 1,611 × 6
## from to date pigs.moved type_orig type_dest
## <int> <int> <chr> <int> <chr> <chr>
## 1 1 7 8/20/15 160 finisher sow farm
## 2 1 7 8/20/15 76 finisher sow farm
## 3 1 3 9/11/15 155 finisher nursery
## # ℹ 1,608 more rows
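Because this is a multigraph, the outdegree counts every movement, which is why a farm can have an outdegree larger than the number of farms in the network. The sketch below uses hypothetical toy data to contrast three related quantities: the number of outgoing movements, the total number of pigs shipped (degree weighted by pigs.moved), and the number of distinct destination farms. The column names pigs_out and partners_out are made up for this illustration.

```r
library(dplyr)
library(tidygraph)

# Hypothetical toy data: farm a ships to b twice and to c once
toy_net <- as_tbl_graph(data.frame(
  from = c("a", "a", "a", "b"),
  to = c("b", "b", "c", "c"),
  pigs.moved = c(100, 60, 50, 80)
))

toy_net <- toy_net %N>%
  mutate(
    # number of outgoing movements (parallel edges are counted separately)
    outdegree = centrality_degree(mode = "out"),
    # total pigs shipped: degree weighted by the edge attribute pigs.moved
    pigs_out = centrality_degree(weights = pigs.moved, mode = "out"),
    # number of distinct farms reached in one step (mindist = 1 excludes the node itself)
    partners_out = local_size(order = 1, mode = "out", mindist = 1)
  )

toy_net %N>%
  as_tibble()
```

In this toy network, farm a has three outgoing movements, ships 210 pigs in total, and has two distinct destinations.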
To visualize the network we can use the function plot().
#plot network
plot(net)
This figure looks a bit messy. To make it more informative we can adjust some of the parameters. We use the arguments edge.arrow.size to adjust the size of the arrowhead, vertex.size to adjust the node size, and vertex.label to remove the names of the nodes.
# make it clearer: adjust size, remove labels
plot(net, # Our network object
     edge.arrow.size = 0.2, # define the arrow size
     vertex.size = 4, # the size of the node
     vertex.label = NA) # We remove the name of the nodes
We can use base R to make our figures, but since in this workshop we have been focusing on ggplot2, we will use the library ggraph, which is based on ggplot2. The library ggraph works very similarly to ggplot2: we use the function ggraph() to set our empty canvas, the same way we would with the function ggplot() in ggplot2. The library ggraph also introduces a set of new geometry types for both the nodes and the edges.
We can get specific attributes from the tbl_graph inside the aes() argument, similar to what we would do for ggplot2. In the following plot, we will set the node color to the type of farm. Let’s try it:
ggraph(graph = net) + # First we set our empty canvas
geom_edge_link() + # Add the edges
geom_node_point(aes(color = farm_type), size = 3) # add the nodes
We will use the indegree value to assign the node size. First we will calculate the indegree using the function centrality_degree() with the argument mode = "in", and we will also exclude loops with the argument loops = FALSE.
# First calculate the indegree for the nodes:
net <- net %>%
  mutate(indegree = centrality_degree(mode = 'in', loops = FALSE))
# Now we use ggraph to visualize it in the network
ggraph(net, layout = 'kk') + # this is our empty canvas
geom_edge_link(aes(width = pigs.moved)) + # Add the edges
geom_node_point(aes(color = farm_type, size = indegree)) + # Add the nodes
scale_edge_width(range = c(0.01, 0.9)) # we set the range for the width of the edges
We can change the position of the nodes using the argument layout. If we don’t specify any layout, ggraph will automatically use the layout ‘stress’. Force-directed layouts position the nodes according to their connectivity: connected nodes are pulled together and unconnected nodes are pushed apart. The layouts ‘nicely’ and ‘kk’ arrange the network in clear layouts, based on the position of the nodes in the network in terms of connectivity. They also attempt to keep the distance between connected nodes roughly constant.
# Plot with layout nicely
ggraph(net, layout = 'nicely') +
geom_edge_link() + # Add the edges
geom_node_point(aes(color = farm_type), size = 3) + # add the nodes
labs(title = 'Layout nicely')
# Plot with layout kk
ggraph(net, layout = 'kk') +
geom_edge_link() + # Add the edges
geom_node_point(aes(color = farm_type), size = 3) +
labs(title = 'Layout KK')
You can read more about the different layouts available in the ggraph documentation.
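Some force-directed layouts, such as ‘fr’, start from random node positions, so the figure can look different every time the code is run. The sketch below, on a hypothetical toy network, shows one way to make such a layout reproducible by setting the random seed before the layout is computed. It uses create_layout() from ggraph to inspect the node coordinates directly.

```r
library(ggraph)
library(tidygraph)

# Hypothetical toy network (not from STNet)
toy_net <- as_tbl_graph(data.frame(
  from = c("a", "a", "b", "c", "d"),
  to = c("b", "c", "c", "d", "e")
))

# Compute the same layout twice, resetting the seed each time
set.seed(123)
lay1 <- create_layout(toy_net, layout = "fr")
set.seed(123)
lay2 <- create_layout(toy_net, layout = "fr")

# With the same seed the coordinates are the same
identical(lay1$x, lay2$x)

# The same applies when plotting: set the seed right before calling ggraph()
set.seed(123)
ggraph(toy_net, layout = "fr") +
  geom_edge_link() +
  geom_node_point(size = 3)
```

The seed value itself is arbitrary; what matters is that it is set immediately before the layout is computed.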
Exercise: Let’s have a look at different layouts to see how the network changes. Try using a couple of different layouts to produce a different figure (e.g. ‘fr’, ‘lgl’, ‘graphopt’).
We can change the width of the edges to represent the number of pigs moved between nodes:
ggraph(net, layout = 'kk') +
geom_edge_link(aes(width = pigs.moved)) + # Add the edges
geom_node_point(aes(color = farm_type), size = 3) +
scale_edge_width(range = c(0.01, 0.9))
Exercise: explore plotting the network using the different centrality measures to set the color or size of the nodes.
We can use some of the functions for data manipulation, such as filter(). Since filtering uses boolean operations, we can use some network properties for the filtering.
We will use the function filter() to select only the observations where more than 150 animals were moved:
net %E>% # This is the network
filter(pigs.moved > 150) %>% # We filter the edges for only the ones with > 150 animals
# we visualize the graph:
ggraph() +
geom_edge_link() +
geom_node_point(size = 4)
You will notice that there are a few isolated nodes that do not connect to the main network. This is because we only filtered the edges, but not the nodes. Sometimes we want this, but other times we would like to remove the nodes as well. For this we will add an extra line to the previous code:
net %E>% # This is the network
filter(pigs.moved > 150) %N>% # We filter the edges for only the ones with > 150 animals
filter(!node_is_isolated()) %>% # THIS IS THE LINE WE ADDED to filter the isolated nodes out
# we visualize the graph:
ggraph() +
geom_edge_link() +
geom_node_point(size = 4)
Notice that in the code I use both the operators %N>% and %E>% depending on what I am filtering.
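The effect of each filtering step is easier to see on a small example. The following sketch uses hypothetical toy data in which only two of the four movements exceed 150 animals, and counts the nodes and edges after each step.

```r
library(dplyr)
library(tidygraph)

# Hypothetical toy data: only a->b and b->c move more than 150 animals
toy_net <- as_tbl_graph(data.frame(
  from = c("a", "a", "b", "d"),
  to = c("b", "c", "c", "a"),
  pigs.moved = c(200, 100, 160, 50)
))

# Step 1: filter the edges only; all four nodes are kept
edges_only <- toy_net %E>%
  filter(pigs.moved > 150)

# Step 2: also drop the nodes that were left without any edge
no_isolates <- edges_only %N>%
  filter(!node_is_isolated())

# Compare the number of nodes and edges after each step
c(nodes = igraph::gorder(edges_only), edges = igraph::gsize(edges_only))
c(nodes = igraph::gorder(no_isolates), edges = igraph::gsize(no_isolates))
```

After the first step the network still has four nodes, with farm d left isolated; after the second step only farms a, b, and c remain.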
We can filter the network for the neighborhood of a specific node. Used with to_local_neighborhood, the function convert() subsets the network to the nodes that are connected with a specific node.
id <- 1 # first we define the index (row position) of the node we want; node 1 is farm '17'
# then we can visualize its neighborhood:
net %>% # this is our network
  convert( # We use the function convert() to subset our network
    to_local_neighborhood, # we specify that we want to convert to a local neighborhood
    node = id, # this is the index of the node we will filter by
    order = 1, # this is the order of the neighborhood
    mode = 'all' # the type of contact
) %>%
# then we create a variable to color our index node:
mutate(index = ifelse(.tidygraph_node_index == id, '1', '0')) %>%
# and visualize the network
ggraph() +
geom_edge_link() +
geom_node_point(aes(col = index, shape = index), size = 5)
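Note that the node argument refers to the position of the node in the node table, not to the farm name. The sketch below, on a hypothetical toy chain of farms, shows one way to look up the position from a name and how the order argument changes the size of the neighborhood.

```r
library(dplyr)
library(tidygraph)

# Hypothetical toy chain of movements: a -> b -> c -> d
toy_net <- as_tbl_graph(data.frame(
  from = c("a", "b", "c"),
  to = c("b", "c", "d")
))

# Look up the row position of the farm named "b" in the node table
idx <- which(as_tibble(toy_net, active = "nodes")$name == "b")

# Neighborhood of order 1: b and the farms directly connected to it
nb1 <- toy_net %>%
  convert(to_local_neighborhood, node = idx, order = 1, mode = "all")

# Neighborhood of order 2: also includes farms two steps away
nb2 <- toy_net %>%
  convert(to_local_neighborhood, node = idx, order = 2, mode = "all")

as_tibble(nb1, active = "nodes")$name
as_tibble(nb2, active = "nodes")$name
```

Changing mode to 'in' or 'out' restricts the neighborhood to farms that send animals to, or receive animals from, the index farm.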
We can export the network in tables to use them in another session. To do that we will first generate a table for only the nodes:
node <- net %N>%
data.frame()
We can export the data in multiple formats. Here we will first export the network in rds format (a format that can be read by R), and then we will export the tables in csv format.
# Export the object in rds format:
readr::write_rds(net, "Data/Outputs/net.rds")
# We will also export the edge and node tables in csv format
write.csv(edge, "Data/Outputs/edge.csv", row.names = FALSE)
write.csv(node, "Data/Outputs/node.csv", row.names = FALSE)
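To use the exported files in another session, they need to be read back in. The sketch below shows a round trip on hypothetical toy data written to a temporary folder (the file names and farm ids are made up). It assumes the ids should be kept as character when reading the csv files; otherwise they would be read as numbers, which tidygraph interprets as row positions rather than names.

```r
library(tidygraph)

# Hypothetical toy data (not from STNet)
toy_nodes <- data.frame(
  name = c("101", "102", "103"),
  farm_type = c("finisher", "nursery", "nursery")
)
toy_edges <- data.frame(
  id_orig = c("101", "101", "102"),
  id_dest = c("102", "103", "103"),
  pigs.moved = c(160, 76, 155)
)
toy_net <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = TRUE)

# Write everything to a temporary folder
out_dir <- tempdir()
rds_file <- file.path(out_dir, "toy_net.rds")
node_file <- file.path(out_dir, "toy_node.csv")
edge_file <- file.path(out_dir, "toy_edge.csv")
saveRDS(toy_net, rds_file)
write.csv(toy_nodes, node_file, row.names = FALSE)
write.csv(toy_edges, edge_file, row.names = FALSE)

# Option 1: the rds file restores the tbl_graph object directly
net_from_rds <- readRDS(rds_file)

# Option 2: rebuild the network from the csv tables,
# forcing the id columns to be read as character
nodes_in <- read.csv(node_file, colClasses = c(name = "character"))
edges_in <- read.csv(edge_file,
                     colClasses = c(id_orig = "character", id_dest = "character"))
net_from_csv <- tbl_graph(nodes = nodes_in, edges = edges_in, directed = TRUE)
net_from_csv
```

The rds file is the most convenient option within R, while the csv tables can also be opened in other software.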
This lab has been developed with contributions from: Jose Pablo Gomez-Vazquez, Jerome Baron, and Beatriz Martinez-Lopez.
Feel free to use these training materials for your own research and teaching. When using the materials, we would appreciate proper credit. If you are interested in a training session, please contact: jpgo@ucdavis.edu