We will continue using the data set from the previous section to introduce some basics of data visualization. Some libraries ship with data sets that become available once the library is installed. In this case we will use data from the STNet library; if you don't have it installed yet, install it with:

# Install STNet from github
devtools::install_github('spablotemporal/STNet')
# Load the libraries
library(ggplot2) # for graphics
library(dplyr) # For data manipulation
library(STNet)

# loading the data from the package
data('captures') # we load the data
head(captures) # let's have a look at the data
##    municipality          location             Loc     date year captures
## 1 Temascaltepec San Pedro Tenayac   Cueva  el Uno 11/06/14 2014        6
## 2      Tlatlaya  Nuevo Copaltepec La alcantarilla 12/05/05 2005        3
## 3      Tlatlaya  Nuevo Copaltepec La alcantarilla 12/05/07 2007       30
## 4      Tlatlaya  Nuevo Copaltepec La alcantarilla 12/03/09 2009        0
## 5      Tlatlaya  Nuevo Copaltepec La alcantarilla 10/08/10 2010        4
## 6      Tlatlaya  Nuevo Copaltepec La alcantarilla 16/05/11 2011        4
##   treated      lat       lon trap_type
## 1       6 18.03546 -100.2095         1
## 2       2 18.40417 -100.2688         1
## 3      29 18.40417 -100.2688         4
## 4       0 18.40417 -100.2688         3
## 5       3 18.40417 -100.2688         1
## 6       3 18.40417 -100.2688         2

1 Plots in R

Base R already includes functions to create a variety of figures, but the code can get quite complex and difficult to read as we produce more detailed figures. ggplot2 is a library that provides a consistent set of functions for building figures layer by layer.

The function ggplot() has to be called at the beginning of the plot definition; it sets a blank canvas for our plot. If we call the function with no arguments we will just see the empty canvas, for example:

ggplot()

Then we can add layers to our canvas based on the data we want to visualize. Similarly to the pipes, we connect the different layers of our plot with the operator +.

The basic components that we need to define for a plot are the following:

  • data, the data set we will use to generate the figure
  • geometry, or type of graphic we will generate (e.g. histogram, bar, scatter, etc.)
  • aesthetic, the variables or arguments that will be used for the figure, for example: location, color, size, etc.

An example:

ggplot(data = captures) + # This is the data we will use
  geom_histogram( # This is the geometry 
    aes(x = treated) # The aesthetic includes only one variable representing the x axis
  )
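
The same three components can be arranged in more than one way. As a minimal sketch, using a small made-up data frame (`toy` and its values are hypothetical, not part of STNet), the aesthetic can be declared inside the geometry as in the example above, or once in ggplot() so that every layer added afterwards inherits it:

```r
library(ggplot2)

# Hypothetical toy data, only for illustration
toy <- data.frame(treated = c(0, 2, 3, 3, 6, 29))

# Aesthetic declared in the geometry (as in the example above)
p_layer <- ggplot(data = toy) +
  geom_histogram(aes(x = treated), bins = 5)

# Aesthetic declared in ggplot(), inherited by every layer added afterwards
p_global <- ggplot(data = toy, mapping = aes(x = treated)) +
  geom_histogram(bins = 5)

p_global
```

Both objects should draw the same histogram; declaring the aesthetic in ggplot() mostly pays off when several layers share the same variables.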

Other components of the plots can be defined to further customize our figures, and we will cover those in more detail in later sections.
As you noticed in the previous example, we can print the figures directly from the R console, but a way I like to keep figures organized is to put them all inside a single object in R. This object can be a list, which is just a container for other objects.

# To create an empty list we can use the function list()
figures <- list()
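
If lists are new to you, here is a minimal sketch of how a named list stores and returns objects. The names `container`, `numbers` and `label` are made up for the illustration:

```r
# A list can hold any kind of object
container <- list()

# Add elements with `$` (or with [["name"]]); the names are up to us
container$numbers <- c(1, 2, 3)
container$label <- "a piece of text"

names(container)   # the names of everything stored so far
length(container)  # how many elements the list holds
container$numbers  # retrieve one element by name
```

We will use the figures list in exactly the same way, storing one plot under each name.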

2 Visualizing distributions

2.1 Continuous variables

2.1.1 Histograms

The simplest way to visualize the distribution of a continuous variable is a histogram. Histograms are a special kind of bar plot where the variable is grouped into bins and the bars show the count for each bin. Now that we have our container list for the plots, we can simply save the plot there under whatever name we want.

Notice that we will combine the pipes with the ggplot syntax. You can either give the name of the data inside the ggplot() function, or put it before the function and connect it with a pipe.

figures$histogram <- captures %>% # This is the data we use.
  ggplot() + # we set the empty canvas
  geom_histogram(aes(x = treated)) # add a layer to visualize a histogram

# we can see our plot by calling the name in our container list
figures$histogram
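
When no bin settings are given, geom_histogram() uses 30 bins and prints a message suggesting that we pick a better value. A minimal sketch of the two arguments that control this, bins and binwidth, using hypothetical values rather than the captures data:

```r
library(ggplot2)

# Hypothetical counts, only for illustration
toy <- data.frame(treated = c(0, 0, 1, 2, 3, 3, 4, 6, 12, 29))

# Fix the number of bins ...
p_bins <- ggplot(toy) +
  geom_histogram(aes(x = treated), bins = 10)

# ... or fix the width of each bin, in the units of the variable
p_width <- ggplot(toy) +
  geom_histogram(aes(x = treated), binwidth = 5)

p_width
```

Which value works best depends on the data, so it is worth trying a few before settling on one.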

2.1.2 Boxplots

Box plots are great to show the distribution of a continuous variable. We can use them to show only one variable, or multiple variables. It is important to be very descriptive when making plots: the idea of a figure is that it can explain itself. We will slowly start to introduce functions to do this and customize our figures.

# Only one variable
figures$box <- captures %>% 
  ggplot() +
  geom_boxplot(aes(y = treated))

figures$box
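
To compare the distribution across groups, we can put a categorical variable on the x axis. A minimal sketch, assuming a small made-up data frame that borrows the column names trap_type and treated (the values are hypothetical):

```r
library(ggplot2)

# Hypothetical data with the same column names as `captures`
toy <- data.frame(
  trap_type = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  treated   = c(6, 2, 3, 29, 10, 12, 0, 1, 4)
)

# factor() tells ggplot2 to treat the numeric codes as categories,
# so we get one box per trap type along the x axis
p_groups <- ggplot(toy) +
  geom_boxplot(aes(x = factor(trap_type), y = treated))

p_groups
```

Without factor() the numeric codes would be read as a continuous variable, and ggplot2 would not split the data into one box per code.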

2.2 Barplots

Bar plots are great to represent frequencies of categories. In the following example we will first add up the number of treated per year, and then show the totals in a bar plot.

figures$bars <- captures %>% 
  count(year, wt = treated) %>% 
  ggplot() +
  geom_bar(aes(
    x = year, # X axis
    y = n # Y axis
  ), stat = 'identity') # type of barplot

figures$bars
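
It can help to look at what count() returns before plotting it. A minimal sketch with made-up numbers (the data frame `toy` is hypothetical), which also uses geom_col(), the ggplot2 shortcut for geom_bar(stat = 'identity'):

```r
library(ggplot2)
library(dplyr)

# Hypothetical data, only for illustration
toy <- data.frame(
  year    = c(2005, 2005, 2007, 2007, 2007),
  treated = c(2, 3, 29, 1, 0)
)

# count() with wt = treated adds up `treated` within each year
# (without wt it would count the rows instead)
by_year <- toy %>% count(year, wt = treated)
by_year
# year 2005 -> n = 5, year 2007 -> n = 30

# geom_col() plots the values as they are
p_col <- ggplot(by_year) +
  geom_col(aes(x = year, y = n))
p_col
```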

3 Visualizing relationships

3.1 Scatterplots

This is one of the most popular kinds of plots. It is useful to represent the relationship between two continuous variables.

figures$scatter <- captures %>% # first we start with the name of our data.frame
  ggplot() + # then we set the canvas
  geom_point(aes(x = treated, y = captures)) # and we add layer for points

figures$scatter

4 Time series

To create a time series we need to reformat the data a little so R can do what we want. We will introduce a new kind of variable: date. A date variable is pretty much what it sounds like: a variable that stores the year, month and day. There are other ways to represent dates and times in R, but this is the most common and straightforward.

tCaptures <- captures %>% 
  mutate(date = as.Date(date, "%d/%m/%y"), # First we parse the date, which is stored as day/month/year
         week = lubridate::floor_date(date, 'week')) %>% # Then we round each date down to the first day of its week
  count(week, wt = treated) # Finally we add up the number of treated per week
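
The format string has to match the way the dates are written, otherwise as.Date() returns NA for the values it cannot read. A minimal sketch to check this, using two dates as they appear in the head(captures) output (the week_start = 1 setting, weeks starting on Monday, is an assumption for the illustration):

```r
# Two dates written as they appear in head(captures)
x <- c("11/06/14", "16/05/11")

# %d = day, %m = month, %y = two-digit year
as.Date(x, format = "%d/%m/%y")  # both values are parsed
as.Date(x, format = "%m/%d/%y")  # "16/05/11" becomes NA: there is no month 16

# Round a date down to the first day of its week
# (week_start = 1 means weeks start on Monday; this choice is an assumption)
d <- as.Date(x, format = "%d/%m/%y")
lubridate::floor_date(d, unit = "week", week_start = 1)
```

A quick sum(is.na(...)) on the parsed column is a cheap way to catch a format that does not match.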

Now that we have our variables in the correct format, we can use them like any other variable.

figures$timeseries <- tCaptures %>% 
  ggplot() +
  geom_line(aes(x = week, y = n))

figures$timeseries

5 Arranging the plots in a layout

Now that we have all the figures in a list, we can arrange them in a single layout. For this we use the function ggarrange() from the ggpubr library.

library(ggpubr) # load the library
ggarrange(plotlist = figures)
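
ggarrange() chooses the number of rows and columns for us, but we can also set them. A minimal sketch with two toy plots (the data frame `toy` and the list `plots` are hypothetical names for the illustration):

```r
library(ggplot2)
library(ggpubr)

# Hypothetical data, only for illustration
toy <- data.frame(x = c(1, 2, 3, 4), y = c(2, 4, 3, 5))

plots <- list()
plots$points <- ggplot(toy) + geom_point(aes(x = x, y = y))
plots$line   <- ggplot(toy) + geom_line(aes(x = x, y = y))

# One row, two columns, with a letter label on each panel
layout <- ggarrange(plotlist = plots, ncol = 2, nrow = 1, labels = c("A", "B"))
layout
```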

6 Further customization

6.1 Labels

Usually we avoid spaces in column names, but those names are often not the clearest way to communicate our analysis in a plot. We can set specific labels to make our plots more readable and self-explanatory. Let’s improve the bar plot to make it clearer:

figures$bars <- captures %>% 
  count(year, wt = treated) %>% 
  ggplot() +
  geom_bar(aes(x = year, y = n), stat = 'identity') + # type of barplot
  labs(# We will use the function labs to generate our labels
    title = 'Number of treated by year',
    x = 'Year',
    y = 'Number of treated'
  )

figures$bars

6.2 Themes

ggplot2 includes the function theme() to define most aspects of the figure, such as the background color, the grid, the axes and the legend, among many others. There are also several predefined themes (all start with theme_ followed by the name of the theme) that you can use if you don’t want to deal with all the arguments of the function theme(). For example:

# all the predefined themes start with theme_
figures$timeseries <- tCaptures %>% 
  ggplot() +
  geom_line(aes(x = week, y = n)) +
  labs(title = 'Treated by week', x = 'Week', y = 'Number of treated') +
  theme_minimal() # We will use the theme minimal

figures$timeseries
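
The predefined themes and theme() can also be combined. A minimal sketch, assuming a small made-up time series (`toy` is hypothetical), that starts from theme_minimal() and then changes two individual elements:

```r
library(ggplot2)

# Hypothetical data, only for illustration
toy <- data.frame(week = 1:5, n = c(3, 8, 2, 6, 4))

p_theme <- ggplot(toy) +
  geom_line(aes(x = week, y = n)) +
  labs(title = 'A toy time series') +
  theme_minimal() +                           # start from a predefined theme
  theme(
    plot.title = element_text(face = 'bold'), # bold title
    panel.grid.minor = element_blank()        # remove the minor grid lines
  )
p_theme
```

The order matters: theme() goes after the predefined theme, because a predefined theme replaces every setting made before it.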

6.3 Other aesthetics

6.3.1 Shape

There are other aesthetics we can define, such as color, point shape and size, among many others. Let's try changing the point shape for one of the plots we made previously:

figures$scatter <- captures %>% # the data we are using
  ggplot() + # we set the canvas
  geom_point(aes(
    x = captures, # X axis
    y = treated, # Y axis
    shape = factor(trap_type) # point shape
  )) +
  labs(title = 'Captures and treated by trap type', x = 'Captures', y = 'Treated', shape = 'Trap type') +
  theme_classic() # now let's try the theme 'classic'

figures$scatter

6.3.2 Color

In the next example we will use the trap type to color our boxplot:

captures %>% 
  ggplot() + # set the canvas
  geom_boxplot(aes(y = treated, fill = factor(trap_type))) # we add a boxplot layer

We can also use the variable to color other parts of the plot such as the border:

captures %>% 
  ggplot() + # set the canvas
  geom_boxplot(aes(y = treated, col = factor(trap_type))) # we add a boxplot layer

6.3.3 Non-aesthetic customization

So far we have added variables inside our aes() function, but we can also set arguments outside aes() when we want them to apply to all observations. For example, we can make the outline of the boxplots the same for all groups while keeping a different fill color per group:

captures %>% 
  ggplot() + # set the canvas
  geom_boxplot(
    aes(y = treated, fill = factor(trap_type)), # This is the normal aesthetics we define
    col = 'red' # everything we define here, outside aes(), is applied to all the observations
  ) # we add a boxplot layer
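
A common surprise is what happens when a fixed value is placed inside aes() by mistake. A minimal sketch of the difference, using a made-up data frame (`toy` and its values are hypothetical):

```r
library(ggplot2)

# Hypothetical data, only for illustration
toy <- data.frame(captures = c(6, 3, 30, 0, 4), treated = c(6, 2, 29, 0, 3))

# Outside aes(): a fixed value, every point is drawn in red
p_fixed <- ggplot(toy) +
  geom_point(aes(x = captures, y = treated), col = 'red')

# Inside aes(): 'red' is treated as data (a single category called "red"),
# so ggplot2 picks a color from its default palette and adds a legend
p_mapped <- ggplot(toy) +
  geom_point(aes(x = captures, y = treated, col = 'red'))

p_fixed
```

If a plot shows an unexpected color and a legend with a color name in it, this is usually the reason.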

6.4 Colors

To define specific colors for our figure, we can use the functions scale_*_manual(), where the * stands for the aesthetic we want to control. If we want to set the colors of the fill, we would use scale_fill_manual():

figures$box <- captures %>% 
  ggplot() +
  geom_boxplot(aes(y = treated, fill = factor(trap_type))) +
  scale_fill_manual(values = c('red', 'green', 'blue', 'yellow'))

figures$box

R manages colors in three different ways: by name (e.g. ‘red’), by RGB value using the function rgb() (e.g. rgb(1, 0, 0)), or by hexadecimal code (e.g. “#FF0000”). You can get a full list of the named colors in R with the function colors(), but you will only see the names, not the colors themselves. Luckily, someone made a tool that helps us pick exactly the colors we want: the Colour Picker addin. Addins are tools available in RStudio to facilitate tasks. Let's try the Colour Picker (it should already be in your addins toolbar; if it is not, it is provided by the colourpicker package).
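
As a minimal sketch, the three notations can be compared directly in the console; all the functions used here come with base R, and the object names are made up for the illustration:

```r
# Three ways of writing the same pure red
by_name <- 'red'
by_rgb  <- rgb(1, 0, 0)   # red, green and blue intensities between 0 and 1
by_hex  <- '#FF0000'      # two hexadecimal digits per channel: FF, 00, 00

by_rgb                    # rgb() returns the hexadecimal code as text
col2rgb(by_name)          # red/green/blue values (0-255) behind a named color
head(colors())            # the first few named colors available in R
```

Any of the three forms can be passed to the values argument of scale_fill_manual().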

Exercise: Pick 4 colors you like and use them to create a boxplot of the distribution of number of treated by the trap type.

7 Facets

Facets are a way of stratifying the data based on variables in the data set; you can think about them in a similar way to the groups we have been using. To create a stratified plot we can use the function facet_grid(), which takes one variable for the rows and another for the columns:

figures$histogram <- captures %>% # The data we will use 
  ggplot() + # set the canvas
  geom_histogram(aes(treated), fill = 'red4') + # We will create a histogram of the number of treated
  facet_grid(
    rows = vars(trap_type), # We will use the trap type for rows
    cols = vars(year) # We use the year for columns
  ) +
  theme_pubclean() # let's try another theme, this one comes from ggpubr

figures$histogram
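
When we only want to stratify by one variable, facet_wrap() is an alternative that lays the panels out one after another and wraps them into rows. A minimal sketch with made-up data (`toy` is hypothetical, and the choice of two columns is arbitrary):

```r
library(ggplot2)

# Hypothetical data, only for illustration
toy <- data.frame(
  year    = c(2005, 2005, 2007, 2007, 2009, 2009),
  treated = c(2, 3, 29, 10, 0, 4)
)

# One panel per year, wrapped into two columns
p_wrap <- ggplot(toy) +
  geom_histogram(aes(x = treated), bins = 5) +
  facet_wrap(vars(year), ncol = 2)
p_wrap
```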

8 Exporting plots

Now that we have a good mix of different figures, let's export them. We can use the function ggsave() to export a high definition figure (the default resolution is 300 dpi):

ggsave(filename = 'myHistogram.png', plot = figures$histogram)
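
The size and resolution of the exported file can be set explicitly. A minimal sketch, assuming a toy plot and a temporary folder so that nothing is written to the working directory (the file name and the sizes are arbitrary choices for the illustration):

```r
library(ggplot2)

# Hypothetical data, only for illustration
toy <- data.frame(x = c(1, 2, 3), y = c(2, 4, 3))
p <- ggplot(toy) + geom_point(aes(x = x, y = y))

# tempdir() keeps the sketch from writing into the working directory;
# in practice you would give a path inside your project
out_file <- file.path(tempdir(), 'myFigure.png')

ggsave(
  filename = out_file,
  plot = p,
  width = 6, height = 4, units = 'in', # size of the image
  dpi = 300                            # resolution in dots per inch
)
file.exists(out_file)
```

Setting width and height makes the export reproducible; otherwise ggsave() uses the size of the current graphics device.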

8.1 More customization

You can create more detailed arrangements of figures, as shown in the following example:

ggarrange( # Function to arrange the figures
  figures$box, figures$scatter, figures$bars, figures$timeseries, # These are the figures I will arrange
  heights = c(5, 3), # We can define the heights for each of the rows
  labels = c("A","B", "C", "D") # add a label for each figure
) %>% 
  annotate_figure( # Function to annotate the figure
    top = text_grob( # Top title
      label = 'Distribution of the data', # Label for title
      face = 'bold', # Bold letters
      size = 18 # Size of the title
    )
  )

9 Interactive figures

Having static figures is the most common application of graphics in R, but R is also capable of making interactive figures that can be used in dashboards and other platforms (e.g. shiny or quarto). There are several libraries that allow you to create interactive figures; one of the most popular is plotly. The best part of plotly is that if you learn how to use ggplot2, you can convert your figures to interactive plotly figures pretty much seamlessly. Let's try that.

We use the function ggplotly() from the plotly library to do that:

library(plotly) # load the plotly library
# Use the ggplotly function on one of the figures we previously created:
ggplotly(figures$histogram)

This lab has been developed with contributions from: Jose Pablo Gomez-Vazquez.
Feel free to use these training materials for your own research and teaching. When using the materials, we would appreciate proper credit. If you would be interested in a training session, please contact: