R and Data visualization

Pablo Gomez

Introduction

What is this workshop?

This workshop is an introduction
DO NOT EXPECT TO MASTER R AFTER THE WORKSHOP
If you already have experience with R, it is always nice to see how other people do things, so feel free to share!

Tentative Schedule

Workshop format

Homepage: https://cadms-ucd.github.io/PROCINORTE_TT/

Posit cloud

https://posit.cloud

It’s Lab time!

Introduction to RStudio

Some programming concepts

Comments

COMMENT AS MUCH AS POSSIBLE!

# This is a comment in R; it is only for the reader and R will not run it
This is not a comment and will cause an error

What is the difference between lines 1 and 2?

YES! The # character turns everything after it on that line into a comment

10 + 10 # Everything after will be a comment
7 + 4

Operators

Operators are characters with a specific function in R, for example:

3 + 3 # this is the addition operator

[1] 6

3 - 2 # this is the subtraction operator

[1] 1

4 * 4 # This is a multiplication

[1] 16

Later we will see other kinds of operators, but… DON’T STRESS about learning everything.
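As a quick preview, here is a small sketch of a few other arithmetic operators. The object names are made up for illustration.

```r
# Other arithmetic operators (object names are illustrative)
division  <- 10 / 4   # division
power     <- 2 ^ 3    # exponentiation
remainder <- 10 %% 3  # remainder of a division (modulo)
quotient  <- 10 %/% 3 # integer division

division
power
remainder
quotient
```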

Objects

Objects in R are containers for information. We can give an object any name we want, as long as it starts with a letter.

myNumber <- 4
myResult <- 4 * 5

Storing multiple elements

Using the c() function

x <- c(1, 3, 5) # using the c() function
x

[1] 1 3 5

Using the list() function

y <- list(1, 3, 5) # using the list() function
y

[[1]]
[1] 1

[[2]]
[1] 3

[[3]]
[1] 5
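One practical difference between the two, shown here as a minimal sketch with made-up values: c() converts all elements to a single type, while list() keeps each element with its own type.

```r
# c() coerces everything to a single type (here: character)
v <- c(1, "farm", TRUE)
class(v)

# list() keeps each element with its own type
l <- list(1, "farm", TRUE)
class(l[[1]])
class(l[[2]])
class(l[[3]])
```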

Boolean logic

1 == 1 # is it equal?

[1] TRUE

1 != 1 # is it NOT equal?

[1] FALSE

1 %in% c(1, 2, 3) # is the number contained in the sequence?

[1] TRUE

Notice that we are using operators to make the comparisons
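Comparisons also work element by element on vectors, and can be combined with & (AND) and | (OR). A small sketch, using hypothetical ages:

```r
ages <- c(18, 60, 36) # hypothetical ages

older    <- ages > 30             # compare every element
middle   <- ages > 30 & ages < 50 # AND: both conditions must be TRUE
extremes <- ages < 20 | ages > 50 # OR: at least one condition is TRUE

older
middle
extremes
```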

Functions

Functions are a special kind of object. They take arguments, and the arguments need to go inside parentheses.

# create a sequence of numbers
seq(
  from = 0, # Starting number
  to = 80, # Ending number
  by = 20 # number increment of the sequence
)

[1]  0 20 40 60 80

Notice that the arguments are named in the function call. The arguments of seq() used here are from, to, and by.

We can create our own functions, which we will talk more about in the labs
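Naming the arguments is optional. The sketch below (object names are illustrative) shows that R can also match arguments by position, and that named arguments can be given in any order.

```r
named      <- seq(from = 0, to = 80, by = 20) # arguments matched by name
positional <- seq(0, 80, 20)                  # arguments matched by position
reordered  <- seq(by = 20, to = 80, from = 0) # named arguments in another order

identical(named, positional)
identical(named, reordered)
```

Naming the arguments is longer to type, but it makes the code easier to read later.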

Variables in R

numeric: continuous numeric variables WITH decimal values. For example: kg of product imported, probability of an event happening.
integer: whole numbers WITHOUT decimal values. For example: number of animals, number of shipments, etc.
character: alphanumeric variables. For example: name of a region, name of a disease, farm ID.
factor: alphanumeric variables with specific categories or levels. For example: type of product imported, type of farm, etc.
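To check the type of a variable we can use class(). A minimal sketch; all object names and values are made up for illustration.

```r
kg_imported <- 12.5                                         # numeric
n_animals   <- 40L                                          # integer (note the L suffix)
region      <- "Iowa"                                       # character
farm_type   <- factor(c("sow farm", "nursery", "sow farm")) # factor

class(kg_imported)
class(n_animals)
class(region)
class(farm_type)
levels(farm_type) # the categories of the factor
```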

Test time!

x <- seq(from = 5, to = 23, length.out = 10) # create a sequence of numbers
y <- seq(from = 0.1, to = 0.78, length.out = 10) # Create another sequence
mean(x*y) # Get the mean of the multiplication

[1] 7.406667

Objects:
- x
- y

Operators:
- *
- <-
- =

Functions:
- seq()
- mean()

Arguments:
- from
- to
- length.out

R syntax

R Syntax

R is like a calculator: we can perform mathematical operations, for example:

x = 2 # create a new object with the = operator
y <- 2 # create a new object with the <- operator
x + y # perform an operation with the objects

[1] 4

Vectors in R

You can store more than one value using vectors. To create a vector of numbers we use c().

x <- c(5, 6, 7, 8, 9, 10) # create a sequence from 5 to 10
y = 5:10 # create the same sequence but with a different approach
x == y # ask R if the objects have the same information

[1] TRUE TRUE TRUE TRUE TRUE TRUE

In RStudio, pressing “Alt” + “-” automatically inserts the <- operator.

Vector operations

When we have a vector, we can ask R for specific values inside it by using the [ ] operator and specifying which positions we want.

[1]  5  6  7  8  9 10

# Here we ask for the 3rd value of our sequence
x[3]

[1] 7

[1]  5  6  7  8  9 10

[1]  5  6  7  8  9 10

# Now we multiply the 3rd value of the x sequence by the 5th value of the y sequence
x[3] * y[5]

[1] 63
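The [ ] operator accepts more than a single position. A short sketch of some common patterns, using the same x vector:

```r
x <- c(5, 6, 7, 8, 9, 10)

x[c(1, 3)] # first and third values
x[2:4]     # second to fourth values
x[-1]      # everything except the first value
x[x > 7]   # only the values greater than 7
```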

Functions

# To get the sum of a vector of numbers inside an object we use sum()
sum(x)

[1] 45

Functions

We can put functions inside other functions. For example, to get \(\sqrt{\sum_1^n x}\), the square root of the sum of the numbers in x, we can use:

sqrt(sum(x))

[1] 6.708204

Making functions

The following function has only one argument which is a name (string) and just pastes some text before and after:

F1 <- function(name){
  paste("Hola", name, "! welcome to the R world (: !") # paste the name with some text
}

# Testing the function (Put your name)
F1(name = "Pablo")

[1] "Hola Pablo ! welcome to the R world (: !"
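Functions can have several arguments, and some of them can have default values. The function below is a hypothetical helper written only for illustration; prop_positive and its argument names are not part of the workshop material.

```r
# Hypothetical helper: proportion of positive results, rounded
prop_positive <- function(positives, tested, digits = 2) {
  round(positives / tested, digits) # the last evaluated value is returned
}

prop_positive(positives = 3, tested = 12)            # uses the default digits = 2
prop_positive(positives = 1, tested = 3, digits = 4) # overrides the default
```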

Remember this?

\[\sqrt{\sum_1^n x}\]

sqrt(sum(x))

Introducing the pipes `%>%`

Pipes (%>%) let us chain several functions, passing an object from one function to the next.

For example, if we want to execute a function F1() followed by another function F2() for the object x:

F2(F1(x))

is equivalent to:

x %>% F1() %>% F2()

For example

\[\sqrt{\sum_1^n x}\]

Instead of this:

sqrt(sum(x))

We can write it like this:

x %>% sum() %>% sqrt()
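The %>% pipe is not part of base R. It comes from the magrittr package and is also available after loading dplyr. The sketch below assumes magrittr is installed and R is version 4.1 or newer, which adds a native pipe written |>.

```r
library(magrittr) # provides %>% (also available after library(dplyr))

x <- c(5, 6, 7, 8, 9, 10)

nested <- sqrt(sum(x))           # nested calls, read from the inside out
piped  <- x %>% sum() %>% sqrt() # magrittr pipe, read from left to right
native <- x |> sum() |> sqrt()   # native pipe, requires R >= 4.1

identical(nested, piped)
identical(nested, native)
```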

For example

Instead of this:

# Get the number of outgoing shipments per origin
Out <- rename(summarise(group_by(mov, id_orig), Outgoing = n()), id = id_orig)

We can write this:

# Get the number of outgoing shipments per origin
Out <- mov %>% 
  group_by(id_orig) %>%
  summarise(Outgoing = n()) %>%
  rename(id = id_orig)

The same pipeline, with a comment on each step:

# Get the number of outgoing shipments per origin
Out <- mov %>% # This is the movement data set
  group_by(id_orig) %>% # Group by origin
  summarise(Outgoing = n()) %>% # Count the number of observations
  rename(id = id_orig) # Rename the variable

And we can break down the code more easily!

It’s Lab time!

R syntax (Section 1)

Let’s use some Data!

Importing data

Download the Excel file from this link. It’s not necessary to have a Box account.

# Import the Excel file
PRRS <- readxl::read_xlsx("Data/PRRS.xlsx")
PRRS

Result	Sex	Age	id	name	farm_type	County
No	H	18	23	Armstrong Research Farm	sow farm	Pottawattamie
No	H	60	23	Armstrong Research Farm	sow farm	Pottawattamie
No	H	60	23	Armstrong Research Farm	sow farm	Pottawattamie
Yes	H	36	23	Armstrong Research Farm	sow farm	Pottawattamie

Reducing the data

Sometimes we want to select specific columns and rows from our data to reduce its dimensionality. For this we can use the functions:

select() to select specific columns
slice() to select specific rows based on position
filter() to select specific rows based on a condition
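As a minimal sketch of slice(), assuming dplyr is installed: the toy data frame below uses made-up values that only mimic some of the PRRS columns.

```r
library(dplyr)

# Toy data with made-up values, mimicking some PRRS columns
toy <- data.frame(
  Result    = c("No", "Yes", "No", "Yes"),
  Age       = c(18, 36, 60, 15),
  farm_type = c("sow farm", "sow farm", "boar stud", "boar stud")
)

first_two <- toy %>% slice(1:2) # rows 1 and 2, by position
last_row  <- toy %>% slice(n()) # the last row; n() is the number of rows

first_two
last_row
```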

Selecting specific columns

PRRS %>%  # the name of our data
  select(Result, farm_type) # we want to select only the columns Result and farm_type

Result	farm_type
No	sow farm
No	sow farm
No	sow farm
Yes	sow farm
Yes	sow farm

Selecting specific columns

We can also specify which columns we DON’T want to show in our data:

PRRS %>% 
  select(-Age, -id) # with a '-' before the name we will exclude the column from the data

Result	Sex	name	farm_type	County
No	H	Armstrong Research Farm	sow farm	Pottawattamie
No	H	Armstrong Research Farm	sow farm	Pottawattamie
No	H	Armstrong Research Farm	sow farm	Pottawattamie
Yes	H	Armstrong Research Farm	sow farm	Pottawattamie
Yes	H	Armstrong Research Farm	sow farm	Pottawattamie

Filtering specific rows

Filtering only the observations from boar studs:

PRRS %>% 
  filter(farm_type == 'boar stud') # we use the equality operator (==) for this

Result	Sex	Age	id	name	farm_type	County
No	H	48	32	Farm Sweet Farm at Rosmann Family Farms	boar stud	Shelby
No	H	60	32	Farm Sweet Farm at Rosmann Family Farms	boar stud	Shelby
Yes	H	60	32	Farm Sweet Farm at Rosmann Family Farms	boar stud	Shelby
Yes	H	15	32	Farm Sweet Farm at Rosmann Family Farms	boar stud	Shelby
No	H	68	32	Farm Sweet Farm at Rosmann Family Farms	boar stud	Shelby
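filter() also accepts several conditions at once. A small sketch with made-up data (the toy values are not taken from the PRRS file):

```r
library(dplyr)

# Toy data with made-up values, mimicking some PRRS columns
toy <- data.frame(
  Result    = c("No", "Yes", "No", "Yes"),
  Age       = c(18, 36, 60, 15),
  farm_type = c("sow farm", "sow farm", "boar stud", "boar stud")
)

# Conditions separated by commas are combined with AND
boar_pos <- toy %>% filter(farm_type == "boar stud", Result == "Yes")

# Use | when at least one of the conditions should be TRUE
age_ext <- toy %>% filter(Age < 20 | Age > 50)

boar_pos
age_ext
```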

Creating variables

# Creating a new variable
PRRS %>% # name of the data set
  mutate( # mutate is the function we use to create new variables
    SowFarm = ifelse(farm_type == 'sow farm', 1, 0) # we will create a binary variable where 1 = sow farm, 0 = Any other farm type
  )

Result	Sex	Age	id	name	farm_type	County	SowFarm
No	H	18	23	Armstrong Research Farm	sow farm	Pottawattamie	1
No	H	60	23	Armstrong Research Farm	sow farm	Pottawattamie	1
No	H	60	23	Armstrong Research Farm	sow farm	Pottawattamie	1
Yes	H	36	23	Armstrong Research Farm	sow farm	Pottawattamie	1
Yes	H	50	23	Armstrong Research Farm	sow farm	Pottawattamie	1
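Note that mutate() returns a new table and does not change the original one. A minimal sketch with made-up data: to keep the new variable, we assign the result to an object.

```r
library(dplyr)

# Toy data with made-up values
toy <- data.frame(
  Age       = c(18, 36, 60),
  farm_type = c("sow farm", "nursery", "sow farm")
)

# Without an assignment the result is only printed; toy itself is unchanged
toy %>% mutate(SowFarm = ifelse(farm_type == "sow farm", 1, 0))
names(toy)

# Assign the result to keep the new column
toy <- toy %>% mutate(SowFarm = ifelse(farm_type == "sow farm", 1, 0))
names(toy)
```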

Grouping the data

We can calculate different statistics by group. For example, let’s calculate the mean and standard deviation of the age by Result and Sex:

PRRS %>%
  group_by(Result, Sex) %>% # The groups used for the data
  summarise( # the function summarise calculates statistics by the defined groups
    meanAge = mean(Age), # Calculate the mean age
    sdAge = sd(Age) # Calculate the standard deviation
  )

Result	Sex	meanAge	sdAge
No	H	39.67300	24.82636
No	M	22.39357	16.67979
Yes	H	23.61135	19.79150
Yes	M	15.10870	10.30037
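Two details that often come up with summarise(), sketched with made-up data: counting the rows in each group with n(), and ignoring missing values with na.rm = TRUE. Whether the PRRS data contains missing ages is not stated here, so the NA below is only an assumption for the example.

```r
library(dplyr)

# Toy data with made-up values, including one missing age
toy <- data.frame(
  Result = c("No", "No", "Yes", "Yes", "Yes"),
  Age    = c(20, 40, 10, NA, 30)
)

by_result <- toy %>%
  group_by(Result) %>%
  summarise(
    n       = n(),                     # number of rows in each group
    meanAge = mean(Age, na.rm = TRUE), # ignore missing ages
    .groups = "drop"                   # return an ungrouped table
  )

by_result
```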

Joining data sets

Sometimes we have different data sets that have variables in common and we want to integrate them into a single data set for further analysis.

Joining data sets

Farms:

nodes

id	name	lat	long	farm_type
1	Iowa Select Farms Inc	42.50489	-93.26323	sow farm
2	Stanley Martins Fleckvieh Farms	43.08261	-91.56682	sow farm
3	Centrum Valley Farms	42.66331	-93.63630	nursery
4	Hilltop Farms fresh produce	41.71651	-93.90491	sow farm
5	Hog Slat Inc.	42.25929	-91.15566	GDU

Joining data sets

Movements:

Out

id	Outgoing
1	30
3	13
4	15
5	33
6	11

Joining data sets

# Joining the two datasets
nodes <- nodes %>% 
  left_join(Out, by = "id")

id	name	lat	long	farm_type	Outgoing
1	Iowa Select Farms Inc	42.50489	-93.26323	sow farm	30
2	Stanley Martins Fleckvieh Farms	43.08261	-91.56682	sow farm	NA
3	Centrum Valley Farms	42.66331	-93.63630	nursery	13
4	Hilltop Farms fresh produce	41.71651	-93.90491	sow farm	15
5	Hog Slat Inc.	42.25929	-91.15566	GDU	33
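left_join() keeps every row of the first table, so farms without a match get NA (farm 2 above). The sketch below uses only a few rows and columns of the two tables. Replacing NA with 0 assumes that a farm missing from the movements table had no outgoing shipments, which may not hold for every data set.

```r
library(dplyr)

# A few rows of the two tables shown above (only some columns)
nodes_toy <- data.frame(
  id        = c(1, 2, 3),
  farm_type = c("sow farm", "sow farm", "nursery")
)
out_toy <- data.frame(id = c(1, 3), Outgoing = c(30, 13))

joined <- nodes_toy %>%
  left_join(out_toy, by = "id") %>% # keeps every row of nodes_toy
  mutate(Outgoing = ifelse(is.na(Outgoing), 0, Outgoing)) # assumption: no match means 0

# inner_join() keeps only the farms present in both tables
matches_only <- nodes_toy %>% inner_join(out_toy, by = "id")

joined
matches_only
```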

Back to the exercise!

R syntax (Sections 2 and 3)

Data visualization

ggplot2

We build our figures based on layers
Similar syntax to dplyr
We can combine data wrangling and visualization into a single code chunk

Instead of the %>%, in ggplot we connect pieces of code with +

ggplot2

The basic components that we need to define for a plot are the following:

data, the data set we will use to generate the figure
geometry, or type of graphic we will generate (e.g. histogram, bar, scatter, etc.)
aesthetic, variables or arguments that will be used for the figure, for example: location, color, size, etc.

Example

captures # Data

municipality	location	Loc	date	year	captures	treated	lat	lon	trap_type
Temascaltepec	San Pedro Tenayac	Cueva el Uno	11/06/14	2014	6	6	18.03546	-100.2095	1
Tlatlaya	Nuevo Copaltepec	La alcantarilla	12/05/05	2005	3	2	18.40417	-100.2688	1
Tlatlaya	Nuevo Copaltepec	La alcantarilla	12/05/07	2007	30	29	18.40417	-100.2688	4
Tlatlaya	Nuevo Copaltepec	La alcantarilla	12/03/09	2009	0	0	18.40417	-100.2688	3
Tlatlaya	Nuevo Copaltepec	La alcantarilla	10/08/10	2010	4	3	18.40417	-100.2688	1

Example

captures %>% # Data used
  count(year, wt = treated)  # Some data transformation

year	n
2005	167
2006	103
2007	249
2008	143
2009	125

Example

captures %>% # Data used
  count(year, wt = treated) %>%   # Some data wrangling
  ggplot() # Add an empty canvas

Example

captures %>% # Data used
  count(year, wt = treated) %>%   # Some data wrangling
  ggplot() + # Add an empty canvas
  geom_bar() # This is the geometry type; it still needs aesthetics, added in the next step

Example

captures %>% # Data used
  count(year, wt = treated) %>%   # Some data wrangling
  ggplot() + # Add an empty canvas
  geom_bar( # This is the geometry type
    aes( # Aesthetics or mapping
      x = year, # X axis
      y = n # Y axis
      ), 
    stat = 'identity' # extra arguments
  )

Example

captures %>% # Data used
  count(year, wt = treated) %>%   # Some data wrangling
  ggplot() + # Add an empty canvas
  geom_bar(aes(x = year, y = n), stat = 'identity') +
  labs( # Title and axis labels
    title = 'Bar plot',
    x = 'Year', y = 'Frequency'
  )
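For bars whose height is already in the data, ggplot2 also offers geom_col(), which behaves like geom_bar() with stat = 'identity'. A self-contained sketch, assuming ggplot2 is installed; the yearly totals are copied from the count() output shown earlier, and the object names are illustrative.

```r
library(ggplot2)

# Yearly totals copied from the count() output shown earlier
treated_by_year <- data.frame(
  year = 2005:2009,
  n    = c(167, 103, 249, 143, 125)
)

p <- ggplot(treated_by_year) +
  geom_col(aes(x = year, y = n)) + # same as geom_bar(stat = 'identity')
  labs(title = 'Bar plot', x = 'Year', y = 'Frequency')

p # printing the object draws the plot
```

Saving the plot in an object such as p also makes it easy to add more layers later with +.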

ggplot2

Link to the cheatsheet

Other cheatsheets
