Homepage: https://cadms-ucd.github.io/PROCINORTE_TT/




Introduction to RStudio
Operators are symbols that perform a specific function in R, for example:
[1] 6
[1] 1
[1] 16
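The code that produced the three outputs above is not included here. The sketch below shows R's main arithmetic operators. The numbers are arbitrary and are not the original ones.

```r
# R's basic arithmetic operators (values chosen only for illustration)
add_res <- 2 + 3    # addition: 5
sub_res <- 9 - 4    # subtraction: 5
mul_res <- 3 * 5    # multiplication: 15
div_res <- 10 / 4   # division: 2.5
pow_res <- 2^3      # exponent: 8 (2 ** 3 also works)
mod_res <- 10 %% 3  # remainder of a division (modulo): 1
```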
Later we will see other kinds of operators, but… DON'T STRESS about learning everything.
Objects in R are containers for information. We can create objects with any name we want, as long as it starts with a letter.
Using the c() function
[1] TRUE
[1] FALSE
[1] TRUE
Notice that we are using operators to make the comparisons
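The objects compared above are not shown either. The example below uses two hypothetical vectors, a and b, created with c(). Comparison operators check the vectors element by element and return TRUE or FALSE for each position.

```r
# Two hypothetical numeric vectors created with c()
a <- c(1, 2, 3)
b <- c(1, 5, 3)

eq  <- a == b   # equal?             TRUE FALSE TRUE
neq <- a != b   # not equal?         FALSE TRUE FALSE
lt  <- a < b    # less than?         FALSE TRUE FALSE
geq <- a >= b   # greater or equal?  TRUE FALSE TRUE
print(eq)
```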
Functions are a special kind of object that take arguments, and the arguments go inside parentheses.
# Create a sequence of numbers
seq(
  from = 0, # Starting number
  to = 80,  # Ending number
  by = 20   # Increment between consecutive numbers
)
[1] 0 20 40 60 80
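Because the arguments of seq() have names, you can write them in any order. Without names, R matches them by position. The sketch below also uses length.out, which appears later in this section, to ask for a fixed number of values instead of a fixed step.

```r
# Named arguments can be written in any order
s_named <- seq(by = 20, from = 0, to = 80)

# Unnamed arguments are matched by position (from, to, by)
s_positional <- seq(0, 80, 20)

# length.out asks for a fixed number of values instead of a fixed step
s_length <- seq(from = 0, to = 1, length.out = 5)

print(s_named)       # 0 20 40 60 80
print(s_positional)  # 0 20 40 60 80
print(s_length)      # 0.00 0.25 0.50 0.75 1.00
```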
Notice that the arguments are named: the arguments of the seq() function used here are from, to, and by.
We can create our own functions, which we will talk more about in the labs.
x <- seq(from = 5, to = 23, length.out = 10) # create a sequence of numbers
y <- seq(from = 0.1, to = 0.78, length.out = 10) # Create another sequence
mean(x*y) # Get the mean of the element-wise product
[1] 7.406667
Objects:
- x
- y
Operators:
- *
- <-
- =
Functions:
- seq()
- mean()
Arguments:
- from
- to
- length.out
R is like a calculator: we can use it to perform mathematical operations.
You can store more than one value using vectors. To create a vector of numbers we use c().
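A few calculator-style examples follow. They show the order of operations and some built-in mathematical functions. sqrt(), log10(), exp() and round() are all part of base R.

```r
mult_first  <- 2 + 3 * 4        # multiplication before addition: 14
paren_first <- (2 + 3) * 4      # parentheses first: 20
root        <- sqrt(81)         # square root: 9
log_1000    <- log10(1000)      # base-10 logarithm: 3
euler       <- exp(1)           # Euler's number, printed as 2.718282
rounded     <- round(10 / 3, 2) # round to 2 decimals: 3.33
```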
x <- c(5, 6, 7, 8, 9, 10) # Create a sequence from 5 to 10
y = 5:10 # Create the same sequence with a different approach
x == y # Ask R whether each element of x equals the matching element of y
[1] TRUE TRUE TRUE TRUE TRUE TRUE
In RStudio, the keyboard shortcut Alt + - (Option + - on a Mac) automatically inserts the assignment operator <-.
When we have a vector, we can ask R specific values inside an object by using the operator [ ] and specifying which ones we want.
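The sketch below uses the vector x created above and shows the main ways to pick values with [ ]:

- by position
- by a range of positions
- by excluding positions with a minus sign
- by a logical condition

```r
x <- c(5, 6, 7, 8, 9, 10)

second    <- x[2]         # second element: 6
first_3rd <- x[c(1, 3)]   # first and third elements: 5 7
middle    <- x[2:4]       # elements 2 to 4: 6 7 8
no_first  <- x[-1]        # everything except the first element: 6 7 8 9 10
big       <- x[x > 7]     # elements that meet a condition: 8 9 10
```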
We can put functions inside other functions. For example, to get the square root of the sum of the numbers in x,
\[\sqrt{\sum_{i=1}^{n} x_i}\]
we can nest sum() inside sqrt(): sqrt(sum(x)).
The following function has only one argument, a name (string), and just pastes some text before and after it:
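The sketch below covers both ideas. Nested functions are evaluated from the inside out, so sum() runs before sqrt(). The string-pasting function is hypothetical: its name, greet(), and its text are assumptions, because the original function is not shown. It uses paste0(), which joins strings without adding spaces.

```r
x <- c(4, 9, 12)

# Nested functions run from the inside out: sum() first, then sqrt()
root_of_sum <- sqrt(sum(x))   # sqrt(25) = 5

# Hypothetical function with one argument, a name (string),
# that pastes some text before and after it
greet <- function(name) {
  paste0("Hello ", name, ", welcome to R!")
}

greet("Ana")   # "Hello Ana, welcome to R!"
```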
Pipes (%>%) connect several functions applied to an object, passing the result of each step to the next one.
For example, if we want to apply a function F1() to the object x and then apply another function F2() to the result, instead of writing F2(F1(x)) we can write x %>% F1() %>% F2().
Instead of this:
# Get the number of outgoing shipments per origin farm
Out <- rename(summarise(group_by(mov, id_orig), Outgoing = n()), id = id_orig)
We can write this:
library(dplyr) # Provides %>%, group_by(), summarise(), n() and rename()
# Get the number of outgoing shipments per origin farm
Out <- mov %>%                   # This is the movement data set
  group_by(id_orig) %>%          # Group by origin
  summarise(Outgoing = n()) %>%  # Count the number of shipments in each group
  rename(id = id_orig)           # Rename id_orig to id
This makes the code easier to read and to break down into steps!
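To try both versions yourself, the sketch below builds a tiny hypothetical mov data set. It has only id_orig and id_dest columns; the real data set is used in the labs. The sketch checks that the nested and piped code give the same table, and it assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical movement data: one row per shipment (origin and destination ids)
mov <- data.frame(
  id_orig = c(1, 1, 3, 4, 1),
  id_dest = c(3, 4, 2, 2, 5)
)

# Nested version: read from the inside out
out_nested <- rename(summarise(group_by(mov, id_orig), Outgoing = n()), id = id_orig)

# Piped version: read from top to bottom
out_piped <- mov %>%
  group_by(id_orig) %>%          # one group per origin farm
  summarise(Outgoing = n()) %>%  # number of shipments per origin
  rename(id = id_orig)           # rename the key column

print(out_piped)
```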
R syntax (Section 1)
Download the Excel file from this link. You don't need a Box account.
| Result | Sex | Age | OtherSpecies | id | name | farm_type | County |
|---|---|---|---|---|---|---|---|
| No | H | 18 | 0 | 23 | Armstrong Research Farm | sow farm | Pottawattamie |
| No | H | 60 | 0 | 23 | Armstrong Research Farm | sow farm | Pottawattamie |
| No | H | 60 | 0 | 23 | Armstrong Research Farm | sow farm | Pottawattamie |
| Yes | H | 36 | 0 | 23 | Armstrong Research Farm | sow farm | Pottawattamie |
Sometimes we want to select specific columns and rows of our data to reduce its dimensionality. For this we can use the functions:
- select() to select specific columns
- slice() to select specific rows based on position
- filter() to select specific rows based on a condition

| Result | farm_type |
|---|---|
| No | sow farm |
| No | sow farm |
| No | sow farm |
| Yes | sow farm |
| Yes | sow farm |
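
The sketch below shows select() and slice() on a small hypothetical data frame named dat. The name and values are assumptions; the columns mirror those in the tables above. It assumes dplyr is installed.

```r
library(dplyr)

# Small hypothetical data set with a few of the columns shown above
dat <- data.frame(
  Result    = c("No", "No", "Yes", "Yes"),
  Age       = c(18, 60, 36, 15),
  farm_type = c("sow farm", "sow farm", "sow farm", "boar stud")
)

# select(): keep only the columns we name
cols <- dat %>% select(Result, farm_type)

# slice(): keep rows by position (here rows 1 to 3)
first_rows <- dat %>% slice(1:3)
```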
We can also specify which columns we DON’T want to show in our data:
| Result | Sex | OtherSpecies | name | farm_type | County |
|---|---|---|---|---|---|
| No | H | 0 | Armstrong Research Farm | sow farm | Pottawattamie |
| No | H | 0 | Armstrong Research Farm | sow farm | Pottawattamie |
| No | H | 0 | Armstrong Research Farm | sow farm | Pottawattamie |
| Yes | H | 0 | Armstrong Research Farm | sow farm | Pottawattamie |
| Yes | H | 0 | Armstrong Research Farm | sow farm | Pottawattamie |
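
Compared with the first table, this one leaves out Age and id. The sketch below drops columns by putting a minus sign before their names in select(). It uses a small hypothetical data frame and assumes dplyr is installed.

```r
library(dplyr)

# Small hypothetical data set
dat <- data.frame(
  Result = c("No", "Yes"),
  Sex    = c("H", "H"),
  Age    = c(18, 36),
  id     = c(23, 23),
  County = c("Pottawattamie", "Pottawattamie")
)

# A minus sign in select() drops the named columns and keeps the rest
no_age_id <- dat %>% select(-Age, -id)
names(no_age_id)   # "Result" "Sex" "County"
```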
Filtering only the observations from boar studs:
| Result | Sex | Age | OtherSpecies | id | name | farm_type | County |
|---|---|---|---|---|---|---|---|
| No | H | 48 | 0 | 32 | Farm Sweet Farm at Rosmann Family Farms | boar stud | Shelby |
| No | H | 60 | 0 | 32 | Farm Sweet Farm at Rosmann Family Farms | boar stud | Shelby |
| Yes | H | 60 | 0 | 32 | Farm Sweet Farm at Rosmann Family Farms | boar stud | Shelby |
| Yes | H | 15 | 0 | 32 | Farm Sweet Farm at Rosmann Family Farms | boar stud | Shelby |
| No | H | 68 | 0 | 32 | Farm Sweet Farm at Rosmann Family Farms | boar stud | Shelby |
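
filter() keeps only the rows for which a condition is TRUE. The sketch below uses a small hypothetical data frame and shows how to combine two conditions with &. It assumes dplyr is installed.

```r
library(dplyr)

# Small hypothetical data set
dat <- data.frame(
  Result    = c("No", "Yes", "No", "Yes"),
  Age       = c(18, 36, 48, 15),
  farm_type = c("sow farm", "sow farm", "boar stud", "boar stud")
)

# Keep only the rows from boar studs
boar <- dat %>% filter(farm_type == "boar stud")

# Combine conditions: & means "and", | means "or"
boar_positive <- dat %>% filter(farm_type == "boar stud" & Result == "Yes")
```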
| Result | Sex | Age | OtherSpecies | id | name | farm_type | County | SowFarm |
|---|---|---|---|---|---|---|---|---|
| No | H | 18 | 0 | 23 | Armstrong Research Farm | sow farm | Pottawattamie | 1 |
| No | H | 60 | 0 | 23 | Armstrong Research Farm | sow farm | Pottawattamie | 1 |
| No | H | 60 | 0 | 23 | Armstrong Research Farm | sow farm | Pottawattamie | 1 |
| Yes | H | 36 | 0 | 23 | Armstrong Research Farm | sow farm | Pottawattamie | 1 |
| Yes | H | 50 | 0 | 23 | Armstrong Research Farm | sow farm | Pottawattamie | 1 |
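
The table above adds a SowFarm column that equals 1 for the sow farm rows. The code that created it is not shown. One way to build such an indicator, assuming it should be 1 for sow farms and 0 otherwise, is mutate() combined with ifelse(). The data frame below is hypothetical.

```r
library(dplyr)

# Small hypothetical data set
dat <- data.frame(
  Result    = c("No", "Yes", "No"),
  farm_type = c("sow farm", "sow farm", "boar stud")
)

# mutate() adds a new column; ifelse() returns 1 where the condition is TRUE, else 0
dat <- dat %>% mutate(SowFarm = ifelse(farm_type == "sow farm", 1, 0))
```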
We can calculate different statistics by group. For example, let's calculate the mean and standard deviation of age by Result and Sex:
| Result | Sex | meanAge | sdAge |
|---|---|---|---|
| No | H | 39.67300 | 24.82636 |
| No | M | 22.39357 | 16.67979 |
| Yes | H | 23.61135 | 19.79150 |
| Yes | M | 15.10870 | 10.30037 |
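
The sketch below computes the grouped summary with group_by() and summarise(). It uses hypothetical ages, so the numbers differ from the table above. The .groups = "drop" argument, available in dplyr 1.0.0 and later, returns an ungrouped result.

```r
library(dplyr)

# Hypothetical data: two animals per Result x Sex combination
dat <- data.frame(
  Result = c("No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"),
  Sex    = c("H", "H", "M", "M", "H", "H", "M", "M"),
  Age    = c(18, 60, 20, 24, 36, 10, 12, 16)
)

age_stats <- dat %>%
  group_by(Result, Sex) %>%                      # one group per Result x Sex combination
  summarise(meanAge = mean(Age, na.rm = TRUE),   # na.rm = TRUE ignores missing ages
            sdAge   = sd(Age, na.rm = TRUE),
            .groups = "drop")                    # return an ungrouped table
print(age_stats)
```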
Sometimes we have different data sets that have variables in common and we want to integrate them into a single data set for further analysis.
Farms:
| id | name | lat | long | farm_type |
|---|---|---|---|---|
| 1 | Iowa Select Farms Inc | 42.50489 | -93.26323 | sow farm |
| 2 | Stanley Martins Fleckvieh Farms | 43.08261 | -91.56682 | sow farm |
| 3 | Centrum Valley Farms | 42.66331 | -93.63630 | nursery |
| 4 | Hilltop Farms fresh produce | 41.71651 | -93.90491 | sow farm |
| 5 | Hog Slat Inc. | 42.25929 | -91.15566 | GDU |
Movements:
| id | Outgoing |
|---|---|
| 1 | 30 |
| 3 | 13 |
| 4 | 15 |
| 5 | 33 |
| 6 | 11 |
| id | name | lat | long | farm_type | Outgoing |
|---|---|---|---|---|---|
| 1 | Iowa Select Farms Inc | 42.50489 | -93.26323 | sow farm | 30 |
| 2 | Stanley Martins Fleckvieh Farms | 43.08261 | -91.56682 | sow farm | NA |
| 3 | Centrum Valley Farms | 42.66331 | -93.63630 | nursery | 13 |
| 4 | Hilltop Farms fresh produce | 41.71651 | -93.90491 | sow farm | 15 |
| 5 | Hog Slat Inc. | 42.25929 | -91.15566 | GDU | 33 |
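
In the joined table, farm 2 is kept with NA and the movement for id 6 is dropped. That is what a left join does: every row of the first table is kept. The sketch below uses dplyr's left_join() with the farms and movements shown above. lat and long are omitted for brevity, and the object names farms and movements are assumptions.

```r
library(dplyr)

# Farms and movements from the tables above (lat and long omitted)
farms <- data.frame(
  id        = c(1, 2, 3, 4, 5),
  name      = c("Iowa Select Farms Inc", "Stanley Martins Fleckvieh Farms",
                "Centrum Valley Farms", "Hilltop Farms fresh produce", "Hog Slat Inc."),
  farm_type = c("sow farm", "sow farm", "nursery", "sow farm", "GDU")
)
movements <- data.frame(
  id       = c(1, 3, 4, 5, 6),
  Outgoing = c(30, 13, 15, 33, 11)
)

# Keep every row of farms and add the matching Outgoing value by id.
# Farm 2 has no match, so it gets NA; id 6 is not in farms, so it is dropped.
farms_mov <- left_join(farms, movements, by = "id")
print(farms_mov)
```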
R syntax (Sections 2 and 3)

Instead of %>%, in ggplot we connect the pieces of code with +.
The basic components that we need to define for a plot are the following:
| municipality | location | Loc | date | year | captures | treated | lat | lon | trap_type |
|---|---|---|---|---|---|---|---|---|---|
| Temascaltepec | San Pedro Tenayac | Cueva el Uno | 11/06/14 | 2014 | 6 | 6 | 18.03546 | -100.2095 | 1 |
| Tlatlaya | Nuevo Copaltepec | La alcantarilla | 12/05/05 | 2005 | 3 | 2 | 18.40417 | -100.2688 | 1 |
| Tlatlaya | Nuevo Copaltepec | La alcantarilla | 12/05/07 | 2007 | 30 | 29 | 18.40417 | -100.2688 | 4 |
| Tlatlaya | Nuevo Copaltepec | La alcantarilla | 12/03/09 | 2009 | 0 | 0 | 18.40417 | -100.2688 | 3 |
| Tlatlaya | Nuevo Copaltepec | La alcantarilla | 10/08/10 | 2010 | 4 | 3 | 18.40417 | -100.2688 | 1 |
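
The next table has one row per year and a column n, which looks like the number of records per year. Assuming that is what it shows, dplyr's count() produces this kind of table. The data below are a hypothetical mini version of the capture records.

```r
library(dplyr)

# Hypothetical mini version of the capture records
trap_data <- data.frame(
  year     = c(2005, 2005, 2007, 2007, 2007, 2009),
  captures = c(3, 1, 30, 2, 0, 0)
)

# count() returns one row per year with the number of rows (n) in that year
per_year <- trap_data %>% count(year)
print(per_year)
```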
| year | n |
|---|---|
| 2005 | 167 |
| 2006 | 103 |
| 2007 | 249 |
| 2008 | 143 |
| 2009 | 125 |
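
A minimal sketch of the + syntax, plotting the yearly counts from the table above. It assumes the ggplot2 package is installed. It uses the usual ggplot2 components: the data, the aesthetic mapping (aes()), and a geometry (here geom_col(), which draws bars).

```r
library(ggplot2)

# Yearly counts taken from the table above
per_year <- data.frame(
  year = c(2005, 2006, 2007, 2008, 2009),
  n    = c(167, 103, 249, 143, 125)
)

# Data + aesthetic mapping, then layers added with +
p <- ggplot(data = per_year, aes(x = year, y = n)) +
  geom_col() +                 # one bar per year, height = n
  labs(x = "Year", y = "n")    # axis titles

print(p)
```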
Comments
COMMENT AS MUCH AS POSSIBLE!
What is the difference between lines 1 and 2?
YES! The # character makes everything after it on that line a comment.
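A small illustration of how # works; these are not the original lines 1 and 2. R ignores everything from the # to the end of the line, whether the comment takes a whole line or follows some code.

```r
# This whole line is a comment, so R ignores it
x <- 5 + 2   # the code before the # runs; this part is a comment
# x <- 100   # commented-out code is not executed
x
```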