Homepage: https://cadms-ucd.github.io/PROCINORTE_TT/
Introduction to RStudio
Operators are characters with a specific function in R, for example:
[1] 6
[1] 1
[1] 16
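The expressions that produced the three outputs above are not shown on this page. As a minimal sketch, these hypothetical expressions (chosen only because they return the same values) illustrate three arithmetic operators:

```r
# Hypothetical expressions; the originals are not shown on the page
4 + 2   # addition       -> 6
5 - 4   # subtraction    -> 1
4^2     # exponentiation -> 16
```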
Later we will see other kinds of operators, but… DON’T STRESS about learning everything.
Objects in R are containers for information. We can create objects with any name we want, as long as it starts with a letter.
Using the c() function:
[1] TRUE
[1] FALSE
[1] TRUE
Notice that we are using operators to make the comparisons
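The code behind the TRUE/FALSE outputs above is not shown here. A minimal sketch, assuming two hypothetical objects `a` and `b` (the names and values are illustrative only), shows how objects are created and then compared with operators:

```r
# Hypothetical objects; names and values are illustrative only
a <- 5              # a single number
b <- c(1, 2, 3)     # several numbers combined with c()

a == 5              # is a equal to 5?           -> TRUE
a > 10              # is a greater than 10?      -> FALSE
length(b) == 3      # does b hold three values?  -> TRUE
```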
Functions are a special kind of object: they require arguments, and the arguments need to be inside parentheses.
# create a sequence of numbers
seq(
  from = 0, # Starting number
  to = 80, # Ending number
  by = 20 # number increment of the sequence
)
[1] 0 20 40 60 80
Notice that the arguments are named in the function; the arguments of the seq() function are from, to, and by.
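As a small illustration of why naming arguments helps, the sketch below calls seq() once with named arguments and once relying on argument position. The object names `named_args` and `positional` are hypothetical; both calls return the same sequence, but the named version is easier to read.

```r
# Same call written two ways
named_args <- seq(from = 0, to = 80, by = 20)  # arguments matched by name
positional <- seq(0, 80, 20)                   # arguments matched by position

identical(named_args, positional)              # -> TRUE
```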
We can create our own functions, which we will talk more about in the labs
x <- seq(from = 5, to = 23, length.out = 10) # create a sequence of numbers
y <- seq(from = 0.1, to = 0.78, length.out = 10) # Create another sequence
mean(x*y) # Get the mean of the multiplication
[1] 7.406667
Objects:
- x
- y
Operators:
- *
- <-
- =
Functions:
- seq()
- mean()
Arguments:
- from
- to
- length.out
R works like a calculator, so we can perform mathematical operations. For example:
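The original calculations are not shown on this page, so the following is only a sketch with made-up numbers. It combines several operators and one built-in function:

```r
# Illustrative calculations (numbers are made up)
(5 + 3) * 2   # parentheses are evaluated first -> 16
10 / 4        # division                        -> 2.5
7 %% 3        # remainder of 7 divided by 3     -> 1
sqrt(81)      # square root                     -> 9
```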
You can store more than one value using vectors. To create a vector of numbers we use c().
x <- c(5, 6, 7, 8, 9, 10) # create a sequence from 5 to 10
y = 5:10 # create the same sequence but with a different approach
x == y # ask R if the objects have the same information
[1] TRUE TRUE TRUE TRUE TRUE TRUE
Pressing the keys “alt” + “-” will automatically insert the operator <-.
When we have a vector, we can ask R for specific values inside it by using the operator [ ] and specifying which ones we want.
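A short sketch of indexing, reusing the vector x defined earlier. Positions in R start at 1, and a condition inside [ ] keeps only the values for which it is TRUE:

```r
x <- c(5, 6, 7, 8, 9, 10)

x[2]         # second element            -> 6
x[c(1, 3)]   # first and third elements  -> 5 7
x[x > 8]     # elements greater than 8   -> 9 10
x[-1]        # everything except the first element -> 6 7 8 9 10
```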
We can put functions inside functions. For example, to get \(\sqrt{\sum_1^n x}\), the square root of the sum of the numbers in x, we can use:
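A minimal sketch of nesting, assuming x is the vector of numbers from 5 to 10 used earlier. R evaluates the innermost function first, so sum() runs before sqrt():

```r
x <- c(5, 6, 7, 8, 9, 10)

# sum(x) is computed first (45), then sqrt() is applied to the result
result <- sqrt(sum(x))
result
```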
The following function has only one argument which is a name (string) and just pastes some text before and after:
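The function itself is not shown on this page. The sketch below is a hypothetical stand-in: the name `greet` and the pasted text are assumptions, chosen only to illustrate a one-argument function that pastes text before and after a name.

```r
# Hypothetical function; the name `greet` and the wording are assumptions
greet <- function(name) {
  # paste0() joins its pieces without adding spaces between them
  paste0("Hello ", name, ", welcome to R!")
}

greet("Ana")   # -> "Hello Ana, welcome to R!"
```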
%>%
Pipes (%>%) can connect several functions to an object.
For example, if we want to execute a function F1() followed by another function F2() on the object x:
\[\sqrt{\sum_1^n x}\]
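As a sketch of the idea, take F1() to be sum() and F2() to be sqrt(), and assume x is the vector from 5 to 10 used earlier. The pipe is loaded here through dplyr, which makes %>% available; the object names `nested` and `piped` are hypothetical.

```r
library(dplyr)  # makes the %>% pipe available

x <- c(5, 6, 7, 8, 9, 10)

nested <- sqrt(sum(x))            # functions nested inside each other
piped  <- x %>% sum() %>% sqrt()  # same steps, read from left to right

identical(nested, piped)          # -> TRUE
```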
Instead of this:
# Get the number of outgoing shipments
Out <- rename(summarise(group_by(mov, id_orig), Outgoing = n()), id = id_orig)
We can write this:
# Get the number of outgoing shipments
Out <- mov %>% # This is the movement data set
  group_by(id_orig) %>% # Group by origin
  summarise(Outgoing = n()) %>% # Count the number of observations
  rename(id = id_orig) # Rename the variable
And we can break down the code more easily!
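The mov data set is not included on this page, so the sketch below builds a tiny made-up version (the values and the id_dest column are assumptions) to check that the nested and piped forms return the same result:

```r
library(dplyr)

# Made-up movement data: one row per shipment
mov <- data.frame(
  id_orig = c(1, 1, 3, 3, 3),  # farm the shipment leaves from
  id_dest = c(2, 3, 1, 4, 5)   # farm the shipment arrives at (hypothetical column)
)

# Nested version: read from the inside out
out_nested <- rename(summarise(group_by(mov, id_orig), Outgoing = n()), id = id_orig)

# Piped version: read from top to bottom
out_piped <- mov %>%
  group_by(id_orig) %>%
  summarise(Outgoing = n()) %>%
  rename(id = id_orig)

all.equal(out_nested, out_piped)  # both approaches give the same table
```

With this toy data, farm 1 has 2 outgoing shipments and farm 3 has 3.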
R syntax (Section 1)
Download the Excel file from this link. It’s not necessary to have a Box account.
Result | Sex | Age | OtherSpecies | id | name | farm_type | County |
---|---|---|---|---|---|---|---|
No | H | 18 | 0 | 23 | Armstrong Research Farm | sow farm | Pottawattamie |
No | H | 60 | 0 | 23 | Armstrong Research Farm | sow farm | Pottawattamie |
No | H | 60 | 0 | 23 | Armstrong Research Farm | sow farm | Pottawattamie |
Yes | H | 36 | 0 | 23 | Armstrong Research Farm | sow farm | Pottawattamie |
Sometimes we want to select specific columns and rows of our data to reduce its dimensionality. For this we can use the following functions:
- select() to select specific columns
- slice() to select specific rows based on position
- filter() to select specific rows based on a condition
Result | farm_type |
---|---|
No | sow farm |
No | sow farm |
No | sow farm |
Yes | sow farm |
Yes | sow farm |
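The code that produced the table above is not shown. A minimal sketch, assuming a data frame called `pigs` (a hypothetical name) built from a few rows of the first table:

```r
library(dplyr)

# Toy subset of the data shown earlier; the object name `pigs` is an assumption
pigs <- data.frame(
  Result    = c("No", "No", "No", "Yes"),
  Sex       = c("H", "H", "H", "H"),
  Age       = c(18, 60, 60, 36),
  id        = c(23, 23, 23, 23),
  farm_type = c("sow farm", "sow farm", "sow farm", "sow farm")
)

two_cols  <- pigs %>% select(Result, farm_type)  # keep only two columns
first_two <- pigs %>% slice(1:2)                 # keep rows 1 and 2 by position

two_cols
first_two
```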
We can also specify which columns we DON’T want to show in our data:
Result | Sex | OtherSpecies | name | farm_type | County |
---|---|---|---|---|---|
No | H | 0 | Armstrong Research Farm | sow farm | Pottawattamie |
No | H | 0 | Armstrong Research Farm | sow farm | Pottawattamie |
No | H | 0 | Armstrong Research Farm | sow farm | Pottawattamie |
Yes | H | 0 | Armstrong Research Farm | sow farm | Pottawattamie |
Yes | H | 0 | Armstrong Research Farm | sow farm | Pottawattamie |
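The table above no longer has the Age and id columns. One way this could be written, sketched on a hypothetical toy data frame `pigs`, is to put a minus sign in front of the column names inside select():

```r
library(dplyr)

# Toy data; the object name `pigs` is an assumption
pigs <- data.frame(
  Result    = c("No", "Yes"),
  Sex       = c("H", "H"),
  Age       = c(18, 36),
  id        = c(23, 23),
  farm_type = c("sow farm", "sow farm")
)

# A minus sign drops the column instead of keeping it
without_age_id <- pigs %>% select(-Age, -id)
names(without_age_id)   # -> "Result" "Sex" "farm_type"
```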
Filtering only the observations from boar studs:
Result | Sex | Age | OtherSpecies | id | name | farm_type | County |
---|---|---|---|---|---|---|---|
No | H | 48 | 0 | 32 | Farm Sweet Farm at Rosmann Family Farms | boar stud | Shelby |
No | H | 60 | 0 | 32 | Farm Sweet Farm at Rosmann Family Farms | boar stud | Shelby |
Yes | H | 60 | 0 | 32 | Farm Sweet Farm at Rosmann Family Farms | boar stud | Shelby |
Yes | H | 15 | 0 | 32 | Farm Sweet Farm at Rosmann Family Farms | boar stud | Shelby |
No | H | 68 | 0 | 32 | Farm Sweet Farm at Rosmann Family Farms | boar stud | Shelby |
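A sketch of how such a filter could be written, using a hypothetical toy data frame `pigs` with two farm types. filter() keeps only the rows where the condition is TRUE:

```r
library(dplyr)

# Toy data; the object name `pigs` is an assumption
pigs <- data.frame(
  Result    = c("No", "Yes", "No", "Yes"),
  Age       = c(18, 36, 48, 15),
  farm_type = c("sow farm", "sow farm", "boar stud", "boar stud")
)

# Note the double == for comparison (a single = would be an assignment)
boars <- pigs %>% filter(farm_type == "boar stud")
boars
```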
Result | Sex | Age | OtherSpecies | id | name | farm_type | County | SowFarm |
---|---|---|---|---|---|---|---|---|
No | H | 18 | 0 | 23 | Armstrong Research Farm | sow farm | Pottawattamie | 1 |
No | H | 60 | 0 | 23 | Armstrong Research Farm | sow farm | Pottawattamie | 1 |
No | H | 60 | 0 | 23 | Armstrong Research Farm | sow farm | Pottawattamie | 1 |
Yes | H | 36 | 0 | 23 | Armstrong Research Farm | sow farm | Pottawattamie | 1 |
Yes | H | 50 | 0 | 23 | Armstrong Research Farm | sow farm | Pottawattamie | 1 |
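The table above has a new SowFarm column, but the code that created it is not shown. One plausible way to add such a column is mutate(); in the sketch below, the toy data frame `pigs` and the coding of non-sow farms as 0 are assumptions.

```r
library(dplyr)

# Toy data; the object name `pigs` is an assumption
pigs <- data.frame(
  Result    = c("No", "Yes", "No"),
  farm_type = c("sow farm", "sow farm", "boar stud")
)

# mutate() adds a column; ifelse() returns 1 when the condition is TRUE, 0 otherwise
pigs_flagged <- pigs %>%
  mutate(SowFarm = ifelse(farm_type == "sow farm", 1, 0))
pigs_flagged
```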
We can calculate different statistics by group. For example, let’s calculate the mean and standard deviation of age by Result and Sex:
Result | Sex | meanAge | sdAge |
---|---|---|---|
No | H | 39.67300 | 24.82636 |
No | M | 22.39357 | 16.67979 |
Yes | H | 23.61135 | 19.79150 |
Yes | M | 15.10870 | 10.30037 |
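A sketch of a grouped summary that yields columns like those above. The toy data frame `pigs` and its values are made up, so the numbers will not match the table:

```r
library(dplyr)

# Made-up data; the object name `pigs` is an assumption
pigs <- data.frame(
  Result = c("No", "No", "Yes", "Yes"),
  Sex    = c("H", "H", "M", "M"),
  Age    = c(18, 60, 10, 20)
)

age_summary <- pigs %>%
  group_by(Result, Sex) %>%        # one group per Result/Sex combination
  summarise(
    meanAge = mean(Age),           # mean age within each group
    sdAge   = sd(Age)              # standard deviation within each group
  )
age_summary
```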
Sometimes we have different data sets that have variables in common and we want to integrate them into a single data set for further analysis.
Farms:
id | name | lat | long | farm_type |
---|---|---|---|---|
1 | Iowa Select Farms Inc | 42.50489 | -93.26323 | sow farm |
2 | Stanley Martins Fleckvieh Farms | 43.08261 | -91.56682 | sow farm |
3 | Centrum Valley Farms | 42.66331 | -93.63630 | nursery |
4 | Hilltop Farms fresh produce | 41.71651 | -93.90491 | sow farm |
5 | Hog Slat Inc. | 42.25929 | -91.15566 | GDU |
Movements:
id | Outgoing |
---|---|
1 | 30 |
3 | 13 |
4 | 15 |
5 | 33 |
6 | 11 |
id | name | lat | long | farm_type | Outgoing |
---|---|---|---|---|---|
1 | Iowa Select Farms Inc | 42.50489 | -93.26323 | sow farm | 30 |
2 | Stanley Martins Fleckvieh Farms | 43.08261 | -91.56682 | sow farm | NA |
3 | Centrum Valley Farms | 42.66331 | -93.63630 | nursery | 13 |
4 | Hilltop Farms fresh produce | 41.71651 | -93.90491 | sow farm | 15 |
5 | Hog Slat Inc. | 42.25929 | -91.15566 | GDU | 33 |
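The joined table above keeps every farm and shows NA where no movements were recorded, which is the behaviour of a left join. A sketch using the rows printed above (the object names `farms` and `movements` are assumptions, and only some columns are reproduced):

```r
library(dplyr)

# Rows copied from the tables above; object names are assumptions
farms <- data.frame(
  id        = c(1, 2, 3, 4, 5),
  farm_type = c("sow farm", "sow farm", "nursery", "sow farm", "GDU")
)
movements <- data.frame(
  id       = c(1, 3, 4, 5, 6),
  Outgoing = c(30, 13, 15, 33, 11)
)

# left_join() keeps all rows of `farms` and adds matching columns from `movements`
joined <- left_join(farms, movements, by = "id")
joined$Outgoing   # -> 30 NA 13 15 33
```

Farm 2 has no match in the movement data, so its Outgoing value is NA; farm 6 appears only in the movement data, so it is dropped.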
R syntax (Sections 2 and 3)
Instead of %>%, in ggplot we connect pieces of code with +.
The basic components that we need to define for a plot are the following:
municipality | location | Loc | date | year | captures | treated | lat | lon | trap_type |
---|---|---|---|---|---|---|---|---|---|
Temascaltepec | San Pedro Tenayac | Cueva el Uno | 11/06/14 | 2014 | 6 | 6 | 18.03546 | -100.2095 | 1 |
Tlatlaya | Nuevo Copaltepec | La alcantarilla | 12/05/05 | 2005 | 3 | 2 | 18.40417 | -100.2688 | 1 |
Tlatlaya | Nuevo Copaltepec | La alcantarilla | 12/05/07 | 2007 | 30 | 29 | 18.40417 | -100.2688 | 4 |
Tlatlaya | Nuevo Copaltepec | La alcantarilla | 12/03/09 | 2009 | 0 | 0 | 18.40417 | -100.2688 | 3 |
Tlatlaya | Nuevo Copaltepec | La alcantarilla | 10/08/10 | 2010 | 4 | 3 | 18.40417 | -100.2688 | 1 |
year | n |
---|---|
2005 | 167 |
2006 | 103 |
2007 | 249 |
2008 | 143 |
2009 | 125 |
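The plotting code is not shown on this page. As a sketch, a ggplot is usually built from a data set, aesthetic mappings (aes()), and a geometry (a geom_ function), connected with +. The example below plots the yearly counts printed above; the object name `counts_by_year`, the choice of geom_col(), and the axis labels are assumptions.

```r
library(ggplot2)

# Yearly counts copied from the table above; the object name is an assumption
counts_by_year <- data.frame(
  year = 2005:2009,
  n    = c(167, 103, 249, 143, 125)
)

p <- ggplot(data = counts_by_year, aes(x = year, y = n)) +  # data and aesthetics
  geom_col() +                                              # geometry: one bar per year
  labs(x = "Year", y = "Number of records")                 # axis labels
p
```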
Comments
COMMENT AS MUCH AS POSSIBLE!
What is the difference between lines 1 and 2?
YES! The # character makes everything after it on that line of code a comment.
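The two lines being compared are not shown on this page. A hypothetical pair that illustrates the point:

```r
# x <- 5   this whole line starts with #, so R ignores it
x <- 10    # this assignment runs; only the text after # is ignored
x          # -> 10
```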