1 Objectives

In this lab we will be using R, which is an open access software developed for statistical analysis (but can do some data manipulation, simulation and modeling). We will be using the package QuantRRA for rapid risk assessment (still under development). Make sure you have installed the latest version of R, Rstudio, and QuantRRA. If you don’t want/can’t install R and Rstudio in your local machine, you can create a free account of Rstudio cloud and use a cloud server to run the code. The objectives for this lab are:

  • Familiarize with methods to develop risk assessment using R
  • Integrate qualitative and quantitative information for the development of RRA
  • Develop a system that can be easily updated and re-parametrized
  • Use random forest and regression trees to evaluate the sensitivity of the model
  • Use the RRA model in multiple regions.

2 Basics

First we have to make sure we have installed a couple of packages to run this lab:

packages <- c('devtools', # Package for installing external dependencies
              'sf', # For spatial data manipulation
              'sparkline') # For visualization

## Now load or install&load all
package.check <- lapply(
  packages,
  FUN = function(x) {
    if (!require(x, character.only = TRUE)) {
      install.packages(x, dependencies = TRUE)
      library(x, character.only = TRUE)
    }
  }
)


devtools::install_github('spablotemporal/quantrra', dependencies = T) # RRA toolset
# Libraries we will use
library(sf); library(dplyr)
library(quantrra)

2.1 Sampling probability distributions

By default, R has several functions to sample a distribution and plot the results. For example, if we would like to sample a normal distribution with a mean of 5 and standard deviation of 0.12, we can use the following code:

# sample the distribution and save it to an object
x <- rnorm(n = 100, # Number of observations to sample
           mean = 5, # Mean 
           sd = 0.12) # Standard deviation

# Plot the observations
hist(x)

In this lab we will be using the package QuantRRA, which is specifically developed for rapid risk assessment in R. The function ra_sample() support multiple distributions such as: Normal, binomial, uniform, pert, among others. The function requires two arguments:

  • x which is a string (text) of the distribution and its parameters.
  • n the number of observations we will sample.
n <- 100 # number of observations
d <- 'Normal(5, 0.12)' # Distribution to sample

x <- ra_sample(x = d, n = n) # Function to sample the distribution

# We can use the function plotDist() from the package QuantRRA to get a more familiar output:
ra_plotDist(x, # the values sampled
         main = 'Distribution of x') # A title for our plot

2.2 The main function

Now that we covered the basics of quantrra, we will create our first model. The main function ra_run() requires two arguments:

  • M, the model file
  • n number of simulations.

Lets see an example of a pre-made model file. The following model was based on a risk assessment performed by the OIRSA (Organismo Internacional Regional de Sanidad Agropecuaria) for the introduction of ASF into the countries that are part of OIRSA, you can read more about this report in this link. The model estimates the number of introduction events from imported animal products swill feeding in a given year.

To represent the model we must identify the inputs and the outputs we want to calculate. The following figure was adapted from the report by OIRSA

Lets have a look of how we would represent this in R. This model was adapted and its contained in the package, to access the model file use the following code:

quantrra::OIRSA$nodes[, 1:6]

The model file (M) must be a data.frame where each of the rows is a parameter and the columns are the following:

  • id, Unique identifier for the parameter. Use short names with no special characters or spaces starting with a letter, i.e. I1, P3, etc…
  • label Name of the parameter.
  • type If its an input (In) or output (Out) parameter. Input parameters must have a distribution specified, otherwise NA; Output parameters must have a formula specified, NA otherwise
  • level Hierarchy level of the parameter, This will determine the order in which the output parameters will be calculated.
  • distribution Distribution for the input parameters, must follow the specification of the ra_sample() function from quantrra, if the parameter is an output, must be NA.
  • formula Formula used to calculate the Out parameters, the formula must include other parameters from lower hierarchy levels. The formula con use any operators such as multiplication (*), sum (+), substraction (-), exponential (^), division(/), etc…

When we use the function ra_run() on the model file, we will calculate the outcome n times based on the distributions specified:

# Save the model file to an object
M <- quantrra::OIRSA$nodes
# Run the model 5000 times
Mo <- ra_run(M = M, nsim = 5000)
# Visualize the results:
ra_plotDist(Mo$P)

The result of this is a table with the distributions sampled for the inputs, and the ones calculated for the outputs. Based on the results, we can estimate the probability that the number of introduction events will be more than 1:

sum(Mo$P > 1) / length(Mo$P)
## [1] 0.2108

2.3 Sensitivity analysis

We can perform sensitivity analysis on the model to identify the most relevant parameters and explore the parameters sample space, for this we can use the function ra_gsa() from the quantrra package. The function uses random forest to estimate the relative importance of the parameters, and classification and regression trees to visualize the interactions between the parameters sampled. The function requires 3 main arguments:

  • data: Our results table from the model.
  • formula A formula that contains the names of the outcome analyzed and the parameters.
  • tree: we can specify if we want an interactive or static visualization for our tree.
# First we specify the formula:
f <- P ~ P1 + P2 + P3 + R1 + R2 + H1 + H2
# Then we use the function with our results
SA <- ra_gsa(data = Mo, f = f, tree = 'interactive')
# The results contain 3 objects:
SA$VarianceExp # The variance explained by our parameters
## [1] 86.25587
SA$RelImport # The relative importance of our parameters

SA$RT # The classification and regression tree

3 Case study: Rabies in Mexico

Now we will be using a model to estimating the risk of rabies transmission from dogs to humans trough bite aggression. We will use the data set for the rabies vaccination program in Mexico. This data set contains the estimated dog and human population per state, number of reported dog bites, and the vaccination coverage for the years 2016 to 2020.

3.1 Data preparation

# Load the rabies data set
rb <- quantrra::.

# lets have a look at the first observations
head(rb)

We will use the data set to estimate the median, min, and max of the variables, and then use this to describe Pert distributions to use in further analysis.

# We need to format the data 
rbt <- rb %>% # The rabies data set
  group_by(State) %>%  # We will group our observations by state
  # Next we will calculate the median, min and max for each of our variables
  summarise_at(.vars = c('HumanPop', 'DogPop', 'Bites', 'VaccinationCov'), # These are the variables
               .funs = c(m = ~median(., na.rm = T), min = ~min(., na.rm = T), max = ~max(., na.rm = T))) %>% # These are the functions 
  # Next we will create new columns to specify the distributions we will use for the RRA
  mutate(DogPop = paste0('Pert(', DogPop_min, ', ', DogPop_m, ', ', DogPop_max, ')'),
         HumanPop = paste0('Pert(', HumanPop_min, ', ', HumanPop_m, ', ', HumanPop_max, ')'),
         Bites = paste0('Pert(', Bites_min, ', ', Bites_m, ', ', Bites_max, ')'),
         VaccinationCov = paste0('Pert(', VaccinationCov_min, ', ', VaccinationCov_m, ', ', VaccinationCov_max, ')'))

rbt

3.2 Planning the model.

Now we will create a model file based on the next scenario tree for the transmission of dog rabies to humans:

We need identify the inputs and outputs, and the relationships between the nodes in our scenario tree. To parametrize our model we will be using the data reported by the vaccination program (such as dog and human population, bite reports, and vaccination coverage) and some theoretical distributions that we can parametrize based on expert opinion or ‘what if?’ scenarios.

The outputs in the model will be calculated based in the inputs as follows:

\[ \begin{aligned} P(Contact) = O_1 &= \frac{I_1}{I_2} \\ P(Bite) = O_2 &= \frac{I_3}{I_1} \\ P(Infected\ dog) = O_3 &= (1 - I_4) \times I5\\ P(Transmission) = O_4 &= O_1 \times O_2 \times O_3 \times I_5\\ \end{aligned} \] We can also calculate the estimated number of transmission events in a given year using the following formula:

\[ E[Transmission\ events] = O_4 \times I_2 \] We will create our model file from scratch by defining each of the columns that needs to be present in order to calculate the outputs. We will be calculating the probability of transmission for one of the states in Mexico (Mexico city). Now complete the formulas described for the outputs to be calculated:

# Get the index for Mexico city:
state <- which(rbt$State == 'CIUDADDEMEXICO')

# Define the model: 
M <- data.frame(
  # First we will create our IDs
  id = c('I1', 'I2', 'O1',
         'I3', 'O2',
         'I4', 'I5', 'O3',
         'I6', 'O4',
         'E'),
  # Then we define the names for the nodes
  label = c('DogPop', 'HumanPop', 'Contact Probability',
            'Bites', 'Bite Probability',
            'VaccinationCoverage', 'Prevalence', 'Probability of Infected dog',
            'Transmission rate', 'Transmission probability', 
            'Expected transmission events'),
  # Type of node 
  type = c('In', 'In', 'Out',
           'In', 'Out',
           'In', 'In', 'Out',
           'In', 'Out',
           'Out'),
  # Hierarchy level
  level = c(0, 0, 1,
            0, 1,
            0, 0, 1,
            0, 2, 
            2),
  # Distributions for the input nodes
  distribution = c(rbt$DogPop[state], rbt$HumanPop[state], NA,
                   rbt$Bites[state], NA,
                   rbt$VaccinationCov[state], 'Pert(0.01, 0.05, 0.1)', NA,
                   'Pert(0.1, 0.3, 0.5)', NA,
                   NA),
  # Formulas for the output nodes
  formula = c(NA, NA, 'I1/I2', # O1 Contact
              NA, 'Formula for O2', # O2 Bite
              NA, NA, 'Formula', # O3 infection
              NA, 'Formula', # O4 Transmission
              'Formula')) # E Transmission events

3.3 Running the model

Now we will run 1000 simulations of the model and plot the distribution of the outputs calculated:

Mo <- ra_run(M = M, nsim = 1000)

ra_plotDist(Mo$O1, main = 'Contact probability') # Contact

Question 1 Plot the distributions of the Probability of aggression, a dog being infected with rabies, the dog-human transmission and the expected number of cases.

ra_plotDist(Mo$O2, main = 'Bite probability') # Bite

ra_plotDist(Mo$O3, main = 'Infected dog probability') # Infected

ra_plotDist(Mo$O4, main = 'Transmission probability') # Transmission

ra_plotDist(Mo$E, main = 'Expected number of cases') # Expected number of transmission events

Question 2 Based on the results, what would be the probability that there will be more than 1 transmission events?

sum(Mo$E > 1) / length(Mo$E)
## [1] 0.349

3.4 Sensitivity analysis

Now we will run the sensitivity analysis on the model results for the estimated number of transmission events

SA <- ra_gsa(data = Mo, f =  E ~ I1 + I2 + I3 + I4 + I5 + I6, tree = 'interactive')

SA$VarianceExp
## [1] 95.67769
SA$RelImport

SA$RT

Question 3 What can you say about the sensitivity of the model? would you say its a good model?

3.5 Generating a risk map

We can use the function ra_run_strat() to calculate a risk model stratified by a variable, in this case we will estimate the expected transmission events for the rest of the states in Mexico. The function ra_run_strat() requires 3 arguments:

  • M, the model file
  • Tbl, a table where each row represent a strata, and each column represent the name of strata and the input parameters.
  • nsim, number of simulations
# We will create a table to use as input
rbts <- rbt %>% 
  select(State, I1 = DogPop, I2 = HumanPop, I3 = Bites, I4 =VaccinationCov) %>% 
  mutate(
         I5 = 'Pert(0.01, 0.1, 0.15)', # Prevalence
         I6 = 'Pert(0.1, 0.3, 0.5)')  # Transmission

# Run the model for all the states
rabRR <- ra_run_strat(M = M, Tbl = rbts, nsim = 5000)

# Load the spatial shape file for the region
MxSp <- st_read(system.file("data/MxSp.shp", package = "quantrra"), quiet = T)

# Plot the risk map
MxSp %>%  # This is our spatial shape file
  left_join(rabRR, by = c('Entidad' = 'IDs')) %>%  # we use left join to join our model results
  ggplot() + # We call ggplot
  geom_sf(aes(fill = E_m)) + # We add a layer representing the polygons colored by the variable E
  scale_fill_gradient(low = 'black', high = 'red') + # set the color scale
  theme_void() # Theme of the plot

MxSp %>%  # This is our spatial shape file
  left_join(rabRR, by = c('Entidad' = 'IDs')) %>% 
  data.frame() %>% 
  arrange(desc(E_m))

In this lab we created a Model file from scratch, but there are other options if you don’t feel comfortable with the user interface of Rstudio. One option is to create the Model table in excel, save it as a CSV and then import it into R. Another option is using the RRA interactive shiny interface that comes with this package (Currently under development), you can call the interactive app by typing in the R console: quantrra::run_quantrra(), this will open a new window with a more user friendly interactive interface for creating and uploading model files.


4 Reporting the results

To submit your assignment you should include:

  1. Plots for the distribution outputs including: Probability of bite aggression, Probability of a Dog being infected, Probability of Dog-Human rabies transmission, and the expected number of transmissions in a given year.
  2. The probability that there will be more than 1 transmission events?
  3. Your opinion about the model sensitivity.