Bayesian Inference for an SIR Model

Johns Hopkins University have put together a repository containing confirmed cases of COVID19, deaths and recovered patients. Below we plot the confirmed cases, confirmed recovered and deaths.

SIR Model

The system of ordinary differential equations (ODE) for the Susceptible Infected Recovered (SIR) model is given by

\[\begin{align} & \frac{dS}{dt} = - \frac{\beta I S}{N}, \\ & \frac{dI}{dt} = \frac{\beta I S}{N}- \gamma I, \\ & \frac{dR}{dt} = \gamma I,\\ & N = S + I + R. \end{align}\]

Where \(S\) is the number of susceptible, \(I\) the total infected and \(R\) the total recovered. \(\gamma\) is the recovery rate (\(1/\gamma\) is the infectious period), \(\beta\) is the infection rate (\(1/\beta\) is the time between contacts). These parameters are unobserved.

We can Use deSolve to solve the ODE system startin with an initial state of 66.4 million people susceptible and one infected. This produces a simulation conditional on the parameters chosen. The parameters can change depending on each countries reaction to the virus. For instance the infection rate can be lowered by quarantine or social distancing, thus reducing the contact rate \(\beta\).

parameters <- c(beta = 0.5, gamma = 1/4.5)
initial_state <- c(S = 66.4e6, I = 1, R = 0) 

sir <- function(t, state, parameters) {
  with(as.list(c(state, parameters)), {
    N <- sum(S, I, R)
    dS = -beta * S * I / N
    dI = beta * S * I / N - gamma * I
    dR = gamma * I
    
    list(c(dS, dI, dR))
  })  
}

times <- seq_len(100)
out <- ode(y = initial_state, times = times, func = sir, parms = parameters)

Consider the model in this pre-print from Lourenco et al 2020, which has since been criticised in this response. We observe the cumulative deaths

\[\Lambda_t = \rho\eta R_{t-\psi},\]

where \(\rho\) is the proportion of the population at risk of severe disease, \(\eta\) is the probability of dying with the severe disease. \(R_{t-\psi}\) is the removed population with a delay between the time of infection represented by \(\psi\). The parameters are given prior distributions in the paper, we can simulate multiple trajectories of the cumulative deaths by first simulating from the prior distribution then solving the SIR system. The prior distributions as given in the paper are

\[\begin{aligned} \frac{1}{\gamma} &\sim \mathcal{N}(4.5, 1^2), \\ \psi &\sim \mathcal{n}(17, 2^2), \\ R_0 &\sim \mathcal{n}(2.25, 0.2^2), \\ \eta &\sim \mathcal{n}(0.14, 0.007^2), \\ \rho &\sim \text{Gamma}(5, 5/0.01). \end{aligned}\]

There is also a parameter for the time of introduction relative to the time of the first reported case, \(\tau\). It has a strange prior distribution, being uniform from \(-\infty\) to \(\infty\). Obviously this parameter can not be positive, since a confirmed case indicates that the time of introduction is in the past.

We can simulate more times from the prior and calculate the empirical intervals instead of plotting raw trajectories. The initial state is \(S = 66.44 \times 10^6\), \(I = 1\) and \(R = 0\). The initial time is taken to be one week before the first confirmed case in the UK.

Inference method

The inference method is explained clearly in Lourenço et al (2017), albeit with a different model. The Metropolis algorithm with a symmetric random walk proposal distribution is used, the likelihood is the product of independent Poisson distributions

\[\mathcal{L}(y_{1:T}|\Lambda_{1:T}, \theta^\star) = \prod_{t=1}^T\left(\textrm{Poisson}(y_t;\Lambda_t)\right)\]

The steps to perform inference for the static parameters are summarised below

  1. Propose new (log) parameters from a symmetric Normal distribution \(\log(\theta^\star) \sim \mathcal{N}(\log\theta^\star\mid\log\theta, \sigma I_d)\)
  2. Solve the ODE using the deSolve package using the proposed parameters
  3. Calculate the un-normalised log-posterior \(\log p(\Lambda_{1:T}, \theta^\star\mid y_{1:T}) = \log p(\theta) +\sum_{t=1}^T\log\left(\textrm{Poisson}(y_t\mid\Lambda_t)\right)\)
  4. Accept the new parameters with probability \(\alpha\) where

\[\alpha = \min\left(1, \frac{p(\Lambda_{1:T}, \theta^\star\mid y_{1:T})}{p(\Lambda_{1:T}, \theta\mid y_{1:T})}\right).\]

First we specify the likelihood in R.

log_likelihood_sir <- function(parameters, ys, initial_state) {
  initial_state <- c(S = 60e6, I = 1, R = 0) # 
  
  # Transition function
  sir <- function(t, state, parameters) {
    beta <- parameters[1] * parameters[2]
    gamma <- parameters[2]
    
    with(as.list(state), {
      N <- sum(S, I, R)
      dS = -beta * S * I / N
      dI = beta * S * I / N - gamma * I
      dR = gamma * I
      
      list(c(dS, dI, dR))
    })  
  }
  
  sir_sim <-
    deSolve::ode(
      y = initial_state,
      times = seq_along(ys),
      func = sir,
      parms = parameters
    )
  
  cumulative_deaths <- function(t, R, parameters) {
    R[max(1, t - parameters[5])] * parameters[3] * parameters[4]
  }
  
  lambdas <- purrr::map_dbl(sir_sim[, 1], ~ cumulative_deaths(t = .x, sir_sim[, 4], parameters))
  
  ll <- sum(dpois(x = ys, lambda = lambdas, log = TRUE))
  
  if_else(is.nan(ll) || is.na(ll), -Inf, ll)
}

Then we specify the prior distributions.

log_prior <- function(parameters) {
  r0 = parameters[1]; gamma = parameters[2]; rho = parameters[3]
  eta = parameters[4]; psi = parameters[5]
  
  dnorm(1/gamma, mean = 4.5, sd = 1, log = TRUE) +
    dnorm(psi, mean = 17, sd = 2, log = TRUE) +
    dnorm(r0, mean = 2.25, sd = 0.2, log = TRUE) +
    dnorm(eta, mean = 0.14, sd = 0.007, log = TRUE) +
    dgamma(rho, shape = 5, rate = 5/0.01, log = TRUE)
}

proposal <- function(p) {
  p * exp(rnorm(5, sd = c(0.02, 0.02, 0.02, 0.02, 0.05)))
}

initial_parameters <- c(r0 = 2.25, gamma = 1/4.5, rho = 0.01, eta = 0.14, psi = 17)
ys <- uk %>% pull(deaths)

We initialise the parameters at the mean of the prior distributions and simulate 1 million iterations from the Metropolis algorithm. The first half are discarded and every 100th iteration is retained in an attempt to reduce auto-correlation in the chain.

iters <-
  jonnylaw::metropolis(
    theta = initial_parameters,
    function(p) log_likelihood_sir(ys = ys, parameters = p) + log_prior(p),
    proposal = proposal,
    1e6, 
    chains = 2, 
    parallel = TRUE
  )
write_rds(iters, path = here::here("models/sir.Rds"))
iters <- read_rds(here::here("models/sir.Rds"))

I have cached the results of the chunk running the Metropolis algorithm. The results can be loaded using lazyLoad - this is useful for long-running Rmarkdown chunks.

lazyLoad(here::here("content/blog/bayesian-inference-for-an-sir-model_cache/html/metropolis-samples_4388ff8bda2e17c9d25302cfbbb94be8"))

The next plot shows both the prior and posterior distribution. The prior is the solid black line and the posterior samples are plotted in a histogram. Most of the prior distributions are narrow and hence have resulted in little change. The posterior mean of the infectious period, \(1/gamma\) is approximately 4 days, down from the prior mean of 4.5. The posterior mean of \(\psi\) is, 13.5, 3.5 days shorter than the prior mean.

Now plot the posterior cumulative death curve.

Let’s simulate forward 100 days from the hypothesised start date, 2020-01-26 using the posterior distribution.