Multi-state models are used to model disease progression. The model is a continuous time Markov process. The states and time of transitions are fully observed. There are three states a patient can be in, “healthy”, “illness” and “deceased”. The possible pairs of transitions between these states include healthy -> illness, illness -> healthy, illness -> death and healthy -> death. The model can be expressed as a directed graph.
The state “Deceased” is an absorbing state, whereas the other two states are transient. This means we can’t transition away from the Deceased state.
The process is a Markov process, it has a transition kernel,
Then we can rearrange for the transition kernel,
This gives the infinitesimal transition for a very small time increment. Using this result, we can solve for the transition kernel over a finite time
Then solve the resulting differential equation to see that
$$
The non-zero elements of the rate-matrix are potential transitions. The final state is absorbing - hence we can’t transition from it.
Simulating Data from the Model
We can simulate forward using a grid of times,
- Sample the rate parameters from the prior distribution
- Construct a rate matrix from each sample
- Compute the matrix exponential and simulate forward conditional on the previous step
Forward simulation is drawing next state,
We use Gamma priors for the non-zero elements of the rate matrix:
The following function populates a rate matrix given a vector of hazards.
<- function(lambda, n_states = 3, n_absorbing = 1) {
build_rate_matrix <- matrix(rep(0, times = n_states * n_states), byrow = TRUE, nrow = n_states)
q <- 1
k for (i in 1:n_states) {
for (j in 1:n_states) {
if (i != j & i <= (n_states - n_absorbing)) {
= lambda[k]
q[i, j] = k + 1
k
}
}
}
## fix the diagonal to be negative the sum of the remaining elements in the row
diag(q) = -rowSums(q)
q }
The next function can be used to simulate the process on a grid of times
# Simulate the markov chain
<- function(lambdas, step_size, n_steps) {
sim_markov <- numeric(n_steps)
state
## Build the rate and transition matrices
<- build_rate_matrix(lambdas)
rate_matrix <- Matrix::expm(rate_matrix * step_size)
transition_matrix <- nrow(rate_matrix)
n_states
# initial state is always healthy
1] <- 1
state[
for (i in 2:n_steps) {
<- sample(x = n_states, size = 1, prob = transition_matrix[state[i-1], ])
state[i]
}tibble(time = seq_len(n_steps), state = state)
}
Simulating 1,000 trajectories for (50 * 0.1 = 5 days) transitions results in the following state transitions
# A tibble: 3 × 4
from `1` `2` `3`
<dbl> <int> <int> <int>
1 1 43000 227 1
2 2 11 5700 2
3 3 0 0 59
Instead of simulating on a fine grid we can using discrete event simulation:
- Given the initial state is
- Simulate the time to the next event,
- Simulate the next state by simulating from a distribution with pmf
- Return the sample path if we have landed in an absorbing state,
<- function(x, Q, n) {
sim_exact = numeric(n + 1)
xs = numeric(n)
ts = nrow(Q)
r = 0
t 1] <- t
ts[1] = x
xs[for (i in seq_len(n)) {
<- t + rexp(1, -Q[x, x]) # Sim time to next observation
t <- Q[x, ] # Calculate the probability of transitioning away to another state
weights <- 0 # We can't stay in the same state
weights[x] <- sample(r, 1, prob = weights) # Sample the next state
x + 1] <- x # add to vector of states
xs[i + 1] <- t # add to vector of event times
ts[i if (Q[x, x] == 0) { # If the new state is an absorbing state then return event times
return(tibble(time = ts[seq_len(i + 1)], state = xs[seq_len(i + 1)]))
}
}tibble(time = ts, state = xs) # Return event times without absorbing
}
1 2 3
[1,] 0 3019 80
[2,] 2150 0 869
The figure below shows the mean time in each state with 66% and 95% credible intervals. This is calculated by sampling 1,000 rate matrices from the prior distribution and calculating
Simulating a real-world example
In a real world example patients are observed at a different number of times,
The plot below shows patient journeys for patients 3, 11, 13 and 100. Patient 3 has only two observed state changes and censoring on the final state. Patient 11 has multiple observed transitions through illness and healthy, then a final transition to the absorbing state.
We can write down the likelihood for the
$$
The initial state is fixed at one and the transition kernel is given by the matrix exponential of the rate matrix. There is one problem with this likelihood: It does not incorporate censored paths.
A multi-state model such as this can be thought of as multiple survival models. I introduced Survival models in a previous blog post. In a survival model, the transition is governed by a hazard function. The hazard function for transitioning to state
The Survival function,
The pdf is
Let’s consider the likelihood for patient 13.
patient | time | state |
---|---|---|
13 | 0.00 | 1 |
13 | 3.93 | 2 |
13 | 8.20 | 1 |
For each state transition all other possible transitions are considered censored. The data-frame for patient 13 can be re-written by first considering a data-frame of all possible transitions and determining which transitions end in an absorbing state.
We can then create a “from” column representing the previous state of a transition, then left join the possible states using “from”. Observed states are when the “to” in both tables match, censored states are non-matching.
patient | time | to.x | from | to.y | absorbing | observed |
---|---|---|---|---|---|---|
13 | 0.00 | 1 | NA | NA | NA | NA |
13 | 3.93 | 2 | 1 | 2 | FALSE | TRUE |
13 | 3.93 | 2 | 1 | 3 | TRUE | FALSE |
13 | 8.20 | 1 | 2 | 1 | FALSE | TRUE |
13 | 8.20 | 1 | 2 | 3 | TRUE | FALSE |
This layout explicitly allows us to see which state transitions are observed and which are censored. Patient 13 starts in state 1, as do all patients. The first observation at time
The final line shows that the likelihood consists of the hazards of the observed transitions
This can be generalised:
Where
Where,
For reasons of numerical stability, the log-likelihood is used,
$$$$
Parameter Inference
The posterior distribution of the free parameters in the rate matrix is determined using a random walk Metropolis algorithm. The prior distribution for the rate parameters are independent Gamma distributions parameterised with shape,
The rate parameters are transformed to the log scale and the proposal distribution is a symmetric Normal distribution centered at the value of the previously accepted parameter at iteration
where
Calculate the total-time in each state.
state | total_time |
---|---|
1 | 2284.0808 |
2 | 189.7724 |
3 | 0.0000 |
Calculate the matrix of observed transitions. Note that no transitions occur from state 3, the absorbing state.
1 2 3
[1,] 0 107 21
[2,] 38 0 43
[3,] 0 0 0
Program the log-likelihood, prior and proposal distribution.
## rate_matrix is the constructed rate-matrix
## tau is a vector containing total time in each state
## observed_transitions is a matrix with all observed transitions
## with the state from in the rows and state to in the columns.
<- function(rate_matrix, tau, observed_transitions) {
log_likelihood <- nrow(rate_matrix)
n_states <- - diag(rate_matrix)
qi sum(log(rate_matrix - diag(diag(rate_matrix))) * observed_transitions, na.rm = TRUE) - (n_states - 1) * sum(qi * tau)
}
<- function(lambda) {
log_prior sum(dgamma(lambda, shape = 3, scale = 3 / 0.1, log = TRUE))
}
<- function(lambda) {
proposal * exp(rnorm(4, sd = 0.01))
lambda
}
<- function(lambda) {
log_posterior <- build_rate_matrix(lambda)
Q log_likelihood(Q, tau, observed_transitions) + log_prior(lambda)
}
We are ready to sample from the posterior distribution of the rate parameters using the Metropolis algorithm. We run four chains in parallel using furrr as explained in my previous post on efficient MCMC in R.
<- jonnylaw::metropolis(
iters
lambda,
log_posterior,
proposal,m = 1e5,
chains = 4,
parallel = TRUE
)
The posterior diagnostics are plotted below.
The plot below shows posterior distribution of the mean time in each state,
Most applications of multi-state survival models have patient attributes associated with each patient. This can be used to inform the next transition and the time to the next transition. I will consider this in a future post.
In addition, fully observed processes are rare. If the exact time of a state transition is not known or multiple state transitions can happen between observations then the multi-state survival model is partially observed. A continuous time Hidden Markov Model can be used in this case.