Poisson Distribution
Consider an arbitrary interval where the expected number of events in the interval is denoted as
This is the probability mass function of the Poisson distribution and corresponds to the probability of observing exactly
Exponential Survival Model
The inter-arrival times of a Poisson process are distributed according to an exponential distribution with density
$$
Observations consist of the triple
To determine the rate parameter of the exponential survival,
This means for a censored individual, when
For the following derivation assume that the model has no covariates and hence
This can be recognised as a Gamma distribution with shape
To test this method, we will use the ovarian cancer data from the survival
package. The ovarian cancer data contains the followup time, futime
and censoring, fustat
.
# A tibble: 26 × 2
futime fustat
<dbl> <dbl>
1 59 1
2 115 1
3 156 1
4 421 0
5 431 1
6 448 0
7 464 1
8 475 1
9 477 0
10 563 1
# … with 16 more rows
The posterior distribution of the hazard for the ovarian cancer data is plotted below, where the parameters of the Gamma prior distribution are
The posterior predictive distribution can be calculated as:
And the predictive survival:
Fitting the exponential model using BRMS
In general, parametric survival models can be fitted using sampling based methods such as MCMC. The package brms can be used as a straightforward interface to stan.
To express this model using brms
, we model the followup time directly while considering censoring. The event is considered right-censored if a death has not occurred, where fustat = 1
and the formula to specify the followup time is right censored is futime | cens(fustat)
. This is followed by distributed by ~
and the covariates of interest. To compare it to the analytic result above we consider the model with only an intercept and no covariates. Note that this model is not exactly the same since the prior is on specified on the log of the rate,
<-
fit_exponential brm(
| cens(fustat) ~ 1,
futime data = ovarian,
family = brmsfamily("exponential"),
cores = 4,
prior = prior("normal(0.0, 3.0)", "Intercept")
)
summary(fit_exponential)
Family: exponential
Links: mu = log
Formula: futime | cens(fustat) ~ 1
Data: ovarian (Number of observations: 26)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 7.00 0.26 6.51 7.55 1.00 1772 1778
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The figure on the left below compares the analytic survival function for the exponential model on the left, to the sampling based survival function using 100 draws from the posterior distribution. In addition, the thick black line on the right-hand plot represents the median survival calculated from 4,000 draws from the posterior distribution