This post considers how to implement a simple Metropolis scheme to determine the parameter posterior distribution of a bivariate Normal distribution. The implementation is generic, using higher-order-functions and hence can be re-used with new algorithms by specifying the un-normalised log-posterior density and a proposal distribution for the parameters. The built-in `parallel`

package is used fit multiple chains in parallel, finally the Metropolis algorithm is reimplemented in C++ using Rcpp which seemlessly integrates with R.

```
bivariate_normal <- function(theta, n) {
mu1 <- theta[1]
sigma1 <- theta[2]
mu2 <- theta[3]
sigma2 <- theta[4]
x <- rnorm(n / 2, mean = mu1, sd = sigma1)
y <- rnorm(n / 2, mean = mu2, sd = sigma2)
tibble(x, y)
}
theta <- c(5.0, 0.5, 2.0, 1.5)
sims <- bivariate_normal(theta, 1000)
xs <- as.matrix(sims)
ggplot(sims, aes(x, y)) +
geom_point()
```

The random variables in the model can be written as:

\[p(y,\mu, \Sigma) = p(\mu)p(\Sigma)\prod_{i=1}^N\mathcal{N}(y_i;\mu, \Sigma)\]

The covariance matrix is diagonal, hence the log-likelihood can be written as the sum of two univariate normal distributions:

\[\log p(y|\mu, \Sigma) = \sum_{j=1}^2\left(-\frac{N}{2}\log(2\pi\sigma_j^2) - \frac{1}{2\sigma_j^2}\sum_{i=1}^N(y_{ij}-\mu_j)^2\right)\]

```
log_likelihood <- function(xs, theta) {
apply(xs, 1, function(x) dnorm(x[1], mean = theta[1], sd = theta[2], log = T) +
dnorm(x[2], mean = theta[3], sd = theta[4], log = T)) %>% sum()
}
```

The prior distributions are chosen to be:

\[\begin{align} p(\mu_j) &= \mathcal{N}(0, 3), \\ p(\Sigma_{jj}) &= \textrm{Gamma}(3, 3), \quad j = 1, 2. \end{align}\]

```
log_prior <- function(theta) {
dnorm(theta[1], log = T) +
dnorm(theta[3], log = T) +
dgamma(theta[2], shape = 3.0, rate = 3.0, log = T) +
dgamma(theta[4], shape = 3.0, rate = 3.0, log = T)
}
log_posterior <- function(xs)
function(theta)
log_likelihood(xs, theta) + log_prior(theta)
```

A Metropolis-Hastings algorithm can be used to determine the posterior distribution of the parameters, \(theta = \{\mu, \Sigma\}\). The Metropolis algorithm constructs a Markov chain whose stationary distribution corresponds to the target posterior distribution, \(p(\theta|y)\). In order to construct the Markov chain with this property, a carefully chosen tansition function \(P(\theta^\prime|\theta)\) is used. In order to prove the Metropolis algorithm has the target distribution as its stationary distribution, the existence and uniqueness of the stationary distribution must be determined. A transition function which satisfies detailed balance is chosen which is a sufficient condition for the existence of the stationary distribution:

\[P(\theta^\prime|\theta)p(\theta|y) = P(\theta|\theta^\prime)p(\theta^\prime|y)\]

The Markov chain proceeds by proposing a new value of the parameters, \(\theta^\prime\) from a distribution which can be easily simulated from (typically a Normal distribution centred at the previously accepted value of the parameter, \(\theta\)), \(q(\theta^\prime|\theta)\). The transition function is the product of the proposal distribution and the acceptance ratio. The acceptance ration which satisfies detailed balance is called the Metropolis choice:

\[A = \operatorname{min}\left(1, \frac{p(\theta^\prime|y)q(\theta|\theta^\prime)}{p(\theta|y)q(\theta^\prime|\theta)}\right).\]

First of a single step of the Metropolis algorithm is implementated. This is a higher order function, since two of the arguments are functions themselves. The function `log_posterior`

is a function from parameters to log-likelihood and the `proposal`

is a symmetric proposal distribution for the parameters, a function from parameters to parameters. The final argument, `theta`

represents the parameters.

```
metropolis_step <- function(theta, log_posterior, proposal) {
propTheta <- proposal(theta)
a <- log_posterior(propTheta) - log_posterior(theta)
u <- runif(1)
if (log(u) < a) {
propTheta
} else {
theta
}
}
```

Next the step function can be used in a for loop to generate m samples, each dependent on the previous step. An matrix containing \(m\) rows is initialised to contain each iteration of the Metropolis algorithm.

```
metropolis <- function(theta, log_posterior, proposal, m) {
out = matrix(NA_real_, nrow = m, ncol = length(theta))
out[1, ] = theta
for (i in 2:m) {
out[i, ] <- metropolis_step(out[i-1, ], log_posterior, proposal)
}
out
}
```

The strictly positive variance parameters are proposed on the log-scale:

```
proposal <- function(x) {
z = rnorm(4, sd = 0.05)
c(x[1] + z[1], x[2] * exp(z[2]),
x[3] + z[3], x[4] * exp(z[4]))
}
```

Finally, all the components are there to sample from the posterior distribution of the parameters. The mean of the sampled posterior distribution should coincide with the parameters used to simulate the data. In the figure below the actual values used to simulate the data are plotted with dashed lines.

```
out = metropolis(theta, log_posterior(xs), proposal, 10000)
```

Typically, multiple chains are run in parallel, a straightforward way to do this in R is to use a parallel map from the furrr package. First we create a new function which alters the `metropolis`

function to return a dataframe:

```
metropolis_df <- function(theta, log_posterior, proposal, m, parameter_names) {
function(x) {
mat <- metropolis(theta, log_posterior, proposal, m)
colnames(mat) <- parameter_names
as.data.frame(mat)
}
}
```

Then `future_map_dfr`

is used which performs the function `.f`

for each element of `.x`

. It then rowbinds into a dataframe. This is explicit in the function name, the suffix `_dfr`

meaning a dataframe is the return type and is created by rowbinding the results. The `id`

of each function run is provided by the `.id`

column and takes on the values of `.x`

.

```
plan(multiprocess)
mh_samples <- future_map_dfr(
.x = 1:2,
.f = metropolis_df(theta, log_posterior(xs), proposal, 10000, actual_values$parameter),
.id = "chain"
)
```

The figure below shows the trace plots and marginal densities from 10,000 draws of the parallel Metropolis hastings algorithm.