## Normal approximation to posterior

To illustrate a normal approximation to a posterior, let’s return to the fire call example described in the September 6 post. There we had Poisson sampling with mean λ and a normal prior on λ.

1. First we write a short function lpost that computes the logarithm of the posterior. I show the expressions for the log likelihood and the log prior. On the log scale, the log posterior is the log likelihood PLUS the log prior.

lpost = function(lambda){
  loglike = -6*lambda + 8*log(lambda)
  logprior = dnorm(lambda, 3, 1, log=TRUE)
  loglike + logprior
}

2. I plot the normalized version of the posterior below. I first write a short function post that computes the posterior, use the integrate function to numerically integrate the density from 0 to 10, and then use the curve function to display the normalized posterior.

post = function(lambda) exp(lpost(lambda))
C = integrate(post, 0, 10)
curve(exp(lpost(x))/C$value, 0, 5)

3. There is a useful function laplace in the LearnBayes package that conveniently finds the matching normal approximation to the posterior. The function inputs are (1) the function defining the log posterior, and (2) a guess at the posterior mode. Typically the log posterior might depend on data and prior parameters and that would be the last input to laplace (here we are not using that extra input).

library(LearnBayes)
fit=laplace(lpost, 1)
fit

$mode
[1] 1.7

$var
          [,1]
[1,] 0.2653809

The important output here is (1) the mode of the posterior and (2) the corresponding approximation to the posterior variance. By looking at the output, we see that the posterior of λ is approximately N(1.7, 0.265).
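As a check on what laplace is doing, the same approximation can be computed with base R alone: find the posterior mode with optimize and approximate the posterior variance by minus the reciprocal of the second derivative of the log posterior at the mode. This is a sketch of the idea rather than the LearnBayes implementation:

```r
# Normal approximation by hand: mode via optimize(), variance via a
# numerical second derivative of the log posterior at the mode.
lpost <- function(lambda) {
  loglike <- -6 * lambda + 8 * log(lambda)
  logprior <- dnorm(lambda, 3, 1, log = TRUE)
  loglike + logprior
}
mode_hat <- optimize(lpost, c(0.01, 10), maximum = TRUE)$maximum
h <- 1e-4  # step size for the central second difference
d2 <- (lpost(mode_hat + h) - 2 * lpost(mode_hat) + lpost(mode_hat - h)) / h^2
var_hat <- -1 / d2
round(c(mode = mode_hat, var = var_hat), 4)  # close to laplace's 1.7 and 0.2654
```

The optimize/second-difference route works for any smooth one-parameter log posterior; laplace does the multiparameter analogue with an optimizer and a numerical Hessian.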

4. To check the accuracy of this approximation, we use the curve function to add a matching normal density. (I have also added a legend by use of the legend function.) Note that the normal approximation is pretty accurate in this particular case.

curve(dnorm(x, fit$mode, sqrt(fit$var)), add=TRUE, col="red")
legend("topright", legend=c("Exact", "Normal Approximation"), lty=1, col=c("black", "red"))

## Learning about a correlation — part IV

Currently we are introducing MCMC fitting of posterior distributions. Here we illustrate the use of one popular algorithm, the Metropolis random walk chain, to simulate from an arbitrary posterior distribution.

Let’s return to the problem of learning about a correlation from standardized data. We have programmed the logarithm of the posterior density in the function cor.sampling:

cor.sampling=function(rho,data)
{
  x=data[,1]
  y=data[,2]
  ifelse(abs(rho)<1,
         sum(-0.5*log(1-rho^2)-0.5/(1-rho^2)*(x^2-2*rho*x*y+y^2)),
         -9.99e200)
}

Here the parameter rho falls in the interval (-1, 1), but I want to define the log posterior for all values of rho. Note that in the ifelse command, I check whether abs(rho) < 1 — if it isn’t, I define the log posterior to be a very large negative value, so that the posterior is essentially zero outside of that interval.

How does one implement a random walk Metropolis chain for this example?

1. Recall that in this algorithm, you have a current simulated draw ρ_cur and one proposes a new simulated draw

ρ_prop = ρ_cur + scale × Z,

where scale is a scale parameter and Z is a simulated draw from a normal distribution with mean 0 and variance V. One has to choose a value of the scale parameter scale and of the variance V.

2. Then when you implement the algorithm, you choose a starting parameter value (where the simulation begins) and the total number of simulation draws you collect.
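To make the algorithm concrete, here is a minimal random walk Metropolis function written directly from the description above. This is my own sketch (the name rw_metropolis and its arguments are invented; it is not the rwmetrop source), demonstrated on a standard normal log density:

```r
# Random walk Metropolis: propose current + scale * Z with Z ~ N(0, V),
# and accept with probability min(1, exp(logpost(prop) - logpost(cur))).
rw_metropolis <- function(logpost, start, m, scale, V, ...) {
  draws <- numeric(m)
  cur <- start
  lcur <- logpost(cur, ...)
  n_accept <- 0
  for (i in 1:m) {
    prop <- cur + scale * rnorm(1, 0, sqrt(V))
    lprop <- logpost(prop, ...)
    if (log(runif(1)) < lprop - lcur) {   # Metropolis acceptance step
      cur <- prop
      lcur <- lprop
      n_accept <- n_accept + 1
    }
    draws[i] <- cur
  }
  list(par = draws, accept = n_accept / m)
}

# Demonstration on a N(0, 1) target:
set.seed(1)
out <- rw_metropolis(function(x) dnorm(x, log = TRUE), start = 0,
                     m = 10000, scale = 2, V = 1)
out$accept     # acceptance rate
mean(out$par)  # near the target mean of 0
```

With scale = 2 on a unit-variance target the acceptance rate lands in a reasonable range; shrinking or inflating scale moves that rate up or down, which is the tuning issue discussed below.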

The function rwmetrop in the LearnBayes package has inputs: (1) the name of the function that defines the log posterior, (2) a list containing values of var (the variance V) and scale, (3) the starting value of the parameter in the simulation, (4) the number of simulation draws, and (5) any data used in the log posterior function.

Here I’m using the function cor.sampling, choosing var = 0.009 and scale = 2, starting the simulation at ρ = 0.5, collecting 10,000 simulation draws, with my data stored in dz.

R1=rwmetrop(cor.sampling,list(var=.009,scale=2),.5,10000,dz)

The output is R1 — R1$par contains the simulated draws and R1$accept gives the acceptance rate of the algorithm.

Here’s some advice about using this function.

1. Typically it is good to let the proposal density have a standard deviation equal to twice the estimated standard deviation of the posterior. So you let var be an estimated posterior variance and scale = 2.

2. How does one estimate the posterior variance? One can use the normal approximation (used in the function laplace) to estimate the posterior variance.

3. Acceptance rate? A random walk Metropolis algorithm works well when the acceptance rate is in the 25 – 40% range. You can monitor this value by displaying the R1$accept value.

4. A trace plot of the simulated parameter values is helpful for seeing how well the simulated sample is exploring the posterior density. One looks for a trend in this graph (is the MCMC algorithm slow in converging to the posterior) and for a snake-like appearance (this indicates that you have poor mixing in the chain). Trace plots and other diagnostics are available as part of the coda package.

Let’s look at two runs of this algorithm — one for a good scale value and one for a poor choice of scale. The acceptance rate for the first run is 50% and the acceptance rate for the second run is 94%.

library(coda)
R1=rwmetrop(cor.sampling,list(var=.009,scale=2),.5,10000,dz)
R1$accept
traceplot(mcmc(R1$par),main="Good choice of scale")
R2=rwmetrop(cor.sampling,list(var=.009,scale=0.2),.5,10000,dz)
R2$accept
traceplot(mcmc(R2$par),main="Poor choice of scale")

## Learning about a correlation — part III

In Chapter 5, we discuss the SIR method of simulating from a posterior density g(θ|y). Like rejection sampling, we find a proposal density p that is easy to simulate from and that covers the posterior density of interest. Here’s the algorithm:

1. We simulate a sample θ^(1), ..., θ^(m) from the proposal density p.

2. We compute the weights w(θ^(j)) = g(θ^(j)|y)/p(θ^(j)).

3. We resample from the θ^(j) with replacement, with probabilities proportional to the weights.

The resampled values are approximately distributed from the density of interest g(θ|y).
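The three steps translate almost line for line into R. The sketch below is my own illustration (the name sir_sketch is invented, and logpost is assumed vectorized), using a t proposal with location m0, squared scale v0 and df0 degrees of freedom; it is checked on a standard normal target:

```r
# SIR: sample from a t proposal, weight by posterior/proposal, resample.
sir_sketch <- function(logpost, m0, v0, df0, n, ...) {
  s0 <- sqrt(v0)
  theta <- m0 + s0 * rt(n, df0)                        # step 1: proposal draws
  logprop <- dt((theta - m0) / s0, df0, log = TRUE) - log(s0)
  logw <- logpost(theta, ...) - logprop                # step 2: log weights
  w <- exp(logw - max(logw))                           # stabilize before exp
  sample(theta, size = n, replace = TRUE, prob = w)    # step 3: resample
}

# Check on a N(0, 1) target with a t(4) proposal centered at 0:
set.seed(2)
draws <- sir_sketch(function(x) dnorm(x, log = TRUE), m0 = 0, v0 = 2,
                    df0 = 4, n = 10000)
c(mean(draws), sd(draws))  # near 0 and 1
```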

The function sir in the LearnBayes package implements this algorithm when the proposal density is a t density with arbitrary mean, variance and degrees of freedom.

For this example, we showed that a Normal(0.50, 0.009) density was a reasonable approximation to the posterior. So we use a t proposal density with location 0.50, variance 0.009 and 4 degrees of freedom. We decide to simulate 10,000 values.

The sir function’s arguments are similar to those for rejectsampling — the function that defines the log posterior, the parameters of the t proposal, the number of simulated draws, and the data used in the log posterior function.

R=sir(cor.sampling,list(m=.5,var=.009,df=4),10000,dz)

plot(density(R),col="blue",lwd=3)

## Learning about a correlation – part II

Earlier we considered the problem of learning about a correlation and illustrated a normal approximation. Here we illustrate rejection sampling to obtain a simulated sample from the exact posterior.

The main task is to find a suitable proposal density p. We’ll let p be a t density with mean m, variance v and a small number of degrees of freedom df.

How do we choose the parameters? First, since the posterior is approximately N(0.50, 0.009), we’ll let m = 0.50 and v = 0.009. Also, we’ll choose the degrees of freedom df = 4. (It is a good strategy to have a proposal density with “flat” tails, which we achieve by choosing a small number of degrees of freedom.)
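To see how much flatter the t tails are, one can compare tail probabilities of the t(4) proposal and the matching normal a few standard deviations from the center; the calculation below is just an illustration of this general point:

```r
# Tail comparison at a point 4 posterior sd above the center 0.5.
s <- sqrt(0.009)                      # posterior standard deviation
x <- 0.5 + 4 * s
t_tail <- pt((x - 0.5) / s, df = 4, lower.tail = FALSE)
n_tail <- pnorm(x, mean = 0.5, sd = s, lower.tail = FALSE)
t_tail / n_tail   # the t(4) tail is orders of magnitude heavier
```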

Next, we have to find the constant c so that g(ρ|y) ≤ c p(ρ) for all ρ. On the log scale, we want to find

log c = max over ρ of [ log g(ρ|y) − log p(ρ) ].

I write a small function dfunction that computes log g(ρ|y) − log p(ρ):

dfunction=function(rho,data)
  cor.sampling(rho,data)-dmt(rho,.5,.009,4,log=TRUE)

I plot this function over the range of ρ from -1 to 1 and find that the maximum value is

log c = -42.9927.
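Instead of reading the maximum off a plot, one can locate it numerically with optimize. The sketch below is self-contained: since the data set dz is not listed in the post, I simulate a stand-in standardized data set (so the resulting value differs from the -42.9927 above), and the helper log_tprop stands in for dmt from the mnormt package:

```r
cor.sampling <- function(rho, data) {
  x <- data[, 1]; y <- data[, 2]
  sum(-0.5 * log(1 - rho^2) - 0.5 / (1 - rho^2) * (x^2 - 2 * rho * x * y + y^2))
}
# log density of a t proposal with location m, squared scale v, df deg. of freedom
log_tprop <- function(rho, m, v, df)
  dt((rho - m) / sqrt(v), df, log = TRUE) - 0.5 * log(v)

set.seed(3)
u <- rnorm(50)
dz <- cbind(as.numeric(scale(u)), as.numeric(scale(u + rnorm(50))))  # stand-in data
dfun <- function(rho) cor.sampling(rho, dz) - log_tprop(rho, 0.5, 0.009, 4)
opt <- optimize(dfun, c(-0.99, 0.99), maximum = TRUE)
opt$objective  # this value plays the role of log c
```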

To see if I have found a suitable covering function, I plot the exact posterior g(ρ|y) and the covering density c p(ρ). It looks pretty good.

Now we can implement rejection sampling with the LearnBayes function rejectsampling.

R=rejectsampling(cor.sampling,list(m=.5,var=.009,df=4),dmax=-42.9927,10000,dz)

The inputs are (1) the exact log posterior function, (2) the parameters of the t proposal density (mean, variance, df), (3) the value of log c, (4) the number of values we propose, and (5) the data used in the log posterior.

Here is a density estimate of the simulated draws we get — this is very close to the exact posterior density.

plot(density(R),col="orange",lwd=3)

## Normal approximation to posteriors

Today we talked about normal approximations to posterior distributions. As an example, I observe the daily hits on my book’s website during each day of a recent month. I collect the following counts for the weekdays (Monday through Friday)

20,30,22,20,20,17,21,26,22,30,36,15,30,27,22,23,18,24,28,23,12

and I collect the following counts for the weekends (Saturday and Sunday)

7,12,11,12,12,17,18,20,17

We assume that the counts y_i are independent, with y_i distributed Poisson(λ_i), where the means satisfy the log linear model

log λ_i = β0 + β1 x_i,

where the indicator x_i is 1 (0) if the day is a weekday (weekend day), so that exp(β0) is the mean count for a weekend day. I place a uniform prior on (β0, β1).

I program the log posterior in the function loglinearpost — the data matrix has two columns, where the first column contains the values of the day indicator x and the second column contains the counts.

loglinearpost=function(beta, data)
{
  x = data[, 1]
  y = data[, 2]
  logf = function(x, y, beta) {
    lp = beta[1] + beta[2] * x
    dpois(y, exp(lp), log=TRUE)
  }
  sum(logf(x, y, beta))
}

We enter the data and apply laplace, using (0, 0) as the guess at the posterior mode. (The indicator is coded 1 for a weekday and 0 for a weekend day; with this coding, exp(β0) is the mean weekend count, which matches the fitted mode below.)

x=c(rep(1,21),rep(0,9))
y=c(20,30,22,20,20,17,21,26,22,30,36,15,30,27,22,23,18,24,28,23,12,
7,12,11,12,12,17,18,20,17)
data=cbind(x,y)
fit=laplace(loglinearpost,c(0,0),data)
fit

$mode
[1] 2.6390946 0.5025815

$var
            [,1]         [,2]
[1,]  0.00793621 -0.007936210

To judge the accuracy of the approximation, I display a contour plot of the exact posterior of (β0, β1) and overlay, in blue, the contours of the matching bivariate normal density. (dmnorm is the multivariate normal density from the mnormt package.)

mycontour(loglinearpost,c(2.2,3,.1,.9),data,
  xlab="BETA0",ylab="BETA1")
mydnorm=function(y,d)
  dmnorm(y,d$mu,d$sigma,log=TRUE)
mycontour(mydnorm,c(2.2,3,.1,.9),
  list(mu=fit$mode,sigma=fit$var),col="blue",add=TRUE)

## Learning about a correlation

We’re starting the chapter on Bayesian computation and I thought I’d use a new example to illustrate the basic methods. I collect the high school GPA and the ACT math score for 50 BGSU freshmen and I’m interested in learning about the correlation. If I standardize both measurements (by subtracting the mean and dividing by the standard deviation), then I can assume the standardized pairs (x_i, y_i) are a random sample from a bivariate normal density with 0 means, unit standard deviations, and correlation ρ. If I assume a uniform prior on ρ, then the posterior density is given by

g(ρ | data) ∝ Π (1 - ρ^2)^(-1/2) exp( -(x_i^2 - 2ρ x_i y_i + y_i^2) / (2(1 - ρ^2)) ),

where the product is over the 50 pairs.

I begin by writing a short function to compute the log posterior of ρ. Here data is a matrix of two columns containing the standardized x and standardized y.

cor.sampling=function(rho,data)
{
  x=data[,1]
  y=data[,2]
  sum(-0.5*log(1-rho^2)-0.5/(1-rho^2)*(x^2-2*rho*x*y+y^2))
}

I compute the log posterior of ρ over a fine grid, convert the values to a normalized density (the grid spacing is 0.01), and graph it. Then I apply laplace to get the normal approximation and add the matching normal curve to the plot.

r=seq(-0.99,0.99,by=0.01)
gr=sapply(r,cor.sampling,dz)
gr=exp(gr-max(gr))/sum(exp(gr-max(gr)))/0.01
plot(r,gr,type="l",xlim=c(0,1),col="red",lwd=2)
fit=laplace(cor.sampling,0.5,dz)
fit

$mode
[1] 0.5015625

$var
           [,1]
[1,] 0.00912294

curve(dnorm(x,fit$mode,sqrt(fit$var)),add=TRUE,col="blue",lwd=2)
legend("topleft",c("Exact","Approximation"),col=c("red","blue"),lty=c(1,1),lwd=c(2,2))
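The normalization above — exponentiating after subtracting the maximum, then dividing by the sum times the grid spacing — is a generally useful trick for turning log posterior values on a grid into a plottable density. Here it is checked on a known density:

```r
# Grid normalization: subtract the max before exponentiating (to avoid
# underflow), then divide by (sum * spacing) so the values integrate to 1.
r <- seq(-4, 4, by = 0.01)
lg <- dnorm(r, log = TRUE)   # pretend these are computed log posterior values
g <- exp(lg - max(lg))
g <- g / (sum(g) * 0.01)
sum(g * 0.01)                # integrates to 1
max(abs(g - dnorm(r)))       # recovers the true density closely
```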

## Estimating a log odds ratio

We were looking at the relationship between a student’s ACT score (lo or hi) and his GPA (lo or hi). We constructed a prior on (θ1, θ2), where θ1 is the log odds ratio and θ2 is the sum of the log odds. We assumed that θ1 was N(0.5, 0.3), θ2 was N(0, 2), and the two parameters were independent.

We now observe some data for a sample of students:

|           | GPA <= 3.0 | GPA > 3.0 |
|-----------|------------|-----------|
| ACT <= 21 | 29         | 24        |
| ACT > 21  | 15         | 34        |

The likelihood function is given by

L(θ1, θ2) ∝ p1^29 (1 - p1)^24 p2^15 (1 - p2)^34,

where p1 is the probability that a low-ACT student has a low GPA, p2 is the probability that a high-ACT student has a low GPA, and the logits are logit(p1) = (θ1 + θ2)/2 and logit(p2) = (θ2 - θ1)/2.

Here is a computing strategy to summarize the posterior density.

1. We write an R function to compute the log posterior density of (θ1, θ2).

2. We use the function mycontour to compute and display the log posterior density on a grid.

3. Using the function simcontour, we simulate values of (θ1, θ2) from the posterior density.

4. We learn about the log odds ratio θ1 by summarizing the sample of simulated θ1 values.

Here is the R work. I write a function, lctablepost, that has two inputs: the vector theta (containing θ1 and θ2), and a list s that has two components, data (a vector of table counts) and prior (a vector of the means and standard deviations of the two normal priors). Note that I use a special function ilogit that computes the inverse logit, transforming a value on the logit scale back into a probability.

lctablepost=function(theta, s)
{
  ilogit=function(x) exp(x)/(1+exp(x))
  logitp1 = (theta[1] + theta[2])/2
  logitp2 = (theta[2] - theta[1])/2
  s$data[1]*log(ilogit(logitp1))+s$data[2]*log(1-ilogit(logitp1))+
    s$data[3]*log(ilogit(logitp2))+s$data[4]*log(1-ilogit(logitp2))+
    dnorm(theta[1],s$prior[1],s$prior[2],log=TRUE) +
    dnorm(theta[2],s$prior[3],s$prior[4],log=TRUE)
}

We define the data and the prior inputs.

d=c(29,24,15,34)

p=c(0.5,0.3,0,2)

stuff=list(data=d,prior=p)

Now we use the LearnBayes function mycontour — the bounding rectangle of (-1, 2, -9, 3) seems to work well.

mycontour(lctablepost,c(-1,2,-9,3),stuff,
  xlab="THETA1",ylab="THETA2",col="red")

Last, I simulate 1000 values from this grid by the LearnBayes function simcontour. The output is a list s with components x and y — s$x contains the simulated draws of θ1 and s$y contains the simulated draws of θ2.

s=simcontour(lctablepost,c(-1,2,-9,3),stuff,1000)

I show the simulated sample on my contour plot by use of the points function.

points(s)
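Step 4 of the strategy is to summarize the simulated θ1 values. Since the sketch below must run on its own, it uses a placeholder normal sample in place of s$x; with the real simcontour output one would apply the same summaries to s$x:

```r
# Summaries of the simulated log odds ratio draws (placeholder sample).
set.seed(4)
sx <- rnorm(1000, 0.7, 0.3)           # stand-in for s$x from simcontour
quantile(sx, c(0.025, 0.5, 0.975))    # posterior median and 95% interval
mean(sx > 0)                          # estimated P(log odds ratio > 0)
```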