## Learning about a correlation — part III

In Chapter 5, we discuss the SIR (sampling importance resampling) method of simulating from a posterior density $g(\theta)$. As in rejection sampling, we find a proposal density $p(\theta)$ that is easy to simulate from and that covers the posterior density of interest. Here's the algorithm:

1. We simulate a sample $\theta_1, \ldots, \theta_m$ from the proposal density $p$.

2. We compute the weights $w(\theta_j) = g(\theta_j)/p(\theta_j)$.

3. We resample from the $\{\theta_j\}$ with replacement, with weights proportional to the $w(\theta_j)$.

The resampled values are approximately distributed according to the density of interest $g$.
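The three steps above can be sketched in a few lines of R. The names here (sir.sketch and its arguments) are illustrative, not part of LearnBayes, and a standard normal target with a t(4) proposal stands in for the posterior:

```r
# Toy SIR sketch: target g = standard normal, proposal p = t with 4 df.
# All function names here are illustrative, not from LearnBayes.
sir.sketch <- function(logg, logp, rprop, m) {
  theta <- rprop(m)                     # 1. sample from the proposal
  logw <- logg(theta) - logp(theta)     # 2. log weights: log g - log p
  w <- exp(logw - max(logw))            # subtract the max for stability
  sample(theta, size = m, replace = TRUE, prob = w)   # 3. resample
}

set.seed(1)
draws <- sir.sketch(function(t) dnorm(t, log = TRUE),
                    function(t) dt(t, df = 4, log = TRUE),
                    function(m) rt(m, df = 4),
                    10000)
# The resampled draws should resemble the standard normal target.
```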

The function sir in the LearnBayes package implements this algorithm when the proposal density is a t density with arbitrary mean, variance and degrees of freedom.

For this example, we showed that a Normal(0.50, 0.009) density was a reasonable approximation to the posterior. So we use a t proposal density with location 0.50, variance 0.009 and 4 degrees of freedom. We decide to simulate 10,000 values.

The sir function's arguments are similar to those for rejectsampling: a function that defines the log posterior, the parameters of the t proposal, the number of simulated draws, and the data used in the log posterior function.

R=sir(cor.sampling,list(m=.5,var=.009,df=4),10000,dz)

plot(density(R),col="blue",lwd=3)

## Learning about a correlation – part II

Earlier we considered the problem of learning about a correlation and illustrated a normal approximation. Here we illustrate rejection sampling to obtain a simulated sample from the exact posterior.

The main task is to find a suitable proposal density $p$. We'll let $p$ be a t density with mean $\mu$, variance $\sigma^2$, and a small number of degrees of freedom $\nu$.

How do we choose the parameters? First, since the posterior is approximately N(0.50, 0.009), we'll let $\mu = 0.50$ and $\sigma^2 = 0.009$. Also, we'll choose $\nu = 4$ degrees of freedom. (It is a good strategy to have a proposal density with "flat" tails, which we achieve by choosing a small number of degrees of freedom.)

Next, we have to find the constant $c$ so that $g(\rho) \le c\,p(\rho)$ for all $\rho$. On the log scale, we want to find $\log c$ such that

$$\log g(\rho) - \log p(\rho) \le \log c \quad \text{for all } \rho.$$

I write a small function that computes the difference $\log g(\rho) - \log p(\rho)$:

dfunction=function(rho,data)

cor.sampling(rho,data)-dmt(rho,.5,.009,4,log=TRUE)

I plot this function over the range of values of $\rho$ from $-1$ to $1$; the maximum value of the plotted difference gives

log c = -42.9927.
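Rather than reading the maximum off a plot, one can also locate it with optimize. Here is the idea with toy stand-ins, since cor.sampling and dz need the actual data: dnorm plays the role of the exact log posterior and dcauchy plays the role of the t proposal.

```r
# Toy version of finding log c = max over rho of [log g(rho) - log p(rho)].
# dnorm stands in for the exact log posterior, dcauchy for the proposal.
logdiff <- function(rho)
  dnorm(rho, 0.5, sqrt(0.009), log = TRUE) -
  dcauchy(rho, 0.5, 0.3, log = TRUE)

out <- optimize(logdiff, interval = c(-1, 1), maximum = TRUE)
log.c <- out$objective    # the covering constant on the log scale
```

By construction, $c\,p(\rho)$ then covers $g(\rho)$ over the whole interval.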

To see whether I have found a suitable covering function, I plot the exact posterior $g(\rho)$ and the covering density $c\,p(\rho)$ together. It looks pretty good.

Now we can implement rejection sampling. We can use a LearnBayes function rejectsampling.

R=rejectsampling(cor.sampling,list(m=.5,var=.009,df=4),dmax=-42.9927,10000,dz)

The inputs are (1) the exact log posterior function, (2) the parameters of the t proposal density (mean, variance, df), (3) the value of log c, (4) the number of values we propose, and (5) the data used in the log posterior.

Here is a density estimate of the simulated draws we get — this is very close to the exact posterior density.

plot(density(R),col="orange",lwd=3)
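For readers curious what rejectsampling does internally, the accept/reject step can be sketched as follows. The names and the toy target (a standard normal with a t(4) proposal) are illustrative only:

```r
# Accept a proposed theta with probability g(theta) / (c * p(theta)).
reject.sketch <- function(logg, logp, rprop, logc, m) {
  theta <- rprop(m)
  prob <- exp(logg(theta) - logp(theta) - logc)  # acceptance probabilities
  theta[runif(m) < prob]                         # keep the accepted draws
}

logg <- function(t) dnorm(t, log = TRUE)
logp <- function(t) dt(t, df = 4, log = TRUE)
logc <- optimize(function(t) logg(t) - logp(t),
                 c(0, 5), maximum = TRUE)$objective

set.seed(2)
draws <- reject.sketch(logg, logp, function(m) rt(m, df = 4), logc, 20000)
# The accepted draws are a sample from the standard normal target.
```

Unlike SIR, the accepted draws here are an exact (not approximate) sample from the target, at the cost of discarding the rejected proposals.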

## Normal approximation to posteriors

Today we talked about normal approximations to posterior distributions. As an example, I observe the daily hits on my book's website during each day of a recent month. I collect the following counts for the weekdays (Monday through Friday):

20,30,22,20,20,17,21,26,22,30,36,15,30,27,22,23,18,24,28,23,12

and I collect the following counts for the weekends (Saturday and Sunday)

7,12,11,12,12,17,18,20,17

We assume that the counts $y_1, \ldots, y_{30}$ are independent, where $y_i$ is Poisson($\lambda_i$) and the means satisfy the log-linear model

$$\log \lambda_i = \beta_0 + \beta_1 x_i,$$

where $x_i$ is 1 for a weekday count and 0 for a weekend count. I place a uniform prior on $(\beta_0, \beta_1)$.

I program the log posterior in the function loglinearpost; the data matrix has two columns, where the first column contains the values of the day-type indicator $x_i$ and the second column contains the counts.

loglinearpost=function(beta, data)
{
x = data[, 1]
y = data[, 2]
logf = function(x, y, beta) {
lp = beta[1] + beta[2] * x
dpois(y, exp(lp), log = TRUE)}
sum(logf(x, y, beta))
}

I enter the data (the 21 weekday counts first, so the indicator x is 1 for those and 0 for the 9 weekend counts) and fit the model with the laplace function in LearnBayes, which finds the posterior mode and the associated variance-covariance matrix:

y=c(20,30,22,20,20,17,21,26,22,30,36,15,30,27,22,23,18,
24,28,23,12,7,12,11,12,12,17,18,20,17)
x=c(rep(1,21),rep(0,9))
data=cbind(x,y)
fit=laplace(loglinearpost,c(0,0),data)
fit

$mode
[1] 2.6390946 0.5025815

$var
[,1] [,2]
[1,] 0.00793621 -0.007936210

To compare the exact posterior and the normal approximation, I draw contours of the exact posterior with mycontour and then add contours of the approximating normal density on top:

mycontour(loglinearpost,c(2.2,3,.1,.9),data,
xlab="BETA0",ylab="BETA1")

mydnorm=function(y,d)
dmnorm(y,d$mu,d$sigma,log=TRUE)

mycontour(mydnorm,c(2.2,3,.1,.9),
list(mu=fit$mode,sigma=fit$var),col="blue",add=TRUE)
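Once laplace returns a mode and a variance-covariance matrix, it is easy to simulate draws from the approximating normal using base R alone. In this sketch the matrix V is illustrative (rounded values assumed for the example, not the exact fitted matrix):

```r
m <- c(2.639, 0.503)                     # posterior mode (from the fit above)
V <- matrix(c(0.0079, -0.0079,           # illustrative covariance matrix,
              -0.0079, 0.0100), 2, 2)    # assumed for this sketch only

set.seed(4)
z <- matrix(rnorm(2 * 5000), ncol = 2)   # standard normal draws
beta <- sweep(z %*% chol(V), 2, m, "+")  # rows ~ N(m, V), since V = R'R

# e.g., an approximate posterior sample of the weekday/weekend rate ratio:
ratio <- exp(beta[, 2])
```

Summaries of ratio (its mean, quantiles, a density plot) then describe any function of the parameters without further analytic work.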

## Learning about a correlation

We're starting the chapter on Bayesian computation and I thought I'd use a new example to illustrate the basic methods. I collect the high school GPA and the ACT math score for 50 BGSU freshmen and I'm interested in learning about the correlation. If I standardize both measurements (by subtracting the mean and dividing by the standard deviation), then I can assume the standardized pairs $(x_i, y_i)$ are a random sample from a bivariate normal density with 0 means, unit standard deviations, and correlation $\rho$. If I assume a uniform prior on $\rho$, then the posterior density is given by

$$g(\rho \mid \text{data}) \propto \prod_{i=1}^{50} \frac{1}{\sqrt{1-\rho^2}} \exp\left\{-\frac{x_i^2 - 2\rho x_i y_i + y_i^2}{2(1-\rho^2)}\right\}, \qquad -1 < \rho < 1.$$

I begin by writing a short function to compute the log posterior of $\rho$. Here data is a two-column matrix holding the standardized x and standardized y values.

cor.sampling=function(rho,data)
{
x=data[,1]
y=data[,2]
sum(-0.5*log(1-rho^2)-0.5/(1-rho^2)*(x^2-2*rho*x*y+y^2))
}
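As a quick check of the function, one can simulate 50 pairs with a known correlation, standardize them, and verify that the log posterior peaks near the sample correlation. The simulated pairs below are a stand-in for the GPA/ACT data (cor.sampling is repeated so the snippet runs on its own):

```r
# cor.sampling repeated here so the snippet is self-contained.
cor.sampling=function(rho,data)
{
x=data[,1]
y=data[,2]
sum(-0.5*log(1-rho^2)-0.5/(1-rho^2)*(x^2-2*rho*x*y+y^2))
}

set.seed(3)
n <- 50; rho.true <- 0.5
x <- rnorm(n)
y <- rho.true * x + sqrt(1 - rho.true^2) * rnorm(n)   # correlated pairs
dz.sim <- cbind(as.numeric(scale(x)), as.numeric(scale(y)))

# With a uniform prior, the posterior mode is the maximum of cor.sampling;
# it should land close to the sample correlation.
rho.hat <- optimize(function(r) cor.sampling(r, dz.sim),
                    c(-0.99, 0.99), maximum = TRUE)$maximum
```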

I compute the posterior on a fine grid of values of $\rho$ and convert the values to the density scale (the grid spacing is 0.01):

r=seq(-1,1,by=0.01)
gr=sapply(r,cor.sampling,dz)
gr=exp(gr-max(gr))/sum(exp(gr-max(gr)))/0.01
plot(r,gr,type="l",xlim=c(0,1),col="red",lwd=2)

I find the normal approximation with the laplace function; the posterior mode and variance are

fit=laplace(cor.sampling,0.5,dz)
fit$mode
[1] 0.5015625
fit$var
[1,] 0.00912294

so the posterior is approximately N(0.50, 0.009). I add the approximating normal density to the plot and label the two curves:

curve(dnorm(x,fit$mode,sqrt(fit$var)),add=TRUE,col="blue",lwd=2)
legend("topleft",c("Exact","Approximation"),col=c("red","blue"),lty=c(1,1),lwd=c(2,2))