Learning about a correlation

We’re starting the chapter on Bayesian computation and I thought I’d use a new example to illustrate the basic methods.  I collect the high school GPA and the ACT math score for 50 BGSU freshmen and I’m interested in learning about the correlation.  If I standardize both measurements (by subtracting the mean and dividing by the standard deviation), then I can assume the standardized pairs $(x_1, y_1), ..., (x_{50}, y_{50})$ are a random sample from a bivariate normal density with zero means, unit standard deviations, and correlation $\rho$.  If I assume a uniform prior on $\rho$, then the posterior density is proportional to

$g(\rho) \propto \prod_{i=1}^{50} \frac{1}{\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2(1-\rho^2)}\left(x_i^2 - 2\rho x_i y_i + y_i^2\right)\right)$
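To fix ideas, here is a sketch of the data setup in R.  The actual GPA/ACT data aren’t reproduced in this post, so the gpa and act vectors below are simulated stand-ins; only the scale step reflects the standardization described above.

```r
# Hypothetical stand-in for the 50 GPA / ACT math score pairs (the actual
# BGSU data are not reproduced in the post), standardized column by column.
set.seed(1)
n <- 50
gpa <- rnorm(n, mean = 3.0, sd = 0.5)      # made-up GPA values
act <- 20 + 3 * gpa + rnorm(n, sd = 2)     # made-up, positively correlated ACT scores
dz  <- scale(cbind(gpa, act))              # subtract means, divide by sds
round(colMeans(dz), 10)                    # both columns: mean 0
apply(dz, 2, sd)                           # both columns: sd 1
```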

I begin by writing a short function to compute the log posterior of $\rho$.  Here data is a two-column matrix containing the standardized x and standardized y values.

cor.sampling = function(rho, data)
{
# log posterior of rho, given the standardized data
x = data[, 1]
y = data[, 2]
sum(-0.5 * log(1 - rho^2) - 0.5/(1 - rho^2) * (x^2 - 2 * rho * x * y + y^2))
}


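As a quick sanity check of the function (on a simulated stand-in for dz with true correlation 0.5, since the real data aren’t shown), the log posterior should peak near the sample correlation when evaluated at a few values of $\rho$:

```r
# Evaluate the log posterior at a few rho values; the data are a simulated
# stand-in for dz with true correlation 0.5.
cor.sampling = function(rho, data)   # repeated here so the sketch runs on its own
{
x = data[, 1]
y = data[, 2]
sum(-0.5 * log(1 - rho^2) - 0.5/(1 - rho^2) * (x^2 - 2 * rho * x * y + y^2))
}
set.seed(1)
z1 <- rnorm(50)
z2 <- 0.5 * z1 + sqrt(1 - 0.5^2) * rnorm(50)
dz <- scale(cbind(z1, z2))
vals <- sapply(c(-0.5, 0, 0.5), cor.sampling, dz)
vals   # the value at rho = 0.5 should be the largest of the three
```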
I display the posterior density below.  Note that I use the sapply function to evaluate the posterior for a vector of values.  Also, I rescale the posterior values so that the total area under the plotted density is equal to one.

r = seq(-0.99, 0.99, by = 0.01)
gr = sapply(r, cor.sampling, dz)
gr = exp(gr - max(gr)) / sum(exp(gr - max(gr))) / 0.01
plot(r, gr, type = "l", xlim = c(0, 1), col = "red", lwd = 2)
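A nice byproduct of the grid of posterior values is that it also supports simulation: since gr is, up to a constant, a discrete approximation to the posterior, sample() can draw from it directly and the draws can be summarized.  This sketch rebuilds the grid on a simulated stand-in for dz so it runs on its own:

```r
# Draw from the discrete grid approximation of the posterior with sample();
# the weights passed to prob need not be normalized. dz is a simulated stand-in.
cor.sampling = function(rho, data)
{
x = data[, 1]
y = data[, 2]
sum(-0.5 * log(1 - rho^2) - 0.5/(1 - rho^2) * (x^2 - 2 * rho * x * y + y^2))
}
set.seed(1)
z1 <- rnorm(50)
z2 <- 0.5 * z1 + sqrt(1 - 0.5^2) * rnorm(50)
dz <- scale(cbind(z1, z2))
r  <- seq(-0.99, 0.99, by = 0.01)
lg <- sapply(r, cor.sampling, dz)
draws <- sample(r, 10000, replace = TRUE, prob = exp(lg - max(lg)))
c(mean = mean(draws), sd = sd(draws))   # simulation-based posterior summaries
```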
We illustrate finding a normal approximation to this density by use of the laplace function in the LearnBayes package.  The arguments are the name of the log posterior function, the starting guess at the mode (we use 0), and the data matrix dz.  The output is the posterior mode and the corresponding estimate of the posterior variance.
fit=laplace(cor.sampling,0,dz)
fit
$mode
[1] 0.5015625

$var
           [,1]
[1,] 0.00912294
So the posterior for $\rho$ is approximately normal with mean 0.50 and standard deviation $\sqrt{0.0091} \approx 0.095$.  Using the curve function we add the normal approximation to the graph.  This approximation looks pretty good, but there is a little room for improvement.
curve(dnorm(x, fit$mode, sqrt(fit$var)), add=TRUE, col="blue", lwd=2)
legend("topleft", c("Exact", "Approximation"), col=c("red", "blue"), lty=c(1, 1), lwd=c(2, 2))
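Finally, for readers without LearnBayes at hand: the laplace computation amounts to finding the posterior mode numerically and inverting the curvature at the mode.  A minimal sketch with base R’s optim follows (again on a simulated stand-in for dz; the names neg.logpost, fit2, post.mode, and post.var are made up for this sketch):

```r
# Laplace-style normal approximation using base R only: minimize the negative
# log posterior, then invert its second derivative (curvature) at the mode.
cor.sampling = function(rho, data)
{
x = data[, 1]
y = data[, 2]
sum(-0.5 * log(1 - rho^2) - 0.5/(1 - rho^2) * (x^2 - 2 * rho * x * y + y^2))
}
set.seed(1)
z1 <- rnorm(50)
z2 <- 0.5 * z1 + sqrt(1 - 0.5^2) * rnorm(50)
dz <- scale(cbind(z1, z2))
neg.logpost <- function(rho, data) -cor.sampling(rho, data)
fit2 <- optim(0, neg.logpost, data = dz, method = "L-BFGS-B",
              lower = -0.99, upper = 0.99, hessian = TRUE)
post.mode <- fit2$par
post.var  <- 1 / fit2$hessian[1, 1]   # inverse curvature of -log posterior at the mode
c(mode = post.mode, var = post.var)
```

Minimizing the negative log posterior (rather than maximizing with control = list(fnscale = -1)) keeps the sign of the returned Hessian unambiguous: its inverse is directly the approximate posterior variance.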