
Learning about a correlation

We’re starting the chapter on Bayesian computation and I thought I’d use a new example to illustrate the basic methods.  I collect the high school GPA and the ACT math score for 50 BGSU freshmen and I’m interested in learning about the correlation.  If I standardize both measurements (by subtracting the mean and dividing by the standard deviation), then I can assume the standardized pairs (x_1, y_1), ..., (x_{50}, y_{50}) are a random sample from a bivariate normal density with 0 means, unit standard deviations, and correlation \rho.  If I assume a uniform prior on \rho, then the posterior density is given by

g(\rho) \propto \prod_{i=1}^{50} \frac{1}{\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2(1-\rho^2)}\left(x_i^2 - 2\rho x_i y_i + y_i^2\right)\right)
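In case it helps, here is one way such a standardized data matrix (called dz, matching its use below) might be constructed; the data frame grades and its column names gpa and act.math are hypothetical, not taken from the original post.

# hypothetical raw data: a data frame 'grades' with columns gpa and act.math
# scale() subtracts the mean and divides by the standard deviation
dz=cbind(scale(grades$gpa),scale(grades$act.math))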

I begin by writing a short function to compute the log posterior of \rho.  Here data is a two-column matrix containing the standardized x values in the first column and the standardized y values in the second.

cor.sampling=function(rho,data)
{
  # log posterior of rho for standardized bivariate normal data
  x=data[,1]
  y=data[,2]
  sum(-0.5*log(1-rho^2)-0.5/(1-rho^2)*(x^2-2*rho*x*y+y^2))
}
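As a quick sanity check (not part of the original post), the log posterior can be evaluated at a single trial value of \rho once the standardized data matrix dz is available:

# log posterior at a single value of rho (assumes dz has been created)
cor.sampling(0.5,dz)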

I display the posterior density below.  Note that I use the sapply function to evaluate the log posterior over a vector of values.  Also, I rescale the posterior values so that the total area under the plotted density is equal to one.
r=seq(-.99,.99,by=.01)
gr=sapply(r,cor.sampling,dz)
# exponentiate and rescale so the grid values behave like a density
gr=exp(gr-max(gr))/sum(exp(gr-max(gr)))/0.01
plot(r,gr,type="l",xlim=c(0,1),col="red",lwd=2)
[Figure cor1: exact posterior density of \rho]
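Since gr has been rescaled to behave like a density over the grid, it can also be used to compute simple posterior summaries.  This is just a sketch along those lines, not part of the original post:

# posterior mean and standard deviation of rho from the grid values
post.mean=sum(r*gr)*0.01
post.sd=sqrt(sum((r-post.mean)^2*gr)*0.01)
c(post.mean,post.sd)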
We illustrate finding a normal approximation to this density by using the laplace function in the LearnBayes package.  The arguments are the name of the log posterior, the starting guess at the mode (we use 0), and the matrix of data dz.  The output includes the posterior mode and the corresponding estimate of the posterior variance.
fit=laplace(cor.sampling,0,dz)
fit
$mode
[1] 0.5015625

$var
           [,1]
[1,] 0.00912294
So the posterior for \rho is approximately normal with mean 0.50 and standard deviation \sqrt{0.0091} \approx 0.096.  Using the curve function, we add the normal approximation to the graph.  This approximation looks pretty good, but there is a little room for improvement.
curve(dnorm(x,fit$mode,sqrt(fit$var)),add=TRUE,col="blue",lwd=2)
legend("topleft",c("Exact","Approximation"),col=c("red","blue"),lty=c(1,1),lwd=c(2,2))
[Figure cor2: exact posterior density with the normal approximation overlaid]
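As a small follow-up (again, not in the original post), the laplace output can be used to read off an approximate 95 percent interval for \rho from the normal approximation:

# approximate 95% credible interval from the normal approximation
fit$mode+c(-1,1)*1.96*sqrt(as.numeric(fit$var))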
Categories: Bayesian computation