
Archive for the ‘Bayesian computation’ Category

Normal approximation to posterior

September 22, 2011

To illustrate a normal approximation to a posterior, let’s return to the fire call example described in the September 6 post.  Here we had Poisson sampling with mean \lambda and a normal prior on \lambda.

1.  First we write a short function lpost that computes the logarithm of the posterior.  I show the expressions for the log likelihood and the log prior.  On the log scale, the log posterior is the log likelihood PLUS the log prior.

lpost = function(lambda){
  loglike = -6*lambda + 8*log(lambda)        # Poisson log likelihood (up to an additive constant)
  logprior = dnorm(lambda, 3, 1, log=TRUE)   # normal prior on lambda with mean 3, standard deviation 1
  loglike + logprior
}

2.  I plot the normalized version of the posterior below.  I first write a short function post that computes the unnormalized posterior, use the integrate function to numerically integrate it from 0 to 10 (giving the normalizing constant C), and then use the curve function to display the normalized posterior.

post = function(lambda) exp(lpost(lambda))
C = integrate(post, 0, 10)
curve(exp(lpost(x))/C$value, 0, 5)

3.  There is a useful function laplace in the LearnBayes package that conveniently finds the matching normal approximation to the posterior.  The function inputs are (1) the function defining the log posterior, and (2) a guess at the posterior mode.  Typically the log posterior might depend on data and prior parameters and that would be the last input to laplace (here we are not using that extra input).

library(LearnBayes)
fit=laplace(lpost, 1)
fit
$mode
[1] 1.7
$var
          [,1]
[1,] 0.2653809

The important output here is (1) the mode of the posterior and (2) the corresponding approximation to the posterior variance.  From the output, we see that the posterior of \lambda is approximately normal with mean 1.7 and standard deviation \sqrt{0.265} \approx 0.52.
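As a quick follow-up (a sketch), we can use this normal approximation to compute approximate posterior summaries for \lambda, such as a 90% probability interval.

post.mean = fit$mode
post.sd = sqrt(fit$var[1, 1])
qnorm(c(0.05, 0.95), post.mean, post.sd)    # approximate 90% interval for lambda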

4.  To check the accuracy of this approximation, we use the curve function to add a matching normal density.  (I have also added a legend by use of the legend function.)  Note that the normal approximation is pretty accurate in this particular case.

curve(dnorm(x, fit$mode, sqrt(fit$var)), add=TRUE, col="red")
legend("topright", legend=c("Exact", "Normal Approximation"),
       lty = 1, col=c("black", "red"))


Learning about a correlation — part IV

October 19, 2009

Currently we are introducing MCMC fitting of posterior distributions.  Here we illustrate the use of one popular algorithm, the Metropolis random walk chain, to simulate from an arbitrary posterior distribution.

Let’s return to the problem of learning about a correlation from standardized data.  We have programmed the logarithm of the posterior density in the function cor.sampling

cor.sampling = function(rho, data)
{
  x = data[, 1]
  y = data[, 2]
  ifelse(abs(rho) < 1,
         sum(-0.5*log(1 - rho^2) - 0.5/(1 - rho^2)*(x^2 - 2*rho*x*y + y^2)),
         -9.99e200)
}

Here the parameter rho falls in the interval (-1, 1), but I want to define the log posterior for all values of rho.  Note that in the ifelse command, I check whether abs(rho) < 1; if it is not, I define the log posterior to be a very large negative value, so that the posterior is essentially zero outside of that interval.

How does one implement a random walk Metropolis chain for this example?

1.  Recall that in this algorithm, you have a current simulated draw \rho^i and you propose a new simulated draw \rho^C, where

\rho^C = \rho^i + c Z,

where c is a scale parameter and Z is a simulated draw from a normal distribution with mean 0 and standard deviation \sqrt{V}.  One has to choose a value of the scale parameter c and the variance V.

2.  Then when you implement the algorithm, you choose a starting parameter value (where the simulation begins) and the total number of simulation draws you collect.  (A minimal plain-R sketch of the algorithm is given below.)
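Here is that sketch.  It is only for illustration (the rwmetrop function described next does this work for you); it assumes a log posterior function such as cor.sampling and the data matrix dz from the earlier posts, and the name rw.metrop is just made up for this example.

rw.metrop = function(logpost, start, n.iter, scale, V, data){
  draws = numeric(n.iter)
  current = start
  log.current = logpost(current, data)
  n.accept = 0
  for (i in 1:n.iter){
    candidate = current + scale*rnorm(1, 0, sqrt(V))   # propose rho^C = rho^i + c Z
    log.candidate = logpost(candidate, data)
    if (log(runif(1)) < log.candidate - log.current){  # accept with probability min(1, ratio)
      current = candidate
      log.current = log.candidate
      n.accept = n.accept + 1
    }
    draws[i] = current
  }
  list(par=draws, accept=n.accept/n.iter)
}
# for example:  rw.metrop(cor.sampling, 0.5, 10000, scale=2, V=0.009, data=dz)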

The function rwmetrop in the LearnBayes package has inputs:  (1) the name of the function that defines the log posterior, (2) a list containing values of V and c, (3) the starting value of the parameter in the simulation, (4) the number of iterations of the algorithm, and (5) any data used in the log posterior function.

Here I use the function cor.sampling, choose V = 0.009 and c = 2, start the simulation at \rho = 0.5, collect 10,000 simulation draws, and store my data in dz.

R1=rwmetrop(cor.sampling,list(var=.009,scale=2),.5,10000,dz)

The output is R1:  R1$par contains the simulated draws and R1$accept gives the acceptance rate of the algorithm.

Here’s some advice about using this function.

1.  Typically it is good to let Z have a standard deviation equal to twice the estimated standard deviation of the posterior.  So you let var be an estimated posterior variance and scale = 2.

2.  How does one estimate the posterior variance?  One can use the normal approximation computed by the laplace function (see the short sketch following this list).

3.  Acceptance rate?  A random walk Metropolis algorithm works well when the acceptance rate is in the 25 – 40% range.  You can monitor this value by displaying the R1$accept value.

4.  A trace plot of the simulated parameter values is helpful for seeing how well the simulated sample is exploring the posterior density.  One looks for a trend in this graph (is the MCMC algorithm slow in converging to the posterior) and for a snake-like appearance (this indicates that you have poor mixing in the chain).   Trace plots and other diagnostics are available as part of the coda package.
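Returning to point 2, here is a small sketch of how one might set the proposal for this example, using the laplace function from the earlier posts.

fit = laplace(cor.sampling, 0.5, dz)      # normal approximation to the posterior
proposal = list(var=fit$var, scale=2)     # proposal variance and scale for rwmetrop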

Let’s look at two runs of this algorithm — one for a good scale value and one for a poor choice of scale.  The acceptance rate for the first run is 50% and the acceptance rate for the second run is 94%.

library(coda)
R1=rwmetrop(cor.sampling,list(var=.009,scale=2),.5,10000,dz)
R1$accept
traceplot(mcmc(R1$par),main="Good choice of scale")

R2=rwmetrop(cor.sampling,list(var=.009,scale=0.2),.5,10000,dz)
R2$accept
traceplot(mcmc(R2$par),main="Poor choice of scale")

[Trace plots of the simulated draws for the good and poor choices of scale]
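Once we are satisfied with the mixing, we can summarize the simulated draws from the first run.  Here is one possible summary (a sketch); effectiveSize from the coda package gives a rough idea of how many independent draws the correlated sample is worth.

mean(R1$par)                         # posterior mean of rho
quantile(R1$par, c(0.05, 0.95))      # 90% probability interval
effectiveSize(mcmc(R1$par))          # effective sample size of the MCMC output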


Learning about a correlation — part III

October 8, 2009

In Chapter 5, we discuss the SIR method of simulating from a posterior density g(\theta).  Like rejection sampling, we find a proposal density p(\theta) that is easy to simulate from and covers the posterior density of interest.  Here’s the algorithm:

1.  We simulate a sample \{\theta_j\}  from the proposal density p(\theta).

2.  We compute weights \{w(\theta_j) = g(\theta_j)/p(\theta_j)\}.

3.  We resample from the \{\theta_j\} with replacement with weights proportional to \{w(\theta_j)\}.

The resampled values are approximately distributed according to the density of interest g(\theta).  (A short sketch of these steps in R is given below.)
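Before turning to the built-in function, here is a minimal sketch of the three steps in plain R for the correlation example.  It assumes the cor.sampling log posterior and the standardized data matrix dz from the other correlation posts; it writes the t proposal density directly with dt, and proposals falling outside (-1, 1) are given zero weight.

n = 10000
th = .5 + sqrt(.009)*rt(n, df=4)                                     # 1. simulate from the t proposal
logp = dt((th - .5)/sqrt(.009), df=4, log=TRUE) - log(sqrt(.009))    #    log proposal density
logg = sapply(th, cor.sampling, dz)                                  #    log posterior at the proposals
logg[abs(th) >= 1] = -Inf                                            #    zero weight outside (-1, 1)
w = exp(logg - logp - max(logg - logp))                              # 2. weights, rescaled to avoid underflow
draws = sample(th, size=n, replace=TRUE, prob=w)                     # 3. resample with replacement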

The function sir in the LearnBayes package implements this algorithm when the proposal density is a t density with arbitrary mean, variance and degrees of freedom.

For this example, we showed that a Normal(0.50, 0.009) density was a reasonable approximation to the posterior.  So we use a t proposal density with location 0.50, variance 0.009 and 4 degrees of freedom.  We decide to simulate 10,000 values.

The sir function’s arguments are similar to those for rejectsampling — function that defines the log posterior, parameters of the t proposal, number of simulated draws, and the data used in the log posterior function.

R=sir(cor.sampling,list(m=.5,var=.009,df=4),10000,dz)

The output is a vector of 10,000 values.  We construct a density plot of the draws and it resembles the display in the previous posting.

plot(density(R),col="blue",lwd=3)

[Density estimate of the simulated draws from sir]


Learning about a correlation – part II

October 6, 2009

Earlier we considered the problem of learning about a correlation and illustrated a normal approximation.  Here we illustrate rejection sampling to obtain a simulated sample from the exact posterior.

The main task is to find a suitable proposal density p.  We’ll let p be a t density with mean \mu, variance \sigma^2 and small degrees of freedom \nu.

How to choose the parameters?  First, since the posterior is approximately N(0.50, 0.009), we’ll let \mu = 0.50 and \sigma^2 = 0.009.  Also, we’ll choose the degrees of freedom \nu = 4.  (It is a good strategy to have a proposal density with “flat” tails which we achieve by choosing a small number of degrees of freedom.)

Next, we have to find the constant c so that g(\rho | y) \le c p(\rho) for all \rho.  On the log scale, we want to find c such that

\log g(\rho|y) \le \log c + \log p(\rho).

I write a small function that computes \log g(\rho | y) - \log p(\rho):

dfunction=function(rho,data)
  cor.sampling(rho,data)-dmt(rho,.5,.009,4,log=TRUE)

I plot this function over the range of values of \rho from -1 to 1 and find that

\log c = -42.9927.
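As a check on the value read from the plot, one could also locate the maximum numerically, for instance with optimize (a sketch, assuming the standardized data matrix dz).

# maximize log g(rho|y) - log p(rho) over (-1, 1); $objective is the estimate of log c
optimize(dfunction, c(-0.999, 0.999), data=dz, maximum=TRUE)$objective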

To see if I have found a suitable covering function, I plot the exact posterior g(\rho | y) and the covering density c p(\rho).  It looks pretty good.

[Plot of the exact posterior g(\rho | y) and the covering density c p(\rho)]

Now we can implement rejection sampling using the LearnBayes function rejectsampling.

R=rejectsampling(cor.sampling,list(m=.5,var=.009,df=4), dmax=-42.9927,10000,dz)

The inputs are (1) the exact log posterior function, (2) the parameters of the t proposal density (mean, variance, df), (3) the value of log c, (4) the number of values we propose, and (5) the data used in the log posterior.

Here is a density estimate of the simulated draws we get — this is very close to the exact posterior density.

plot(density(R),col="orange",lwd=3)

[Density estimate of the simulated draws from rejection sampling]

Normal approximation to posteriors

October 6, 2009

Today we talked about normal approximations to posterior distributions.  As an example, I observe the daily hits on my book’s website during each day of one recent month.  I collect the following counts for the weekdays (Monday through Friday)

20,30,22,20,20,17,21,26,22,30,36,15,30,27,22,23,18,24,28,23,12

and I collect the following counts for the weekends (Saturday and Sunday)

7,12,11,12,12,17,18,20,17

We assume that the counts y_1, ..., y_{30} are independent, where y_i is Poisson(\lambda_i) and the \lambda_i satisfy the log linear model

\log \lambda_i = \beta_0 + \beta_1 WEEKDAY_i,

where WEEKDAY_i is 1 if day i falls on a weekday and 0 if it falls on a weekend.  I place a uniform prior on (\beta_0, \beta_1).

I program the log posterior in the function loglinearpost.  The data matrix has two columns, where the first column contains the values of WEEKDAY and the second column contains the counts.

loglinearpost=function (beta, data)
{
  x = data[, 1]
  y = data[, 2]
  logf = function(x, y, beta) {
    lp = beta[1] + beta[2] * x        # log mean for a single day
    dpois(y, exp(lp), log=TRUE)       # Poisson log likelihood contribution
  }
  sum(logf(x, y, beta))
}

I put the counts and the weekday indicators in the vectors y and x and form the data matrix.

y=c(7,12,11,12,12,17,18,20,17,20,30,22,20,20,17,21,26,22,30,36,15,30,27,22,23,18,24,28,23,12)
x=c(rep(0,9),rep(1,21))
data=cbind(x,y)
I obtain the normal approximation using the laplace function.  The output is the posterior mode and the corresponding variance-covariance matrix.

fit=laplace(loglinearpost,c(0,0),data)
fit
$mode
[1] 2.6390946 0.5025815

$var
            [,1]         [,2]
[1,]  0.00793621 -0.007936210
[2,] -0.00793621  0.009993843
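One way to use this approximation (a sketch) is to simulate draws of (\beta_0, \beta_1) from the bivariate normal and summarize quantities of interest, such as the difference between the expected weekday and weekend counts.  Here rmnorm is the multivariate normal simulator from the mnormt package.

library(mnormt)
sim = rmnorm(10000, mean=fit$mode, varcov=fit$var)    # draws of (beta0, beta1)
weekend.rate = exp(sim[,1])                           # expected hits on a weekend day
weekday.rate = exp(sim[,1] + sim[,2])                 # expected hits on a weekday
quantile(weekday.rate - weekend.rate, c(.05, .5, .95))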
Below, using the mycontour function, I show the exact posterior in RED and the approximating normal density in BLUE.  I define a new function mydnorm that computes the log of the multivariate normal density (using the dmnorm function).  The two sets of contours are very close, indicating that the normal approximation is very accurate in this case.

mycontour(loglinearpost,c(2.2,3,.1,.9),data,col="red",
  xlab="BETA0",ylab="BETA1")
mydnorm=function(y,d)
  dmnorm(y,d$mu,d$sigma,log=TRUE)
mycontour(mydnorm,c(2.2,3,.1,.9),
  list(mu=fit$mode,sigma=fit$var),col="blue",add=TRUE)

[Contour plot of the exact posterior (red) with the normal approximation (blue) overlaid]

Learning about a correlation

October 2, 2009

We’re starting the chapter on Bayesian computation and I thought I’d use a new example to illustrate the basic methods.  I collect the high school GPA and the ACT math score for 50 BGSU freshmen and I’m interested in learning about the correlation.  If I standardize both measurements (by subtracting the mean and dividing by the standard deviation), then I can assume the standardized pairs (x_1, y_1), ..., (x_{50}, y_{50}) are a random sample from a bivariate normal density with 0 means, unit standard deviations, and correlation \rho.  If I assume a uniform prior on \rho, then the posterior density is given by

g(\rho) \propto \prod_{i=1}^{50} \frac{1}{\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2(1-\rho^2)}(x_i^2 - 2\rho x_i y_i + y_i^2)\right)

I begin by writing a short function to compute the log posterior of \rho.  Here data is a matrix of two columns of the standardized x and standardized y.

cor.sampling=function(rho,data)
{
  x=data[,1]
  y=data[,2]
  sum(-0.5*log(1-rho^2)-0.5/(1-rho^2)*(x^2-2*rho*x*y+y^2))
}

I display the posterior density below.  Note that I use the sapply function to evaluate the log posterior over a vector of values.  Also, I rescale the posterior values so that the total area under the plotted density is equal to one.

r=seq(-.99,.99,by=.01)
gr=sapply(r,cor.sampling,dz)
gr=exp(gr-max(gr))/sum(exp(gr-max(gr)))/0.01
plot(r,gr,type="l",xlim=c(0,1),col="red",lwd=2)

[Plot of the exact posterior density of the correlation]
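Since gr approximates the posterior density on a grid with spacing 0.01, sums over the grid times 0.01 approximate integrals.  For example, here is a quick sketch of the posterior mean and standard deviation of \rho computed from the grid.

post.mean = sum(r*gr)*0.01
post.sd = sqrt(sum((r - post.mean)^2*gr)*0.01)
c(post.mean, post.sd)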
We illustrate finding a normal approximation to this density by use of the laplace function in the LearnBayes package.  The arguments are the name of the log posterior, the starting guess at the mode (we use 0), and the matrix of data dz.  The output is the posterior mode and the corresponding estimate of the posterior variance.

fit=laplace(cor.sampling,0,dz)
fit
$mode
[1] 0.5015625

$var
           [,1]
[1,] 0.00912294
So the posterior for \rho is approximately normal with mean 0.50 and standard deviation \sqrt{0.0091} \approx 0.095.  Using the curve function, we add the normal approximation to the graph.  This approximation looks pretty good, but there is a little room for improvement.

curve(dnorm(x,fit$mode,sqrt(fit$var)),add=TRUE,col="blue",lwd=2)
legend("topleft",c("Exact","Approximation"),col=c("red","blue"),lty=c(1,1),lwd=c(2,2))

[Plot of the exact posterior with the normal approximation overlaid]

Estimating a log odds ratio

September 29, 2009

We were looking at the relationship between a student’s ACT score (lo or hi) and his GPA (lo or hi).  We constructed a prior on \theta_1 and \theta_2, where \theta_1 is the log odds ratio and \theta_2 is the sum of the log odds.  We assumed that \theta_1 was N(0.5, 0.3), \theta_2 was N(0, 2), and the two parameters were independent.

We now observe some data for a sample of students:

                GPA <= 3.0    GPA > 3.0
ACT <= 21           29            24
ACT > 21            15            34

The likelihood function is given by

L(p_1, p_2) = p_1^{29} (1-p_1)^{24} p_2^{15} (1-p_2)^{34},

where p_1 is the probability that a student in the low ACT group has a GPA of 3.0 or less and p_2 is the corresponding probability for the high ACT group.  In terms of the logits, \theta_1 = logit(p_1) - logit(p_2) and \theta_2 = logit(p_1) + logit(p_2), so logit(p_1) = (\theta_1 + \theta_2)/2 and logit(p_2) = (\theta_2 - \theta_1)/2.

Here is a computing strategy to summarize the posterior density.

1.  We write a R function to compute the log posterior density of (\theta_1, \theta_2).

2.  We use the function mycontour to compute the log posterior density on a grid.

3.  Using the function simcontour, we simulate values of (\theta_1, \theta_2) from the posterior density.

4.  We learn about the log odds ratio \theta_1 by summarizing the sample of \theta_1 values.

Here is the R work.  I write a function, lctablepost, that has two inputs: the vector theta containing (\theta_1, \theta_2), and a list s that has two components, data (a vector of table counts) and prior (a vector of the means and standard deviations of the two normal priors).  A small helper function ilogit converts a logit (log odds) to a probability.  The log posterior is the log likelihood (the table counts times the corresponding log probabilities) plus the two normal log priors.

lctablepost=function (theta, s)
{
  ilogit = function(x) exp(x)/(1 + exp(x))
  logitp1 = (theta[1] + theta[2])/2
  logitp2 = (theta[2] - theta[1])/2
  loglike = s$data[1]*log(ilogit(logitp1)) + s$data[2]*log(1 - ilogit(logitp1)) +
            s$data[3]*log(ilogit(logitp2)) + s$data[4]*log(1 - ilogit(logitp2))
  loglike + dnorm(theta[1], s$prior[1], s$prior[2], log=TRUE) +
    dnorm(theta[2], s$prior[3], s$prior[4], log=TRUE)
}

We define the data and the prior inputs.

d=c(29,24,15,34)
p=c(0.5,0.3,0,2)
stuff=list(data=d,prior=p)

Now we use the LearnBayes function mycontour — the bounding rectangle of (-1, 2, -9, 3) seems to work well.

mycontour(lctablepost,c(-1,2,-9,3),stuff,
  xlab="THETA1",ylab="THETA2",col="red")

Last, I simulate 1000 values from this grid by the LearnBayes function simcontour.  The output is a list s with components x and y — so s$x are the simulated draws of \theta_1 and s$y are the simulated draws of \theta_2.

s=simcontour(lctablepost,c(-1,2,-9,3),stuff,1000)

I show the simulated sample on my contour by use of the points function.

points(s)

[Contour plot of the posterior of (\theta_1, \theta_2), and the same plot with the simulated draws added]
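Finally, to carry out step 4, we summarize the simulated \theta_1 values.  A sketch of possible summaries:

quantile(s$x, c(0.05, 0.5, 0.95))    # median and 90% interval for the log odds ratio
mean(s$x > 0)                        # posterior probability the log odds ratio is positive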