## Normal approximations to posteriors

Continuing our Phillies example, I’m going to change the example somewhat and consider the relationship between the number of runs the Phillies score and the game outcome. Here is the table:

runs scored

game outcome low (four or less) high (five or more)

Win 15 74

Loss 53 20

There appears to be a notable positive relationship here and we’re interested in estimating the underlying correlation coefficient of the bivariate normal.

We first construct a function polycorr.R that computes the logarithm of the posterior of the correlation.

polycorr=function(rho,data)

{

# needs package mvtnorm

n11=data[1,1]; n12=data[1,2]; n21=data[2,1]; n22=data[2,2]

nA=n11+n12; nB=n11+n21; n=n11+n12+n21+n22

c=qnorm(nA/n); d=qnorm(nB/n); pc=nA/n; pd=nB/n

val=0*rho

for (j in 1:length(rho))

{

C=matrix(c(1,rho[j],rho[j],1),c(2,2))

p00=pmvnorm(lower=-Inf*c(1,1),upper=c(c,d),corr=C)

val[j]=n11*log(pc-p00)+n12*log(1-pc-pd+p00)+n21*log(p00)+n22*log(pd-p00) }

return(val)

}

We input the data as a two by two matrix.

data=matrix(c(15,53,74,20),c(2,2))

We find the normal approximation by use of the laplace function in the LearnBayes package. The inputs are the function, the starting value for the Newton algorithm, the number of iterations of the algorithm, and the data used in the function.

fit=laplace(polycorr,.6,10,data)

From the output of this function, we get that rho is approximately N(.694, .00479). In the R code below we plot the exact and approximate posteriors. In the figure, we see some inaccuracy in the normal approximation.

rho=seq(.3,1,by=.01)

gpost=exp(polycorr(rho,data))

plot(rho,gpost/sum(gpost)/.01,type=”l”,lwd=3,col=”red”, ylab=”DENSITY”,xlab=”RHO”)

lines(rho,dnorm(rho,fit$mode,sqrt(fit$var)),lwd=3,col=”blue”)

legend(locator(1),c(“EXACT”,”NORMAL APPROX”),col=c(“red”,”blue”),lwd=2)

One way of improving the accuracy of the normal approximation is to transform rho to the real-valued parameter theta = log [(rho+1)/(1-rho)]. We write a function to compute the log posterior of theta.

polycorr2=function(theta,data)

{

# needs package mvtnorm

rho=(exp(theta)-1)/(exp(theta)+1)

n11=data[1,1]; n12=data[1,2]; n21=data[2,1]; n22=data[2,2]

nA=n11+n12; nB=n11+n21; n=n11+n12+n21+n22

c=qnorm(nA/n); d=qnorm(nB/n); pc=nA/n; pd=nB/n

val=0*rho

for (j in 1:length(rho))

{

C=matrix(c(1,rho[j],rho[j],1),c(2,2)) p00=pmvnorm(lower=-Inf*c(1,1),upper=c(c,d),corr=C) val[j]=n11*log(pc-p00)+n12*log(1-pc-pd+p00)+n21*log(p00)+n22*log(pd-p00) }

return(val+log(1-rho)+log(1+rho))

}

We find the normal approximation using the function laplace. Here the approximation is that the transformed rho is normal with mean 1.662 and variance .0692.

fit1=laplace(polycorr2,0,10,data)

We plot the exact and approximate posteriors — here the normal approximation appears very accurate.

theta=seq(0.4,3.0,by=.01)

gpost=exp(polycorr2(theta,data)) plot(theta,gpost/sum(gpost)/.01,type=”l”,lwd=3,col=”red”, ylab=”DENSITY”,xlab=”THETA”) lines(theta,dnorm(theta,fit1$mode,sqrt(fit1$var)),lwd=3,col=”blue”)

legend(locator(1),c(“EXACT”,”NORMAL APPROX”),col=c(“red”,”blue”),lwd=2)

## Inferences for Gamma Sampling Problem

In the previous post, I considered the problem of modeling lengths of cell phone calls. Here we focus on several types of inferences and predictions that might be of interest.

Following the general computing strategy described in Chapter 5 of BCWR, I first transform the gamma parameters (alpha, beta) to (theta1 = log alpha, theta2 = log mu= log (alpha beta)). The function gamma.sampling.post computes the posterior of (theta1, theta2). The function mycontour draws a contour plot and the function simcontour simulates from this grid. The figure shows the contour plot with the simulated draws placed on top.

> y=c(12.2,.9,.8,5.3,2,1.2,1.2,1,.3,1.8,3.1,2.8)

> library(LearnBayes)

> gamma.sampling.post=function(theta,data)

+ {

+ a=exp(theta[,1])

+ mu=exp(theta[,2])

+ n=length(data)

+ val=0*a

+ for (i in 1:n) val=val+dgamma(data[i],shape=a,scale=mu/a,log=TRUE)

+ return(val-log(a)+log(a)+log(mu))

+ }

> mycontour(gamma.sampling.post,c(-1.5,1.5,0,3),y)

> title(main=”POSTERIOR OF (LOG ALPHA, LOG MU)”,xlab=”log alpha”,

+ ylab=”log mu”)

> s=simcontour(gamma.sampling.post,c(-1.5,1.5,0,3),y,1000)

> points(s$x,s$y)

Suppose we are interested in the mean length of cell phone calls mu. In particular, what is the probability that the mean length exceeds 4 minutes? The figure displays a density estimate of the simulated draws of mu, and I have labeled the desired probability.

> mu=exp(s$y)

> alpha=exp(s$x)

> beta=mu/alpha

> plot(density(mu),main=”POSTERIOR OF MEAN LENGTH”,xlab=”mu”,lwd=3)

> lines(c(4,4),c(0,ss$y[135]),lwd=3)

> text(8,.15,”P(MU > 4) = 0.178″)

> arrows(7,.1,4.5,.05,lwd=2)

Next, suppose we are interested in the predictive distribution of the length of a single cell phone call. Since we have already collected simulated draws from the posterior of (alpha, beta), it just takes one additional command to simulate the predictive distribution of y* (using the function rgamma). I have displayed a density estimate of the predictive density.

Note that the probability the mean call length exceeds 4 minutes is 0.178; the probability a future call exceeds 4 minutes is 0.263

> ys=rgamma(1000,shape=alpha,scale=beta)

> plot(density(ys),xlab=”CALL LENGTH”, lwd=3, main=”POSTERIOR PREDICTIVE DENSITY”)

> mean(ys>4)

[1] 0.263

Last, suppose you plan on making 20 calls next month and you’re interested in the total amount of time used. By use of a loop, we simulate 20 draws from the predictive distribution — the variable ysum contains 1000 realizations of the total.

> ysum=rep(0,1000)

> for (j in 1:20) ysum=ysum+rgamma(1000,shape=alpha,scale=beta)

> hist(ysum, main=”PREDICTIVE DISTRIBUTION OF LENGTH OF 20 CALLS”)

## Modeling Cell Phone Call Durations with a Gamma Density

Suppose we observe a sample y1, …, yn from a gamma(alpha, beta) density where the sampling density is proportional to y^{alpha-1} exp(-y/beta), and we assign a uniform prior on (alpha, beta).

As an example, suppose we wish to fit a gamma density to the durations (in minutes) of a group of cell phone calls.

12.2 0.9 0.8 5.3 2.0 1.2 1.2 1.0 0.3 1.8 3.1 2.8

Here is the R function that computes the log posterior of the density:

gamma.sampling.post1=function(theta,data)

{

a=theta[,1]

b=theta[,2]

n=length(data)

val=0*a

for (i in 1:n) val=val+dgamma(data[i],shape=a,scale=b,log=TRUE)

return(val)

}

The first figure is a contour graph of the posterior density of (alpha, beta). (In R, beta is called the scale parameter.)

Note the strong curvature in the posterior.

Instead, suppose we consider the joint posterior of alpha and the “rate” parameter theta = 1/beta. Here is a contour plot of the posterior of (alpha, theta).

This doesn’t display the strong curvature.

Last, suppose you consider the joint posterior of alpha and the mean mu = alpha beta. The last figure displays the posterior of (alpha, mu).The moral here is that the choice of parameterization can be important when summarizing the posterior distribution. In the next chapter, we’ll suggest a rule of thumb for transforming parameters that makes it easier to summarize many posteriors.

## Fitting a Beta Sampling Model

To illustrate a “brute-force” method of summarizing a posterior, suppose that we observe a sample y1, …, yn from a beta distribution with parameters a and b. If we assign (a, b) a uniform prior, then the posterior density is given by

g(a, b | data) propto prod_{i=1}^n f(y_i; a, b),

where f(y; a, b) = 1/B(a, b) y^(a-1) (1-y)^(b-1) is the beta density. As an example, suppose we are given the following proportions of students who are “math proficient” on the Ohio Graduation Test for a random sample of 20 schools in Ohio.

y=c(0.955, 0.819, 0.472, 0.925, 0.780, 0.931, 0.945, 0.926, 0.852, 0.920, 0.885, 0.890, 0.789, 0.973, 0.831, 0.835, 0.884, 0.904, 0.900, 0.806)

Here is our method:

1. We write a short R function betasampling.post.R that computes the logarithm of the posterior density. Note that the built-in function dbeta is used — the log=TRUE option gives the logarithm of the beta density.

betasampling.post=function(theta,data)

{

a=theta[,1]

b=theta[,2]

n=length(data)

val=0*a

for (i in 1:n) val=val+dbeta(data[i],a,b,log=TRUE)

return(val)

}

2. Next, by trial and error, we find a rectangle (a_lo, a_hi, b_lo, b_hi) that contains the contour plot of the joint posterior (remember that the contours given in the function mycontour.R are located at 10%, 1%, and 0.1% of the height of the density at the mode.)

mycontour(betasampling.post,c(.001,35,.001,6),y)

title(main=”Posterior density of (a, b)”,xlab=”a”,ylab=”b”)

3. We then sample from the grid of values of (a, b) that is used in constructing the scatterplot.

s=simcontour(betasampling.post,c(.001,35,.001,6),y,1000)

4. Suppose we are interested in the marginal posterior densities of a and b. We find these by use of density estimates on the simulated draws of a and b.

par(mfrow=c(2,1))

plot(density(s$x),main=”POSTERIOR OF a”,xlab=”a”,lwd=3)

plot(density(s$y),main=”POSTERIOR OF b”,xlab=”b”,lwd=3)

## Fitting a logistic model

Freshman students at BGSU take a mathematics placement test and a “placement score” is used to advise the student on the proper first mathematics course. For students taking a business calculus course, we record (1) his or her placement score and (2) his or her grade in the course. There are five possible placement levels that we code as 1, 2, 3, 4, 5. Let yi denote the number of students receiving A at placement level i. We suppose that yi is binomial(ni, pi), where pi is the probability a student at level i receives an A in the class. We let the probabilities satisfy the logistic model

logit(pi) = beta0 + beta1 i.

Assuming a uniform prior for beta = (beta0, beta1), the posterior distribution is proportional to

g(beta) = product pi^yi (1-pi)^(ni-yi).

The definition of the log posterior of beta is defined in the R function logisticpost.R.

We illustrate a “brute force” method of fitting this model.

1. We first read in the data — we create three vectors y, n, and level. The matrix

data has columns level, n, and y.

> y=c(2,15,29,39,15)

> n=c(34,170,283,243,59)

> level=1:5

> data=cbind(level,n,y)

> data

level n y

[1,] 1 34 2

[2,] 2 170 15

[3,] 3 283 29

[4,] 4 243 39

[5,] 5 59 15

2. We illustrate the usual MLE fit using the R function glm. The MLE will be helpful in finding where the posterior is located.

> response=cbind(y,n-y)

> glm(response~level,family=binomial)

Coefficients:

(Intercept) level

-3.328 0.423

3. After some trial and error, we find a rectangle where the posterior is concentrated. The function mycontour is used to draw a contour plot.

> mycontour(logisticpost,c(-5,-1.5,-.2,1),data)

4. The function simcontour is used to simulate draws from the posterior computed on this grid. We plot the simulated draws on top of the scatterplot.

> s=simcontour(logisticpost,c(-5,-1.5,-.2,1),data,1000)

> points(s$x,s$y)

> title(xlab=”BETA0″,ylab=”BETA1″)

## Brute-force computation of a posterior

Suppose we observe y that is normal with mean theta and standard deviation sigma. Instead of using a conjugate prior, suppose that theta has a t distribution with location mu, scale tau, and degrees of freedom df. Although there is not a nice form for the posterior density, it is straightforward to compute the posterior by use of the “prior x likelihood” recipe. We write a function post.norm.t.R that computes the posterior.

# we source this function into R

source(url(“http://bayes.bgsu.edu/m648/post.norm.t.R”))

# define parameters of problem

s=list(y=125,sigma=15/2,mu=100,tau=6.85,df=2)

# set up grid of values of theta

theta=seq(80,160,length=100)

# compute the posterior on the grid

post=post.norm.t(theta,s)

# convert the posterior value to probabilities

post.prob=post/sum(post)

# sample from discrete distribution on grid

sim.theta=sample(theta,size=10000,replace=TRUE,prob=post.prob)

# construct a histogram of simulated sample

# and place exact posterior on top

hist(sim.theta, freq=FALSE)

d=diff(theta[1:2])

con=sum(d*post) # this is normalizing constant

lines(theta,post/con)

From the simulated sample, we can compute any summary of the posterior of interest.