## Variance components model

Here is a simple illustration of an variance components model given by “Dyes” in the WinBUGS 1.4 Examples, volume 1:

******************************************************

Box and Tiao (1973) analyse data first presented by Davies (1967) concerning batch to batch variation in yields of dyestuff. The data (shown below) arise from a balanced experiment whereby the total product yield was determined for 5 samples from each of 6 randomly chosen batches of raw material.

Batch Yield (in grams)

_______________________________________

1 1545 1440 1440 1520 1580

2 1540 1555 1490 1560 1495

3 1595 1550 1605 1510 1560

4 1445 1440 1595 1465 1545

5 1595 1630 1515 1635 1625

6 1520 1455 1450 1480 1445

*******************************************************

Let denote the jth observation in batch i. To determine the relative importance of between batch variation versus sampling variation, we fit the multilevel model.

1. is N()

2. are iid N(0,

3. assigned a uniform prior

In this situation, the focus is on the marginal posterior distribution of . It is possible to analytically integrate out the random effects , resulting in the marginal posterior

density

where is the “within batch” sum of squares for the ith batch. To use the computational algorithms in LearnBayes, we consider the log posterior distribution of

that is programed in the function logpostnorm1:

logpostnorm1=function(theta,y)

{

mu = theta[1]; sigma.y = exp(theta[2]); sigma.b = exp(theta[3])

p.means=apply(y,1,mean); n=dim(y)[2]

like1=-(apply(sweep(y,1,p.means)^2,1,sum))/2/sigma.y^2-n*log(sigma.y)

like2=-(p.means-mu)^2/2/(sigma.y^2/n+sigma.b^2)-.5*log(sigma.y^2/n+sigma.b^2)

return(sum(like1+like2)+theta[2]+theta[3])

}

In the following R code, I load the LearnBayes package and read in the function logpostnorm1.R and the Dyes dataset stored in “dyes.txt”.

Then I summarize the posterior by use of the laplace function — the mode of () is (3.80, 3.79).

> library(LearnBayes)

> source(“logpostnorm1.R”)

> y=read.table(“dyes.txt”)

> fit=laplace(logpostnorm1,c(1500,3,3),y)

> fit$mode

[,1] [,2] [,3]

[1,] 1527.5 3.804004 3.787452

## Fitting an Exchangeable Model

Continuing our baseball example, we observe yi home runs in ei at-bats for the ith player. We assume

1. y1, …, y20 are independent, yi is Poisson(lambda_i)

2. the true rates lambda_1,…, lambda_20 are independent Gamma(alpha, alpha/mu)

3. the hyperparameters alpha, mu are independent, mu is distributed 1/mu, alpha is distributed according to the proper prior z0/(alpha+z0)^2, where z0 is the prior median

The data is stored as a matrix d, where the first column are the ei and the second column are the yi. In our example, we let z0 = 1; that is, the prior median of alpha is one.

Here is our computing strategy.

1. First we learn about the hyperparameters (alpha, mu). The posterior of (log alpha, log mu) is programmed in the LearnBayes function poissgamexch. Here is a contour plot.

ycontour(poissgamexch,c(-1,8,-4.2,-2.5),datapar)

title(xlab=”LOG ALPHA”,ylab=”LOG MU”)

We use the function laplace to find the posterior mode and associated variance-covariance matrix. The output from laplace is used to find a proposal variance and scale parameter to use in a random walk Metropolis chain. We simulate 10,000 draws from the posterior of (log alpha, log mu). (This chain has approximately a 30% acceptance rate.)

fit=laplace(poissgamexch,c(2,-3.2),datapar)

proposal=list(var=fit$var,scale=2)

mcmcfit=rwmetrop(poissgamexch,proposal,c(1,-3),10000,datapar)

By exponentiating the simulated draws from mcmcfit, we get draws from the marginal posteriors of alpha and mu. We draw a density plot of alpha and superimpose the prior density of alpha.

alpha=exp(mcmcfit$par[,1])

mu=exp(mcmcfit$par[,2])

plot(density(alpha,adjust=2),xlim=c(0,20),col=”blue”,lwd=3,xlab=”ALPHA”,

main=”POSTERIOR OF ALPHA”)

prior=function(alpha,z0) z0/(alpha+z0)^2

theta=seq(.0001,20,length=200)

lines(theta,prior(theta,1),col=”red”,lwd=3)

2. Now we can learn about the true rate parameters. Conditional on hyperparameter values, lambda_1, …, lambda_20 have independent gamma posteriors.

We write a short function to simulate draws of lambda_j for a particular player j.

trueratesim=function(j,data,alpha,mu)

{

e=data[,1]; y=data[,2]

rgamma(length(alpha),shape=alpha+y[j],rate=alpha/mu+e[j])

}

Then we can simulate draws of all the true rates by use of the sapply function.

lam=sapply(1:20,trueratesim,d,alpha,mu)

The output lam is a 10,000 by 20 matrix. We can compute posterior means by the apply function.

post.means=apply(lam,2,mean)

To show the behavior of the posterior means, we draw a plot where

(1) we show the observed rates yi/ei as red dots

(2) we show the posterior means as blue dots

(3) a horizontal line is draw at the mean home run rates for all 20 players.

## Fitting a Mixture Sampling Model

To illustrate the application of a MCMC random walk algorithm, consider the following mixture model. Suppose we observe season home run counts y1, …, y20 from a group of baseball players. Some of these players are “sluggers” who average many home runs (per season) and other players are “light hitters” who don’t hit as many home runs. We know that 40% of all players are sluggers, but we don’t know the identity (slugger or non-slugger) for each player. The sampling density for the ith player’s home run count yi has the density

f(yi | lambda1, lambda2) = p f(yi | lambda1) + (1- p) f(yi | lambda2),

where we assume f(yi | lambda) is Poisson with mean lambda and we assume we know p = .4.

Suppose we assume that (lambda1, lambda2) has the prior 1/(lambda1 lambda2). Then the posterior is given (up to a proportionality constant) by

g(lambda1, lambda2 | y) = 1/(lambda1 lambda2) prod from i=1 to 20 [p f(yi | lambda1) + (1- p) f(yi | lambda2)]

Here is some simulated data

25 42 29 29 28 25 21 34 19 11 16 13 15 17 10 16 21 17 16 20

We first transform the parameters to the real-valued parameters

theta1 = log(lambda1), theta2 = log(lambda2)

and write the following function that computes the posterior of (theta1, theta2). Here the input variable data.p is a list that contains two elements: p, the known value of p, and y, the vector of observations.

poisson.mix=function(theta, data.p)

{

lambda1=exp(theta[,1])

lambda2=exp(theta[,2])

y=data.p$y

p=data.p$p

val=0*lambda1

for (i in 1:length(y))

val=val+log(p*dpois(y[i],lambda1)+(1-p)*dpois(y[i],lambda2))

return(val)

}

The data is stored in the vector y. We use the mycontour function to graph the bivariate posterior. We also show a perspective plot of this posterior.

mycontour(poisson.mix,c(2.2,3.8,2.2,3.8),list(y=y,p=.4))

You might be surprised to see that the posterior is bimodal. This is happening since the model is not well-defined. But in the next posting, we’ll ignore this identifiability problem, and see if a random walk Metropolis can be successful in sampling from this posterior.

## Choosing the scale for a Metropolis RW algorithm

Let’s return to the example where we are sampling from a Cauchy density with unknown median theta and known scale parameter 1 and we place a uniform prior on theta. We observe the data

1.3 1.9 3.2 5.1 1.4 5.8 2.5 4.7 2.9 5.2 11.6 8.8 8.5 10.7 10.2

9.8 12.9 7.2 8.1 9.5

An attractive method of obtaining a simulated sample from this posterior is the Metropolis random walk algorithm. If theta^(t-1) represents the current simulated draw, then the next candidate is generated from the distribution

theta^(t) = theta^(t-1) + c Z,

where Z is standard normal. The main issue here is the selection of the scale constant c.

I did a simple experiment. I simulated samples of 10,000 draws using the scale values 0.2, 1, 5, 25. Here’s the R code for the Metropolis random walk with the choice of scale parameter 0.2.

cpost=function(theta,y)

{

val=0*theta

for (j in 1:length(y))

val=val+dt(y[j]-theta,df=1,log=TRUE)

return(val)

}

proposal=list(var=1,scale=.2)

fit1=rwmetrop(cpost,proposal,20,10000,data)

The acceptance rates for these four values are given by

scale acceptance rate

0.2 93%

1.0 73%

5.0 31%

25 7%

To assess the accuracy of a particular sample, we draw the exact posterior in red, and superimpose a density estimate of the simulated draws in blue.

Comparing these four figures, the scale values of 1 and 5 seem to do pretty well.

## Summarizing a posterior

Continuing our selected data example, remember that we have programmed the posterior of the transformed parameters theta1 and theta2 in the R function kaminsky.

To find a normal approximation to the posterior, we apply the function laplace in the LearnBayes package. The inputs are (1) the function defining the log posterior, (2) a starting guess at the mode, (3) the number of iterations of the Newton algorithm, and (4) the data vector (the 5th and 15th order statistics).

> start=array(c(-2,-1),c(1,2))

> fit.laplace=laplace(kaminsky,start,10,data)

The output is a list containing mode, the value of the posterior mode, and var, the estimate at the variance-covariance matrix.

> fit.laplace$mode

[,1] [,2]

[1,] -2.367745 -1.091989

> fit.laplace$var

[,1] [,2]

[1,] 0.3201467 0.1191059

[2,] 0.1191059 0.1191059

We can get more accurate summaries of the posterior by means of a Metropolis random walk algorithm. The function rwmetrop implements this algorithm for an arbitrary posterior. To use this function, we define “proposal”, a list containing the variance and scale parameter for the normal proposal density, the starting value for the MCMC chain, the number of simulated draws, and the data vector. Note that we are using the approximate variance-covariance matrix from laplace in the proposal density for rwmetrop.

> proposal=list(var=fit.laplace$var,scale=2)

> fit.mcmc=rwmetrop(kaminsky,proposal,start,10000,data)

The output of rwmetrop is a list containing accept, the acceptance rate for the chain, and par, the matrix of simulated draws.

At this point, we should run some convergence diagnostics to see if the simulated draws show sufficient mixing and don’t display unusually high autocorrrelations. For this example, the acceptance rate is about 29% which is within the acceptable range for this algorithm.

We display the simulated draws on top of the contour plot of theta1 and theta2 — it seems that that most the simulated draws fall within the first contour line.

> mycontour(kaminsky,c(-5,0,-2.5,1),data)

> title(xlab=”LOG(Y5-MU)”,ylab=”LOG BETA”)

> points(fit.mcmc$par[,1],fit.mcmc$par[,2])

We are interested in the parameters mu and beta. We first compute vectors of simulated draws of mu and beta by transforming back the simulated draws of theta1 and theta.

> MU=data[1]-exp(fit.mcmc$par[,1])

> BETA=exp(fit.mcmc$par[,2])

We display the marginal posteriors of mu and beta.

> par(mfrow=c(2,1))

> plot(density(MU),lwd=3,main=”POSTERIOR OF MU”,xlab=”MU”)

> plot(density(BETA),lwd=3,main=”POSTERIOR OF BETA”,xlab=”BETA”)

We construct 90% interval estimates by extracting quantiles from the collection of simulated draws.

> quantile(MU,c(.05,.95))

5% 95%

9.864888 10.065765

> quantile(BETA,c(.05,.95))

5% 95%

0.2012635 0.6520858

Last, suppose we are interested in predicting the 5th order statistic ys5 from a future sample of 20 observations.

To simulate from the distribution of ys5, we (1) simulate (mu, beta) from the posterior and then (2) simulate a future sample y1,…,y20 from the exponential distribution with parameters mu and beta, and (3) storing the 5th ordered observation from the simulated sample. We repeat this process 1000 times, obtaining a simulated sample from ys5. We display this predictive distribution by a histogram.

ys5=rep(0,1000)

for (j in 1:1000)

{

ys=rexp(20,rate=1/BETA[5000+j])+MU[5000+j]

ys5[j]=sort(ys)[5]

}

hist(ys5,col=”orange”)

## Learning from selected order statistics

To illustrate some computational methods for summarizing a posterior, I describe a missing data problem motivated from my undergraduate days at Bucknell. Ken Kaminsky and Paul Nelson, two of my Bucknell professors, were interested in learning about populations based on selected order statistics. (I wrote an undergraduate honors thesis on this topic. ) Here is a simple illustration of the problem.

Suppose a sample y1, …, y20 is taken from the two-parameter exponential distribution of the form f(y | mu, beta) =1/beta exp(-(y-mu)/beta), y > mu. But you don’t observe the complete dataset — all you observe are the two order statistics y(5) and y(15) (the order statistics are the observations arranged in ascending order).

Based on this selected data, we wish to (1) estimate the parameters mu and beta by 90% interval estimates and (2) predict the value of the order statistics y*(5) and y*(20) from a future sample taken from the same population.

Here’s the plan:

1. First, we write the likelihood which is the density of the observed data (y(5) and y(20)) given values of the exponential parameters mu and beta. One can show that this likelihood is given by

L(mu, beta) = f(y(5)) f(y(15)) F(y(5))^4 (F(y(15) )-F(y(5)))^9 (1- P(y(15)))^5, mu>0, beta>0.

2. Assuming a flat (uniform) prior on (mu, beta), the posterior density is proportional to the likelihood. We write a R function kaminsky0.R that computes the logarithm of the posterior — here the parameters are (mu, beta) and the data is (y(5), y(15)).

kaminsky0=function(theta,data)

{

f=function(y,mu,beta)

return(dexp(y-mu,rate=1/beta))

F=function(y,mu,beta)

return(pexp(y-mu,rate=1/beta))

y5=data[1]; y15=data[2]

mu=theta[,1]

beta=theta[,2]

loglike=log(f(y5,mu,beta))+log(f(y15,mu,beta))+

4*log(F(y5,mu,beta))+9*log(F(y15,mu,beta)-F(y5,mu,beta))+

5*log(1-F(y15,mu,beta))

return(loglike)

}

3. Graphing the posterior of (mu, beta), we see strong skewness in both parameters.

It is usually helpful to transform to real-valued parameters

theta1 = log(y(5) – mu) , theta1 = log(beta).

We write the following function kaminsky.R that computes the log posterior of (theta1, theta2).

kaminsky=function(theta,data)

{

f=function(y,mu,beta)

return(dexp(y-mu,rate=1/beta))

F=function(y,mu,beta)

return(pexp(y-mu,rate=1/beta))

y5=data[1]; y15=data[2]

mu=y5-exp(theta[,1])

beta=exp(theta[,2])

loglike=log(f(y5,mu,beta))+log(f(y15,mu,beta))+

4*log(F(y5,mu,beta))+9*log(F(y15,mu,beta)-F(y5,mu,beta))+

5*log(1-F(y15,mu,beta))

logjack=theta[,1]+theta[,2]

return(loglike+logjack)

}

Here’s a graph of the posterior of the reexpressed parameters — note that it is much more normal-shaped.

4. We’ll use several functions in the next posting to summarize the posterior.

(a) The laplace function is useful in finding the posterior mode and normal approximation to the posterior.

(b) By use of the rwmetrop function, we construct a random-walk Metropolis algorithm to simulate from the joint posterior.

## The SIR algorithm

There is a simple method, called the SIR algorithm, of taking a simulated sample of draws from one distribution p, and using these draws to produce a sample from a different distribution g. We illustrate this method for the Cauchy sampling model example introduced in the last post.

Suppose that we have a proposal density g(theta) that we believe is a rough approximation to the posterior (in terms of location and spread). Here we suppose that a t density with mean 7, variance 9 and degrees of freedom 3 is a rough approximation to our bimodal posterior density for theta.

There are three steps in the SIR algorithm.

1. (S) We Sample 1000 draws from the proposal density p. We are storing these in the vector theta.p.

theta.p=sqrt(VAR)*rt(1000,DF)+MEAN

2. (I) We compute Importance sampling weights for this sample equal to the ratios of the target density (g) to the proposal density (p).

p.theta=dt(theta.p-MEAN,DF)/sqrt(VAR)

g.theta=exp(cpost(theta.p,y))

weights=g.theta/p.theta

The following figure plots the simulated draws from the proposal density against the weights.

plot(theta.p,weights,xlim=c(2,12),xlab=”THETA”,ylab=”POST/PROPOSAL”)

3. (R) We resample 1000 draws with replacement from the simulated draws theta.p, where the sampling probabilities are proportional to the weights.

probs=weights/sum(weights)

theta.sample=sample(theta.p,size=1000,prob=probs,replace=TRUE)

The values in the vector theta should be (approximately) from the posterior density g.

The sir algorithm with a t proposal density is implemented in the function sir.R in the LearnBayes algorithm.

To illustrate this function, remember the definition of the log posterior is in cpost and the data is stored in the vector y. We create a list tpar that contains the components of the t proposal density.

MEAN=7; VAR=9; DF=3

tpar=list(m=MEAN,var=VAR,df=DF)

Then we implement the algorithm using the sir function — the output is a vector of simulated draws from the posterior.

s=sir(cpost,tpar,1000,y)