## A Prediction Contest

Concluding our baseball example, recall that we observed the home run rates for 20 players in the month of April and we were interested in predicting their home run rates for the next month. Since we have collected data for both April and May, we can check the accuracy of three prediction methods.

1. The naive method would be to simply use the April rates to predict the May rates. Recall that the data matrix is d where the first column are the at-bats and the second column are the home run counts.

pred1=d[,2]/d[,1]

2. A second method, which we called the “pooled” method, predicts each player’s home run rate by the pooled home run rate for all 20 players in April.

pred2=sum(d[,2])/sum(d[,1])

3. The Bayesian method, predicts a player’s May rate by his posterior mean of his true rate lambda_j.

pred3=post.means

One measure of accuracy is the sum of absolute prediction errors.

The May home run rates are stored in the vector may.rates. We first compute the individual prediction errors for all three methods.

error1=abs(pred1-may.rates)

error2=abs(pred2-may.rates)

error3=abs(pred3-may.rates)

We use the apply statement to sum the absolute errors.

errors=cbind(error1,error2,error3)

apply(errors,2,sum) # sum of absolute errors for all methods

error1 error2 error3

0.3393553 0.3111552 0.2622523

By this criterion, Bayes beats Pooled beats Naive.

Finally, suppose we are interested in predicting the number of home runs hit by the first player Chase Utley in May. Suppose we know that he’ll have 115 at-bats in May.

1. We have already simulated 10,000 draws from Utley’s true home run rate lambda1 — these are stored in the first column of the matrix lam vector.

2. Then 10,000 draws from Utley’s posterior predictive distribution are obtained by use of the rpois function.

ys1=rpois(10000,AB*lam[,1])

We graph this predictive distribution by the command

plot(table(ys1),xlab=”NUMBER OF MAY HOME RUNS”)

The most likely number of May home runs is 3, but a 90% prediction interval is [1, 9].

## Fitting an Exchangeable Model

Continuing our baseball example, we observe yi home runs in ei at-bats for the ith player. We assume

1. y1, …, y20 are independent, yi is Poisson(lambda_i)

2. the true rates lambda_1,…, lambda_20 are independent Gamma(alpha, alpha/mu)

3. the hyperparameters alpha, mu are independent, mu is distributed 1/mu, alpha is distributed according to the proper prior z0/(alpha+z0)^2, where z0 is the prior median

The data is stored as a matrix d, where the first column are the ei and the second column are the yi. In our example, we let z0 = 1; that is, the prior median of alpha is one.

Here is our computing strategy.

1. First we learn about the hyperparameters (alpha, mu). The posterior of (log alpha, log mu) is programmed in the LearnBayes function poissgamexch. Here is a contour plot.

ycontour(poissgamexch,c(-1,8,-4.2,-2.5),datapar)

title(xlab=”LOG ALPHA”,ylab=”LOG MU”)

We use the function laplace to find the posterior mode and associated variance-covariance matrix. The output from laplace is used to find a proposal variance and scale parameter to use in a random walk Metropolis chain. We simulate 10,000 draws from the posterior of (log alpha, log mu). (This chain has approximately a 30% acceptance rate.)

fit=laplace(poissgamexch,c(2,-3.2),datapar)

proposal=list(var=fit$var,scale=2)

mcmcfit=rwmetrop(poissgamexch,proposal,c(1,-3),10000,datapar)

By exponentiating the simulated draws from mcmcfit, we get draws from the marginal posteriors of alpha and mu. We draw a density plot of alpha and superimpose the prior density of alpha.

alpha=exp(mcmcfit$par[,1])

mu=exp(mcmcfit$par[,2])

plot(density(alpha,adjust=2),xlim=c(0,20),col=”blue”,lwd=3,xlab=”ALPHA”,

main=”POSTERIOR OF ALPHA”)

prior=function(alpha,z0) z0/(alpha+z0)^2

theta=seq(.0001,20,length=200)

lines(theta,prior(theta,1),col=”red”,lwd=3)

2. Now we can learn about the true rate parameters. Conditional on hyperparameter values, lambda_1, …, lambda_20 have independent gamma posteriors.

We write a short function to simulate draws of lambda_j for a particular player j.

trueratesim=function(j,data,alpha,mu)

{

e=data[,1]; y=data[,2]

rgamma(length(alpha),shape=alpha+y[j],rate=alpha/mu+e[j])

}

Then we can simulate draws of all the true rates by use of the sapply function.

lam=sapply(1:20,trueratesim,d,alpha,mu)

The output lam is a 10,000 by 20 matrix. We can compute posterior means by the apply function.

post.means=apply(lam,2,mean)

To show the behavior of the posterior means, we draw a plot where

(1) we show the observed rates yi/ei as red dots

(2) we show the posterior means as blue dots

(3) a horizontal line is draw at the mean home run rates for all 20 players.

## Predicting home run rates

Here is a simple prediction problem. Suppose we observe the number of home runs and the number of at-bats for 20 baseball players during the first month of the baseball season (April).

We observe corresponding home run rates: Utley 4/88, Weeks 1/77, Aurila 4/71, and so on.

From this data, we want to predict the home run rates for the same 20 players for the next month (May).

How can we best do this? Here are some opening comments.

1. One idea would be to estimate the May rates just by using the April rates. So we predict Utley’s May rate to be 4/88, Weeks rate to be 1/77, etc. This may not be a good idea since we have limited data for each player.

2. Or maybe a better strategy would be to combine the data. Collectively in April, this group hit 60 home runs in 1697 at-bats for a rate of 60/1697 = 0.035. We could estimate each player’s May rate by 0.035. But this would ignore the fact that players have different abilities to hit home runs.

3. Actually, the best prediction strategy is a compromise between the first two ideas. A good plan is to predict a player’s May rate by a “shrinkage” estimate that shrinks a player’s individual rate towards the combined rate.

In the following postings, we’ll illustate how to fit a Bayesian exchangeable model to these data that gives a good prediction strategy.

## View of an Exchangeable Prior

In Chapter 7, we consider the following exchangeable prior for Poisson rates lam_1, …, lam_k that is described in two stages.

Stage I. Conditional on parameters alpha, mu, lam_1, …, lam_k are independent Gamma(alpha, alpha/mu)

Stage II. The parameters (alpha, mu) come from a specified prior g(alpha, mu).

Here mu is the prior mean of lam_i and alpha is a precision parameter. This structure induces the following prior on lam_1, .., lam_k:

g(lam_1, …, lam_k) = integral prod P(lam_j | alpha, mu) g(alpha, mu) dalpha dmu.

To see how this prior reflects dependence between the parameters, suppose we fix alpha to the value alpha_0 and let mu be distributed inverse gamma(a, b). Then one can show the prior on lam_1,…, lam_k is given (up to a proportionality constant) by

g(lam_1, …, lam_k) = P^(alpha_0-1)/(alpha_0 S + b)^(k alpha_0 + a),

where P is the product of lam_j and S is the sum of lam_j.

To see this prior, we program a simple function pgexchprior that computes the logarithm of the prior of lam_1 and lam_2 given parameter values (alpha_0, a, b).

pgexchprior=function(lambda,pars)

{

alpha=pars[1]

a=pars[2]

b=pars[3]

(alpha-1)*log(prod(lambda))-(2*alpha+a)*log(alpha*sum(lambda)+b)

}

The following R commands construct contour plots of the prior for lam_1 and lam_2 for the precision parameters alpha_0 = 5, 20, 80, and 200. (In each case, we assign mu an inverse-gamma (10, 10) prior.)

alpha=c(5,20,80,400)

par(mfrow=c(2,2))

for (j in 1:4)

{

mycontour(pgexchprior,c(.001,5,.001,5),c(alpha[j],10,10))

title(main=paste(“ALPHA = “,alpha[j]),xlab=”LAMBDA1″,ylab=”LAMBDA2”)

}

These plots clearly show that, as alpha increases, the prior induces stronger correlation between the two Poisson rates.

## Robust Modeling

To illustrate some of the computational methods to summarize a posterior, suppose that we observe a sample y1, …, yn from a Cauchy density with location theta and scale parameter 1. If we assign a uniform prior to theta, then the posterior density of theta is proportional to

product from i=1 to n [ (1 + (yi – theta))^{-1} ].

Suppose we observe the following 20 values:

1.3 1.9 3.2 5.1 1.4 5.8 2.5 4.7 2.9 5.2 11.6 8.8 8.5 10.7 10.2

9.8 12.9 7.2 8.1 9.5

In R, we define a new function cpost.R that computes the logarithm of the posterior density.

cpost=function(theta,y)

{

val=0*theta

for (j in 1:length(y))

val=val+dt(y[j]-theta,df=1,log=TRUE)

return(val)

}

We compute and display the posterior on the interval [2, 12].

Note the interesting bimodal shape of the posterior. Clearly a normal approximation to this posterior will not be an accurate representation.

Suppose we wish to simulate a sample from this posterior. A general method for generating a sample is the reject algorithm. To construct a reject algorithm, we find a suitable p that is easily to simulate from and covers the posterior g in the sense that g(theta) <= c p(theta) for all theta. Then one simulates a variate u from a uniform(0, 1) distribution; if u < [ g(u)/c/p(u)], then we accept u as a draw from the target distribution g. Suppose we let p be a t density with mean mu, variance v, and degrees of freedom df. Here we let

MEAN=7; VAR=9; DF=3

To find the bounding constant, we plot the logarithm of the ratio, log g(theta) – log p(theta) over the range of theta.

p.density=dt(theta-MEAN,DF)/sqrt(VAR)

plot(theta,log(post/p.density),type=”l”) From inspection of the graph (and a little calculation), it appears that the log of the ratio is bounded above by -62.98.

In the R code, we draw the posterior density and the constant x proposal density that covers the posterior.

The function rejectsampling.R will implement reject sampling with a t covering density. The inputs to this function are (1) the definition of the log target density, (2) a list giving the parameters of the t covering density (mean, variance, df), (3) the value of the log of the bounding constant, (4) the number of values simulated from the proposal density, and (5) the data used in the target function. The output of this function is a vector of values from the target distribution.

tpar=list(m=MEAN,var=VAR,df=DF)

dmax=-62.98

n=10000

theta.sim=rejectsampling(cpost,tpar,dmax,n,y)

One can compute the acceptance rate of this algorithm by dividing the length of the output vector theta.sim by the number of simulated draws from the proposal. The acceptance rate for this example is about 12%. To demonstrate that this algorithm works, we draw in the following figure (1) the exact posterior in red and (2) a density estimate of about 6000 simulated draws in blue. I’m convinced that the algorithm has indeed produced a sample from the posterior distribution.