Bayesian regression — part I
To illustrate Bayesian regression, I’ll use an example from Simon Sheather’s new book on linear regression. A restaurant guide collects a number of variables from a group of Italian restaurants in New York City. One is interested in predicting the Price of a meal based on Food (measure of quality), Decor (measure of decor of the restaurant), Service (measure of quality of service), and East (dummy variable indicating if the restaurant is east or west of 5th Avenue).
Here is the dataset in csv format: http://www.stat.tamu.edu/~sheather/book/docs/datasets/nyc.csv
Assuming the usual regression model
,
,
suppose we assign the following prior to : we assume independence with
and
. Here
is the precision matrix, the inverse of the variance-covariance matrix.
How do we obtain values of the parameters ? We’ll use a small part of the data to construct our prior, and then use the remaining part of the data to do inference and eventually do some model selection.
First, I read in the data, select a random sample of rows, and partition the data into the two datasets italian1 and italian2.
italian=read.csv(“http://www.stat.tamu.edu/~sheather/book/docs/datasets/nyc.csv”)
indices=sample(168,size=25)
italian1=italian[indices,]
italian2=italian[-indices,]
y=italian1$Price
X=with(italian1,cbind(1,Food,Decor,Service,East))
fit=blinreg(y,X,5000)
Sigma0=cov(fit$beta)
B0=solve(Sigma0)
c0=(25-5)
d0=sum((y-X%*%beta0)^2)
data = italian2, burnin = 0, mcmc = 10000,
thin = 1, verbose = 0, seed = NA, beta.start = NA,
b0 = beta0, B0 = B0, c0 = c0, d0 = d0,
marginal.likelihood = c(“Chib95″))
beta=fit2[,1:5]
sigma=sqrt(fit2[,6])
lp=c(beta%*%x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
43.59 46.44 47.00 46.99 47.53 50.17
summary(ys)
Min. 1st Qu. Median Mean 3rd Qu. Max.
25.10 43.12 46.90 46.93 50.75 67.82
Filed under: Uncategorized | Leave a Comment
Are birth weights normal?
A few weeks ago, I asked my class for an example of data that would be normally distributed and someone mentioned weights of people.
Are weights, more specifically birthweights of newborns normally distributed?
Peter Dunn in a JSE article here analyzes statistics for 44 babies born in a single day in an Australian hospital. The article has a link to the dataset babyboom.dat that contains the data.
If we construct a histogram of the birthweights, we get the following display.
We see some left skewness in this distribution — does this indicate that our normality assumption is false?
One can check the normality assumption by means of the posterior predictive distribution. Here’s an outline of the approach. (There are special functions in the LearnBayes package that make this process simple to implement.)
1. First we fit the normal() model using the traditional noninformative prior proportional to
. A simulated sample from the posterior is found using the function normpostsim.
fit=normpostsim(birth.weight,10000)
2. Next we think of an appropriate testing function. Since the birth weights don’t appear symmetric, a natural checking function is the sample skewness. We write a short function “skew” to compute this statistic.
sum((y-mean(y))^3)/sd(y)^3/(length(y)-1)

Filed under: Uncategorized | Leave a Comment
Here is another illustration of Bayes factors. We are worried about the possibility of outliers and so we assume the observations are distributed from a t distribution with location
, scale
, and degrees of freedom
. We have some prior beliefs about the location and scale: we assume
independent with
and
.
We observe the following data:
112.6,86.0,99.1,106.3,101.5,80.5,99.6,108.6,95.9)
{
mu=theta[1]
sigma=exp(theta[2])
y=stuff$y
df=stuff$df
loglike=sum(dt((y-mu)/sigma, df, log=TRUE)-log(sigma))
logprior=dnorm(mu,100,10,log=TRUE)+dnorm(log(sigma),0,2, log=TRUE)
loglike+logprior
}
Filed under: Uncategorized | Leave a Comment
One problem with a p-value is that it is hard to interpret if you don’t know the sample size. Andy Gelman talks about this problem on his blog.
Here I’ll contrast the message communicated by a p-value and a Bayesian test by looking at the following three tables:
Table A: [32 18; 18 32]
Table B: [272 228; 228,272]
Table C: [2570 2430; 2430,2570]
[1] 0.9436246
[1] -0.3035413
[1] -1.411366
Filed under: Uncategorized | Leave a Comment
Here is today’s example illustrating a Bayesian test of a point null hypothesis. In Agresti and Franklin’s intro stats book, there is an example wondering if Americans work, on average, 40 hours per week. If represents the mean hours worked per week for all Americans, we are interested in testing
against the alternative
.
In the example in Agresti and Franklin’s text, a sample of 868 workers were selected and one observes that the workers worked an average of 39.11 hours and one assumes the population standard deviation is equal to . The test statistic is
and one computes a p-value of 0.061. There seems to be moderately significant evidence that the mean working hours for all workers is different from 40.
Here’s an outline of a Bayesian test (I give more details in class):
1. We assign probabilities to the hypotheses and
. Here we assume each hypothesis is equally likely.
2. Under the hypothesis , the prior for
is a point mass at the hypotheized value of 40.
3. Under the alternative hypothesis, we’ll place a N(40, ) prior on
. This reflects the belief that, under the alternative, values of
near 40 are more likely that values far from 40.
Under these assumptions, we give (in class) the posterior probability of the hypothesis . This probability is programmed in the function mnormt.twosided in the LearnBayes package. The inputs to this function are (1) the value to be tested, (2) the prior probability of
, (3) a vector of values of
and (4) a data vector consisting of
. The outputs are the Bayes factor in support of
and the posterior probability of
.
Since it is not clear how to specify the prior standard deviation — we compute the posterior probability for a range of values of
. I write a short function to output the posterior probability as a function of
and then I use the curve function to plot this probability for values of
from 0.01 to 2.
What we see is that the posterior probability of exceeds 0.35 for all values of
. This suggests that the p-value overstates the evidence against the null hypothesis.
> data=c(39.11, 868, 14)
> prob.H=function(tau)
+ mnormt.twosided(40,.5,tau,data)$post
> curve(prob.H(x),from=.01,to=2,xlab=”TAU”,ylab=”P(H)”)
Filed under: Uncategorized | Leave a Comment
Using the sapply function in R
In R, I think one of the most useful functions is sapply. By using this function, one can avoid writing a loop. Let me illustrate using this function for the binomial/beta exchangeable model.
The model is
1. distributed binomial(
)
2. are iid beta(
)
3. have a vague prior
The log posterior of is programmed in the function betabinexch in the LearnBayes package.
Suppose we wish to obtain a 90% interval estimate for . To simulate from the posterior, we first simulate a value of
from its posterior and then simulate
from the posterior beta(
Let’s consider the famous Efron and Morris baseball data — is the number of hits in
at-bats for the
th player.
I’ll read in the data, get a normal approximation to the posterior (function laplace), and then do a random walk Metropolis chain — I obtain a simulated sample from .
n=rep(45,18)
data=as.matrix(cbind(y,n))
fit=laplace(betabinexch,c(-1,4),data)
mcmc.fit=rwmetrop(betabinexch,list(var=fit$var,scale=2),c(-1,4),10000,data)
{
K=exp(mcmc.fit$par[,2])
eta=exp(mcmc.fit$par[,1])/(1+exp(mcmc.fit$par[,1]))
quantile(thetaj,c(.05,.95))
}
5% 95%
0.2052578 0.3288738
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
Filed under: Uncategorized | Leave a Comment
Robust modeling
The Phillies are still alive in the World Series and so I’m allowed to continue to talk about baseball.
In the situation where we observe a sample of measurement data , a common assumption to assume that they are a random sample from a normal population with mean
and standard deviation
. But sometimes this assumption is inappropriate. One measure of hitting effectiveness is the OBP, the proportion of time that a hitter gets on base. I collected the OBPs (actually 1000 x OBP) for a group of 24 players who played for the west coast NL teams (Dodgers, Giants, and Padres) in the 2004 season. Here’s a jittered dotplot of the OBPs:

Clearly, a normal population is inappropriate here since we have one obvious outlier — Barry Bonds in that season had an usually large OBP of 609.
In many situations, it makes more sense to assume that are distributed from a “flat-tailed” distribution where outliers are more likely to occur. One example of a flat-tailed distribution is the t form that depends on a location
, a scale
and a degrees of freedom
. Here we assume a small value of the degrees of freedom and focus on estimating
.
This is an attractive way of fitting a t sampling model. First, a t distribution can represented as a scale mixtures of normals. Suppose we let be distributed normal with mean
and variance
, and then we let the parameter
be distributed Gamma with shape
and rate
. When you integrate out
, one sees this is equivalent to assuming
is N(
.
Suppose we use this scale mixture representation for each observation. We can write this as a hierarchical model:
1. is distributed N(
)
2. is distributed Gamma (
)
3. has the prior
.
In class, we’ll show that can sample from this model easily using Gibbs sampling. This algorithm is implemented using the function in the LearnBayes package.
The posteriors of the extra scale parameters are of interest, since they reflect the influence of the
th observation on the estimation of the mean and variance. Below I display error bars that show 90% interval estimates for each of the
for my OBP example. Note that the posterior for the scale parameter for Barry Bonds is concentrated near 0. This is reasonable, since it is desirable to downweight the influence of Bonds OBP in the estimation of, say, the average OBP.

Filed under: Uncategorized | Leave a Comment
Home run hitting — priors?
At our department’s colloquium last week, there was an interesting presentation on the use of an exchangeable prior in a regression context. The analysis was done using WinBUGS using some standard choices of “weekly informative” priors. That raises the natural question — how did you decide on this prior?
Let’s return to the home run hitting example where an exchangeable prior was placed on the set of home run hitting probabilities. We assumed that were iid
and then the hyperparameters
and
were assumed independent where
was assigned the prior
and
was assigned the log logistic form
.
When I described these priors, I just said that they were “noninformative” and didn’t explain how I obtained them.
The main questions are:
1. Does the choice of vague prior matter?
2. If the answer to question 1 is “yes”, what should we do?
When one fits an exchangeable model, one is shrinking the observed rates towards a common value. The posterior of
tells us about the common value and the posterior of
tells us about the degree of shrinkage. The choice of vague prior for
is not a concern — we can assume
is uniform or distributed according to the improper prior
. However, the prior on
can be a concern. If one assigns the improper form
, it can be shown that the posterior may be improper. So in general one needs to assign a proper prior to
. One convenient choice is to assume that
has a logistic density with location
and scale
. In my example, I assumed that
.
Since I was arbitrary in my choice of parameters for this logistic prior on , I tried different choices for
and
. I tried values of
between -4 and 4 and values of
between 0.2 and 10. What did I find? The posterior mode of
stayed in the 4.56 – 4.61 range for all of the priors I considered. In this example, we have data from 605 players and clearly the likelihood is driving the inference.
What if the choice of and
does matter? Then one has to think more carefully about these parameters. In this baseball context, I am pretty familiar with home run hitting rates. Based on this knowledge, I have to figure out a reasonable guess at the variability of the home run probabilities that would give me information about
.
Filed under: Uncategorized | Leave a Comment
Home run rates — prediction
Last post, we illustrated fitting an exchangeable model on home run hitting probabilities for all players (non pitchers) in the 2007 season.
But actually, teams aren’t really interested in estimating hitting probabilities. They are primarily interested in predicting a player’s home run performance the following 2008 baseball season.
We already have the posterior distribution for the 2007 home run probabilities. Assuming that the probabilities stay the same for the following season and that the players will have the same number of at-bats, we can easily predict their 2008 home run counts from the posterior predictive distribution.
For each player, I simulated the predictive distribution of the number of home runs in
at-bats in the 2008 season. I summarize each distribution by its mode — this is the most probable value of
from the predictive distribution.
In the first graph, for all players I plot the number of 2007 home runs (horizontal) against the predicted number of 2008 home runs – the number of 2007 home runs (vertical). Given the 2007 home run count, I’m interested in my prediction of the change to the following season. (I jittered the points so you can see the individual observations.) For the weak home run hitters (on the left side of the graph), I predict they will perform a little worse or better the following season. But for the heavy hitters, I predict they will hit 5-7 home runs worse in the 2008 season. Generally, there is a negative slope in the graph that illustrates the “regression to the mean” effect.
Since we have the actual 2008 home run counts, we can actually compare the 2007 and 2008 counts to see if there is a similar effect. In the following graph, I plot the 2007 counts against the change (2008 count – 2007 count). What do we see? There is a negative correlation as we suspected. Hitters who did well in 2007 tended to get worse in 2008, and a number of light hitters in 2007 tended to do better in 2008. The biggest difference between the actual data and our predictions is the magnitude of the changes. Three of the top hitters, for example, hit 15-20 home runs in 2008.


Filed under: Uncategorized | Leave a Comment

Continuing our home run hitting example, we observe home run counts where
is distributed
. Suppose we assign the home run probabilities the following exchangeable model.
1. distributed
2. independent,
distributed
distributed
.
We described the computing strategy in class to summarize the posterior distribution of . We find
1. For , the posterior median is 0.0283 and a 90% interval estimate is (.0268, .0296).
2. For , the posterior median is 99 and a 90% interval estimate is (85, 116).
We find posterior means for the home run probabilities . We plot the posterior means against the square root of the at-bats for the 605 non-pitchers.
This is a very different graph pattern compared to the figure plotting sqrt(AB) against the observed home run rates. Several things to note.
1. For the players with few AB, their home run rates are strongly shrunk towards the overall home run rate. This is reasonable — we have little information about these players’ true home run hitting abilities.
2. Who are the best home run hitters? It is clear from the figure that four hitters stand out — they are the ones in the upper right corner of the graph that I’ve labeled with the number of home runs. The best home run hitters, that is the ones with the highest estimated home run probabilities, are the two player with 46 and 54 at-bats.
3. I’ve drawn a smoothing curve to indicate the main pattern in the graph. Note that it’s an increasing curve — this means that players with more AB tend to have higher home run probabilities. This makes sense — the better hitters tend to get more playing time and more at-bats.
Filed under: Uncategorized | Leave a Comment
Search
-
Blogroll
Recent Entries
- Bayesian regression — part I
- Are birth weights normal?
- A “robustness” illustration of Bayes factors
- A Bayesian test of the equality of proportions
- Bayesian testing of a point null hypothesis
- Using the sapply function in R
- Robust modeling
- Home run hitting — priors?
- Home run rates — prediction
- Home run rates — fitting an exchangeable model
- Estimating home run rates
Categories
- Uncategorized (35)