## Probit Modeling

I should have put more prior modeling in my Bayesian R book. One of the obvious advantages of the Bayesian approach is the ability to incorporate prior information. I’ll illustrate the use of informative priors in a simple setting — binary regression modeling with a probit link where one has prior information about the regression vector.

At my school, many students typically take a precalculus class that prepares them to take a business calculus class. We want our students to do well (that is, get a A or B) in the calculus class and wish to understand how the student’s performance in the precalculus class and his/her ACT math score are useful in predicting the student’s success.

Here is the probit model. If y=1 and y=0 represent respectively a student doing well and not well in the calculus class, then we model the probability that y=1 by

Prob(y=1) = Phi(beta0 + beta1 (PrecalculusGrade) + beta2 (ACT))

where Phi() is the standard normal cdf, PrecalculusGrade is 1 (0) if the student gets an A (B or C) in the precalculus class, and ACT is the math ACT score.

Suppose that the regression vector beta = (beta0, beta1, beta2) is assigned a multivariate normal prior with mean vector beta0 and precision matrix P. Then there is a simple Gibbs sampling algorithm for simulating from the posterior distribution of beta. The algorithm is based on the idea of augmenting the problem with latent continuous data from a normal distribution.

Although you might not understand the code, one can implement one iteration of Gibbs sampling by three lines of R code:

z=rtruncated(N,LO,HI,pnorm,qnorm,X%*%beta,1)

mn=solve(BI+t(X)%*%X,BIbeta0+t(X)%*%z)

beta = t(aa) %*% array(rnorm(p), c(p, 1)) + mn

Anyway, I want to focus on using this model with prior information.

1. First suppose I have a “prior dataset” of 50 students. I fit this probit model with a vague prior on beta. The inputs to the function bayes.probit.prior are (1) the vector of binary responses y, (2) the covariate matrix X, and (3) the number of iterations of the Gibbs sampoler.

fit1=bayes.probit.prior(prior.data[,1],prior.data[,-1],1000)

2. I compute the posterior mean and posterior variance-covariance matrix of the simulated draws of beta. I use these values for my multivariate normal prior on beta.

prior=list(beta=apply(fit1,2,mean),P=solve(var(fit1)))

(Note that I’m inputting the precision matrix P that is the inverse of the var-cov matrix.)

3. Using this informative prior, I fit the probit model with a new sample of 100 students.

fit2=bayes.probit.prior(DATA[,1],DATA[,-1],10000,prior=prior)

The only change in the input is that I input a list “prior” that includes the mean “beta” and the precision matrix P.

4. Now I summarize my fit to learn about the relationship of previous grade and ACT on the success of the calculus students. I define a grid of ACT scores and consider two sets of covariates corresponding to students who were not successful (0) and successful (1) in precalculs.

act=seq(15,29)

x0=cbind(1,0,act)

x1=cbind(1,1,act)

Then I use the function bprobit.probs to obtain posterior samples of the probability of success in calculus for each set of covariates.

fit.x0=bprobit.probs(x0,fit2)

fit.x1=bprobit.probs(x1,fit2)

I graph the posterior means of the fitted probabilities in the below graph. There are two lines — one corresponding to the students who aced the precalculus class, and another line corresponding to the students who did not ace precalculus.

Several things are clear from this graph. The performance in the precalc class matters — students who ace precalculus have a 30% higher chance of succeeding in the calculus class. On the other hand, the ACT score (that the student took during high school) has essentially no predictive ability. It is interesting that the slopes of the lines are negative, but these clearly isn’t significant.

## Bayesian regression

To introduce Bayesian regression modeling, we consider a dataset from De Veaux, Velleman and Bock which collected physical measurements from a sample of 250 males. One is interested in predicting a person’s body fat from his height, waist, and chest measurements.

The file Body_fat.txt contains the data that we read in R.

data=read.table(“Body_fat.txt”,sep=”\t”,header=TRUE)

names(data)

[1] “Pct.BF” “Height” “Waist” “Chest”

attach(data)

Suppose we wish to fit the regression model

Pct.BF ~ Height + Waist + Chest

The standard least-squares fit is done using the lm command in R.

fit=lm(Pct.BF~Height+Waist+Chest)

Here is a portion of the summary of the fit.

summary(fit)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 2.06539 7.80232 0.265 0.79145

Height -0.56083 0.10940 -5.126 5.98e-07 ***

Waist 2.19976 0.16755 13.129 < 2e-16 ***

Chest -0.23376 0.08324 -2.808 0.00538 **

—

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.399 on 246 degrees of freedom

Multiple R-Squared: 0.7221, Adjusted R-squared: 0.7187

F-statistic: 213.1 on 3 and 246 DF, p-value: < 2.2e-16

Now let’s consider a Bayesian fit of this model. Suppose we place the usual noninformative prior on the regression vector beta and the error variance sigma^2.

g(beta, sigma^2) = 1/sigma^2.

Then there is a simple direct method (outlined in BCUR) of simulating from the posterior distribution of (beta, sigma^2).

1. We first create the design matrix X:

X=cbind(1,Height,Waist,Chest)

The response is contained in the vector Pct.BF.

2. To simulate 5000 draws from the posterior of (beta, sigma), we use the function blinreg in the LearnBayes package.

fit=blinreg(Pct.BF, X, 5000)

The output fit is a list with two components: beta is a matrix of simulated draws of beta where each column corresponds to a sample from component of beta, and sigma is a vector of draws from the marginal posterior of sigma.

We can summarize the simulated draws of beta by computing posterior means and standard deviations.

apply(fit$beta,2,mean)

X XHeight XWaist XChest

1.8748122 -0.5579671 2.2031474 -0.2350661

apply(fit$beta,2,sd)

X XHeight XWaist XChest

7.84069390 0.11050071 0.16839919 0.08286758

Here is a graph of density estimates of simulated draws from the four regression parameters.

par(mfrow=c(2,2))

for (j in 1:4) plot(density(fit$beta[,j]),main=paste(“BETA “,j),

lwd=3, col=”red”, xlab=”PAR”)

What’s so special about Bayesian regression if we are essentially replicating the frequentist regression fit?

We’ll talk about the advantages of Bayesian regression in the next blog posting.

## Conditional means prior

In an earlier post, we illustrated Bayesian fitting of a logistic model using a noninformative prior. Suppose instead that we have subjective beliefs about the regression vector. A convenient way of representing these beliefs is by use of a conditional means prior. We illustrate this for our math placement example.

First, we consider the probability p1 that a student in placement level 1 receives an A in the class. Our best guess at p1 is 0.05 and this belief is worth 200 observations — we match this info to a beta(200*0.05, 200*0.95) prior. Next, we consider the probability p5 that a student in level 5 gets an A — our guess at this probability is 0.15 and this guess is worth 200 observations. We match this belief to a beta(200*0.15, 200*0.85) prior. Assuming that our beliefs about p1 and p5 are independent, the joint prior on (p1, p5) is a product of beta densities. Transforming back to the (beta0, beta1) scale, one can show that the prior on beta is given by

g(beta) = p1^a1 (1-p1)^b1 p5^a2 (1-p5)^b2

(Note that the conditional means prior translates to the same functional form as the likelihood where the beta parameters a1, b1, a2, b2 play the role of “prior data”.)

The below figure displays (in red) a sample from the likelihood and (in blue) a sample from the prior. Here we see some conflict between the prior beliefs and the data information about the parameter.

## Fitting a logistic model

Freshman students at BGSU take a mathematics placement test and a “placement score” is used to advise the student on the proper first mathematics course. For students taking a business calculus course, we record (1) his or her placement score and (2) his or her grade in the course. There are five possible placement levels that we code as 1, 2, 3, 4, 5. Let yi denote the number of students receiving A at placement level i. We suppose that yi is binomial(ni, pi), where pi is the probability a student at level i receives an A in the class. We let the probabilities satisfy the logistic model

logit(pi) = beta0 + beta1 i.

Assuming a uniform prior for beta = (beta0, beta1), the posterior distribution is proportional to

g(beta) = product pi^yi (1-pi)^(ni-yi).

The definition of the log posterior of beta is defined in the R function logisticpost.R.

We illustrate a “brute force” method of fitting this model.

1. We first read in the data — we create three vectors y, n, and level. The matrix

data has columns level, n, and y.

> y=c(2,15,29,39,15)

> n=c(34,170,283,243,59)

> level=1:5

> data=cbind(level,n,y)

> data

level n y

[1,] 1 34 2

[2,] 2 170 15

[3,] 3 283 29

[4,] 4 243 39

[5,] 5 59 15

2. We illustrate the usual MLE fit using the R function glm. The MLE will be helpful in finding where the posterior is located.

> response=cbind(y,n-y)

> glm(response~level,family=binomial)

Coefficients:

(Intercept) level

-3.328 0.423

3. After some trial and error, we find a rectangle where the posterior is concentrated. The function mycontour is used to draw a contour plot.

> mycontour(logisticpost,c(-5,-1.5,-.2,1),data)

4. The function simcontour is used to simulate draws from the posterior computed on this grid. We plot the simulated draws on top of the scatterplot.

> s=simcontour(logisticpost,c(-5,-1.5,-.2,1),data,1000)

> points(s$x,s$y)

> title(xlab=”BETA0″,ylab=”BETA1″)