Archive for the ‘Priors’ Category

Graphical elicitation of a beta prior

December 6, 2011

My students are doing a project where they have to elicit a beta prior for a proportion.  This is a difficult task and any tools to make this process a little easier are helpful.

I’ve been playing with manipulate, an R package that is included as part of the RStudio installation (for those who don’t know, RStudio is a nice interface for R).  This package makes it easy to construct sliders, push buttons, and pull-down menus to provide a nice user interface for functions one creates.

Here is a simple illustration of the use of the manipulate package.  One assesses the shape parameters of the beta indirectly by specifying (1) the prior median and (2) the prior 90th percentile p.90.  One inputs values of the median and p.90 with sliders and sees the beta density curve that matches this information.  I wrote a simple R script that constructs the graph with sliders; the main tools are the manipulate package and the LearnBayes package (the beta.select function).
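A minimal sketch of such a script (assuming RStudio’s manipulate package and the beta.select function from LearnBayes; the slider names and ranges are my own choices):

```r
library(manipulate)
library(LearnBayes)

# Two sliders: the prior median p.50 and the prior 90th percentile p.90.
# beta.select finds the beta shape parameters matching the two quantiles,
# and the curve redraws as the sliders move.  (p.90 must exceed p.50.)
manipulate({
  ab <- beta.select(list(p = 0.5, x = p.50), list(p = 0.9, x = p.90))
  curve(dbeta(x, ab[1], ab[2]), from = 0, to = 1,
        xlab = "p", ylab = "Density",
        main = paste("beta(", ab[1], ",", ab[2], ")"))
},
p.50 = slider(0.05, 0.95, initial = 0.3),
p.90 = slider(0.05, 0.95, initial = 0.5))
```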

Here is the script and here is a movie that shows the sliders in action.



Categories: Priors

Home run hitting — priors?

November 1, 2009

At our department’s colloquium last week, there was an interesting presentation on the use of an exchangeable prior in a regression context.  The analysis was done using WinBUGS with some standard choices of “weakly informative” priors.  That raises a natural question: how did you decide on this prior?

Let’s return to the home run hitting example where an exchangeable prior was placed on the set of home run hitting probabilities.  We assumed that p_1, ..., p_k were iid B(\eta, K) and then the hyperparameters \eta and K were assumed independent where \eta was assigned the prior \eta^{-1}(1-\eta)^{-1} and K was assigned the log logistic form g(K) = 1/(K+1)^2.

When I described these priors, I just said that they were “noninformative” and didn’t explain how I obtained them.

The main questions are:

1.  Does the choice of vague prior matter?
2.  If the answer to question 1 is “yes”, what should we do?

When one fits an exchangeable model, one is shrinking the observed rates y_i/n_i towards a common value.  The posterior of \eta tells us about the common value and the posterior of K tells us about the degree of shrinkage.  The choice of vague prior for \eta is not a concern — we can assume \eta is uniform or distributed according to the improper prior \eta^{-1}(1-\eta)^{-1}.   However, the prior on K can be a concern.  If one assigns the improper form g(K) = 1/K, it can be shown that the posterior may be improper.  So in general one needs to assign a proper prior to K.  One convenient choice is to assume that \log K has a logistic density with location \mu and scale \tau.  In my example, I assumed that \mu = 0, \tau = 1.
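As a quick sanity check on that choice, one can simulate from the logistic prior on \log K and verify the implied density of K (a sketch; the sample size is arbitrary):

```r
# If log K ~ logistic(0, 1), then K has density g(K) = 1/(K + 1)^2,
# so P(K <= 1) = 1 - 1/(1 + 1) = 0.5.
logK <- rlogis(100000, location = 0, scale = 1)
K <- exp(logK)
mean(K <= 1)   # should be close to 0.5
```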

Since I was arbitrary in my choice of parameters for this logistic prior on \log K, I tried different choices for \mu and \tau.  I tried values of \mu between -4 and 4 and values of \tau between 0.2 and 10.  What did I find?  The posterior mode of \log K stayed in the 4.56 – 4.61 range for all of the priors I considered.  In this example, we have data from 605 players and clearly the likelihood is driving the inference.

What if the choice of \mu and \tau does matter?  Then one has to think more carefully about these parameters.  In this baseball context, I am pretty familiar with home run hitting rates.  Based on this knowledge, I have to figure out a reasonable guess at the variability of the home run probabilities that would give me information about K.


Constructing a prior in a 2 x 2 table

September 26, 2009
           (0,21]   (21,100]
(0,3]         711        310
(3,10]        783        745

At BGSU, all freshmen take a mathematics placement test and the results of the test, together with some other information, are used to advise students on the appropriate first math class to take.

The other information consists of the student’s score on the ACT Math exam and his/her high school grade point average, or GPA.  We are interested in examining the relationship between the GPA and the ACT Math score.

Among the students taking Form B of the math placement exam, 63% have a GPA over 3.0.  Suppose you categorize these students into two groups: those who scored 21 or less on the ACT exam, and those who scored over 21 on the ACT.

             GPA <= 3.0   GPA > 3.0
ACT <= 21        p1         1 - p1
ACT > 21         p2         1 - p2

How do we construct a prior on the proportions p1 and p2?    Let’s step through the process.

1.  Here it doesn’t make any sense to assign p1 and p2 independent distributions.  Suppose I thought a student’s probability of getting a low GPA was 0.37.  If you then told me that the student scored low on the ACT exam, I would think it more likely that the student gets a low GPA; that is, I would think the student’s chance of getting a GPA <= 3.0 is higher than 0.37.  Clearly, p1 and p2 should have a dependent prior.

2.  A popular way of expressing the relationship between p1 and p2 is by use of a log odds ratio.  Let T1 = log(p1/(1-p1)) be the log odds of p1, and let T2 = log(p2/(1-p2)) be the log odds of p2.  A log odds converts a proportion (between 0 and 1) to a real-valued quantity: a proportion of 0.5 corresponds to a log odds of 0, a proportion less than 0.5 corresponds to a negative log odds, and so on.  One then compares the proportions by means of the difference in log odds, the log odds ratio

\theta_1 = \log \frac{p_1}{1-p_1} - \log \frac{p_2}{1-p_2} = \log \frac{p_1/(1-p_1)}{p_2/(1-p_2)}.

3.  Suppose we transform the proportions p_1 and p_2 to the log odds ratio \theta_1 and the sum of log odds (or logits)

\theta_2= \log \frac{p_1}{1-p_1} + \log \frac{p_2}{1-p_2} ,

and place independent priors on \theta_1 and \theta_2.

4.  Let’s illustrate using this prior.  I assume \theta_1 and \theta_2 are independent with

\theta_1 \sim N(0.5, 0.3), \theta_2 \sim N(0, 2).

The prior says that (1) I don’t know much about the sizes of the proportions (a vague prior placed on \theta_2), but (2) I believe that a larger proportion of the “low ACT” students will have low GPAs than the “high ACT students” (an informative prior placed on \theta_1).

5.  This prior induces a dependent prior on p_1 and p_2.  This is easy to illustrate using R.  First I simulate 1000 draws from the prior of \theta_1 and \theta_2 and transform these values back to the proportions p_1 and p_2.  I then construct a scatterplot of (p_1, p_2) and add the line p_2 = p_1.  This clearly shows the dependence in the prior, and it reflects the belief that p_1 > p_2.

logitp1 = (theta1 + theta2)/2
logitp2 = (theta2 - theta1)/2
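Putting the whole simulation together (a sketch; I am treating the second parameter of each normal as a standard deviation, and the sample size and plotting details are my own choices):

```r
theta1 <- rnorm(1000, 0.5, 0.3)        # log odds ratio
theta2 <- rnorm(1000, 0, 2)            # sum of the logits
logitp1 <- (theta1 + theta2)/2
logitp2 <- (theta2 - theta1)/2
p1 <- exp(logitp1)/(1 + exp(logitp1))  # back-transform to proportions
p2 <- exp(logitp2)/(1 + exp(logitp2))
plot(p1, p2)
abline(0, 1)                           # the line p2 = p1
```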

Categories: Priors

Constructing a prior for a Poisson mean

September 19, 2009

I am interested in learning about my son’s cell phone use.  Suppose the mean number of text messages that he sends and receives per day is equal to \lambda.  I will observe the number of text messages y_1, ..., y_n for n days; we’ll assume that conditional on \lambda, y_1, ..., y_n are independent Poisson(\lambda).

How do I construct a prior density on \lambda?   Suppose I model my beliefs by use of a gamma density with shape parameter a and rate parameter b.  I want to figure out the values of the prior parameters a and b.

There are two distinct ways of assessing your prior.  One can think about plausible values of the mean number of daily text messages \lambda.  Alternatively, it may be easier to think about the actual number of text messages y — if we assume a gamma prior, then one can show that the predictive density of y is given by

f(y) = \frac{\Gamma(a+y)}{\Gamma(a) y!} \frac{b^a}{(b+1)^{a+y}}, y= 0, 1, 2, ...

Personally, I think it is easier to think about the number of text messages y.  My best prediction of y is 10.  After some additional thought, I decide that my standard deviation for y is 3.5.  One can show that the mean and standard deviation of the predictive density are given by

E(y) = \frac{a}{b}, SD(y) = \sqrt{\frac{a}{b} + \frac{a}{b^2}}

If one matches my guesses with these expressions, one can show a = 44.4, b = 4.4.
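The matching is a small algebra exercise: from E(y) = a/b = 10 and SD(y)^2 = a/b + a/b^2 = 3.5^2, one gets b = 10/(3.5^2 - 10) and then a = 10 b.  In R:

```r
m <- 10; s <- 3.5          # my prior guess and standard deviation for y
b <- m / (s^2 - m)         # 10 / 2.25 = 4.44
a <- m * b                 # 44.4
c(a, b)
```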

To see if these values make any sense, I plot my predictive density.  This density is a special case of a negative binomial density where (using the R notation)

size = a,  prob = b /(b + 1).

I graph the density using the following R commands.  One can compute P(5 <= y <= 15) = 0.89, which means that on 89% of the days, I believe Steven will send between 5 and 15 messages.

a = 44.4; b = 4.4
y = 0:25
plot(y, dnbinom(y, size = a, prob = b/(b + 1)), type = "h",
     xlab = "y", ylab = "Predictive Prob", col = "red", lwd = 2)



Categories: Priors, Single parameter

Probit Modeling

December 1, 2007

I should have put more prior modeling in my Bayesian R book. One of the obvious advantages of the Bayesian approach is the ability to incorporate prior information. I’ll illustrate the use of informative priors in a simple setting — binary regression modeling with a probit link where one has prior information about the regression vector.

At my school, many students typically take a precalculus class that prepares them to take a business calculus class. We want our students to do well (that is, get an A or B) in the calculus class and wish to understand how the student’s performance in the precalculus class and his/her ACT math score are useful in predicting the student’s success.

Here is the probit model. If y=1 and y=0 represent respectively a student doing well and not well in the calculus class, then we model the probability that y=1 by

Prob(y=1) = Phi(beta0 + beta1 (PrecalculusGrade) + beta2 (ACT))

where Phi() is the standard normal cdf, PrecalculusGrade is 1 (0) if the student gets an A (B or C) in the precalculus class, and ACT is the math ACT score.

Suppose that the regression vector beta = (beta0, beta1, beta2) is assigned a multivariate normal prior with mean vector b0 and precision matrix P. Then there is a simple Gibbs sampling algorithm for simulating from the posterior distribution of beta. The algorithm is based on the idea of augmenting the problem with latent continuous data from a normal distribution.

Although you might not understand the code, one can implement one iteration of Gibbs sampling in a few lines of R code; the final step, drawing beta from its full conditional, looks like this:

beta = t(aa) %*% array(rnorm(p), c(p, 1)) + mn
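For completeness, here is a sketch of a full iteration in this spirit (the names y, X, b0, and P are mine; the truncated normal draws use the inverse-cdf trick rather than any particular package):

```r
# One Gibbs iteration for the probit model with a N(b0, P^{-1}) prior.
# y: 0/1 responses, X: n x p design matrix, beta: current draw.
lo <- ifelse(y == 1, 0, -Inf)               # z > 0 when y = 1
hi <- ifelse(y == 1, Inf, 0)                # z < 0 when y = 0
m  <- X %*% beta
u  <- runif(length(y), pnorm(lo - m), pnorm(hi - m))
z  <- m + qnorm(u)                          # latent normals, truncated by y
V  <- solve(P + t(X) %*% X)                 # full-conditional covariance
mn <- V %*% (P %*% b0 + t(X) %*% z)         # full-conditional mean
beta <- mn + t(chol(V)) %*% rnorm(ncol(X))  # draw beta ~ N(mn, V)
```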

Anyway, I want to focus on using this model with prior information.

1. First suppose I have a “prior dataset” of 50 students. I fit this probit model with a vague prior on beta. The inputs to the function bayes.probit.prior are (1) the vector of binary responses y, (2) the covariate matrix X, and (3) the number of iterations of the Gibbs sampler.


2. I compute the posterior mean and posterior variance-covariance matrix of the simulated draws of beta. I use these values for my multivariate normal prior on beta.


(Note that I’m inputting the precision matrix P that is the inverse of the var-cov matrix.)
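In code, the idea is roughly this (a sketch assuming the draws from step 1 are stored in a matrix fit1$beta; the object names are mine):

```r
b0 <- apply(fit1$beta, 2, mean)  # prior mean = posterior mean from the prior data
P  <- solve(var(fit1$beta))      # prior precision = inverse of the var-cov matrix
prior <- list(beta = b0, P = P)  # the list passed to the fitting function
```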

3. Using this informative prior, I fit the probit model with a new sample of 100 students.


The only change in the input is that I input a list “prior” that includes the mean “beta” and the precision matrix P.

4. Now I summarize my fit to learn about the relationship of previous grade and ACT on the success of the calculus students. I define a grid of ACT scores and consider two sets of covariates corresponding to students who were not successful (0) and successful (1) in precalculus.


Then I use the function bprobit.probs to obtain posterior samples of the probability of success in calculus for each set of covariates.
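A sketch of these two steps (bprobit.probs is from LearnBayes; the ACT grid, the assumed column order of the covariates, and the object fit2 holding the informative-prior draws are my own choices):

```r
act <- seq(15, 29)                    # a grid of ACT math scores
X0  <- cbind(1, 0, act)               # intercept, precalculus indicator, ACT
X1  <- cbind(1, 1, act)               # same, for students who aced precalculus
p0  <- bprobit.probs(X0, fit2$beta)   # posterior draws of P(success) on the grid
p1  <- bprobit.probs(X1, fit2$beta)
plot(act, apply(p1, 2, mean), type = "l", ylim = c(0, 1),
     xlab = "ACT math score", ylab = "P(A or B in calculus)")
lines(act, apply(p0, 2, mean), lty = 2)
```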


I graph the posterior means of the fitted probabilities in the below graph. There are two lines — one corresponding to the students who aced the precalculus class, and another line corresponding to the students who did not ace precalculus.

Several things are clear from this graph. The performance in the precalc class matters — students who ace precalculus have a 30% higher chance of succeeding in the calculus class. On the other hand, the ACT score (that the student took during high school) has essentially no predictive ability. It is interesting that the slopes of the lines are negative, but this clearly isn’t significant.

Categories: Priors, Regression

Conditional means prior

September 23, 2007

In an earlier post, we illustrated Bayesian fitting of a logistic model using a noninformative prior. Suppose instead that we have subjective beliefs about the regression vector. A convenient way of representing these beliefs is by use of a conditional means prior. We illustrate this for our math placement example.

First, we consider the probability p1 that a student in placement level 1 receives an A in the class. Our best guess at p1 is 0.05 and this belief is worth 200 observations — we match this info to a beta(200*0.05, 200*0.95) prior. Next, we consider the probability p5 that a student in level 5 gets an A — our guess at this probability is 0.15 and this guess is worth 200 observations. We match this belief to a beta(200*0.15, 200*0.85) prior. Assuming that our beliefs about p1 and p5 are independent, the joint prior on (p1, p5) is a product of beta densities. Transforming back to the (beta0, beta1) scale, one can show that the prior on beta is given by

g(beta) = p1^a1 (1-p1)^b1 p5^a2 (1-p5)^b2

(Note that the conditional means prior translates to the same functional form as the likelihood where the beta parameters a1, b1, a2, b2 play the role of “prior data”.)
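To make the “prior data” concrete, here is a sketch of the matched parameters and a simulation from the induced prior on (beta0, beta1) (I am assuming the covariate is the placement level, taking values 1 and 5; the sample size is arbitrary):

```r
a1 <- 200 * 0.05; b1 <- 200 * 0.95   # beta(10, 190) prior on p1 (level 1)
a2 <- 200 * 0.15; b2 <- 200 * 0.85   # beta(30, 170) prior on p5 (level 5)
# simulate (p1, p5) from the independent betas ...
p1 <- rbeta(1000, a1, b1)
p5 <- rbeta(1000, a2, b2)
# ... and transform to the regression scale: logit(p) = beta0 + beta1 * level
beta1 <- (log(p5/(1 - p5)) - log(p1/(1 - p1))) / (5 - 1)   # slope
beta0 <- log(p1/(1 - p1)) - 1 * beta1                      # intercept
```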

The below figure displays (in red) a sample from the likelihood and (in blue) a sample from the prior. Here we see some conflict between the prior beliefs and the data information about the parameter.

Categories: Priors, Regression