## Home run hitting — priors?

At our department’s colloquium last week, there was an interesting presentation on the use of an exchangeable prior in a regression context.  The analysis was done in WinBUGS with some standard choices of “weakly informative” priors.  That raises the natural question: how did you decide on this prior?

Let’s return to the home run hitting example, where an exchangeable prior was placed on the set of home run hitting probabilities.  We assumed that $p_1, \ldots, p_k$ were iid $B(\eta, K)$.  The hyperparameters $\eta$ and $K$ were assumed independent: $\eta$ was assigned the prior $\eta^{-1}(1-\eta)^{-1}$, and $K$ was assigned the log-logistic form $g(K) = 1/(K+1)^2$.
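
To see where the form $g(K) = 1/(K+1)^2$ comes from, here is a short change-of-variables check. It assumes that $\theta = \log K$ has a standard logistic density (location 0, scale 1); the symbols $\theta$ and $f$ are introduced only for this sketch.

$$
f(\theta) = \frac{e^{\theta}}{(1+e^{\theta})^2}, \qquad K = e^{\theta}, \qquad \left|\frac{d\theta}{dK}\right| = \frac{1}{K},
$$

$$
g(K) = f(\log K)\,\frac{1}{K} = \frac{K}{(1+K)^2}\cdot\frac{1}{K} = \frac{1}{(K+1)^2}, \qquad K > 0.
$$

Since $\int_0^\infty (K+1)^{-2}\,dK = 1$, this is a proper density for $K$.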

When I described these priors, I just said that they were “noninformative” and didn’t explain how I obtained them.

The main questions are:

1.  Does the choice of vague prior matter?
2.  If the answer to question 1 is “yes”, what should we do?

When one fits an exchangeable model, one is shrinking the observed rates $y_i/n_i$ towards a common value.  The posterior of $\eta$ tells us about the common value and the posterior of $K$ tells us about the degree of shrinkage.  The choice of vague prior for $\eta$ is not a concern — we can assume $\eta$ is uniform or distributed according to the improper prior $\eta^{-1}(1-\eta)^{-1}$.   However, the prior on $K$ can be a concern.  If one assigns the improper form $g(K) = 1/K$, it can be shown that the posterior may be improper.  So in general one needs to assign a proper prior to $K$.  One convenient choice is to assume that $\log K$ has a logistic density with location $\mu$ and scale $\tau$.  In my example, I assumed that $\mu = 0, \tau = 1$.
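
The shrinkage described above can be sketched in a few lines of Python. This assumes that $B(\eta, K)$ denotes a beta density with mean $\eta$ and precision $K$, that is Beta$(K\eta, K(1-\eta))$; the parameterization is not spelled out in this post, so treat it as an assumption. Under that assumption, and conditional on $\eta$ and $K$, the posterior mean of $p_i$ is a weighted average of the observed rate $y_i/n_i$ and $\eta$. The function name and the numbers are hypothetical.

```python
def shrunk_rate(y, n, eta, K):
    """Posterior mean of p_i given eta and K.

    Assumes p_i ~ Beta(K*eta, K*(1-eta)) and y ~ Binomial(n, p_i),
    so the posterior mean is (y + K*eta) / (n + K).
    """
    weight_on_data = n / (n + K)  # shrinks toward eta as K grows
    return weight_on_data * (y / n) + (1.0 - weight_on_data) * eta


# Hypothetical player: 30 home runs in 500 at-bats, common value eta = 0.03
for K in (1, 100, 10000):
    print(K, round(shrunk_rate(30, 500, 0.03, K), 4))
```

A small $K$ leaves the estimate close to the observed rate, while a large $K$ pulls it almost all the way to $\eta$, which is why the posterior of $K$ controls the degree of shrinkage.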

Since my choice of parameters for this logistic prior on $\log K$ was arbitrary, I tried different choices, with values of $\mu$ between -4 and 4 and values of $\tau$ between 0.2 and 10.  What did I find?  The posterior mode of $\log K$ stayed in the 4.56 – 4.61 range for all of the priors I considered.  In this example, we have data from 605 players and clearly the likelihood is driving the inference.
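
A sensitivity check of this kind can be sketched as follows. This is not the original analysis: the data are simulated (605 synthetic players with made-up at-bat counts and a made-up true $K = 100$), $\eta$ is fixed at the pooled rate rather than integrated out, and the posterior of $\log K$ is evaluated on a grid. It again assumes that $B(\eta, K)$ means Beta$(K\eta, K(1-\eta))$. All function names are hypothetical.

```python
import math
import random


def log_logistic_prior(theta, mu, tau):
    """Log density of a logistic(mu, tau) prior evaluated at theta = log K."""
    z = abs(theta - mu) / tau  # the logistic density is symmetric about mu
    return -z - math.log(tau) - 2.0 * math.log1p(math.exp(-z))


def log_likelihood(theta, eta, data):
    """Beta-binomial log likelihood of K = exp(theta), with eta held fixed.

    The binomial coefficients are dropped because they do not depend on K.
    """
    K = math.exp(theta)
    a, b = K * eta, K * (1.0 - eta)
    const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    total = 0.0
    for y, n in data:
        total += (const + math.lgamma(a + y) + math.lgamma(b + n - y)
                  - math.lgamma(a + b + n))
    return total


def simulate(num_players=605, eta=0.03, K=100.0, seed=1):
    """Synthetic (home runs, at-bats) pairs; NOT the real data set."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_players):
        n = rng.randint(50, 600)                       # made-up at-bat counts
        p = rng.betavariate(K * eta, K * (1.0 - eta))  # player's true rate
        y = sum(rng.random() < p for _ in range(n))    # binomial draw
        data.append((y, n))
    return data


data = simulate()
eta_hat = sum(y for y, _ in data) / sum(n for _, n in data)  # plug-in for eta
grid = [2.0 + 0.02 * i for i in range(251)]                  # log K from 2 to 7
loglik = [log_likelihood(t, eta_hat, data) for t in grid]    # computed once

modes = {}
for mu in (-4, 0, 4):
    for tau in (0.2, 1, 10):
        logpost = [ll + log_logistic_prior(t, mu, tau)
                   for t, ll in zip(grid, loglik)]
        modes[(mu, tau)] = grid[logpost.index(max(logpost))]

for (mu, tau), mode in modes.items():
    print(f"mu={mu:>2}, tau={tau:>4}: posterior mode of log K = {mode:.2f}")
```

With this many players the likelihood term should dominate, so the nine modes are expected to differ only slightly. The exact values depend on the simulated data and should not be compared with the 4.56 to 4.61 range from the real data.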

What if the choice of $\mu$ and $\tau$ does matter?  Then one has to think more carefully about these parameters.  In this baseball context, I am pretty familiar with home run hitting rates.  Based on this knowledge, I would need to come up with a reasonable guess at the variability of the home run probabilities, which in turn would give me information about $K$.
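
One way to turn a guess about variability into a guess about $K$ is sketched below. It assumes once more that $B(\eta, K)$ means Beta$(K\eta, K(1-\eta))$, for which the variance of $p$ is $\eta(1-\eta)/(K+1)$; solving for $K$ gives $K = \eta(1-\eta)/\mathrm{sd}^2 - 1$. The function name and the numbers are hypothetical, not values from the analysis.

```python
import math


def implied_K(eta, sd):
    """Precision K such that Beta(K*eta, K*(1-eta)) has standard deviation sd."""
    K = eta * (1.0 - eta) / sd**2 - 1.0
    if K <= 0:
        raise ValueError("sd is too large for a beta density with this mean")
    return K


# Hypothetical belief: a typical home run rate of 0.03, with players' true
# rates varying around it with a standard deviation of about 0.015.
K_guess = implied_K(0.03, 0.015)
print(K_guess, math.log(K_guess))
```

Since the median of a logistic density is its location, $\log$ of such a guess is a natural candidate for $\mu$, and $\tau$ can then reflect how much one trusts the guess.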