
Robust modeling

The Phillies are still alive in the World Series and so I’m allowed to continue to talk about baseball.

In the situation where we observe a sample of measurement data y_1, ..., y_n, it is common to assume that they are a random sample from a normal population with mean \mu and standard deviation \sigma.  But sometimes this assumption is inappropriate.  One measure of hitting effectiveness is the OBP, the proportion of the time that a hitter gets on base.  I collected the OBPs (actually 1000 x OBP) for a group of 24 players who played for the west coast NL teams (Dodgers, Giants, and Padres) in the 2004 season.  Here’s a jittered dotplot of the OBPs:


Clearly, a normal population is inappropriate here, since we have one obvious outlier — Barry Bonds in that season had an unusually large OBP of 609.

In many situations, it makes more sense to assume that y_1, ..., y_n are distributed from a “flat-tailed” distribution where outliers are more likely to occur.  One example of a flat-tailed distribution is the t family, which depends on a location \mu, a scale \sigma, and a degrees-of-freedom parameter \nu.  Here we assume a small value for the degrees of freedom and focus on estimating (\mu, \sigma).

There is an attractive way of fitting a t sampling model.  The key fact is that a t distribution can be represented as a scale mixture of normals.  Suppose we let y be distributed normal with mean \mu and variance \sigma^2/\lambda, and then we let the parameter \lambda be distributed Gamma with shape \nu/2 and rate \nu/2.  When you integrate out \lambda, one sees this is equivalent to assuming y has a t distribution with location \mu, scale \sigma, and \nu degrees of freedom.
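One can check this mixture identity numerically.  Here is a small simulation sketch (written in Python with NumPy for illustration — not the code used in this post): draw \lambda from a Gamma(\nu/2, rate \nu/2), then draw y from N(\mu, \sigma^2/\lambda), and compare the sample quantiles to direct draws from a t distribution with \nu degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, nu = 0.0, 1.0, 4.0   # illustrative values, not the OBP fit
n = 200_000

# lambda ~ Gamma(shape = nu/2, rate = nu/2); NumPy parameterizes by scale = 1/rate
lam = rng.gamma(shape=nu / 2, scale=2.0 / nu, size=n)
# y | lambda ~ N(mu, sigma^2 / lambda)
y = rng.normal(mu, sigma / np.sqrt(lam))

# Direct draws from the t distribution with nu degrees of freedom, for comparison
t_direct = mu + sigma * rng.standard_t(nu, size=n)

qs = [0.05, 0.25, 0.5, 0.75, 0.95]
print(np.round(np.quantile(y, qs), 2))
print(np.round(np.quantile(t_direct, qs), 2))
```

The two sets of quantiles agree up to Monte Carlo error, which is the scale-mixture representation at work.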

Suppose we use this scale mixture representation for each observation.  We can write this as a hierarchical model:

1. y_i is distributed N(\mu, \sigma^2/\lambda_i)

2. \lambda_i is distributed Gamma(\nu/2, \nu/2)

3. (\mu, \sigma^2) has the prior proportional to 1/\sigma^2.

In class, we’ll show that one can sample from this model easily using Gibbs sampling.  This algorithm is implemented in the function robustt in the LearnBayes package.
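To give the flavor of the Gibbs sampler, here is a sketch in Python (the actual robustt function is in R; this is an independent illustration, and the data below are synthetic, not the real 2004 OBPs).  The full conditionals follow from the hierarchical model above: \mu given the rest is normal with a precision-weighted mean, \sigma^2 given the rest is inverse-Gamma, and each \lambda_i given the rest is Gamma((\nu+1)/2, (\nu + (y_i-\mu)^2/\sigma^2)/2).

```python
import numpy as np

def robust_t_gibbs(y, nu=4.0, n_iter=4000, seed=0):
    """Gibbs sampler for y_i ~ N(mu, sigma^2/lambda_i),
    lambda_i ~ Gamma(nu/2, rate=nu/2), prior p(mu, sigma^2) ∝ 1/sigma^2."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    mu, sig2 = y.mean(), y.var()          # crude starting values
    lam = np.ones(n)
    out = {"mu": np.empty(n_iter), "sig2": np.empty(n_iter),
           "lam": np.empty((n_iter, n))}
    for t in range(n_iter):
        # mu | rest ~ N( sum(lam*y)/sum(lam), sig2/sum(lam) )
        w = lam.sum()
        mu = rng.normal((lam * y).sum() / w, np.sqrt(sig2 / w))
        # sig2 | rest ~ Inverse-Gamma( n/2, sum(lam*(y-mu)^2)/2 )
        b = 0.5 * (lam * (y - mu) ** 2).sum()
        sig2 = b / rng.gamma(n / 2)
        # lam_i | rest ~ Gamma( (nu+1)/2, rate = (nu + (y_i-mu)^2/sig2)/2 )
        lam = rng.gamma((nu + 1) / 2, 2.0 / (nu + (y - mu) ** 2 / sig2))
        out["mu"][t], out["sig2"][t], out["lam"][t] = mu, sig2, lam
    return out

# Synthetic data loosely mimicking the OBP example: 23 ordinary values
# plus one Bonds-like outlier at 609 (NOT the real player data).
rng0 = np.random.default_rng(1)
y_obs = np.append(rng0.normal(340.0, 30.0, size=23), 609.0)
post = robust_t_gibbs(y_obs, nu=4.0)
mu_hat = post["mu"][1000:].mean()           # posterior mean of mu after burn-in
lam_outlier = post["lam"][1000:, -1].mean() # posterior mean of the outlier's lambda
print(mu_hat, lam_outlier)
```

Note that the outlier’s \lambda concentrates near zero while the estimate of \mu stays near the bulk of the data — exactly the robustness we were after.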

The posteriors of the extra scale parameters \lambda_i are of interest, since they reflect the influence of the ith observation on the estimation of the mean and variance.  Below I display error bars showing 90% interval estimates for each \lambda_i in my OBP example.  Note that the posterior for Barry Bonds’ scale parameter is concentrated near 0.  This is reasonable, since it is desirable to downweight the influence of Bonds’ OBP in the estimation of, say, the average OBP.
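One can see why Bonds’ scale parameter collapses toward zero directly from the full conditional: given (\mu, \sigma), \lambda_i is Gamma((\nu+1)/2, (\nu + z_i^2)/2), where z_i = (y_i - \mu)/\sigma is the standardized residual, so its conditional mean is (\nu+1)/(\nu + z_i^2).  A quick sketch with hypothetical round numbers (illustrative only, not the fitted posterior values):

```python
# Conditional posterior mean of lambda_i given (mu, sigma):
# lambda_i | rest ~ Gamma((nu+1)/2, rate=(nu + z_i^2)/2), z_i = (y_i - mu)/sigma,
# so E[lambda_i | mu, sigma] = (nu + 1) / (nu + z_i^2).
nu = 4.0
mu, sigma = 340.0, 35.0   # hypothetical round values for illustration

def lam_mean(y_i):
    z = (y_i - mu) / sigma
    return (nu + 1) / (nu + z ** 2)

print(lam_mean(350.0))   # typical OBP: weight near 1
print(lam_mean(609.0))   # Bonds-like outlier: weight near 0
```

A large squared residual drives the Gamma rate up and the conditional mean of \lambda_i toward zero, which is precisely the automatic downweighting of outliers.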


Categories: Hierarchical modeling