A few weeks ago, I asked my class for an example of data that would be normally distributed and someone mentioned weights of people.

Are weights, or more specifically the birthweights of newborns, normally distributed?

Peter Dunn, in a JSE article, analyzes statistics for 44 babies born in a single day in an Australian hospital. The article links to the dataset babyboom.dat.

If we construct a histogram of the birthweights, we get the following display.
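Here is a minimal sketch of how such a histogram might be produced. It assumes a local copy of babyboom.dat in the working directory, laid out as four whitespace-separated columns with no header row (time of birth, sex, birth weight, minutes after midnight). The helper read_babyboom, the file path, and the column names are my own labels and do not come from the article.

```r
# Hypothetical helper: read the dataset, ASSUMING four whitespace-separated
# columns and no header row. The column names are labels chosen here.
read_babyboom <- function(path) {
  read.table(path, header = FALSE,
             col.names = c("time", "sex", "birth.weight", "minutes"))
}

# Assumed local copy of the file; adjust the path as needed.
if (file.exists("babyboom.dat")) {
  babyboom <- read_babyboom("babyboom.dat")
  birth.weight <- babyboom$birth.weight
  hist(birth.weight, xlab = "Birth weight", main = "")
}
```

If the downloaded file is laid out differently, only the col.names vector (or the read.table call) needs to change.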

We see some left skewness in this distribution — does this indicate that our normality assumption is false?

One can check the normality assumption by means of the posterior predictive distribution. Here’s an outline of the approach. (There are special functions in the LearnBayes package that make this process simple to implement.)

1. First we fit the normal($\mu$, $\sigma$) model using the traditional noninformative prior proportional to $1/\sigma^2$. A simulated sample from the posterior is found using the function normpostsim.

library(LearnBayes)
fit=normpostsim(birth.weight,m=10000)

2. Next we think of an appropriate testing function. Since the birth weights don’t appear symmetric, a natural checking function is the sample skewness. We write a short function “skew” to compute this statistic.

skew=function(y){
  sum((y-mean(y))^3)/sd(y)^3/(length(y)-1)
}

3. Last, we simulate draws from the posterior predictive distribution of the sample skewness. This is implemented using the function normpostpred. Its arguments are the list of simulated draws from the posterior (obtained by normpostsim), the size of the future sample (we’ll use the same sample size as the observed sample), and the testing function. The vector of simulated values of the testing function is stored in SKEW.

SKEW=normpostpred(fit,length(birth.weight),skew)

4. I plot a histogram of the predictive distribution of the skew statistic and overlay a vertical line showing the skew of the observed data.

hist(SKEW); abline(v=skew(birth.weight),col="red",lwd=3)
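For readers who want to see what is going on underneath, steps 1 to 3 can be sketched in base R. This is only an illustration, assuming the noninformative prior is the usual one proportional to 1/sigma^2, under which sigma^2 given the data is S divided by a chi-square variate with n-1 degrees of freedom, and mu given sigma^2 is normal with mean ybar and variance sigma^2/n. The names sim_normal_posterior and sim_post_pred are hypothetical, and the data vector y is a simulated placeholder, not the birth weights.

```r
# Base R sketch of steps 1 to 3. The function names sim_normal_posterior and
# sim_post_pred are hypothetical; they are not part of LearnBayes.

# Step 1: posterior draws, ASSUMING the prior proportional to 1/sigma^2.
sim_normal_posterior <- function(y, m = 1000) {
  n <- length(y)
  S <- sum((y - mean(y))^2)
  sigma2 <- S / rchisq(m, n - 1)                          # sigma2 | y
  mu <- rnorm(m, mean = mean(y), sd = sqrt(sigma2 / n))   # mu | sigma2, y
  list(mu = mu, sigma2 = sigma2)
}

# Step 2: the checking function (sample skewness).
skew <- function(y) sum((y - mean(y))^3) / sd(y)^3 / (length(y) - 1)

# Step 3: one replicated sample of size n per posterior draw, reduced by f.
sim_post_pred <- function(draws, n, f) {
  mapply(function(mu, sigma2) f(rnorm(n, mean = mu, sd = sqrt(sigma2))),
         draws$mu, draws$sigma2)
}

set.seed(1)
y <- rnorm(44, mean = 3300, sd = 500)  # placeholder data, NOT the birth weights
draws <- sim_normal_posterior(y, m = 5000)
skew_rep <- sim_post_pred(draws, length(y), skew)
summary(skew_rep)
```

Because the placeholder data really are normal, the replicated skewness values here should scatter around zero. With the actual birth weights in place of y, skew_rep plays the role of SKEW.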

Clearly the skew in the data is not representative of the skew values predicted from the fitted model. The normal population assumption appears inappropriate.
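One way to put a number on this visual comparison is a predictive tail probability: the fraction of simulated skewness values that fall at or below the observed one. The sketch below is my own addition. The helper name left_tail_prob is hypothetical, and the inputs are placeholders standing in for SKEW and skew(birth.weight).

```r
# Hypothetical helper: fraction of simulated statistics at or below the
# observed value (a left-tail predictive probability).
left_tail_prob <- function(sim, observed) mean(sim <= observed)

set.seed(3)
sim <- rnorm(10000, mean = 0, sd = 0.35)  # placeholder for the SKEW vector
left_tail_prob(sim, -1)                   # -1 is a placeholder observed skew
```

With the objects from the steps above, the corresponding call would be left_tail_prob(SKEW, skew(birth.weight)). A value close to zero would support the same conclusion as the histogram.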