Archive for the ‘Model checking & comparison’ Category

Three Sampling Models for Fire Calls

October 31, 2011 Leave a comment

Continuing our example, suppose our prior beliefs about the mean count of fire calls \theta is Gamma(280, 4).  (Essentially this says that our prior guess at \theta is 70 and the prior standard deviation is about 4.2.)  But we’re unsure about the sampling model — it could be (M1) exponential(\theta), (M2) Poisson(\theta), or (M3) normal with mean \theta and standard deviation 12.

To get some sense about the best sampling model, I’ve plotted a histogram of the fire call counts below.  I’ve overlaid fitted exponential, Poisson, and normal distributions where I estimate \theta by the sample mean.

I think it is pretty clear from the graph that the exponential model is a poor fit.  The fits of the Poisson and normal (sd = 12) models are similar, although I’d give a slight preference to the normal model.

For each model, I compute the logarithm of the predictive probability f(y).   (I wrote a short function defining the log posterior of \theta and then used the laplace function in the LearnBayes package to compute \log f(y).)

Here are the results:

Model     \log f(y)
Poisson              -148.0368
exponential        -194.3483
normal(12)        -144.3027

The exponential model is a clear loser and the Poisson and normal(12) models are close.  The Bayes factor in support of the normal(12) model is

BF_{NP} = \exp( -144.3037 + 148.0368 ) = 41.8



Modeling fire alarm counts

October 30, 2011 Leave a comment

Sorry for the long delay since my last post.  It seems that one needs to keep a regular posting pattern, say several days a week. to keep this blog active.

We’re now talking about comparing models by Bayes factors.  To motivate the discussion of plausible models, I found the following website that gives the number of fire calls for each month in Franklinville, NC for the last several years.

Suppose we observe the fire call counts y_1, ..., y_N for N consecutive months.  Here is a general model for these data.

  1. y_1, ..., y_N are independent f(y | \theta)
  2. \theta has a prior g(\theta)
Also, suppose we have some prior beliefs about the mean fire count E(y).  We believe that this mean is about 70 and the standard deviation of this guess is 10.
Given this general model structure, we have to think of possible choices for f, the sampling density.  We think of the popular distributions, say Poisson, normal, exponential, etc.  Also we should think about different choices for the prior density.   For the prior, there are many possible choices — we typically choose one that can represent my prior information.
Once we decide on several plausible choices of sampling density and prior, then we’ll compare the models by Bayes factors.  To do this, we compute the prior predictive density of the actual data for each possible model.  It is very convenient to perform this calculation for discrete models (where the parameter \theta is discrete) and we’ll first illustrate Bayes factor computations in this special case

Are birth weights normal?

November 16, 2009 Leave a comment

A few weeks ago, I asked my class for an example of data that would be normally distributed and someone mentioned weights of people.

Are weights, more specifically birthweights of newborns normally distributed?

Peter Dunn in a JSE article here analyzes statistics for 44 babies born in a single day in an Australian hospital.  The article has a link to the dataset babyboom.dat that contains the data.

If we construct a histogram of the birthweights, we get the following display.

babyWe see some left skewness in this distribution — does this indicate that our normality assumption is false?

One can check the normality assumption by means of the posterior predictive distribution.  Here’s an outline of the approach.  (There are special functions in the LearnBayes package that make this process simple to implement.)

1.  First we fit the normal(\mu, \sigma^2)) model using the traditional noninformative prior proportional to 1/\sigma^2.  A simulated sample from the posterior is found using the function normpostsim.


2.  Next we think of an appropriate testing function.  Since the birth weights don’t appear symmetric, a natural checking function is the sample skewness.  We write a short function “skew” to compute this statistic.

3.  Last, we simulate draws from the posterior predictive distribution of the sample skewness.  This is implemented using the function normpostpred — the arguments are the list of simulated draws of the posterior (obtained by normpostsim), the size of the future sample (we’ll use the same sample size as that for the observed sample), and the function defining the testing function.  The vector of simulated draws from the testing function are stored in SKEW.
4.  I plot a histogram of the predictive distribution of the skew statistic and overlay a vertical line showing the skew of the observed data.
hist(SKEW); abline(v=skew(birth.weight),col=”red”,lwd=3)
Clearly the skew in the data is not representative of the skew values predicted from the fitted model.   The normal population assumption appears inappropriate.

A “robustness” illustration of Bayes factors

November 16, 2009 Leave a comment

Here is another illustration of Bayes factors.   We are worried about the possibility of outliers and so we assume the observations y_1, ..., y_n are distributed from a t distribution with location \mu, scale \sigma, and degrees of freedom df.  We have some prior beliefs about the location and scale:  we assume \mu, \sigma independent with \mu \sim N(100, 10) and \log \sigma \sim N(0, 2).

We observe the following data:

We define a model M_{df} that defines the degrees of freedom parameter to be a specific value.
We can use Bayes factors to decide on appropriate values of df.
I define a function tlikepost defining the log posterior of \theta = (\mu, \log \sigma).  Here the argument stuff is a list with two components, the data vector y and the degrees of freedom value df.  Here the sampling density of y_i is given by
f(y_i) = \frac{1}{\sigma} dt\left(\frac{y_i-\mu}{\sigma}, df\right)
loglike=sum(dt((y-mu)/sigma, df, log=TRUE)-log(sigma))
logprior=dnorm(mu,100,10,log=TRUE)+dnorm(log(sigma),0,2, log=TRUE)
By using the function laplace with the “int” component, we can approximate the log of the marginal density of y.   To illustrate, suppose we wish to compute \log f(y) for a model that assumes that we have 30 degrees of freedom.
> laplace(tlikepost,c(0,0),list(y=y.out,df=30))$int
[1] -89.23436
So for the model M_{30}, we have log f(y) = -89.23.  To compare this model with, say the model M_4, we repeat this using df=4, compare the log marginal density, and then we can compare the two models by a Bayes factor.
Here are some things to try.
1.  With this data set, try comparing different values for the degrees of freedom parameter.  What you should find is that large df values are supported — this indicates that a normal sampling model is appropriate.
2.  Now change the dataset by introducing an outlier and repeat the analysis, comparing different M_{df} models.  What you should find that a t model with small degrees of freedom is supported with this new dataset.

A Bayesian test of the equality of proportions

November 15, 2009 Leave a comment

One problem with a p-value is that it is hard to interpret if you don’t know the sample size.  Andy Gelman talks about this problem on his blog.

Here I’ll contrast the message communicated by a p-value and a Bayesian test by looking at the following three tables:

Table A:  [32 18; 18 32]
Table B: [272 228; 228,272]
Table C: [2570 2430; 2430,2570]

Each of the tables has a Pearson chi-square value in the (7.4, 7.8) range and so each has (approximately) the same p-value.  But, following Gelman’s reasoning, the small p-value for the large sample size in Table C doesn’t have much significance and and the same p-value for Table A (small sample size) is more impressive.
Here is a reasonable Bayesian test of the equality of two proportions.  First, we define a class of priors indexed by the parameter K:
1.  p_1, p_2 are iid beta(K\eta, K(1-\eta))
2.  \eta has a uniform density
Testing H: p_1 = p_2, A: p_1 \neq p_2 is equivalent to testing H: K = \infty against K = 1 (I’ll talk more about this in class.)
Chapter 8 in BCWR gives an expression for the Bayes factor in support of A over H.   In the LearnBayes package, suppose the data is a matrix of two columns where the first column contains the counts of successes of the two samples and the second column contains the sample sizes.  Then the Bayes factor (on the log scale) is computed by the single line
Let’s illustrate using this for the three tables:
> laplace(bfexch,0,list(data=cbind(c(32,18),c(50,50)),K=1))$int
[1] 0.9436246
> laplace(bfexch,0,list(data=cbind(c(272,228),c(500,500)),K=1))$int
[1] -0.3035413
> laplace(bfexch,0,list(data=cbind(c(2570,2430),c(5000,5000)),K=1))$int
[1] -1.411366
So the Bayes factor against equality of proportions is exp(.94) = 2.56 for Table A, exp(-0.30) = 0.74 for Table B, and exp(-1.41) = 0.24 for Table C.  In contrast to the message communicated by a p-value, the Bayes factor calculation indicates that Table A is the most significant table.

Bayesian testing of a point null hypothesis

November 13, 2009 Leave a comment

Here is today’s example illustrating a Bayesian test of a point null hypothesis.  In Agresti and Franklin’s intro stats book, there is an example wondering if Americans work, on average, 40 hours per week.   If \mu represents the mean hours worked per week for all Americans, we are interested in testing H: \mu = 40 against the alternative A: \mu \neq 40.

In the example in Agresti and Franklin’s text, a sample of 868 workers were selected and one observes that the workers worked an average of 39.11 hours and one assumes the population standard deviation is equal to \sigma = 14.  The test statistic is

z = \frac{\sqrt{868}(\bar y - 40)}{14} = -1.87

and one computes a p-value of 0.061.  There seems to be moderately significant evidence that the mean working hours for all workers is different from 40.

Here’s an outline of a Bayesian test (I give more details in class):

1.  We assign probabilities to the hypotheses H and A.  Here we assume each hypothesis is equally likely.

2.  Under the hypothesis H, the prior for \mu is a point mass at the hypotheized value of 40.

3.  Under the alternative hypothesis, we’ll place a N(40, \tau) prior on \mu.  This reflects the belief that, under the alternative, values of \mu near 40 are more likely that values far from 40.

Under these assumptions, we give (in class) the posterior probability of the hypothesis H. This probability is programmed in the function mnormt.twosided in the LearnBayes package.  The inputs to this function are (1) the value to be tested, (2) the prior probability of H, (3) a vector of values of \tau and (4) a data vector consisting of \bar y, n, \sigma.  The outputs are the Bayes factor in support of H and the posterior probability of H.

Since it is not clear how to specify the prior standard deviation \tau — we compute the posterior probability for a range of values of \tau.  I write a short function to output the posterior probability as a function of \tau and then I use the curve function to plot this probability for values of \tau from 0.01 to 2.

What we see is that the posterior probability of \mu = 40 exceeds 0.35 for all values of \tau.  This suggests that the p-value overstates the evidence against the null hypothesis.

> data=c(39.11, 868, 14)
> prob.H=function(tau)
+   mnormt.twosided(40,.5,tau,data)$post

> curve(prob.H(x),from=.01,to=2,xlab=”TAU”,ylab=”P(H)”)testing



















Comparing sampling models by Bayes factors

November 29, 2007 Leave a comment

In the last posting, I illustrated fitting a t(4) sampling model to some baseball data where I suspected there was an outlier. To compare this model to other sampling models, we can use Bayes factors.

Here is a t sampling model with a convenient choice of prior. a random sample from
2. and are independent, with distributed and distributed

I define a R function tsampling.R that computes the logarithm of the posterior. Note that I am careful to include all of the normalizing constants, since I am primarily interested in computing the marginal density of .

mu=theta[1]; logsigma=theta[2]; sigma=exp(logsigma)

y=datapar$y; df=datapar$df



We use this function together with the function laplace to compute the log marginal density for two models.

Model 1 — t(4) sampling, is normal(90, 20), is normal(1, 1).

Model 2 — t(30) sampling, is normal(90, 20), is normal(1, 1).

Note that I’m using the same prior for both models — the only difference is the choice of degrees of freedom in the sampling density.

dataprior1=list(y=y, df=4, mu.mean=90,,
log.m1=laplace(tsampling,c(80, 3), dataprior1)$int

dataprior2=list(y=y, df=30, mu.mean=90,,
log.m2=laplace(tsampling,c(80, 3), dataprior2)$int

[1] 14.8463

We see that there is substantial support for the t(4) model over the “close to normal” t(30) model.