
## A “robustness” illustration of Bayes factors

Here is another illustration of Bayes factors. We are worried about the possibility of outliers, so we assume the observations $y_1, \ldots, y_n$ follow a t distribution with location $\mu$, scale $\sigma$, and degrees of freedom $df$. We have some prior beliefs about the location and scale: we assume $\mu$ and $\sigma$ are independent, with $\mu \sim N(100, 10)$ and $\log \sigma \sim N(0, 2)$. The second argument of each normal prior is a standard deviation, matching the dnorm calls in the code below.

We observe the following data:

y=c(114.1,113.1,103.5,89.4,94.7,110.6,103.2,123.0,112.3,110.9,76.8,
112.6,86.0,99.1,106.3,101.5,80.5,99.6,108.6,95.9)
For each value of $df$, let $M_{df}$ denote the model that fixes the degrees of freedom parameter at that value.
We can use Bayes factors to decide on appropriate values of $df$.
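For reference, the quantities being compared can be written out. This is only a restatement of the setup above; the symbol $g$ for the prior density and the labels $a$, $b$ for two candidate values of $df$ are notation assumed here, not taken from the original post. The marginal density of the data under $M_{df}$ and the Bayes factor in support of $M_a$ over $M_b$ are

$$
f(y \mid M_{df}) = \int\!\!\int \prod_{i=1}^{n} \frac{1}{\sigma}\, dt\!\left(\frac{y_i-\mu}{\sigma}, df\right) g(\mu, \log\sigma)\, d\mu \, d\log\sigma,
\qquad
BF_{ab} = \frac{f(y \mid M_a)}{f(y \mid M_b)} .
$$

On the log scale, $\log BF_{ab} = \log f(y \mid M_a) - \log f(y \mid M_b)$, so a positive value supports $M_a$.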
I define a function tlikepost that computes the log posterior of $\theta = (\mu, \log \sigma)$. The argument stuff is a list with two components: the data vector y and the degrees of freedom value df. The sampling density of $y_i$ is given by
$f(y_i) = \frac{1}{\sigma} dt\left(\frac{y_i-\mu}{\sigma}, df\right)$,
where $dt(\cdot, df)$ is the standard t density with $df$ degrees of freedom (the R function dt).
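tlikepost=function(theta,stuff)
{
  # theta = (mu, log sigma); stuff = list(y = data, df = degrees of freedom)
  mu=theta[1]
  sigma=exp(theta[2])
  y=stuff$y
  df=stuff$df
  # log-likelihood of the location-scale t model
  loglike=sum(dt((y-mu)/sigma, df, log=TRUE)-log(sigma))
  # independent normal priors on mu and log sigma
  logprior=dnorm(mu,100,10,log=TRUE)+dnorm(log(sigma),0,2, log=TRUE)
  loglike+logprior
}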
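As a quick sanity check on the sampling density used inside tlikepost, the following sketch verifies numerically that the location-scale form integrates to one and that the log-scale expression in the code agrees with the direct formula. The parameter values and the names scaled_t and y0 are hypothetical, chosen only for illustration.

```r
# Hypothetical parameter values, used only for this check.
mu <- 2
sigma <- 3
df <- 4

# Location-scale t density: (1/sigma) * dt((y - mu)/sigma, df)
scaled_t <- function(y) dt((y - mu) / sigma, df) / sigma

# A proper density should integrate to (approximately) 1.
total <- integrate(scaled_t, -Inf, Inf)$value

# The log-scale form used in tlikepost, evaluated at a few arbitrary points.
y0 <- c(-1, 2, 7.5)
log_direct <- log(scaled_t(y0))
log_stable <- dt((y0 - mu) / sigma, df, log = TRUE) - log(sigma)

print(total)
print(cbind(log_direct, log_stable))
```

Working with log=TRUE, as tlikepost does, avoids underflow when an observation sits far in the tail.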
The function laplace (from the LearnBayes package) returns a list whose “int” component approximates the log of the marginal density of $y$. To illustrate, suppose we wish to compute $\log f(y)$ for a model that assumes 30 degrees of freedom.
> laplace(tlikepost,c(0,0),list(y=y.out,df=30))$int
[1] -89.23436
So for the model $M_{30}$, we have $\log f(y) = -89.23$. To compare this model with, say, the model $M_4$, we repeat the computation using $df=4$. The difference of the two log marginal densities is the log Bayes factor, and exponentiating it gives the Bayes factor comparing the two models.
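The comparison of $M_{30}$ with $M_4$ can be sketched as follows. To keep the example runnable without extra packages, it swaps in a small base-R Laplace approximation built on optim (the helper name log_marginal is hypothetical) instead of calling laplace. It also assumes that the garbled data entry is 110.9 followed by 76.8, and it starts the optimizer at the sample median and log MAD, which is a choice made here and not the starting value used in the post.

```r
# Data from the post; the garbled entry is assumed to be 110.9, 76.8.
y <- c(114.1, 113.1, 103.5, 89.4, 94.7, 110.6, 103.2, 123.0, 112.3, 110.9, 76.8,
       112.6, 86.0, 99.1, 106.3, 101.5, 80.5, 99.6, 108.6, 95.9)

# Log posterior of theta = (mu, log sigma), as defined in the post.
tlikepost <- function(theta, stuff) {
  mu <- theta[1]
  sigma <- exp(theta[2])
  loglike <- sum(dt((stuff$y - mu) / sigma, stuff$df, log = TRUE) - log(sigma))
  logprior <- dnorm(mu, 100, 10, log = TRUE) + dnorm(log(sigma), 0, 2, log = TRUE)
  loglike + logprior
}

# Hypothetical helper: Laplace approximation to the log marginal density,
#   log f(y) ~ logpost(mode) + (p/2) log(2 pi) + (1/2) log det(V),
# where V is the inverse of the negative Hessian at the posterior mode.
log_marginal <- function(logpost, start, stuff) {
  fit <- optim(start, logpost, stuff = stuff, method = "BFGS",
               control = list(fnscale = -1), hessian = TRUE)
  p <- length(start)
  # log det(V) = -log det(-Hessian)
  log_det_prec <- as.numeric(determinant(-fit$hessian, logarithm = TRUE)$modulus)
  fit$value + (p / 2) * log(2 * pi) - 0.5 * log_det_prec
}

start <- c(median(y), log(mad(y)))  # starting value chosen for this sketch
log_f30 <- log_marginal(tlikepost, start, list(y = y, df = 30))
log_f4  <- log_marginal(tlikepost, start, list(y = y, df = 4))

log_bf <- log_f30 - log_f4  # log Bayes factor in support of M_30 over M_4
bf <- exp(log_bf)
print(c(log_f30 = log_f30, log_f4 = log_f4, log_bf = log_bf, bf = bf))
```

A Bayes factor above 1 supports $M_{30}$, and one below 1 supports $M_4$. The numbers need not match the output shown above, since that call uses the object y.out.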
Here are some things to try.
1.  With this data set, try comparing different values for the degrees of freedom parameter. You should find that large $df$ values are supported, which indicates that a normal sampling model is appropriate.
2.  Now change the dataset by introducing an outlier and repeat the analysis, comparing different $M_{df}$ models. You should find that a t model with small degrees of freedom is supported with this new dataset.
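The two exercises above could be set up along the following lines. This is a sketch under stated assumptions: the grid of $df$ values, the appended outlier of 200, and the helper names log_marginal and log_marginals are all hypothetical choices made here; the garbled data entry is assumed to be 110.9, 76.8; and a base-R Laplace approximation via optim stands in for laplace.

```r
# Data from the post; the garbled entry is assumed to be 110.9, 76.8.
y <- c(114.1, 113.1, 103.5, 89.4, 94.7, 110.6, 103.2, 123.0, 112.3, 110.9, 76.8,
       112.6, 86.0, 99.1, 106.3, 101.5, 80.5, 99.6, 108.6, 95.9)

# Log posterior of theta = (mu, log sigma), as defined in the post.
tlikepost <- function(theta, stuff) {
  mu <- theta[1]
  sigma <- exp(theta[2])
  loglike <- sum(dt((stuff$y - mu) / sigma, stuff$df, log = TRUE) - log(sigma))
  logprior <- dnorm(mu, 100, 10, log = TRUE) + dnorm(log(sigma), 0, 2, log = TRUE)
  loglike + logprior
}

# Hypothetical helper: base-R Laplace approximation to log f(y).
log_marginal <- function(logpost, start, stuff) {
  fit <- optim(start, logpost, stuff = stuff, method = "BFGS",
               control = list(fnscale = -1), hessian = TRUE)
  log_det_prec <- as.numeric(determinant(-fit$hessian, logarithm = TRUE)$modulus)
  fit$value + (length(start) / 2) * log(2 * pi) - 0.5 * log_det_prec
}

# Hypothetical grid of degrees-of-freedom values to compare.
df_grid <- c(2, 4, 8, 16, 30, 100)

# Log marginal density of a data vector under each M_df in the grid.
log_marginals <- function(y) {
  start <- c(median(y), log(mad(y)))  # robust starting value, chosen here
  sapply(df_grid, function(df) {
    log_marginal(tlikepost, start, list(y = y, df = df))
  })
}

# Exercise 2: append one hypothetical outlying observation.
y_outlier <- c(y, 200)

result <- data.frame(df = df_grid,
                     original = log_marginals(y),
                     with_outlier = log_marginals(y_outlier))

# Log Bayes factors of each M_df against the near-normal model M_100.
result$log_bf_original <- result$original - result$original[result$df == 100]
result$log_bf_outlier  <- result$with_outlier - result$with_outlier[result$df == 100]
print(result)
```

Within each column, the row with the largest log marginal density is the $df$ value best supported by that dataset; positive entries in the log_bf columns favor that $M_{df}$ over $M_{100}$.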