## Bayesian software

A lot has happened in, say, the last 10 years with respect to Bayesian software. This has been a controversial subject and it would be worthwhile to talk about some of the main issues.

1. First, if one was going to design a Bayesian software package, what would it look like? One advantage of the Bayesian paradigm is its flexibility in defining models, doing inference, and checking and comparing models. So any software should allow for the input of “general” models including priors, a variety of methods for fitting the models, and also a variety of methods for doing inference (say, find a marginal posterior for a function of parameters of interest) and checking the validity of the model.

2. Of course, the most popular Bayesian software program is Bugs that includes all of the derivatives of Bugs including WinBugs and OpenBugs. It allows for general model specifications by writing a “model script”, it has a general MCMC computing engine that works for many problems, and it allows for general inference and model checking.

3. Ok, we should all use bugs for Bayesian computing? Actually, I purposely don’t use bugs in my Bayesian class and instead use my package LearnBayes in the R system. Why? Well, although bugs is pretty easy to use, it is sort of a black box where one can use it without understanding the issues in MCMC computing and diagnostics. I want my students to understand the basic MCMC algorithms like Gibbs and Metropolis sampling and get some experience in implementing these algorithms to understand the pitfalls. I would feel more comfortable teaching bugs after the student has had some practice with MCMC, especially for examples where MCMC hasn’t converged or has mixing problems.

4. Another approach is to program MCMC algorithms for specific Bayesian models. This approach is taken using the R package MCMCpack. For example, suppose I want to do a Bayesian linear regression using a normal prior on the regression vector and a inverse gamma prior on the variance. Then there is a function in MCMCpack that will work fine, implement the Gibbs sampling, and give you a matrix of simulated draws and also the prior predictive density value that can be used in model comparison. But suppose I want to use a t prior instead of normal for beta — then I’m stuck. These specific algorithms are useful if you want to fit “standard” models, but we lose the flexibility that is one of the advantages of Bayes.

5. Of course, as the programs become more flexible, it takes a more experienced Bayesian who can actually run the programs. If we wish to introduce Bayes to the masses, maybe we need to provide a suite of canned programs.

It will be interesting to see how Bayesian software will evolve. It is pretty clear that bugs will be a major player in the future, perhaps with a new interface.

## Illustration of using hpd function

I promised my Bayesian students that I’d illustrate the hpd function in the TeachingDemos package.

Suppose I’m interested in estimating p, the proportion of undergraduate BGSU students whose home is 200 miles or further from Bowling Green. Initially, I place a uniform prior on p. I collect data from 16 students — only 1 lives 200+ miles from BG. My posterior for p is beta(2, 16).

I’ll assume that you have already installed the TeachingDemos package in R. Then I’ll simulate 10,000 draws from the beta(2, 16) distribution and use the function emp.hpd to find a 90% interval estimate (you use hpd if you actually have the quantile function of distribution available).

> library(TeachingDemos)

> p=rbeta(10000,2,16)

> emp.hpd(p,conf=0.90)

[1] 0.005806384 0.211324962

You could also find an equal-tails interval estimate:

> qbeta(c(.05,.95),2,16)

[1] 0.02131763 0.25012445

But the length of the equal-tails interval is 0.229 which is substantially larger than the length of the HPD interval (0.206). There will be a difference between the two intervals when the posterior density is asymmetric as in this situation.

## A Poisson Change-Point Model

In Chapter 11 of BCWR, I describe an analysis of a famous dataset, the counts of British coal mining disasters described in Carlin et al (1992). We observe the number of disasters for year , where actual year – 1850. We assume for early years (), has a Poisson distribution with mean , and for the later years, is Poisson(). Suppose we place vague priors on . (Specifically, we’ll put a common gamma(c0, d0) prior on each Poisson mean.)

This model can be fit by the use of Gibbs sampling through the introduction of latent data. For each year, one introduces a state where or 2 if is Poisson() or Poisson(). Then one implements Gibbs sampling on the vector , …, ).

In Chapter 11, I illustrate the use of WinBUGS and the R interface to simulate from this model. MCMCpack also offers a R function MCMCpoissonChangepoint to fit from this model which I’ll illustrate here.

First we load in the MCMC package.

library(MCMCpack)

We load in the disaster numbers in a vector data.

data=c(4,5,4,1,0,4,3,4,0,6,

3,3,4,0,2,6,3,3,5,4,5,3,1,4,4,1,5,5,3,4,2,5,2,2,3,4,2,1,3,2,

1,1,1,1,1,3,0,0,1,0,1,1,0,0,3,1,0,3,2,2,

0,1,1,1,0,1,0,1,0,0,0,2,1,0,0,0,1,1,0,2,

2,3,1,1,2,1,1,1,1,2,4,2,0,0,0,1,4,0,0,0,

1,0,0,0,0,0,1,0,0,1,0,0)

Suppose we decide to assign gamma(c0, d0) priors on each Poisson mean where c0=1 and d0=1. Then we fit this changepoint model simply by running the function MCMCpoissonChangepoint:

fit=MCMCpoissonChangepoint(data, m = 1, c0 = 1, d0 = 1,

burnin = 10000, mcmc = 10000)

I have included the important arguments: data is obviously the vector of counts, m is the number of unknown changepoints (here m = 1), c0, d0 are the gamma prior parameters, we choose to have a burnin period of 10,000 iterations, and then collect the following 10,000 iterations.

MCMCpack includes several graphical and numerical summaries of the MCMC output: plot(fit), summary(fit), plotState(fit), and plotChangepoint(fit).

plot(fit) shows trace plots and density estimates for the two Poisson means.

summary(fit) gives you summary statistics, including suitable standard errors, for each Poisson mean

Iterations = 10001:20000

Thinning interval = 1

Number of chains = 1

Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,

plus standard error of the mean:

Mean SD Naive SE Time-series SE

lambda.1 3.0799 0.2870 0.002870 0.002804

lambda.2 0.8935 0.1130 0.001130 0.001170

2. Quantiles for each variable:

2.5% 25% 50% 75% 97.5%

lambda.1 2.5411 2.8861 3.0660 3.271 3.667

lambda.2 0.6853 0.8153 0.8895 0.966 1.130

plotState(fit) – this shows the probability that the process falls in each of the two states for all years

plotChangepoint(fit) — this displays the posterior distribution of the changepoint location.

This analysis agrees with the analysis of this problem using WinBUGS described in Chapter 11 of BCWR.

## Probit modeling via MCMCpack

I thought briefly about doing a survey of Bayesian R packages in my book. I’m sure a comparative survey would be helpful to many users, but it is difficult to cover all of the packages in any depth in a 30 page chapter. Also, since packages are evolving so fast, much of what I could say would quickly be out of date.

One package that looks attractive is the MCMCpack package written by Andrew Martin and Kevin Quinn. They provide MCMC algorithms for many popular statistical models and it seems, at first glance, easy to use.

Since I just demonstrated the use of Gibbs sampling for a probit model with a normal prior, let’s fit this model by MCMCpack.

The appropriate R function to use is MCMCprobit which uses the same Albert-Chib sampling algorithm– in it’s most basic form, the function looks like

fit = MCMCprobit(model, data, burnin, mcmc, thin, b0, B0)

Here

fit: is a description of the probit model, written as any R model like lm.

data: is the data frame that is used

burnin: is the number of iterations for the burnin period

mcmc: is the number of Gibbs iterations

thin: is the thinning interval

b0: is the prior mean of the multivariate prior

B0: is the prior precision matrix

For my model, here is the syntax:

fit=MCMCprobit(success~prev.success+act, data=as.data.frame(DATA), burnin=0,

mcmc=10000, thin=1, b0=prior$beta, B0=prior$P)

After it is run, one can get summaries of the simulated draws of beta by the summary command.

summary(fit)

Iterations = 1:10000

Thinning interval = 1

Number of chains = 1

Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,

plus standard error of the mean:

Mean SD Naive SE Time-series SE

(Intercept) -1.53215 0.75595 0.0075595 0.0110739

prev.success 1.03590 0.24887 0.0024887 0.0038060

act 0.05093 0.03700 0.0003700 0.0005172

2. Quantiles for each variable:

2.5% 25% 50% 75% 97.5%

(Intercept) -3.03647 -2.03970 -1.53613 -1.02398 -0.04206

prev.success 0.54531 0.86842 1.03246 1.20129 1.52685

act -0.02117 0.02567 0.05132 0.07559 0.12451

Also, one can get trace plots and density estimates by the plot command.

plot(command)

What do I think about this particular function in MCMCpack?

1. The execution time for the MCMC is much faster using MCMCprobit since the sampling is done using compiled C++ code. How much faster? For 10,000 iterations of Gibbs sampling, it took my laptop 0.58 seconds to do this sampling in MCMCpack compared with 4.53 seconds using my R function.

2. MCMCprobit allows for more user input such as the burnin period, thinning rate, starting values, random number seed, etc.

3. It allows one to output latent residuals (Albert and Chib, Biometrika) and compute marginal likelihoods by the Laplace method.

Generally, the function worked fine and I got essentially the same results as I had before. My only quibble is that it took two tries for MCMCprobit to run. It complained that my prior precision matrix was not symmetric, although I computed this matrix by the var command in R. There was a quick fix — I rounded the values of this matrix to two decimal places and MCMCprobit didn’t complain.

## Illustration of a gui for the Bayesian triplot

Richard Gonzalez found my triplot.R function and enhanced it by adding a graphical user interface. By the use of sliders, one can change the beta parameters a and b, the data values s and f, and see the effect of the changes on the triplot (prior, likelihood, and posterior) and the predictive distribution.

Here are a couple of illustrations.

In the first example, the prior is beta(3, 7) and one observes 6 successes and 14 failures. The prior information is consistent with the data information and the observed number of successes is in the middle of the predictive distribution.

In the second example, the beta(6, 2) prior is in conflict with the data (s=6, f=14) and the observed number of successes is in the tail of the predictive distribution.