Archive

Archive for the ‘MCMC’ Category

Learning about exponential parameters, part II

In the previous post, I described using a random walk Metropolis algorithm to sample from the posterior of $(\mu, \beta)$, where $\mu$ is the location and $\beta$ is the rate parameter of the exponential distribution.  But since the MCMC algorithm takes steps generated from a multivariate normal distribution, it should perform better when the parameters are real-valued.

So it makes sense to transform each parameter into one that is real-valued.  We let

$\theta_1 = \log (\min y - \mu), \theta_2 = \log(\beta)$.

When we transform $(\mu, \beta)$ to $(\theta_1, \theta_2)$, we need to include a Jacobian to write down the posterior for $(\theta_1, \theta_2)$.  It can be shown that the Jacobian is equal to $J = \exp(\theta_1+\theta_2)$ and the new posterior is

$g_{new}(\theta_1, \theta_2 | y) = g(\mu, \beta|y) |J|$
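The Jacobian is quick to verify from the inverse transformation $\mu = \min y - \exp(\theta_1)$, $\beta = \exp(\theta_2)$, whose matrix of partial derivatives is diagonal:

$|J| = \left| \det \begin{pmatrix} \partial\mu/\partial\theta_1 & \partial\mu/\partial\theta_2 \\ \partial\beta/\partial\theta_1 & \partial\beta/\partial\theta_2 \end{pmatrix} \right| = \left| \det \begin{pmatrix} -\exp(\theta_1) & 0 \\ 0 & \exp(\theta_2) \end{pmatrix} \right| = \exp(\theta_1 + \theta_2).$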

Here is the function for computing the log posterior of $(\theta_1, \theta_2)$ and a contour plot of the new parameters.  Note that the shape of the posterior is close to normal, suggesting that one can use the random walk Metropolis algorithm effectively with the new posterior.

logpost2=function(theta,y)
{
mu=min(y)-exp(theta[1])
beta=exp(theta[2])
sum(dexp(y-mu,beta,log=TRUE))+sum(theta)
}
fit=laplace(logpost2,c(0,0),y)
mycontour(logpost2,c(-10,1,-2.2,0),y,
xlab="LOG(MIN Y - MU)",ylab="LOG BETA")
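A quick way to see the payoff of the transformation is a hand-rolled random walk Metropolis on $(\theta_1, \theta_2)$.  This is a minimal base-R sketch with simulated data, not the LearnBayes rwmetrop function, and the step size sd = .5 is just a plausible choice:

```r
set.seed(1)
y=2+rexp(20,0.4)                  # hypothetical exponential data with location 2

logpost2=function(theta,y)
{
mu=min(y)-exp(theta[1])           # back-transform to the location
beta=exp(theta[2])                # back-transform to the rate
sum(dexp(y-mu,beta,log=TRUE))+sum(theta)   # log likelihood + log Jacobian
}

m=5000; draws=matrix(0,m,2)
theta=c(0,0); accept=0
for (j in 1:m)
{
cand=theta+rnorm(2,sd=.5)         # normal random walk step
if (log(runif(1))<logpost2(cand,y)-logpost2(theta,y))
  {theta=cand; accept=accept+1}
draws[j,]=theta
}
mu.draws=min(y)-exp(draws[,1])    # transform the draws back to (mu, beta)
beta.draws=exp(draws[,2])
```

Since $\exp(\theta_1) > 0$, the back-transformed draws of $\mu$ automatically respect the constraint $\mu < \min y$, with no boundary checks needed inside the sampler.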


Categories: MCMC

Learning about an exponential rate and location

To illustrate MCMC fitting for a two parameter problem, suppose one observes a sample $y_1, ..., y_n$ from an exponential density with location $\mu$ and rate $\beta$:

$f(y) = \beta \exp(-\beta(y - \mu)), \quad y \ge \mu.$

If we assume a uniform prior on $(\mu, \beta)$, then the posterior density is given by

$g(\mu, \beta | y) \propto \prod f(y_i)$,

where $-\infty < \mu < \min y_i, \beta > 0$.

Suppose we observe the following data:

[1]  5.1  5.3  7.1  6.5  3.2  5.2  6.4  9.2 11.4  3.1  3.0  9.4  3.6  5.7  5.0
[16]  3.6  3.1 11.5  3.5  5.5


1.  First we write a function logpost that computes the log posterior of $(\mu, \beta)$.  Since $\beta > 0$, we have to be careful when negative values are input for $\beta$.  In the ifelse function, we assign a very large negative value when $\beta < 0$.

logpost=function(theta,y)
ifelse(theta[2]>0,sum(dexp(y-theta[1],theta[2],log=TRUE)),-9e200)

2.  Using the mycontour function in the LearnBayes package, we construct a contour plot of the posterior — we add the line $\mu = \min y$ to show the boundary for $\mu$.

mycontour(logpost,c(1.5,3.5,.1,.8),y,xlab="MU",ylab="BETA",col="red")
abline(v=min(y))

3.   We’ll try using a random walk Metropolis algorithm where a candidate value is found by
$(\mu, \beta)^c = (\mu, \beta)^i + scale \times Z$,
where $Z$ is normal with mean 0 and variance-covariance matrix $V$.  Here it is not obvious how to choose $scale$ and $V$, since a normal approximation won’t work here.  Looking at the graph, it seems that most of the probability for $\mu$ is in (2, 3) and most of the probability for $\beta$ is in (0.1, 0.6).  Using the rough rule of thumb “4 sd = range of data”, I get estimates for the two standard deviations, which can be squared to get estimates of the variances.  I obtain the estimate $V$ = diag(.06, .015).  After some trial and error, it seems that the value scale = 1 works ok; the acceptance rate is 25%.
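The rule-of-thumb arithmetic, with the intervals read off the contour plot, looks like this:

```r
# "4 sd = range" rule: sd is about range/4, and the variance is sd squared
v.mu   = ((3 - 2)/4)^2        # 0.0625, rounded to the .06 used in V
v.beta = ((0.6 - 0.1)/4)^2    # 0.015625, rounded to the .015 used in V
```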
Here are the arguments for the function rwmetrop — I show the simulated draws on top of the scatterplot.  It looks like a reasonable sample from the posterior.
fit=rwmetrop(logpost,list(var=diag(c(.06,.015)),scale=1),c(2,.3),10000,y)
Categories: MCMC

A Poisson Change-Point Model

In Chapter 11 of BCWR, I describe an analysis of a famous dataset, the counts of British coal mining disasters described in Carlin et al (1992). We observe the number of disasters $y_t$ for year $t$, where $t$ = actual year - 1850. We assume that for the early years ($t \le \tau$), $y_t$ has a Poisson distribution with mean $\lambda_1$, and for the later years, $y_t$ is Poisson($\lambda_2$). Suppose we place vague priors on $(\lambda_1, \lambda_2)$. (Specifically, we’ll put a common gamma(c0, d0) prior on each Poisson mean.)

This model can be fit by the use of Gibbs sampling through the introduction of latent data. For each year, one introduces a state $z_t$, where $z_t = 1$ or $2$ according as $y_t$ is Poisson($\lambda_1$) or Poisson($\lambda_2$). Then one implements Gibbs sampling on the vector $(\lambda_1, \lambda_2, z_1, \ldots, z_n)$.

In Chapter 11, I illustrate the use of WinBUGS and the R interface to simulate from this model. MCMCpack also offers an R function MCMCpoissonChangepoint to fit this model, which I’ll illustrate here.

First we load the MCMCpack package.

library(MCMCpack)

We load in the disaster numbers in a vector data.

data=c(4,5,4,1,0,4,3,4,0,6,
3,3,4,0,2,6,3,3,5,4,5,3,1,4,4,1,5,5,3,4,2,5,2,2,3,4,2,1,3,2,
1,1,1,1,1,3,0,0,1,0,1,1,0,0,3,1,0,3,2,2,
0,1,1,1,0,1,0,1,0,0,0,2,1,0,0,0,1,1,0,2,
2,3,1,1,2,1,1,1,1,2,4,2,0,0,0,1,4,0,0,0,
1,0,0,0,0,0,1,0,0,1,0,0)

Suppose we decide to assign gamma(c0, d0) priors on each Poisson mean where c0=1 and d0=1. Then we fit this changepoint model simply by running the function MCMCpoissonChangepoint:

fit=MCMCpoissonChangepoint(data, m = 1, c0 = 1, d0 = 1,
burnin = 10000, mcmc = 10000)

I have included the important arguments: data is obviously the vector of counts, m is the number of unknown changepoints (here m = 1), c0, d0 are the gamma prior parameters, we choose to have a burnin period of 10,000 iterations, and then collect the following 10,000 iterations.
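For intuition about the computation, here is a minimal single-changepoint Gibbs sampler of my own in base R. It is not the MCMCpack implementation (which uses the latent-state formulation); instead it samples the changepoint year directly from its discrete conditional distribution:

```r
set.seed(2)
data=c(4,5,4,1,0,4,3,4,0,6,
3,3,4,0,2,6,3,3,5,4,5,3,1,4,4,1,5,5,3,4,2,5,2,2,3,4,2,1,3,2,
1,1,1,1,1,3,0,0,1,0,1,1,0,0,3,1,0,3,2,2,
0,1,1,1,0,1,0,1,0,0,0,2,1,0,0,0,1,1,0,2,
2,3,1,1,2,1,1,1,1,2,4,2,0,0,0,1,4,0,0,0,
1,0,0,0,0,0,1,0,0,1,0,0)
n=length(data); c0=1; d0=1
cy=cumsum(data); ty=cy[n]             # cumulative and total disaster counts
m=2000; K=rep(0,m); L1=rep(0,m); L2=rep(0,m)
k=n%/%2                               # initial guess at the changepoint
for (j in 1:m)
{
lam1=rgamma(1,c0+cy[k],d0+k)          # Poisson mean before the change, given k
lam2=rgamma(1,c0+ty-cy[k],d0+n-k)     # Poisson mean after the change, given k
ks=1:(n-1)                            # possible changepoint positions
logp=cy[ks]*log(lam1)-ks*lam1+(ty-cy[ks])*log(lam2)-(n-ks)*lam2
k=sample(ks,1,prob=exp(logp-max(logp)))  # discrete conditional of k
K[j]=k; L1[j]=lam1; L2[j]=lam2
}
```

With these data the sampled changepoint concentrates around $t = 40$ (the early 1890s), and the two Poisson means agree closely with the lambda.1 and lambda.2 summaries that MCMCpoissonChangepoint reports.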

MCMCpack includes several graphical and numerical summaries of the MCMC output: plot(fit), summary(fit), plotState(fit), and plotChangepoint(fit).

plot(fit) shows trace plots and density estimates for the two Poisson means.

summary(fit) gives you summary statistics, including suitable standard errors, for each Poisson mean:

Iterations = 10001:20000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:

             Mean     SD Naive SE Time-series SE
lambda.1   3.0799 0.2870 0.002870       0.002804
lambda.2   0.8935 0.1130 0.001130       0.001170

2. Quantiles for each variable:

           2.5%    25%    50%   75% 97.5%
lambda.1 2.5411 2.8861 3.0660 3.271 3.667
lambda.2 0.6853 0.8153 0.8895 0.966 1.130

plotState(fit) — this shows the probability that the process falls in each of the two states for all years.

plotChangepoint(fit) — this displays the posterior distribution of the changepoint location.
This analysis agrees with the analysis of this problem using WinBUGS described in Chapter 11 of BCWR.

Categories: Bayesian software, MCMC

Robust Modeling using Gibbs Sampling

To illustrate the use of Gibbs sampling for robust modeling, here are the batting statistics for the “regular” players on the San Francisco Giants for the 2007 baseball season:

Pos  Player              Ag    G   AB    R    H  2B 3B  HR  RBI  BB  SO    BA   OBP   SLG  SB  CS GDP HBP  SH  SF IBB OPS+
---  ------------------  --  ---  ---  ---  ---  -- --  --  ---  --  --  ----  ----  ----  --  -- --- ---  --  -- --- ----
C    Bengie Molina       32  134  497   38  137  19  1  19   81  15  53  .276  .298  .433   0   0  13   2   1   2   2   86
1B  *Ryan Klesko         36  116  362   51   94  27  3   6   44  46  68  .260  .344  .401   5   1  14   1   1   1   2   92
2B  #Ray Durham          35  138  464   56  101  21  2  11   71  53  75  .218  .295  .343  10   2  18   2   0   9   6   65
3B   Pedro Feliz         32  150  557   61  141  28  2  20   72  29  70  .253  .290  .418   2   2  15   1   0   3   2   81
SS  #Omar Vizquel        40  145  513   54  126  18  3   4   51  44  48  .246  .305  .316  14   6  14   1  14   3   6   62
LF  *Barry Bonds         42  126  340   75   94  14  0  28   66 132  54  .276  .480  .565   5   0  13   3   0   2  43  170
CF  *Dave Roberts        35  114  396   61  103  17  9   2   23  42  66  .260  .331  .364  31   5   4   0   4   0   1   80
RF  #Randy Winn          33  155  593   73  178  42  1  14   65  44  85  .300  .353  .445  15   3  12   7   4   5   3  105
     Rich Aurilia        35   99  329   40   83  19  2   5   33  22  45  .252  .304  .368   0   0   8   4   0   3   1   73
     Kevin Frandsen      25  109  264   26   71  12  1   5   31  21  24  .269  .331  .379   4   3  17   5   3   3   3   84
    *Fred Lewis          26   58  157   34   45   6  2   3   19  19  32  .287  .374  .408   5   1   4   3   1   0   0  103
    #Dan Ortmeier        26   62  157   20   45   7  4   6   16   7  41  .287  .317  .497   2   1   2   1   0   2   1  107
     Rajai Davis         26   51  142   26   40   9  1   1    7  14  25  .282  .363  .380  17   4   0   4   2   0   1   93
    *Nate Schierholtz    23   39  112    9   34   5  3   0   10   2  19  .304  .316  .402   3   1   0   1   0   2   0   85
    *Mark Sweeney        37   76   90   18   23   8  0   2   10  13  18  .256  .368  .411   2   0   0   3   1   0   0  102

We’ll focus on the last column, OPS+, which is a summary measure of a player’s batting effectiveness.

We read the OPS+ values for the 15 players into the vector y in R.

> y
[1] 86 92 65 81 62 170 80 105 73 84 103 107 93 85 102

We assume that the observations y1, …, y15 are iid from a t distribution with location $\mu$, scale $\sigma$, and 4 degrees of freedom. We place the usual noninformative prior $g(\mu, \sigma^2) \propto 1/\sigma^2$ on $(\mu, \sigma^2)$.

We implement 10,000 iterations of Gibbs sampling by use of the function robustt.R in the LearnBayes package. This function is easy to use — we just input the data vector y, the degrees of freedom, and the number of iterations.

fit=robustt(y,4,10000)

The object fit is a list with components mu, sigma2, and lam: mu is a vector of simulated draws of $\mu$, sigma2 is a vector of simulated draws of $\sigma^2$, and lam is a matrix of simulated draws of the scale parameters $\lambda_1, \ldots, \lambda_{15}$. (The $\lambda_i$ are helpful for identifying outliers in the data; here the outlier is pretty obvious.)
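To see the mechanics behind robustt, write the t(4) density as a scale mixture of normals; the three conjugate Gibbs steps can then be coded in a few lines of base R. This is my own minimal sketch, not the LearnBayes implementation:

```r
y=c(86,92,65,81,62,170,80,105,73,84,103,107,93,85,102)
n=length(y); nu=4                     # degrees of freedom of the t
set.seed(3)
m=5000; MU=rep(0,m); S2=rep(0,m)
mu=mean(y); sigma2=var(y)             # starting values
for (j in 1:m)
{
# scale parameters: lam_i | rest ~ gamma((nu+1)/2, (nu+(y_i-mu)^2/sigma2)/2)
lam=rgamma(n,(nu+1)/2,(nu+(y-mu)^2/sigma2)/2)
# location: mu | rest is normal with a precision-weighted mean
mu=rnorm(1,sum(lam*y)/sum(lam),sqrt(sigma2/sum(lam)))
# variance: sigma2 | rest ~ inverse gamma(n/2, sum(lam*(y-mu)^2)/2)
sigma2=1/rgamma(1,n/2,sum(lam*(y-mu)^2)/2)
MU[j]=mu; S2[j]=sigma2
}
```

The posterior mean of $\mu$ from this sketch sits near the center of the fourteen typical OPS+ values, largely ignoring the 170.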

Below I have graphed the data as solid dots and placed a density estimate of the posterior of $\mu$ on top. We see that the estimate of $\mu$ appears to ignore the one outlier (Barry Bonds) in the data.

plot(y,0*y,cex=1.5,pch=19,ylim=c(0,.1),ylab="DENSITY",xlab="MU")
lines(density(fit$mu),lwd=3,col="red")

By the way, we have assumed that robust modeling was suitable for this dataset since I knew that the Giants had one unusually good hitter on their team. Can we formally show that robust modeling with the t(4) distribution is better than normal modeling for these data?

Sure — we can define two models (with the choice of proper prior distributions) and compare the models by use of a Bayes factor. I’ll illustrate this in my next posting.

Categories: MCMC

Gibbs Sampling for Censored Regression

Gibbs sampling is very convenient for many “missing data” problems. To illustrate this situation, suppose we have the simple regression model

$y_i = \beta_0 + \beta_1 x_i + \epsilon_i,$

where the errors $\epsilon_i$ are iid $N(0, \sigma^2)$. The problem is that some of the response variables $y_i$ are right-censored: we actually observe $z_i = \min(y_i, c_i)$, where $c_i$ is a known censoring value. We know which observations are uncensored ($z_i = y_i$) and which observations are censored ($z_i = c_i$).

We illustrate this situation by use of a picture inspired by a similar one in Martin Tanner’s book. We show a scatterplot of the covariate $x_i$ against the observed response $z_i$. The points highlighted in red correspond to censored observations: the actual observations (the $y_i$) exceed the censored values, which we indicate by arrows.

To apply Gibbs sampling to this situation, we imagine a complete data set where all of the $y_i$’s are known. The unknowns in this problem are the regression coefficients $(\beta_0, \beta_1)$, the error variance $\sigma^2$, and the $y_i$’s corresponding to the censored observations.

The joint posterior of all unobservables (assuming the usual vague prior for regression) has the form

$g(\beta_0, \beta_1, \sigma^2, y \,|\, \mbox{data}) \propto \frac{1}{\sigma^2} \prod_{i=1}^n \frac{1}{\sigma}\, \phi\!\left(\frac{y_i - \beta_0 - \beta_1 x_i}{\sigma}\right) I_i,$

where $I_i$ is equal to 1 if the observation is not censored, and $I_i = I(y_i > c_i)$ if the observation is censored at $c_i$.

With the introduction of the complete data set, Gibbs sampling is straightforward.

Suppose one has initial estimates of the regression coefficients and the error variance. Then

(1) One simulates from the distribution of the missing data (the $y_i$ for the censored observations) given the parameters. Specifically, $y_i$ is simulated from a normal($\beta_0 + \beta_1 x_i$, $\sigma$) distribution truncated below by $c_i$.

(2) Given the complete data, one simulates values of the parameters using the usual approach for a normal linear regression model.

To do this in R, I have a couple of useful tools. The function rtruncated.R will simulate draws from an arbitrary truncated distribution:

rtruncated=function(n,lo,hi,pf,qf,...)
qf(pf(lo,...)+runif(n)*(pf(hi,...)-pf(lo,...)),...)

For example, if one wishes to simulate 20 draws from a normal(mean = 10, sd = 2) distribution that is truncated below by 4, one writes

rtruncated(20, 4, Inf, pnorm, qnorm, 10, 2)

Also, the function blinreg.R in the LearnBayes package will simulate draws of the regression coefficients $(\beta_0, \beta_1)$ and the error variance $\sigma^2$ for a normal linear model with a noninformative prior.
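Putting the two steps together, here is a self-contained sketch of the whole Gibbs sampler on simulated data. The data, the censoring rule, and the hand-coded regression draw are all my own illustration (blinreg would replace step (2)):

```r
set.seed(4)
n=60
x=runif(n,0,10)
y.true=10+2*x+rnorm(n,0,3)            # complete (partly unobserved) responses
cens=15+1.5*x                         # known censoring values
z=pmin(y.true,cens)                   # observed responses
is.cens=(y.true>=cens)                # indicators of censoring

rtruncated=function(n,lo,hi,pf,qf,...)
qf(pf(lo,...)+runif(n)*(pf(hi,...)-pf(lo,...)),...)

X=cbind(1,x); p=2
m=2000; BETA=matrix(0,m,p); SIG2=rep(0,m)
y=z; beta=c(0,0); sigma=sd(z)         # initial values
for (j in 1:m)
{
mu=X%*%beta
# (1) impute each censored y from a normal truncated below by its censoring value
y[is.cens]=rtruncated(sum(is.cens),cens[is.cens],Inf,pnorm,qnorm,
                      mu[is.cens],sigma)
# (2) usual regression draws given the complete data
fit=lm.fit(X,y)
sigma2=sum(fit$residuals^2)/rchisq(1,n-p)           # draw sigma^2
V=chol2inv(chol(crossprod(X)))                      # (X'X)^{-1}
beta=fit$coefficients+t(chol(sigma2*V))%*%rnorm(p)  # draw beta given sigma^2
sigma=sqrt(sigma2)
BETA[j,]=beta; SIG2[j]=sigma2
}
```

The posterior draws of the slope and intercept recover the generating values, even though roughly a third of the responses are censored.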

Categories: MCMC

A Comparison of Two MCMC Methods

In the last posting, I illustrated a Gibbs sampling algorithm for simulating from a normal/normal hierarchical model. This method was based on successive substitution sampling from the conditional posterior densities of theta1, …, thetak, mu, and tau2. A second sampling method is outlined in Chapter 7 of BCWR. In this method, which we’ll call “exact MCMC”, one integrates out the first-stage parameters theta1, …, thetak and uses a random-walk Metropolis algorithm to simulate from the marginal posterior of (mu, log tau).

We’ll use a few pictures to compare these two methods, specifically in learning about the variance hyperparameter tau. Here’s the exact MCMC method:

1. We write a function defining the log posterior of mu and log tau.

2. We simulate from (mu, log tau) using the function rwmetrop.R in the LearnBayes package. We choose a suitable proposal function from output from the laplace.R function. The acceptance rate of this random walk algorithm is 20%. We sample for 10,000 iterations, saving the draws of log tau in a vector.

We then run the Gibbs sampler for 10,000 iterations, also saving the draws of log tau.

DENSITY PLOTS. We first compare density estimates of the simulated samples of log tau using the two methods. The two estimates seem pretty similar. The density estimate for the Gibbs sampling draws looks smoother, but that doesn’t mean this sample is a better estimate of the posterior of log tau.

TRACE PLOTS. We compare trace plots of the two sets of simulated draws. They look pretty similar, but there are differences on closer inspection. The draws from the Gibbs sampling run look more irregular, or perhaps have more of a “snake-like” appearance.

AUTOCORRELATION PLOTS. An autocorrelation plot is a graph of the autocorrelations

corr (theta(j), theta(j-g))

graphed against the lag g. For most MCMC runs, the stream of simulated draws will show positive autocorrelation, but hopefully the autocorrelation values decrease quickly as the lag increases. It is pretty obvious that the lag autocorrelations drop off more slowly for method 2 (Gibbs sampling); in contrast, the lag autocorrelations for method 1 (exact MCMC) decrease to zero as the lag runs from 1 to 15.
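These autocorrelations are easy to compute with R’s acf function. Here is a sketch on a hypothetical sticky chain, together with the rough effective sample size the autocorrelations imply (a crude version of what packages like coda estimate):

```r
set.seed(5)
draws=as.numeric(arima.sim(list(ar=.9),n=10000))  # hypothetical autocorrelated chain
a=acf(draws,lag.max=20,plot=FALSE)$acf[-1]        # autocorrelations at lags 1, ..., 20
# rough effective sample size: m/(1 + 2 * sum of positive lag autocorrelations)
ess=length(draws)/(1+2*sum(a[a>0]))
```

A chain whose autocorrelations die out quickly (like the exact MCMC draws here) has an effective sample size near the nominal 10,000; slowly decaying autocorrelations shrink it substantially.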

The moral of the story is that the exact MCMC method shows better mixing than Gibbs sampling in this example. This means that, for a given sample size (here 10,000), one will have more accurate estimates of the posterior of tau using the exact MCMC method.

Categories: MCMC

Gibbs Sampling for Hierarchical Models

Gibbs sampling is an attractive “automatic” method of setting up an MCMC algorithm for many classes of models. Here we illustrate using R to write a Gibbs sampling algorithm for the normal/normal exchangeable model.

We write the model in three stages as follows.

1. The observations y1, …, yk are independent where yj is N(thetaj, sigmaj^2), where we write N(mu, sigma2) to denote the normal density with mean mu and variance sigma2. We assume the sampling variances sigma1^2, …, sigmak^2 are known.

2. The means theta1,…, thetak are a random sample from a N(mu, tau2) population. (tau2 is the variance of the population).

3. The hyperparameters (mu, tau) are assumed to have a uniform distribution. This implies that the parameters (mu, tau2) have a prior proportional to (tau2)^(-1/2).

To write a Gibbs sampling algorithm, we write down the joint posterior of all parameters (theta1, …, thetak, mu, tau2):

g(theta1, …, thetak, mu, tau2 | y) ∝ [prod_j N(yj; thetaj, sigmaj^2)] [prod_j N(thetaj; mu, tau2)] (tau2)^(-1/2),

where N(y; mu, sigma2) denotes the normal density with mean mu and variance sigma2 evaluated at y.
From this expression, one can see

1. The posterior of thetaj conditional on all remaining parameters is normal, where the mean and variance are given by the usual normal density/normal prior updating formula.

2. The hyperparameter mu has a normal posterior with mean theta_bar (the sample mean of the thetaj) and variance tau2/k.

3. The hyperparameter tau2 has an inverse gamma posterior with shape (k-1)/2 and rate sum((thetaj - mu)^2)/2.

Given that all the conditional posteriors have convenient functional forms, we write an R function to implement the Gibbs sampling. The only inputs are the data matrix (columns containing yj and sigmaj^2) and the number of iterations m.

I’ll display the function normnormexch.gibbs.R with notes. (The function rigamma used below to simulate an inverse gamma draw comes from the LearnBayes package.)

normnormexch.gibbs=function(data,m)
{
y=data[,1]; k=length(y); sigma2=data[,2] # HERE I READ IN THE DATA

THETA=array(0,c(m,k)); MU=rep(0,m); TAU2=rep(0,m) # SET UP STORAGE

mu=mean(y); tau2=median(sigma2) # INITIAL ESTIMATES OF MU AND TAU2

for (j in 1:m) # HERE'S THE GIBBS SAMPLING
{
p.means=(y/sigma2+mu/tau2)/(1/sigma2+1/tau2) # CONDITIONAL POSTERIORS
p.sds=sqrt(1/(1/sigma2+1/tau2)) # OF THETA1,...,THETAK
theta=rnorm(k,mean=p.means,sd=p.sds)

mu=rnorm(1,mean=mean(theta),sd=sqrt(tau2/k)) # CONDITIONAL POSTERIOR OF MU

tau2=rigamma(1,(k-1)/2,sum((theta-mu)^2)/2) # CONDITIONAL POSTERIOR OF TAU2

THETA[j,]=theta; MU[j]=mu; TAU2[j]=tau2 # STORE SIMULATED DRAWS
}
return(list(theta=THETA,mu=MU,tau2=TAU2)) # RETURN A LIST WITH SAMPLES
}

Here is an illustration of this algorithm for the SAT example from Gelman et al.

y=c(28,8,-3,7,-1,1,18,12)
sigma=c(15,10,16,11,9,11,10,18)
data=cbind(y,sigma^2)
fit=normnormexch.gibbs(data,1000)

In the Chapter 7 exercise that used this example, a different sampling algorithm was used to simulate from the joint posterior of (theta, mu, tau2) — it was a direct sampling algorithm based on the decomposition

[theta, mu, tau2] = [mu, tau2] [theta | mu, tau2]

In a later posting, I’ll compare the two sampling algorithms.