### Archive

Archive for the ‘MCMC’ Category

## Cell phone story — part 3

After talking about this problem in class on Monday, I realized that we don’t understand the texting patterns of undergraduates very well, so I’m reluctant to specify an informative prior in this situation.  Instead, I’m going to illustrate using BUGS to fit this model with a vague prior.

I have observed the number of text messages for my son for 16 days in this month’s billing period.  Since his monthly allocation is 5000 messages, I am focusing on the event “number of messages in the 30 days exceeds 5000.”  Currently my son has had 3314 messages, so I’m interested in computing the predictive probability that the number of messages in the remaining 14 days exceeds 1686.

Here I’ll outline using the OpenBUGS software, through the R interface in the BRugs package, to fit our model.

I’m assuming that you’re on a Windows machine and have already installed the BRugs package in R.

First we write a script that describes the normal sampling model.  Following Kruschke’s Doing Bayesian Data Analysis text, I can enter this model into the R console and write it to a file model.txt.

modelString = "
model {
for( i in 1:n ){
y[i] ~ dnorm( mu, P )
}
mu ~ dnorm( mu0, P0 )
mu0 <- 100
P0 <- 0.00001
P ~ dgamma(a, b)
a <- 0.001
b <- 0.001
}
"

writeLines(modelString, con="model.txt")

# We load in the BRugs package (this includes the OpenBUGS software).

library(BRugs)

# Set up the initial values for $\mu$ and $P$ in the MCMC iteration; this puts the values in a file called inits.txt.

bugsInits(list(c(mu = 200, P = 0.05)),
numChains = 1, fileName = "inits.txt")

# Have OpenBUGS check that the model specification is okay.

modelCheck("model.txt")

# Here is our data — n is the number of observations and y is the vector of text message counts.

dataList = list(
n = 16,
y=c(207, 121, 144, 229, 113, 262, 169, 330,
168, 132, 224, 188, 231, 207, 268, 321)
)

# Enter the data into OpenBUGS.

modelData( bugsData( dataList ))

# Compile the model.

modelCompile()

# Read in the initial values for the simulation.

modelInits("inits.txt")

# We are going to monitor the mean and the precision.

samplesSet( c("mu", "P"))

# We’ll try 10,000 iterations of MCMC.

chainLength = 10000

# Update (actually run the MCMC) — it is very quick.

modelUpdate( chainLength )

# The function samplesSample collects the simulated draws, samplesStats computes summary statistics.

muSample = samplesSample( "mu")
muSummary = samplesStats( "mu")

PSample = samplesSample( "P")
PSummary = samplesStats( "P")

# From this output, we can compute summaries of any function of the parameters of interest (like the normal standard deviation) and compute the predictive probability of interest.

Let’s focus on the prediction problem.  The variable of interest is z, the number of text messages in the 14 remaining days in the month.  The sampling model assumes that the number of daily text messages is N($\mu, \sigma$), so, treating the days as independent, the sum of text messages for 14 days is N($14 \mu, \sqrt{14} \sigma$).

To simulate a single value of z from the posterior predictive distribution, we (1) simulate a value of $(\mu, \sigma)$ from its posterior distribution, and (2) simulate z from a normal distribution using these simulated draws as parameters.
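The two steps above can be sketched in R as follows. This is a minimal sketch, not the code used for the class: the function name predictive.prob and its arguments are hypothetical, and it assumes mu.draws and P.draws are vectors of posterior draws of the mean and precision (such as muSample and PSample from the script above), with $\sigma = 1/\sqrt{P}$.

```r
# Hypothetical helper: estimate P(z > threshold), where z is the total
# count over `days` future days, from posterior draws of mu and P.
predictive.prob = function(mu.draws, P.draws, days = 14, threshold = 1686){
  sigma.draws = 1 / sqrt(P.draws)   # convert precision to standard deviation
  # Step (1): each (mu, sigma) pair is one draw from the posterior.
  # Step (2): simulate z ~ N(days * mu, sqrt(days) * sigma) for each pair.
  z = rnorm(length(mu.draws), days * mu.draws, sqrt(days) * sigma.draws)
  mean(z > threshold)               # proportion of simulated totals above threshold
}
```

With the BRugs output above, a call would look like predictive.prob(muSample, PSample); the proportion it returns is a Monte Carlo estimate of the predictive probability of interest.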

Categories: MCMC

## Cell phone story — part 2

It is relatively easy to set up a Gibbs sampling algorithm for the normal sampling problem when independent priors (of the conjugate type) are assigned to the mean and precision.  Here we outline how to do this in R.

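We start with an expression for the joint posterior of the mean $\mu$ and the precision $P$, where the priors are $\mu \sim N(\mu_0, \tau_0)$ and $P \sim gamma(a, b)$:

$g(\mu, P | {\rm data}) \propto P^{n/2} \exp\left(-\frac{P}{2}\left[S + n(\mu - \bar{y})^2\right]\right) \times \exp\left(-\frac{(\mu - \mu_0)^2}{2\tau_0^2}\right) \times P^{a-1} \exp(-bP)$

(Here S is the sum of squares of the observations about the sample mean $\bar{y}$, and $\tau_0$ is the prior standard deviation of $\mu$.)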

1.  To start, we recognize the two conditional distributions.

• The posterior of $\mu$ given $P$ is given by the usual updating formula for a normal mean and a normal prior.  (Essentially this formula says that the posterior precision is the sum of the prior and data precisions, and the posterior mean is a weighted average of the prior mean and the sample mean, where the weights are proportional to the corresponding precisions.)
• The posterior of $P$ given $\mu$ has a gamma form where the shape is given by $a + n/2$ and the rate is $b + S/2 + n(\mu - \bar{y})^2/2$.

2.  Now we’re ready to use R.  I’ve written a short function that implements a single Gibbs sampling cycle.  To understand the code, here are the variables:

– ybar is the sample mean, S is the sum of squares about the mean, and n is the sample size
– the prior parameters are (mu0, tau0) for the normal prior on the mean and (a, b) for the gamma prior on the precision
– theta is the current value of ($\mu, P$)

The function performs the simulations from the distributions [$\mu | P$] and [$P | \mu$] and returns a new value of ($\mu, P$).

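one.cycle=function(theta){
mu = theta[1]; P = theta[2]
P1 = 1/tau0^2 + n*P  # posterior precision of mu given P
mu1 = (mu0/tau0^2 + ybar*n*P) / P1  # precision-weighted posterior mean
tau1 = sqrt(1/P1)
mu = rnorm(1, mu1, tau1)  # draw from [mu | P]

a1 = a + n/2  # gamma shape
b1 = b + S/2 + n/2*(mu - ybar)^2  # gamma rate
P = rgamma(1, a1, b1)  # draw from [P | mu]
c(mu, P)
}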

All that is left in the programming is some setup code (bringing in the data and defining the prior parameters), giving a starting value, and collecting the vectors of simulated draws in a matrix.
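The remaining setup might look like the sketch below. Several pieces are assumptions made only for illustration: the data are the 16 daily counts listed in the part 3 post, the prior values are the vague ones from that post's BUGS model (with tau0 obtained from the precision 0.00001), and run.gibbs is a hypothetical name for the driver loop.

```r
# Data (assumption): the 16 daily counts listed in the part 3 post
y = c(207, 121, 144, 229, 113, 262, 169, 330,
      168, 132, 224, 188, 231, 207, 268, 321)
n = length(y)
ybar = mean(y)
S = sum((y - ybar)^2)

# Prior parameters (assumption): mu ~ N(mu0, tau0), P ~ gamma(a, b)
mu0 = 100
tau0 = 1 / sqrt(0.00001)
a = 0.001
b = 0.001

# One Gibbs cycle, as in the function above
one.cycle = function(theta){
  mu = theta[1]; P = theta[2]
  P1 = 1/tau0^2 + n*P
  mu1 = (mu0/tau0^2 + ybar*n*P) / P1
  mu = rnorm(1, mu1, sqrt(1/P1))          # draw from [mu | P]
  a1 = a + n/2
  b1 = b + S/2 + n/2*(mu - ybar)^2
  P = rgamma(1, a1, b1)                   # draw from [P | mu]; b1 is a rate
  c(mu, P)
}

# Hypothetical driver: store each cycle's (mu, P) in one row of a matrix
run.gibbs = function(start, iter){
  draws = matrix(0, iter, 2, dimnames = list(NULL, c("mu", "P")))
  theta = start
  for (j in 1:iter){
    theta = one.cycle(theta)
    draws[j, ] = theta
  }
  draws
}

set.seed(123)
draws = run.gibbs(c(200, 0.05), 10000)    # starting value (mu, P)
colMeans(draws)                           # posterior means of mu and P
```

Keeping the draws in a matrix with one column per parameter makes it easy to summarize any function of the parameters, for example 1/sqrt(draws[, "P"]) for the standard deviation.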

Categories: MCMC

## Cell phone story

I’m interested in learning about the pattern of text message use for my college son.  I pay the monthly cell phone bill and I want to be pretty sure that he won’t exceed his monthly allowance of 5000 messages.

We’ll put this problem in the context of a normal distribution inference problem.  Suppose $y$, the number of daily text messages (received and sent), is normal with mean $\mu$ and standard deviation $\sigma$.  We’ll observe $y_1, ..., y_{13}$, the number of text messages in the first 13 days in the billing month.  I’m interested in the predictive probability that the month’s total, the 13 observed counts plus the count of text messages in the next 17 days, exceeds 5000.

1. First I talk about some prior beliefs about $(\mu, \sigma)$ that we’ll model by independent conjugate priors.
2. I’ll discuss the use of Gibbs sampling to simulate from the posterior distribution.
3. Last, we’ll use the output of the Gibbs sampler to get a prediction interval for the sum of text messages in the next 17 days.

Here we talk about prior beliefs.  To be honest, I didn’t think too long about my beliefs about my son’s text message usage, but here is what I have.

1. First I assume that my prior beliefs about the mean $\mu$ and standard deviation $\sigma$ of the population of text messages are independent.  This seems reasonable, especially since it is easier to think about each parameter separately.
2. I’ll use conjugate priors to model beliefs about each parameter.  I believe my son makes, on average, 40 messages per day but I could easily be off by 15.  So I let $\mu \sim N(40, 15)$.
3. It is harder to think about my beliefs about the standard deviation $\sigma$ of the text message population.   After some thought, I decide that my prior mean and standard deviation of $\sigma$ are 5 and 2, respectively.  We’ll see shortly that it is convenient to model the precision $P = 1/\sigma^2$ by a $gamma(a, b)$ distribution with shape $a$ and rate $b$.  It turns out that $P \sim gamma(3, 60)$ is a reasonable match to my prior information about $\sigma$.
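One way to check the match in item 3 is by simulation. The sketch below assumes that gamma(3, 60) means shape 3 and rate 60; the variable names are made up for illustration. Under that reading the implied prior mean and standard deviation of $\sigma$ come out at roughly 5.1 and 1.9, close to the stated 5 and 2.

```r
set.seed(1)
# Simulate precisions from the prior (assumption: shape 3, rate 60)
P.sim = rgamma(100000, shape = 3, rate = 60)
# Convert each precision to a standard deviation
sigma.sim = 1 / sqrt(P.sim)
# Implied prior mean and standard deviation of sigma
c(mean = mean(sigma.sim), sd = sd(sigma.sim))
```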

In the next blog posting, I’ll illustrate writing an R script to implement the Gibbs sampling.

Categories: MCMC

## Learning from the extremes – part 3

Continuing our selected data example, suppose we want to fit our Bayesian model by using an MCMC algorithm.  As described in class, the Metropolis-Hastings random walk algorithm is a convenient MCMC algorithm for sampling from this posterior density.  Let’s walk through the steps of doing this using LearnBayes.

1.  As before, we write a function minmaxpost that contains the definition of the log posterior.  (See an earlier post for this function.)

2.  To get some initial ideas about the location of ($\mu, \log \sigma$), we use the laplace function to get an estimate of the posterior mode and the variance-covariance matrix.

data=list(n=10, min=52, max=84)
library(LearnBayes)
fit = laplace(minmaxpost, c(70, 2), data)
mo = fit$mode
v = fit$var

Here mo is a vector with the posterior mode and v is a matrix containing the associated var-cov matrix.

Now we are ready to use the rwmetrop function that implements the M-H random walk algorithm.  There are five inputs:  (1) the function defining the log posterior, (2) a list containing var, the estimated var-cov matrix, and scale, the M-H random walk scale constant, (3) the starting value in the Markov chain simulation, (4) the number of iterations of the algorithm, and (5) any data and prior parameters used in the log posterior density.

Here we’ll use v as our estimated var-cov matrix, use a scale value of 3, start the simulation at $(\mu, \log \sigma) = (70, 2)$ and try 10,000 iterations.

s = rwmetrop(minmaxpost, list(var=v, scale=3), c(70, 2), 10000, data)

I display the acceptance rate; here it is 19%, which is a reasonable value.

s$accept
[1] 0.1943

Here we can display the contours of the exact posterior and overlay the simulated draws.

mycontour(minmaxpost, c(45, 95, 1.5, 4), data,
xlab=expression(mu), ylab=expression(paste("log ",sigma)))
points(s$par)


It seems like we have been successful in getting a good sample from this posterior distribution.
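To see what a random walk Metropolis step involves, here is a minimal base-R sketch. It is not the LearnBayes source: rw.sketch is a hypothetical function, and the proposal (current value plus scale times a normal draw with covariance var) is an assumption about how such a sampler is typically set up. The log posterior is the minmaxpost function from the earlier post, and v is copied from the laplace output shown there.

```r
# Log posterior from the earlier post, theta = (mu, log sigma)
minmaxpost = function(theta, data){
  mu = theta[1]
  sigma = exp(theta[2])
  dnorm(data$min, mu, sigma, log=TRUE) + dnorm(data$max, mu, sigma, log=TRUE) +
    (data$n - 2)*log(pnorm(data$max, mu, sigma) - pnorm(data$min, mu, sigma))
}
data = list(n=10, min=52, max=84)

# Var-cov estimate copied from the laplace output in the earlier post
v = matrix(c(1.920690e+01, -1.900688e-06,
             -1.900688e-06, 6.031533e-02), 2, 2)

# Hypothetical random walk Metropolis sampler
rw.sketch = function(logpost, proposal, start, m, ...){
  p = length(start)
  draws = matrix(0, m, p)
  L = t(chol(proposal$var))        # lower-triangular factor: L %*% t(L) = var
  current = start
  f.current = logpost(current, ...)
  n.accept = 0
  for (i in 1:m){
    # propose a move centered at the current value
    candidate = current + proposal$scale * as.vector(L %*% rnorm(p))
    f.candidate = logpost(candidate, ...)
    # accept with probability min(1, posterior ratio), on the log scale
    if (is.finite(f.candidate) && log(runif(1)) < f.candidate - f.current){
      current = candidate
      f.current = f.candidate
      n.accept = n.accept + 1
    }
    draws[i, ] = current           # on rejection the chain stays put
  }
  list(par = draws, accept = n.accept / m)
}

set.seed(1)
s.sketch = rw.sketch(minmaxpost, list(var=v, scale=3), c(70, 2), 10000, data)
s.sketch$accept                    # proportion of accepted candidates
colMeans(s.sketch$par)             # posterior means of (mu, log sigma)
```

A larger scale proposes bigger jumps and lowers the acceptance rate, while a smaller one accepts more often but explores the posterior more slowly.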

Categories: MCMC

## Learning from the extremes – part 2

In the last post, I described a problem with selected data.  You observe the speeds of 10 cars but only collect the minimum speed of 52 and the maximum speed of 84.  We want to learn about the mean and standard deviation of the underlying normal distribution.

We’ll work with the parameterization $(\mu, \log \sigma)$ which will give us a better normal approximation.  A standard noninformative prior is uniform on  $(\mu, \log \sigma)$.

1.  First I write a short function minmaxpost that computes the logarithm of the posterior density.  The arguments to this function are $\theta = (\mu, \log \sigma)$ and data which is a list with components n, min, and max.  I’d recommend using the R functions pnorm and dnorm in computing the density — it saves typing errors.

minmaxpost=function(theta, data){
mu = theta[1]
sigma = exp(theta[2])
dnorm(data$min, mu, sigma, log=TRUE) + dnorm(data$max, mu, sigma, log=TRUE) +
(data$n - 2)*log(pnorm(data$max, mu, sigma)-pnorm(data$min, mu, sigma))
}

2.  Then I use the function laplace in the LearnBayes package to summarize this posterior.  The arguments to laplace are the name of the log posterior function, an initial estimate at $\theta$ and the data that is used in the log posterior function.

data=list(n=10, min=52, max=84)
library(LearnBayes)
fit = laplace(minmaxpost, c(70, 2), data)

3.  The output of laplace includes mode, the posterior mode, and var, the corresponding estimate at the variance-covariance matrix.

fit$mode
[1] 67.999960  2.298369

$var
              [,1]          [,2]
[1,]  1.920690e+01 -1.900688e-06
[2,] -1.900688e-06  6.031533e-02

4.  I demonstrate below that we obtain a pretty good approximation in this situation.  I use the mycontour function to display contours of the exact posterior and overlay the matching normal approximation using a second application of mycontour.

mycontour(minmaxpost, c(45, 95, 1.5, 4), data,
xlab=expression(mu), ylab=expression(paste("log ",sigma)))
mycontour(lbinorm, c(45, 95, 1.5, 4),
list(m=fit$mode, v=fit$var), add=TRUE, col="red")
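The normal approximation can also be turned into rough interval estimates. The sketch below is only an illustration: it copies the mode and var values from the laplace output above, treats $(\mu, \log \sigma)$ as approximately bivariate normal, and uses variable names chosen for this example.

```r
# Posterior mode and var-cov matrix of (mu, log sigma), copied from above
post.mode = c(67.999960, 2.298369)
post.var = matrix(c(1.920690e+01, -1.900688e-06,
                    -1.900688e-06, 6.031533e-02), 2, 2)

se = sqrt(diag(post.var))          # approximate posterior standard deviations
z = qnorm(0.95)                    # multiplier for a 90% interval

# Approximate 90% interval for mu
mu.int = post.mode[1] + c(-1, 1) * z * se[1]
# Interval for log sigma, transformed back to the sigma scale
sigma.int = exp(post.mode[2] + c(-1, 1) * z * se[2])

mu.int
sigma.int
```

Building the interval on the $\log \sigma$ scale and then exponentiating keeps the endpoints for $\sigma$ positive, which is one reason for working with this parameterization.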

Categories: MCMC

## Learning from the extremes

Here is an interesting problem with “selected data”.  Suppose you are measuring the speeds of cars driving on an interstate.  You assume the speeds are normally distributed with mean $\mu$ and standard deviation $\sigma$.  You see 10 cars pass by and you only record the minimum and maximum speeds.  What have you learned about the normal parameters?

First we’ll describe the construction of the likelihood function.  We’ll combine the likelihood with the standard noninformative prior for a mean and standard deviation.   Then we’ll illustrate the use of a normal approximation to learn about the parameters.

Here we focus on the construction of the likelihood.  Given values of the normal parameters, what is the probability of observing minimum = x and the maximum = y in a sample of size n?

Essentially we’re looking for the joint density of two order statistics, which is a standard result.  Let f and F denote the density and cdf of a normal distribution with mean $\mu$ and standard deviation $\sigma$.  Then the joint density of (x, y) is given by

$f(x, y | \mu, \sigma) \propto f(x) f(y) [F(y) - F(x)]^{n-2}, x < y$

After we observe data, the likelihood is this sampling density viewed as a function of the parameters.  Suppose we take a sample of size 10 and we observe x = 52, y = 84.  Then the likelihood is given by

$L(\mu, \sigma) \propto f(52) f(84) [F(84) - F(52)]^{8}$
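As a small illustration, the logarithm of this likelihood can be written directly with dnorm and pnorm. The function name minmax.loglike and the two parameter values tried below are hypothetical choices for this sketch.

```r
# Log-likelihood of (mu, sigma) given the minimum x and maximum y
# of a normal sample of size n (up to an additive constant)
minmax.loglike = function(mu, sigma, x, y, n){
  dnorm(x, mu, sigma, log = TRUE) +                      # log f(x)
    dnorm(y, mu, sigma, log = TRUE) +                    # log f(y)
    (n - 2) * log(pnorm(y, mu, sigma) - pnorm(x, mu, sigma))  # (n-2) log[F(y) - F(x)]
}

# Compare two candidate parameter values for x = 52, y = 84, n = 10
ll.center = minmax.loglike(68, 10, 52, 84, 10)
ll.off = minmax.loglike(50, 10, 52, 84, 10)
ll.center - ll.off    # positive if mu = 68 is better supported than mu = 50
```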

In the next blog posting, I’ll describe how to summarize this posterior by a normal approximation in LearnBayes.

Categories: MCMC

## Learning about exponential parameters — part III

The two-parameter exponential sampling problem can be used to illustrate Gibbs sampling.  The  joint posterior density for $(\mu, \beta)$ has the form

$g(\mu, \beta | {\rm data}) \propto \beta^n \exp(-\beta(s - n\mu)) I(\mu < \min y)$

Note that each of the two conditional posteriors has a simple form.

1.  If we fix a value of $\mu$, the conditional posterior of $\beta$ has the gamma form with shape $n+1$ and rate $(s - n\mu)$.

2.  Turning things around, if we fix a value of $\beta$, the conditional posterior of $\mu$ has the form

$g(\mu | y, \beta) \propto \exp(n\beta \mu), \mu < \min y$.

This is the mirror image of an exponential random variable with rate $n\beta$ and location $\min y$.  By use of the inversion method, one can simulate a value of $\mu$.
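The inversion step can be sketched as follows. The helper r.mu and the values of beta, n, and ymin are hypothetical and used only for illustration. The conditional cdf is $\exp(n\beta(\mu - \min y))$ for $\mu < \min y$; setting it equal to a uniform draw $u$ and solving gives $\mu = \min y + \log(u)/(n\beta)$.

```r
# Hypothetical helper: m draws of mu from g(mu | y, beta) by inversion
r.mu = function(m, beta, n, ymin){
  ymin + log(runif(m)) / (n * beta)   # log(u) < 0, so every draw is below ymin
}

set.seed(1)
mu.draws = r.mu(10000, beta = 2, n = 20, ymin = 5)  # illustrative values only
max(mu.draws) < 5          # all draws respect the constraint mu < min y
mean(5 - mu.draws)         # compare with the exponential mean 1 / (n * beta)
```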

Here is a short function to implement Gibbs sampling for this example.  The inputs are the data vector y and the number of cycles of Gibbs sampling iter.   One cycle of GS is accomplished by the two lines

beta=rgamma(1,shape=n+1,rate=s-n*mu)  # simulates from [$\beta | \mu$]
mu=log(runif(1))/(beta*n)+min(y)  # simulates from [$\mu | \beta$]

gibbs.exp=function(y,iter)
{
n=length(y)
s=sum(y)
theta=array(0,c(iter,2))
mu=min(y)
for (j in 1:iter)
{
beta=rgamma(1,shape=n+1,rate=s-n*mu)
mu=log(runif(1))/(beta*n)+min(y)
theta[j,]=c(mu,beta)
}
theta
}

I ran this for 10,000 iterations and the MCMC chain appeared to mix well.
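A run of this kind might look like the sketch below. The data here are synthetic, generated with made-up values $\mu = 3$ and $\beta = 0.5$ purely for illustration, since the data used for the run described above are not given; the function is repeated (with a final line returning the matrix of draws) so that the block runs on its own.

```r
# Gibbs sampler for the two-parameter exponential model
gibbs.exp = function(y, iter){
  n = length(y)
  s = sum(y)
  theta = array(0, c(iter, 2))
  mu = min(y)                                     # starting value for mu
  for (j in 1:iter){
    beta = rgamma(1, shape = n + 1, rate = s - n*mu)  # draw from [beta | mu]
    mu = log(runif(1))/(beta*n) + min(y)              # draw from [mu | beta]
    theta[j, ] = c(mu, beta)
  }
  theta                                           # one row per cycle: (mu, beta)
}

set.seed(1)
y = 3 + rexp(30, rate = 0.5)     # synthetic data (illustrative values only)
out = gibbs.exp(y, 10000)
colnames(out) = c("mu", "beta")

# Posterior quantiles for each parameter
apply(out, 2, quantile, c(0.05, 0.5, 0.95))
```

Trace plots such as plot(out[, "mu"], type = "l") are one simple way to judge how well the chain is mixing.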
Categories: MCMC