## Learning from the extremes – part 3

Continuing our selected data example, suppose we want to fit our Bayesian model by using a MCMC algorithm. As described in class, the Metropolis-Hastings random walk algorithm is a convenient MCMC algorithm for sampling from this posterior density. Let’s walk through the steps of doing this using LearnBayes.

1. As before, we write a function minmaxpost that contains the definition of the log posterior. (See an earlier post for this function.)

2. To get some initial ideas about the location of (), we use the laplace function to get an estimate at the mean and variance-covariance matrix.

data=list(n=10, min=52, max=84) library(LearnBayes) fit = laplace(minmaxpost, c(70, 2), data) mo = fit$mode v = fit$var

Here mo is a vector with the posterior mode and v is a matrix containing the associated var-cov matrix.

Now we are ready to use the rwmetrop function that implements the M-H random walk algorithm. There are four inputs: (1) the function defining the log posterior, (2) a list containing var, the estimated var-cov matrix, and scale, the M-H random walk scale constant, (3) the starting value in the Markov Chain simulation, (4) the number of iterations of the algorithm, and (5) any data and prior parameters used in the log posterior density.

Here we’ll use v as our estimated var-cov matrix, use a scale value of 3, start the simulation at and try 10,000 iterations.

s = rwmetrop(minmaxpost, list(var=v, scale=3), c(70, 2), 10000, data)

I display the acceptance rate — here it is 19% which is a reasonable value.

> s$accept [1] 0.1943

Here we can display the contours of the exact posterior and overlay the simulated draws.

mycontour(minmaxpost, c(45, 95, 1.5, 4), data, xlab=expression(mu), ylab=expression(paste("log ",sigma))) points(s$par)

It seems like we have been successful in getting a good sample from this posterior distribution.

## Learning from the extremes – part 2

In the last post, I described a problem with selected data. You observe speeds of 10 cards but only collect the minimum speed 52 and the maximum speed of 84. We want to learn about the mean and standard deviation of the underlying normal distribution.

We’ll work with the parameterization which will give us a better normal approximation. A standard noninformative prior is uniform on .

1. First I write a short function minmaxpost that computes the logarithm of the posterior density. The arguments to this function are and data which is a list with components n, min, and max. I’d recommend using the R functions pnorm and dnorm in computing the density — it saves typing errors.

minmaxpost=function(theta, data){ mu = theta[1] sigma = exp(theta[2]) dnorm(data$min, mu, sigma, log=TRUE) + dnorm(data$max, mu, sigma, log=TRUE) + (data$n - 2)*log(pnorm(data$max, mu, sigma)-pnorm(data$min, mu, sigma)) }

2. Then I use the function laplace in the LearnBayes package to summarize this posterior. The arguments to laplace are the name of the log posterior function, an initial estimate at and the data that is used in the log posterior function.

data=list(n=10, min=52, max=84) library(LearnBayes) fit = laplace(minmaxpost, c(70, 2), data)

3. The output of laplace includes mode, the posterior mode, and var, the corresponding estimate at the variance-covariance matrix.

fit $mode [1] 67.999960 2.298369 $var [,1] [,2] [1,] 1.920690e+01 -1.900688e-06 [2,] -1.900688e-06 6.031533e-02

4. I demonstrate below that we obtain a pretty good approximation in this situation. I use the mycontour function to display contours of the exact posterior and overlay the matching normal approximation using a second application of mycontour.

mycontour(minmaxpost, c(45, 95, 1.5, 4), data, xlab=expression(mu), ylab=expression(paste("log ",sigma))) mycontour(lbinorm, c(45, 95, 1.5, 4), list(m=fit$mode, v=fit$var), add=TRUE, col="red")

## Learning from the extremes

Here is an interesting problem with “selected data”. Suppose you are measuring the speeds of cars driving on an interstate. You assume the speeds are normally distributed with mean and standard deviation . You see 10 cars pass by and you only record the minimum and maximum speeds. What have you learned about the normal parameters?

First we’ll describe the construction of the likelihood function. We’ll combine the likelihood with the standard noninformative prior for a mean and standard deviation. Then we’ll illustrate the use of a normal approximation to learn about the parameters.

Here we focus on the construction of the likelihood. Given values of the normal parameters, what is the probability of observing minimum = x and the maximum = y in a sample of size n?

Essentially we’re looking for the joint density of two order statistics which is a standard result. Let f and F denote the density and cdf of a normal density with mean and standard deviation . Then the joint density of (x, y) is given by

After we observe data, the likelihood is this sampling density viewed as function of the parameters. Suppose we take a sample of size 10 and we observe x = 52, y = 84. Then the likelihood is given by

In the next blog posting, I’ll describe how to summarize this posterior by a normal approximation in LearnBayes.

## Normal approximation to posterior

To illustrate a normal approximation to a posterior, let’s return to the fire call example described in the September 6 post. Here we had Poisson sampling with mean and a normal prior on .

1. First we write a short function lpost that computes the logarithm of the posterior. I show the expressions for the log likelihood and the log prior. On the log scale, the log posterior is the log likelihood PLUS the log prior.

lpost = function(lambda){ loglike = -6*lambda + 8*log(lambda) logprior = dnorm(lambda, 3, 1, log=TRUE) loglike + logprior }

2. I plot the normalized version of the posterior below. I first write a short function post that computes the posterior, use the integrate function to numerically integrate the density from 0 to 10, and then use the curve function to display the normalized posterior.

post = function(lambda) exp(lpost(lambda)) C = integrate(post, 0, 10) curve(exp(lpost(x))/C$value, 0, 5)

3. There is a useful function laplace in the LearnBayes package that conveniently finds the matching normal approximation to the posterior. The function inputs are (1) the function defining the log posterior, and (2) a guess at the posterior mode. Typically the log posterior might depend on data and prior parameters and that would be the last input to laplace (here we are not using that extra input).

library(LearnBayes) fit=laplace(lpost, 1) fit $mode [1] 1.7 $var [,1] [1,] 0.2653809

The important output here is (1) the mode of the posterior and (2) the corresponding approximation to the posterior variance. By looking at the output, we see that the posterior of is approximately N(1.7, ).

4. To check the accuracy of this approximation, we use the curve function to add a matching normal density. (I have also added a legend by use of the legend function.) Note that the normal approximation is pretty accurate in this particular case.

curve(dnorm(x, fit$mode, sqrt(fit$var)), add=TRUE, col="red") legend("topright", legend=c("Exact", "Normal Approximation"), lty = 1, col=c("black", "red"))

## Modeling field goal kicking – part III

Now that we have formed our prior beliefs, we’ll summarize the posterior by computing the density on a fine grid of points. The functions mycontour and simcontour in the LearnBayes package are helpful here.

After loading LearnBayes, the logarithm of the logistic posterior is programmed in the function logisticpost. There are two arguments to this posterior, the vector theta of regression coefficients and the data which is a matrix of the form [s n x], where s is the vector of successes, n is the vector of sample sizes, and x is the vector of covariates. When we use the conditional means prior (described in the previous post), the prior is in that form, so we simply augment the data with the prior “data” and that is the input for logisticpost.

Here is the matrix prior that uses the parameters described in the previous post.

prior=rbind(c(30, 8.25 + 1.19, 8.25), c(50, 4.95 + 4.95, 4.95))

We paste the data and prior together in the matrix data.prior.

data.prior=rbind(d, prior)

After some trial and error, we find that the rectangle

(-2, 12, -0.3, 0.05) brackets the posterior. We draw a contour plot of the posterior using the mycontour function — the arguments are the function defining the log posterior, the vector of limits (xlo, xhi, ylo, yhi) and the data.prior matrix.

mycontour(logisticpost, c(-2, 12, -0.3, 0.05), data.prior, xlab=expression(beta[0]), ylab=expression(beta[1]))

Now that we have bracketed the posterior, we use the simcontour function to take a simulated sample from the posterior.

S=simcontour(logisticpost, c(-2, 12, -.3, .05), data.prior, 1000)

I place the points on the contour to demonstrate that this seems to be a reasonable sample.

points(S)

Let’s illustrate doing some inference. Suppose I’m interested in p(40), the probability of success at 40 yards. This is easy to do using the simulated sample. I compute values of p(40) from values of the simulated draws of and then construct a density estimate of the simulated draws.

p40 = exp(S$x + 40*S$y)/(1 + exp(S$x + 40*S$y)) plot(density(p40))

I’m pretty confident that the success rate from this distance is between 0.6 and 0.8.

## Modeling Field Goal Kicking — part 2

Let’s continue the football kicking example.

1. Prior beliefs. On the surface, it seems challenging to talk about prior information since the regression parameters and are not very meaningful. But we might have some beliefs about the probability of kicking a goal successful and we can use these beliefs to indirectly construct a prior on the regression coefficients.

2. Conditional means prior. We instead consider the parameters (p(30) and p(50)), the probabilities of successfully kicking a field goal at 30 yards and 50 yards. After some thought, I believe:

- The median and 90th percentile at p(30) are respectively 0.90 and 0.98 (I’m pretty confident of a successful kick at 30 yards.)
- The median and 90th percentile at p(50) are respectively 0.50 and 0.70. (I’m less confident of a successful kick of 50 yards.)

## Modeling Field Goal Kicking

Since it is (American) football season, it seemed appropriate to do a football example to illustrate summarizing a bivariate posterior.

One way of scoring points in football is by kicking field goals. The ball is lined up at a particular location on the field and the kicker attempts to kick the ball through the uprights.

We are interested in looking at the relationship between the distance of the attempt (in yards) and the result of the kick (success and failure). I collect the results of 31 field goal attempts for the Big Ten college games in the second week of the 2011 season. I read the data into R from the class web folder and display the beginning of the data frame fieldgoal:

> fieldgoal=read.csv("http://personal.bgsu.edu/~albert/MATH6480/DATA/fieldgoal.csv") > head(fieldgoal) Distance Result 1 42 1 2 54 1 3 20 1 4 42 0 5 50 1 6 38 0

A logistic model can be used to represent this data. Let denote the result of the th kick at distance — either the kick is good () or it is missed (). Let denote the probability that . A simple logistic model says that

The standard way of fitting this model is based on maximum likelihood based on the iterative reweighted least-squares algorithm. We can implement this algorithm using the R glm function with the family = binomial argument.

> glm(Result ~ Distance, family = binomial, data = fieldgoal) Coefficients: (Intercept) Distance 3.81329 -0.07549

As expected, the slope is negative — this means that it is harder to make a field goal for a longer distance.

We’ll use this example in a future post to illustrate prior modeling and “brute force” computation of the posterior.