## Contour plots in R

In one of the homework problems, you are asked to do a Bayesian sensitivity analysis for a posterior probability with respect to the prior. Specifically, suppose we are interested in learning about a proportion p and we wish to compute the posterior probability P(p > 0.5). We observe the data (25 successes, 40 failures), and we want to know the sensitivity of this posterior probability with respect to the prior mean eta and the prior precision parameter K, where a = K eta and b = K(1 - eta) are the beta shape parameters.

It is easy to construct a contour plot in R to illustrate this sensitivity analysis.

1. First we set up a grid of values of eta and K:

eta = seq(.1, .9, by=.1)

K = c(1, 2, 8, 16, 32)

2. Next we write a short function that computes the posterior probability as a function of eta and K.

myfun = function(eta, K) 1 - pbeta(.5, K*eta + 25, K*(1-eta) + 40)

3. We now use the outer function to compute this probability on all combinations of eta and K; the result is a matrix whose rows correspond to eta and whose columns correspond to K.

PROB = outer(eta, K, myfun)

4. Last, we use the contour function, whose inputs are the two vectors and the matrix of probabilities. We can specify which levels to draw on the contour graph.

contour(eta, K, PROB, levels=c(0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95), xlab="eta", ylab="K", main="P(theta > 0.5)")
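Putting the four steps together, here is the complete script (the same code as above, collected so it can be run in one go):

```r
# grid of prior means (eta) and prior precision parameters (K)
eta = seq(.1, .9, by = .1)
K = c(1, 2, 8, 16, 32)

# posterior probability P(p > 0.5) after observing 25 successes and 40 failures,
# under a beta(K*eta, K*(1-eta)) prior
myfun = function(eta, K) 1 - pbeta(.5, K*eta + 25, K*(1-eta) + 40)

# evaluate on all (eta, K) combinations: rows index eta, columns index K
PROB = outer(eta, K, myfun)

# draw the contour plot at the chosen probability levels
contour(eta, K, PROB, levels = c(0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95),
        xlab = "eta", ylab = "K", main = "P(theta > 0.5)")
```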

## Nice R Interface

This week I’m introducing R to two different classes: one is a probability class for secondary-education math majors and the other is a Bayesian course. Recently I learned about a nice R interface called RStudio.

I’ve included a screenshot below (you first install R and then RStudio and then launch the RStudio application).

What’s so special about RStudio?

1. RStudio partitions the screen into four sections. The bottom left is the Console window, the top left window shows R scripts, the top right window shows all the workspace variables, and the bottom right displays the help output and any plots. I like that you can see all parts of R at once.

2. It has a nice interface for importing data. One clicks on an Import Dataset button and it steps the user through the process of reading in datasets from a file or from a web location.

3. It is user friendly when one types in commands. It automatically completes left parentheses and brackets. It also has nice code completion: when you hit the Tab key, it offers different options for completing what you are typing, and it gives you a glimpse of the arguments of the function being typed.

I’m sure there are more nice features, but I found RStudio easy to use and I’ve been using it all the time over the past few months.

## Using the sapply function in R

In R, I think one of the most useful functions is sapply. By using this function, one can avoid writing a loop. Let me illustrate its use with the binomial/beta exchangeable model.

The model is

1. y_j is distributed binomial(n_j, theta_j)

2. theta_1, ..., theta_k are iid beta(K eta, K(1 - eta))

3. (eta, K) have a vague prior

The log posterior of the transformed parameters (logit(eta), log(K)) is programmed in the function betabinexch in the LearnBayes package.

Suppose we wish to obtain a 90% interval estimate for theta_j, the jth player's true rate. To simulate from the posterior of theta_j, we first simulate a value of (eta, K) from its posterior and then simulate theta_j from the beta(K eta + y_j, K(1 - eta) + n_j - y_j) distribution.

Let’s consider the famous Efron and Morris baseball data: y_j is the number of hits in n_j = 45 at-bats for the jth player.

I’ll read in the data, get a normal approximation to the posterior (function laplace), and then run a random walk Metropolis chain; I obtain a simulated sample from the posterior of (logit(eta), log(K)).

library(LearnBayes)

y = c(18, 17, 16, 15, 14, 14, 13, 12, 11, 11, 10, 10, 10, 10, 10, 9, 8, 7)  # hits for the 18 players

n = rep(45, 18)

data = as.matrix(cbind(y, n))

fit = laplace(betabinexch, c(-1, 4), data)

mcmc.fit = rwmetrop(betabinexch, list(var=fit$var, scale=2), c(-1, 4), 10000, data)

K = exp(mcmc.fit$par[,2])

eta = exp(mcmc.fit$par[,1])/(1 + exp(mcmc.fit$par[,1]))

j = 1  # say, the first player

thetaj = rbeta(10000, K*eta + y[j], K*(1 - eta) + n[j] - y[j])

quantile(thetaj, c(.05, .95))

5% 95%

0.2052578 0.3288738

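The payoff of sapply comes when we want interval estimates for all 18 players at once. Here is a sketch; the (eta, K) draws below are placeholders so the snippet stands alone, while in the actual analysis they come from the Metropolis output mcmc.fit$par as above:

```r
set.seed(123)
# placeholder posterior draws of (eta, K); substitute the draws computed
# from mcmc.fit$par in the actual analysis
eta = rbeta(10000, 50, 150)
K = exp(rnorm(10000, 4, 0.8))

# the Efron-Morris hit counts and at-bats
y = c(18, 17, 16, 15, 14, 14, 13, 12, 11, 11, 10, 10, 10, 10, 10, 9, 8, 7)
n = rep(45, 18)

# one sapply call replaces a loop over the 18 players:
# column j holds 10000 posterior draws of theta_j
theta = sapply(1:18, function(j)
  rbeta(10000, K*eta + y[j], K*(1 - eta) + n[j] - y[j]))

# 90% interval estimates for every player
apply(theta, 2, quantile, c(.05, .95))
```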

## Illustration of using hpd function

I promised my Bayesian students that I’d illustrate the hpd function in the TeachingDemos package.

Suppose I’m interested in estimating p, the proportion of undergraduate BGSU students whose home is 200 miles or further from Bowling Green. Initially, I place a uniform prior on p. I collect data from 16 students — only 1 lives 200+ miles from BG. My posterior for p is beta(2, 16).

I’ll assume that you have already installed the TeachingDemos package in R. Then I’ll simulate 10,000 draws from the beta(2, 16) distribution and use the function emp.hpd to find a 90% interval estimate (you use hpd if you actually have the quantile function of the distribution available).

> library(TeachingDemos)

> p=rbeta(10000,2,16)

> emp.hpd(p,conf=0.90)

[1] 0.005806384 0.211324962

You could also find an equal-tails interval estimate:

> qbeta(c(.05,.95),2,16)

[1] 0.02131763 0.25012445

But the length of the equal-tails interval is 0.229, which is substantially larger than the length of the HPD interval (0.206). The two intervals will differ when the posterior density is asymmetric, as in this situation.
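For the curious, the idea behind emp.hpd is easy to sketch: among all intervals that contain 90% of the sorted simulated draws, keep the shortest one. This is my own minimal version of that idea, not the TeachingDemos code:

```r
# shortest interval containing a fraction conf of the simulated draws
shortest_interval = function(draws, conf = 0.90) {
  x = sort(draws)
  n = length(x)
  m = floor(conf * n)                    # points each candidate interval must cover
  widths = x[(m + 1):n] - x[1:(n - m)]   # widths of all candidate intervals
  i = which.min(widths)
  c(x[i], x[i + m])
}

set.seed(1)
p = rbeta(10000, 2, 16)
shortest_interval(p, 0.90)   # close to the emp.hpd interval above
```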

## No more loops!

One aspect of the LearnBayes package that I’ve been uncomfortable with is the use of loops in the functions defining log posteriors. I had a reason for using the loops (the theta argument to the function could be a matrix), but it is generally bad programming practice. The worst aspect of the looping is that it makes the process of writing a posterior more difficult than it really is. Looping is bad from the user’s perspective. After all, we are teaching statistics, not programming, and I want to encourage people to code the posteriors for their problems in R.

Here is a simple example. Suppose your model is that y1,…, yn are independent Cauchy with location mu and scale sigma. The log posterior is given by

log g(mu, sigma) = sum_i log f(y_i | mu, sigma),

where log f is the log of the Cauchy density conditional on the parameters. My old way of programming the posterior used a loop:

for (i in 1:length(data))

val = val + log(dt((data[i] - mu)/sigma, df = 1)/sigma)

I think this new way is preferable. First you define the function logf for a single observation y:

logf=function(y,mu,sigma) log(dt((y-mu)/sigma,df=1)/sigma)

Then the log posterior is given by

sum(logf(data,mu,sigma))
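To check that the looped and vectorized versions agree, here is a small self-contained comparison; the data values and parameters are made up for illustration:

```r
# Cauchy log density for a single observation (or a vector of them)
logf = function(y, mu, sigma) log(dt((y - mu)/sigma, df = 1)/sigma)

data = c(1.2, -0.5, 3.1, 0.4, 2.2)   # hypothetical observations
mu = 1; sigma = 2

# old way: accumulate the log density with a loop
val = 0
for (i in 1:length(data))
  val = val + log(dt((data[i] - mu)/sigma, df = 1)/sigma)

# new way: one vectorized call
vec = sum(logf(data, mu, sigma))

all.equal(val, vec)   # the two versions give the same value
```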

Anyway, I think that by avoiding loops, the function for the log posterior becomes more transparent.

The new version of the LearnBayes package will contain fewer loops.

## Any R problems?

Here’s your opportunity to post any problems you’re having using R. I’ll try to respond to each problem.

## R computations for a proportion using a beta prior

Today we talked about using a beta prior to learn about a proportion. Inference about p is done by use of the beta posterior distribution and prediction about future samples is done by means of the predictive distribution.

Here are the R computations for the cell-phone example. I’ll first illustrate inference for the proportion p, and then I’ll illustrate the use of the special function pbetap (in the LearnBayes package) to compute the beta-binomial predictive distribution to learn about the number of successes in a future sample.

> library(LearnBayes)

>

> a=6.8; b=2.5 # parameters of beta prior

> n=24; y=9 # sample size and number of yes’s in sample

>

> a1=a+y; b1=b+n-y # parameters of beta posterior

>

> # I’ll illustrate different types of inferences

>

> # a point estimate is given by the posterior mean

> a1/(a1+b1)

[1] 0.4744745

>

> # or you could find the posterior median

> qbeta(.5,a1,b1)

[1] 0.4739574

>

> # a 90% interval estimate is found by use of the 5th and 95th quantiles

> # of the beta curve

> qbeta(c(.05,.95),a1,b1)

[1] 0.3348724 0.6158472

>

> # we illustrate prediction by use of the function pbetap

> # suppose we take a future sample of 20

> # how many will be using a cell phone while driving?

>

> m=20; ys=0:m

> pred.probs=pbetap(c(a1,b1),m,ys)

>

> # display the predictive probabilities

>

> cbind(ys,pred.probs)

ys pred.probs

[1,] 0 7.443708e-05

[2,] 1 6.444416e-04

[3,] 2 2.897264e-03

[4,] 3 8.968922e-03

[5,] 4 2.139155e-02

[6,] 5 4.170364e-02

[7,] 6 6.884411e-02

[8,] 7 9.841322e-02

[9,] 8 1.236003e-01

[10,] 9 1.376228e-01

[11,] 10 1.365218e-01

[12,] 11 1.208324e-01

[13,] 12 9.524434e-02

[14,] 13 6.650657e-02

[15,] 14 4.075296e-02

[16,] 15 2.159001e-02

[17,] 16 9.665296e-03

[18,] 17 3.527764e-03

[19,] 18 9.889799e-04

[20,] 19 1.901993e-04

[21,] 20 1.891124e-05

>

> # what is the probability that there are at least

> # 10 cell phone drivers in my sample?

>

> sum(pred.probs*(ys>=10))

[1] 0.4958392

>
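As a footnote, pbetap is computing the beta-binomial predictive probabilities, which can also be written out directly; this sketch reproduces the calculation without LearnBayes:

```r
# beta-binomial predictive pmf:
# P(ys successes in m future trials) = choose(m, ys) * B(a1+ys, b1+m-ys) / B(a1, b1)
a1 = 6.8 + 9       # beta posterior parameters from the cell-phone example
b1 = 2.5 + 24 - 9
m = 20; ys = 0:m
pred = choose(m, ys) * exp(lbeta(a1 + ys, b1 + m - ys) - lbeta(a1, b1))

sum(pred)                # the predictive probabilities sum to 1
sum(pred * (ys >= 10))   # agrees with the pbetap answer above
```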