### Archive

Archive for September, 2007

## Evolution of statistical thought

One of you asked for some Bayesian/frequentist humor. This cartoon describes the historical evolution of statistical thinking. (This was taken from Mike West’s website.)

Categories: General

## Any R problems?

Here’s your opportunity to post any problems you’re having using R. I’ll try to respond to each problem.

Categories: R work

## Test of hypothesis that coin is fair

In Section 3.5, I describe a Bayesian test of the hypothesis H that a proportion is equal to 0.5. The R function pbetat.R (in the LearnBayes package) computes the posterior probability of H assuming a beta prior on the alternative hypothesis. We illustrate how the posterior probability depends on the beta parameter a.

Here’s a quick way of constructing Figure 3.5 (page 52) using the curve function.

First we revise the function pbetat to accept a matrix argument for the beta parameters where each row of the matrix corresponds to a pair (a, b). The new function is called pbetat.v. Next, we write a short function best that computes the posterior probability of H for our example (5 successes and 15 failures) for a vector of values of log a. (We are assuming a symmetric beta(a, a) prior on the alternative hypothesis.)

best=function(loga)
{p0=.5; prob=.5

data=c(5,15)

AB=exp(cbind(loga,loga))

s=pbetat.v(p0,prob,AB,data)
return(s\$post)}

To generate the figure, we can use the curve function.

curve(best,from=-4,to=5,xlab=”log a”,ylab=”Prob(coin is fair)”,lwd=3)

We see that for all values of the beta parameter a, the posterior probability of the hypothesis exceeds 0.2.

## Illustration of a gui for the Bayesian triplot

Richard Gonzalez found my triplot.R function and enhanced it by adding a graphical user interface. By the use of sliders, one can change the beta parameters a and b, the data values s and f, and see the effect of the changes on the triplot (prior, likelihood, and posterior) and the predictive distribution.

Here are a couple of illustrations.
In the first example, the prior is beta(3, 7) and one observes 6 successes and 14 failures. The prior information is consistent with the data information and the observed number of successes is in the middle of the predictive distribution.

In the second example, the beta(6, 2) prior is in conflict with the data (s=6, f=14) and the observed number of successes is in the tail of the predictive distribution.

## Brute-force computation of a posterior

Suppose we observe y that is normal with mean theta and standard deviation sigma. Instead of using a conjugate prior, suppose that theta has a t distribution with location mu, scale tau, and degrees of freedom df. Although there is not a nice form for the posterior density, it is straightforward to compute the posterior by use of the “prior x likelihood” recipe. We write a function post.norm.t.R that computes the posterior.

# we source this function into R

source(url(“http://bayes.bgsu.edu/m648/post.norm.t.R&#8221;))

# define parameters of problem

s=list(y=125,sigma=15/2,mu=100,tau=6.85,df=2)

# set up grid of values of theta

theta=seq(80,160,length=100)

# compute the posterior on the grid

post=post.norm.t(theta,s)

# convert the posterior value to probabilities

post.prob=post/sum(post)

# sample from discrete distribution on grid

sim.theta=sample(theta,size=10000,replace=TRUE,prob=post.prob)

# construct a histogram of simulated sample
# and place exact posterior on top

hist(sim.theta, freq=FALSE)
d=diff(theta[1:2])
con=sum(d*post) # this is normalizing constant
lines(theta,post/con)

From the simulated sample, we can compute any summary of the posterior of interest.

## Inference for a Poisson Rate

In 2006, Ryan Howard had 58 home runs in 581 at bats. Suppose the number of home runs y is distributed Poisson with mean ab lambda, where lambda is the true home run rate. If we assume the usual noninformative prior, the posterior for lambda is Gamma with shape alpha=y and rate ab.

To learn about lambda, we simulate 1000 draws from the Gamma posterior. We construct a density estimate of the simulated draws and then find a 90% interval estimate by finding the 5th and 95th percentiles of the draws.

> y=58
> t=581
> lambda=rgamma(1000,shape=y,rate=t)

> plot(density(lambda),xlab=”LAMBDA”,main=”POSTERIOR”)
> quantile(lambda,c(.05,.95))
5% 95%

0.07897515 0.12170184

We see that a 90% interval estimate for lambda is (.079, .122).

Next, suppose that Howard has 550 at-bats in 2007 — how many home runs will he hit? We learn about the future number of home runs ys by use of the posterior predictive distribution. In the R output, we (1) simulate 1000 draws from the posterior predictive distribution, (2) tabulate the values by use of the table function, (3) construct a graph of this distribution, and (4) construct a 90% interval estimate for ys by use of the discint function (from the LearnBayes package).

> ys=rpois(1000,500*lambda)
> T=table(ys)

> plot(T,main=”POSTERIOR PREDICTIVE”)
>

> ys=as.integer(names(T))

> freq=as.integer(T)
> freq=freq/sum(freq)

> dist=cbind(ys,freq)

> discint(dist,.9)

\$prob

[1] 0.905

\$set

[1] 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

[26] 61 63 64 65 66 67

Categories: Single parameter

## Plot of two distributions for proportion inference

In tomorrow’s class, I’ll hand out the graph below that shows a Bayesian “triplot” (showing the likelihood, prior, and posterior) and the prior predictive distribution. The triplot is useful to show how the the two types of information (data and prior) are combined. The predictive plot is helpful in judging the suitability of the Bayesian model.

I wrote two short R functions, triplot and predplot, that construct the graphs. I assume that you have already loaded the LearnBayes package.

> library(LearnBayes)
> source(url(“http://bayes.bgsu.edu/m648/triplot.R&#8221;))
> source(url(“http://bayes.bgsu.edu/m648/predplot.R&#8221;))
> prior=c(6.8,2.5)
> data=c(9,15)
> n=sum(data)
> par(mfrow=c(2,1))
> triplot(prior,data)
> predplot(prior,n,data[1])

Categories: Single parameter