## Discrete Bayes, part II

In the previous post, I used the R function dexp in my discrete Bayes function. Sometimes you need to make slight changes to the functions in the R stats package for a given problem.

Here is a finite population sampling problem. A small community has 100 voters and you are interested in learning about the number, say M, who will vote for a school levy. You take a random sample of size 20 (without replacement) and 12 are in support of the levy. What have you learned about M?

Here the sampling density is hypergeometric — the probability that will be in favor out of 20 given that M in the population favor the levy — this is given by the hypergeometric probability

There is a function dhyper that computes this probability, but uses a different parametrization. So I write a new function dhyper2 that uses my parametrization.

dhyper2=function(y,M,sample.size=n,pop.size=N)

dhyper(y,M,N-M,n)

## Discrete Bayes

One good way of introducing Bayesian inference is by the use of discrete priors. I recently wrote a simple generic R function that does discrete Bayes for arbitrary choice of a prior and sampling density. I’ll illustrate this function here and in the next posting.

Suppose I’m interested in learning about the rate parameter of an exponential density. We observe a sample from the density defined by

where we are interested in learning about the rate parameter .

In R, I read in the function “discrete.bayes” and a few associated methods by typing

> source(“http://bayes.bgsu.edu/m6480/R/discrete.bayes.functions.R”)

## Sequential learning by Bayes’ rule

I’m talking about Bayes’ rule now and here’s a quality control example (I think it comes from an intro Bayesian book by Schmitt that was published in the 1960’s):

A machine in a small factory is producing a particular automotive component. Most of the time (specifically, 90% from historical records), the machine is working well and produces 95% good parts. Some days, the machine doesn’t work as well (it’s broken) and produces only 70% good parts. A worker inspects the first dozen parts produced by this machine on a particular morning and obtains the following results (g represents a good component and b a bad component): g, b, g, g, g, g, g, g, g, b, g, b. The worker is interested in assessing the probability the machine is working well.

Here we have two models: W, the machine is working, and B, the machine is broken, and we initially believe P(W) = 0.9 and P(B) = 0.1. We know the characteristics of each machine. If the machine is working, it produces 95% good parts, and if the machine is broken, it produces only 70% good parts.

We use the odds form of Bayes’ rule:

Posterior Odds of W = (Prior Odds of W) x BF,

where BF (Bayes factor) is the ratio P(data | W)/P(data|B). It is convenient to express this on a log scale:

log Post Odds of W = log Prior Odds of W + log BF

Let’s illustrate how this works. Initially the prior odds of working is

log Prior Odds = log [P(W)/P(B)] = log(0.9/0.1) = 2.20.

We first observe a good part (g). The log BF is

log BF = log (P(g | W)/log(g | B)) = log(0.95/0.70) = 0.31,

so the new log odds of working is

log Posterior Odds = 2.20 + 0.31 = 2.51

so we are more confident that the machine is in good condition.

We can continue doing this in a sequential fashion. Our posterior odds becomes our new prior odds and we update this odds with each observation. If we observe a good part (g), we’ll add 0.31 to the log odds; if we observe a bad part (b), we’ll add

log [P(b | W) / P(b | B)] = log(0.05/0.30) = -1.79

to the log odds. I did this for all 12 observations. Here’s a graph of the results (first I’ll show the R code I used).

d=c(“g”,”b”,”g”,”g”,”g”,”g”,”g”,”g”,”g”,”b”,”g”,”b”)

log.BF=ifelse(d==”g”,log(.95/.70),log(.05/.30))

prior.log.odds=log(.9/.1)

plot(0:12,c(prior.log.odds,prior.log.odds+cumsum(log.BF)),

type=”l”,lwd=3,col=”red”,xlab=”OBSERVATION”,

ylab=”log odds of working”,main=”Sequential Learning by Bayes’ Rule”)

abline(h=0)

After observation, the posterior log odds is -0.43 which translates to a posterior probability of exp(-0.43)/(1+exp(-0.43)) = 0.39 that the machine is working. The machine should be fixed.

## Using a Mixture of Conjugate Priors

One way to extend the use of conjugate priors is by the use of discrete mixtures. Here is a simple example. Suppose you have a coin that you believe may be fair or biased towards heads. If p represents the probability of flipping heads, then suppose your prior is

g(p) = 0.5 beta(p, 20, 20) + 0.5 beta(p, 30, 10)

Suppose that we flip the coin 30 and get 20 heads. It can be shown that the posterior density can also be represented by a mixture of beta distributions.

I have written a short R function pbetamix that computes the posterior distribution when a proportion has a prior that is a mixture of beta distributions.

The matrix bpar contains the beta parameters where each row gives the beta parameters for each component. The vector prob gives the prior probabilities of the components. The vector data contains the number of successes and failures. Here are the values of these components for the example.

> bpar=rbind(c(20,20),c(30,10))

> prob=c(.5,.5)

> data=c(20,10)

We use the function pbetamix with the inputs prob, bpar, and data. The output is the posterior probabilities of the components and the beta parameters of each component.

> pbetamix(prob,bpar,data)

$probs

[1] 0.3457597 0.6542403

$betapar

[,1] [,2]

[1,] 40 30

[2,] 50 20

The posterior density is

g(p | data) = 0.346 beta(p, 40, 30) + 0.654 beta(p, 50, 20)

We plot the prior and posterior densities

> plot(p,.5*dbeta(p,30,30)+.5*dbeta(p,30,10),type=”l”,ylim=c(0,5),col=”red”,lwd=2)

> lines(p,.346*dbeta(p,40,30)+.654*dbeta(p,50,20),lwd=2,col=”blue”)

> text(locator(n=1),”PRIOR”)

> text(locator(n=1),”POSTERIOR”)

Here the data (20 heads and 10 tails) is more supportive of the belief that the coin is biased and the posterior places more of its mass where p > .5.

## Test of hypothesis that coin is fair

In Section 3.5, I describe a Bayesian test of the hypothesis H that a proportion is equal to 0.5. The R function pbetat.R (in the LearnBayes package) computes the posterior probability of H assuming a beta prior on the alternative hypothesis. We illustrate how the posterior probability depends on the beta parameter a.

Here’s a quick way of constructing Figure 3.5 (page 52) using the curve function.

First we revise the function pbetat to accept a matrix argument for the beta parameters where each row of the matrix corresponds to a pair (a, b). The new function is called pbetat.v. Next, we write a short function best that computes the posterior probability of H for our example (5 successes and 15 failures) for a vector of values of log a. (We are assuming a symmetric beta(a, a) prior on the alternative hypothesis.)

best=function(loga)

{p0=.5; prob=.5

data=c(5,15)

AB=exp(cbind(loga,loga))

s=pbetat.v(p0,prob,AB,data)

return(s$post)}

To generate the figure, we can use the curve function.

curve(best,from=-4,to=5,xlab=”log a”,ylab=”Prob(coin is fair)”,lwd=3)

We see that for all values of the beta parameter a, the posterior probability of the hypothesis exceeds 0.2.

## Illustration of a gui for the Bayesian triplot

Richard Gonzalez found my triplot.R function and enhanced it by adding a graphical user interface. By the use of sliders, one can change the beta parameters a and b, the data values s and f, and see the effect of the changes on the triplot (prior, likelihood, and posterior) and the predictive distribution.

Here are a couple of illustrations.

In the first example, the prior is beta(3, 7) and one observes 6 successes and 14 failures. The prior information is consistent with the data information and the observed number of successes is in the middle of the predictive distribution.

In the second example, the beta(6, 2) prior is in conflict with the data (s=6, f=14) and the observed number of successes is in the tail of the predictive distribution.

## Brute-force computation of a posterior

Suppose we observe y that is normal with mean theta and standard deviation sigma. Instead of using a conjugate prior, suppose that theta has a t distribution with location mu, scale tau, and degrees of freedom df. Although there is not a nice form for the posterior density, it is straightforward to compute the posterior by use of the “prior x likelihood” recipe. We write a function post.norm.t.R that computes the posterior.

# we source this function into R

source(url(“http://bayes.bgsu.edu/m648/post.norm.t.R”))

# define parameters of problem

s=list(y=125,sigma=15/2,mu=100,tau=6.85,df=2)

# set up grid of values of theta

theta=seq(80,160,length=100)

# compute the posterior on the grid

post=post.norm.t(theta,s)

# convert the posterior value to probabilities

post.prob=post/sum(post)

# sample from discrete distribution on grid

sim.theta=sample(theta,size=10000,replace=TRUE,prob=post.prob)

# construct a histogram of simulated sample

# and place exact posterior on top

hist(sim.theta, freq=FALSE)

d=diff(theta[1:2])

con=sum(d*post) # this is normalizing constant

lines(theta,post/con)

From the simulated sample, we can compute any summary of the posterior of interest.