## Sequential Learning in Bayes’ Rule

I have pasted below the R code implementing the quality control example I talked about in class.


```r
########################################################
# Illustration of function bayes to illustrate
# sequential learning in Bayes' rule
# Jim Albert, August 2013
########################################################

bayes <- function(prior, likelihood, data){
  probs <- matrix(0, length(data) + 1, length(prior))
  dimnames(probs)[[1]] <- c("prior", data)
  dimnames(probs)[[2]] <- names(prior)
  probs[1, ] <- prior
  for(j in 1:length(data))
    probs[j + 1, ] <- probs[j, ] * likelihood[, data[j]] /
      sum(probs[j, ] * likelihood[, data[j]])
  dimnames(probs)[[1]] <-
    paste(0:length(data), dimnames(probs)[[1]])
  data.frame(probs)
}
```

```r
# quality control example
# machine is either working or broken with prior probs .9 and .1
prior <- c(working = .9, broken = .1)

# outcomes are good (g) or bad (b)
# likelihood matrix gives probs of each outcome for each model
like.working <- c(g = .95, b = .05)
like.broken <- c(g = .7, b = .3)
likelihood <- rbind(like.working, like.broken)

# sequence of data outcomes
data <- c("g", "b", "g", "g", "g", "g", "g", "g", "g", "b", "g", "b")

# function bayes will compute the posteriors, one datum at a time
# inputs are the prior vector, likelihood matrix, and vector of data
posterior <- bayes(prior, likelihood, data)
posterior

# graph the probs of working as a function of the data observation
plot(posterior$working, type = "b")
```

###################################################

Here is the output: the posterior probabilities and the graph.

```
          working     broken
0 prior 0.9000000 0.10000000
1 g     0.9243243 0.07567568
2 b     0.6705882 0.32941176
3 g     0.7342373 0.26576271
4 g     0.7894495 0.21055055
5 g     0.8357571 0.16424288
6 g     0.8735119 0.12648812
7 g     0.9035891 0.09641094
8 g     0.9271111 0.07288893
9 g     0.9452420 0.05475796
10 b    0.7420707 0.25792933
11 g    0.7961074 0.20389264
12 b    0.3942173 0.60578268
```
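As a sanity check on the table, the first row of updates can be reproduced by a direct application of Bayes' rule, without calling bayes at all (this sketch reuses the prior and likelihood values from the code above):

```r
# After observing one good part (g), multiply the prior by the
# likelihood of g under each model and renormalize.
prior <- c(working = .9, broken = .1)
like.g <- c(working = .95, broken = .70)   # P(g | model)
posterior.g <- prior * like.g / sum(prior * like.g)
round(posterior.g, 7)
#   working    broken
# 0.9243243 0.0756757
```

This matches the "1 g" row of the table; every later row is the same computation applied to the previous row.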

## Bayes’ Rule in R

On Friday, I gave several examples of Bayes’ rule in class. I’ll do a slight generalization of the testing-for-a-disease example to illustrate using a special R function, bayes, to do the calculations.

This special function can be read into R directly from my web site.

> `source("http://personal.bgsu.edu/~albert/MATH6480/R/bayes.R")`

This function has three inputs: prior, likelihood, and data.

1. The prior. Here there are two models — either we have the disease (D) or we don’t (not D) — and suppose that the probability of D is only 0.0001. prior is a vector of probabilities, where we give names to the two values:

`> prior = c(D = .0001, not.D = 1 - .0001)`

2. The likelihood. Here the data is the test result (pos or neg) where the test result is incorrect with probability 0.01. We’ll specify the likelihood as a matrix where each row of the matrix gives the probability of pos and neg for each model.

```r
> like.D = c(pos = .99, neg = .01)
> like.not.D = c(pos = .01, neg = .99)
> likelihood = rbind(D = like.D, not.D = like.not.D)
> likelihood
       pos  neg
D     0.99 0.01
not.D 0.01 0.99
```

3. data is a vector of data values. Suppose we get a positive test result (pos). Then

`> data="pos"`

So we enter the inputs prior, likelihood, and data. The output is a matrix: the first row gives the prior probabilities and the second row gives the posterior probabilities.

```r
> bayes(prior, likelihood, data)
                D     not.D
prior 0.000100000 0.9999000
pos   0.009803922 0.9901961
```

Although the positive blood test result is “bad news”, it is still very unlikely that you have the disease.
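As a check on that output, the posterior probability of D can be computed directly from Bayes' rule, without the bayes function (this sketch reuses the prior and likelihood values defined above):

```r
# Direct Bayes' rule: P(D | pos) is the prior times the likelihood of
# a positive test, renormalized over the two models.
prior <- c(D = .0001, not.D = 1 - .0001)
like.pos <- c(D = .99, not.D = .01)   # P(pos | model)
posterior <- prior * like.pos / sum(prior * like.pos)
round(posterior["D"], 9)
# 0.009803922
```

The rare disease dominates: even a 99%-accurate positive test only raises P(D) to about 1%.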

## Sequential learning by Bayes’ rule

I’m talking about Bayes’ rule now and here’s a quality control example (I think it comes from an intro Bayesian book by Schmitt published in the 1960s):

A machine in a small factory is producing a particular automotive component. Most of the time (specifically, 90% from historical records), the machine is working well and produces 95% good parts. Some days, the machine doesn’t work as well (it’s broken) and produces only 70% good parts. A worker inspects the first dozen parts produced by this machine on a particular morning and obtains the following results (g represents a good component and b a bad component): g, b, g, g, g, g, g, g, g, b, g, b. The worker is interested in assessing the probability the machine is working well.

Here we have two models: W, the machine is working, and B, the machine is broken, and we initially believe P(W) = 0.9 and P(B) = 0.1. We know the characteristics of each machine. If the machine is working, it produces 95% good parts, and if the machine is broken, it produces only 70% good parts.

We use the odds form of Bayes’ rule:

Posterior Odds of W = (Prior Odds of W) x BF,

where BF (the Bayes factor) is the ratio P(data | W)/P(data | B). It is convenient to express this on a log scale:

log Post Odds of W = log Prior Odds of W + log BF

Let’s illustrate how this works. Initially the log prior odds that the machine is working is

log Prior Odds = log [P(W)/P(B)] = log(0.9/0.1) = 2.20.

We first observe a good part (g). The log BF is

log BF = log(P(g | W)/P(g | B)) = log(0.95/0.70) = 0.31,

so the new log odds of working is

log Posterior Odds = 2.20 + 0.31 = 2.51

so we are more confident that the machine is in good condition.

We can continue doing this in a sequential fashion. Our posterior odds becomes our new prior odds and we update this odds with each observation. If we observe a good part (g), we’ll add 0.31 to the log odds; if we observe a bad part (b), we’ll add

log [P(b | W) / P(b | B)] = log(0.05/0.30) = -1.79

to the log odds. I did this for all 12 observations. Here’s a graph of the results (first I’ll show the R code I used).

```r
d = c("g","b","g","g","g","g","g","g","g","b","g","b")
log.BF = ifelse(d == "g", log(.95/.70), log(.05/.30))
prior.log.odds = log(.9/.1)
plot(0:12, c(prior.log.odds, prior.log.odds + cumsum(log.BF)),
     type = "l", lwd = 3, col = "red", xlab = "OBSERVATION",
     ylab = "log odds of working", main = "Sequential Learning by Bayes' Rule")
abline(h = 0)
```

After the 12th observation, the posterior log odds is -0.43, which translates to a posterior probability of exp(-0.43)/(1 + exp(-0.43)) = 0.39 that the machine is working. The machine should be fixed.
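That final number can be checked in one line of R arithmetic. Since order doesn't matter in the sum, the twelve updates collapse to nine g contributions and three b contributions; plogis is R's inverse logit, plogis(x) = exp(x)/(1 + exp(x)):

```r
# Final log odds = prior log odds + 9 good-part terms + 3 bad-part terms
final.log.odds <- log(.9/.1) + 9 * log(.95/.70) + 3 * log(.05/.30)
final.log.odds          # about -0.43
plogis(final.log.odds)  # about 0.39, agreeing with the table above
```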