I’m talking about Bayes’ rule now, and here’s a quality control example (I think it comes from an introductory Bayesian book by Schmitt that was published in the 1960s):

A machine in a small factory is producing a particular automotive component. Most of the time (specifically, 90% from historical records), the machine is working well and produces 95% good parts. Some days, the machine doesn’t work as well (it’s broken) and produces only 70% good parts. A worker inspects the first dozen parts produced by this machine on a particular morning and obtains the following results (g represents a good component and b a bad component): g, b, g, g, g, g, g, g, g, b, g, b. The worker is interested in assessing the probability that the machine is working well.

Here we have two models: W, the machine is working, and B, the machine is broken, and we initially believe P(W) = 0.9 and P(B) = 0.1. We know how the machine behaves in each state. If the machine is working, it produces 95% good parts, and if the machine is broken, it produces only 70% good parts.

We use the odds form of Bayes’ rule:

Posterior Odds of W = (Prior Odds of W) x BF,

where BF (Bayes factor) is the ratio P(data | W)/P(data | B). It is convenient to express this on a (natural) log scale:

log Post Odds of W = log Prior Odds of W + log BF
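The sequential updating below quietly relies on treating the parts as independent given the machine’s state. Under that assumption P(data | W) is a product of per-part probabilities, so the log BF for the whole sample is just the sum of the per-part log BFs. Here is a quick R check of that identity; `p.good` and `lik` are hypothetical names used only for illustration.

```r
# P(good part) under each state: W = working, B = broken
p.good <- c(W = 0.95, B = 0.70)

# Observed sequence (g = good, b = bad)
d <- c("g", "b", "g", "g", "g", "g", "g", "g", "g", "b", "g", "b")

# Likelihood of the whole sequence for a given P(good),
# ASSUMING parts are independent given the machine's state
lik <- function(p, d) prod(ifelse(d == "g", p, 1 - p))

# Log Bayes factor computed from all 12 parts at once
log.BF.all <- log(lik(p.good["W"], d) / lik(p.good["B"], d))

# Sum of the per-part log Bayes factors
log.BF.sum <- sum(ifelse(d == "g", log(0.95 / 0.70), log(0.05 / 0.30)))

all.equal(unname(log.BF.all), log.BF.sum)   # TRUE; both are about -2.63
```

Because the log BFs simply add, the order in which the parts arrive doesn’t affect the final answer, only the path the log odds take to get there.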

Let’s illustrate how this works. Initially, the prior log odds of working are

log Prior Odds = log [P(W)/P(B)] = log(0.9/0.1) = 2.20.

We first observe a good part (g). The log BF is

log BF = log [P(g | W)/P(g | B)] = log(0.95/0.70) = 0.31,

so the new log odds of working is

log Posterior Odds = 2.20 + 0.31 = 2.51

so we are more confident that the machine is in good condition.
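This first update can be reproduced in a few lines of R. The sketch uses base R’s `qlogis()` (log odds of a probability) and `plogis()` (its inverse) to move between probabilities and log odds. Carried at full precision, the posterior log odds are about 2.50 (the 2.51 above comes from adding the rounded values), which corresponds to a probability of working of roughly 0.924, up from 0.90.

```r
# Prior log odds of working: log(0.9 / 0.1), i.e. the logit of 0.9
prior.log.odds <- qlogis(0.9)

# Log Bayes factor contributed by one good part
log.BF.g <- log(0.95 / 0.70)

# Posterior log odds after the first (good) part
post.log.odds <- prior.log.odds + log.BF.g

# Convert back to the probability that the machine is working
post.prob <- plogis(post.log.odds)

round(c(prior.log.odds, log.BF.g, post.log.odds, post.prob), 3)
# [1] 2.197 0.305 2.503 0.924
```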

We can continue in a sequential fashion: the posterior odds become the new prior odds, and we update them with each observation. If we observe a good part (g), we’ll add 0.31 to the log odds; if we observe a bad part (b), we’ll add

log [P(b | W) / P(b | B)] = log(0.05/0.30) = -1.79

to the log odds. I did this for all 12 observations. Here’s a graph of the results (first I’ll show the R code I used).

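```r
# Observed sequence of parts: g = good, b = bad
d <- c("g", "b", "g", "g", "g", "g", "g", "g", "g", "b", "g", "b")

# Log Bayes factor contributed by each observation
log.BF <- ifelse(d == "g", log(.95 / .70), log(.05 / .30))

# Prior log odds that the machine is working: log(0.9 / 0.1)
prior.log.odds <- log(.9 / .1)

# Log odds of working after 0, 1, ..., 12 observations
plot(0:12, c(prior.log.odds, prior.log.odds + cumsum(log.BF)),
     type = "l", lwd = 3, col = "red", xlab = "OBSERVATION",
     ylab = "log odds of working", main = "Sequential Learning by Bayes' Rule")

# Log odds of 0: working and broken are equally likely
abline(h = 0)
```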
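To read off the final answer, the last point on the curve can be converted back to a probability. The self-contained sketch below does that and, as a cross-check, applies the ordinary probability form of Bayes’ rule (prior times likelihood, then normalize) to all 12 parts at once, again assuming the parts are independent given the machine’s state. Both routes give final log odds of about -0.43, i.e. a probability of roughly 0.39 that the machine is working, so after three bad parts in twelve the evidence has tipped toward the machine being broken.

```r
# Observed sequence (g = good, b = bad)
d <- c("g", "b", "g", "g", "g", "g", "g", "g", "g", "b", "g", "b")
n.g <- sum(d == "g")   # 9 good parts
n.b <- sum(d == "b")   # 3 bad parts

# Sequential route: prior log odds plus every per-part log Bayes factor
log.BF <- ifelse(d == "g", log(0.95 / 0.70), log(0.05 / 0.30))
final.log.odds <- log(0.9 / 0.1) + sum(log.BF)
p.seq <- plogis(final.log.odds)   # inverse logit: log odds -> probability

# Direct route: prior x likelihood for each state, then normalize
prior <- c(W = 0.9, B = 0.1)
lik <- c(W = 0.95^n.g * 0.05^n.b,
         B = 0.70^n.g * 0.30^n.b)
post <- prior * lik / sum(prior * lik)

# approx. -0.430, 0.394, 0.394
round(c(final.log.odds, p.seq, unname(post["W"])), 3)
```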