### Archive

Archive for August, 2009

## Sequential learning by Bayes’ rule

I’m talking about Bayes’ rule now, and here’s a quality control example (I think it comes from an intro Bayesian book by Schmitt that was published in the 1960s):

A machine in a small factory is producing a particular automotive component.  Most of the time (specifically, 90% from historical records), the machine is working well and produces 95% good parts. Some days, the machine doesn’t work as well (it’s broken) and produces only 70% good parts. A worker inspects the first dozen parts produced by this machine on a particular morning and obtains the following results (g represents a good component and b a bad component):  g, b, g, g, g, g, g, g, g, b, g, b.  The worker is interested in assessing the probability the machine is working well.

Here we have two models: W, the machine is working, and B, the machine is broken, and we initially believe P(W) = 0.9 and P(B) = 0.1. We know how the machine performs in each state: if it is working, it produces 95% good parts, and if it is broken, it produces only 70% good parts.

We use the odds form of Bayes’ rule:

Posterior Odds of W = (Prior Odds of W) x BF,

where BF (Bayes factor) is the ratio  P(data | W)/P(data|B).  It is convenient to express this on a log scale:
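Updating one part at a time, as is done next, relies on an assumption the problem leaves implicit: given the machine's state, the parts are independent. Under that assumption P(data | W) is a product over the parts, so the log BF of the whole sample equals the sum of the per-part log BFs. Here is a small R sketch that checks this on the morning's data; the helper `lik` is a hypothetical name introduced for illustration.

```r
# Observed parts, in order: g = good, b = bad
d <- c("g","b","g","g","g","g","g","g","g","b","g","b")
p.W <- 0.95   # P(good part | working)
p.B <- 0.70   # P(good part | broken)

# Hypothetical helper: likelihood of the whole sample given P(good) = p,
# assuming parts are independent given the machine's state
lik <- function(p) prod(ifelse(d == "g", p, 1 - p))

# Log Bayes factor from the full data at once ...
log.BF.full <- log(lik(p.W) / lik(p.B))

# ... and as a sum of one log Bayes factor per observation
log.BF.parts <- sum(ifelse(d == "g", log(0.95 / 0.70), log(0.05 / 0.30)))

c(full = log.BF.full, sum.of.parts = log.BF.parts)
```

Both numbers come out to about -2.63; adding the log prior odds of 2.20 gives a final log odds of about -0.43.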

log Post Odds of W = log Prior Odds of W + log BF

Let’s illustrate how this works. Initially, the log prior odds of working is

log Prior Odds = log [P(W)/P(B)] = log(0.9/0.1) = 2.20.

We first observe a good part (g).  The log BF is

log BF = log [P(g | W)/P(g | B)] = log(0.95/0.70) = 0.31,

so the new log odds of working is

log Posterior Odds = 2.20 + 0.31 = 2.51

so we are more confident that the machine is in good condition.
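As a quick check that the odds form agrees with Bayes’ rule applied directly, here is a minimal R sketch for the first observation (a good part). The variable names are illustrative. At full precision both routes give a log posterior odds of about 2.503; the 2.51 above comes from adding the rounded values 2.20 and 0.31.

```r
# Prior: P(W) = 0.9, P(B) = 0.1
prior.W <- 0.9
prior.B <- 0.1

# Odds form on the log scale: log posterior odds = log prior odds + log BF
log.post.odds <- log(prior.W / prior.B) + log(0.95 / 0.70)

# Direct Bayes' rule for P(W | g), then converted to log odds
post.W <- prior.W * 0.95 / (prior.W * 0.95 + prior.B * 0.70)
log.post.odds.direct <- log(post.W / (1 - post.W))

round(c(odds.form = log.post.odds, direct = log.post.odds.direct), 3)
```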

We can continue doing this in a sequential fashion. The posterior odds become the new prior odds, and we update these odds with each observation. If we observe a good part (g), we’ll add 0.31 to the log odds; if we observe a bad part (b), we’ll add

log [P(b | W) / P(b | B)] = log(0.05/0.30) = -1.79

to the log odds.  I did this for all 12 observations.  Here’s a graph of the results  (first I’ll show the R code I used).

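```r
# Observed parts, in order: g = good, b = bad
d=c("g","b","g","g","g","g","g","g","g","b","g","b")
# Log Bayes factor contributed by each observation
log.BF=ifelse(d=="g",log(.95/.70),log(.05/.30))
prior.log.odds=log(.9/.1)   # log prior odds of working
# Log odds of working before any data (0) and after each observation (1..12)
plot(0:12,c(prior.log.odds,prior.log.odds+cumsum(log.BF)),
  type="l",lwd=3,col="red",xlab="OBSERVATION",
  ylab="log odds of working",main="Sequential Learning by Bayes' Rule")
abline(h=0)   # log odds 0 corresponds to P(W) = 0.5
```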
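The `cumsum` call does all the updating in one vectorized step. To make the “posterior becomes the new prior” idea explicit, here is an equivalent loop-based sketch (the names `log.odds` and `history` are illustrative, not from the original code):

```r
d <- c("g","b","g","g","g","g","g","g","g","b","g","b")

# Start from the log prior odds of working, log(0.9/0.1)
log.odds <- log(0.9 / 0.1)
history <- numeric(length(d))

for (i in seq_along(d)) {
  # Per-observation log Bayes factor: log P(x | W) / P(x | B)
  log.BF <- if (d[i] == "g") log(0.95 / 0.70) else log(0.05 / 0.30)
  # The current posterior log odds become the prior for the next part
  log.odds <- log.odds + log.BF
  history[i] <- log.odds
}

round(history, 2)
```

`history[12]` is about -0.43, the same end point as the plotted curve, and the log odds peak after the ninth part, just before the second bad one.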

After all 12 observations, the posterior log odds is -0.43, which translates to a posterior probability of exp(-0.43)/(1+exp(-0.43)) = 0.39 that the machine is working. The machine should be fixed.
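The conversion from log odds to probability is the logistic function, which base R provides as `plogis`. As a sanity check, the sketch below also computes P(W | data) directly from Bayes’ rule, assuming the parts are independent given the machine’s state; both routes give about 0.394.

```r
d <- c("g","b","g","g","g","g","g","g","g","b","g","b")
n.good <- sum(d == "g")   # 9 good parts
n.bad  <- sum(d == "b")   # 3 bad parts

# Final log odds of working, then plogis(x) = exp(x) / (1 + exp(x))
final.log.odds <- log(0.9 / 0.1) + n.good * log(0.95 / 0.70) + n.bad * log(0.05 / 0.30)
p.from.odds <- plogis(final.log.odds)

# Direct Bayes' rule on the whole sample (assuming independent parts)
num.W <- 0.9 * 0.95^n.good * 0.05^n.bad
num.B <- 0.1 * 0.70^n.good * 0.30^n.bad
p.direct <- num.W / (num.W + num.B)

round(c(from.odds = p.from.odds, direct = p.direct), 3)
```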

## Why Bayes?

When I start my Bayesian class, I like to mention some reasons why this is a relevant class. Specifically, what is wrong with frequentist inference, and what can Bayesian thinking add to the statistician’s toolkit?

There is an article published in Science about 10 years ago titled “Bayes Offers a ‘New’ Way to Make Sense of Numbers” that you can find at

http://bayes.bgsu.edu/m6480/LECTURE%20NOTES/science.article.pdf

It does a good job selling Bayes to the public. Here are a few points from the article that I mentioned in my class.

1.  Part of the motivation for considering Bayesian methods is the advances in computers and computational methods, together with some limitations of frequentist methods.

2.  Bayesian conclusions are easier to understand.

3.  The FDA is currently encouraging more use of Bayesian methods for clinical trials. One area where Bayesian methods appear to have an advantage is sequential trials, where one collects data over time and wishes to stop the trial as soon as there is sufficient evidence to make a decision.

4.  P-values, one of the standard frequentist summaries, are frequently misinterpreted. In addition, a substantial body of literature suggests that p-values typically overstate the evidence against the null hypothesis.

5.  One of the popular computer tools driven by Bayesian methods is the Microsoft animated paperclip (http://en.wikipedia.org/wiki/Office_Assistant). But people seem to be generally annoyed with this help device, and it is going away.

## Welcome to MATH 6480

$g(\theta|y) \propto g(\theta) \times L(\theta)$
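In this notation θ is the unknown parameter, g(θ) the prior, and L(θ) the likelihood. As an illustrative choice, reusing the machine example from the sequential learning post, θ is the proportion of good parts and takes only two values, 0.95 (working) and 0.70 (broken), so the proportionality constant is just the sum of prior × likelihood over those two values. The sketch assumes the 12 parts are independent given θ.

```r
# Two-point parameter space: theta = proportion of good parts
theta <- c(working = 0.95, broken = 0.70)
prior <- c(0.9, 0.1)                      # g(theta)

# Likelihood L(theta) for 9 good and 3 bad parts (independent parts assumed)
likelihood <- theta^9 * (1 - theta)^3

# g(theta | y) is proportional to g(theta) * L(theta); normalize to sum to 1
unnormalized <- prior * likelihood
posterior <- unnormalized / sum(unnormalized)
round(posterior, 3)
```

The posterior probability of “working” is about 0.394, matching the sequential log-odds calculation.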