### Archive

Archive for September, 2009

## Help with LearnBayes?

We are starting to get into the computational part of my Bayesian class, so I thought it would be helpful to point out some of the ways to get help with the special R functions in LearnBayes.

Using R scripts from my website

In the folder http://bayes.bgsu.edu/bcwr/R%20scripts/  I post the R scripts for all of the examples in the 2nd edition of the book.  You can copy and paste these scripts directly into the R console, or you can use the source command to bring these examples into R.  Here is an example where I illustrate the multinomial example from Chapter 4.

Using the R demo function:

Also, all of the R scripts in the examples are available as part of the LearnBayes package.

Once you have loaded in the LearnBayes package, you can see a listing of all of these examples by typing

demo()

If I want to run the multinomial demo from Chapter 4, section 3, type

demo("Chapter.4.3")

You may get some error messages if you run these demos on a Macintosh.  I use special R Windows functions so that the user can pause the graphical output.  I suspect there are better ways of writing these demos to adapt to the particular environment (Mac or Windows), but I haven’t figured that out yet.

Categories: LearnBayes package

## Estimating a log odds ratio

We were looking at the relationship between a student’s ACT score (lo or hi) and his GPA (lo or hi).  We constructed a prior on $\theta_1$ and $\theta_2$, where $\theta_1$ is the log odds ratio and $\theta_2$ is the sum of the log odds.  We assumed that $\theta_1$ was N(0.5, 0.3), $\theta_2$ was N(0, 2), and the two parameters were independent.

We now observe some data for a sample of students:

            GPA <= 3.0   GPA > 3.0
ACT <= 21       29           24
ACT > 21        15           34

The likelihood function is given by

$L(p_1, p_2) = p_1^{29} (1-p_1)^{24} p_2^{15} (1-p_2)^{34}$.

Here is a computing strategy to summarize the posterior density.

1.  We write an R function to compute the log posterior density of $(\theta_1, \theta_2)$.

2.  We use the function mycontour to compute the log posterior density on a grid.

3.  Using the function simcontour, we simulate values of $(\theta_1, \theta_2)$ from the posterior density.

4.  We learn about the log odds ratio $\theta_1$ by summarizing the sample of $\theta_1$ values.


Here is the R work.  I write a function, lctablepost, that has two inputs, the vector theta ($\theta_1, \theta_2$), and a list s that has two components, data (a vector of table counts) and prior (a vector of the means and standard deviations of the two normal priors).  Note that I use a special function ilogit that computes the inverse logit, transforming a real-valued logit back into a probability.

lctablepost=function (theta, s)
{
# log posterior of (theta1, theta2), where theta[1] is the log odds ratio
# and theta[2] is the sum of the log odds
ilogit=function(x) exp(x)/(1+exp(x))
logitp1 = (theta[1] + theta[2])/2
logitp2 = (theta[2] - theta[1])/2
# log likelihood (note the log of each cell probability) plus the log priors
s$data[1]*log(ilogit(logitp1))+s$data[2]*log(1-ilogit(logitp1))+
s$data[3]*log(ilogit(logitp2))+s$data[4]*log(1-ilogit(logitp2))+
dnorm(theta[1],s$prior[1],s$prior[2],log=TRUE) +
dnorm(theta[2],s$prior[3],s$prior[4],log=TRUE)
}

We define the data and the prior inputs.

d=c(29,24,15,34)
p=c(0.5,0.3,0,2)
stuff=list(data=d,prior=p)

Now we use the LearnBayes function mycontour — the bounding rectangle of (-1, 2, -9, 3) seems to work well.

mycontour(lctablepost,c(-1,2,-9,3),stuff,
xlab="THETA1",ylab="THETA2",col="red")

Last, I simulate 1000 values from this grid by the LearnBayes function simcontour.  The output is a list s with components x and y — so s$x are the simulated draws of $\theta_1$ and s$y are the simulated draws of $\theta_2$.

s=simcontour(lctablepost,c(-1,2,-9,3),stuff,1000)

I show the simulated sample on my contour by use of the points function.

points(s)
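
The code above stops at step 3; a minimal sketch of step 4 (summarizing the simulated $\theta_1$ values) follows.  Since simcontour requires the LearnBayes package, I use placeholder normal draws standing in for s$x, and the N(0.6, 0.25) numbers are made up purely for illustration.

```r
# placeholder draws standing in for s$x, the simulated theta1 values
# (the N(0.6, 0.25) values here are hypothetical, for illustration only)
set.seed(1)
theta1.draws = rnorm(1000, 0.6, 0.25)
quantile(theta1.draws, c(0.05, 0.5, 0.95))  # median and a 90% interval
mean(theta1.draws > 0)   # estimated probability the log odds ratio is positive
```

With the actual simcontour output, one would apply the same quantile and mean calls to s$x.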

Categories: Bayesian computation

## Why should I buy a textbook?

I just discovered that many of my students didn’t buy my text since there is free electronic access to an older edition of my text through the library system.

That raises the question “Why purchase the textbook?”

I’ll give several reasons.

1.  If you are taking a graduate course in your major, you should purchase all of the recommended texts.  After all, you are planning to be a statistician and you’ll need a collection of books that will help you in your job.

2.  There is a reason why authors write second editions of a book.  In my case, I changed the LearnBayes package, and so I wanted to revise the book to be consistent with version 2.0 of the package.  Also, there are new sections and new exercises in the 2nd edition, so the student using the old edition is missing useful material.  There was some confusion on the last homework, partly because the students were not reading the current edition.

3.  “Books are expensive.”  I understand a book can be expensive (actually, my text is relatively inexpensive), but it is certainly cheap relative to the cost of taking the course.

4. By not purchasing the textbook, the student is sending a message to the instructor, saying “I can get through the course without reading the book” which really is an insult to the instructor.  This is common practice among undergraduates, but I hope it isn’t common among statistics students.

Categories: General

## Constructing a prior in a 2 x 2 table

              ACT (0,21]   ACT (21,100]
GPA (0,3]        711           310
GPA (3,10]       783           745

At BGSU, all freshmen take a mathematics placement test, and the results of the test, together with some other information, are used to advise students on the appropriate first math class to take.

The other information consists of the student’s score on the ACT Math exam and his/her high school grade point average, or GPA.  We are interested in examining the relationship between the GPA and the ACT Math score.

For the students taking Form B of the math placement exam, 63% of the students have a GPA over 3.0.  Suppose you categorize these students into two groups: those who scored 21 or less on the ACT exam, and those who scored over 21 on the ACT.

            GPA <= 3.0   GPA > 3.0
ACT <= 21       p1         1 - p1
ACT > 21        p2         1 - p2

How do we construct a prior on the proportions p1 and p2?    Let’s step through the process.

1.  Here it doesn’t make any sense to assign p1 and p2 independent distributions.  Suppose I thought a student’s probability of getting a low GPA was 0.37, and then you told me that the student scored low on the ACT exam.  I would then think it more likely that the student gets a low GPA; that is, I would think the student’s chance of getting a GPA <= 3.0 was higher than 0.37.  Clearly, p1 and p2 should have a dependent prior.

2.  A popular way of expressing the relationship between p1 and p2 is by use of a log odds ratio.  Let T1 = log(p1/(1-p1)) be the log odds of p1, and let T2 = log(p2/(1-p2)) be the log odds of p2.  A log odds converts a proportion (between 0 and 1) to a real-valued quantity.  A proportion of 0.5 corresponds to a log odds of 0, a proportion less than 0.5 corresponds to a negative log odds, and so on.  Then one compares the proportions by means of the difference in log odds, or the log odds ratio

$\theta_1 = \log \frac{p_1}{1-p_1} - \log \frac{p_2}{1-p_2} = \log \frac{p_1/(1-p_1)}{p_2/(1-p_2)}$.
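
As a quick numerical illustration of this definition (the proportions 0.55 and 0.38 below are hypothetical, not from the data), the log odds ratio can be computed in R:

```r
logit = function(p) log(p/(1-p))
p1 = 0.55   # hypothetical proportion for the low ACT group
p2 = 0.38   # hypothetical proportion for the high ACT group
theta1 = logit(p1) - logit(p2)  # log odds ratio; positive since p1 > p2
theta1
```

A logit of 0.5 is exactly 0, and theta1 is positive here because the first proportion exceeds the second.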

3.  Suppose we transform the proportions $p_1$ and $p_2$ to the log odds ratio $\theta_1$ and the sum of log odds (or logits)

$\theta_2= \log \frac{p_1}{1-p_1} + \log \frac{p_2}{1-p_2}$,

and place independent priors on $\theta_1$ and $\theta_2$.

4.  Let’s illustrate using this prior.  I assume $\theta_1$ and $\theta_2$ are independent with

$\theta_1 \sim N(0.5, 0.3)$, $\theta_2 \sim N(0, 2)$.

The prior says that (1) I don’t know much about the sizes of the proportions (a vague prior placed on $\theta_2$), but (2) I believe that a larger proportion of the “low ACT” students will have low GPAs than the “high ACT students” (an informative prior placed on $\theta_1$).

5.  This prior induces a dependent prior on $p_1$ and $p_2$.  This is easy to illustrate using R.  First I’ll simulate 1000 draws from the prior of $\theta_1$ and $\theta_2$ and transform these values back to the proportions $p_1$ and $p_2$.  I construct a scatterplot of ($p_1, p_2$) and put the line $p_2 = p_1$ on top.  This clearly shows the dependence in the prior, and it reflects the belief that $p_1 > p_2$.

theta1=rnorm(1000,0.5,.3)
theta2=rnorm(1000,0,2)
logitp1 = (theta1 + theta2)/2
logitp2 = (theta2 - theta1)/2
p1=exp(logitp1)/(1+exp(logitp1))
p2=exp(logitp2)/(1+exp(logitp2))
plot(p1,p2,pch=19,col="red")
abline(0,1,lwd=3)
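
As a quick numerical check on this belief, one can compute the fraction of simulated prior draws with $p_1 > p_2$.  Since $p_1 > p_2$ exactly when $\theta_1 > 0$, this fraction should be near $P(N(0.5, 0.3) > 0) \approx 0.95$.  (The set.seed call and the use of base R's plogis for the inverse logit are my choices.)

```r
set.seed(7)
theta1 = rnorm(1000, 0.5, 0.3)    # informative prior on the log odds ratio
theta2 = rnorm(1000, 0, 2)        # vague prior on the sum of the logits
p1 = plogis((theta1 + theta2)/2)  # plogis computes the inverse logit
p2 = plogis((theta2 - theta1)/2)
mean(p1 > p2)   # fraction of prior draws consistent with the belief p1 > p2
```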

Categories: Priors

## Guidelines for homework

My recent homework in my Bayesian class was on several one-parameter problems where R was used in the posterior and predictive calculations.

There was much variation in what was turned in.  One student’s homework consisted of 37 pages and every simulated parameter value was displayed.  Another student’s turn-in was 4 pages where all of the R work was displayed (in a 2 point font) on a single page.

Here are some guidelines for what I’d like a student’s homework to look like.

1.  Homework consisting completely of R work (input and output) is clearly inappropriate.

2.  The answers to the exercise questions should be written in paragraph form with complete sentences.  Imagine that the student had to report to his or her boss about what was learned; the student would write a report that describes in words the conclusions.

3.  Obviously, I’d like to see that the student is using R functions in a reasonable way.  But I’m primarily interested in a copy of what the student entered and the relevant output.   For example, suppose the student is summarizing a beta posterior using simulation.  I don’t want to see the 1000 simulated draws, but the student could convince me that he or she is getting reasonable results by showing several summaries, such as a posterior mean and posterior standard deviation.

4.  If I assign a homework with 8 exercises, then I think that 3 pages is too brief (not enough said), but over 20 pages indicates too much irrelevant R output is included.  The student needs a reasonable balance.  Maybe 10 pages would be an optimal length of a turn-in — maybe longer if the student wishes to include some graphs.

By the way, it was interesting how one particular question was answered.

In Chapter 2, exercise 4, I asked the student to contrast predictions using two different priors, one a discrete one, and the second a beta prior.  Most of the students were successful in computing the predictive probabilities using the two priors.  But there were different comparisons that were done.

1.  DISPLAY?  Some students just displayed the two probability distributions and said they were similar.  Let’s say that this approach wasn’t that persuasive.

2.  GRAPH?  Some students graphed the two sets of predictive probabilities on a single graph.  Assuming the graph is readable, that is a much better idea.  One can quickly see if the distributions are similar by looking at the graph.

3.  SUMMARIZE?  Another approach is to compare the two distributions by summarizing each distribution in some way.  For example, one could compute the mean and standard deviation of each distribution, or one could compute a 90% predictive interval for each distribution.

What would I prefer?  It is pretty obvious that simply displaying the two probability distributions is not enough.  I think graphing the two distributions and summarizing the distributions is a good strategy.  Otherwise, you really aren’t answering the question of whether the two distributions are similar.

Categories: General

## Illustration of using hpd function

I promised my Bayesian students that I’d illustrate the hpd function in the TeachingDemos package.

Suppose I’m interested in estimating p, the proportion of undergraduate BGSU students whose home is 200 miles or further from Bowling Green.  Initially, I place a uniform prior on p.  I collect data from 16 students — only 1 lives 200+ miles from BG.  My posterior for p is beta(2, 16).

I’ll assume that you have already installed the TeachingDemos package in R.  Then I’ll simulate 10,000 draws from the beta(2, 16) distribution and use the function emp.hpd to find a 90% interval estimate (you use hpd if you actually have the quantile function of the distribution available).

> library(TeachingDemos)
> p=rbeta(10000,2,16)
> emp.hpd(p,conf=0.90)
[1] 0.005806384 0.211324962

You could also find an equal-tails interval estimate:

> qbeta(c(.05,.95),2,16)
[1] 0.02131763 0.25012445

But the length of the equal-tails interval is 0.229, which is substantially larger than the length of the HPD interval (0.206).  The two intervals will differ whenever the posterior density is asymmetric, as in this situation.
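
The equal-tails length quoted above can be checked directly in base R (no TeachingDemos needed for this part):

```r
eq = qbeta(c(0.05, 0.95), 2, 16)  # equal-tails 90% interval for beta(2, 16)
diff(eq)                          # interval length, about 0.229
```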

Categories: Bayesian software, R work

## Poisson mean example, part II

In the last post, I described the process of determining a prior for $\lambda$, my son’s average number of text messages per day.   I decided to model my beliefs with a gamma(a, b) density with a = 44.4 and b = 4.4.

Now I observe some data.  I looked at the online record of text messaging for the first seven days that he’s been at school and observed the counts

Sat  Sun  Mon  Tue  Wed  Thu  Fri
 19    4   26   17   15    0   17

If we assume these counts are Poisson with mean $\lambda$, then the likelihood function is given by

$L(\lambda) = \lambda^s \exp(-n \lambda)$,

where $n$ is the number of observations (7) and $s$ is the sum of the observations (98).

The posterior density (using the prior times likelihood recipe) is given by

$L(\lambda) \times g(\lambda) = \lambda^{a+s-1} \exp(-(b+n) \lambda)$,

which we recognize as a gamma density with shape $a_1 = a + s = 142.4$ and rate $b_1 = b + n = 11.4$.

Using the following R commands, we construct a triplot (prior, likelihood, and posterior).

> y=c(19,4,26,17,15,0,17)
> a=44.4; b=4.4
> s=sum(y); n=length(y)
> curve(dgamma(x,shape=s+a,rate=n+b),col="red",xlab="LAMBDA",
+   ylab="DENSITY",lwd=3,from=3,to=25)
> curve(dgamma(x,shape=a,rate=b),col="blue",lwd=3,add=TRUE)
> curve(dgamma(x,shape=s+1,rate=n),col="green",lwd=3,add=TRUE)
> legend("topright",legend=c("posterior","prior","likelihood"),
+   col=c("red","blue","green"),lwd=3)
I would like to predict my son’s text message use in the next month.  Let $y_1^*, ..., y_{30}^*$ denote the numbers of text messages on the next 30 days.  The total number of messages $s^*$ has, conditional on $\lambda$, a Poisson distribution with mean $30 \lambda$.

I am interested in computing the posterior predictive distribution for $s^*$.  One can simulate this distribution by (1) simulating a value of the parameter $\lambda$ from the gamma(142.4, 11.4) posterior and then (2) simulating $s^*$ from a Poisson distribution with mean $30 \lambda$.

We can do this simulation in R in two lines.  In the last line, I tabulate the values of $s^*$ and plot the counts.
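
The post is cut off before the code appears; a sketch of the simulation it describes (the number of draws and the set.seed call are my choices) is:

```r
set.seed(123)
lambda = rgamma(1000, shape=142.4, rate=11.4)  # draws from the gamma posterior
sstar = rpois(1000, 30*lambda)                 # predictive draws of the monthly total
plot(table(sstar), xlab="TOTAL MESSAGES", ylab="FREQUENCY")
```

The predictive draws center near $30 \times 142.4/11.4 \approx 375$ messages, with extra spread coming from the posterior uncertainty in $\lambda$.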