## How Many Electoral Votes will Obama Get?

Yesterday Chris Rump at BGSU gave an interesting presentation about simulating the 2008 Presidential Election. He explained the methodology used by Nate Silver at the fivethirtyeight.com site.

Here is a relatively simple Bayesian approach for estimating the number of electoral votes that Barack Obama will get in the election on Tuesday.

First, using the polling data on cnn.com, I collected the percentages for McCain and Obama in the latest poll in each state. The web site only gives the survey percentages, not the sample sizes. A typical sample size in an election poll is 1000; I will assume that each sample size is 500. This is conservative, and it allows for some changes in voting behavior in the weeks before Election Day.

Suppose 500 voters in Ohio are sampled, with 47% for McCain and 51% for Obama; this means that 235 and 255 voters were for the two candidates. Let p.M and p.O denote the proportions of the Ohio voting population for the two candidates, and let 1 - p.M - p.O denote the proportion of the population for someone else. Assuming a vague prior on (p.M, p.O, 1 - p.M - p.O), the posterior distribution for the proportions is proportional to

p.M^235 p.O^255 (1 - p.M - p.O)^10,

which is a Dirichlet distribution. The probability that McCain wins the election is simply the posterior probability

P(p.M > p.O)

For each state, I can easily estimate this probability by simulation. One simulates 5000 draws from a Dirichlet distribution and computes the proportion of draws where p.M > p.O.
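For the Ohio numbers above, this takes only a few lines of R. The sketch below builds the Dirichlet draws from independent gamma draws so that no extra package is needed (the rdirichlet function in LearnBayes would do the same job); the object names are my own.

```r
# Posterior for Ohio is Dirichlet(236, 256, 11): counts 235, 255, 10 plus a uniform prior.
alpha <- c(235, 255, 10) + 1
n.sim <- 5000
# A Dirichlet draw is a vector of independent gamma draws divided by its sum.
g <- matrix(rgamma(n.sim * length(alpha), shape = rep(alpha, each = n.sim)),
            nrow = n.sim)
p <- g / rowSums(g)      # each row is one simulated (p.M, p.O, p.other)
mean(p[, 1] > p[, 2])    # estimate of P(p.M > p.O), near the 0.182 reported for Ohio
```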

The following table summarizes my calculations. For each state, I give the percentage of voters for McCain and Obama in the latest poll and my computed probability that McCain wins the state based on this data.

| State | M.pct | O.pct | prob.M.wins | EV |
|---|---|---|---|---|
| Alabama | 58 | 36 | 1.000 | 9 |
| Alaska | 55 | 37 | 1.000 | 3 |
| Arizona | 53 | 46 | 0.946 | 10 |
| Arkansas | 53 | 41 | 0.997 | 6 |
| California | 33 | 56 | 0.000 | 55 |
| Colorado | 45 | 53 | 0.032 | 9 |
| Connecticut | 31 | 56 | 0.000 | 7 |
| Delaware | 38 | 56 | 0.000 | 3 |
| D.C. | 13 | 82 | 0.000 | 3 |
| Florida | 47 | 51 | 0.189 | 27 |
| Georgia | 52 | 47 | 0.869 | 15 |
| Hawaii | 32 | 63 | 0.000 | 4 |
| Idaho | 68 | 26 | 1.000 | 4 |
| Illinois | 35 | 59 | 0.000 | 21 |
| Indiana | 45 | 46 | 0.416 | 11 |
| Iowa | 42 | 52 | 0.009 | 7 |
| Kansas | 63 | 31 | 1.000 | 6 |
| Kentucky | 55 | 39 | 1.000 | 8 |
| Louisiana | 50 | 43 | 0.949 | 9 |
| Maine | 35 | 56 | 0.000 | 4 |
| Maryland | 39 | 54 | 0.000 | 10 |
| Massachusetts | 34 | 53 | 0.000 | 12 |
| Michigan | 36 | 58 | 0.000 | 17 |
| Minnesota | 38 | 57 | 0.000 | 10 |
| Mississippi | 46 | 33 | 1.000 | 6 |
| Missouri | 50 | 48 | 0.675 | 11 |
| Montana | 48 | 44 | 0.825 | 3 |
| Nebraska | 43 | 45 | 0.329 | 5 |
| Nevada | 45 | 52 | 0.058 | 5 |
| New Hampshire | 39 | 55 | 0.000 | 4 |
| New Jersey | 36 | 59 | 0.000 | 15 |
| New Mexico | 40 | 45 | 0.117 | 5 |
| New York | 31 | 62 | 0.000 | 31 |
| North Carolina | 46 | 52 | 0.088 | 15 |
| North Dakota | 43 | 45 | 0.318 | 3 |
| Ohio | 47 | 51 | 0.182 | 20 |
| Oklahoma | 61 | 34 | 1.000 | 7 |
| Oregon | 34 | 48 | 0.000 | 7 |
| Pennsylvania | 43 | 55 | 0.004 | 21 |
| Rhode Island | 31 | 45 | 0.000 | 4 |
| South Carolina | 59 | 37 | 1.000 | 8 |
| South Dakota | 48 | 41 | 0.951 | 3 |
| Tennessee | 55 | 39 | 1.000 | 11 |
| Texas | 57 | 38 | 1.000 | 34 |
| Utah | 55 | 32 | 1.000 | 5 |
| Vermont | 36 | 57 | 0.000 | 3 |
| Virginia | 44 | 53 | 0.022 | 13 |
| Washington | 34 | 55 | 0.000 | 11 |
| West Virginia | 53 | 44 | 0.978 | 5 |
| Wisconsin | 42 | 53 | 0.007 | 10 |
| Wyoming | 58 | 32 | 1.000 | 3 |

Once we have these win probabilities for all states, it is easy to simulate the election. Essentially, one flips 51 biased coins, where the probability that McCain wins each state is given by its win probability. Once the state winners are simulated, one accumulates the electoral votes for the two candidates. I'll focus on the electoral count for Obama since he is predicted to win the election.

I repeated this process for 5000 simulated elections. Here is a histogram of the Obama electoral count. Note that all of the counts exceed 300 (270 electoral votes are needed to win), indicating that the simulated probability that Obama wins the election is essentially 1.
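One pass of this coin-flipping simulation can be sketched in a few lines. The version below uses only three battleground states from the table to stay short; the full simulation would use all 51 rows, and the vector names here are my own.

```r
# Minimal sketch of the election simulation with three states from the table.
prob.M <- c(Florida = 0.189, Ohio = 0.182, Virginia = 0.022)  # P(McCain wins)
ev     <- c(Florida = 27,    Ohio = 20,    Virginia = 13)     # electoral votes
n.elections <- 5000
obama.ev <- replicate(n.elections, {
  mccain.wins <- runif(length(prob.M)) < prob.M  # flip one biased coin per state
  sum(ev[!mccain.wins])                          # Obama accumulates the rest
})
hist(obama.ev, main = "OBAMA EV FROM THREE STATES")
```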

## Summarizing a posterior

Continuing our selected data example, remember that we have programmed the posterior of the transformed parameters theta1 and theta2 in the R function kaminsky.

To find a normal approximation to the posterior, we apply the function laplace in the LearnBayes package. The inputs are (1) the function defining the log posterior, (2) a starting guess at the mode, (3) the number of iterations of the Newton algorithm, and (4) the data vector (the 5th and 15th order statistics).

```r
> start=array(c(-2,-1),c(1,2))
> fit.laplace=laplace(kaminsky,start,10,data)
```

The output is a list containing mode, the value of the posterior mode, and var, the estimate of the variance-covariance matrix.

```r
> fit.laplace$mode
          [,1]      [,2]
[1,] -2.367745 -1.091989
> fit.laplace$var
          [,1]      [,2]
[1,] 0.3201467 0.1191059
[2,] 0.1191059 0.1191059
```

We can get more accurate summaries of the posterior by means of a Metropolis random walk algorithm. The function rwmetrop implements this algorithm for an arbitrary posterior. To use this function, we define proposal, a list containing the variance-covariance matrix and scale parameter for the normal proposal density; the remaining inputs are the starting value for the MCMC chain, the number of simulated draws, and the data vector. Note that we use the approximate variance-covariance matrix from laplace in the proposal density for rwmetrop.

```r
> proposal=list(var=fit.laplace$var,scale=2)
> fit.mcmc=rwmetrop(kaminsky,proposal,start,10000,data)
```

The output of rwmetrop is a list containing accept, the acceptance rate for the chain, and par, the matrix of simulated draws.

At this point, we should run some convergence diagnostics to see if the simulated draws show sufficient mixing and don't display unusually high autocorrelations. For this example, the acceptance rate is about 29%, which is within the acceptable range for this algorithm.
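A quick way to carry out these checks is a trace plot plus an autocorrelation plot for each parameter. So the snippet stands alone, it uses a simulated AR(1) chain as a stand-in for one column of fit.mcmc$par; the same two plotting commands apply directly to the real output.

```r
# Stand-in chain for illustration (the real object would be fit.mcmc$par[,1]).
set.seed(3)
chain <- as.numeric(arima.sim(list(ar = 0.7), n = 10000))
par(mfrow = c(2, 1))
plot(chain, type = "l", ylab = "theta1")  # trace plot: should wander with no trend
acf(chain, main = "ACF of theta1")        # autocorrelations should decay quickly
```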

We display the simulated draws on top of the contour plot of theta1 and theta2; it seems that most of the simulated draws fall within the first contour line.

```r
> mycontour(kaminsky,c(-5,0,-2.5,1),data)
> title(xlab="LOG(Y5-MU)",ylab="LOG BETA")
> points(fit.mcmc$par[,1],fit.mcmc$par[,2])
```

We are interested in the parameters mu and beta. We first compute vectors of simulated draws of mu and beta by transforming back the simulated draws of theta1 and theta2.

```r
> MU=data[1]-exp(fit.mcmc$par[,1])
> BETA=exp(fit.mcmc$par[,2])
```

We display the marginal posteriors of mu and beta.

```r
> par(mfrow=c(2,1))
> plot(density(MU),lwd=3,main="POSTERIOR OF MU",xlab="MU")
> plot(density(BETA),lwd=3,main="POSTERIOR OF BETA",xlab="BETA")
```

We construct 90% interval estimates by extracting quantiles from the collection of simulated draws.

```r
> quantile(MU,c(.05,.95))
       5%       95%
 9.864888 10.065765
> quantile(BETA,c(.05,.95))
       5%       95%
0.2012635 0.6520858
```

Last, suppose we are interested in predicting the 5th order statistic ys5 from a future sample of 20 observations.

To simulate from the distribution of ys5, we (1) simulate (mu, beta) from the posterior, (2) simulate a future sample y1,...,y20 from the exponential distribution with parameters mu and beta, and (3) store the 5th ordered observation from the simulated sample. We repeat this process 1000 times, obtaining a simulated sample from the distribution of ys5. We display this predictive distribution by a histogram.

```r
ys5=rep(0,1000)
for (j in 1:1000)
{
  ys=rexp(20,rate=1/BETA[5000+j])+MU[5000+j]
  ys5[j]=sort(ys)[5]
}
hist(ys5,col="orange")
```

## Learning from selected order statistics

To illustrate some computational methods for summarizing a posterior, I describe a missing data problem motivated by my undergraduate days at Bucknell. Ken Kaminsky and Paul Nelson, two of my Bucknell professors, were interested in learning about populations based on selected order statistics. (I wrote an undergraduate honors thesis on this topic.) Here is a simple illustration of the problem.

Suppose a sample y1, …, y20 is taken from the two-parameter exponential distribution of the form f(y | mu, beta) =1/beta exp(-(y-mu)/beta), y > mu. But you don’t observe the complete dataset — all you observe are the two order statistics y(5) and y(15) (the order statistics are the observations arranged in ascending order).

Based on this selected data, we wish to (1) estimate the parameters mu and beta by 90% interval estimates and (2) predict the value of the order statistics y*(5) and y*(20) from a future sample taken from the same population.

Here’s the plan:

1. First, we write the likelihood, which is the density of the observed data (y(5) and y(15)) given values of the exponential parameters mu and beta. One can show that this likelihood is given by

L(mu, beta) = f(y(5)) f(y(15)) F(y(5))^4 (F(y(15)) - F(y(5)))^9 (1 - F(y(15)))^5, mu < y(5), beta > 0.

2. Assuming a flat (uniform) prior on (mu, beta), the posterior density is proportional to the likelihood. We write an R function kaminsky0.R that computes the logarithm of the posterior; here the parameters are (mu, beta) and the data vector is (y(5), y(15)).

```r
kaminsky0=function(theta,data)
{
  f=function(y,mu,beta)
    return(dexp(y-mu,rate=1/beta))
  F=function(y,mu,beta)
    return(pexp(y-mu,rate=1/beta))
  y5=data[1]; y15=data[2]
  mu=theta[,1]
  beta=theta[,2]
  loglike=log(f(y5,mu,beta))+log(f(y15,mu,beta))+
    4*log(F(y5,mu,beta))+9*log(F(y15,mu,beta)-F(y5,mu,beta))+
    5*log(1-F(y15,mu,beta))
  return(loglike)
}
```

3. Graphing the posterior of (mu, beta), we see strong skewness in both parameters.

It is usually helpful to transform to real-valued parameters

theta1 = log(y(5) - mu), theta2 = log(beta).
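The logjack term appearing in the kaminsky function below is the log Jacobian of this transformation; a short derivation, using the same symbols:

```latex
\mu = y_{(5)} - e^{\theta_1}, \qquad \beta = e^{\theta_2}, \qquad
\left| \det \frac{\partial(\mu, \beta)}{\partial(\theta_1, \theta_2)} \right|
  = \left| -e^{\theta_1} \right| \cdot e^{\theta_2}
  = e^{\theta_1 + \theta_2},
```

so the log posterior of (theta1, theta2) is the log posterior of (mu, beta) plus the additive term theta1 + theta2.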

We write the following function kaminsky.R that computes the log posterior of (theta1, theta2).

```r
kaminsky=function(theta,data)
{
  f=function(y,mu,beta)
    return(dexp(y-mu,rate=1/beta))
  F=function(y,mu,beta)
    return(pexp(y-mu,rate=1/beta))
  y5=data[1]; y15=data[2]
  mu=y5-exp(theta[,1])
  beta=exp(theta[,2])
  loglike=log(f(y5,mu,beta))+log(f(y15,mu,beta))+
    4*log(F(y5,mu,beta))+9*log(F(y15,mu,beta)-F(y5,mu,beta))+
    5*log(1-F(y15,mu,beta))
  logjack=theta[,1]+theta[,2]   # log Jacobian of the transformation
  return(loglike+logjack)
}
```

Here’s a graph of the posterior of the reexpressed parameters — note that it is much more normal-shaped.

4. We’ll use several functions in the next posting to summarize the posterior.

(a) The laplace function is useful in finding the posterior mode and normal approximation to the posterior.

(b) By use of the rwmetrop function, we construct a random-walk Metropolis algorithm to simulate from the joint posterior.

## Tribute to the Phillies

As some of you might know, the Philadelphia Phillies are in the Major League Baseball playoffs, which is pretty amazing. So we'll have to fit a model to some Phillies data. For each game of the 2007 season, we'll record

(1) if they won or lost the game

(2) the margin of victory, which is equal to the winner's score minus the loser's score.

We are interested in exploring the relationship between these two variables. Suppose we classify the margin of victory as “close” (3 runs or less) or a “blowout” (4 runs or more). Here is a 2 x 2 contingency table classifying all games by result and margin of victory

| result | close | blowout |
|---|---|---|
| L | 44 | 29 |
| W | 51 | 38 |

One of the oldest approaches to estimating the relationship between two ordinal variables is the polychoric coefficient. One assumes that there is an underlying bivariate normal distribution with zero means, unit variances and correlation rho.

The observed counts are found by discretizing this continuous distribution at the cutpoints c (on the x scale) and d (on the y scale). One can estimate the cutpoints from the marginal counts (here one solves Phi(c) = 73/162 and Phi(d) = 95/162), and the likelihood of the correlation coefficient rho is given by

L(rho) = p1^44 p2^29 p3^51 p4^38,

where p1, p2, p3, p4 are the probabilities (dependent on rho) that the bivariate normal falls in the four regions divided by the cutpoints c and d. If we place a uniform prior on rho, then the posterior density will be proportional to the likelihood.
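This posterior can be evaluated on a grid in a few lines. The sketch below assumes the mvtnorm package (for the bivariate normal probabilities); the function and object names are my own, not from the original analysis.

```r
# Log posterior of rho (uniform prior) from the 2 x 2 table of counts above.
library(mvtnorm)                 # provides pmvnorm; an assumption of this sketch
counts <- c(44, 29, 51, 38)      # (L,close), (L,blowout), (W,close), (W,blowout)
c.cut <- qnorm(73/162)           # cutpoint on the win/loss scale
d.cut <- qnorm(95/162)           # cutpoint on the margin scale
logpost <- function(rho) {
  S <- matrix(c(1, rho, rho, 1), 2, 2)
  p <- c(pmvnorm(upper = c(c.cut, d.cut), corr = S),
         pmvnorm(lower = c(-Inf, d.cut), upper = c(c.cut, Inf), corr = S),
         pmvnorm(lower = c(c.cut, -Inf), upper = c(Inf, d.cut), corr = S),
         pmvnorm(lower = c(c.cut, d.cut), upper = c(Inf, Inf), corr = S))
  sum(counts * log(p))
}
rho.grid <- seq(-0.9, 0.9, by = 0.01)
post <- sapply(rho.grid, logpost)
plot(rho.grid, exp(post - max(post)), type = "l",
     xlab = "rho", ylab = "scaled posterior")
```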

We’ll use this example to illustrate different computational approaches to summarizing the posterior distribution.

## Inferences for Gamma Sampling Problem

In the previous post, I considered the problem of modeling lengths of cell phone calls. Here we focus on several types of inferences and predictions that might be of interest.

Following the general computing strategy described in Chapter 5 of BCWR, I first transform the gamma parameters (alpha, beta) to (theta1 = log alpha, theta2 = log mu = log(alpha beta)). The function gamma.sampling.post computes the posterior of (theta1, theta2). The function mycontour draws a contour plot and the function simcontour simulates from this grid. The figure shows the contour plot with the simulated draws placed on top.

```r
> y=c(12.2,.9,.8,5.3,2,1.2,1.2,1,.3,1.8,3.1,2.8)
> library(LearnBayes)
> gamma.sampling.post=function(theta,data)
+ {
+   a=exp(theta[,1])
+   mu=exp(theta[,2])
+   n=length(data)
+   val=0*a
+   for (i in 1:n) val=val+dgamma(data[i],shape=a,scale=mu/a,log=TRUE)
+   return(val+log(mu))   # log Jacobian of the transformation is log(mu)
+ }
> mycontour(gamma.sampling.post,c(-1.5,1.5,0,3),y)
> title(main="POSTERIOR OF (LOG ALPHA, LOG MU)",xlab="log alpha",
+   ylab="log mu")
> s=simcontour(gamma.sampling.post,c(-1.5,1.5,0,3),y,1000)
> points(s$x,s$y)
```

Suppose we are interested in the mean length of cell phone calls mu. In particular, what is the probability that the mean length exceeds 4 minutes? The figure displays a density estimate of the simulated draws of mu, and I have labeled the desired probability.

```r
> mu=exp(s$y)
> alpha=exp(s$x)
> beta=mu/alpha
> d.mu=density(mu)
> plot(d.mu,main="POSTERIOR OF MEAN LENGTH",xlab="mu",lwd=3)
> lines(c(4,4),c(0,approx(d.mu$x,d.mu$y,4)$y),lwd=3)   # height of the density at mu = 4
> text(8,.15,"P(MU > 4) = 0.178")
> arrows(7,.1,4.5,.05,lwd=2)
```

Next, suppose we are interested in the predictive distribution of the length of a single cell phone call. Since we have already collected simulated draws from the posterior of (alpha, beta), it just takes one additional command to simulate the predictive distribution of y* (using the function rgamma). I have displayed a density estimate of the predictive density.

Note that the probability that the mean call length exceeds 4 minutes is 0.178; the probability that a single future call exceeds 4 minutes is 0.263.

```r
> ys=rgamma(1000,shape=alpha,scale=beta)
> plot(density(ys),xlab="CALL LENGTH",lwd=3,main="POSTERIOR PREDICTIVE DENSITY")
> mean(ys>4)
[1] 0.263
```

Last, suppose you plan on making 20 calls next month and you're interested in the total amount of time used. By use of a loop, we simulate the total of 20 draws from the predictive distribution; the variable ysum contains 1000 realizations of the total.

```r
> ysum=rep(0,1000)
> for (j in 1:20) ysum=ysum+rgamma(1000,shape=alpha,scale=beta)
> hist(ysum,main="PREDICTIVE DISTRIBUTION OF LENGTH OF 20 CALLS")
```

## Modeling Cell Phone Call Durations with a Gamma Density

Suppose we observe a sample y1, …, yn from a gamma(alpha, beta) density where the sampling density is proportional to y^{alpha-1} exp(-y/beta), and we assign a uniform prior on (alpha, beta).

As an example, suppose we wish to fit a gamma density to the durations (in minutes) of a group of cell phone calls.

12.2 0.9 0.8 5.3 2.0 1.2 1.2 1.0 0.3 1.8 3.1 2.8

Here is an R function that computes the log posterior:

```r
gamma.sampling.post1=function(theta,data)
{
  a=theta[,1]
  b=theta[,2]
  n=length(data)
  val=0*a
  for (i in 1:n) val=val+dgamma(data[i],shape=a,scale=b,log=TRUE)
  return(val)
}
```

The first figure is a contour graph of the posterior density of (alpha, beta). (In R, beta is called the scale parameter.)

Note the strong curvature in the posterior.

Instead, suppose we consider the joint posterior of alpha and the “rate” parameter theta = 1/beta. Here is a contour plot of the posterior of (alpha, theta).

This doesn’t display the strong curvature.

Last, suppose you consider the joint posterior of alpha and the mean mu = alpha beta. The last figure displays the posterior of (alpha, mu). The moral here is that the choice of parameterization can be important when summarizing the posterior distribution. In the next chapter, we'll suggest a rule of thumb for transforming parameters that makes it easier to summarize many posteriors.

## Fitting a Beta Sampling Model

To illustrate a “brute-force” method of summarizing a posterior, suppose that we observe a sample y1, …, yn from a beta distribution with parameters a and b. If we assign (a, b) a uniform prior, then the posterior density is given by

g(a, b | data) propto prod_{i=1}^n f(y_i; a, b),

where f(y; a, b) = 1/B(a, b) y^(a-1) (1-y)^(b-1) is the beta density. As an example, suppose we are given the following proportions of students who are “math proficient” on the Ohio Graduation Test for a random sample of 20 schools in Ohio.

```r
y=c(0.955,0.819,0.472,0.925,0.780,0.931,0.945,0.926,0.852,0.920,
    0.885,0.890,0.789,0.973,0.831,0.835,0.884,0.904,0.900,0.806)
```

Here is our method:

1. We write a short R function betasampling.post.R that computes the logarithm of the posterior density. Note that the built-in function dbeta is used — the log=TRUE option gives the logarithm of the beta density.

```r
betasampling.post=function(theta,data)
{
  a=theta[,1]
  b=theta[,2]
  n=length(data)
  val=0*a
  for (i in 1:n) val=val+dbeta(data[i],a,b,log=TRUE)
  return(val)
}
```

2. Next, by trial and error, we find a rectangle (a_lo, a_hi, b_lo, b_hi) that contains the contour plot of the joint posterior. (Remember that the contours drawn by the function mycontour.R are located at 10%, 1%, and 0.1% of the height of the density at the mode.)

```r
mycontour(betasampling.post,c(.001,35,.001,6),y)
title(main="Posterior density of (a, b)",xlab="a",ylab="b")
```

3. We then sample from the grid of values of (a, b) that is used in constructing the contour plot.

```r
s=simcontour(betasampling.post,c(.001,35,.001,6),y,1000)
```

4. Suppose we are interested in the marginal posterior densities of a and b. We find these by use of density estimates on the simulated draws of a and b.

```r
par(mfrow=c(2,1))
plot(density(s$x),main="POSTERIOR OF a",xlab="a",lwd=3)
plot(density(s$y),main="POSTERIOR OF b",xlab="b",lwd=3)
```
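As in the earlier posts, 90% interval estimates for a and b come from quantiles of the same simulated draws. The snippet below repeats the ingredients from steps 1 and 3 so that it stands alone; it assumes the LearnBayes package is installed.

```r
# Self-contained sketch: 90% posterior intervals for (a, b) from grid simulation.
library(LearnBayes)
y=c(0.955,0.819,0.472,0.925,0.780,0.931,0.945,0.926,0.852,0.920,
    0.885,0.890,0.789,0.973,0.831,0.835,0.884,0.904,0.900,0.806)
betasampling.post=function(theta,data)
{
  val=0*theta[,1]
  for (i in 1:length(data))
    val=val+dbeta(data[i],theta[,1],theta[,2],log=TRUE)
  return(val)
}
s=simcontour(betasampling.post,c(.001,35,.001,6),y,1000)
quantile(s$x,c(.05,.95))   # 90% interval estimate for a
quantile(s$y,c(.05,.95))   # 90% interval estimate for b
```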