Home > R work > Using the sapply function in R

## Using the sapply function in R

In R, I think one of the most useful functions is sapply.   By using this function, one can avoid writing a loop.  Let me illustrate using this function for the binomial/beta exchangeable model.

The model is

1.  $y_i$ distributed binomial($n_i, p_i$)
2.  $p_1, ..., p_k$ are iid beta($K, \eta$)
3.  $(K, \eta)$ have a vague prior

The log posterior of $\theta_1 = logit(\eta), \theta_2 = \log K$ is programmed in the function betabinexch in the LearnBayes package.

Suppose we wish to obtain a 90% interval estimate for $p_j$.  To simulate from the posterior, we first simulate a value of $(K, \eta)$ from its posterior and then simulate $p_j$ from the posterior  beta($y_j + K \eta, n_j - y_j + K (1-\eta) )$

Let’s consider the famous Efron and Morris baseball data — $y_j$ is the number of hits in $n_j$ at-bats for the $j$th player.

I’ll read in the data, get a normal approximation to the posterior (function laplace), and then do a random walk Metropolis chain — I obtain a simulated sample from $(\theta_1, \theta_2)$.

y=c(12,7,14,9,18,16,15,13,8,10,17,10,11,10,14,11,10,10)
n=rep(45,18)
library(LearnBayes)
data=as.matrix(cbind(y,n))
fit=laplace(betabinexch,c(-1,4),data)
mcmc.fit=rwmetrop(betabinexch,list(var=fit$var,scale=2),c(-1,4),10000,data) Now I write a function interval.estimate that has a single input j — it will simulate from the posterior of $p_j$ and output a 90% interval estimate. interval.estimate=function(j) { K=exp(mcmc.fit$par[,2])
eta=exp(mcmc.fit$par[,1])/(1+exp(mcmc.fit$par[,1]))
thetaj=rbeta(1000,y[j]+K*eta,n[j]-y[j]+K*(1-eta))
quantile(thetaj,c(.05,.95))
}
If I wish to obtain a 90% interval estimate for $p_1$, I type
> interval.estimate(1)
5%       95%
0.2052578 0.3288738
To do this in a single stroke for all 18 probabilities, I use sapply.  I wish to apply interval.estimate over all values 1, …, 18:
> sapply(1:18,interval.estimate)
[,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
5%  0.1981516 0.1535653 0.2240657 0.1825992 0.2481277 0.2339640 0.2203171
95% 0.3314778 0.2906053 0.3578334 0.3076716 0.4013249 0.3805313 0.3658903
[,8]      [,9]     [,10]     [,11]     [,12]     [,13]     [,14]
5%  0.2061546 0.1637018 0.1841709 0.2375183 0.1837387 0.1948228 0.1878043
95% 0.3435467 0.3000630 0.3211896 0.3969801 0.3151900 0.3279358 0.3136143
[,15]     [,16]     [,17]     [,18]
5%  0.2193047 0.1949027 0.1821835 0.1864202
95% 0.3616939 0.3240116 0.3109836 0.3109164
I get interval estimates for all probabilities.  The 18th column contains the 5th and 95th percentiles of the posterior of $p_{18}$.