Lab Class 8: Monte Carlo Methods


Example 1 - Simple Monte Carlo estimator

  • Use the Monte Carlo approach to estimate the standard normal cdf

\(\Phi(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}} \text{e}^{-t^2/2}\, dt\).

x <- seq(.1, 2.5, length = 10)
n <- 10000
u <- runif(n)      #U ~ Uniform(0,1)
cdf <- numeric(length(x))
for (i in 1:length(x)) {
  g <- x[i] * exp(-(u * x[i])^2 / 2)   #integrand after the substitution t = u * x[i]
  cdf[i] <- mean(g) / sqrt(2 * pi) + 0.5
}

Hint: Break this problem into two cases, \(x\geq0\) and \(x<0\), and use the symmetry of the normal density to handle the second case. Make the substitution \(u=t/x\), so that \(du=dt/x\).
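Writing out the substitution for \(x > 0\) (the symbols below match the code above):

```latex
\Phi(x) = \frac12 + \int_0^x \frac{1}{\sqrt{2\pi}}\,\mathrm{e}^{-t^2/2}\,dt
        = \frac12 + \int_0^1 \frac{x}{\sqrt{2\pi}}\,\mathrm{e}^{-(xu)^2/2}\,du
        = \frac12 + \frac{1}{\sqrt{2\pi}}\,E\!\left[x\,\mathrm{e}^{-(xU)^2/2}\right],
\qquad U \sim \mathrm{Unif}(0,1),
```

so the sample mean of \(x\,\mathrm{e}^{-(xU_i)^2/2}\) over \(U_1,\dots,U_n\), divided by \(\sqrt{2\pi}\) and shifted by \(0.5\), estimates \(\Phi(x)\). For \(x < 0\), use \(\Phi(x) = 1 - \Phi(-x)\).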

  • Let \(I(\cdot)\) be the indicator function and \(Z \sim N(0, 1)\). Then for any constant \(x\) we have \(E[I(Z \leq x)] = P(Z \leq x) = \Phi(x)\). Evaluate the standard normal cdf at \(x\).
x <- seq(.1, 2.5, length = 10)
n <- 10000
z <- rnorm(n)
dim(x) <- length(x)    #make x an array so apply() iterates over its elements
p <- apply(x, MARGIN = 1,
  FUN = function(x, z) {mean(z < x)}, z = z)
Phi <- pnorm(x)
print(round(rbind(x, cdf, p, Phi), 4))
##       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9]  [,10]
## x   0.1000 0.3667 0.6333 0.9000 1.1667 1.4333 1.7000 1.9667 2.2333 2.5000
## cdf 0.5398 0.6430 0.7366 0.8155 0.8775 0.9228 0.9535 0.9729 0.9842 0.9902
## p   0.5433 0.6437 0.7338 0.8105 0.8749 0.9242 0.9563 0.9755 0.9882 0.9938
## Phi 0.5398 0.6431 0.7367 0.8159 0.8783 0.9241 0.9554 0.9754 0.9872 0.9938


Example 2 - Error bounds for Monte Carlo integration

Estimate the variance of the indicator-based estimator in Example 1, and build an approximate 95% confidence interval for the estimate of \(\Phi(2)\).

x <- 2
n <- 10000
z <- rnorm(n)
g <- (z < x)                     #indicator I(Z < x)
v <- mean((g - mean(g))^2) / n   #estimated variance of the sample mean
cdf <- mean(g)
c(cdf, v)
## [1] 9.771000e-01 2.237559e-06
c(cdf - 1.96 * sqrt(v), cdf + 1.96 * sqrt(v))
## [1] 0.9741681 0.9800319
pnorm(2)
## [1] 0.9772499
pnorm(2) * (1 - pnorm(2)) / n    #theoretical variance of the estimator
## [1] 2.223256e-06
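As a quick sanity check (a sketch, not part of the handout), the nominal 95% coverage of this interval can be verified by repeating the experiment; the replication count reps below is an arbitrary choice:

```r
set.seed(1)
x <- 2; n <- 10000; reps <- 1000
covered <- logical(reps)
for (r in 1:reps) {
  g <- (rnorm(n) < x)                 #indicator sample
  cdf <- mean(g)                      #Monte Carlo estimate of Phi(2)
  se <- sqrt(mean((g - cdf)^2) / n)   #estimated standard error
  covered[r] <- abs(cdf - pnorm(2)) <= 1.96 * se
}
mean(covered)   #empirical coverage, close to the nominal 0.95
```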


Example 3 - Variance reduction: Antithetic variables

Referring to Example 1, repeat the estimation using antithetic variables, and find the approximate reduction in standard error.

MC.Phi <- function(x, R = 10000, antithetic = TRUE) {
  u <- runif(R/2)
  if (!antithetic) v <- runif(R/2) else v <- 1 - u
  u <- c(u, v)
  cdf <- numeric(length(x))
  for (i in 1:length(x)) {
    g <- x[i] * exp(-(u * x[i])^2 / 2)
    cdf[i] <- mean(g) / sqrt(2 * pi) + 0.5
  }
  cdf
}
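Why this helps: \(g(u) = x\,\mathrm{e}^{-(ux)^2/2}\) is monotone in \(u\) for fixed \(x > 0\), so \(g(U)\) and \(g(1-U)\) are negatively correlated, and

```latex
\operatorname{Var}\!\left(\frac{g(U) + g(1 - U)}{2}\right)
  = \frac{\operatorname{Var}\bigl(g(U)\bigr)}{2}\,(1 + \rho),
\qquad \rho = \operatorname{Corr}\bigl(g(U),\, g(1 - U)\bigr) < 0,
```

which is smaller than the variance \(\operatorname{Var}(g(U))/2\) obtained from the mean of two independent draws.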

Show a comparison of estimates obtained from a single Monte Carlo experiment.

x <- seq(.1, 2.5, length=5)
Phi <- pnorm(x)
set.seed(123)
MC1 <- MC.Phi(x, antithetic = FALSE)
set.seed(123)
MC2 <- MC.Phi(x)
print(round(rbind(x, MC1, MC2, Phi), 5))
##        [,1]    [,2]    [,3]    [,4]    [,5]
## x   0.10000 0.70000 1.30000 1.90000 2.50000
## MC1 0.53983 0.75825 0.90418 0.97311 0.99594
## MC2 0.53983 0.75805 0.90325 0.97132 0.99370
## Phi 0.53983 0.75804 0.90320 0.97128 0.99379

Can the approximate reduction in variance be estimated, for a given \(x\), by simulating under both methods: the simple Monte Carlo integration approach and the antithetic variable approach?

n <- 1000
MC1 <- MC2 <- numeric(n)
x <- 2
for (i in 1:n) {
  MC1[i] <- MC.Phi(x, R = 1000, antithetic = FALSE)
  MC2[i] <- MC.Phi(x, R = 1000)
}

print(sd(MC1))
## [1] 0.007198348
print(sd(MC2))
## [1] 0.0003669758
print((var(MC1) - var(MC2))/var(MC1))
## [1] 0.997401


Example 4 - Rejection method

  • Build an algorithm to generate 1000 samples from the Beta(2,2) distribution by rejection method.

The Beta(2,2) density is \(\pi(x) = 6 x (1 - x)\), \(0 < x < 1\).

Let \(p_u(x)\) be the Uniform(0,1) density. Then \(\pi(x)/p_u(x) \leq 6\), \(\forall\, 0 < x < 1\), so \(M=6\).

A random \(x\) from \(p_u(x)\) is accepted if \[ u < \frac{\pi(x)}{M p_u(x)} = \frac{6 x (1 - x)}{6 \times 1} = x (1 - x) . \]

  • On average, how many random numbers must be simulated to generate these 1000 samples from the Beta(2,2) distribution?

On average, \(Mn = 6000\) iterations (\(12000\) random numbers) are required to generate a sample of size \(1000\). This count depends on the upper bound \(M\) of \(\pi(x)/p_u(x)\), which in turn depends on the choice of the function \(p_u(x)\).
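The acceptance rate can also be checked directly: since \(E[Y(1-Y)] = 1/2 - 1/3 = 1/6\) for \(Y \sim \mathrm{Unif}(0,1)\), each proposal is accepted with probability \(1/M = 1/6\). A quick simulation (a sketch, not part of the exercise):

```r
set.seed(42)
m <- 1e6
u <- runif(m)
y <- runif(m)             #proposals from p_u(x)
mean(y * (1 - y) > u)     #empirical acceptance rate, close to 1/6
```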

  • Write R code to generate the \(1000\) samples in this scenario. How many iterations and how many random numbers were required to generate the \(1000\) Beta samples?
n <- 1000
k <- 0             #counter for accepted
j <- 0             #iterations
x <- numeric(n)
while (k < n) {
  u <- runif(1)
  j <- j + 1
  y <- runif(1)    #candidate from p_u(x)
  if (y * (1 - y) > u) {
    k <- k + 1     #accept y
    x[k] <- y
  }
}
info <- cbind(j, 2 * j)
colnames(info) <- c("iterations", "random numbers")
print(info)
##      iterations random numbers
## [1,]       5973          11946
  • Compare the empirical and theoretical percentiles.
p <- seq(.1, .9, .1)
Qhat <- quantile(x, p)  #quantiles of sample
Q <- qbeta(p, 2, 2)     #theoretical quantiles
round(rbind(Qhat, Q), 3)
##        10%   20%   30%   40%   50%   60%   70%   80%   90%
## Qhat 0.189 0.276 0.356 0.436 0.491 0.556 0.616 0.693 0.782
## Q    0.196 0.287 0.363 0.433 0.500 0.567 0.637 0.713 0.804
  • Repeating the simulation with \(n=10000\) produces more precise estimates.
n <- 10000
k <- 0             
j <- 0             
x <- numeric(n)
while (k < n) {
  u <- runif(1)
  j <- j + 1
  y <- runif(1)   
  if (y * (1-y) > u) {
    k <- k + 1     
    x[k] <- y
  }
}

p <- seq(.1, .9, .1)
Qhat <- quantile(x, p) 
Q <- qbeta(p, 2, 2)  
round(rbind(Qhat, Q), 3)
##        10%   20%   30%   40%   50%   60%   70%   80%   90%
## Qhat 0.198 0.291 0.367 0.439 0.505 0.567 0.635 0.711 0.803
## Q    0.196 0.287 0.363 0.433 0.500 0.567 0.637 0.713 0.804


Example 5 - Importance sampling method

In order to use the importance sampling method to estimate \[ \int_0^1 \frac{\text{e}^{−x}}{1 + x^2} dx \] several possible choices of importance function are compared here. The candidates are \[ \begin{array}{ll} p_0(x) = 1, &0 < x < 1, \\ p_1(x) = \text{e}^{−x}, &0 < x < \infty, \\ p_2(x) = (1+x^2)^{−1}/\pi, &−\infty < x < \infty, \\ p_3(x) = \text{e}^{−x}/(1 − \text{e}^{−1}), &0 < x < 1, \\ p_4(x) = 4(1+x^2)^{−1}/\pi, &0 < x < 1. \end{array} \] The integrand is \[ g(x) = \begin{cases} \text{e}^{−x}/(1 + x^2), &\text{if}\ 0 < x < 1, \\ 0, &\text{otherwise}. \end{cases} \] While all five of the possible importance functions are positive on the set \(0 < x < 1\) where \(g(x)>0\), \(p_1\) and \(p_2\) have larger supports, so many of the simulated values contribute zeros to the sum, which is inefficient.
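In each case the estimator has the same form: for draws \(X_1, \dots, X_m\) from the importance density \(p\),

```latex
\theta = \int_0^1 g(x)\,dx
       = E_p\!\left[\frac{g(X)}{p(X)}\right]
\approx \hat{\theta} = \frac{1}{m}\sum_{i=1}^{m}\frac{g(X_i)}{p(X_i)},
```

and the variance of \(g(X)/p(X)\) is small when the ratio \(g(x)/p(x)\) is nearly constant on the support of \(g\).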

All of these distributions are easy to simulate; \(p_2\) is the standard Cauchy density, i.e. Student's \(t\) with one degree of freedom. Plot the densities on \((0,1)\) for comparison with \(g(x)\) in Figure 1.

The function that corresponds to the most nearly constant ratio \(g(x)/p(x)\) appears to be \(p_3\), as can be seen more clearly from the graphs (Figure 2). From the graphs, might we expect \(p_3\) to produce the smallest variance?

x <- seq(0, 1, .01)
w <- 2    #line width for the plots
p1 <- exp(-x)
p2 <- (1 / pi) / (1 + x^2)
p3 <- exp(-x) / (1 - exp(-1))
p4 <- 4 / ((1 + x^2) * pi)
g <- exp(-x) / (1 + x^2)

#figure 1
plot(x, g, type = "l", main = "", ylab = "g(x)", ylim = c(0,2), lwd = w)
lines(x, g/g, lty = 2, col=2, lwd = w)   #p0 = 1
lines(x, p1, lty = 3, col=3, lwd = w)
lines(x, p2, lty = 4, col=4, lwd = w)
lines(x, p3, lty = 5, col=5, lwd = w)
lines(x, p4, lty = 6, col=6, lwd = w)
legend("topright", legend = c("g", 0:4), lty = 1:6, col = 1:6, lwd = w, inset = 0.02)

#figure 2
plot(x, g, type = "l", main = "", ylab = "g(x)/p(x)", ylim = c(0,3.2), lwd = w, lty = 2, col = 2)   #g/p0 = g
lines(x, g/p1, lty = 3, col=3, lwd = w)
lines(x, g/p2, lty = 4, col=4, lwd = w)
lines(x, g/p3, lty = 5, col=5, lwd = w)
lines(x, g/p4, lty = 6, col=6, lwd = w)
legend("topright", legend = c(0:4), lty = 2:6, col = 2:6, lwd = w, inset = 0.02)

Figures 1 and 2 plot importance functions with \(g(x)\) and ratios \(g(x)/p(x)\), respectively.

  • Write R code to obtain \(10000\) samples under each of these scenarios.
m <- 10000

theta.hat <- se <- numeric(5)
g <- function(x) {
  exp(-x - log(1 + x^2)) * (x > 0) * (x < 1)
}

x <- runif(m) #using p0
fg <- g(x)
theta.hat[1] <- mean(fg)
se[1] <- sd(fg)

x <- rexp(m, 1) #using p1
fg <- g(x) / exp(-x)
theta.hat[2] <- mean(fg)
se[2] <- sd(fg)

x <- rcauchy(m) #using p2
i <- c(which(x > 1), which(x < 0))
x[i] <- 2   #points outside (0,1) give g(x) = 0; resetting avoids overflow in g(x)
fg <- g(x) / dcauchy(x)
theta.hat[3] <- mean(fg)
se[3] <- sd(fg)

u <- runif(m) #p3, inverse transform method
x <- - log(1 - u * (1 - exp(-1)))
fg <- g(x) / (exp(-x) / (1 - exp(-1)))
theta.hat[4] <- mean(fg)
se[4] <- sd(fg)

u <- runif(m) #p4, inverse transform method
x <- tan(pi * u / 4)
fg <- g(x) / (4 / ((1 + x^2) * pi))
theta.hat[5] <- mean(fg)
se[5] <- sd(fg)
  • Obtain and comment on the estimates (labeled theta.hat) of \(\int_0^1 g(x)dx\) and the corresponding standard errors se for the simulation using each of the importance functions.
rbind(theta.hat, se)
##                [,1]      [,2]      [,3]       [,4]      [,5]
## theta.hat 0.5232240 0.5245760 0.5369783 0.52491761 0.5240453
## se        0.2458656 0.4184484 0.9623768 0.09676078 0.1409299
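For reference, the exact value of the integral can be obtained with base R's integrate (a quick check, not part of the handout):

```r
g <- function(x) exp(-x) / (1 + x^2)
theta <- integrate(g, 0, 1)$value   #adaptive quadrature over (0, 1)
theta   #approximately 0.5248, consistent with the theta.hat values above
```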

Can we make the following comments? i) The simulation indicates that \(p_3\) and possibly \(p_4\) produce the smallest variance among these five importance functions, while \(p_2\) produces the highest. ii) The standard Monte Carlo estimate without importance sampling (\(p_0 = 1\)) has \(\widehat{se} \approx 0.246\). The importance functions \(p_1\) and \(p_2\) do not reduce the error, but \(p_3\) and \(p_4\) each reduce the standard error in estimating \(\theta\). (Note that se reports the standard deviation of a single term \(g(X)/p(X)\); the standard error of \(\hat{\theta}\) itself is \(se/\sqrt{m}\).)


References:

Rizzo, M.L. (2019). Statistical Computing with R, 2nd edition. Chapman and Hall/CRC.

Tanner, M.A. (1996). Tools for Statistical Inference, 3rd edition. Springer-Verlag, New York.