Generate \(Y_i=m(X_i)+\epsilon_i\), where \(\epsilon_i\overset{iid}{\sim}N(0,\sigma^2)\) and \(m\) denotes the unknown regression function of interest. We suppose each \(X_i\) is drawn at random from a design density \(p\) with compact support \([a,b]\).
Two test functions: \[\sin{(2x)}+2\exp{(-16x^2)}~,\quad x\in[-2,2]\] \[x+2\exp{(-16x^2)}~,\quad x\in[-2,2]\] We refer to these as Test Function 1 and Test Function 2, respectively.
Three SNR values and two design densities
The signal-to-noise ratio (SNR) is defined as \(\text{var}(m)/\sigma^2\); the two design densities are the Uniform density and the Beta(2,2) density. Here \(\text{var}(m)=\|m\|_2^2\). The SNR values are chosen as 2, 4, and 81.
We use the Gaussian kernel and generate 200 data sets, with \(n=100\) and \(n=200\) data points per set. Following reference 2, the local polynomial estimator with \(p=1\) (local linear) is
\[\hat{m}_h(x_0)=\mathbf{w}_{x_0,h}^{\top}\mathbf{Y}=\sum_{i=1}^n\frac{\{s_{2,x_0,h}-s_{1,x_0,h}(X_i-x_0)\}K_h(X_i-x_0)Y_i}{s_{2,x_0,h}s_{0,x_0,h}-s_{1,x_0,h}^2},\]
where \(s_{r,x_0,h}=\sum_{i=1}^n(X_i-x_0)^rK_h(X_i-x_0)\), so that
\[\begin{align} \mathbf{w}_{x_0,h}&=\frac{1}{s_{2,x_0,h}s_{0,x_0,h}-s_{1,x_0,h}^2}[\{s_{2,x_0,h}-s_{1,x_0,h}(X_1-x_0)\}K_h(X_1-x_0),\cdots,\{s_{2,x_0,h}-s_{1,x_0,h}(X_n-x_0)\}K_h(X_n-x_0)]^{\top}\\&=[w_{x_0,h}(X_1),\cdots,w_{x_0,h}(X_n)]^{\top}. \end{align}\]
We use the conditional MISE as the criterion for bandwidth selection:
\[MISE_c=E\left[\left.\int\{m(x)-\hat{m}_h(x)\}^2dx\,\right|\,X_1^n\right]=\int\{m(x)-\mathbf{w}_{x,h}^{\top}\mathbf{m}\}^2dx+\sigma^2\int\mathbf{w}_{x,h}^{\top}\mathbf{w}_{x,h}dx,\]
where \(\mathbf{m}=[m(X_1),\cdots,m(X_n)]^{\top}\).
This can be estimated by
\[\widehat{MISE}_c=\int\{\hat{m}_g(x)-\mathbf{w}_{x,h}^{\top}\hat{\mathbf{m}}_g\}^2dx+\hat{\sigma}^2_g\int\mathbf{w}_{x,h}^{\top}\mathbf{w}_{x,h}dx,\]
where \(\hat{\mathbf{m}}_g=[\hat{m}_g(X_1),\cdots,\hat{m}_g(X_n)]^{\top}\) and the pilot bandwidth \(g\) may be chosen by least-squares cross-validation (LSCV). That is,
\[g=\underset{h}{\arg\min}\sum_{i=1}^n\{Y_i-\hat{m}_{h,-i}(X_i)\}^2,\]
where \(\hat{m}_{h,-i}(X_i)\) is the usual leave-one-out estimate of \(m(X_i)\). The error variance estimate \(\hat{\sigma}_g^2\) is given by
\[\hat{\sigma}_g^2=\frac{1}{\nu}\sum_{i=1}^n\{Y_i-\hat{m}_g(X_i)\}^2,\]
where \(\nu=n-2\sum_iw_{X_i,g}(X_i)+\sum_{i,j}w_{X_j,g}^2(X_i)\).
rm(list = ls())
library(gaussquad)  # provides legendre.quadrature() used below
par(mfrow = c(1, 2))
testf1 <- function(x){sin(2*x) + 2*exp(-16*x^2)}  # Test Function 1
testf2 <- function(x){x + 2*exp(-16*x^2)}         # Test Function 2
plot(testf1, xlim = c(-2, 2), main = 'Test Function 1', ylab = expression(f[1](x)))
plot(testf2, xlim = c(-2, 2), main = 'Test Function 2', ylab = expression(f[2](x)))
For the uniform design, we compute the optimal but unknown \(h_{ISE}\), defined as the minimizer of a discretized version of \[ISE(h)=\int\{m(x)-\hat{m}_h(x)\}^2dx.\]
We present two kinds of figures for each test function: boxplots of \((h_{PUREM}-h_{ISE})/h_{ISE}\) and boxplots of \(\log{\{ISE(h_{PUREM})/ISE(h_{ISE})\}}\). Here we only consider the case SNR = 4, that is, \(\sigma^2=\text{var}(m)/4\).
First, the design density is \(U(-2,2)\), so generate the data.
snr <- 4
varm <- integrate(function(x){testf1(x)^2}, -2, 2)$value  # var(m) = ||m||_2^2
sg2.s <- varm/snr                                         # sigma^2 = var(m)/snr
n.s <- 100
X.v <- matrix(runif(n.s, min = -2, max = 2), ncol = 1)    # uniform design on [-2, 2]
eps.v <- rnorm(n.s, 0, sqrt(sg2.s))                       # rnorm() takes the sd, not the variance
Y.v <- matrix(testf1(X.v) + eps.v, ncol = 1)
plot(X.v, Y.v, main = 'An example data set: uniform design, snr = 4', xlab = 'X.v, Test Function 1')
rm(X.v, Y.v, eps.v, n.s, sg2.s, snr, varm)
Define the kernel, \(\mathbf{w}_{x,h}\), and \(\hat{m}_h(x)\).
Kh <- function(Kh_x, Kh_h.s) {exp(-Kh_x^2/(2*Kh_h.s^2))/sqrt(2*pi*Kh_h.s^2)}  # Gaussian kernel K_h with bandwidth Kh_h.s
w <- function(w_x.s, w_X.v, w_h.s) {
  # s(r) computes s_{r,x,h} = sum_i (X_i - x)^r K_h(X_i - x)
  s <- function(s_r.s, s_x.s = w_x.s, s_X.v = w_X.v, s_h.s = w_h.s) {
    c(t((s_X.v - s_x.s)^s_r.s) %*% Kh(s_X.v - s_x.s, s_h.s))
  }
  denom <- (s(2)*s(0) - s(1)^2)
  # numerator: {s_2 - s_1 (X_i - x)} K_h(X_i - x) for each i
  numer <- diag(c(Kh(w_X.v - w_x.s, w_h.s))) %*% (s(2) - s(1)*(w_X.v - w_x.s))
  return(numer/denom)
}
w.v <- Vectorize(w,'w_x.s')
hatm <- function(hatm_x.s, hatm_X.v, hatm_Y.v, hatm_h.s) {
  # local linear fit at a point: hat-m_h(x) = w_{x,h}^T Y
  c(t(w(hatm_x.s, hatm_X.v, hatm_h.s)) %*% hatm_Y.v)
}
hatm.v <- Vectorize(hatm, 'hatm_x.s')
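As a quick sanity check (the values of x0, h, and the seed below are arbitrary), local linear weights reproduce linear functions exactly, so they should sum to one and be orthogonal to \(X_i-x_0\) up to floating-point error:
# Sanity check with arbitrary x0 = 0.5 and h = 0.2
set.seed(1)
X.chk <- matrix(runif(100, -2, 2), ncol = 1)
w.chk <- w(0.5, X.chk, 0.2)
sum(w.chk)                  # should be ~ 1
sum(w.chk * (X.chk - 0.5))  # should be ~ 0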
Now let’s find \(g\). First, define \(LSCV(h)\). Once \(g\) is found, \(\hat{\sigma}_g^2\) can also be computed.
LSCV <- function(LSCV_h.s, LSCV_X.v, LSCV_Y.v){
  # leave-one-out least-squares cross-validation criterion LSCV(h)
  ans <- 0 ; n.s <- length(LSCV_Y.v)
  for (i in 1:n.s) {
    X.temp <- LSCV_X.v[-i] ; Y.temp <- LSCV_Y.v[-i]
    ans <- ans + (LSCV_Y.v[i] - hatm(LSCV_X.v[i], X.temp, Y.temp, LSCV_h.s))^2
  }
  return(ans)
}
LSCV.v <- Vectorize(LSCV, 'LSCV_h.s')
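A minimal sketch of how \(g\) and \(\hat{\sigma}_g^2\) might then be computed, assuming data X.v and Y.v as generated above (regenerate them if they were removed); the helper name hatsg2 and the search interval (0.01, 1) are our assumptions:
# Pilot bandwidth g by minimizing LSCV, then hat-sigma^2_g with
# nu = n - 2*sum_i w_{X_i,g}(X_i) + sum_{i,j} w_{X_j,g}(X_i)^2
hatsg2 <- function(hs_X.v, hs_Y.v, hs_g.s) {
  n.s <- length(hs_Y.v)
  W <- sapply(1:n.s, function(j) w(hs_X.v[j], hs_X.v, hs_g.s))  # W[i, j] = w_{X_j,g}(X_i)
  nu <- n.s - 2*sum(diag(W)) + sum(W^2)
  resid <- hs_Y.v - hatm.v(hs_X.v, hs_X.v, hs_Y.v, hs_g.s)
  sum(resid^2)/nu
}
g.s <- optimize(function(h) LSCV(h, X.v, Y.v), interval = c(0.01, 1))$minimum  # interval is ad hoc
sg2hat.s <- hatsg2(X.v, Y.v, g.s)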
Finally we are close to finding \(h_{PUREM}\).
hatMISEc_integrand <- function(hi_x.s, hi_X.v, hi_Y.v, hi_h.s, hi_g.s, hi_hatsg2) {
  n.s <- length(hi_Y.v)
  # pilot estimates hat-m_g(X_1), ..., hat-m_g(X_n)
  hatmg <- rep(0, n.s) ; for (i in 1:n.s) {hatmg[i] <- hatm(hi_X.v[i], hi_X.v, hi_Y.v, hi_g.s)}
  return(c((hatm(hi_x.s, hi_X.v, hi_Y.v, hi_g.s) - t(w(hi_x.s, hi_X.v, hi_h.s)) %*% hatmg)^2 +
           hi_hatsg2*t(w(hi_x.s, hi_X.v, hi_h.s)) %*% w(hi_x.s, hi_X.v, hi_h.s)))
}
hatMISEc_integrand.v <- Vectorize(hatMISEc_integrand, 'hi_x.s')
# Gauss-Legendre quadrature on [-1, 1]; the substitution x = 2u (Jacobian 2)
# maps the integral over [-2, 2] onto the rule's domain.
hatMISE <- function(hM_h.s, hM_X.v, hM_Y.v, hM_g.s, hM_hatsg2) {legendre.quadrature(
  function(x){2*hatMISEc_integrand.v(hi_x.s = 2*x, hi_X.v = hM_X.v, hi_Y.v = hM_Y.v, hi_h.s = hM_h.s, hi_g.s = hM_g.s, hi_hatsg2 = hM_hatsg2)},
  legendre.quadrature.rules(10)[[10]])
}
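\(h_{PUREM}\) is then the minimizer of \(\widehat{MISE}_c\) over \(h\). A minimal sketch, assuming g.s and sg2hat.s from the sketch above and the same ad-hoc search interval:
# h_PUREM minimizes the estimated conditional MISE over h
h.PUREM <- optimize(function(h) hatMISE(h, X.v, Y.v, g.s, sg2hat.s),
                    interval = c(0.01, 1))$minimum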
We also need \(ISE(h)\) and \(h_{ISE}\).
ISE.integrand <- function(Ii_x.s, Ii_X.v, Ii_Y.v, Ii_h.s){
  # squared error of hat-m_h against the true test function
  (testf1(Ii_x.s) - hatm(hatm_x.s = Ii_x.s, hatm_X.v = Ii_X.v, hatm_Y.v = Ii_Y.v, hatm_h.s = Ii_h.s))^2
}
ISE.integrand.v <- Vectorize(ISE.integrand, 'Ii_x.s')
# Same change of variables as in hatMISE to integrate over [-2, 2].
ISE <- function(I_h.s, I_X.v, I_Y.v){legendre.quadrature(
  function(x){2*ISE.integrand.v(Ii_x.s = 2*x, Ii_X.v = I_X.v, Ii_Y.v = I_Y.v, Ii_h.s = I_h.s)},
  legendre.quadrature.rules(10)[[10]])
}
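As defined earlier, \(h_{ISE}\) minimizes a discretized version of \(ISE(h)\). A sketch over a bandwidth grid whose range and spacing are our assumptions:
# h_ISE as the minimizer of ISE(h) over a grid of candidate bandwidths
h.grid <- seq(0.02, 1, by = 0.02)
ISE.vals <- sapply(h.grid, function(h) ISE(h, X.v, Y.v))
h.ISE <- h.grid[which.min(ISE.vals)]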
Now we replicate this procedure 200 times for each combination of the two test functions and the two design densities. To do so, wrap the whole procedure into a function and repeat it, as sketched below.
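A minimal sketch of one replication for the uniform design and Test Function 1; the function name one.rep, the search interval, and the bandwidth grid are illustrative, and hatsg2 is the helper from the sketch above:
# One replication: generate data, select g, h_PUREM, and h_ISE, and return
# the two criteria summarized in the boxplots below.
one.rep <- function(n.s, snr = 4) {
  varm <- integrate(function(x){testf1(x)^2}, -2, 2)$value
  sg2.s <- varm/snr
  X.v <- matrix(runif(n.s, -2, 2), ncol = 1)
  Y.v <- matrix(testf1(X.v) + rnorm(n.s, 0, sqrt(sg2.s)), ncol = 1)
  g.s <- optimize(function(h) LSCV(h, X.v, Y.v), c(0.01, 1))$minimum
  sg2hat.s <- hatsg2(X.v, Y.v, g.s)
  h.PUREM <- optimize(function(h) hatMISE(h, X.v, Y.v, g.s, sg2hat.s), c(0.01, 1))$minimum
  h.grid <- seq(0.02, 1, by = 0.02)
  h.ISE <- h.grid[which.min(sapply(h.grid, function(h) ISE(h, X.v, Y.v)))]
  c(h.ratio = (h.PUREM - h.ISE)/h.ISE,
    log.ISE.ratio = log(ISE(h.PUREM, X.v, Y.v)/ISE(h.ISE, X.v, Y.v)))
}
res <- replicate(200, one.rep(100))  # 2 x 200 matrix of results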
Boxplots of \((h_{PUREM}-h_{ISE})/h_{ISE}\) with sample size 100 and 200
Boxplots of \(\log\{ISE(h_{PUREM})/ISE(h_{ISE})\}\) with sample size 100 and 200
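Given the 2 x 200 result matrix res from the replication sketch above, the boxplots for one (design, test function, \(n\)) combination might be drawn as follows; the \(n=100\) and \(n=200\) results would be placed side by side:
# Boxplots of the two criteria for a single simulation cell
par(mfrow = c(1, 2))
boxplot(res['h.ratio', ], main = expression((h[PUREM] - h[ISE])/h[ISE]))
boxplot(res['log.ISE.ratio', ], main = expression(log(ISE(h[PUREM])/ISE(h[ISE]))))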