Generate \(Y_i=m(X_i)+\epsilon_i\), where \(\epsilon_i\overset{iid}{\sim}N(0,\sigma^2)\) and \(m\) denotes the unknown regression function of interest. We suppose each \(X_i\) is drawn at random from a design density \(p\) with compact support \([a,b]\).
Two test functions: \[\sin{(2x)}+2\exp{(-16x^2)}~,\quad x\in[-2,2]\] \[x+2\exp{(-16x^2)}~,\quad x\in[-2,2]\] We refer to these as Test Function 1 and Test Function 2, respectively.
Three SNR values and two design densities
The signal-to-noise ratio (SNR) is defined as \(\text{var}(m)/\sigma^2\); the two design densities are the Uniform density and the Beta(2,2) density. Here \(\text{var}(m)=\|m\|_2^2\). The SNR values are chosen as 2, 4, and 81.
We use the Gaussian kernel and generate 200 data sets, with \(n=100\) and \(n=200\) data points per set. Following reference 2, the local polynomial estimator with \(p=1\) (local linear) is
\[\hat{m}_h(x_0)=\mathbf{w}_{x_0,h}^{\top}\mathbf{Y}=\sum_{i=1}^n\frac{\{s_{2,x_0,h}-s_{1,x_0,h}(X_i-x_0)\}K_h(X_i-x_0)Y_i}{s_{2,x_0,h}s_{0,x_0,h}-s_{1,x_0,h}^2},\]
where \(s_{r,x_0,h}=\sum_{i=1}^n(X_i-x_0)^rK_h(X_i-x_0)\), so that
\[\begin{align} \mathbf{w}_{x_0,h}&=\frac{1}{s_{2,x_0,h}s_{0,x_0,h}-s_{1,x_0,h}^2}[\{s_{2,x_0,h}-s_{1,x_0,h}(X_1-x_0)\}K_h(X_1-x_0),\cdots,\{s_{2,x_0,h}-s_{1,x_0,h}(X_n-x_0)\}K_h(X_n-x_0)]^{\top}\\&=[w_{x_0,h}(X_1),\cdots,w_{x_0,h}(X_n)]^{\top}. \end{align}\]
We use the conditional MISE as the criterion for bandwidth selection:
\[MISE_c=E\left[\left.\int\{m(x)-\hat{m}_h(x)\}^2dx\,\right|\,X_1^n\right]=\int\{m(x)-\mathbf{w}_{x,h}^{\top}\mathbf{m}\}^2dx+\sigma^2\int\mathbf{w}_{x,h}^{\top}\mathbf{w}_{x,h}dx,\]
where \(\mathbf{m}=[m(X_1),\cdots,m(X_n)]^{\top}\).
This can be estimated by
\[\widehat{MISE}_c=\int\{\hat{m}_g(x)-\mathbf{w}_{x,h}^{\top}\hat{\mathbf{m}}_g\}^2dx+\hat{\sigma}^2_g\int\mathbf{w}_{x,h}^{\top}\mathbf{w}_{x,h}dx,\]
where \(\hat{\mathbf{m}}_g=[\hat{m}_g(X_1),\cdots,\hat{m}_g(X_n)]^{\top}\) and the pilot bandwidth \(g\) may be chosen by least-squares cross-validation (LSCV). That is,
\[g=\underset{h}{\arg\min}\sum_{i=1}^n\{Y_i-\hat{m}_{h,-i}(X_i)\}^2,\]
where \(\hat{m}_{h,-i}(X_i)\) is the usual leave-one-out estimate of \(m(X_i)\). The error variance estimate \(\hat{\sigma}_g^2\) is given by
\[\hat{\sigma}_g^2=\frac{1}{\nu}\sum_{i=1}^n\{Y_i-\hat{m}_g(X_i)\}^2,\]
where \(\nu=n-2\sum_iw_{X_i,g}(X_i)+\sum_{i,j}w_{X_j,g}^2(X_i)\).
rm(list = ls())
library(gaussquad)  # provides legendre.quadrature() used below
par(mfrow = c(1, 2))
testf1 <- function(x){sin(2*x) + 2*exp(-16*x^2)}  # Test Function 1
testf2 <- function(x){x + 2*exp(-16*x^2)}         # Test Function 2
plot(testf1, xlim = c(-2, 2), main = 'Test Function 1', ylab = expression(f[1](x)))
plot(testf2, xlim = c(-2, 2), main = 'Test Function 2', ylab = expression(f[2](x)))
For the uniform design, we compute the optimal but unknown \(h_{ISE}\), defined as the minimizer of a discretized version of \[ISE(h)=\int\{m(x)-\hat{m}_h(x)\}^2dx.\]
We present two kinds of figures for each test function: boxplots of \((h_{PUREM}-h_{ISE})/h_{ISE}\) and boxplots of \(\log{\{ISE(h_{PUREM})/ISE(h_{ISE})\}}\). Here we only consider the case SNR = 4, that is, \(\sigma^2=\text{var}(m)/4\).
First, the design density is \(U(-2,2)\), so generate the data.
snr <- 4
varm <- integrate(function(x){testf1(x)^2}, -2, 2)$value  # var(m) = ||m||_2^2
sg2.s <- varm/snr                                         # sigma^2 = var(m)/snr
n.s <- 100
X.v <- matrix(runif(n.s, min = -2, max = 2), ncol = 1)    # uniform design on [-2, 2]
eps.v <- rnorm(n.s, 0, sqrt(sg2.s))                       # rnorm() takes the sd, not the variance
Y.v <- matrix(testf1(X.v) + eps.v, ncol = 1)
plot(X.v, Y.v, main = 'An example data set: uniform design, snr = 4', xlab = 'X.v, Test Function 1')
rm(X.v, Y.v, eps.v, n.s, sg2.s, snr, varm)
Define the kernel, \(\mathbf{w}_{x,h}\), and \(\hat{m}_h(x)\).
Kh <- function(Kh_x, Kh_h.s) {exp(-Kh_x^2/(2*Kh_h.s^2))/sqrt(2*pi*Kh_h.s^2)}  # Gaussian kernel K_h with bandwidth Kh_h.s
w <- function(w_x.s, w_X.v, w_h.s) {
  # s(r) computes s_{r,x,h} = sum_i (X_i - x)^r K_h(X_i - x)
  s <- function(s_r.s, s_x.s = w_x.s, s_X.v = w_X.v, s_h.s = w_h.s) {
    c(t((s_X.v - s_x.s)^s_r.s) %*% Kh(s_X.v - s_x.s, s_h.s))
  }
  denom <- (s(2)*s(0) - s(1)^2)
  # numerator: {s_2 - s_1 (X_i - x)} K_h(X_i - x) for each i
  numer <- diag(c(Kh(w_X.v - w_x.s, w_h.s))) %*% (s(2) - s(1)*(w_X.v - w_x.s))
  return(numer/denom)
}
w.v <- Vectorize(w,'w_x.s')
hatm <- function(hatm_x.s, hatm_X.v, hatm_Y.v, hatm_h.s) {
  # local linear fit at a point: hat-m_h(x) = w_{x,h}^T Y
  c(t(w(hatm_x.s, hatm_X.v, hatm_h.s)) %*% hatm_Y.v)
}
hatm.v <- Vectorize(hatm, 'hatm_x.s')
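As a quick sanity check (the values of x0, h, and the seed below are arbitrary), local linear weights reproduce linear functions exactly, so they should sum to one and be orthogonal to \(X_i-x_0\) up to floating-point error:
# Sanity check with arbitrary x0 = 0.5 and h = 0.2
set.seed(1)
X.chk <- matrix(runif(100, -2, 2), ncol = 1)
w.chk <- w(0.5, X.chk, 0.2)
sum(w.chk)                  # should be ~ 1
sum(w.chk * (X.chk - 0.5))  # should be ~ 0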
Now let’s find \(g\). First, define \(LSCV(h)\). Once \(g\) is found, \(\hat{\sigma}_g^2\) can also be computed.
LSCV <- function(LSCV_h.s, LSCV_X.v, LSCV_Y.v){
  # leave-one-out least-squares cross-validation criterion LSCV(h)
  ans <- 0 ; n.s <- length(LSCV_Y.v)
  for (i in 1:n.s) {
    X.temp <- LSCV_X.v[-i] ; Y.temp <- LSCV_Y.v[-i]
    ans <- ans + (LSCV_Y.v[i] - hatm(LSCV_X.v[i], X.temp, Y.temp, LSCV_h.s))^2
  }
  return(ans)
}
LSCV.v <- Vectorize(LSCV, 'LSCV_h.s')
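A minimal sketch of how \(g\) and \(\hat{\sigma}_g^2\) might then be computed, assuming data X.v and Y.v as generated above (regenerate them if they were removed); the helper name hatsg2 and the search interval (0.01, 1) are our assumptions:
# Pilot bandwidth g by minimizing LSCV, then hat-sigma^2_g with
# nu = n - 2*sum_i w_{X_i,g}(X_i) + sum_{i,j} w_{X_j,g}(X_i)^2
hatsg2 <- function(hs_X.v, hs_Y.v, hs_g.s) {
  n.s <- length(hs_Y.v)
  W <- sapply(1:n.s, function(j) w(hs_X.v[j], hs_X.v, hs_g.s))  # W[i, j] = w_{X_j,g}(X_i)
  nu <- n.s - 2*sum(diag(W)) + sum(W^2)
  resid <- hs_Y.v - hatm.v(hs_X.v, hs_X.v, hs_Y.v, hs_g.s)
  sum(resid^2)/nu
}
g.s <- optimize(function(h) LSCV(h, X.v, Y.v), interval = c(0.01, 1))$minimum  # interval is ad hoc
sg2hat.s <- hatsg2(X.v, Y.v, g.s)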
Finally we are close to finding \(h_{PUREM}\).
hatMISEc_integrand <- function(hi_x.s, hi_X.v, hi_Y.v, hi_h.s, hi_g.s, hi_hatsg2) {
  n.s <- length(hi_Y.v)
  # pilot estimates hat-m_g(X_1), ..., hat-m_g(X_n)
  hatmg <- rep(0, n.s) ; for (i in 1:n.s) {hatmg[i] <- hatm(hi_X.v[i], hi_X.v, hi_Y.v, hi_g.s)}
  return(c((hatm(hi_x.s, hi_X.v, hi_Y.v, hi_g.s) - t(w(hi_x.s, hi_X.v, hi_h.s)) %*% hatmg)^2 +
           hi_hatsg2*t(w(hi_x.s, hi_X.v, hi_h.s)) %*% w(hi_x.s, hi_X.v, hi_h.s)))
}
hatMISEc_integrand.v <- Vectorize(hatMISEc_integrand, 'hi_x.s')
# Gauss-Legendre quadrature on [-1, 1]; the substitution x = 2u (Jacobian 2)
# maps the integral over [-2, 2] onto the rule's domain.
hatMISE <- function(hM_h.s, hM_X.v, hM_Y.v, hM_g.s, hM_hatsg2) {legendre.quadrature(
  function(x){2*hatMISEc_integrand.v(hi_x.s = 2*x, hi_X.v = hM_X.v, hi_Y.v = hM_Y.v, hi_h.s = hM_h.s, hi_g.s = hM_g.s, hi_hatsg2 = hM_hatsg2)},
  legendre.quadrature.rules(10)[[10]])
}
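\(h_{PUREM}\) is then the minimizer of \(\widehat{MISE}_c\) over \(h\). A minimal sketch, assuming g.s and sg2hat.s from the sketch above and the same ad-hoc search interval:
# h_PUREM minimizes the estimated conditional MISE over h
h.PUREM <- optimize(function(h) hatMISE(h, X.v, Y.v, g.s, sg2hat.s),
                    interval = c(0.01, 1))$minimum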
We also need \(ISE(h)\) and \(h_{ISE}\).
ISE.integrand <- function(Ii_x.s, Ii_X.v, Ii_Y.v, Ii_h.s){
  # squared error of hat-m_h against the true test function
  (testf1(Ii_x.s) - hatm(hatm_x.s = Ii_x.s, hatm_X.v = Ii_X.v, hatm_Y.v = Ii_Y.v, hatm_h.s = Ii_h.s))^2
}
ISE.integrand.v <- Vectorize(ISE.integrand, 'Ii_x.s')
# Same change of variables as in hatMISE to integrate over [-2, 2].
ISE <- function(I_h.s, I_X.v, I_Y.v){legendre.quadrature(
  function(x){2*ISE.integrand.v(Ii_x.s = 2*x, Ii_X.v = I_X.v, Ii_Y.v = I_Y.v, Ii_h.s = I_h.s)},
  legendre.quadrature.rules(10)[[10]])
}
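As defined earlier, \(h_{ISE}\) minimizes a discretized version of \(ISE(h)\). A sketch over a bandwidth grid whose range and spacing are our assumptions:
# h_ISE as the minimizer of ISE(h) over a grid of candidate bandwidths
h.grid <- seq(0.02, 1, by = 0.02)
ISE.vals <- sapply(h.grid, function(h) ISE(h, X.v, Y.v))
h.ISE <- h.grid[which.min(ISE.vals)]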
Now we replicate this procedure 200 times for each combination of the two test functions and the two design densities. To do so, wrap the whole procedure into a function and repeat it, as sketched below.
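A minimal sketch of one replication for the uniform design and Test Function 1; the function name one.rep, the search interval, and the bandwidth grid are illustrative, and hatsg2 is the helper from the sketch above:
# One replication: generate data, select g, h_PUREM, and h_ISE, and return
# the two criteria summarized in the boxplots below.
one.rep <- function(n.s, snr = 4) {
  varm <- integrate(function(x){testf1(x)^2}, -2, 2)$value
  sg2.s <- varm/snr
  X.v <- matrix(runif(n.s, -2, 2), ncol = 1)
  Y.v <- matrix(testf1(X.v) + rnorm(n.s, 0, sqrt(sg2.s)), ncol = 1)
  g.s <- optimize(function(h) LSCV(h, X.v, Y.v), c(0.01, 1))$minimum
  sg2hat.s <- hatsg2(X.v, Y.v, g.s)
  h.PUREM <- optimize(function(h) hatMISE(h, X.v, Y.v, g.s, sg2hat.s), c(0.01, 1))$minimum
  h.grid <- seq(0.02, 1, by = 0.02)
  h.ISE <- h.grid[which.min(sapply(h.grid, function(h) ISE(h, X.v, Y.v)))]
  c(h.ratio = (h.PUREM - h.ISE)/h.ISE,
    log.ISE.ratio = log(ISE(h.PUREM, X.v, Y.v)/ISE(h.ISE, X.v, Y.v)))
}
res <- replicate(200, one.rep(100))  # 2 x 200 matrix of results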
Boxplots of \((h_{PUREM}-h_{ISE})/h_{ISE}\) with sample size 100 and 200
Boxplots of \(\log\{ISE(h_{PUREM})/ISE(h_{ISE})\}\) with sample size 100 and 200
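Given the 2 x 200 result matrix res from the replication sketch above, the boxplots for one (design, test function, \(n\)) combination might be drawn as follows; the \(n=100\) and \(n=200\) results would be placed side by side:
# Boxplots of the two criteria for a single simulation cell
par(mfrow = c(1, 2))
boxplot(res['h.ratio', ], main = expression((h[PUREM] - h[ISE])/h[ISE]))
boxplot(res['log.ISE.ratio', ], main = expression(log(ISE(h[PUREM])/ISE(h[ISE]))))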