Variance Estimation of the Largest Eigenvalue Aspect

Deborah Wang-Lin

Ph.D., 1998

Advisor: Jan de Leeuw

This work provides a systematic study of variance estimation for the largest eigenvalue aspect in stratified samples. The concept and implementation of the linearized variance of the largest eigenvalue aspect are investigated under the multinomial assumption. Three semi-parametric and two parametric bootstraps are proposed that employ the linearized variance in their resampling procedures. If the multinomial assumption fails, especially when sampling proportions exceed 10%, these bootstraps may incur substantial bias, which can be removed by bias corrections. In addition, one non-parametric bootstrap is proposed that requires no distributional assumption. Besides the six proposed bootstraps, two further non-parametric bootstraps are discussed: the standard and mirror-matching bootstraps. The eight bootstraps can also be grouped into two categories according to whether the sampling proportions are known or unknown. In the first case, pseudo-populations are generated in the bootstrap procedures to reduce the estimation bias arising from the stratified sampling structure. In the second case, the bootstraps are implemented as if the data were a simple random sample, to show how much accuracy can still be achieved. Four data sets with different sampling proportions and different numbers of strata are used in simulation studies. The commonly used 1.5 IQR rule is applied to remove outliers so that they do not distort the simulation results. In most cases, the proposed-1 bootstrap has the best stability among the three non-parametric bootstraps. The semi-parametric and parametric bootstraps have almost constant stability values across different numbers of bootstrap replicates, and these values are usually much smaller than those of the non-parametric bootstraps. This suggests that including the linearized variance in the bootstrap procedure not only improves stability but also reduces computing time, because fewer bootstrap replicates are needed. Parametric bootstraps are shown to save about 80% of the computing time while giving results of the same quality as non-parametric bootstraps. For results of the same quality, semi-parametric bootstraps reduce computing time further, to only 1% of that required by non-parametric bootstraps.
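As a rough illustration of the kind of procedure summarized above, the following sketch estimates the variance of a largest eigenvalue by resampling within strata and then applying the 1.5 IQR rule to the replicates. It is not any of the eight bootstraps studied in the dissertation. It assumes the aspect is simply the largest eigenvalue of the sample correlation matrix, resamples each stratum with replacement at its observed size, and ignores sampling proportions and the linearized variance. All function names are hypothetical.

```python
import numpy as np


def largest_eigenvalue(x):
    """Largest eigenvalue of the sample correlation matrix (rows are units)."""
    corr = np.corrcoef(x, rowvar=False)
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    return float(np.linalg.eigvalsh(corr)[-1])


def stratified_bootstrap(strata, n_boot=500, seed=0):
    """Resample with replacement inside each stratum (assumed scheme)."""
    rng = np.random.default_rng(seed)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        parts = [s[rng.integers(0, len(s), size=len(s))] for s in strata]
        reps[b] = largest_eigenvalue(np.vstack(parts))
    return reps


def drop_outliers_iqr(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    keep = (values >= q1 - k * iqr) & (values <= q3 + k * iqr)
    return values[keep]


def bootstrap_variance(strata, n_boot=500, seed=0):
    """Sample variance of the replicates left after outlier removal."""
    reps = drop_outliers_iqr(stratified_bootstrap(strata, n_boot, seed))
    return float(np.var(reps, ddof=1))


if __name__ == "__main__":
    # Synthetic data for illustration only: two strata, three variables.
    rng = np.random.default_rng(1)
    strata = [rng.normal(size=(40, 3)), rng.normal(size=(60, 3))]
    print(bootstrap_variance(strata, n_boot=200))
```

Resampling within strata keeps each stratum's sample size fixed in every replicate. Handling known sampling proportions, for example through pseudo-populations, would need additional steps that this sketch does not attempt.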
