Methods to Extract Rare Events

Weihua Huang
Ph.D., 2005
Advisor: Richard Berk

This dissertation addresses a data analysis setting in which the goal is to identify a small number of rare events (at most 5% of the study sample) among many observations. The focus is on a continuous, nonnegative response variable with a long right tail, where the rare events are the observations in the right tail of the response distribution. A new method, REH, and a variant of it are proposed to address this problem. Both are applied to three real data sets, one on computer CPU performance and two on fisheries bycatch, each of which has a response variable with a long right tail. Using the same data sets, the dissertation also compares the REH variant with Random Forests, a well-established statistical learning method, in terms of how well each identifies the rare events. The advantages and disadvantages of the two methods are discussed.