Cost-Sensitive Tree-Stacking: Learning with Variable Prediction Error Costs

Tess Alexandra Nesbitt
Ph.D., 2010
Advisor: Yingnian Wu

When certain types of prediction error are more costly than others, a learner should be trained to minimize the more costly errors. Countless applications demand learning algorithms that can assimilate these variable prediction error costs. For example, a motivating dental application will be presented in which the cost of underpredicting ordinal bacterial plaque scores is significantly greater than the cost of overpredicting them, especially for high-risk patients since the former can perpetuate deterioration and disease. To complicate the matter, such “cost-sensitive” data typically has a skewed distribution in which the most dangerous examples are scarce. Consequently, conventional symmetric-loss learners simply predict the more common response(s) in order to reduce the number of prediction errors, which can have damaging effects. These obstacles collectively motivate the need for an aggressive cost-sensitive learner that is trained to avoid making errors in the most costly situations while simultaneously maintaining satisfactory overall performance.

While there has been more research in classification settings, cost-sensitizing in quantitative frameworks remains unconquered territory without many available solutions or computing packages. This dissertation describes the motivation, ideology, and mechanics behind the development of a new algorithm, Cost-Sensitive Tree-Stacking, which operates with the primary goals of achieving greater sensitivity to rare high-risk cases and minimizing overall prediction error cost in quantitative frameworks. In the algorithm, conventional loss functions are adapted to account for the type, magnitude, and cost associated with various prediction errors during training. Using these ideas to initially grow cost-sensitive trees, the algorithm thereafter implements a cost-sensitive stacking component to combine the trees for increased stability. Illustrated through two applications, Cost-Sensitive Tree-Stacking’s principal effect revolves around altering the composition of the total cost due to prioritizing the minimization of the most costly errors, thereby averting the potential damage of undetected risky examples. The final stacked ensemble is significantly more sensitive to the rare yet risky cases, and it consistently maintains superior cost-sensitive performance when compared with many of the popular learning ensembles that are used in practice. Finally, we explore how this flexible algorithm naturally lends itself to variable importance measures and inference techniques.