An Introduction to Ensemble Methods for Data Analysis (Revised July, 2004)

Richard A. Berk

This version is superseded by Preprint number uclastat-preprint-2005:5.

There are a growing number of new statistical procedures that Leo Breiman (2001b) has called “algorithmic”. Coming primarily from work in statistics, applied mathematics, and computer science, these techniques are sometimes linked to “data mining”, “machine learning”, and “statistical learning”. A key idea behind algorithmic methods is that there is no statistical model in the usual sense; no effort is made to represent how the data were generated. And no apologies are offered for the absence of a model. Rather, there is some practical data analysis problem to solve that is attacked directly with procedures designed specifically for that purpose. If, for example, the goal is to determine which prison inmates are likely to engage in some form of serious misconduct while in prison (Berk and Baek, 2003), there is a classification problem to be addressed. Should the goal be to minimize some function of classification errors, procedures are applied with that minimization problem paramount. There is no need to represent how the data were generated if it is possible to classify inmates accurately by other means.
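To make the contrast concrete, the sketch below shows one such algorithmic approach in the spirit of Breiman's bagging: many simple classification rules are grown on bootstrap samples and combined by majority vote, with no attempt to model how the data were generated. The data, function names, and settings here are invented purely for illustration; this is a minimal toy, not any procedure from the papers cited above.

```python
import random

def stump(sample):
    """Fit a one-variable threshold rule (a "decision stump") to a sample of
    (x, y) pairs, choosing the split that minimizes misclassifications."""
    best = None
    for t in sorted({x for x, _ in sample}):
        for label in (0, 1):  # label assigned when x <= t
            errors = sum((label if x <= t else 1 - label) != y
                         for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, t, label)
    _, t, label = best
    return lambda x: label if x <= t else 1 - label

def bagged_classifier(data, n_rules=25, seed=0):
    """Ensemble: fit stumps on bootstrap resamples; classify by majority vote."""
    rng = random.Random(seed)
    rules = [stump([rng.choice(data) for _ in data]) for _ in range(n_rules)]
    return lambda x: int(sum(r(x) for r in rules) > n_rules / 2)

# Toy data: the outcome is 1 when the single predictor exceeds 5,
# plus one deliberately noisy case at (3, 1).
data = [(i, int(i > 5)) for i in range(11)] + [(3, 1)]
clf = bagged_classifier(data)
prediction = clf(9)
```

The ensemble is judged entirely by how well it classifies, which is the point of the surrounding paragraph: the minimization of classification errors is paramount, and no generative model is ever written down.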
2003-09-01