Estimating the Homeless Population in Los Angeles: An Application of Cost-Sensitive Stochastic Gradient Boosting

Brian Kriegler, Richard Berk
In many metropolitan areas, efforts are being made to count the homeless to ensure proper provision of social services. Some areas are very large, which makes spatial sampling a viable alternative to enumerating the entire terrain. Counts are then observed manually in the sampled regions and must be imputed for the unvisited areas. In the imputation process, the costs of underestimating and overestimating may be weighted differently depending on one's perspective and what is at stake. Here, we analyze data from the 2004-2005 Los Angeles County homeless study using a variant of stochastic gradient boosting that allows for asymmetric costs. Specifically, we demonstrate how to boost a quantile of the distribution, where the choice of quantile translates straightforwardly into the relative costs of estimation errors. Imputed counts and model diagnostics under various cost functions are reported. Practical uses of cost-sensitive imputed estimates are discussed briefly.
2007-09-01
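
The link between a target quantile and asymmetric error costs can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard quantile (pinball) loss with linear costs, under which an underestimate costing `cost_under` per unit and an overestimate costing `cost_over` per unit imply the target quantile `cost_under / (cost_under + cost_over)`. All function names and the example counts are hypothetical and are not taken from the study.

```python
def quantile_from_costs(cost_under, cost_over):
    """Target quantile implied by linear per-unit costs of under- and overestimation."""
    return cost_under / (cost_under + cost_over)


def asymmetric_loss(y, pred, tau):
    """Quantile (pinball) loss: underestimates weighted by tau, overestimates by 1 - tau."""
    diff = y - pred
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def negative_gradient(y, pred, tau):
    """Pseudo-residual that a gradient boosting step would fit under this loss."""
    return tau if y > pred else tau - 1.0


def best_constant(ys, tau):
    """Brute-force the observed value that minimizes total loss as a constant prediction."""
    candidates = sorted(set(ys))
    return min(candidates, key=lambda c: sum(asymmetric_loss(y, c, tau) for y in ys))


# Hypothetical counts for illustration only (not data from the study).
counts = [0, 0, 1, 2, 3, 5, 8, 13, 21, 40]

# Assumption: underestimating is treated as three times as costly as overestimating.
tau = quantile_from_costs(cost_under=3.0, cost_over=1.0)

print(tau, best_constant(counts, tau))  # 0.75 13
```

With underestimation weighted three to one, the loss-minimizing constant is the 0.75 quantile of the example counts rather than a central value, which is the sense in which the choice of quantile encodes the cost ratio. In a boosting setting, each iteration would fit a base learner to the pseudo-residuals returned by `negative_gradient` instead of ordinary residuals.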