Beating the Book: A Machine Learning Approach to Identifying an Edge in NBA Betting Markets

Guy Dotan
MAS, 2020
Frederic R. Paik Schoenberg
With the recent rise of sports analytics, the legalization of sports gambling, and the increased availability of data to everyday consumers, the opportunity to close the gap between the bettor and the casino appears more attainable than ever before. Our hypothesis was that one could build a model capable of exploiting inefficiencies that may exist within the betting marketplace.

Part one of this study required a derivation of the mathematics behind betting odds to determine the true probability a sportsbook assigns to each outcome of a matchup. Integral to this analysis was accounting for the casino's cut of the betting pool (known as the “vig”), which is baked into every wager to maximize the sportsbook's profits.
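To make the vig concrete, here is a minimal sketch of one common way to turn a pair of betting lines into vig-free probabilities. It assumes the lines are quoted as American moneyline odds (for example -150 / +130) and removes the vig by proportionally rescaling the two raw implied probabilities so they sum to 1. Both the odds format and the normalization method are assumptions for illustration, not necessarily the exact derivation used in this study; the function names are hypothetical.

```python
from typing import Tuple


def implied_probability(american_odds: int) -> float:
    """Raw implied probability of American moneyline odds (vig still included)."""
    if american_odds < 0:
        # Favorite: risk |odds| to win 100.
        return -american_odds / (-american_odds + 100)
    # Underdog: risk 100 to win `american_odds`.
    return 100 / (american_odds + 100)


def remove_vig(odds_a: int, odds_b: int) -> Tuple[float, float]:
    """Assumed proportional method: rescale both raw probabilities to sum to 1."""
    p_a = implied_probability(odds_a)
    p_b = implied_probability(odds_b)
    overround = p_a + p_b  # exceeds 1 by the sportsbook's margin (the vig)
    return p_a / overround, p_b / overround


if __name__ == "__main__":
    # A standard -110 / -110 line: each side implies 110/210 (about 0.524),
    # the two sum to about 1.048, and the vig-free estimate is 0.5 per side.
    print(remove_vig(-110, -110))
    print(remove_vig(-150, 130))
```

The amount by which the raw probabilities exceed 1 (about 4.8% for a -110 / -110 line) is the bookmaker's built-in margin; dividing it out proportionally is the simplest choice, and other allocation schemes exist.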

Part two was the model-building process, in which we trained on an archive of team-level NBA box scores dating back to 2007 to predict which team in each matchup would win. We aggregated our pace-adjusted box-score metrics using two different methodologies, rolling eight-game spans and accumulated year-to-date statistics, and then applied both datasets to four modeling approaches: logistic regression, random forest, XGBoost, and neural networks. The results were encouraging: every model correctly predicted the winner in more than 60% of matchups, outperforming random chance.
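The two aggregation schemes can be sketched as follows for a single team's season, processed in chronological order. This is an illustrative sketch under stated assumptions: the field names `points` and `possessions` are hypothetical, possessions are assumed to be already estimated, and both aggregates are computed by pooling totals over the relevant games and scaling to a per-100-possessions rate. Only games played before the current one are used, so a game's own result never leaks into its features.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def per_100(stat_total: float, possessions_total: float) -> float:
    """Pace-adjust a box-score total to a per-100-possessions rate."""
    return 100.0 * stat_total / possessions_total


def pregame_features(
    games: List[Dict[str, float]], stat: str, window: int = 8
) -> List[Tuple[Optional[float], Optional[float]]]:
    """Return (rolling, year_to_date) pace-adjusted values for each game,
    using only games played before it. `games` is one team's season in order."""
    recent = deque(maxlen=window)  # (stat, possessions) for the last `window` games
    ytd_stat = 0.0
    ytd_poss = 0.0
    features = []
    for g in games:
        if recent:
            r_stat = sum(s for s, _ in recent)
            r_poss = sum(p for _, p in recent)
            features.append((per_100(r_stat, r_poss), per_100(ytd_stat, ytd_poss)))
        else:
            features.append((None, None))  # season opener: no history yet
        # Update the history only after the pregame features are recorded.
        recent.append((g[stat], g["possessions"]))
        ytd_stat += g[stat]
        ytd_poss += g["possessions"]
    return features


if __name__ == "__main__":
    season = [{"points": 100 + i, "possessions": 100} for i in range(10)]
    for rolling, ytd in pregame_features(season, "points"):
        print(rolling, ytd)
```

The rolling window reacts quickly to recent form and injuries, while the year-to-date figure is more stable but slower to respond; computing both lets the models weigh that trade-off.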

The final part of this research evaluated how our best model fared against real-world betting lines for the 2019-20 NBA season. We used our logistic regression model to generate a win probability for each team in every matchup and compared it with the probability implied by the sportsbook odds. This betting edge (the discrepancy between our model's probability and the sportsbook's) was then used in a variety of betting strategies. With our best fixed-wager technique, we generated a return on investment of about 5% over the entire season. With a more sophisticated wagering method known as the Kelly criterion, a strategy that sizes each wager according to the size of the identified edge, we nearly doubled our investment, with a return of about 98%. In summary, we found that building a betting model to gain a gambling edge is quite achievable.
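For reference, here is a minimal sketch of full Kelly bet sizing. It assumes American moneyline odds and a model win probability `p_model`; the function names are hypothetical, and the staking details of this study's strategies may differ. With net odds `b` (profit per unit staked), the Kelly fraction of the bankroll is `f* = (b * p - (1 - p)) / b`, and no bet is placed when that value is not positive.

```python
def decimal_odds(american_odds: int) -> float:
    """Convert American moneyline odds to decimal odds (total return per unit staked)."""
    if american_odds > 0:
        return 1 + american_odds / 100
    return 1 + 100 / -american_odds


def kelly_fraction(p_model: float, american_odds: int) -> float:
    """Full-Kelly fraction of the bankroll to wager; 0 when there is no edge."""
    b = decimal_odds(american_odds) - 1  # net profit per unit staked
    f = (b * p_model - (1 - p_model)) / b
    return max(f, 0.0)


def settle(bankroll: float, p_model: float, american_odds: int, won: bool) -> float:
    """Stake the Kelly fraction on one game and return the updated bankroll."""
    stake = bankroll * kelly_fraction(p_model, american_odds)
    profit = stake * (decimal_odds(american_odds) - 1) if won else -stake
    return bankroll + profit


if __name__ == "__main__":
    # 55% model probability at even money (+100): wager 10% of the bankroll.
    print(kelly_fraction(0.55, 100))
    # A 50% model probability on a -110 line has no edge, so nothing is wagered.
    print(kelly_fraction(0.50, -110))
```

The Kelly fraction is positive exactly when `p_model` exceeds `1 / decimal_odds`, which is the raw, vig-inclusive implied probability. In other words, the model must beat the sportsbook's price including its cut before any money is staked, and larger edges receive proportionally larger stakes.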
