Determining Best Sports Ranking Method with Machine Learning Techniques

Robert Castillo
MASDS, 2024
WU, YINGNIAN
The growth in available sports data has expanded the role of data analytics and machine learning in sports. Most notably, statistical methods have been used to predict the outcomes of games, which has helped fuel interest in sports betting. Although these advances can give teams and viewers a better understanding of why certain teams outperform others, the leagues continue to rely on standard, and often flawed, ways of ranking teams. In this project, various methods are used to create enhanced Bradley-Terry ranking systems for each of the four major sports leagues in North America: the NFL, NBA, NHL, and MLB. These rankings are used to calculate new win probabilities for each game. The win probabilities, along with other variables, then serve as inputs to neural networks that aim to improve predictions for game matchups. For each sport's dataset, the best model used the logistic regression ranking method with a ridge regression penalty and a home-court advantage term, improving predictive accuracy by as much as 17% compared to simply predicting that the home team will win.
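As an illustration of the kind of ranking method named above, the following is a minimal sketch of a Bradley-Terry model fit as a ridge-penalized logistic regression with a single home-advantage term. It is not the thesis's implementation: the function names, the toy teams and results, the penalty strength `lam`, and the use of plain gradient descent are all assumptions made for this example. The model assumed here is P(home wins) = sigmoid(home_adv + rating[home] - rating[away]), with the ridge penalty applied to the team ratings only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_bradley_terry(games, lam=1.0, lr=0.1, n_iter=2000):
    """Fit team ratings and a home-advantage term by gradient descent.

    games: list of (home_team, away_team, home_won), home_won in {0, 1}.
    lam:   ridge penalty strength on the ratings (hypothetical default).
    """
    teams = sorted({t for g in games for t in g[:2]})
    rating = {t: 0.0 for t in teams}
    home_adv = 0.0
    n = len(games)
    for _ in range(n_iter):
        # Gradient of the ridge term (lam / 2) * sum(rating^2).
        grad = {t: lam * rating[t] for t in teams}
        grad_home = 0.0
        # Gradient of the negative log-likelihood, game by game.
        for home, away, home_won in games:
            p = sigmoid(home_adv + rating[home] - rating[away])
            err = p - home_won
            grad[home] += err
            grad[away] -= err
            grad_home += err
        for t in teams:
            rating[t] -= lr * grad[t] / n
        home_adv -= lr * grad_home / n
    return rating, home_adv


def win_probability(rating, home_adv, home, away):
    """Modeled probability that the home team beats the away team."""
    return sigmoid(home_adv + rating[home] - rating[away])


# Toy results for three hypothetical teams (not data from the thesis).
games = [
    ("A", "B", 1), ("B", "A", 0),
    ("A", "C", 1), ("C", "A", 0),
    ("B", "C", 1), ("C", "B", 1),
    ("B", "C", 1), ("C", "B", 0),
]

ratings, home_adv = fit_bradley_terry(games)
ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)
print(win_probability(ratings, home_adv, "A", "C"))
```

Sorting the fitted ratings gives the ranking, and `win_probability` gives the per-game probability that could be passed to a downstream model. The ridge penalty keeps the ratings finite even when a team is unbeaten, as team A is in the toy data.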