Understanding the Market Value and Utility of High-Variance Starting Pitchers

James Francis Lepore
MAS, 2018
Frederic R. Paik Schoenberg
While Major League Baseball has long been at the forefront of sports analytics, the most commonly used metric for evaluating a pitcher's performance is the “Earned Run Average” (ERA), a single aggregate figure that takes no account of how a pitcher's results are distributed from start to start. This paper compares observed win percentages of pitchers who exhibit high start-to-start variation in the quality of their starts (inconsistent pitchers) with those of pitchers who exhibit low start-to-start variation (consistent pitchers). Logistic regression is first used to transform the traditional components of ERA, runs allowed and innings pitched, into a proxy for expected win probability. Then, through bootstrap sampling, a simulation of hypothetical pitchers is created to compare the difference between actual and expected win totals among pitchers with different distributional characteristics. Finally, the “Probability of Superiority” between opposing pitchers in individual games is calculated in an attempt to identify optimal matchups. In summary, a statistically significant pattern of outperforming expected win totals is found among pitchers with higher start-to-start variation. However, over the time period analyzed, pitchers at the major league level are much more similar than they are different. As a result, the difference between actual and expected win totals for consistent versus inconsistent pitchers, while statistically significant, is arguably not large enough to be considered practically significant.
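As an illustration of the “Probability of Superiority” idea mentioned above, the following is a minimal sketch, not the procedure used in this paper. It assumes each pitcher is summarized by a list of per-start quality scores (for example, expected win probabilities) and estimates the chance that a randomly chosen start by one pitcher beats a randomly chosen start by the other, counting ties as half. The function name, the tie convention, and the sample scores are all hypothetical.

```python
from itertools import product


def probability_of_superiority(a, b):
    """Estimate P(A > B) + 0.5 * P(A == B) over all pairs of starts.

    `a` and `b` are sequences of per-start quality scores. Counting a tie
    as half a win is an assumed convention, not one taken from the paper.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    # Score every pairing of one start from `a` against one start from `b`.
    score = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x, y in product(a, b)
    )
    return score / (len(a) * len(b))


# Hypothetical per-start scores, invented purely for illustration.
steady = [0.55, 0.50, 0.52, 0.48, 0.50]    # low start-to-start variation
volatile = [0.90, 0.15, 0.85, 0.20, 0.40]  # high start-to-start variation

ps = probability_of_superiority(volatile, steady)
print(f"P(volatile start beats steady start) = {ps:.2f}")
```

In this made-up example the two pitchers have nearly identical average scores, yet the estimate differs from one half, which is the kind of distributional information a single aggregate figure such as ERA cannot convey.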