Nate Silver Still Delivering: House Effects and Bias in Forecasting

Nate Silver, whose fantastic fivethirtyeight blog for the New York Times has been wildly successful, is still offering up plenty of analysis relevant to the sabermetric community despite his departure for politics. In his newest post, When ‘House Effects’ Become ‘Bias,’ Silver distinguishes between several statistical terms used to describe variance and error in projection systems.

The term ‘house effects,’ as Silver points out, refers to systems (or portions of systems) that produce outlying projections relative to their peers. In the weeks leading up to the 2010 midterm elections, for example, a Gallup poll showed a 15-point advantage for Republicans in House races, while most other polls predicted a much more modest victory.

After the conclusion of whatever event or sampling a projection system purports to predict, we can assess the system’s projections relative to the actual outcomes. This can be done by looking at one or both of the following: margin of error – the total or average number of percentage points by which a projection system ‘missed’ the outcomes – and what is called statistical bias – the leaning of projections too heavily in one direction or another.

Election polls provide a nice, varied set of data in which bias can be seen clearly. For example, one poll-based projection might end up having a large margin of error (missing frequently on predicted outcomes) but not show any particular preference for Democratic or Republican candidates. Another poll might predict more correct outcomes, and generally miss by fewer percentage points, but show results that frequently overestimate the votes for Republican candidates. This is what political scientists mean by the term ‘bias.’ It is important to note that this form of statistical bias is not the same as partisan bias – the skewing of projections based on the political views held by the conducting group – however much the two may overlap in practice.
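The distinction can be made concrete with a few lines of arithmetic. In this minimal sketch, the error lists are invented for illustration (positive means the poll overestimated one party's margin): margin of error is the average *absolute* miss, while bias is the average *signed* miss.

```python
# Hypothetical signed errors, in percentage points, for two polls.
# These numbers are invented for illustration only.
poll_a_errors = [5.0, -6.0, 4.0, -5.0]   # big misses, but balanced
poll_b_errors = [2.0, 3.0, 2.5, 3.5]     # smaller misses, all one direction

def margin_of_error(errors):
    """Average absolute miss, regardless of direction."""
    return sum(abs(e) for e in errors) / len(errors)

def bias(errors):
    """Average signed miss: values far from zero indicate a lean."""
    return sum(errors) / len(errors)

print(margin_of_error(poll_a_errors))  # 5.0  -> large misses
print(bias(poll_a_errors))             # -0.5 -> nearly unbiased
print(margin_of_error(poll_b_errors))  # 2.75 -> smaller misses
print(bias(poll_b_errors))             # 2.75 -> consistent lean
```

Poll A is the noisier forecaster, but Poll B is the biased one – exactly the distinction the election example draws.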

These terms are equally important when assessing the accuracy of projection systems in baseball. To give one example, Adam Jones received the following preseason OPS projections for 2010:

Bill James: .804
CHONE: .846
Marcel: .773
ZiPS: .818
Fangraphs Fans: .804

Taking a look at the consensus, we could have concluded in March that Marcel’s predictions carried somewhat notable house effects, with its OPS projection a relatively low outlier among its peers. Here we start to see that house effects are not always a bad thing, as Jones actually posted a .767 OPS in 2010. Sometimes one projection system might be attuned to certain predictive factors that largely go unnoticed by others. Anyone who has studied statistical projection has undoubtedly come to the conclusion that even the most accurate MLB projection systems carry a high margin of error, and sometimes the consensus projection can miss significantly.
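Reading a house effect off a table like the one above amounts to measuring each system's distance from the consensus. A minimal sketch, using the projections quoted in the post:

```python
# Preseason 2010 OPS projections for Adam Jones, as listed above.
projections = {
    "Bill James": .804,
    "CHONE": .846,
    "Marcel": .773,
    "ZiPS": .818,
    "Fangraphs Fans": .804,
}

# Consensus: the simple mean of the five projections (about .809).
mean = sum(projections.values()) / len(projections)

# Crude "house effect" measure: each system's deviation from consensus.
# Marcel sits roughly 36 points below the mean; CHONE roughly 37 above.
deviations = {sys: round(ops - mean, 3) for sys, ops in projections.items()}
```

Note that by absolute distance CHONE is as much a high outlier as Marcel is a low one; Jones's actual .767 just happened to vindicate the low side.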

Much of this, however, can be explained by the idea of statistical variance (the statistician’s euphemism for luck). OPS bundles together many different outcomes, and often fails to account for how a player truly performed. If we look at the outcomes a hitter has the most control over, we can see that there wasn’t as much discrepancy between the systems as we might have thought. Each of the four simulation-based systems (I’ll take the fans’ prediction out of the equation for now), for example, suggested that Jones would draw a walk in 6.4 to 7.0 percent of his plate appearances and strike out between 20.1 and 20.9 percent of the time. Most of the incongruity between systems was about batted ball data – which, all controversy aside, is much more susceptible to chance and variance than things like strikeout and walk rates. So, it seems that the main point of divergence among these projection systems was in attempting to predict chance, a Sisyphean task at best.

While margin of error is one way of analyzing the utility of projection systems, it might be more helpful to look at bias. We can do this by isolating one bivalent variable and assessing the accuracy of the system in either direction. For example, if we believe that CHONE was overly generous in its assessment of Jones because it fails to account for the learning curve among young outfielders, we could assess its margin of error among outfielders under the age of 25 over the last three years. If patterns emerge after running such regression analyses, we can conclude that there is a statistical bias. One system might be consistently overrating rookie pitchers in Baltimore because it fails to consider the favorable pitching conditions in the AAA International League in general and Harbor Park in particular. Another system might be failing to accurately translate pitching performances for flyball pitchers in the AAA Pacific Coast League, because the launching pads in that league make it nearly impossible to assess how well that type of pitcher would perform in a neutral context.
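The subgroup check described above can be sketched very simply: split a system's signed errors by the suspected variable and compare the averages. The error values below are entirely hypothetical – invented to show what a detected bias would look like, not drawn from any real system.

```python
# Hypothetical signed projection errors (projected OPS minus actual OPS)
# for one system, split by a candidate bias variable. Invented numbers.
errors_under_25 = [0.042, 0.031, 0.055, 0.018, 0.047]   # young outfielders
errors_veterans = [0.012, -0.020, 0.008, -0.015, 0.003]  # everyone else

def mean_signed_error(errors):
    """Average signed error; consistently positive = overrating the group."""
    return sum(errors) / len(errors)

# Under-25 errors are all positive and average ~.039, while the veteran
# errors hover near zero -- the pattern of a statistical bias against
# (here, in favor of) the young-outfielder subgroup.
young_bias = mean_signed_error(errors_under_25)
veteran_bias = mean_signed_error(errors_veterans)
```

A real analysis would of course use a proper regression with controls rather than a raw split, but the logic – look for a consistent lean within a subgroup, not just a big overall miss – is the same.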

These are obvious pitfalls that have led to statistical bias, and they have largely been corrected in most systems, but it’s tough to say what other anomalies are being missed. What can be guaranteed is that no projection system is free from bias unless it is either perfect or unrealistically random. Digging for hidden biases is essential in attempting to find market inefficiencies in any realm.
