At my request, Matt added a “disclaimer” to his post yesterday regarding projection systems such as Marcel, PECOTA, CAIRO, and ZiPS:

Projections assume performance by players will tend to regress towards the mean, such that there will be a smaller spread between good and bad performances. Therefore, some players at the top end may be sold short, while those at the bottom end may see a statistical bump.

To clarify: if the mean batting average in the system is .270, projected batting averages might range only from .240 to .300, even though in real life you are likely to see some .320s and .220s. As such, a projected .290 batting average may not seem impressive relative to our general sense of what a good AVG is, or compared to the player’s career, but it may actually be a strong projection relative to his peers.
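A minimal sketch of the compression described above. It assumes a simple linear shrinkage toward the .270 mean, using a hypothetical weight of 0.6. Marcel, PECOTA, CAIRO, and ZiPS each regress in their own, more involved ways. This only shows how a .220 to .320 spread of real outcomes can become a .240 to .300 spread of projections.

```python
LEAGUE_MEAN_AVG = 0.270  # the system mean from the example above


def shrink_toward_mean(observed, weight, mean=LEAGUE_MEAN_AVG):
    """Pull an observed rate toward the mean.

    weight=1 keeps the observed value; weight=0 returns the mean.
    The 0.6 used below is a hypothetical weight, not any real system's.
    """
    return mean + weight * (observed - mean)


# A .220 hitter is projected at .240 and a .320 hitter at .300,
# so the projected spread is narrower than the real one.
for observed in (0.220, 0.270, 0.320):
    print(f"{observed:.3f} -> {shrink_toward_mean(observed, 0.6):.3f}")
```

The single weight is the design simplification here. Real systems vary the amount of regression with sample size and other factors.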

Derek Carty addressed this issue in 2009 (h/t @joepawl):

The most important concept I’d like to stress is that of relativity. The kinds of articles I just mentioned operate under the assumption that the James projection for a player should be looked at relative to another system’s projection for him or relative to last year. This is incorrect, though. What we should be doing is examining the James projection for a player relative to all of the other players the James system projects.

As I’ve stressed many times before, context is of the utmost importance when it comes to almost anything fantasy baseball related. In this case, most people ignore the run environment that the James projection system assumes. To illustrate my point, I’ll use a very extreme example. Let’s say that we transport Albert Pujols and his 44 HR projection into a league where it is common for the worst players to hit 80 HRs per year and the best to top 200 HRs. While Pujols and his 44 HRs look terrific in our reality, in this new one it looks kind of pathetic. That’s context.

“What does this have to do with the James projections, though,” you ask? Well, while the James projections don’t assume a run environment where people are routinely hitting 200 HRs, it usually does assume that hitters perform a little bit better, on the whole (when compared to previous seasons or other projection systems). So if everyone is being projected to hit a few extra HRs, it does not necessarily make Alex Rodriguez’s 37 HR projection any more optimistic than CHONE’s 34 HR projection.

Carty’s emphasis on relativity is the point that I am trying to make. Comparing projections to league average, to career numbers, or to projections from other systems is useful only insofar as you contextualize the results. Returning to my earlier example, the projected .290 batting average does not seem all that impressive on its face. However, if you look at the other projections in that system and find that it is the 3rd-best AVG among all players at that position, the player is cast in a different light, and the projection becomes a valuable data point in assessing him.
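To make the relative comparison concrete, here is a hypothetical sketch. The player names and projected averages below are invented. They are treated as coming from a single imaginary system at one position. The point is to judge a projection by where it ranks among that system's own projections, not by its raw value.

```python
from statistics import mean

# Hypothetical projected batting averages for one position, all from the same system.
projected_avg = {
    "Player A": 0.290,
    "Player B": 0.301,
    "Player C": 0.272,
    "Player D": 0.295,
    "Player E": 0.258,
    "Player F": 0.266,
    "Player G": 0.281,
    "Player H": 0.249,
}


def rank_within_system(projections, player):
    """Return (rank, pool size) of a player's projection among peers; rank 1 is best."""
    ordered = sorted(projections.values(), reverse=True)
    return ordered.index(projections[player]) + 1, len(ordered)


rank, pool = rank_within_system(projected_avg, "Player A")
system_mean = mean(projected_avg.values())
print(f"Player A's .290 ranks {rank} of {pool}; system mean is {system_mean:.3f}")
```

With these made-up numbers, Player A's .290 ranks 3rd of 8 and sits well above the system's mean. Comparing within one system also sidesteps the run-environment problem Carty describes, because every player shares the same assumptions.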

Another thing to avoid is treating projections as predictions. Projections are not predictions; rather, they represent the mean of the range of possible outcomes. They are therefore useful for saying that, on average, a player’s statistics should cluster around particular numbers. To put this in practical terms, when PECOTA tells me that Player X will have an OBP of .360, it is not predicting that he will finish at exactly .360. Rather, it is telling us that the average of all the outcomes in his expected range is .360, so his actual OBP is most likely to land somewhere near .360.
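A rough sketch of how wide that range can be, under stated assumptions. It assumes a hypothetical 600 plate appearances and treats each PA as an independent trial at exactly the projected .360 rate. It then uses a normal approximation to the binomial. This counts sampling noise only. Real projections also carry uncertainty about the player's true talent, so actual ranges are wider than this.

```python
import math
from statistics import NormalDist


def obp_range(mean_obp, plate_appearances, coverage=0.80):
    """Central interval of season OBP from sampling noise alone.

    Assumes true talent equals the projected mean exactly and each PA is an
    independent trial; uses a normal approximation to the binomial.
    """
    sd = math.sqrt(mean_obp * (1 - mean_obp) / plate_appearances)
    dist = NormalDist(mu=mean_obp, sigma=sd)
    tail = (1 - coverage) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


low, high = obp_range(0.360, 600)  # 600 PA is a hypothetical full season
print(f"80% interval: {low:.3f} to {high:.3f}")
```

Even in this optimistic model, the 80% interval runs roughly from .335 to .385. A .360 projection is the center of that band, not a point the player is expected to hit.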

Why does this matter? Because the difference between likelihood and exactitude is important when using these numbers to buttress an argument. If these systems were in the business of predicting, saying that Player A will have a better OBP than Player B would give us an obvious conclusion: A>B. This is not how projections work. If Player A’s projected OBP is .360 and Player B’s is .355, PECOTA is telling us that if both perform to their mean projection, then A>B. However, both players have a wide range of possible outcomes, which leaves a decent possibility that Player B will finish with the better OBP. It is not the most likely outcome, but that uncertainty is built into the system and cannot be ignored. Feel free to use the projections, but remember that they are not intended to predict performances with exactitude.
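A hedged sketch of how often the lower projection might come out ahead. It makes the same simplifying assumptions as a pure sampling-noise model. Each player gets a hypothetical 600 PA at exactly his projected OBP, and the difference between the two OBPs is approximated as normal. Adding talent uncertainty would only push the result closer to a coin flip.

```python
import math
from statistics import NormalDist


def prob_b_finishes_ahead(obp_a, obp_b, pa_a=600, pa_b=600):
    """P(Player B's actual OBP > Player A's), sampling noise only.

    Each player's OBP is modeled as binomial over his PAs (normal approximation);
    the 600 PA defaults are hypothetical.
    """
    var_a = obp_a * (1 - obp_a) / pa_a
    var_b = obp_b * (1 - obp_b) / pa_b
    diff = NormalDist(mu=obp_b - obp_a, sigma=math.sqrt(var_a + var_b))
    return 1 - diff.cdf(0.0)


p = prob_b_finishes_ahead(0.360, 0.355)
print(f"Chance the .355 player finishes ahead: {p:.0%}")
```

Under these assumptions, Player B finishes ahead a bit more than 40% of the time. "A>B" is the most likely single outcome, but it is far from certain.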

This brings me to my final point, regarding projected standings. Most projection systems use their data to compile projected standings. To do so, the administrators of the system need to build depth charts and estimate playing time, which is quite difficult and can introduce considerable error into the calculations. As such, projected standings need to be used even more carefully than projections for individual players. They should be used to establish broad ranges of expected performance rather than tight bands. For example, I tend to take a projection of 91 wins as a statement that the team is expected to finish somewhere between about 85 and 97 wins. This tells me that they should be in contention, but beyond a broad conclusion like that, it is dangerous to extrapolate much from this data.
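A minimal sketch of why even a ±6-win band is not overly cautious. Suppose each of the 162 games were an independent coin flip at the projected 91/162 win rate, with the team's true strength assumed to be known exactly. Luck alone would then produce a standard deviation of about 6 wins. Playing-time and depth-chart errors come on top of that.

```python
import math


def luck_only_win_sd(projected_wins, games=162):
    """SD of season wins if every game were an independent coin flip at the projected rate.

    Ignores errors in talent, playing time, and depth charts, so it is a lower bound
    on real uncertainty, not an estimate of it.
    """
    p = projected_wins / games
    return math.sqrt(games * p * (1 - p))


sd = luck_only_win_sd(91)
print(f"+/-1 SD from luck alone: {91 - sd:.0f} to {91 + sd:.0f} wins (SD = {sd:.1f})")
```

This prints a band of 85 to 97 wins with an SD of about 6.3. That is the same range used above, and it comes from game-to-game randomness before any projection error is counted.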

I am not saying that projections are worthless, nor am I suggesting that any analysis built upon them is fatally flawed. I use projections all the time, and feel they can be a useful analytical tool when used prudently. Rather, I simply believe that it is important to be aware of the limitations of the tools we utilize to craft arguments. If we are not cognizant of them, we are in danger of reaching conclusions that are not supported by the evidence.
 


8 Responses to Being Prudent With Projections

  1. Moshe Mandel says:

    Just as a note, Colin Wyers told me last night that the margin of error on PECOTA standings is +/- 7 wins. Essentially, they are giving us a 15-game range with the mean as the stated projection.

  2. Steve S. says:

    The biggest issue I have with projections is at the margins concerning age and experience. I’m not convinced they’re overly useful when dealing with a 37-year-old Derek Jeter or a 25-year-old Phil Hughes. The aging player has a diminishing skill set, so averaging in prime years skews the numbers upward. The young player has too little experience at the MLB level to know if his career is on the upswing or due to be exposed.

  3. Love this post. I obviously love looking at projections, and find they can be useful benchmarks in attempting to extrapolate some semblance of how we might expect a given player to do in the coming year, but obviously every system needs to be taken with copious amounts of salt. Anyone who looks at a given projection and expects a player to put up the exact line that a projection system says it will probably shouldn’t be looking at projections.

    However, given the five-month pocket of hell during which there are no professional baseball games being played, nothing gets the stat geek in me pumped up quite like a fresh set of projections, no matter how off-base they might be.

  4. [...] This post was mentioned on Twitter by Bill Miller, moshetya. moshetya said: RT @MosheTYA New post: Being Prudent With Projections http://bit.ly/hJApJx // my post on using projections correctly [...]

  5. Matt Warden says:

    Great post, Moshe. And I agree with you, Larry.

  6. oldpep says:

    I think there’s a difference in how people interpret words like ‘predictive’. SABR stuff has always been about seeing what’s most likely to happen, not about predicting what will happen. It applies to established MLB players and to guys being promoted to the show.
    I do believe pitchers are harder to project; there are so many more variables involved (like tweaking a pitch, small injuries, etc.).

  7. [...] to get a single set of projected standings. The projected AL standings are below, and the usual caveats about projections should be considered before lending too much meaning to these [...]
