One of my bigger pet peeves is imprecise wording when discussing statistics. UZR is frequently cited as a statistic that requires a large sample size to be accurate, for a lot of reasons. UZR is essentially an estimate of a player’s defensive ability. The field is divided into zones, and players are rated based upon the balls that entered or exited their zones. There is a probabilistic side to the statistic – a +10 run shortstop should get 80% of the balls in zone 32 – which adds some random variance into the statistic.

So, there is random variance to UZR. However, most of that variance is eliminated through a regular season’s sample size. What I have below is a graph of the range of Yankee fielders’ UZR/150 at a very conservative 95% confidence level. That means that given the sample size (FanGraphs’ Balls in Zone statistic) and a very generous standard deviation, we can say with 95% confidence that the true UZR/150 for each player lies in his respective range. I was conservative here so as not to overreach (with an estimated standard deviation) and still didn’t find a whole lot of potential variance in UZR. Here’s the graph:

[image title="UZR range graph" size="full" id="21819" align="center" linkto="full" ]

Basically, each player’s true 2010 defensive performance (at least as UZR measures it) will end up somewhere in his range 95% of the time. Ramiro Pena was somewhere between 12 and 29 runs above average at shortstop this year. Derek Jeter was somewhere between 8 and 11 runs below average. The more balls that entered the fielder’s zone, the tighter the player’s range is.
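For anyone who wants to see roughly how a range like this can be built, here is a minimal sketch using a plain normal approximation. It is not necessarily the exact calculation behind the graph: the function name, the per-ball standard deviation, and the balls-per-150-games figure are all assumptions made up for illustration, not real player data.

```python
import math

def uzr_interval(uzr_per_150, balls_in_zone, sd_per_ball, balls_per_150, z=1.96):
    """Return (low, high) bounds for UZR/150 under a normal approximation.

    All inputs other than uzr_per_150 and balls_in_zone are assumed values:
    sd_per_ball is a guessed standard deviation of run value per ball in zone,
    balls_per_150 is a guessed number of balls in zone over 150 games,
    and z=1.96 corresponds to a two-sided 95% confidence level.
    """
    # Standard error of the average run value per ball shrinks with sqrt(n).
    se_per_ball = sd_per_ball / math.sqrt(balls_in_zone)
    # Scale the per-ball uncertainty up to a 150-game rate.
    margin = z * se_per_ball * balls_per_150
    return uzr_per_150 - margin, uzr_per_150 + margin

# Hypothetical fielders with the same point estimate but different sample sizes.
part_timer = uzr_interval(5.0, balls_in_zone=100, sd_per_ball=0.1, balls_per_150=400)
regular = uzr_interval(5.0, balls_in_zone=400, sd_per_ball=0.1, balls_per_150=400)

print("part-timer:", part_timer)
print("regular:   ", regular)
```

With four times as many balls in zone, the regular's range is half as wide as the part-timer's, which is the "more balls, tighter range" effect described above.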

I prefer this to looking at a list of UZR point estimates that would normally be displayed on Fangraphs or another site. It does not account for inherent problems with UZR, but it does adjust for any random variation that could cause statistics to fluctuate. And, I think at least, it gives a clearer picture of who is good, and who is bad on defense. And yes, Brett Gardner really has been that good this season.

Now, everyone knows that defense tends to fluctuate quite a bit year to year. I don’t think this has anything to do with sample size or UZR. I think that players just plain play well some years and worse in others. Many good and bad defenders are pretty consistent year to year. Others are inconsistent – just like in all other aspects of the game. The point is that even at a 95% confidence level, and with relatively small samples, random variation can only account for a small portion of year-to-year variation. If you add to the sample, you’re not going to get a lot of change. UZR will still underrate Mark Teixeira, even with 500 more data points. Even guys like Thames or Pena, who saw very small amounts of time in the field relative to the regulars, didn’t shift around all that much. Regulars were +/- about 3-4 runs.

Heading into the postseason, we can reliably say that this is what we’re working with. Next year, the team could perform differently, but this is how they’ve performed this year, according to UZR; we don’t need a larger sample to say it. In a sane world, Ramiro Pena would be Derek Jeter’s defensive replacement with a modest lead (which would be easier if Jeter were batting 9th, which he should be). They should probably at least consider replacing Nick Swisher too, depending on the pitcher. And Marcus Thames should never see the field, but we knew that already.

One Response to 2010 Yankee Fielding – Adjusted UZR

  1. Your Mom says:

    This graph is epic.