Friday, September 23, 2011

Does Defense Matter?

The advanced stat movement in basketball has a long way to go to catch baseball.

Various hurdles exist along the path. The popular parlance notes that baseball is an "individual sport masquerading as a team sport," whereas basketball outcomes are much more of a product of five interwoven parts. Then, there's the issue of whether basketball even collects enough data, or at least the right data, from which to draw conclusions.

The latter is especially concerning on the defensive end of the court. Dean Oliver, the godfather of the "Pomeroy" stats, suggested an expanded box score, which would include forced field goal misses to allocate defensive stops more appropriately, rather than just giving credit to the player who ultimately rebounded the miss. Without such a change, Oliver's defensive rating disproportionately favored big men, who could rack up huge numbers of rebounds and blocks, while guards saw their ratings depend primarily on the only other input to the rating: steals.

Critics of Oliver's defensive rating launched a bevy of arguments, including the lack of a "forced miss" statistic and the circular nature of assignments (even if you could single out defensive performance, the best defenders draw the best offensive players, which would cause the best defenders to look more average than they actually are).

Unsurprisingly, the seemingly insurmountable task has caused many to throw up their hands, and even worse, has emboldened the "eye test" contingent, who have opportunistically latched onto the defensive end of the court as base camp in their war on numbers.

While there is no satisfactory defensive rating on the player-by-player level that can match the logic and predictive power of the individual Oliver offensive ratings, there are still many valuable insights that can start chipping away at the mystery that is the defensive end of the court. Starting with the team-level data, we can use the evidence to work our way down to the individual in a way that at least begins to achieve the ultimate goal of the advanced stats movement: to determine what is random and what is predictable, and to devise formulas to forecast the part that can be predicted.


Take a look at any of Ken Pomeroy's ratings, specifically focusing on team offensive and defensive ratings. Flip back through the years. Notice anything?

From 2004 to 2011, the standard deviation of offensive ratings among Division I teams is significantly larger than that of defensive ratings. Each year, the gap between the two is about one to two points, meaning that the difference between the great and terrible offensive teams is much larger than the difference between the great and terrible defensive teams (from Stats 101, a one to two point difference in standard deviation implies a four to eight point difference in the width of the range that covers 95 percent of teams).
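The Stats 101 arithmetic in that parenthetical can be checked with a short sketch. It assumes team ratings are roughly normally distributed, so that about 95 percent of teams fall within roughly two standard deviations of the mean; the function name and the multiplier constant are illustrative, not anything from Pomeroy's data.

```python
# Sketch: how a gap in standard deviation widens the central ~95% band.
# Assumption: ratings are roughly normal. Z_95 is the usual two-sided
# 95 percent multiplier, which the rule of thumb rounds up to 2.
Z_95 = 1.96


def band_width_gap(sd_gap, z=Z_95):
    """Extra width of the central ~95% band caused by a larger standard deviation."""
    # The band runs from mean - z*sd to mean + z*sd, so its width is 2*z*sd.
    return 2 * z * sd_gap


for gap in (1.0, 2.0):
    exact = band_width_gap(gap)
    rounded = band_width_gap(gap, z=2)
    print(f"SD gap {gap:.0f}: band wider by {exact:.1f} points (rule of thumb: {rounded:.0f})")
```

With the multiplier rounded to 2, a one to two point gap in standard deviation maps to the four to eight point figure quoted above.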

A consistent gap like that tells us something. At a minimum, it suggests that it is harder to stand out on the defensive end than on the offensive end, but there are many potential theories as to why. Maybe defense is far more beholden to luck than offense. The less controllable an action is, the less room there is for specialization and merit-based excellence, which would lead to a tighter range of outcomes.

Another potential explanation (and not even a conflicting one) is that impacting the game is much harder on defense than on offense. If it takes a really special talent to generate extra stops above the average and an incredibly lazy, indifferent player to allow more points than average, then we'd see a smaller disparity in team defense, as a higher percentage of Division I players would be clumped around the defensive mean.


Pomeroy's defensive analysis on a team level goes deeper than just the points allowed per 100 possessions rating.

He also breaks down the Four Factors of offensive and defensive success (effective field goal percentage, turnover rate, offensive rebounding percentage and free throw rate) and provides even further components including 2-point, 3-point and free throw percentage, block rates, steal rates, assist rates and 3-pointers attempted as a share of total field goals.

Regressing each of these stats against a team's defensive rating (points per 100 possessions) allows us to estimate the relative impact of each.

Controlling for the opponent's offensive rating (the exact same raw rates against really good and really bad offensive teams will lead to wildly different points allowed per 100 possessions, so factoring in the opponent becomes important here), the most important determinant of defensive success is two-point percentage allowed. A percentage point rise in 2PT% allowed leads to a 0.9 point rise in defensive rating.

There's a hefty gap before we get to 3PT%, offensive rebounding allowed and turnover rate. A percentage point rise in 3PT% allowed or offensive rebounding rate leads to a 0.7 point rise in defensive rating. Turnover rate has a much more dramatic impact on defensive rating on the margin (a percentage point rise in TO Rate drops a team's defensive rating by 1.2 points), but given that the range of possible outcomes is much smaller than for 2PT%, 3PT% and offensive rebounding rate, the absolute impact of TO Rate is reduced to that of 3PT% and offensive rebounding rate.
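The distinction between marginal and absolute impact can be made concrete with a small sketch. The coefficients are the ones quoted above; the spreads of each stat are not given here, so the code solves for the spread ratio at which turnover rate and 3PT% would swing defensive rating equally rather than assuming measured values. The helper names and the unit spread are illustrative.

```python
# Per-point coefficients quoted in the text: change in defensive rating
# per one percentage point change in each stat allowed or forced.
COEF = {"2PT%": 0.9, "3PT%": 0.7, "OR%": 0.7, "TO%": -1.2}


def absolute_impact(coef, spread):
    """Rating swing from moving a stat across a given spread (in percentage points)."""
    return abs(coef) * spread


def spread_ratio_for_equal_impact(coef_a, coef_b):
    """Spread of stat A, as a share of stat B's spread, at which both
    stats swing defensive rating by the same amount."""
    return abs(coef_b) / abs(coef_a)


ratio = spread_ratio_for_equal_impact(COEF["TO%"], COEF["3PT%"])
print(f"TO% spread would need to be about {ratio:.2f}x the 3PT% spread")

# Placeholder spread, NOT a measured value: 1.0 for 3PT%, scaled for TO%.
placeholder_3pt_spread = 1.0
print(absolute_impact(COEF["3PT%"], placeholder_3pt_spread))
print(absolute_impact(COEF["TO%"], placeholder_3pt_spread * ratio))
```

In other words, the claim that turnover rate ends up level with 3PT% in absolute terms implies that turnover rate varies across teams a bit more than half as much as 3PT% allowed does.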

The remainder of the defensive metrics have almost no bearing on defensive rating. A one percentage point rise in free throw rate only bumps up a team's defensive rating by 0.1 points, and the range of potential free throw rates allowed is hardly wide enough for that to make a significant difference. Three-point attempts as a percentage of total field goals has a similar impact and is mitigated by the same range concerns. Block percentage and steal percentage are both economically insignificant as well, but much of this is due to the fact that they already have direct relationships with other variables that do matter.

Now things become a little clearer. The most important aspect of defense is 2PT% allowed, with 3PT%, offensive rebounding rate and turnover rate slightly behind. Any gains picked up by reducing trips to the line are likely to be trivial, and style factors like the composition of opponents' shots and the pace of games are largely meaningless.

Starting with opponents' shooting, the major swing factor in that 2PT% figure will be shots in the paint that aren't highly contested, while the 3PT% figure will hinge on the number of wide open attempts a team allows. Except in the case of an undersized post defender (e.g. the Laurent Rivard experiment at the 4 during Kyle Casey's injury), these are generally team assets or liabilities. Help defense rotates until it can't anymore, and the offense makes the extra pass to get a player a higher-than-normal percentage shot. The extent to which this happens, relative to normal, decently contested shots, goes a long way toward determining where the 2PT% and 3PT% end up.

On the opposite end of the spectrum from uncontested shots are blocks. For every percentage point that a team's block percentage rises, its 2PT% allowed falls by 0.7 percentage points. The relationship isn't incredibly strong, however, primarily because what matters more than the actual block is whether the shot is altered, getting us back to the heretofore unrecorded "forced field goal miss" statistic.
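A rough way to see why blocks matter mostly through 2PT% is to chain the two estimates quoted in this post. This is only a back-of-the-envelope sketch: it assumes the two coefficients can simply be multiplied, which the regressions themselves do not guarantee, and the function name is illustrative.

```python
# Coefficients quoted in the text.
BLOCK_TO_2PT = -0.7    # change in 2PT% allowed per +1 point of block percentage
TWO_PT_TO_DRTG = 0.9   # change in defensive rating per +1 point of 2PT% allowed


def drtg_change_from_blocks(block_pct_change):
    """Implied change in points allowed per 100 possessions.

    Assumption: the block effect flows entirely through 2PT% allowed and
    the two estimates compose linearly.
    """
    return block_pct_change * BLOCK_TO_2PT * TWO_PT_TO_DRTG


print(f"{drtg_change_from_blocks(1.0):+.2f} points per 100 possessions")
```

Under those assumptions, each extra point of block percentage is worth a bit more than half a point of defensive rating, all of it already captured by the 2PT% term.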

Offensive rebounding rate and turnover rate are quite the opposite, as each is highly dependent upon the performance of the individual over that of the team. Individual rebounding rates are very consistent year-over-year, and easily aggregated to the team level. Turnover rates have a pretty common baseline and then move one-to-one with the individuals' steal percentages, which are also quite consistent year-over-year.

So, where does that leave us? Team-level defensive ratings are impacted by certain team-level defensive statistics in a clear, economically significant manner, and some of those statistics are clearly linked to individual-level stats, while others have individual components, but are more correctly described as team assets.


Having exhausted the potential of the box score stats, we need a different approach to see how much, if any, of the team assets can be allocated to individual performance.

The advent of plus-minus statistics in college basketball affords us that opportunity. If players are distinguishing themselves in things that we can't measure, then while the box score can't reward or penalize them, the scoreboard surely can.

By taking players with a large sample size of minutes and comparing the team performance offensively and defensively when they are on the court and off the court, we can potentially capture any value-added that a player provides. In most cases, if a player sees the floor for a significant number of minutes, that player will face substantially the same level of competition as that of his teammates, so we should be able to compare the percentage of minutes a player plays to the percentage of the total team points that his squad scores and allows while he's on the floor.

For instance, if a player plays 75 percent of team minutes and his team scores 78 percent of its points while he's on the floor and allows 70 percent of its total points, then we can consider that player better than his team's average both offensively and defensively. Comparing that performance to the team-level offensive and defensive ratings, we can generate individual offensive and defensive ratings by player based on the most fundamental measurement in any sport: points.
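One plausible way to turn those shares into on-court ratings is sketched below. It assumes possessions are spread evenly across minutes, so a player's share of team possessions equals his share of team minutes; the exact conversion used for this post is not spelled out, and the team ratings of 100.0 are placeholders rather than real figures.

```python
def on_court_rating(team_rating, minutes_share, points_share):
    """Scale a team rating by the player's share of points relative to minutes.

    Assumption: possessions are spread evenly across minutes, so the
    on-court rating is team_rating * (points_share / minutes_share).
    """
    return team_rating * points_share / minutes_share


# Example from the text: 75% of minutes, 78% of points scored, 70% of points allowed.
# Placeholder team ratings (points per 100 possessions), not real data.
team_off, team_def = 100.0, 100.0

on_off = on_court_rating(team_off, 0.75, 0.78)
on_def = on_court_rating(team_def, 0.75, 0.70)

print(f"on-court offensive rating: {on_off:.1f}")
print(f"on-court defensive rating: {on_def:.1f}")
```

With those placeholder inputs, the team scores about 4 percent more and allows about 7 percent less per possession with the player on the floor than it does overall.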

If such "aptitude" isn't luck but skill, it should be repeatable year-over-year. Thus, among a sample of qualified players from both 2010 and 2011, we should see a strong correlation between the defensive performance from the two different seasons. Such a relationship doesn't really materialize, however, as a simple regression of 2011 against 2010 revealed a coefficient of just 0.25 on the 2010 defensive rating with an R-squared of 0.10. The exact same regression of 2010 and 2011 offensive ratings revealed a coefficient of 0.54 on the 2010 offensive rating and an R-squared of 0.37.
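The year-over-year check described above is an ordinary one-variable least squares fit. A minimal sketch in plain Python follows; the two rating lists are made-up numbers purely to show the mechanics, not the sample behind the 0.25 and 0.54 coefficients.

```python
def simple_ols(x, y):
    """One-variable least squares: returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With a single predictor, R-squared is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up individual defensive ratings for the same players in two seasons.
ratings_2010 = [95.0, 98.0, 100.0, 102.0, 105.0, 107.0]
ratings_2011 = [101.0, 97.0, 103.0, 99.0, 104.0, 100.0]

slope, intercept, r_squared = simple_ols(ratings_2010, ratings_2011)
print(f"coefficient on 2010 rating: {slope:.2f}, R-squared: {r_squared:.2f}")
```

Read this way, a coefficient of 0.25 means a player who was four points better than the sample average in 2010 projects to be only about one point better than average in 2011.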

By this measure, individual offensive output is more than twice as consistent as individual defensive output, and the prior season explains far more of the following season's result.

The message is fairly clear. Individuals can impact offensive output in a far more direct way than they can affect defensive efficiency, which should be instructive when choosing between players whose strengths lie on different sides of the ball.


Hearkening back to the fundamental finding of Bill James, a team's winning percentage is very highly correlated to its points scored and its points allowed. Thus, maximization of winning percentage involves driving as large a wedge between the two as possible.
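James's finding is usually written as the Pythagorean expectation, which estimates winning percentage from points scored and points allowed. The sketch below shows the form of the formula; James's original baseball version uses an exponent of 2, while the exponent of 10 and the scoring totals here are placeholders for illustration, not fitted values for college basketball.

```python
def pythagorean_win_pct(points_scored, points_allowed, exponent):
    """Expected winning percentage: PS^k / (PS^k + PA^k)."""
    scored = points_scored ** exponent
    allowed = points_allowed ** exponent
    return scored / (scored + allowed)


# Placeholder per-game scoring figures, not real team data.
scored, allowed = 70.0, 65.0

for k in (2, 10):  # 2 is James's baseball exponent; 10 is a placeholder
    pct = pythagorean_win_pct(scored, allowed, k)
    print(f"exponent {k}: expected winning percentage {pct:.3f}")
```

Whatever the exponent, the estimate depends only on the ratio of points scored to points allowed, which is why widening the gap between the two is the whole game.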

Points scored vary more across teams than points allowed, making offense more of a priority than defense. This implicitly creates a fairly high hurdle for a "defensive specialist" to clear. Further, while there is a demonstrated link between lower defensive ratings and defenders who can generate steals, grab rebounds and alter shots, the argument that a player who doesn't produce in these categories has large value defensively is nebulous at best.

To be clear, the argument is not that defense doesn't matter at all. Clearly, some teams are much better than others on that end of the floor, and there is something beyond luck that is driving the differentials. But the differences are larger on the offensive end, and once you control for the effect of the stats that we already keep, the remaining amount of defensive variance to explain isn't all that large and could be driven in some part by pure, dumb luck.

So, if your big men all rebound and alter shots at about the same rates and your guards all have similar steal rates, just put your top five offensive rating players on the floor and you'll likely have your optimal lineup.
