Tuesday, December 16, 2014

Luck-Adjusted College Basketball Scores: (Or How I Intend To Take Most of the Fun/Mystery/Excitement Out of Sports)

As many of you have noticed, I've been working on a new pet project recently.

It's no secret that I am an ardent opponent of attributing all of an outcome to skill when it is partially (or sometimes mostly) driven by luck. My vociferous opposition has nothing to do with a desire to argue for argument's sake, nor is it because I want to be the annoying pebble in the shoe.

The reason I care so deeply about separating luck from skill is that what I find interesting about sports is the predictive pursuit. Once you know what is luck and what is skill, you can account for each in your predictions differently. Failing to admit that much of what happens on the playing surface is driven by luck essentially concedes any ability to predict outcomes with a reasonable level of accuracy.

We know a lot about sources of luck in college basketball. Here's a quick list:

1) Free throw defense is pretty much all luck.

2) For the vast majority of players and teams, two-point jumpers on both ends of the floor are pretty much a crapshoot, and it's hard to deviate far from the 35-36 percent national average in either direction.

3) Three-point defense is a mixed bag. You can try to scare opponents off the three-point line with some causal success, but forcing three-point misses is a much more random game.

Notice that this list contains basically all shots taken from outside the low block.

There is a corresponding list of the things that are driven either by skill or by conscious tactical decisions:

1) Offensive rebounding tends to be a replicable skill (top three teams from 2013-14 are still in the top five in 2014-15), as much of offensive rebounding has to do with size (a constant) and the effort permitted by the coaching staff in pursuing rebounds on that end (as opposed to getting back to stop transition buckets - a hotly debated tradeoff in its own right).

2) Getting to the free throw line tends to be a skill, as the same players and teams tend to have success year-over-year. Keeping opponents off the free throw line also seems to be a conscious decision of coaching staffs, though having the personnel to guard without fouling can sometimes make this tactical decision difficult to execute.

3) Scoring around the rim and defending the basket tend to be replicable skills. Unlike jumpers, players seem to be more consistent with their conversion rates on layups, and shot blockers tend to post high block rates year-over-year.

4) At the extremes, turnover rates remain decently consistent both on the offensive end (there are really bad teams at handling the ball and really good ones) and on the defensive end (the decision to apply heavy pressure or the decision to apply no pressure), but through the middle, there is a decent amount of luck and regression to the mean in turnover rate stats.

Given those guiding principles, we now have a roadmap to start separating the elements of the game that are determined by behavior or skill and those that are guided by randomness or luck.

That roadmap has led to the model that you have been seeing referred to as "luck-adjusted" scoring.

To explain the model, let's start with the simplest of cases one might see in a play-by-play:

Player X makes Two Point Jump Shot

Now, on the scoreboard, Alpha Team will get two points for Player X's made jumper. As we saw above, though, on average, that jumper will fall about 35 percent of the time. So, in the luck-adjusted model, Alpha Team gets 0.7 points.
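The credit in the luck-adjusted model is just a shot's average make rate times its point value. A minimal sketch of that arithmetic, where `expected_points` is a hypothetical helper name and 35 percent is the national two-point jumper average cited above:

```python
def expected_points(make_rate: float, point_value: int) -> float:
    """Luck-adjusted credit for one attempt: average make rate times shot value."""
    return make_rate * point_value


# A two-point jumper at the roughly 35 percent national average cited above.
two_point_jumper_ev = expected_points(0.35, 2)
print(round(two_point_jumper_ev, 2))  # 0.7
```

The same helper covers threes and free throws once their average make rates are plugged in.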

If that seems silly, let's take a look at a few more possessions:

Player X makes Two Point Jump Shot
Player X misses Two Point Jump Shot
Player X misses Two Point Jump Shot

Player X is really into two-point jumpers (Player X isn't meant to be Yale's Matt Townsend, but it's okay if that's who you're envisioning here), and this run of performance is about what we'd expect on average. Sure enough, the luck-adjusted model would have Alpha Team at 2.1 points, while the scoreboard would have them at 2 points.
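A small tally makes the scoreboard-versus-model comparison concrete. This is a sketch, not the model's actual code: the event format and the `tally` function are hypothetical, and only the two-point jumper is filled in, using the 0.7-point figure from the text.

```python
# Hypothetical lookup tables; only the two-point jumper is filled in here.
EXPECTED_VALUE = {"two_point_jumper": 0.7}
POINT_VALUE = {"two_point_jumper": 2}


def tally(events):
    """Return (scoreboard points, luck-adjusted points) for (shot_type, made) pairs."""
    scoreboard = 0
    luck_adjusted = 0.0
    for shot_type, made in events:
        if made:
            scoreboard += POINT_VALUE[shot_type]
        # Every attempt earns its expected value, whether it falls or not.
        luck_adjusted += EXPECTED_VALUE[shot_type]
    return scoreboard, luck_adjusted


events = [
    ("two_point_jumper", True),
    ("two_point_jumper", False),
    ("two_point_jumper", False),
]
scoreboard, luck_adjusted = tally(events)
print(scoreboard, round(luck_adjusted, 1))  # 2 2.1
```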

But what if Player X truly does make 40 percent of his two-point jumpers? This "luck-adjusted" model is selling him short, no?

The answer is yes, but the difference is inconsequential versus not adjusting for luck at all. Some simple math can prove this out.

Player X misses Two Point Jump Shot
Player X makes Two Point Jump Shot
Player X makes Two Point Jump Shot
Player X misses Two Point Jump Shot
Player X misses Two Point Jump Shot

In this case, Player X and his 40 percent shooting from two would "deserve" four full points, while the luck-adjusted model would only give him 3.5 points. That's an overattribution of 0.5 points to luck.

Think about what happens, though, when we attribute nothing to luck. Roughly 9 percent of the time, Player X will make four or five of his five two-point jumpers. The scoreboard would show 8 or 10 points from that flurry, which is 4 or 6 points more than what Player X and Alpha Team deserve. Another 8 percent of the time, Player X would make none of the two-point jumpers, and Alpha Team would get four fewer points than it deserved. In fact, about two-thirds of the time, Player X will fail to go exactly two-for-five for the appropriate four points, meaning that in two-thirds of cases his actual output will deviate from the expected by at least two points. Meanwhile, the "luck-adjusted" model deviates by only half a point despite getting his true two-point jumper hit rate wrong by five percentage points.
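Those percentages come from the binomial distribution. A short check, assuming each of the five jumpers is an independent 40 percent shot:

```python
from math import comb


def prob_makes(k: int, attempts: int, make_rate: float) -> float:
    """Binomial probability of exactly k makes in `attempts` independent shots."""
    return comb(attempts, k) * make_rate**k * (1 - make_rate) ** (attempts - k)


attempts, rate = 5, 0.40
p_four_or_five = prob_makes(4, attempts, rate) + prob_makes(5, attempts, rate)
p_zero = prob_makes(0, attempts, rate)
p_not_exactly_two = 1 - prob_makes(2, attempts, rate)

print(f"4 or 5 makes:  {p_four_or_five:.3f}")     # 0.087
print(f"0 makes:       {p_zero:.3f}")             # 0.078
print(f"not exactly 2: {p_not_exactly_two:.3f}")  # 0.654
```

The last figure, about 65 percent, is the "two-thirds of the time" in the paragraph above.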

In other words, crediting each shot with an expected value, even if that expected value is off by a decent margin, carries far, far less risk than failing to adjust for the expected value of the shot type at all.

The same logic applies for three-point shots and free throws, and the model makes the same average expected value adjustment for those.

Layups, however, have to be treated a little bit differently.

On both ends of the floor, a team's ability to convert and to stop opponents from converting around the rim tends to be a demonstrated skill that can differ by a sizable amount. Defensively, a team's block rate can have a massive impact on the percentage of layups made, and block rates tend to be pretty stable over time down to the player level. For instance, Harvard's Kenyatta Smith would have ranked second in the nation in block rate a couple of years ago had he played the few additional minutes needed to qualify, and he has returned after a year's absence to rank eighth nationally.

The higher the block rate a team records, the more likely that team is altering shots around the rim as well.

On the opposite end, having skilled finishers at the guard position and lots of size in the post can make a team very prolific on a sustained basis at the rim.

Thus, in the model, a team's expected value from layups depends on both the opponent's ability to prevent conversions around the rim and the team's own ability to produce them, and the same logic applies on the opposite end of the floor. The expected value credit is based on "unblocked" field goal percentage around the rim, because all blocked attempts are given an expected value of zero in the model.
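The post doesn't spell out how the offense's and defense's rim numbers are blended, so the sketch below makes an explicit assumption: a simple average of the shooting team's unblocked conversion rate and the rate the opponent allows on unblocked attempts. The function and parameter names are hypothetical.

```python
def layup_expected_value(off_unblocked_pct: float,
                         def_unblocked_pct_allowed: float,
                         blocked: bool) -> float:
    """Hypothetical luck-adjusted credit for one attempt at the rim.

    off_unblocked_pct: shooting team's conversion rate on unblocked rim attempts.
    def_unblocked_pct_allowed: rate the opponent allows on unblocked rim attempts.
    ASSUMPTION: the two rates are blended with a simple average; the post
    does not say exactly how the model combines them.
    """
    if blocked:
        return 0.0  # blocked attempts carry zero expected value in the model
    blended_pct = (off_unblocked_pct + def_unblocked_pct_allowed) / 2
    return 2 * blended_pct  # a layup is a two-point attempt
```

Other blends (weighting by sample size, or a log5-style matchup formula) would slot into the same place; the average is just the simplest choice.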

So, that's how the model handles all of the possessions for which there is a shot that is either made or missed and rebounded by the defense. There are three other ways a possession can go, though. Here's a brief explanation of how we handle each of those:

1) Turnover - Any possession ending in a turnover is given zero credit with one notable exception highlighted below.

2) Offensive Rebounds - Any possession containing one or more offensive rebounds will be credited with the expected value of the highest-value shot attempted. For instance, if a team shoots a two-point jumper, gets an offensive rebound and shoots a three, then the team will get the expected value for the three (just north of 1pt) rather than the expected value of the two-point jumper (0.7pts) for that possession. If the three is missed, but the team gets another offensive rebound and turns it over, the team still gets the expected value of the three-pointer, rather than a 0 for the turnover (this is the exception to the turnover rule above).

3) Free Throws - Any possession ending in free throws will receive 0.7pts per free throw. In the case of an and-one, the 0.7pts for the free throw are added to the expected value of the shot itself. One minor note here is that the model currently calculates one-and-ones incorrectly. The correct expected value for a one-and-one is 1.2pts (the 1.4pts expected for two free throws, less the 30 percent chance that missing the front end costs you the 0.7pts on the back end). I'm working on coding the model to catch those, as right now a team gets 1.4pts when it hits the front end and just 0.7pts when it misses the front end. This really doesn't affect the overall luck-adjusted scores much, even if a team misses multiple front ends in a row, because the error is at most about half a point per trip and one-and-ones happen only a handful of times a game.
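The three possession rules above, plus the corrected one-and-one value, can be sketched in a few lines. This is a hypothetical illustration, not the model's code: the function names are made up, the 1.05 value for a three (3 × 0.35) is an assumed stand-in for "just north of 1pt", and simply adding free throws on top of the best shot is an assumption for the and-one case.

```python
FREE_THROW_PCT = 0.7  # implied by the post's 0.7pts per free throw


def one_and_one_ev(ft_pct: float = FREE_THROW_PCT) -> float:
    """Front end is always taken; the back end only after a made front end."""
    return ft_pct + ft_pct * ft_pct


def possession_credit(shot_evs: list[float], free_throw_ev: float = 0.0) -> float:
    """Hypothetical luck-adjusted credit for one possession.

    shot_evs: expected value of each field-goal attempt in the possession.
        More than one entry means offensive rebounds kept the possession
        alive, and only the single highest value is credited, not the sum.
    free_throw_ev: expected value of free throws awarded (an and-one,
        a trip to the line, or a one-and-one).
    Turnovers add nothing: a turnover with no prior shot scores zero, while
    a turnover after an offensive rebound keeps the best shot's value.
    ASSUMPTION: free throws are simply added on top of the best shot.
    """
    return max(shot_evs, default=0.0) + free_throw_ev


print(round(one_and_one_ev(), 2))     # 1.19
print(possession_credit([0.7, 1.05])) # 1.05 (jumper, offensive rebound, three)
print(possession_credit([]))          # 0.0 (turnover before any shot)
```

Taking the maximum rather than the sum is what keeps an offensive rebound from double-counting a possession.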

Finally, the model also recognizes that offenses are more likely to score at the rim in transition shortly after forcing a live-ball turnover. Thus, any un-blocked layups taken within a short span after a forced turnover are given a higher expected value.

Quite simply, those adjustments are all that comprise the luck-adjusted model. If you've been following closely, you can see that the model is going to like teams that take a lot of threes and keep opponents from doing so. It will also favor teams that are both good finishers around the basket offensively and great at protecting the rim defensively. Teams that can keep possessions alive through offensive rebounds will benefit by getting another opportunity for a higher expected value outcome. The model will also favor those teams that spend a lot of time at the free throw line.

The behaviors that the model does not like include two-point jump shots and turnovers, especially giveaways that lead to transition baskets coming back the other way.

To a great extent, this model merely adds the stability of math around heuristics we've already come to hold dear. The traditional thinking about upsets is that the underdog limits its turnovers and knocks down a ton of threes, while preventing the favorite from getting easy baskets on the other end. All this model does is show you mathematically why that is the case.

Right now, the model is in the hypothesis stage, though I have done some preliminary testing by comparing how well the luck-adjusted and actual halftime margins predicted the final outcomes of Ivy games thus far this season. For each game, I took the actual halftime margin and added either the luck-adjusted first-half margin or the actual first-half margin again to "predict" the final margin. The luck-adjusted first-half margin plus the actual first-half margin was a better predictor of the final outcome than the actual first-half margin doubled, with the former producing a standard deviation of outcomes two-thirds the size of the latter.
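The halftime comparison above can be sketched as follows. It assumes "standard deviation of outcomes" means the spread of each forecast's errors against the actual final margin; the function name and argument layout are hypothetical, and the inputs would be the per-game margins, which aren't reproduced here.

```python
from statistics import pstdev


def prediction_error_sd(halftime_actual, halftime_luck_adj, final_actual):
    """Compare the two halftime-based forecasts of the final margin.

    Each argument is a per-game list of margins, in the same game order.
    Returns (sd_luck_adjusted, sd_doubled): the standard deviation of the
    errors for (actual + luck-adjusted first-half margin) and for
    (actual first-half margin doubled).
    """
    errors_luck = [a + l - f for a, l, f in
                   zip(halftime_actual, halftime_luck_adj, final_actual)]
    errors_doubled = [2 * a - f for a, f in zip(halftime_actual, final_actual)]
    return pstdev(errors_luck), pstdev(errors_doubled)
```

A ratio of the first value to the second near 2/3 would match the result described in the post. `pstdev` treats the games as the whole population; `statistics.stdev` would work equally well for a side-by-side comparison.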

This model represents my first real attempt to take various luck factors that I've played around with in a vacuum (two-point jumpers and three-pointers, free throw defense and so on) and apply them over an entire game. While this can do a great job of telling you how a game should have played out given the tactics employed, there is also a second logical phase to this study, which involves adjusting for lucky or unlucky game-level factors that could have influenced even the luck-adjusted outcome.

For instance, what if Harvard's Siyani Chambers randomly finds himself in foul trouble in a game and the Crimson's turnover rate balloons to 35 percent? Chambers has committed roughly two fouls per 40 minutes for his career, so it's unlikely to happen very often, but how should we address the game where it does? Currently, the luck-adjusted model would ding Harvard for the extra turnovers and the luck-adjusted game outcome would look quite poor. A strong argument could be made, though, that the game should be re-evaluated using the Crimson's normal turnover rate of just under 20 percent, as that is a more reliable and predictive indicator.

There are other potential confounding factors as well. Teams could change strategy in the second half due to the margin on the scoreboard in ways that would affect the luck-adjusted view of the world. Extended end-of-game scenarios can give the team leading on the scoreboard many high expected value possessions that, while logical given the actual score, can skew the luck-adjusted outcome.

I'll leave those types of questions to further study, but in the interim, the best way to handle them is to judge a team's true quality by the range of its luck-adjusted (and opponent-adjusted) game scores between the 25th and 75th percentiles, rather than getting too caught up in outlier performances. Here is that view for the 2014-15 Ivy campaign thus far:
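Pulling out that middle range is a one-liner with the standard library. A minimal sketch, assuming the input is a team's list of game scores that have already been luck- and opponent-adjusted; `middle_half` is a hypothetical name.

```python
from statistics import quantiles


def middle_half(game_scores):
    """Return the (25th, 75th) percentile of a team's adjusted game scores.

    Needs at least two games; with only 7-10 games the cut points will
    move around a lot, as noted in the post.
    """
    q1, _median, q3 = quantiles(game_scores, n=4)  # default 'exclusive' method
    return q1, q3
```

`quantiles` interpolates between games by default; passing `method="inclusive"` instead gives slightly narrower cut points on small samples, a choice that matters more than usual this early in a season.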



It's still early, so this metric will be bouncy (taking quartiles of 7-10 games is really dicing a limited data set too fine), but at this point, the graphic above shows what the luck-adjusted scoring system would indicate about the Ivy landscape.

Some of the results are unsurprising. The Crimson has the highest ceiling and the highest average performance. That Cornell has the second-highest ceiling, or that Princeton has been so consistently solid but unable to turn that into wins, are more interesting findings, however. And while some Ivy followers continue to spread the prophecy that Brown or Dartmouth is a dark-horse contender, the luck-adjusted model shows a clear delineation between the top five and teams six through eight.

I hope that this will encourage some interesting discussion, and I will continue to ponder updates to the model and provide the model outputs throughout the season, so that together we can all observe how it performs.
