Saturday, April 5, 2014

The Ivy League's RPI Problem

That simple formula, born in 1981, remains the most influential tool in college basketball today.

If you don't agree, just ask SMU or Utah. Each of those two teams came into Selection Sunday with a body of work that would have merited a seed in the 7-9 range, according to Vegas, which hopefully most will agree retains expert status in judging team quality.

Due to a pair of legitimately awful non-conference schedules, however, SMU became 2014's biggest snub, while Utah didn't really get serious consideration at all. Both the Mustangs and Utes would have been solid favorites over multiple at-large teams (NC State, UMass and Colorado come to mind), but the poor scheduling dragged down their RPI, pushing SMU into the 50s and Utah all the way down to the 80s.

One might feel strongly that SMU and Utah deserved to be punished for their poor schedule strength. The problem is that the RPI isn't the best arbiter of such claims. SMU's non-conference strength of schedule ("NCSOS") checked in at 298th in Pomeroy and finished around 295th in the RPI's calculation. That area of Pomeroy's NCSOS ranking was littered with AAC teams, though. Cincinnati was just four spots ahead, and Louisville slotted in just a couple more away. In the eyes of the committee, the Bearcats and Cardinals looked nothing like SMU, as Cincinnati and Louisville had RPI NCSOS rankings of 95 and 149, respectively.

All three were deserving tournament teams, at least as far as the best available measures of team quality are concerned. Two of them played the RPI game, either knowingly or unknowingly, and sat solidly within the NCAA field, while the last was left to make a run in the NIT.

The dirty secret is that for most BCS teams, gaming or not gaming the RPI is merely the difference between being judged fairly or in a more positive light. For mid-major leagues like the Ivy, failing to consider the effects of the RPI invariably leads to a squad looking much worse than it otherwise would.

For those who aren't familiar with the RPI formula, it is calculated in three (overly-)simplistic parts, and you can read more about it here.
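For readers who want those three parts spelled out, here is a minimal sketch of the standard calculation: 25 percent a team's own winning percentage, 50 percent its opponents' winning percentage and 25 percent its opponents' opponents' winning percentage. The function name and the two sample teams are hypothetical, made up purely for illustration.

```python
def rpi(wp, owp, oowp):
    """Combine the three RPI components using the standard 25/50/25 weights.

    wp   -- the team's own (site-adjusted) winning percentage
    owp  -- average winning percentage of its opponents
    oowp -- average winning percentage of its opponents' opponents
    """
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp


# Two invented teams: A wins more often, B plays opponents with better records.
team_a = rpi(wp=0.700, owp=0.450, oowp=0.500)
team_b = rpi(wp=0.600, owp=0.550, oowp=0.500)

print(f"Team A RPI: {team_a:.4f}")
print(f"Team B RPI: {team_b:.4f}")
```

In this made-up case the team with the worse record comes out ahead, because half of the formula is driven by how often its opponents won rather than by how good they actually were.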

In general, the RPI would do a decent job if schedules were assigned randomly. It's essentially performing the first few steps of what would ultimately be an iterative process to rank order the quality of teams - a process that can be taken to its logical conclusion (and with more data like margin of victory added) given the leaps and bounds that technology has taken since 1981.

Schedules aren't assigned randomly, however, and this creates a massive advantage for coaches who understand the fundamental flaw of the RPI: The opponents' strength of schedule weighting of 25 percent doesn't nearly make up for the benefit of playing teams which have win-loss records that are inflated based on their decision to schedule lightly.

If you take each conference's difference between its average NCSOS in the Pomeroy ratings and its average NCSOS in the RPI, you'd find a pretty stunning correlation between the best conferences and the best gaming of NCSOS. In fact, nine of the top 10 conferences in the Pomeroy ratings are among the top 10 best conferences at playing a much weaker schedule in reality than it appears in the RPI. Conference USA (14th in Pomeroy) supplanted the West Coast Conference (9th in Pomeroy) in the top 10 best NCSOS gamers, and that is actually quite logical. Being able to game NCSOS requires being able to dictate terms of scheduling - something which requires the resources necessary to purchase the games you need.

On average, Ivy League teams played a non-conference schedule rated 202nd in Pomeroy. That was ninth weakest in Division I, but the eight leagues it bested were among the best in college basketball: Big Ten, Big 12, ACC, American, Pac 12, SEC, Mountain West and Conference USA. It also turns out that those were eight of the ten best NCSOS gamers, so while the Ivy League's NCSOS according to the RPI was 238 on average, those conferences all finished with an NCSOS average in the Top 200, while the most egregious offender - the Big Ten - started with a Pomeroy NCSOS average 40 spots WORSE than the Ivies only to finish with an NCSOS average in RPI terms that was 90 spots BETTER than the Ivies.
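The bookkeeping behind that comparison is just one rank subtracted from the other. A quick sketch, using the Ivy averages quoted above and Big Ten figures backed out of the "40 spots worse" and "90 spots better" statements (so treat the Big Ten numbers as approximations; the variable and function names are my own):

```python
# Average NCSOS rank for each league; a lower rank means a tougher schedule.
ivy = {"pomeroy": 202, "rpi": 238}

# Not quoted directly in the text: derived from the stated offsets to the Ivies.
big_ten = {"pomeroy": ivy["pomeroy"] + 40, "rpi": ivy["rpi"] - 90}


def rpi_boost(league):
    """Spots gained in the RPI's NCSOS relative to Pomeroy's (positive = flattered)."""
    return league["pomeroy"] - league["rpi"]


for name, league in [("Ivy", ivy), ("Big Ten", big_ten)]:
    print(f"{name}: {rpi_boost(league):+d} spots")
# Ivy: -36 spots
# Big Ten: +94 spots
```

A positive number means the RPI makes the schedule look tougher than it really was; a negative number means the league is being penalized.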

So, how do the big conference schools do it? They press their advantage in two distinct ways: 1) they play most or all of their games at home, and 2) if they do play on the road, it's against an RPI-boosting team. The NCSOS in Pomeroy accounts for the true quality of the opponent and where you play the opponent, whereas the NCSOS in the RPI considers only straight winning percentage (leaving the where question to the weighting of the wins in your team's adjusted winning percentage). This means that the average home game creates a wedge between the NCSOS in Pomeroy and in the RPI, and that wedge can be driven even wider if you schedule the right teams (those with a winning percentage that is far better than their true team quality).
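To see why site matters for one number and not the other, here is a rough sketch. It assumes the standard RPI site weights (a home win counts 0.6 and a road win 1.4, with losses weighted the other way around and neutral games at 1.0). The opponent's record is invented for illustration, and the function names are hypothetical.

```python
# Standard RPI site weights for a team's own adjusted winning percentage.
WIN_WEIGHT = {"home": 0.6, "neutral": 1.0, "road": 1.4}
LOSS_WEIGHT = {"home": 1.4, "neutral": 1.0, "road": 0.6}


def raw_wp(results):
    """Straight winning percentage: what the RPI uses when you are someone's opponent."""
    return sum(1 for _, won in results if won) / len(results)


def adjusted_wp(results):
    """Site-weighted winning percentage: used only for a team's own record."""
    wins = sum(WIN_WEIGHT[site] for site, won in results if won)
    losses = sum(LOSS_WEIGHT[site] for site, won in results if not won)
    return wins / (wins + losses)


# Invented homebody: 14-2 at home, 2-2 on the road, 16-4 overall.
homebody = (
    [("home", True)] * 14
    + [("home", False)] * 2
    + [("road", True)] * 2
    + [("road", False)] * 2
)

print(f"Raw winning percentage:      {raw_wp(homebody):.3f}")
print(f"Adjusted winning percentage: {adjusted_wp(homebody):.3f}")
```

The site adjustment knocks this team's own record down, but anyone who schedules it gets credit for the unadjusted 16-4 mark in the strength of schedule piece.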

While the adjusted winning percentage helps correct for the site issue, it doesn't do enough, especially given the extra weight that SOS and quality wins are given by the NCAA selection committee. It is roughly as easy to beat a No. 50 team at home as it is to beat a No. 150 team on the road. Yet, one of those wins is considered to be an elusive Top 50 victory while the other one lands in a throwaway pile. That creates a massive problem for mid-majors, which can land road games against teams between 100 and 150, but can rarely get a Top 50 opponent to visit, while BCS teams can host multiple Top 50 teams non-conference and would only venture to the venue of a team ranked 100-150 if it happened to be a fellow power conference team having a very down year.

It's not all doom and gloom for the Ivy League and other mid-major conferences, though. While half of the Ivies had NCSOS rankings that were 60 or more spots worse in the RPI calc than in Pomeroy (including Penn's whopping 151 spots worse), one team managed to game NCSOS quite nicely: Columbia.

The Lions finished with a Pomeroy NCSOS rank of 255, third-worst in the league. But their RPI NCSOS rank of 139 was in a virtual tie for best in the Ivies with Cornell, and the 116 spot improvement was greater than the average improvement for the best NCSOS gaming conference in all of college basketball - the Big Ten. Understanding how Columbia successfully gamed the system should be a priority for all coaches at the mid-major level. Here are some simple, easily executed rules to follow:

Rule 1) Play as few Division I teams with horrible expected winning percentages as possible, regardless of location, and always try to replace those games with a non-Division I opponent that doesn't count toward the RPI.

Columbia's worst non-conference opponent was Maryland Eastern Shore, which checked in at 5-23 (.179). The Lions played no other teams below .300, and just three more below .460. That left 12 of their 16 non-conference opponents with winning percentages of near .500 or better, allowing Columbia to finish with an NCSOS above .500 and one which ranked 158th nationally. If the Lions had merely replaced their game with Maryland Eastern Shore with another non-Division I opponent, their NCSOS would have risen to roughly 100. Kill the home games against UMass Lowell and Fairleigh Dickinson and the Lions would have seen that NCSOS rise into the Top 50.

Better yet, replace those three games with games against opponents carrying expected winning percentages of .700 or better (a good BCS team or top flight mid-major), and Columbia's NCSOS would have risen to the Top 10 nationally. Sure, maybe the Lions would have finished 7-9 against Division I opponents non-conference, but maybe they'd steal one of those games. Or maybe we assume a world where they finish off Manhattan at home or St. John's at the Barclays Center. Now you take that 8-8 or 9-7 non-conference record and pile on an 11-3 Ivy run to push the overall mark to 19-11 or 20-10 with an NCSOS in the Top 10. If you don't think that profile gets a long look on Selection Sunday, then you're missing what's really going on in that committee room.
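The arithmetic behind Rule 1 is easy to sketch. The list below is a hypothetical 16-game slate, not Columbia's actual schedule: only the .179 (Maryland Eastern Shore) and .758 (Michigan State) figures come from this post, and the rest are invented to roughly match the shape described above. It also simplifies the RPI's opponent winning percentage to a plain average.

```python
# Hypothetical opponent winning percentages. Only .179 and .758 are real figures
# from the post; every other value is invented for illustration.
opponent_wp = [
    0.179, 0.310, 0.400, 0.450, 0.480, 0.500, 0.520, 0.530,
    0.550, 0.560, 0.580, 0.600, 0.620, 0.650, 0.700, 0.758,
]


def average(values):
    """Plain mean, standing in for the RPI's opponent winning percentage."""
    return sum(values) / len(values)


ranked = sorted(opponent_wp)

# As scheduled.
base = average(ranked)

# Worst opponent swapped for a non-Division I game, which drops out of the RPI.
dropped = average(ranked[1:])

# Three weakest opponents swapped for teams that win 70 percent of their games.
swapped = average([0.700] * 3 + ranked[3:])

print(f"As scheduled:          {base:.3f}")
print(f"Drop the worst:        {dropped:.3f}")
print(f"Swap the three worst:  {swapped:.3f}")
```

Turning those averages into national ranks would require every other team's number, so this only shows the direction and rough size of the move, not the specific rankings cited above.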

Rule 2) Beg, borrow or steal home games (or at least neutral ones) against Top 100 RPI teams, and if you're going to go on the road to play the big boys, play the biggest big boy you can find.

Columbia hit the scheduling jackpot during its 2013-14 season. It got a visit from RPI No. 60 Manhattan and a shot at No. 67 St. John's on a neutral floor. Those were much more winnable games than visiting either, and they count as quality Top 100 wins regardless of where they're played. That's the dirty secret of the RPI. Getting games against Top 50 and Top 100 teams is doubly beneficial. First, they boost your NCSOS. But then they get counted AGAIN in the Top 50 and Top 100 record column, which is somehow considered separately despite the credit being baked into the RPI in the first place.

So, the Lions fulfilled part one of this rule beautifully, but did an even better job with part two. Columbia wanted to schedule a name and looked to the Big Ten. It could have taken a wimpy approach and tried to pick one that it felt it could beat like a Penn State or a Northwestern. If it had, it would have cost itself nearly 50 spots in its NCSOS. Instead, the Lions went big and scheduled Michigan State and its .758 winning percentage (as part of the Coaches vs. Cancer multi-team event). The result was a nice NCSOS boost, and ultimately Columbia actually had a decent chance of taking the game as well.

Rule 3) If you can't get enough games under Rule 2, at least schedule teams that schedule weakly themselves (possibly in weak conferences that they will destroy) or those that nab victories with copious home games and thus are likely to post gaudy W-L records that aren't commensurate with their true talent.

If there's a rule that Columbia nailed most, it might be this one. Remember that for strength of schedule the RPI only cares about an opponent's winning percentage. Nothing else. Not who the opponent played (that's captured in opponent's strength of schedule, which has a much diminished effect), not where the opponent played its games, but rather just the opponent's win percent.

So, when the Lions played Pomeroy No. 174 Stony Brook at home in January, it looked like the Lions were playing a team which was nearly as good as Manhattan or Michigan St. and a team that was actually a bit better than St. John's. While that's obviously not the case in reality, it is from those games that true RPI gaming is born. Columbia's three opponents in the Portland pod of the Coaches vs. Cancer Classic all finished with roughly the same winning percentage in the eyes of the RPI calculation. The host Pilots finished at 112th in Pomeroy, while the visiting teams from Idaho and North Texas were in the mid-200s.

Rule 4) Focusing on building a gaudy non-conference record instead of scheduling wisely doesn't fool anybody, and it actually benefits all of your in-league competitors while hurting you (especially if they're gaming the RPI by focusing on SOS over wins and losses).

The biggest impediment that coaches can put in the way of the growth of their program is a focus on racking up bogus victories for optics. Scheduling the dregs of Division I in order to go 10-4 non-conference instead of 4-10 in search of "confidence building wins" won't change the ultimate outcome of the league race, but it guarantees a finish in the RPI that is lower than the true quality of the team (see: Brown 2013-14), gives the team few chances for a statement win but plenty of chances for an embarrassing loss (see: Harvard at FAU) and ensures that your league opponents will get some nice Rule 3 games in conference play (thanks Princeton!).

In 2007-08, Harvard went 8-22. In 2008-09, it went 14-14. Yet arguably the most publicized games for the Ivy League in those seasons were, respectively, the Crimson's home upset of Michigan and its road win at a ranked Boston College squad. The 14-17 games that each Ivy team controls are a precious commodity. Precious few of the predetermined league contests will capture the general public's imagination. Squandering the opportunities that a team does have to capture the national spotlight is an abdication of the duty that a coach has to the program that employs him.

So far, we've viewed these rules through the lens of a team that followed them well: 2013-14 Columbia. A fair objection might be that no matter what the Lions did with their schedule, they would have had no shot at an at-large bid to the NCAA Tournament, so likely none of this applies to Ivy League teams. Leaving aside the idea of better postseason positioning of all kinds (NIT vs. CIT or CBI; first option to host CIT semis and finals due to better RPI, etc.), the overwhelming message from fans, media and recruits is that many mid-major teams are making the wrong tradeoff between winning five extra games and having five extra opportunities to make a splash on the national stage. If that's not compelling, however, let's discuss the one that got away, and how following these simple rules could have changed #2BidIvy from an aspiration to an already banked achievement.

During the 2010-11 season, Harvard went 9-3 against the 59th toughest non-conference schedule in the nation according to Pomeroy. It played three teams seeded No. 8 or better in the NCAA field (George Mason, Michigan and UConn) and two more that were among the first left out (Boston College and Colorado), going 2-3 in those games. To its credit, the Crimson really nailed Rule 2 above. Harvard even got some of Rule 1 right by scheduling two Division III opponents, avoiding the massive SOS hit that would have come with playing a 300ish opponent instead.

It flunked the rest of the scheduling rules so badly, however, that it eroded any hope of presenting an impressive profile to the committee. Harvard played four of its 12 Division I non-conference games against teams with winning percentages at .300 or below. If it had merely replaced those teams with squads that finished at .500 for the season, it would have posted a Top 10 NCSOS. Instead, it finished with an NCSOS rank in the 130s, nearly 80 spots worse than its true schedule strength.

It also put together a slate of opponents that finished with the 27th toughest strength of schedule in the nation, meaning that the Crimson's NCSOS suffered because its opposition played brutal schedules that hurt their winning percentages. While some of that is captured as a benefit in the opponent's strength of schedule portion of the RPI formula (a reason why Harvard finished the regular season with an RPI in the 30s), it's not given the bonus look that the committee always provides in singling out the NCSOS metric.

Finally, the Crimson could have achieved a similar effect, a Top 10 NCSOS, if it had kept two of the four teams with winning percentages at or below .300 and replaced the other two with teams with winning percentages of .700 or above. The desire to feast on easy victories (and they really weren't - Harvard barely slipped past three of the four opponents) may have guided the Crimson to a 23-6 regular season record, whereas sacrificing a couple wins to efficient scheduling probably would have left Harvard at 21-8, but solidly in the NCAA field as the Ivy League's first at-large bid.


Someday soon, none of this will really matter. The antiquated and easily gamed RPI formula will be replaced with any one of (or a combination of) the myriad algorithms that provide stunningly accurate estimations of team quality. The NCAA committee will understand the relationship of luck and opportunity in basketball and that a team that gets 15 shots at Top 100 teams and wins five shouldn't be given more credit than a team that got five shots and won three. Some mid-major upsets will remain, but even more will no longer be upsets, as those teams will have been properly seeded in the first place.

This all will be the reality some day.

In the short term, however, there exists a massive market inefficiency. Louisville and Cincinnati actively or passively exploited it, while SMU didn't. Columbia posted an RPI NCSOS that was over 100 spots better than it deserved, while Penn saw its Pomeroy non-conference schedule ranking of 112 fall to 263rd in the RPI NCSOS. Finally, the Ivy League's first real chance at being a two-bid league fell just short in 2011 simply because Harvard failed to schedule effectively.

On average, Ivy League teams were saddled with an NCSOS that was 36 spots worse in the eyes of the RPI than the true schedule strength, as its RPI NCSOS as a league checked in at 32nd out of 32 conferences. The Big 12, which had an average Pomeroy NCSOS of 209, just behind the Ivy League, checked in with the top RPI NCSOS of any league in college basketball.

It's simply shocking to watch such an intelligent league consistently do something so stupid.
