NFL Elo Ratings Are Back!

Sep 06, 2015

A good deal of FiveThirtyEight’s NFL coverage last season used Elo ratings, a simple system[1] that estimates each team’s skill level using only the final scores and locations of each game. For 2015, we’re not only bringing Elo back (with a few small tweaks; more on those in a moment), but we’ve also built a continually updating Elo NFL predictions page that allows you to see the latest rankings, plus win probabilities and point spreads for the current week of NFL games.

How do our Elo ratings work? FiveThirtyEight editor-in-chief Nate Silver wrote a detailed FAQ[2] about the formula before the 2014 season, and almost all of it still applies. The only changes we made mirror the methodology we used in May, when we applied Elo to the entire history of the NBA, and they concern what to do when new (expansion) teams join a league’s closed circuit.

Originally, our Elo formula started each franchise (at its founding) with a rating of 1500, which also represented the rating of an average team. This worked in general, especially since it had been a long time since the league had expanded. But it’s not such a good assumption for handling expansion teams and analogous situations, such as mergers between different leagues. Eventually, we determined that new franchises should be given a rating of 1300,[3] and in conjunction with this change, we also regress teams toward a mean of 1505 (instead of 1500) after every season.[4] This helps balance against the low ratings assigned to expansion teams, though it does mean the average team no longer carries a 1500 Elo rating.[5]

Aside from those slight adjustments, Elo still works exactly the same way it did last season: Teams gain and lose ground based on the final score of each game and how unexpected the result was in the eyes of the pregame ratings. Under Elo, teams pick up where they left off: The initial team rankings for 2015 are by definition the same as last season’s end-of-year rankings,[6] only more compressed because of the regression toward the mean.

Going into Week 1, that means the Seattle Seahawks and New England Patriots are once again the NFL’s highest-rated teams, albeit with lower Elo ratings than when they faced off last season in one of the strongest championship matchups in NFL history. Why? Like other well-designed predictive rating systems, including ESPN’s new Football Power Index, Elo is appropriately cautious early in the season; a team needs to prove itself to warrant a very high or very low rating.[7] Combine that with the luck inherent in the NFL (the best teams don’t always win), and even Elo’s top-rated teams, the Seahawks and the Patriots, have just a 15 percent and a 14 percent chance of winning the Super Bowl, respectively.

Just like last season, we’ll be writing a weekly column using Elo as a jumping-off point to discuss the week’s games. And in between, you can find ratings and predictions on our interactive page.

Here’s to another great NFL season!


Here’s the text of Nate’s original article, in case it gets erased by ABC News:

Introducing NFL Elo Ratings

If you followed FiveThirtyEight’s coverage during the World Cup, you know that we’re big fans of the World Football Elo Ratings. They’re based on a relatively simple system developed by the physicist Arpad Elo to rate chess players. But they can be adapted fairly easily for other head-to-head competitions from baseball to backgammon.

We thought we’d have a little fun and extend them to American football. In an accompanying post, you’ll find our initial Elo ratings for all 32 NFL teams (at this point, the ratings are based on a team’s standing at the end of last season, discounted slightly to reflect reversion to the mean). We’ve also developed a simulator program that plays out the NFL schedule thousands of times and projects a team’s likelihood of making the playoffs, based on a team’s record up to that point in time, its Elo rating, its remaining schedule and the NFL’s various tiebreaker rules. We plan to update these projections at the end of every week.

But first (inspired somewhat by The New York Times’s personification of its election model, Leo), we thought we’d “interview” the Elo system about how it does its work.

FiveThirtyEight: What are some of your best qualities?

Elo: I’m simple, transparent and easy to work with. I can do a lot with a little, such as calculating point spreads and the probability of either team winning a game.

Can I use you to beat Vegas?

I wouldn’t try that. Vegas lines account for a much wider array of information than I do. When Nate backtested me, he found that I got 51 percent of games right against the point spread. That’s not nearly enough to cover the house’s cut, much less to make a living.

We noticed that you have the Seattle Seahawks favored by 10 points in their Thursday-night game against the Green Bay Packers, while Vegas has the Seahawks as six-point favorites instead.

That’s a perfect example. Has anything strange been going on with the Packers?

Well, their star quarterback, Aaron Rodgers, was injured. Now he’s back!

If this Mr. Rodgers fellow is as good as you say he is, that could account for the difference. I don’t know anything about him. I only keep track of the final scores, the dates of games and where the games were played.

So what good are you?

Think of me as a benchmark. I do a pretty good job of accounting for the basic stuff — wins and losses, margin of victory, strength of schedule. I also retain a memory from past seasons, so I know that the Jacksonville Jaguars aren’t as likely to win the Super Bowl as the Denver Broncos. Can we get to some more technical questions?

Um … what are your parameters?

That’s more like it. Take K, for instance; K is my favorite parameter.

What makes K so special?

K tells me how much to update my ratings after each game. In a sport like baseball, where there are lots of games, any one additional game doesn’t tell you all that much, so K takes on a low value. In the NFL, it’s much higher. Specifically, it’s the number 20. That may not mean anything to you, but if you set K a lot higher than that, I’d be a nervous wreck and bounce around too much from game to game. And if you made K much lower, I’d be hopelessly sluggish and too slow to notice changes in the quality of a team’s play.

I noticed the Detroit Lions have an Elo rating of 1467. What does that mean?

An average team has an Elo rating of 1500 — so your Lions are not so hot. But it could be a lot worse. In 2009, the Lions got all the way down to a rating of 1223. Most NFL teams wind up in the range of 1300 to 1700.

We’re still not quite sure how your ratings work. If you have one team at a 1650 and another at 1400, what does that mean?

If it makes things easier, you can translate my ratings into a point spread. Take the difference in my ratings and divide by 25. It’s that simple.

So, if one team is rated 250 Elo points higher than the other, that works out to a spread of 10 football points.

Precisely.
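For readers who want to try the conversion themselves, here is a minimal Python sketch of the divide-by-25 rule. The function name `elo_point_spread` is a hypothetical label for illustration.

```python
def elo_point_spread(elo_a: float, elo_b: float) -> float:
    """Convert an Elo gap into a point spread using the divide-by-25 rule.

    A positive result means team A is favored by that many points;
    a negative result means team B is favored.
    """
    return (elo_a - elo_b) / 25


# The example from the interview: a 250-point Elo gap is a 10-point spread.
print(elo_point_spread(1650, 1400))  # 10.0
```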

What about home-field advantage?

I can account for that, too. Historically, it’s been worth about 65 Elo rating points, or 2.6 NFL points. Just add that to the home team’s side of the point spread.
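A sketch of the home-field adjustment, assuming the 65-point bonus is simply added to the home team’s rating before the divide-by-25 conversion (which is the same as adding 2.6 points to the home side of the spread). The function name and the `neutral_site` flag, for games where neither team is at home, are hypothetical conveniences.

```python
HOME_FIELD_ELO = 65  # historical home-field edge in Elo points, per the interview


def home_point_spread(elo_home: float, elo_away: float, neutral_site: bool = False) -> float:
    """Expected margin for the listed home team, in points (positive = home favored)."""
    bonus = 0 if neutral_site else HOME_FIELD_ELO
    return (elo_home + bonus - elo_away) / 25


print(home_point_spread(1500, 1500))  # 2.6: evenly rated teams, home edge only
print(home_point_spread(1500, 1500, neutral_site=True))  # 0.0
```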

What if you want to calculate a team’s probability of winning?

That’s pretty easy, too, although you’ll need a formula for it. In a game between Team A and Team B, Team A’s win probability is equal to:

Pr(A) = 1 / (10^(-ELODIFF/400) + 1)

Where ELODIFF is Team A’s Elo rating minus Team B’s Elo rating.
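The win-probability formula translates directly into Python; this is a minimal sketch with a hypothetical function name.

```python
def win_probability(elo_a: float, elo_b: float) -> float:
    """Pr(A) = 1 / (10^(-ELODIFF/400) + 1), with ELODIFF = elo_a - elo_b."""
    elo_diff = elo_a - elo_b
    return 1 / (10 ** (-elo_diff / 400) + 1)


print(win_probability(1500, 1500))  # 0.5
print(round(win_probability(1650, 1400), 3))  # 0.808
```

One natural extension, which is our assumption rather than part of the formula as given, is to add the 65-point home-field bonus to the home team’s rating before calling the function.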

Let’s say Team A wins. Its Elo rating will improve?

Yes. One of my more appealing properties is that a team’s Elo rating will always improve after it wins and always decline after it loses. How much it improves will depend on how much of a favorite or an underdog it was.

So, like after the 2008 Super Bowl …

I can predict where you’re going with that question. I’ll admit that I didn’t have the New York Giants rated so highly compared to the New England Patriots. But the Giants’ Elo rating improved a lot after they won that game, more than the Patriots’ would have if they’d won instead. I may have my flaws, but unlike a lot of you human beings, I know how to fix them. The lower a team is rated, the easier it is for that team to gain ground by proving me wrong.
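The self-correcting behavior can be illustrated with the basic Elo update, ignoring margin of victory for now. This sketch assumes the standard Elo form, in which a team’s rating moves by K times the difference between the actual result (1 for a win, 0 for a loss) and its pregame win probability. The 1650 and 1550 ratings are made-up numbers, not the actual Giants and Patriots ratings, and the function names are hypothetical.

```python
K = 20  # NFL K value from the interview


def win_probability(elo_a: float, elo_b: float) -> float:
    return 1 / (10 ** (-(elo_a - elo_b) / 400) + 1)


def rating_change(elo_team: float, elo_opp: float, won: bool, k: float = K) -> float:
    """Change in elo_team's rating after one game (margin of victory ignored)."""
    expected = win_probability(elo_team, elo_opp)
    actual = 1.0 if won else 0.0
    return k * (actual - expected)


# Hypothetical ratings: a 1650 favorite against a 1550 underdog.
print(round(rating_change(1650, 1550, won=True), 1))  # favorite wins: 7.2
print(round(rating_change(1550, 1650, won=True), 1))  # underdog wins: 12.8
```

Because the pregame probability is always between 0 and 1, the change is always positive after a win and negative after a loss, and in this form the loser gives up exactly what the winner gains.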

Do you also account for margin of victory?

Affirmative. I took some inspiration from the soccer ratings, which account for goal differential in addition to the game result. But this is one of the more complicated parts.

For the NFL, I start by adding one point to a team’s margin of victory and then take its natural logarithm. Then I multiply that result by the K value. That means I’m more moved by big wins than narrow ones, although there are diminishing returns. I’m not so impressed by the fifth touchdown when a team is ahead 28-0.
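A small sketch of that weight, ln(MOV + 1) times K, showing the diminishing returns. The function name is hypothetical, and this is only the uncorrected form of the margin-of-victory term; the interview goes on to adjust it for autocorrelation.

```python
import math

K = 20  # NFL K value from the interview


def mov_weight(margin: int, k: float = K) -> float:
    """ln(margin + 1) * K: grows with the margin, but more and more slowly."""
    return math.log(abs(margin) + 1) * k


# Extra weight from a first touchdown versus a fifth one (28-0 to 35-0).
first_td = mov_weight(7) - mov_weight(0)
fifth_td = mov_weight(35) - mov_weight(28)
print(round(first_td, 1), round(fifth_td, 1))  # 41.6 4.3
```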

That seems simple enough.

It would be, but that isn’t all there is to it. We haven’t talked about my autocorrelation problem. It’s a little embarrassing.

Go on. “Autocorrelation”? Was that the weird David Cronenberg movie?

Autocorrelation is the tendency of a time series to be correlated with its past and future values. Let me put this into football terms. Imagine I have the Dallas Cowboys rated at 1550 before a game against the Philadelphia Eagles. Their rating will go up if they win and go down if they lose. But it should be 1550 after the game, on average. That’s important, because it means that I’ve accounted for all the information you’ve given me efficiently. If I expected the Cowboys’ rating to rise to 1575 on average after the game, I should have rated them more highly to begin with.

It’s true that if I have the Cowboys favored against the Eagles, they should win more often than they lose. But the way I was originally designed, I can compensate by subtracting more points for a loss than I give them for a win. Everything balances out rather elegantly.
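The balance can be checked with one line of algebra, assuming the standard Elo update in which a team gains K times (actual result minus pregame win probability p). Averaging over a win, which happens with probability p, and a loss, which happens with probability 1 - p:

```latex
E[\Delta] = p \cdot K(1 - p) + (1 - p) \cdot K(0 - p) = Kp(1 - p) - Kp(1 - p) = 0
```

If the win term is instead scaled by a margin-of-victory factor that is larger, on average, when the favorite wins than when it loses, the two terms no longer cancel, and the favorite’s expected change drifts above zero.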

The problem comes when I also seek to account for margin of victory. Not only do favorites win more often, but when they do win, they tend to win by a larger margin. Since I give more credit for larger wins, this means that their ratings tend to get inflated over time.

Is this also a flaw with the soccer Elo ratings?

Possibly. You may want to reconsider what you wrote about Germany.

So, how do you correct for this?

It isn’t complicated in principle. You just have to discount the margin of victory more when favorites win and increase it when underdogs win. The formula for it is as follows:

Margin of Victory Multiplier = LN(ABS(PD)+1) * (2.2/((ELOW-ELOL)*.001+2.2))

Where PD is the point differential in the game, ELOW is the winning team’s Elo Rating before the game, and ELOL is the losing team’s Elo Rating before the game.
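A Python transcription of the multiplier, plugged into a rating update. The multiplier itself follows the formula as given; how it combines with K and the pregame win probability (shift = K times multiplier times (1 - p), where p is the winner’s pregame win probability) is our assumption based on the standard Elo form. The function names are hypothetical, and the sketch does not handle ties or home-field advantage.

```python
import math

K = 20  # NFL K value from the interview


def win_probability(elo_a: float, elo_b: float) -> float:
    return 1 / (10 ** (-(elo_a - elo_b) / 400) + 1)


def mov_multiplier(point_diff: int, elo_w: float, elo_l: float) -> float:
    """LN(ABS(PD)+1) * (2.2/((ELOW-ELOL)*.001+2.2)), transcribed from the text."""
    return math.log(abs(point_diff) + 1) * (2.2 / ((elo_w - elo_l) * 0.001 + 2.2))


def update_after_game(elo_w: float, elo_l: float, point_diff: int, k: float = K):
    """Return (new winner rating, new loser rating) for a decided game.

    Assumed form: shift = K * multiplier * (1 - winner's pregame win probability).
    """
    p_win = win_probability(elo_w, elo_l)
    shift = k * mov_multiplier(point_diff, elo_w, elo_l) * (1 - p_win)
    return elo_w + shift, elo_l - shift


# The same 14-point margin counts for less when a 200-point favorite wins
# than when a 200-point underdog wins.
print(round(mov_multiplier(14, 1600, 1400), 3))  # favorite won: 2.482
print(round(mov_multiplier(14, 1400, 1600), 3))  # underdog won: 2.979
```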

It’s a little ugly, but we all have our vices.

I see that you have ratings for this year’s teams, but they haven’t played any games yet! How does that work?

I take their rating from the end of last season and discount it slightly. Specifically, I revert it to the mean by one-third. Remember that the mean Elo rating is 1500. So, if a team finished last season with a rating of 1800, I’ll revert it to 1700 when the new season begins. This whole notion of “season” is strange to me, by the way. We don’t have them in chess.
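The preseason reset is a one-line calculation. This is a minimal sketch with a hypothetical function name; the mean and the one-third share are parameters so the same function covers both versions of the method.

```python
def preseason_rating(final_elo: float, mean: float = 1500.0, share: float = 1 / 3) -> float:
    """Revert last season's final rating toward the mean by `share` (one-third by default)."""
    return final_elo - share * (final_elo - mean)


print(round(preseason_rating(1800), 1))  # 1700.0, the example from the interview
print(round(preseason_rating(1200), 1))  # 1300.0: weak teams move up toward the mean
```

The 2015 version at the top of this page keeps the one-third share but regresses toward 1505, which corresponds to `preseason_rating(final_elo, mean=1505)`; brand-new franchises start at 1300 instead of going through this reset.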

For now, the ratings are all about which teams were good last year?

Technically speaking, a game affects my ratings forever once it’s played, just with a smaller and smaller weight that gradually diminishes to almost nothing over time. But, yes, for the time being, my ratings are mostly about who was good last season. Games toward the end of the season will count more, especially games during last year’s playoffs.

Thanks for taking the time! So, you’re saying we should take the Seahawks?

How about a nice game of chess?

See the footnote above.

[3] Which is effectively the Elo level at which the NFL’s expansion teams have played since the 1970 AFL-NFL merger.

[4] More specifically, we regress each team’s rating to the mean by one-third.

[5] Because the NFL hasn’t added an expansion franchise since 2002, the average at the end of the 2014 regular season was 1,504.9.

[6] Had our tweaked version of Elo been in place at the end of last season.

[7] Incidentally, some of you may be wondering why we used FPI in our season previews instead of Elo, and the answer is simple: This year’s preseason Elo ratings are just a regressed-to-the-mean version of last year’s final Elo ratings. The FPI, by contrast, is grounded in not only last year’s ratings but also Vegas over/under win totals and a poll of NFL experts.
