We Don’t Know Which NFL Teams Have Hard (Or Easy) Schedules
Ranking off of last season's winning percentages assumes a lot of things that will probably end up being wrong.

It’s been a few weeks since the NFL released its full 2024 schedule, so I thought I would revisit one of my biggest pet peeves about that process: the way we talk about next season’s strength of schedule for each team.
Aside from highlighting the juicy matchups sure to bring in the most viewers, the first thing folks tend to do after the schedule release is to rank who has the easiest or most difficult slates lined up next season. And the way they do that is by calculating the combined 2023 records of a team’s opponents. By this accounting, the Cleveland Browns (with an average opponent’s winning percentage of .547) will play the toughest schedule of any team — because of course they will — while the Atlanta Falcons and New Orleans Saints (at .453 apiece) will play the easiest schedules.
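For readers who want to see the arithmetic, here is a minimal sketch of that calculation. The function name, the team labels and the records are all made up for illustration; they are not real 2023 results, and the exact tie handling used in published rankings is an assumption here.

```python
def opponent_win_pct(schedule, prior_records):
    """Combined prior-season winning percentage of a team's opponents.

    schedule: list of opponent names, one entry per game
              (a division rival played twice appears twice).
    prior_records: dict mapping name -> (wins, losses, ties) from last season.
    """
    wins = losses = ties = 0
    for opponent in schedule:
        w, l, t = prior_records[opponent]
        wins += w
        losses += l
        ties += t
    games = wins + losses + ties
    # Assumption: a tie counts as half a win, as in NFL standings.
    return (wins + 0.5 * ties) / games


# Hypothetical teams and records, for illustration only.
prior_records = {"A": (12, 5, 0), "B": (8, 9, 0), "C": (4, 13, 0)}
schedule = ["A", "A", "B", "C"]  # "A" appears twice, like a division rival

sos = opponent_win_pct(schedule, prior_records)
print(round(sos, 3))
```

Note that every input is a record from a season that has already ended, which is the weakness the rest of this piece is about.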
That seems like a relatively straightforward way of measuring schedule strength, until you consider that we mostly don’t know how good or bad any team will be next season — and that using last year’s record isn’t even the best way to make a prediction about team quality, limited as our general ability to do that might be.
How wrong do these schedule-strength predictions based on last year’s winning percentage end up being? Going back to the NFL’s last expansion/realignment in 2002, let’s take a look at a scatter plot of each team’s preseason SOS ranking (based on opponents’ records from the previous season) and their actual SOS ranking (based on opponents’ actual records from that season):
Needless to say, the relationship there is slim. (It’s a Spearman rank correlation coefficient of 0.20, for anyone who cares.) For instance, a team projected for the No. 1 hardest schedule in preseason using this method ended up among the easiest half of NFL schedules one-third of the time. And teams projected for the 31st- or 32nd-hardest schedules[1] ended up among the hardest half of schedules nearly 40 percent of the time.
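For anyone who does care, a Spearman coefficient is just a Pearson correlation computed on ranks rather than raw values. The sketch below is a plain-Python version that gives tied values their average rank; the helper names and the five-team sample numbers are hypothetical, not the data behind the 0.20 figure.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up preseason and actual opponent win percentages for five teams.
preseason = [0.55, 0.52, 0.50, 0.48, 0.45]
actual = [0.51, 0.53, 0.47, 0.50, 0.49]
rho = spearman(preseason, actual)
print(round(rho, 2))
```

A value of 1.0 would mean the preseason ordering matched the actual ordering exactly, and 0.0 would mean the two orderings were unrelated.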
The point is, these official preseason schedule strength rankings based on last year’s records tell us next to nothing about who will actually play a hard or easy schedule.
So what’s a good solution? We can improve the rank correlation to 0.30 by swapping out last year’s winning percentage for a multi-year, regressed-to-the-mean projection[2] of a team’s Simple Rating System (SRS), although as you can see when I list those results below, it still gets us to a similar place at the top and bottom of the rankings.
At the margins, the Rams may have a tougher schedule than the basic method would have you believe, and the Jaguars might have it easier than it seems. (That’s cool… I guess?) SRS is better to use than winning percentage because a team’s SRS from the previous season has a higher correlation with its winning percentage this season than its previous winning percentage does. We could also make adjustments for home-field advantage, and rather than using average opponent winning percentage as our target measure of in-season schedule strength, we could use Pro-Football-Reference’s SOS metric instead.
But the bigger takeaway from all of this is that nobody really knows how difficult a team’s schedule is going to be next season. (And that’s because nobody really knows how good any given team is going to be next season.) Knowing that, maybe the only solution is to not talk about or rank future schedule strengths at all.
[1] Because of ties, there wasn’t always a solo 32nd-toughest schedule each year.
[2] With the following weights given to each category: Last year’s SRS (38.8%); SRS from 2 years ago (14.8%); SRS from 3 years ago (1.2%); regression to a 0.0 SRS (45.1%).
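As a rough sketch, the SRS weights in the footnote above translate into a simple weighted sum. The weights come from the footnote (they total 99.9%, presumably because of rounding); the function names and the sample SRS values are hypothetical, and averaging opponents' projections into a schedule rating is an assumption about how the pieces fit together.

```python
# Weights from the footnote: last year, 2 years ago, 3 years ago.
SEASON_WEIGHTS = (0.388, 0.148, 0.012)
# Weight on the league-average SRS of 0.0 (regression to the mean).
REGRESSION_WEIGHT = 0.451
LEAGUE_AVERAGE_SRS = 0.0


def projected_srs(srs_last, srs_two_ago, srs_three_ago):
    """Blend three seasons of SRS and regress toward a 0.0 SRS."""
    seasons = (srs_last, srs_two_ago, srs_three_ago)
    blended = sum(w * s for w, s in zip(SEASON_WEIGHTS, seasons))
    return blended + REGRESSION_WEIGHT * LEAGUE_AVERAGE_SRS


def projected_schedule_strength(opponent_projections):
    """Assumption: schedule strength is the mean projected SRS of opponents."""
    return sum(opponent_projections) / len(opponent_projections)


# Hypothetical team: +5.0 SRS last year, +2.0 two years ago, -1.0 three years ago.
projection = projected_srs(5.0, 2.0, -1.0)
print(round(projection, 3))
```

Because the regression term pulls toward 0.0, a team that posted a +5.0 SRS last year projects to less than half of that, which is one way of encoding how little we know about next season.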
I so needed this. As my kids will tell you, I constantly rant about how many positively dumb things are associated with the NFL. So much so that I now just refer to the League as the "Idiot Magnet" - my kids immediately know what I'm talking about.
The list of mind-numbing irritants is seemingly endless and grows each year. These include continually dumb play calling by geared-up Coordinators with headsets and enormous and growing plastic play sheets that somehow instruct them to continue to run on 3rd and 11 year after year for 1 yard, commentators talking about EPA without knowing if it's a stat or a governmental agency, an ever-expanding lexicon of ridiculous terminology and jargon to try and sound smart (position "rooms" instead of groups e.g. "the Jets quarterback room is special this year"), numerical personnel packages e.g. "the Rams use 1-2 personnel packages more than anyone", those super complicated defensive coverages e.g. "Tampa 2" becoming "2-Deep Coverage" becoming "2-High Coverage" becoming "Cover 2" for the same formation, secretive playbook mythologies that run rampant in the month of June (even though Baker Mayfield apparently learned the entire Rams' super complicated playbook of genius Sean McVay in less than 48 hours after being signed as a FA to start), OTAs where every rookie looks "amazing" and "the real deal" throwing in shorts to uncovered receivers 5 yards downfield, later ridiculous training camps where (if you watch them live on NFL Network) 95% of the guys are just standing around and drinking sponsored sports drinks, useless preseason games and the Real Househusbands of the NFL known as Hard Knocks in August that has - literally - the same script each year (some rookie struggling to make the cut, isolation shots of sprinklers coming on against a deep blue sky, the supposed insane veteran linebacker yelling at everyone because he thinks that is "leadership," the overly affected zany quarterback preening for the cameras who can't decide if he wants to be in the NFL or onstage at Chuckles Comedy Club, the "Turk" giving the cut news, and so on).
The Magnet is so strong.
This year the Magnet added its scheduling mysteries to the mythos. They had a guy making the media rounds talking about how "complex" the schedule is and how they "use a ton of computers to get it done." The show's hosts were agog at the opportunity to see a bit of the Wizard behind the curtain and went full Mike Wallace on the guy by pressing him on whether the NFL had a Taylor Swift subroutine to somehow miraculously get this all done. Somebody seriously suggested that going to 18 games would be "mind blowing" and would require even more computers...I'm not kidding.
As I drove listening to this drivel, I wondered how MLB does it with 162 games 7 days a week instead of 17 over 4 days a week without all of the hoopla. Gosh, they must have an entire secret underground city of AI driven computer people - like the telepathic descendants of the post-nuclear survivors in Beneath the Planet of the Apes - but I digress.
The "Scheduling Guy" appearances led - of course - to an explosion of deep thoughts by these shows on strength of schedule because dumb invariably leads to dumber. Yet another example of people believing that crude ensemble averages of past performances, loaded with attributes inapplicable to the present use case, are somehow predictive instead of descriptive.
Thank you for making the "reader room" smarter again.
But I do wonder at times whether we can salvage some insight from all the madness. Can looking at the NFL's annual application of leeches, bleeding out of blood humors or touching relics to cure St. Vitus's Dance, somehow offer a way forward out of the madness? Can we dig a little deeper and perhaps - perhaps - avoid the inevitable pull of the Idiot Magnet to achieve something useful from all their dumb self-serving ignorance?
So, I do think about things such as....
Do so-called "rest edges," where a team has a day or more of additional rest, matter?
Would overlaying Approximate Value trends for the upcoming roster (total value and increasing or decreasing directionally) suggest potential better outcomes?
Does the timing of the bye week matter? Is early, middle or later more predictive of success?
Can using Vegas team record odds for the upcoming season as a wisdom of crowds exercise produce a more accurate estimate of SoS?
Why do MLB's various SoS algorithms seem so much more accurate? Larger sample size?
Just some ramblings, but for now it's June and I'm exhausted from resisting the pull of the Magnet. Thanks again for the interesting insights.
Given the transfer portal and recruiting, I feel like the correlation may be even less