TWISTR IHSAA SPORTS RATINGS

November 26, 2023

Blog: Some thoughts on small samples and connectivity (introducing football ratings to TWISTR)

On Thanksgiving Day, helped by a little bit of rare free time, I clicked "publish" on IHSAA football ratings for the very first time. Given the next two days marked the end of the 2023 IHSAA football season, I concede it's kind of a useless time to publish the initial ratings. The delay is for a good reason, though — it's really hard to model football, especially at the high school level.

When we build computer ratings for high school basketball or other sports, we're banking primarily on two factors coming through for us:

  1. Large sample sizes to reliably identify how strong each individual team is
  2. Connectivity between teams to reliably compare groups of teams that don't play each other

Football … really provides us with neither of these.

1. It's fundamentally a small sample sport

Let's think about basketball for a second. If you follow a basketball team over the course of the season, you'll probably see some games where they shoot the lights out, some where they struggle from the field, and others where they're somewhere in the middle — and over the course of the season, you'll get a good reference for what a "typical" performance looks like.

When you break basketball down, though, you're not necessarily looking at 25 games of action, but at maybe 1,500 offensive and 1,500 defensive possessions over the course of a season, each of which is essentially an individual play. For every "successful" play, a team scores points on offense or holds its opponent scoreless on defense; the opposite is true for "unsuccessful" plays. Because the final score directly reflects the gap between a team's "successful" and "unsuccessful" plays, we can gather a lot of information over the course of the season and produce fairly reliable ratings based on the outcomes of roughly 3,000 plays.

Football, at least with the information available to us at the high school level, doesn't give us that same opportunity. Like in basketball, we see the end result of possessions on the scoreboard, but each team might only have the ball 10 or 12 times in a game (compared to maybe 50 or 60 in basketball).[1] When you couple that with football's much smaller 9-game regular season, a team may only have 100 individual offensive and 100 defensive possessions over the course of a season — a much less reliable sample of information compared to basketball.
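To put rough numbers on that gap, here is a minimal sketch that treats every possession as an independent success-or-failure trial (a big simplification, assumed only for this illustration) and computes the standard error of a team's observed success rate. The possession counts are the estimates above; the helper name `success_rate_se` and the 50% success probability are illustrative choices, not anything from the TWISTR model.

```python
import math


def success_rate_se(p: float, n: int) -> float:
    """Standard error of an observed success rate over n possessions.

    Assumes each possession is an independent trial with true success
    probability p. Real possessions are not independent; this is only a
    back-of-the-envelope illustration.
    """
    return math.sqrt(p * (1 - p) / n)


# Per-side season totals estimated in the text.
seasons = {"basketball": 1500, "football": 100}

for sport, n in seasons.items():
    # p = 0.5 is an illustrative choice (it gives the largest error).
    print(f"{sport}: n={n}, standard error = {success_rate_se(0.5, n):.3f}")
# basketball: n=1500, standard error = 0.013
# football: n=100, standard error = 0.050
```

Because the error shrinks with the square root of the sample size, 15 times fewer possessions means roughly 3.9 times more noise (sqrt(15) is about 3.87), not 15 times more. That is still a big difference when the gaps between teams are small.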

The fundamental difference is that in football, many plays make up one possession, and in any given possession, some plays might be "successful" even if the possession is "unsuccessful". Consider a drive where a team starts at its own 20 and picks up 15, 10, 30 and 20 yards on its first four plays, driving to the opponent's 5-yard line. On first-and-goal, the quarterback and running back botch a handoff, and the defense recovers the fumble. If you're betting on whether or not that offense will score the next time it has the ball, are you more likely to say yes because they moved the ball down the field easily on four plays, or no because of the one bad play?

Our model doesn't get all that info, though, because all it knows is that the offense "failed" and the defense "succeeded", despite … doing nothing other than falling on a loose ball that bounced their way. And because football has a short season with a small number of possessions per game, a few unusually good or bad plays might never be balanced out over the course of the season.
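To make the drive above concrete, this sketch scores each play with one commonly cited success-rate convention: a play succeeds if it gains at least 50% of the yards to go on first down, 70% on second down, or 100% on third or fourth down, and a turnover always fails. Those thresholds, the `Play` type, and the field positions in the comments are assumptions for illustration; the point is only the contrast between what play-by-play data would show and what a score-only model sees.

```python
from dataclasses import dataclass


@dataclass
class Play:
    down: int
    yards_to_go: int
    gain: int
    turnover: bool = False


# Assumed thresholds: share of the yards to go a play must gain to count as a success.
THRESHOLDS = {1: 0.5, 2: 0.7, 3: 1.0, 4: 1.0}


def is_successful(play: Play) -> bool:
    """Apply the assumed success-rate convention to a single play."""
    if play.turnover:
        return False
    return play.gain >= THRESHOLDS[play.down] * play.yards_to_go


drive = [
    Play(1, 10, 15),               # own 20 -> own 35
    Play(1, 10, 10),               # own 35 -> own 45
    Play(1, 10, 30),               # own 45 -> opponent's 25
    Play(1, 10, 20),               # opponent's 25 -> opponent's 5
    Play(1, 5, 0, turnover=True),  # first-and-goal: botched handoff, fumble lost
]

play_success_rate = sum(is_successful(p) for p in drive) / len(drive)
possession_points = 0  # all a score-only model sees: the drive produced nothing

print(f"play-level success rate: {play_success_rate:.0%}")  # 80%
print(f"possession points: {possession_points}")            # 0
```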

Let's finish with a note: football doesn't have to be a small sample size sport. Advanced models like Brian Fremeau's FEI ratings or Bill Connelly's SP+ ratings (posted each week in a new location at ESPN+) use play-by-play data to look beyond the box score and rate teams on every individual play, not just the final outcome of each possession. But much like with volleyball, where we don't have reliable visibility into point-by-point scores, we don't get that luxury at the high school level, which makes it harder to assess the true calibre of each individual team.

2. Teams will never be well connected

That said, the small sample issue doesn't keep us from generating ratings that are still pretty decent. Much like the TWISTR ratings for volleyball, our football ratings are directionally strong and still do a good job of rating teams, even if they can't pick up on all the nuance hidden in the box score. Having some information is still better than having no information.

No information, though, describes the fundamental issue with rating football: teams aren't well connected, and they never will be.

Let's use Jasper, a member of the 10-team SIAC since 2020, as an example. This year, Jasper's boys basketball team will play 9 conference games and 13 non-conference games, all against IHSAA schools rated by TWISTR. If I want to compare Jasper to, say, Huntington North, I can do that pretty easily: Jasper plays New Albany, who plays Kokomo, who plays Huntington North. Because the TWISTR ratings are really a whole network of many transitive comparisons between teams, by the end of the year we can do a pretty good job of assessing the relative strength of these two teams.
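The chain idea can be illustrated with a breadth-first search over a schedule, treating each game as an undirected link between two teams. This is only a sketch of whether a comparison chain exists, not how TWISTR actually computes ratings; `find_chain` is a hypothetical helper, and the schedule below contains just the three games named above.

```python
from collections import deque


def find_chain(games, start, goal):
    """Return a shortest list of teams linking start to goal through games played, or None."""
    # Build an undirected adjacency list: each game links both opponents.
    graph = {}
    for a, b in games:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    previous = {start: None}  # also serves as the "visited" set
    queue = deque([start])
    while queue:
        team = queue.popleft()
        if team == goal:
            # Walk back through the predecessors to rebuild the chain.
            chain = []
            while team is not None:
                chain.append(team)
                team = previous[team]
            return chain[::-1]
        for opponent in sorted(graph.get(team, ())):
            if opponent not in previous:
                previous[opponent] = team
                queue.append(opponent)
    return None


# Only the three games named in the text; a real schedule would have hundreds.
games = [
    ("Jasper", "New Albany"),
    ("New Albany", "Kokomo"),
    ("Kokomo", "Huntington North"),
]
print(find_chain(games, "Jasper", "Huntington North"))
# ['Jasper', 'New Albany', 'Kokomo', 'Huntington North']
```

A real ratings model uses every available chain at once rather than the single shortest one, but if this search returns `None`, no rating system has any direct evidence to compare the two teams.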

In football, however, the SIAC plays a 9-game conference schedule, a full round robin that fills the entire 9-game regular season, which means its teams play no non-conference games before the state tournament. That's a problem for TWISTR or any other rating system: how do you compare two teams when there's no transitive property chain available to you?
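A rating system can only compare teams that sit in the same connected piece of the schedule graph. The sketch below uses union-find to group teams into connected components; the `SIAC-1` through `SIAC-10` and `Team A` through `Team C` labels are hypothetical placeholders, not real schedules. A league that plays a full round robin and nothing else ends up as its own island.

```python
from itertools import combinations


def components(teams, games):
    """Group teams into connected components of the schedule graph using union-find."""
    parent = {t: t for t in teams}

    def find(t):
        # Walk up to the root, halving the path as we go.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in games:
        parent[find(a)] = find(b)  # merge the two teams' groups

    groups = {}
    for t in teams:
        groups.setdefault(find(t), set()).add(t)
    return list(groups.values())


# Hypothetical 10-team league playing a full round robin (9 games per team)...
siac = [f"SIAC-{i}" for i in range(1, 11)]
siac_games = list(combinations(siac, 2))
# ...and a few outside teams whose schedules never touch that league.
others = ["Team A", "Team B", "Team C"]
other_games = [("Team A", "Team B"), ("Team B", "Team C")]

groups = components(siac + others, siac_games + other_games)
print(len(groups))  # 2: no comparison chain crosses between the groups
```

Adding a single game between the two groups, such as a hypothetical tournament matchup `("SIAC-1", "Team A")`, merges them into one component, which is why tournament games are the first point at which cross-conference comparisons become possible.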

The SIAC is the only conference playing a 9-game conference schedule today, but others have done so in the past: the Summit split into divisions for the first time this year after playing 9-game conference schedules from 1997 through 2009 and again from 2015 through 2022, and the Hoosier Crossroads did the same from 2006 through 2013. It creates a fundamental challenge, and one that's important to solve, because the SIAC is a league with state title contenders most years (half the conference has been to a state title game this century, with four schools winning seven titles between them).

We're able to start building these transitive property chains once SIAC schools start playing non-conference opponents in the state tournament, but (1) it doesn't help us with pre-tournament projections and (2) the whole conference still played only 12 non-conference opponents in the state tournament this year — fewer opponents than one school will play in a whole basketball regular season.

Going forward, I anticipate publishing TWISTR ratings for football sooner by developing "priors" that inform how strong we expect any individual school to be heading into the season, based on historical data. It'll still be a bit of an inexact science until we get into the tournament, but it's the best we'll be able to do.
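One simple way to use a prior (not necessarily the approach TWISTR will take) is to treat the preseason expectation as a handful of "phantom" games and blend it with the observed rating, so the prior fades as real games accumulate. The `blended_rating` name, the weight of 4 phantom games, and the +10/+20 ratings (read as points better than an average team) are all hypothetical.

```python
def blended_rating(prior: float, observed: float, games_played: int,
                   prior_weight: float = 4.0) -> float:
    """Shrink an observed rating toward a preseason prior.

    prior_weight acts like a number of phantom games played at the prior
    rating; 4.0 is a hypothetical choice, not a TWISTR parameter.
    """
    return (prior_weight * prior + games_played * observed) / (prior_weight + games_played)


# A team projected at +10 before the season that has played like a +20 team:
for games in (0, 3, 9):
    print(games, round(blended_rating(10.0, 20.0, games), 2))
# 0 10.0
# 3 14.29
# 9 16.92
```

The larger the prior weight, the longer the preseason expectation dominates; with only 9 regular-season games, that weight matters more in football than it would in basketball.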

(By the way, this whole thing is actually the biggest issue with seeding the football state tournament, but that's for another blog post.)

[1] From 2007 through 2022, FBS games averaged 26.3 total possessions per game, per Brian Fremeau. I think it's reasonable to expect the average high school game has fewer possessions, both because of the shorter clock (12-minute quarters vs. 15) and because of the bias toward run-dominant offenses you often see at the high school level. I'd also recommend reading the intro to the "Notes" section of his site, which does a good job of explaining why efficiency data is important for properly assessing the calibre of football teams.
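As a rough sanity check on the "10 or 12" estimate, this sketch scales the FBS average down by regulation game length. It assumes possessions scale linearly with minutes, which ignores differences in clock rules and play style, so treat the result as a ballpark figure only.

```python
FBS_POSSESSIONS_PER_GAME = 26.3  # both teams combined, 2007-2022, per Fremeau
FBS_GAME_MINUTES = 4 * 15
HS_GAME_MINUTES = 4 * 12

# Naive assumption: possessions scale linearly with regulation minutes.
hs_total = FBS_POSSESSIONS_PER_GAME * HS_GAME_MINUTES / FBS_GAME_MINUTES
hs_per_team = hs_total / 2
print(f"{hs_total:.1f} total, {hs_per_team:.1f} per team")  # 21.0 total, 10.5 per team
```

A run-heavy offense would push the real number lower still, which only strengthens the small-sample point.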