It’s almost Super Bowl Sunday, one of the most sacred holidays here in the United States. It’s the day of the championship game between the winners of the NFC and AFC, this year the Philadelphia Eagles and Kansas City Chiefs.
Yes, for those of you from outside of the US, I know our football is different from your football. And yes, our football doesn’t involve much kicking and our ball looks more like an egg than a ball. We still call it football.
Anyway, football is an interesting sport because it is very specialized. There are completely different squads that play on offense and defense (not to mention special teams). Each team has 11 players on the field, and a total of 53 on the roster. Of those 53, the starting quarterback is traditionally seen as the most important. Teams will mortgage their futures to advance high enough in the draft to land a star quarterback. But is that wise?
Look, I know that the quarterback is an important position. But to me the position seems to be overrated. A great quarterback surrounded by a mediocre team will have a bad season. And when you look at the most successful QBs in the league, many of them were not drafted that early. Jalen Hurts, who just brought the Eagles to the Super Bowl, went in the second round. Tom Brady didn’t go until the sixth round. Joe Montana went in the third round. And then of course there is Brock Purdy, who was the very last pick in the 2022 draft and was given the dubious title “Mr. Irrelevant” before taking the 49ers to the Super Bowl.
I’m no football expert, but I would argue the offensive linemen are underrated. You aren’t going to see an offensive tackle or guard on the cover of a video game, and they are rarely the team’s best-known star. But without a good line, that star QB you drafted in the first round is going to be in trouble. Best case scenario, he will get sacked or won’t have enough time in the pocket to make the plays he needs to make. Worst case scenario, he gets injured.
Is there a way to use data to test this hypothesis? Maybe. I figured this would be a good problem for exploring some Clojure data science libraries, including the Clerk notebook system. My code is on my GitHub and the notebook output is available here if you want to see a deeper dive. This will just be a summary of my results.
I downloaded the draft data for the past couple of decades, along with each team’s record at the end of each season. I then used each team’s draft history over the previous 12 years to predict its winning percentage with linear regression. So how well did the model perform?
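Before the results, here is a minimal Python sketch of what that setup might look like. It is not the Clojure notebook code. Only the 12-year window comes from the description above. The `pick_value` scoring (earlier picks score higher), the `MAX_PICK` constant, the record layout, the two-position subset, and every helper name are assumptions made up for illustration.

```python
from collections import defaultdict

POSITIONS = ["QB", "OL"]  # toy subset; the real model has ten position groups
WINDOW = 12               # seasons of draft history, as described in the post
MAX_PICK = 262            # assumption: rough number of picks in one draft


def pick_value(overall_pick):
    """Hypothetical draft-capital score: earlier picks score higher."""
    return MAX_PICK + 1 - overall_pick


def build_features(picks, team, season):
    """Sum draft capital per position over the WINDOW seasons before `season`."""
    totals = defaultdict(int)
    for p in picks:
        if p["team"] == team and season - WINDOW <= p["year"] < season:
            totals[p["pos"]] += pick_value(p["pick"])
    return [totals[pos] for pos in POSITIONS]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)  # [intercept, coefficient per position]
```

Each team-season becomes one row from `build_features`, paired with that season’s winning percentage, and `fit_ols` returns the intercept followed by one coefficient per position. With that setup in mind, here is the summary for the actual model: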
F-statistic: 4.262456174908303 on degrees of freedom: {:residual 661, :model 10, :intercept 1}
p-value: 8.996957588269794E-6
R2: 0.06057855860393235
Adjusted R2: 0.046366433923205275
Residual standard error: 0.18696060921556648 on 661 degrees of freedom
AIC: -333.7338876432864
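These summary numbers are tied together by standard regression identities, which makes for a quick sanity check. The short Python sketch below (again, not the Clojure notebook code) rebuilds the adjusted R2 and the F-statistic from the reported R2 and degrees of freedom. The variable names are made up for illustration, and the sample size is inferred from the degrees of freedom rather than stated anywhere in the output.

```python
# Values copied from the regression summary above.
r2 = 0.06057855860393235
df_model = 10    # one predictor per position group
df_resid = 661

# Inferred, not reported: observations = residual df + model df + intercept.
n = df_resid + df_model + 1

# Adjusted R^2 penalizes R^2 for the number of predictors.
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid

# F-statistic: explained variance per predictor over unexplained per residual df.
f_stat = (r2 / df_model) / ((1 - r2) / df_resid)

print(n, adj_r2, f_stat)
```

Both computed values should agree with the reported ones up to floating-point rounding, which implies the model was fit on 672 team-seasons.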
Well, it wasn’t a particularly predictive model. With an R^2 of only 0.06, it explains very little of the variation in team records. It turns out there is more to football than just when different positions are drafted. But the F-statistic was statistically significant, so there does seem to be some useful information here. What did the coefficients look like?
| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval |
|-----------+-----------+----------+-----------+----------+----------------------|
| Intercept | 0.221996 | 0.078155 | 2.840479 | 0.004643 | [0.068535 0.375458] |
| QB | -1.6E-5 | 2.4E-5 | -0.685462 | 0.493293 | [-6.4E-5 3.1E-5] |
| RB | -3.3E-5 | 1.8E-5 | -1.768212 | 0.077487 | [-6.9E-5 4.0E-6] |
| DL | -8.0E-6 | 1.1E-5 | -0.724203 | 0.469197 | [-3.0E-5 1.4E-5] |
| LB | 3.6E-5 | 1.5E-5 | 2.455527 | 0.014324 | [7.0E-6 6.5E-5] |
| DB | 1.5E-5 | 1.3E-5 | 1.132575 | 0.257803 | [-1.1E-5 4.1E-5] |
| OL | 5.5E-5 | 1.4E-5 | 3.940833 | 9.0E-5 | [2.7E-5 8.2E-5] |
| TE | -1.0E-6 | 2.5E-5 | -0.035517 | 0.971679 | [-5.1E-5 4.9E-5] |
| WR | 2.4E-5 | 1.5E-5 | 1.587041 | 0.112981 | [-6.0E-6 5.4E-5] |
| K | 1.43E-4 | 6.0E-5 | 2.375624 | 0.017803 | [2.5E-5 2.61E-4] |
| P | -8.0E-6 | 6.1E-5 | -0.132145 | 0.89491 | [-1.28E-4 1.11E-4] |
Or graphically:

[Figure: the regression coefficients from the table above, by position]

Well, the QB coefficient was in fact negative. However, it wasn’t even close to being statistically significant. Only three positions were significant at p < 0.05, and all three had positive coefficients: linebackers, kickers, and, as I suspected, offensive linemen. Kickers had the highest coefficient, but they also had a very large standard error, as there just weren’t many of them drafted. Offensive line was the only position that was significant at p < 0.01.
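The t-values, confidence intervals, and p-values in the table all follow from each coefficient’s estimate and standard error. The Python sketch below shows the relationship for the OL and QB rows. It is an approximation under stated assumptions, not the notebook’s calculation: it starts from the rounded table values, and it uses a normal approximation (critical value 1.96) in place of the exact t distribution with 661 degrees of freedom, so the results only roughly match the table. The `summarize` helper is a made-up name.

```python
import math

T_CRIT = 1.96  # assumption: normal approximation to the 95% t critical value


def summarize(estimate, stderr):
    """Return (t-value, 95% confidence interval, two-sided p-value)."""
    t_value = estimate / stderr
    half_width = T_CRIT * stderr
    ci = (estimate - half_width, estimate + half_width)
    # Two-sided tail probability under the normal approximation.
    p_value = math.erfc(abs(t_value) / math.sqrt(2))
    return t_value, ci, p_value


# Rounded estimates and standard errors copied from the coefficient table.
ol_t, ol_ci, ol_p = summarize(5.5e-5, 1.4e-5)
qb_t, qb_ci, qb_p = summarize(-1.6e-5, 2.4e-5)

print("OL:", ol_t, ol_ci, ol_p)
print("QB:", qb_t, qb_ci, qb_p)
```

The OL interval stays entirely above zero, while the QB interval straddles it, which is another way of saying that the first coefficient is significant and the second is not.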
So does this mean drafting linemen early will make a team better? Not quite. Remember, correlation does not imply causation. At best it means that drafting linemen early is something good teams often do. And there are reasons to doubt the causality of that relationship. Teams that are set at the skill positions may be able to afford to use their earlier draft picks on less exciting positions. And of course there is the complicating factor that the draft order is set based on each team’s record the preceding season, so the teams picking first were pretty bad the previous year. Since top draft picks often go to positions such as QB, there are a lot of bad teams with a highly drafted quarterback.
So if I somehow found myself owning an NFL team, what approach would I take in the draft? I would give free rein to the coach and other staff to make decisions as they see fit. I grew up in the Washington, DC metro area. I know what happens when ownership thinks they know more than the experts about how to build a team.