Monday, 27 August 2012

Statistical inference and mathematical modelling (Part 4)

So, now let's apply what we have observed about the variable behaviour of an NFL team's seasonal interception totals to a live betting market.

Over/under quotes on total interceptions of all 32 NFL teams for the 2012 season are available via:

http://www.oddschecker.com/american-football/nfl-specials

** warning: prices referred to in the following copy may be out of date **

To recap, these are the statistical dimensions of interceptions that we established in the previous post:

1) A team's total interceptions for one season is only weakly correlated with its total for the following season;

2) In general, a team's total interceptions is, in effect, a somewhat random draw from its passes defended;

3) In year y, a team's total of passes defended predicts its interception total in year y+1 better than its total interceptions in year y does.

In particular, result 3) is highly encouraging from the point of view of finding an edge. The Over/Under assessments are, as might be anticipated, highly influenced by each team's recent performance in total interceptions, yet:

There is only a weak correlation between total interceptions in consecutive seasons, r=0.151;
passes defended is a better guide to total interceptions in year y+1, r=0.262
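
For anyone who wants to check correlations like these, here is a minimal Python sketch of the Pearson correlation coefficient r. The function name `pearson_r` is my own label, and the sample pairs are just Arizona's and Green Bay's interception totals in consecutive seasons, taken from the team-by-team table later in this post. A sample this small will not reproduce the r values quoted above, which come from the full data set.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# (INT in year y, INT in year y+1) for Arizona and Green Bay, 2007-2011.
# Illustrative subset only, not the full 128 data-points.
int_year_y = [18, 13, 21, 17, 19, 22, 30, 24]
int_year_y1 = [13, 21, 17, 10, 22, 30, 24, 31]

print(f"r = {pearson_r(int_year_y, int_year_y1):.3f}")
```

Swapping the first list for passes defended in year y gives the second comparison.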

4.1 Introducing linear regression

Linear regression is the bread-and-butter method of sports modelling. It is a mathematical technique of explaining the relationship between two or more quantities, the first described by the dependent variable and the others by explanatory or independent variables.

For now, let us consider the simplest case of two variables a and b linked by the relationship:

b = ma + c

In this case, b = a team's total interceptions for year y+1, while a = a team's total of passes defended for year y. The task is to determine the values of m and c which describe a "best-fit line" through the 128 data-points from teams in the NFL seasons 2007-2011.

The answer is b = 0.0924a + 7.47
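
As a rough sketch of how m and c can be found, the ordinary least-squares formulas are short enough to write out by hand in Python. The function name `fit_line` is my own, and the eight sample points (Arizona and Green Bay only, from the team-by-team table later in this post) are purely illustrative, so the fitted values will differ from the 0.0924 and 7.47 obtained from all 128 data-points.

```python
def fit_line(a_values, b_values):
    """Least-squares slope m and intercept c for b = m*a + c."""
    n = len(a_values)
    mean_a = sum(a_values) / n
    mean_b = sum(b_values) / n
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in zip(a_values, b_values))
    s_aa = sum((a - mean_a) ** 2 for a in a_values)
    m = s_ab / s_aa
    c = mean_b - m * mean_a
    return m, c

# a = passes defended in year y, b = interceptions in year y+1.
# Illustrative subset: Arizona and Green Bay, 2007-2011.
pdef_year_y = [84, 88, 108, 97, 90, 110, 126, 110]
int_year_y1 = [13, 21, 17, 10, 22, 30, 24, 31]

m, c = fit_line(pdef_year_y, int_year_y1)
print(f"b = {m:.4f}a + {c:.2f}")
```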

So, by plugging in a team's passes defended for year y as the explanatory variable a, we get a better estimate of total interceptions for year y+1 than we would by using total interceptions for year y as the explanatory variable instead.

Let's make the calculations for the upcoming 2012 NFL season:

Team Pdef 2011 INT 2011 xINT 2012
Arizona Cardinals 95 10 16.3
Atlanta Falcons 99 19 16.6
Baltimore Ravens 122 15 18.7
Buffalo Bills 87 20 15.5
Carolina Panthers 84 14 15.2
Chicago Bears 85 20 15.3
Cincinnati Bengals 83 10 15.1
Cleveland Browns 78 9 14.7
Dallas Cowboys 72 15 14.1
Denver Broncos 68 9 13.8
Detroit Lions 94 21 16.2
Green Bay Packers 129 31 19.4
Houston Texans 114 17 18.0
Indianapolis Colts 55 8 12.6
Jacksonville Jaguars 79 17 14.8
Kansas City Chiefs 104 20 17.1
Miami Dolphins 80 16 14.9
Minnesota Vikings 58 8 12.8
New England Patriots 84 23 15.2
New Orleans Saints 99 9 16.6
New York Giants 104 20 17.1
New York Jets 90 19 15.8
Oakland Raiders 106 18 17.3
Philadelphia Eagles 78 15 14.7
Pittsburgh Steelers 83 11 15.1
San Diego Chargers 82 17 15.0
San Francisco 49ers 124 23 18.9
Seattle Seahawks 105 22 17.2
St. Louis Rams 77 12 14.6
Tampa Bay Buccs 73 14 14.2
Tennessee Titans 81 11 15.0
Washington Redskins 85 13 15.3
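
The arithmetic behind the xINT column can be sketched in a few lines of Python. The slope and intercept are the rounded values quoted above; the function name `project_int` is my own. Because the published coefficients are rounded, an occasional row may come out 0.1 away from the table.

```python
SLOPE = 0.0924      # m, as quoted in the post
INTERCEPT = 7.47    # c, as quoted in the post

def project_int(pdef_prev_year):
    """Projected interceptions for year y+1 from passes defended in year y."""
    return SLOPE * pdef_prev_year + INTERCEPT

# Passes defended in 2011 for three teams from the table above.
pdef_2011 = {
    "Atlanta Falcons": 99,
    "Baltimore Ravens": 122,
    "Green Bay Packers": 129,
}

for team, pdef in pdef_2011.items():
    print(f"{team}: xINT 2012 = {project_int(pdef):.1f}")
```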

Now, let's compare our projections of xINT - here, our projected total interceptions for each team in 2012 - with the outlying Over/Under totals among the range of bookmakers offering prices on the market:

Team xINT 2012 O/U quote Difference
New England Patriots 15.2 22.0 -6.8
Green Bay Packers 19.4 24.5 -5.1
Chicago Bears 15.3 20.0 -4.7
New Orleans Saints 16.6 12.5 +4.1
Dallas Cowboys 14.1 18.0 -3.9
Philadelphia Eagles 14.7 18.5 -3.8
Buffalo Bills 15.5 18.5 -3.0
Atlanta Falcons 16.6 19.0 -2.4
Arizona Cardinals 16.3 14.0 +2.3
Carolina Panthers 15.2 17.5 -2.3
Baltimore Ravens 18.7 16.5 +2.2
Detroit Lions 16.2 18.0 -1.8
Oakland Raiders 17.3 15.5 +1.8
New York Jets 15.8 17.5 -1.7
San Francisco 49ers 18.9 20.5 -1.6
Houston Texans 18.0 16.5 +1.5
San Diego Chargers 15.0 16.5 -1.5
Kansas City Chiefs 17.1 18.5 -1.4
New York Giants 17.1 18.5 -1.4
Minnesota Vikings 12.8 11.5 +1.3
Tampa Bay Buccs 14.2 15.5 -1.3
Cleveland Browns 14.7 13.5 +1.2
Indianapolis Colts 12.6 11.5 +1.1
Miami Dolphins 14.9 16.0 -1.1
Tennessee Titans 15.0 14.0 +1.0
Seattle Seahawks 17.2 18.0 -0.8
Washington Redskins 15.3 14.5 +0.8
Jacksonville Jaguars 14.8 15.5 -0.7
Cincinnati Bengals 15.1 14.5 +0.6
Pittsburgh Steelers 15.1 14.5 +0.6
St. Louis Rams 14.6 15.0 -0.4
Denver Broncos 13.8 13.5 +0.3




average 15.7 16.6 -0.8

The table is sorted by the magnitude of the difference between our estimate of 2012 total interceptions and that of the bookmakers. In other words, the direction of disagreement (the + or - sign) is ignored when ordering the rows.
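
A small Python sketch of that ordering, using four rows from the table. The function name `rank_by_disagreement` is my own; the difference is xINT minus the quote, and the sort key is its absolute value.

```python
# (team, xINT 2012, O/U quote) for four rows of the table above.
rows = [
    ("Denver Broncos", 13.8, 13.5),
    ("New Orleans Saints", 16.6, 12.5),
    ("Green Bay Packers", 19.4, 24.5),
    ("New England Patriots", 15.2, 22.0),
]

def rank_by_disagreement(rows):
    """Attach xINT - quote to each row, then sort by its magnitude."""
    with_diff = [
        (team, xint, quote, round(xint - quote, 1))
        for team, xint, quote in rows
    ]
    # abs() ignores the sign, so large unders and large overs sit together.
    return sorted(with_diff, key=lambda row: abs(row[3]), reverse=True)

for team, xint, quote, diff in rank_by_disagreement(rows):
    print(f"{team:<22} {xint:>5.1f} {quote:>5.1f} {diff:+.1f}")
```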

At face value, it appears there are some sizeable differences. So, is it time to bet the under on the Patriots, Packers and Bears and the over on the Saints with everything we can muster?

No. Not yet.

4.2 Establishing and improving forecast accuracy

The correlation between two variables - such as total interceptions and passes defended - amounts to evidence that the two were related (or, to be correct, may have been related) in the past. But it does not mean that one is the cause of the other, or that the same relationship will hold true in the future.

(It is into this bear-trap that the majority of poor forecasts fall and die. I'm not going to talk about statistical significance and other measures of hypothesis testing here because it will slow down my flow. You will just have to trust that I am acutely aware of their importance in forging my conclusions.)

In modelling sports, a highly complex and dynamic interaction of variables is most often the driving force behind outcomes like interceptions or goals. But we need to produce relatively simple models to understand the probability with which they come about, even if we know that the limited array of variables we are using is not the full story.

Having produced our forecasts for the total interceptions of NFL teams in the upcoming 2012 season using just one predictor - passes defended in the 2011 season - we have probably reached a better understanding of what 'causes' interceptions than the vast majority of American Football fans and - most importantly - bettors.

But, this is not just an exercise in playing with numbers. We are driving towards a sound mathematical conclusion on which we can bet money with the expectation of a positive result. We can improve our forecasts by a good deal yet.

We know that our projection method works in general by observing a correlation between passes defended in year y and total interceptions in year y+1. But let's now look at how the projections have fared in past seasons, importantly, by NFL team:

Team Year Pdef Int INT luck Wt avg
Arizona Cardinals 2007 84 18 3.1 -1.4
Arizona Cardinals 2008 88 13 -2.6
Arizona Cardinals 2009 108 21 1.9
Arizona Cardinals 2010 97 17 -0.2
Arizona Cardinals 2011 95 10 -6.8
Atlanta Falcons 2007 85 16 0.9 1.5
Atlanta Falcons 2008 85 10 -5.1
Atlanta Falcons 2009 78 15 1.2
Atlanta Falcons 2010 91 22 5.9
Atlanta Falcons 2011 99 19 1.4
Baltimore Ravens 2007 106 17 -1.8 -0.1
Baltimore Ravens 2008 125 26 3.8
Baltimore Ravens 2009 100 22 4.3
Baltimore Ravens 2010 99 19 1.4
Baltimore Ravens 2011 122 15 -6.6
Buffalo Bills 2007 98 18 0.6 1.1
Buffalo Bills 2008 83 10 -4.7
Buffalo Bills 2009 109 28 8.7
Buffalo Bills 2010 88 11 -4.6
Buffalo Bills 2011 87 20 4.6
Carolina Panthers 2007 81 14 -0.4 0.8
Carolina Panthers 2008 95 12 -4.8
Carolina Panthers 2009 88 22 6.4
Carolina Panthers 2010 85 17 1.9
Carolina Panthers 2011 84 14 -0.9
Chicago Bears 2007 78 16 2.2 3.0
Chicago Bears 2008 121 22 0.5
Chicago Bears 2009 66 13 1.3
Chicago Bears 2010 94 21 4.3
Chicago Bears 2011 85 20 4.9
Cincinnati Bengals 2007 97 19 1.8 -1.5
Cincinnati Bengals 2008 93 12 -4.5
Cincinnati Bengals 2009 112 19 -0.9
Cincinnati Bengals 2010 83 16 1.3
Cincinnati Bengals 2011 83 10 -4.7
Cleveland Browns 2007 103 17 -1.3 -0.8
Cleveland Browns 2008 90 23 7.0
Cleveland Browns 2009 83 10 -4.7
Cleveland Browns 2010 96 19 2.0
Cleveland Browns 2011 78 9 -4.8
Dallas Cowboys 2007 103 19 0.7 0.1
Dallas Cowboys 2008 68 8 -4.1
Dallas Cowboys 2009 99 11 -6.6
Dallas Cowboys 2010 82 20 5.5
Dallas Cowboys 2011 72 15 2.2
Denver Broncos 2007 80 14 -0.2 -2.8
Denver Broncos 2008 57 6 -4.1
Denver Broncos 2009 97 17 -0.2
Denver Broncos 2010 86 10 -5.2
Denver Broncos 2011 68 9 -3.1
Detroit Lions 2007 76 17 3.5 -0.1
Detroit Lions 2008 48 4 -4.5
Detroit Lions 2009 80 9 -5.2
Detroit Lions 2010 76 14 0.5
Detroit Lions 2011 94 21 4.3
Green Bay Packers 2007 90 19 3.0 5.6
Green Bay Packers 2008 110 22 2.5
Green Bay Packers 2009 126 30 7.7
Green Bay Packers 2010 110 24 4.5
Green Bay Packers 2011 129 31 8.1
Houston Texans 2007 86 11 -4.2 -1.6
Houston Texans 2008 71 12 -0.6
Houston Texans 2009 89 14 -1.8
Houston Texans 2010 69 13 0.8
Houston Texans 2011 114 17 -3.2
Indianapolis Colts 2007 84 22 7.1 0.9
Indianapolis Colts 2008 63 15 3.8
Indianapolis Colts 2009 84 16 1.1
Indianapolis Colts 2010 64 10 -1.3
Indianapolis Colts 2011 55 8 -1.8
Jacksonville Jaguars 2007 83 20 5.3 2.6
Jacksonville Jaguars 2008 69 13 0.8
Jacksonville Jaguars 2009 71 15 2.4
Jacksonville Jaguars 2010 61 13 2.2
Jacksonville Jaguars 2011 79 17 3.0
Kansas City Chiefs 2007 83 14 -0.7 -1.4
Kansas City Chiefs 2008 70 13 0.6
Kansas City Chiefs 2009 96 15 -2.0
Kansas City Chiefs 2010 110 14 -5.5
Kansas City Chiefs 2011 104 20 1.6
Miami Dolphins 2007 70 14 1.6 -0.9
Miami Dolphins 2008 99 18 0.4
Miami Dolphins 2009 90 15 -1.0
Miami Dolphins 2010 93 11 -5.5
Miami Dolphins 2011 80 16 1.8
Minnesota Vikings 2007 88 15 -0.6 -1.0
Minnesota Vikings 2008 78 12 -1.8
Minnesota Vikings 2009 73 11 -1.9
Minnesota Vikings 2010 77 15 1.3
Minnesota Vikings 2011 58 8 -2.3
New England Patriots 2007 91 19 2.9 5.0
New England Patriots 2008 71 14 1.4
New England Patriots 2009 86 18 2.8
New England Patriots 2010 102 25 6.9
New England Patriots 2011 84 23 8.1
New Orleans Saints 2007 86 13 -2.2 -2.9
New Orleans Saints 2008 106 15 -3.8
New Orleans Saints 2009 109 26 6.7
New Orleans Saints 2010 77 9 -4.7
New Orleans Saints 2011 99 9 -8.6
New York Giants 2007 90 15 -1.0 -0.9
New York Giants 2008 92 17 0.7
New York Giants 2009 92 13 -3.3
New York Giants 2010 103 16 -2.3
New York Giants 2011 104 20 1.6
New York Jets 2007 82 15 0.5 -1.0
New York Jets 2008 91 14 -2.1
New York Jets 2009 103 17 -1.3
New York Jets 2010 96 12 -5.0
New York Jets 2011 90 19 3.0
Oakland Raiders 2007 87 18 2.6 -1.2
Oakland Raiders 2008 86 16 0.8
Oakland Raiders 2009 77 8 -5.7
Oakland Raiders 2010 75 12 -1.3
Oakland Raiders 2011 106 18 -0.8
Philadelphia Eagles 2007 75 11 -2.3 1.0
Philadelphia Eagles 2008 107 15 -4.0
Philadelphia Eagles 2009 117 25 4.3
Philadelphia Eagles 2010 113 23 3.0
Philadelphia Eagles 2011 78 15 1.2
Pittsburgh Steelers 2007 88 11 -4.6 -1.4
Pittsburgh Steelers 2008 107 20 1.0
Pittsburgh Steelers 2009 79 12 -2.0
Pittsburgh Steelers 2010 109 21 1.7
Pittsburgh Steelers 2011 83 11 -3.7
San Diego Chargers 2007 119 30 8.9 1.9
San Diego Chargers 2008 90 15 -1.0
San Diego Chargers 2009 80 14 -0.2
San Diego Chargers 2010 83 16 1.3
San Diego Chargers 2011 82 17 2.5
San Francisco 49ers 2007 78 12 -1.8 0.3
San Francisco 49ers 2008 85 12 -3.1
San Francisco 49ers 2009 87 18 2.6
San Francisco 49ers 2010 79 15 1.0
San Francisco 49ers 2011 124 23 1.0
Seattle Seahawks 2007 97 20 2.8 -0.9
Seattle Seahawks 2008 75 9 -4.3
Seattle Seahawks 2009 77 13 -0.7
Seattle Seahawks 2010 97 12 -5.2
Seattle Seahawks 2011 105 22 3.4
St. Louis Rams 2007 89 18 2.2 -1.1
St. Louis Rams 2008 65 12 0.5
St. Louis Rams 2009 59 8 -2.5
St. Louis Rams 2010 91 14 -2.1
St. Louis Rams 2011 77 12 -1.7
Tampa Bay Buccs 2007 84 16 1.1 2.9
Tampa Bay Buccs 2008 95 22 5.2
Tampa Bay Buccs 2009 82 19 4.5
Tampa Bay Buccs 2010 89 19 3.2
Tampa Bay Buccs 2011 73 14 1.1
Tennessee Titans 2007 108 22 2.9 1.1
Tennessee Titans 2008 106 20 1.2
Tennessee Titans 2009 83 20 5.3
Tennessee Titans 2010 89 17 1.2
Tennessee Titans 2011 81 11 -3.4
Washington Redskins 2007 97 14 -3.2 -3.7
Washington Redskins 2008 102 13 -5.1
Washington Redskins 2009 85 11 -4.1
Washington Redskins 2010 103 14 -4.3
Washington Redskins 2011 85 13 -2.1

In several cases, teams like the Green Bay Packers, New England Patriots and Chicago Bears can sustain interception luck - a notably higher ratio of total interceptions to passes defended than the league average - over several seasons. (I have back-tested this on 10 years of data before 2007 and found a similar effect is at play.)

So, rather than assuming a league-average rate of total interceptions per passes defended, it turns out that we can make much better predictions by taking account of each team's historical percentage. The weighted average (in which recent totals count for slightly more) of each team's interception luck is included in the right-hand column.
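
For illustration, the INT luck figures in the table appear consistent with actual interceptions minus the league-average share of passes defended (the 17.7% quoted later in this post). The Python sketch below reproduces Arizona's five values on that assumption. The weights in the weighted average are hypothetical, chosen only to show recent seasons counting for slightly more, so the result is not expected to match the Wt avg column.

```python
LEAGUE_RATE = 0.177  # league-average interceptions per pass defended (from this post)

def int_luck(pdef, ints, rate=LEAGUE_RATE):
    """Interceptions above or below what the league-average rate would give."""
    return ints - rate * pdef

def weighted_average(values, weights):
    """Weighted mean of values, with weights in the same order."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Arizona Cardinals, 2007-2011: (passes defended, interceptions).
arizona = [(84, 18), (88, 13), (108, 21), (97, 17), (95, 10)]
luck = [round(int_luck(pdef, ints), 1) for pdef, ints in arizona]
print(luck)

# Hypothetical weights: oldest season first, most recent weighted highest.
weights = [1.0, 1.1, 1.2, 1.3, 1.4]
print(round(weighted_average(luck, weights), 1))
```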

We can then recalculate by projecting passes defended in year y to total interceptions in year y+1 at a rate nearer that which is typical for the team than the league.

From a technical standpoint, this can be referred to as regressing a rate statistic to a proposition-specific rather than general average.
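
One way to picture this in code is as a blend of the team's own historical rate and the league rate. The Python sketch below is only an assumption about how such a blend might look: the names `team_rate` and `blended_rate` and the 50/50 `TEAM_WEIGHT` are hypothetical, and the projections later in this post include further adjustments, so this will not reproduce them.

```python
LEAGUE_RATE = 0.177  # league-average interceptions per pass defended (from this post)

def team_rate(history):
    """Team's own interceptions per pass defended over its history."""
    total_pdef = sum(pdef for pdef, _ in history)
    total_int = sum(ints for _, ints in history)
    return total_int / total_pdef

def blended_rate(history, team_weight, league_rate=LEAGUE_RATE):
    """Mix of team and league rates; team_weight=0 gives the pure league rate."""
    return team_weight * team_rate(history) + (1 - team_weight) * league_rate

# Green Bay Packers, 2007-2011: (passes defended, interceptions).
green_bay = [(90, 19), (110, 22), (126, 30), (110, 24), (129, 31)]

TEAM_WEIGHT = 0.5  # hypothetical: how far to trust the team's own history
rate = blended_rate(green_bay, TEAM_WEIGHT)
print(f"team rate    = {team_rate(green_bay):.3f}")
print(f"blended rate = {rate:.3f}")
```

The closer `TEAM_WEIGHT` is to 1, the more the projection trusts the team's past rate over the league's.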

Again, this is a vital technique of sports modelling. Where rate stats are concerned, we need to be careful before assuming that deviations from expectation are not just noise. But, if there is an underlying signal in this example, what is its source?

Every NFL team runs a defensive scheme, styled most often by its defensive coordinator (but sometimes, in the case of the New England Patriots, by its defensive-minded head coach).

It turns out that schemes which rely heavily on zone defense (those of the Tampa-2 and Cover 2 family) generally intercept passes at a higher rate than the league average of 17.7% of passes defended, while those schemes which are blitz-heavy or emphasise pressure (typically the 3-4 set-ups of Pittsburgh, Baltimore and Arizona) generally intercept a lower percentage.

The difference, I would guess, is a function of both the number of defenders in coverage on each pass play and, perhaps more importantly, their responsibilities. In zone schemes, defenders face the opposing quarterbacks and thus may get a better read on the ball, whereas man-to-man defenders are focussed on the opposing receivers and may have less time to sight the ball and pick it off, rather than just deflecting it.

4.3 Our final projections

Employing this approach leads to a considerably higher success-rate in projecting total interceptions when back-tested on data of a near-significant sample-size. The correlation between xINT and INT improved from 0.261 to 0.478.

Although the latter figure is still lower than ideal, compensation can be made in our projections for teams which have changed defensive schemes, or are otherwise more likely to face a higher number of pass plays. We must also take into account that there is an upward trend in passes attempted (and hence intercepted) in the NFL.

Team xINT 2012 O/U quote Diff
New England Patriots 18.8 22 -3.2
New Orleans Saints 15.6 12.5 +3.1
Baltimore Ravens 19.4 16.5 +2.9
Chicago Bears 17.3 20 -2.7
Philadelphia Eagles 16.2 18.5 -2.3
Dallas Cowboys 15.8 18 -2.2
Arizona Cardinals 16.1 14 +2.1
Tennessee Titans 16.0 14 +2.0
Buffalo Bills 16.6 18.5 -1.9
Green Bay Packers 22.7 24.5 -1.8
Denver Broncos 15.3 13.5 +1.8
Oakland Raiders 17.2 15.5 +1.7
New York Jets 15.8 17.5 -1.7
Kansas City Chiefs 16.9 18.5 -1.6
Indianapolis Colts 13.1 11.5 +1.6
Carolina Panthers 16.1 17.5 -1.4
New York Giants 17.1 18.5 -1.4
Detroit Lions 16.7 18 -1.3
Minnesota Vikings 12.8 11.5 +1.3
Cleveland Browns 14.8 13.5 +1.3
Houston Texans 17.7 16.5 +1.2
San Francisco 49ers 19.4 20.5 -1.1
Miami Dolphins 14.9 16 -1.1
Jacksonville Jaguars 16.6 15.5 +1.1
Atlanta Falcons 18.1 19 -0.9
Pittsburgh Steelers 15.3 14.5 +0.8
Washington Redskins 15.2 14.5 +0.7
Seattle Seahawks 17.7 18 -0.3
Cincinnati Bengals 14.8 14.5 +0.3
St. Louis Rams 14.7 15 -0.3
Tampa Bay Buccs 15.7 15.5 +0.2
San Diego Chargers 16.5 16.5 =


Note that the average of my projections (xINT) = 16.47 while the average of the bookmakers' outlying quotes = 16.56.

In the next post, I will show how to calculate what these differences mean in terms of the percentage chance of cashing an Over/Under bet on these propositions in general.