Wednesday, 29 August 2012

Statistical inference and mathematical modelling (Part 5)

5.1 Understanding the importance of distributions

So, I have my projections about the total interceptions market for the 32 NFL teams. How do I know the corresponding market edge?

Let's make an example of the projection I made in the last part for the Tennessee Titans in 2012:

xINT = 16.0
O/U = 14.0

The metric xINT represents the calculated total interceptions which Tennessee are most likely to have in 2012 according to my analysis of the data. But it's very important to have knowledge of the distribution of possible values of total interceptions which this estimate implies.

To establish this, we need to examine total interceptions from past NFL seasons to see how they are distributed (some smoothing of the data is necessary).

[Table: distribution % (share of outcomes within a given number of interceptions of the mean)]

(NB: It is easy to see that the distribution is much more fat-tailed (wider) than the normal distribution, in which about 99.7% of the sample lies within three standard deviations of the mean. This is a function of regressing to the proposition-specific (i.e. team) average rather than the league average in calculating xINT.)

What this table describes is an estimate of the percentage of the sample of total interceptions that lies within certain values of the mean.

So, in the case of the Tennessee Titans, our projected mean value (xINT) for total interceptions in 2012 is 16.0. Assuming a symmetrical distribution around this mean (a faulty premise, but convenient for now), the table tells us that 13.3% of the possible outcomes of total interceptions lie between 16.0 and the bookmaker's Over/Under quote of 14.0.

So, if we place a bet on total interceptions for the Tennessee Titans to go over the total of 14, we can expect to win 50% + 13.3% = 63.3% of the time.
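The arithmetic above can be sketched in a few lines of Python. This is only an illustration under the symmetry assumption already stated; the function name `prob_over` is my own shorthand, and the 13.3% figure is the one read from the distribution table.

```python
def prob_over(mass_between_quote_and_mean: float) -> float:
    """Chance of going Over a quote that sits below the projected mean.

    Assumes a symmetrical distribution: half of all outcomes lie above the
    mean, plus the share lying between the quote and the mean.
    """
    return 0.5 + mass_between_quote_and_mean


# Tennessee Titans: xINT = 16.0, O/U = 14.0, 13.3% of outcomes in between
p_titans_over = prob_over(0.133)
print(f"Chance of the Over: {p_titans_over:.1%}")
```

The same function would serve for an Under bet when the quote sits above the projection, since the symmetry assumption treats both sides of the mean alike.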

The bookmakers in the table offering the Over/Under quote of 14 total interceptions at the time of writing were Boylesports and Youwin. Both were citing odds of 20/23 for the proposition, a price which builds in their profit margin or, as it would be referred to in the US, the "juice" or "vig(orish)".

Odds of 20/23 represent roughly 53.5%, so to make a profit on our choice of the Over/Under of 14 total interceptions we need to be right more than 53.5% of the time. In this case, our projection says that the total will go over 63.3% of the time.
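For anyone who wants to check the 53.5% figure, here is a minimal sketch of the conversion from fractional odds to a break-even win rate. The function name is hypothetical; the only input taken from the post is the price of 20/23.

```python
from fractions import Fraction


def break_even_probability(odds: Fraction) -> Fraction:
    """Win rate at which a bet at fractional odds neither wins nor loses money.

    `odds` is the profit per unit staked. Risking 1 unit to win `odds` units
    breaks even when p * odds = (1 - p) * 1, which gives p = 1 / (1 + odds).
    """
    return 1 / (1 + odds)


odds = Fraction(20, 23)
p_break_even = break_even_probability(odds)
print(f"Break-even win rate: {p_break_even} = {float(p_break_even):.1%}")
```

Using `Fraction` keeps the result exact (23/43) until the final conversion to a percentage.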

5.2 Calculating Expected Value (EV)

The Expected Value (EV) of a proposition is the average profit or loss per trial which will accrue as the number of trials of the event tends to infinity. In other words, it is the calculable edge at the given odds which can be expected when the estimate of the event's likelihood is correct.

In the case of betting over the total of 14 interceptions for the Tennessee Titans in 2012, if our estimate of 16 is correct then:

63.3% of the time we would win 20/23 units
36.7% of the time we would lose 1 unit

So, our EV = (63.3/100 * 20/23) - (36.7/100 * 1)
           = 0.550 - 0.367
           = +0.183

Assuming our estimate is correct, we can expect to win .183 units for every unit staked. Our EV on the bet is +18.3% or £18.30 for every £100 staked.
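The calculation can be written as a small Python function. This is a sketch rather than a definitive implementation; `expected_value` and its parameter names are my own, and the inputs are the 63.3% win rate and the 20/23 odds used above.

```python
def expected_value(p_win: float, odds: float, stake: float = 1.0) -> float:
    """Average profit per bet.

    With probability p_win we win odds * stake; otherwise we lose the stake.
    """
    return p_win * odds * stake - (1.0 - p_win) * stake


ev_titans = expected_value(p_win=0.633, odds=20 / 23)
print(f"EV per unit staked: {ev_titans:+.3f}")
```

Passing `stake=100` gives the figure per £100 staked; it comes out a few pence above £18.30 because the percentage in the text is rounded to one decimal place.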

* NB: all decisions - whether involving financial considerations or not - can be evaluated in the same manner. We can think of the units we win and lose as utility in respect to their effect on our life.

5.3 Diminishing Marginal Value (or Utility*)

If you look again at the table which describes the distribution of total interceptions around our notional mean, you should recognise a familiar economic principle. For every extra interception we can project over or under the bookmaker's quoted value, we receive a smaller edge.

One interception is worth 6.8% but two interceptions are only worth 13.3%, or 6.65% each. Three interceptions are worth only 6.4% each and so on.
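A short sketch makes the shrinking value of each extra interception explicit. The first two cumulative figures are quoted above; the third (19.2%) is my own back-calculation from "6.4% each" and so is only approximate.

```python
# Cumulative edge (in percentage points) by number of interceptions between
# the projection and the quote. The value for 3 is derived as 3 * 6.4.
cumulative_edge = {1: 6.8, 2: 13.3, 3: 19.2}

marginals = []
previous = 0.0
for n, total in cumulative_edge.items():
    marginal = total - previous  # edge added by the n-th interception
    marginals.append(marginal)
    print(f"{n}: total {total:.1f}%, average {total / n:.2f}%, marginal {marginal:.1f}%")
    previous = total
```

The marginal column falls faster than the average column, which is the usual signature of diminishing returns.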

This is a massively important consideration in all forms of gambling and investment. It will be extremely familiar to you if you have studied, or understand, entry level economics:

As your prediction becomes increasingly outlandish, you receive progressively less value for making it.

The worth of each extra interception is also likely to be reduced the further our projection lies from the mean of the overall sample.

This is because, at values some way from the mean, the actual shape of the distribution is not symmetrical. It can't be.

In the 2007-2011 NFL seasons, for instance, the average number of total interceptions was 15.9. While there is a very small chance that a team could have 20.1 more total interceptions than this (36), there is no chance that a team could have 20.1 fewer total interceptions (-4.2). For obvious reasons, it is not possible to have fewer than zero interceptions.

It is therefore likely that projecting total interceptions under a quote which is itself less than the mean will result in rapidly diminishing returns. (Projecting total interceptions over a quote which is itself higher than the mean is also undesirable, but it is less undesirable.)

5.4 Putting it all together

Using our projections of xINT - expected total interceptions for 2012 - the corresponding EV (Expected Value) for each Over/Under proposition is listed in the table. Note that the calculations do take into account the population distribution as it actually exists (or rather, as it is likely to exist in 2012).

Team                   xINT   O/U  Diff  Odds-1    EV%
New Orleans Saints     15.6  12.5   3.1    0.87  +30.5
New England Patriots   18.8  22.0   3.2    0.85  +28.5
Chicago Bears          17.3  20.0   2.7    0.85  +23.6
Baltimore Ravens       19.4  16.5   2.9    0.83  +23.3
Philadelphia Eagles    16.2  18.5   2.3    0.87  +20.9
Dallas Cowboys         15.8  18.0   2.2    0.85  +18.6
Arizona Cardinals      16.1  14.0   2.1    0.87  +18.6
Tennessee Titans       16.0  14.0   2.0    0.87  +17.4
Denver Broncos         15.3  13.5   1.8    0.87  +14.7
Buffalo Bills          16.6  18.5   1.9    0.83  +13.5
Oakland Raiders        17.2  15.5   1.7    0.87  +13.0
New York Jets          15.8  17.5   1.7    0.85  +12.6
Kansas City Chiefs     16.9  18.5   1.6    0.87  +12.0
Indianapolis Colts     13.1  11.5   1.6    0.87  +10.7
Jacksonville Jaguars   16.6  15.0   1.6    0.83  +10.0
New York Giants        17.1  18.5   1.4    0.85   +8.3
Carolina Panthers      16.1  17.5   1.4    0.83   +8.0
Detroit Lions          16.7  18.0   1.3    0.85   +7.4
Houston Texans         17.7  16.5   1.2    0.87   +6.6
Cleveland Browns       14.8  13.5   1.3    0.83   +6.2
Green Bay Packers      22.7  24.5   1.8    0.80   +6.0
Miami Dolphins         14.9  16.0   1.1    0.85   +4.9
Minnesota Vikings      12.8  11.5   1.3    0.83   +4.8
San Francisco 49ers    19.4  20.5   1.1    0.85   +3.1
Atlanta Falcons        18.1  19.0   0.9    0.85   +1.7
Washington Redskins    15.2  14.5   0.7    0.87   +1.4
Pittsburgh Steelers    15.3  14.5   0.8    0.80   -1.2
Cincinnati Bengals     14.8  14.5   0.3    0.87   -3.7
Tampa Bay Buccs        15.7  15.5   0.2    0.87   -4.3
St. Louis Rams         14.7  15.0   0.3    0.85   -4.8
Seattle Seahawks       17.7  18.0   0.3    0.85   -5.2
San Diego Chargers     16.5  16.5   0.0    0.87   -6.5

The right-hand column enumerates the expected profit from each proposition, assuming xINT represents the team average total interceptions for the 2012 NFL season reasonably accurately.

Note that the table is ordered by Expected Value, expressed this time as a percentage, and that this ranking is not the same as a ranking by the difference between xINT and the Over/Under quote.

This is a reflection of the distribution. In the case of the Green Bay Packers, for instance, our projection is a healthy 1.8 total interceptions less than the quote. But both these figures are towards the right-hand extreme of the distribution where relatively few values exist. So, the number of potential outcomes between xINT and the O/U quote (from where we derive our profit) is relatively small.
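One way to see the effect is to back out the win probability that each EV% figure implies. The sketch below assumes the table's EV% was computed as p * odds - (1 - p), which is my inference rather than something stated outright; it is at least consistent with the San Diego row, where a Diff of 0.0 at odds of 0.87 gives -6.5%. The function name is hypothetical.

```python
def implied_win_probability(ev_pct: float, odds: float) -> float:
    """Invert EV = p * odds - (1 - p) to recover p from a table row."""
    return (ev_pct / 100.0 + 1.0) / (odds + 1.0)


rows = {
    # team: (Diff, Odds-1, EV%) as listed in the table
    "Denver Broncos": (1.8, 0.87, 14.7),
    "Green Bay Packers": (1.8, 0.80, 6.0),
    "San Diego Chargers": (0.0, 0.87, -6.5),
}

implied = {
    team: implied_win_probability(ev_pct, odds)
    for team, (diff, odds, ev_pct) in rows.items()
}
for team, p in implied.items():
    print(f"{team}: Diff {rows[team][0]:.1f}, implied win probability {p:.1%}")
```

Denver and Green Bay share a Diff of 1.8, yet the implied win probability is lower for Green Bay, whose figures sit further out in the right-hand tail. Green Bay's less generous odds of 0.80 widen the gap in EV further.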

Now let's use intuition to cross-check some of these findings, from the big differences of opinion with the Over/Under quotes for New Orleans (a highly profitable Over bet) and New England (a highly profitable Under) to our agreement with the quote for San Diego, which results in a coin-flip expectation minus the bookmaker's juice.

5.5 The acid test of intuition

You might be surprised to find that I believe strongly that the result of a quantitative analysis should tally with my intuition. "The figures never lie", after all! But, wait a second, to invest wisely there is a need to have confidence, so the figures should be able to "sell" their argument to me. And I should be able to "sell" it to a third party like you. Not literally, obviously!

So, let's take the opposite case of our two outlying projections, New Orleans and New England. Does it make sense that the bookmaker's Over/Under projections should be a long way wrong?

The total interceptions for New Orleans over the last five seasons are 9, 9, 26, 15, 13 and the Over/Under quote is 12.5. Well, the first thing to notice is that the Saints have indeed gone Over the total (our side) in three of those five campaigns.
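The count is easy to reproduce. A minimal sketch, using only the five season totals and the quote given above (the variable names are mine):

```python
saints_interceptions = [9, 9, 26, 15, 13]  # last five seasons, as listed above
quote = 12.5

seasons_over = [total for total in saints_interceptions if total > quote]
print(f"Over {quote} in {len(seasons_over)} of {len(saints_interceptions)} seasons: {seasons_over}")
```

Five seasons is a tiny sample, so this is a sanity check on the projection rather than evidence in its own right.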

In the last two seasons, their totals of 9 interceptions were well Under the quote, but we know that there is only a weak correlation in this variable from one season to the next. As argued in a previous post, there is a lot of randomness at work.

Moreover, the passes defended totals of 99, 77, 109, 106 and 86 suggest that the New Orleans defense has the underlying skills to make significantly more than 12.5 interceptions in 2012. But what about their scheme?

One reason that the Saints have not met expectations in making interceptions is their defensive coordinator for the past three seasons, Gregg Williams. Now suspended for his part in the infamous 'Bountygate' scandal (the financial incentivising of Saints players to injure opposing players), Williams runs an extremely aggressive scheme with a great deal of blitzing and man-to-man coverage.

Though the Saints defense may have gone out of the frying pan and into the fire with his replacement, the former St Louis Rams head coach Steve Spagnuolo, the change of scheme alone should produce better results, judging by the latter's history as Defensive Coordinator with the New York Giants in 2007-8.

In other words, given the randomness of interceptions, the underlying skills suggested by passes defended and the change in scheme, betting Over the total of 12.5 interceptions for the New Orleans Saints makes a lot of sense.

What about betting the Under of 22 total interceptions about the New England Patriots? Can we really do this with confidence when the architect of their defense is no less than Bill Belichick? Yes we can!

In the last five seasons, the Patriots defense has exceeded the metric Expected Interceptions - that is, enjoyed so-called 'interception luck' - by an average of five interceptions per year.

During this period, their total interceptions were 23, 25, 18, 14 and 19. As with the New Orleans example, it is encouraging that the total would have gone Under this season's quote of 22 more often than not, but the situation is even more promising than that.

New England's passes defended - which show a stronger season-to-season correlation than total interceptions - are 84, 102, 86, 71 and 91. In other words, they ranked towards the bottom of the league in a statistic which we do know is related to underlying skills because it tends to persist somewhat from one season to the next. (They also ranked 29th of 32 teams in yards-per-pass against, another strong indicator of defensive skills which survives randomness.)

So, why should a defense which by several measures is one of the worst against the pass have the second-highest (behind Green Bay) Over/Under quote? Well, we know the answer to that: because of their recent history.

5.6 Cognitive biases in investment

It is easy to see why fading the recency bias is such a powerful approach in investment strategy. The average NFL fan might buy the idea that New England's total interceptions of 23 for the 2011 season were an aberration, but not when they also intercepted 25 in 2010.

It's true that in some cases like these, the predictor variables - and indeed intuition - are insufficient to understand why there might have been a sudden change to the likelihood of a variable's output. But we can also observe the statistical significance of an event and understand how randomness profoundly affects the world.

Both our top two plays - Over the total of 12.5 for New Orleans and Under 22 for New England - are the result of an understanding that recent outcomes are subject to considerable flux. With knowledge of the shape of the population of all outcomes, and by using metrics and variables which capture the underlying skills which influence them, we can often make better projections than the market.

I had intended to get into the nuts-and-bolts of staking and portfolio selection in this post but it has gone on too long already. There will now be a Part 6.