Friday, 22 March 2013

Towards a dynamic approach to modelling

Many conclusions drawn about sports from performance data simply do not stand up to the attempt to project or predict. There are numerous reasons for this, most of which are related to improper use of statistics in the first place. (There are belting examples in horseracing journalism and broadcasting, from trends analysis to trainer form. The gold medal for retrodictive prediction goes to Phil Smith, for his linear manipulation of different-shaped ratings populations in the exercise which purported to show that Frankel was superior to various champions of previous generations.)

One of the most common reasons for the failure of otherwise more technically correct studies is the savage beast of nonlinearity. In a basketball game, for instance, a team averages more points as its offensive rebounds increase, but the more it attacks the boards in search of a second-chance opportunity, the more its transition defense tends to suffer (Wiens et al., 2013). So, small changes in the way a team plays can cause much larger deviations in its overall effectiveness. If you know anything about chaos theory, you will recognise the potential result of this kind of trade-off as an example of bifurcation (the loss of a system's equilibrium as a result of a sometimes small change in one of its parameters).
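As a toy sketch of the trade-off (my numbers, not the paper's): suppose the benefit of crashing the offensive boards is roughly linear in effort, but the transition-defence cost grows quadratically. Net effectiveness then has an interior optimum, and a small change in the cost parameter moves the optimal strategy a long way:

```python
# Toy model (invented parameters, purely illustrative): net expected points
# per possession as a function of how hard a team crashes the boards.
# Benefit is linear; transition-defence cost is quadratic, so the team's
# effectiveness responds nonlinearly to small strategic changes.

def net_points(crash_rate, gain=1.2, cost=2.0):
    """Net value of a crash rate in [0, 1]: linear rebounding gain
    minus quadratic transition-defence cost."""
    return gain * crash_rate - cost * crash_rate ** 2

def best_crash_rate(gain=1.2, cost=2.0):
    """Interior optimum gain / (2 * cost), from setting the derivative to zero."""
    return gain / (2 * cost)

# A modest rise in the transition-defence cost parameter shifts the
# optimal crash rate from about 0.3 down to about 0.2 - a one-third cut
# in how hard the team should attack the boards.
print(best_crash_rate(cost=2.0))
print(best_crash_rate(cost=3.0))
```

The point is not the numbers, which are made up, but the shape: a model built on season-average rebounding totals sees none of this curvature.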

In football - a similar, fluid, multi-state game to basketball - a team must commit various changing proportions of its resources to the related concerns of attack and defence. In the early stages of a game between two roughly equal sides, for instance, it makes sense for managers to adopt a fairly conservative approach. In game-theoretical terms, a manager like Sam Allardyce may find that the optimal strategy is a minimax solution; that is, to play in a way that minimises the worst-case loss in win likelihood, rather than chasing the biggest possible gain. (Finding such a minimax solution may be easier from the stands than the dugout.)
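To make the game-theoretic point concrete, here is a hypothetical early-game payoff matrix - the numbers are invented for illustration. Entries are the home manager's expected change in win probability for each pair of strategic choices, and the security-level (maximin) choice is the row whose worst case is least bad:

```python
# Hypothetical payoff matrix (invented numbers): the home manager's expected
# change in win probability, given his choice (rows) and the opponent's
# (columns). Attacking pays off if the opponent also attacks, but is costly
# against a defensive setup.
payoffs = {
    "attack": {"attack": +0.04, "defend": -0.06},
    "defend": {"attack": +0.01, "defend": 0.00},
}

def maximin_choice(matrix):
    """Pick the strategy whose worst-case payoff is largest."""
    return max(matrix, key=lambda row: min(matrix[row].values()))

print(maximin_choice(payoffs))  # 'defend'
```

With these (made-up) payoffs the conservative option wins on the security criterion, which is the Allardyce intuition above: the worst case of defending (0.00) beats the worst case of attacking (-0.06).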

As the game progresses, a team must adapt the distribution of attacking and defensive resources in accordance with the strategic demands of the situation. According to some studies, managers are good at doing this, as teams who have conceded the first goal score an equalizer at a rate significantly higher than might be expected.

A team leading in the closing stages of a game often chooses to maximise its chance of winning by playing more defensively. In so doing, it chooses to sacrifice overall efficiency in order to minimise goals conceded, a counter-strategy to the opposing team's opposite, attacking approach. Even when the score is tied, teams often change strategies, and adapt to their opponents' choices, in order to improve the likelihood of the desired outcome (which, in a league with no salary cap, draft or luxury tax like the Premiership, is often not a win).

So, the result of a football (or basketball) game implies different things according to the prior distribution of events which disturb the equilibrium. A 1-0 win with the goal in the first minute says something different about the two teams than a 1-0 win with the goal in the last minute, even though football modellers will nearly all use the outturn in the same way. When the goal is scored in the first minute, the winning team has exhibited a very strong defence, for it is likely that the opponent would have adopted an increasingly attacking strategy, forcing more of the leading team's resources to be used defensively. At the same time, however, a 1-0 win with a goal in the first minute argues that the winning team is not very good in transition (the vital seconds after a change in possession when a large proportion of goals in football are scored) because the ensuing 89 minutes should have provided many chances to score on the break.
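One simple way to start capturing that distinction is to record how long each team spent in each game state, rather than just the final score. A minimal sketch (my own illustrative code, not anyone's production model):

```python
# Context capture sketch: from the goal times, work out how many minutes the
# home side spent winning, level and losing. Two 1-0 wins with identical
# final statistics can have radically different state profiles.

def state_minutes(goals_for, goals_against, length=90):
    """Minutes spent winning / level / losing, from the home side's view.

    goals_for / goals_against are lists of goal minutes."""
    minutes = {"winning": 0, "level": 0, "losing": 0}
    events = sorted([(m, +1) for m in goals_for] +
                    [(m, -1) for m in goals_against])
    score, prev = 0, 0
    for minute, change in events + [(length, 0)]:
        state = ("winning" if score > 0 else
                 "losing" if score < 0 else "level")
        minutes[state] += minute - prev
        score += change
        prev = minute
    return minutes

# Same scoreline, very different games:
print(state_minutes([1], []))   # {'winning': 89, 'level': 1, 'losing': 0}
print(state_minutes([89], []))  # {'winning': 1, 'level': 89, 'losing': 0}
```

The first profile is 89 minutes of protecting a lead (evidence about defence and transition play); the second is 89 minutes of a stalemate. A model fed only the 1-0 outturn treats them identically.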

To beat an increasingly sophisticated betting market by the widest possible margin, it is my contention that a sports modeller has to be able to differentiate between these two types of games, and to understand their implications for future performance.

These considerations are some of the causes of the nonlinearity of performance which regression models based on statistics like 'average this' and 'total that' cannot hope to capture - even when the models themselves use logistic or exponential terms.

Instead, reality argues for a different approach to sports modelling - to capture data specific to the context in which the game was actually played, rather than to regress final statistics on winning percentage with all the assumptions this entails. My view is that the standard regression approach either won't get the money, or won't get it for long. A different, more dynamic approach is called for. More on that tomorrow.

Wiens, J., Balakrishnan, G., Brooks, J., and Guttag, J. (2013). To Crash or Not To Crash: A quantitative look at the relationship between offensive rebounding and transition defense in the NBA. Massachusetts Institute of Technology.