Wednesday, 27 March 2013

Nonlinear dynamics of racehorse ratings: how upside can dominate exposed ability

Recall that 50,000 simulations of a race between five similarly exposed older horses produced the following results:
 
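| Horse | Rating | Wins | Losses | Win % |
| --- | --- | --- | --- | --- |
| Anchovy | 90 | 16307 | 33698 | 32.6 |
| Butler | 89 | 12536 | 37469 | 25.1 |
| Coconut | 88 | 9553 | 40452 | 19.1 |
| Donkey | 87 | 6740 | 43265 | 13.5 |
| Einstein | 86 | 4869 | 45136 | 9.7 |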

The relationship between a horse's rating and its winning percentage in this framework is nonlinear. Start at the bottom with Einstein: if his trainer could improve him by one point, his chance of winning would go from 9.7% to 13.5%, the same as Donkey. We say there is a marginal improvement of 3.8 percentage points at a marginal cost of 1 point.

The next marginal change of 1 point takes Einstein from Donkey to Coconut, as it were. Reading from the table, this is 19.1% minus 13.5%, or 5.6 percentage points: a bigger marginal improvement than the step from Einstein to Donkey.

And so, in this example, the marginal benefit to a horse's chance grows with every marginal point that it improves: the four steps up the table are worth 3.8, 5.6, 6.0 and 7.5 percentage points in turn. The growth is accelerating rather than strictly exponential. But there is obviously an upper limit, a bound, to this imposed by the fact that no horse can have a greater than 100% chance of winning.
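
For anyone who wants to check the step-by-step arithmetic, here is a minimal Python sketch. It uses only the winning percentages from the table above; the function name `marginal_gains` is mine and purely illustrative.

```python
# Winning percentages from the first table, lowest-rated horse first.
win_pct = {
    "Einstein": 9.7,
    "Donkey": 13.5,
    "Coconut": 19.1,
    "Butler": 25.1,
    "Anchovy": 32.6,
}

def marginal_gains(pcts):
    """Gain in win percentage for each successive one-point step up the table."""
    values = list(pcts)
    return [round(b - a, 1) for a, b in zip(values, values[1:])]

gains = marginal_gains(win_pct.values())
print(gains)
```

Each entry is the extra chance, in percentage points, bought by one more rating point at that rung of the ladder.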

So, let's run the simulation again, holding all things equal except that the difference between horses is now 2 points rather than 1 point. Coconut is still the same 88 horse, but his rivals are spread out more in terms of ability; there is more variance in their exposed merit, in other words:

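| Horse | Rating | Wins | Losses | Win % |
| --- | --- | --- | --- | --- |
| Anchovy | 92 | 19317 | 30688 | 38.6 |
| Butler | 90 | 13265 | 36740 | 26.5 |
| Coconut | 88 | 8642 | 41363 | 17.3 |
| Donkey | 86 | 5439 | 44566 | 10.9 |
| Einstein | 84 | 3342 | 46663 | 6.7 |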

Compare the two tables and there is a significant difference in winning percentages at the extremes: Anchovy has seen his chance improve from 32.6% to 38.6% while Einstein has seen his deteriorate from 9.7% to 6.7%.

Compared with Coconut - the horse who owns an average rating in this simulation - Anchovy had a 13.5-point better chance in the first example (32.6% minus 19.1%) but has a 21.3-point better chance (38.6% minus 17.3%) this time. But, as neoclassical economics dictates, let's look at the changes in percentage chance of success at the margin.

In the first case, the 13.5 percentage points came at a marginal cost of two points (from Coconut's rating of 88 to Anchovy's rating of 90). So, the marginal benefit of one point is 6.75 percentage points.

In the second example, however, the marginal improvement of 21.3 percentage points (Anchovy's 38.6% minus Coconut's 17.3%) came at a marginal cost of four points. So, the marginal benefit of one point is only 5.325 percentage points.

I want to flesh out this important point more. So, let's now increase the difference between horses to 5 points, so now poor old Einstein is 20lb inferior to Anchovy. (Perhaps the latter just joined Satish Seemar in Dubai.)
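
| Horse | Rating | Wins | Losses | Win % |
| --- | --- | --- | --- | --- |
| Anchovy | 98 | 27652 | 22353 | 55.3 |
| Butler | 93 | 13533 | 36472 | 27.1 |
| Coconut | 88 | 5876 | 44129 | 11.8 |
| Donkey | 83 | 2232 | 47773 | 4.5 |
| Einstein | 78 | 712 | 49293 | 1.4 |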

Now, Anchovy is 5-4 on favourite and Einstein is roughly a 70-1 poke. This example perhaps resembles a Group race whereas our first simulation could serve as a proxy for a handicap. Let's re-examine the marginal benefit of improving from Coconut (rating 88 again) to Anchovy (now rated 98).
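
Before doing that, the prices quoted for Anchovy and Einstein can be checked directly from the win and loss counts in the table. This is a small sketch of the conversion to fair odds, assuming no bookmaker margin; the helper names are illustrative only.

```python
def fair_odds_against(wins, losses):
    """Fair odds against winning, expressed as losses per win."""
    return losses / wins

def describe(wins, losses):
    """Format the fair price as 'x-1 against' or 'x-1 on' (no margin applied)."""
    against = fair_odds_against(wins, losses)
    if against >= 1:
        return f"{against:.1f}-1 against"
    return f"{1 / against:.2f}-1 on"

# Win and loss counts from the table with a 5-point gap between horses.
print("Anchovy:", describe(27652, 22353))
print("Einstein:", describe(712, 49293))
```

For comparison, 5-4 on is 1.25-1 on in these terms.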

The marginal improvement is 43.5 percentage points at a marginal cost of 10 points. So, the marginal benefit of one point is now only 4.35 percentage points. It is obvious that the more spread out the ratings become, the less benefit there is to a horse's chance from a marginal improvement of one point. Hopefully, this is exactly what most punters would intuitively understand.
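
The three Coconut-to-Anchovy comparisons can be laid side by side with a few lines of Python. This is only a sketch of the arithmetic, using the percentages from the three tables; `benefit_per_point` is an illustrative name of my own.

```python
# (rating gap from Coconut to Anchovy, Anchovy win %, Coconut win %)
# taken from the three simulation tables.
scenarios = {
    "1-point spacing": (2, 32.6, 19.1),
    "2-point spacing": (4, 38.6, 17.3),
    "5-point spacing": (10, 55.3, 11.8),
}

def benefit_per_point(rating_gap, top_pct, mid_pct):
    """Average gain in win percentage per rating point between two horses."""
    return (top_pct - mid_pct) / rating_gap

per_point = {
    name: round(benefit_per_point(*row), 3) for name, row in scenarios.items()
}
for name, value in per_point.items():
    print(f"{name}: {value} percentage points per rating point")
```

The per-point figure falls as the field spreads out, which is the pattern described above.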

However, it is important to remember we are dealing with five horses who have exactly the same potential for improvement. If you remember the last blog, I noted that, in some situations where the sub-populations of horses are not connected - such as three-year-olds and older horses on the Flat, Dubaian form and European form, or novice hurdlers in Ireland and British handicap hurdlers - a horse's exposed ability is negatively correlated with its potential for improvement. (This was rephrased in several of the superb Twitter responses I received subsequently in the more familiar terms of horses being exposed/unexposed.)

So, now let's give one of our horses, Butler, greater upside than his rivals. To do this mathematically from my original sample of the Beyer speed figures of older horses in the US, I simply divided the population into two sub-populations whose average number of starts differed by five and recalculated the parameters of the distribution from which the Monte Carlo simulation makes a random draw. (For the technically minded, the lighter-raced population had a fatter right tail and a lower peak, representing a greater chance of ratings distant from the mean and a higher standard deviation.)
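
To make the mechanics concrete, here is a rough sketch of this kind of simulation in Python. It is not the model used for the tables in this post: it assumes normally distributed performances, uses each horse's rating as the mean, and borrows illustrative standard deviations of 7.9 (exposed group) and 8.45 (lighter-raced group). The real distribution fitted to Beyer figures is skewed, so the exact percentages will differ.

```python
import random

def simulate_wins(horses, n_races=20_000, seed=1):
    """Count wins per horse when each race draws one rating per horse.

    `horses` maps a name to (mean, std). Draws are normal, which is an
    assumption of this sketch rather than the fitted distribution.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in horses}
    for _ in range(n_races):
        draws = {name: rng.gauss(mean, std) for name, (mean, std) in horses.items()}
        winner = max(draws, key=draws.get)  # highest drawn rating wins
        wins[winner] += 1
    return wins

# Assumed parameters: rating as mean, common spread for exposed horses.
field = {
    "Anchovy": (90, 7.9),
    "Butler": (89, 7.9),
    "Coconut": (88, 7.9),
    "Donkey": (87, 7.9),
    "Einstein": (86, 7.9),
}
baseline = simulate_wins(field)

# Same field, but Butler now draws from the wider, lighter-raced spread.
upside = simulate_wins(dict(field, Butler=(89, 8.45)))

for name in field:
    print(f"{name}: {baseline[name]} wins -> {upside[name]} wins")
```

Because both runs use the same seed, the underlying random draws are identical, so any change in the win counts comes from Butler's wider spread rather than from sampling noise.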

Now, Butler belongs to this lighter-raced group, and Anchovy, Coconut, Donkey and Einstein remain as they were. Let's go back to the first example, the handicap-style encounter between closely matched rivals in which their median ratings are separated by only a point. Here is another reminder of that initial outturn of 50,000 simulations:

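| Horse | Rating | Wins | Losses | Win % |
| --- | --- | --- | --- | --- |
| Anchovy | 90 | 16307 | 33698 | 32.6 |
| Butler | 89 | 12536 | 37469 | 25.1 |
| Coconut | 88 | 9553 | 40452 | 19.1 |
| Donkey | 87 | 6740 | 43265 | 13.5 |
| Einstein | 86 | 4869 | 45136 | 9.7 |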

Now, look what happens when Butler belongs to a cohort of horses who have raced five fewer times on average:

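| Horse | Rating | Wins | Losses | Win % |
| --- | --- | --- | --- | --- |
| Anchovy | 90 | 14832 | 35173 | 29.7 |
| Butler | 89 | 15797 | 34208 | 31.6 |
| Coconut | 88 | 8628 | 41377 | 17.3 |
| Donkey | 87 | 6255 | 43750 | 12.5 |
| Einstein | 86 | 4493 | 45512 | 9.0 |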

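...with the following placings:

| Horse | 1st | 2nd | 3rd | 4th | 5th |
| --- | --- | --- | --- | --- | --- |
| Anchovy | 14832 | 11507 | 8438 | 7967 | 7261 |
| Butler | 15797 | 9278 | 8187 | 8482 | 8261 |
| Coconut | 8628 | 10990 | 10646 | 9925 | 9816 |
| Donkey | 6255 | 9839 | 11306 | 11248 | 11357 |
| Einstein | 4493 | 8391 | 11428 | 12383 | 13310 |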

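...producing the following parameters:

| Horse | Average | Std dev | Max | Min | Median |
| --- | --- | --- | --- | --- | --- |
| Anchovy | 87.4 | 7.90 | 108 | 48 | 90 |
| Butler | 87.3 | 8.45 | 115 | 40 | 89 |
| Coconut | 85.4 | 7.98 | 106 | 37 | 88 |
| Donkey | 84.4 | 7.83 | 107 | 39 | 87 |
| Einstein | 83.4 | 7.89 | 104 | 40 | 86 |
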
Butler is now the favourite! I hope you find this as thought-provoking as I did the first time I ran these numbers. Leaving aside the particular dynamics of the older horse population in the US on dirt, the most important takeaway point is as follows:

Anchovy has a higher rating than Butler going into the race (90 to 89), and the difference in their median figures is this same 1 point over the 50,000 races. Anchovy's average rating (87.4) is also still higher than Butler's (87.3), but because the latter has more upside - represented by a higher standard deviation of his performance level (8.45 to 7.90) and a higher peak effort in rare situations (115 to 108) - he is actually more likely to win a race between them (making all the same assumptions I laid out in the last blog).
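
One way to see why the wider spread matters is to ask how often each horse runs a big figure, since it usually takes a big figure to win a five-runner race. The sketch below plugs the averages and standard deviations from the parameters table into a normal distribution. That is an assumption made purely for illustration: the simulated figures are skewed (the averages sit below the medians), so treat the output as indicative only.

```python
from statistics import NormalDist

# Average and standard deviation from the parameters table.
anchovy = NormalDist(mu=87.4, sigma=7.90)
butler = NormalDist(mu=87.3, sigma=8.45)

def chance_of_at_least(dist, rating):
    """Probability that a single run reaches `rating` or better."""
    return 1.0 - dist.cdf(rating)

for threshold in (85, 90, 95, 100, 105):
    a = chance_of_at_least(anchovy, threshold)
    b = chance_of_at_least(butler, threshold)
    print(f"{threshold}+: Anchovy {a:.1%}, Butler {b:.1%}")
```

Under these assumptions Anchovy is the more likely of the two to reach a modest figure, while Butler is the more likely to reach a high one, which is the trade-off between exposed ability and upside.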

Remember, the change to Butler's distribution of expected ratings was not dramatic in and of itself. Here are some situations in British racing which would represent much bigger likely swings between horses that have had different numbers of starts or opportunities to achieve their best ratings:

  • Two horses in a maiden when one has run three times and the other once
  • Nursery handicaps in which the top weight has beaten horses with low ratings while its opponents have been beaten by rivals with bigger ones
  • Irish novice hurdlers against British ones that have run in the Betfair Hurdle against exposed handicappers
  • Dubai-trained horses who have thrashed locals and are now facing fancy outsiders with big figures in Group 1 races round the world

I'm certain you can list many other situations in which the opportunity cost of achieving a rating has not been discounted before projecting a horse's chance.

If you were confused by what I was saying about My Tent Or Yours, no implication was being made about his Betfair Hurdle form, the strength of the Supreme Novices' form, or whether you would go skint laying horses who were clear top-rated at the Festival (whether you go skint has far less to do with your ability than your temperament, in any case).

Instead, the point is that if you were trying to create a computerised SP and one of your inputs was (hopefully) a horse's form, there must be a term in the equation representing opportunity cost, so that the likely distribution of all its future ratings reflects the importance of prior opportunity. It is not an overcomplication; it is an extremely important correction to an oversimplification made by some who draw bogus inferences from figures, especially on the television.

What we are working towards here is a mathematical understanding of the importance of upside compared with exposed ability. Many of you whom I have met over the years have an extremely good intuitive grasp of this nonlinear dynamic, an ability which makes you superb punters. My brain does not work like yours, unfortunately, in that I have to quantify my edge before having a bet.

I do not trust my instincts, partly because the single most dominant influence over my cognitive development was my maternal grandfather, who impressed on me the beauty of numbers and that everything in life - even those things assumed to be the preserve of the aesthete - was underpinned by quantitative processes. In some ways, he has made my life hell, but mostly I feel lucky to have been inspired in this way. And my mantra is that most things which come from your superior intuition are amenable to a test for cognitive bias. Or, as in the case of my friend Tom Segal, just evidence of a brilliant, intuitive mind.