Friday, March 14, 2014

Statistics & Garbage

This article by Barry Meadow appeared in the December edition of Horseplayer Monthly.  The March 2014 issue, with 32 pages of handicapping interviews and insight, is available to read for free.

Barry Meadow has spent more than 30 years in the gambling world. He wrote his first book, Success at the Harness Races, in 1967. He's also written Money Secrets at the Racetrack, which has been lauded as the definitive guide to money management at the track. Meadow's eclectic resume includes serving in Vietnam, writing television sitcoms, playing the professional tennis circuit in India, doing standup comedy in California, and, of course, playing blackjack at the professional level in his spare time. 

A trainer wins 18% first off the claim.  A handicapping system hits 29% winners.  A jockey's year-to-date win percentage is 6%.  Will any of these stats, or others, help your bottom line?  Or will they simply mislead you?

Until William Quirin's Winning at the Races was published in 1979, few handicapping books offered much in the way of statistics, mainly because compiling them was an exercise in tedium.  You'd have to buy every Racing Form, every day, and then go through each race searching for some characteristic you wanted to research.  When you finally found a qualified selection, you'd grab a different Form to check the chart, and then record each result.  Doing even the simplest work took incredible patience, or a staff of unpaid students.

All that changed, first with the introduction of the personal computer and, more recently, with the availability of daily downloads.  Now, for just a few dollars a day, anyone can download every past performance line for every horse in the nation, write a simple query, and find out whether horses really do yield a flat-bet profit when they return in exactly five days (they don't) or whether you can make money by playing every dropper from a straight maiden race into a maiden claimer who showed early speed last out (ditto).
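The "simple query" described above might look something like the following Python sketch.  It assumes the downloaded past performances have already been parsed into records with hypothetical fields `days_since_last`, `won`, and `win_payoff` (the $2 win mutuel, 0.0 for losers); real data vendors use their own formats, and the four sample starts are invented.

```python
from dataclasses import dataclass

@dataclass
class Start:
    # Hypothetical record layout; real download formats differ.
    days_since_last: int   # days since the horse's previous race
    won: bool              # did the horse win this race?
    win_payoff: float      # $2 win mutuel, 0.0 if the horse lost

def flat_bet_roi(starts, stake=2.0):
    """Return (plays, ROI) for a flat win bet on every start."""
    plays = len(starts)
    if plays == 0:
        return 0, 0.0
    wagered = stake * plays
    returned = sum(s.win_payoff for s in starts if s.won)
    return plays, (returned - wagered) / wagered

def returning_in_days(starts, days):
    """Select only the starts made exactly `days` days after the last race."""
    return [s for s in starts if s.days_since_last == days]

# Invented toy data, far too small to mean anything.
sample = [
    Start(5, True, 7.40),
    Start(5, False, 0.0),
    Start(5, False, 0.0),
    Start(12, True, 4.20),
]
plays, roi = flat_bet_roi(returning_in_days(sample, 5))
print(plays, round(roi, 3))  # 3 0.233
```

Three plays showing a 23% profit is exactly the kind of tiny-sample result the principles below warn against.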

The gathering of horsey data is no longer much of a problem.  Ask the computer a question, and it will spit out an answer.

However, while accumulating data is one thing, interpreting it correctly is something else altogether.  The essential problem is that while ideas should be forward-tested (you state a hypothesis, then test it), many data miners work backwards, falling victim to what is known as "hindsight bias."  They start with already-known results, and then look for patterns that might have contributed to these results.  Typical:  A player notes that many recent winners at his track were dropping in class, so he decides to check the last three months' results.  Sure enough, class droppers did well, but because the survey includes the recent results that he already knows, his sample will be skewed. 

Let's look at some basic principles.  Understand these, and you won't be misled by handicapping stats:

* The larger the sample size, the more likely the percentages are to be accurate.  Conversely, anything goes when looking at tiny sample sizes.

* The less often a result occurs and the higher the payoffs, the greater the sample size you need to measure the validity of the idea.

* Unlike groups cannot be lumped together: 3-5 shots cannot be lumped in with 7-1 shots.              

* Check the actual number of plays, not simply the number of races investigated to obtain those plays.

* Rules that appear arbitrary (the horse's last race must have taken place within the past 21 days, the horse must go off at odds of 5-1 or above, etc.) indicate that the system came from back-fitting, with the arbitrary rules added to get rid of a bunch of losers.

* Whenever an idea has been developed from one set of results, it must be tested on a completely separate group of results.

* Once a result has been proven (e.g., coin flips win 50%), you can use the standard deviation to predict the range of results you should expect; however, if a result is merely recorded and not proven, you cannot accurately predict that range, since you do not know whether the result is typical or atypical.

* Return-on-investment statistics are often skewed by a handful of longshot winners--sometimes even by one such winner. 

* Any study of race results should look at what the usual results are for the particular odds category, and compare the usual ratio of wins, places, and shows to the results in question. 

* Streaks, both positive and negative, often happen for no reason other than the statistical fluctuations that are part of any long mathematical series.
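To make the standard-deviation and streak points concrete, here is a minimal Python sketch.  It treats each play as an independent trial with a known win probability `p` (the "proven" case above), uses the binomial standard deviation, the square root of n·p·(1-p), to give a rough two-standard-deviation range for the number of wins, and simulates how long a losing streak a bettor with an unchanging win rate should expect.  The 1,000 coin flips and the 30% win rate over 500 plays are illustrative assumptions, not figures from any study.

```python
import math
import random

def win_range(n, p, k=2.0):
    """Expected wins and an approximate +/- k standard-deviation range
    for n independent plays with a known (proven) win probability p."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))  # binomial standard deviation
    return mean, mean - k * sd, mean + k * sd

def longest_losing_streak(results):
    """Length of the longest run of consecutive losses (False values)."""
    longest = current = 0
    for won in results:
        current = 0 if won else current + 1
        longest = max(longest, current)
    return longest

def average_longest_streak(n, p, trials=1000, seed=1):
    """Simulate many n-play seasons at a fixed win rate p and average the
    longest losing streak.  Nothing changes between plays except luck."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        season = [rng.random() < p for _ in range(n)]
        total += longest_losing_streak(season)
    return total / trials

mean, low, high = win_range(1000, 0.5)
print(f"1,000 coin flips: {mean:.0f} heads expected, roughly {low:.0f} to {high:.0f}")
print(f"Average longest losing streak, 500 plays at 30% winners: "
      f"{average_longest_streak(500, 0.30):.1f}")
```

The coin-flip range comes out at roughly 468 to 532 heads, and the simulated bettor, whose skill never changes, typically endures a losing streak somewhere in the teens during the season.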

Whenever you see a handicapping statistic, ask these questions:

1. Could it be false?  

Years ago, betting every favorite lost only half the track take.  However, my own survey of 400,000 more recent favorites showed conclusively that you would lose the full track take by betting every favorite today.  Yet some authors continue to tell their readers, mistakenly, that the old stat is still valid.

2. Who says so?  

A man touting his own system might tell you that it had an ROI of 37% last year at Belmont.  Nice (if it's true), but what about every other track?  Did it lose everywhere except Belmont?  Often, it's the information that isn't being revealed that is the most revealing. 

3. How many plays were there?  

A sample size of 1,000 plays for a system whose average winning payoff is $24 is just about useless.   If a guy tells you he bet 417 longshots last year and showed a 15% profit, don't be surprised if he does the same this year and shows a 30% loss.  
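Why are a few hundred longshot plays so unreliable?  A rough calculation helps.  Suppose, purely as an assumption for illustration, that a system wins 1 race in 12 with an average $24 payoff on a $2 bet, so its true long-run ROI is exactly zero.  The Python sketch below estimates the standard error of the ROI over n plays; it assumes every winner pays exactly the average (real payoffs vary, which makes the swings even larger) and reads the two-standard-error band as a normal approximation.

```python
import math

def roi_standard_error(n_plays, win_prob, avg_payoff, stake=2.0):
    """Approximate standard error of flat-bet ROI over n_plays, assuming
    every winner pays exactly avg_payoff on a `stake` bet."""
    ret_if_win = avg_payoff / stake              # dollars back per dollar bet
    mean = win_prob * ret_if_win                 # expected return per dollar
    var = win_prob * ret_if_win ** 2 - mean ** 2  # Var = E[X^2] - E[X]^2
    return math.sqrt(var / n_plays)

for n in (417, 1000, 10000):
    se = roi_standard_error(n, 1 / 12, 24.0)
    print(f"{n:>6} plays: ROI within about +/-{2 * se:.0%} of the true value")
```

Under these assumptions the band is about ±32% at 417 plays, ±21% at 1,000 plays, and still about ±7% at 10,000, so a break-even system could easily show a 15% profit one year and a 30% loss the next.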

4. How was the number derived? 

Who compiled the numbers?  How far back?  Which tracks?  What were the odds?  What was the 1-2-3 record, and what was the expected 1-2-3 record for horses at those odds?  

5. If an ROI figure is not included, is the number of any use? 

If a stat has an impact value of 2.3 (horses with the characteristic win 2.3 times their fair share of races), that's good--but if they average a $3.80 payoff, who cares?
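Impact value and ROI measure different things, as this Python sketch shows.  It uses the usual definition of impact value (the characteristic's share of all winners divided by its share of all starters) and a flat $2 win bet.  The race counts are invented so that the impact value comes out at 2.3; only the $3.80 average payoff comes from the text.

```python
def impact_value(winners_with, starters_with, total_winners, total_starters):
    """Share of all winners that had the characteristic, divided by the
    share of all starters that had it.  1.0 means exactly a fair share."""
    return (winners_with / total_winners) / (starters_with / total_starters)

def win_roi(winners, plays, avg_win_payoff, stake=2.0):
    """Flat-bet ROI given the number of winners and their average payoff."""
    wagered = stake * plays
    return (winners * avg_win_payoff - wagered) / wagered

# Invented counts: 1,000 races and 8,000 starters; 800 starters had the
# characteristic, and 230 of them won.
iv = impact_value(230, 800, 1000, 8000)
roi = win_roi(230, 800, 3.80)
print(f"impact value {iv:.1f}, ROI {roi:.0%}")  # impact value 2.3, ROI -45%
```

These horses win 28.75% of their races, yet at that strike rate they would need an average payoff of about $6.96 just to break even.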

6. If an ROI figure is included, how many plays is it based on, and did a few big payoffs skew the results?

A 500-play report that shows a 7% profit is worthless if its two biggest winners accounted for all the profit.
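One quick check for question 6 is to recompute the ROI after dropping the biggest payoffs.  The Python sketch below assumes the results are available as a list of $2 win payoffs (0.0 for losers); the function names and the 500-play record are invented for illustration.

```python
def roi(payoffs, stake=2.0):
    """Flat-bet ROI for a list of win payoffs (0.0 for a losing bet)."""
    if not payoffs:
        raise ValueError("no plays")
    wagered = stake * len(payoffs)
    return (sum(payoffs) - wagered) / wagered

def roi_without_top(payoffs, k, stake=2.0):
    """ROI after dropping the k largest payoffs, as if those bets
    had never been made."""
    if k >= len(payoffs):
        raise ValueError("k must be smaller than the number of plays")
    trimmed = sorted(payoffs)[:-k] if k > 0 else list(payoffs)
    return roi(trimmed, stake)

# Invented 500-play record: two big winners plus 127 ordinary ones.
payoffs = [60.0, 45.0] + [8.0] * 110 + [5.0] * 17 + [0.0] * 371
print(f"all plays: {roi(payoffs):.1%}")                       # 7.0%
print(f"without top two: {roi_without_top(payoffs, 2):.1%}")  # -3.1%
```

Removing just two of 500 plays turns a 7% profit into a loss, which is the pattern question 6 is meant to catch.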

7. Is it possible that the result is simply a fluke? 

If horses from post 6 showed a net profit for a particular meeting but posts 5 and 7 were losers, it's likely the result is nothing more than a statistical anomaly. 

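A rough way to ask whether a result might be a fluke is to compute how often pure chance would produce it.  The Python sketch below assumes you already know a fair baseline win probability for the group (for example, from a much larger sample of similar starters) and computes the exact binomial probability of seeing at least that many winners by luck alone.  The post-position figures in the example are invented.  Keep in mind that if you scan many posts, trainers, or angles, some will pass such a test by chance, which is the back-fitting trap described earlier.

```python
from math import comb

def prob_at_least(wins, plays, p):
    """Exact binomial probability of `wins` or more winners in `plays`
    independent starts when each start wins with probability p."""
    return sum(comb(plays, k) * p**k * (1 - p)**(plays - k)
               for k in range(wins, plays + 1))

# Invented meet: post 6 won 14 of 80 starts against a 12.5% baseline
# (10 wins expected).
chance = prob_at_least(14, 80, 0.125)
print(f"Probability of 14+ wins from luck alone: {chance:.1%}")
```

Here the probability is well above 5%, so a hot post 6 at a single meet proves very little on its own.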
8. Have others, using different races, found similar results?

If you based a method on the results of certain races, you need to test it on different races - as many as possible.  Better yet, have somebody else test it.
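Even with a single database, you can test on different races by holding some back.  The Python sketch below assumes each play is a hypothetical `(race_date, win_payoff)` pair and splits the plays chronologically: the earlier portion is where an idea is developed, and the later, untouched portion is the result that counts.  Splitting by date rather than at random keeps later information from leaking into the development set.  This is a generic holdout approach, not a procedure the article prescribes, and the six plays are invented.

```python
from datetime import date

def roi(payoffs, stake=2.0):
    """Flat-bet ROI for a list of win payoffs (0.0 for a losing bet)."""
    if not payoffs:
        raise ValueError("no plays")
    wagered = stake * len(payoffs)
    return (sum(payoffs) - wagered) / wagered

def chronological_split(plays, cutoff):
    """Split (race_date, payoff) pairs into plays before the cutoff date
    (development) and plays on or after it (holdout)."""
    develop = [pay for day, pay in plays if day < cutoff]
    holdout = [pay for day, pay in plays if day >= cutoff]
    return develop, holdout

plays = [
    (date(2013, 1, 5), 9.60), (date(2013, 2, 9), 0.0),
    (date(2013, 3, 2), 0.0),  (date(2013, 6, 1), 0.0),
    (date(2013, 7, 4), 5.20), (date(2013, 8, 3), 0.0),
]
develop, holdout = chronological_split(plays, date(2013, 6, 1))
print(f"development ROI {roi(develop):.0%}, holdout ROI {roi(holdout):.0%}")
# development ROI 60%, holdout ROI -13%
```

A real test would need far more plays in each half; the point is that the holdout races must play no part in shaping the rules.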

9. Is there evidence that the tested factor was more successful than can usually be expected, less so, or about average? 

That includes not only the win percentage, but whether the prices were better or worse than usual.
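One way to approach question 9 is to compare actual wins with the wins the odds predicted.  The article recommends using the usual results for each odds category; as a stand-in when such a table isn't at hand, this Python sketch uses the pari-mutuel relationship between final odds and the public's implied win probability, p = (1 - take) / (odds + 1), ignoring breakage.  The 17% take and the 20 sample plays are assumptions for illustration.  Because horses at different odds levels historically don't win at exactly their implied rates, an empirical odds-category table remains the better baseline.

```python
def implied_win_prob(odds, take=0.17):
    """Public's implied win probability from final odds-to-1, ignoring
    breakage.  The 17% take is an assumed figure."""
    return (1 - take) / (odds + 1)

def actual_vs_expected(plays, take=0.17):
    """plays: list of (final_odds, won) pairs.
    Returns (actual wins, expected wins, actual/expected ratio)."""
    actual = sum(1 for _, won in plays if won)
    expected = sum(implied_win_prob(odds, take) for odds, _ in plays)
    return actual, expected, actual / expected

# Invented sample: 20 plays at 4-1, 3 of them winners.
sample = [(4.0, True)] * 3 + [(4.0, False)] * 17
actual, expected, ratio = actual_vs_expected(sample)
print(f"{actual} wins vs {expected:.1f} expected (ratio {ratio:.2f})")
# 3 wins vs 3.3 expected (ratio 0.90)
```

A ratio above 1.0 means the horses won more often than their prices implied; below 1.0, they did worse than their odds suggested.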

These are starter questions.  If you really want to get serious about the subject, study books like How to Lie with Statistics (Darrell Huff), Fooled by Randomness (Nassim Nicholas Taleb), Innumeracy (John Allen Paulos) and Statistics for Dummies (Deborah Rumsey). 

Don't believe everything you read - even if it's got a number attached.
