Data Analysis

A place to discuss anything.
Post Reply
RicHep365
Posts: 105
Joined: Thu Nov 23, 2017 9:42 am

I've always wondered how much data is enough data when looking for trends in horse betting. I have placed hundreds of thousands of bets over the years and I constantly analyse the data to seek out trends. I'm always finding new interesting patterns, which could be certain odds ranges not performing, bets placed within so many minutes of the race not performing, certain courses not yielding much profit, etc. But if I ever tweak anything, the trend always seems to turn on its head in the following 6 months and I end up reverting to the original set of rules.

So how much data is enough data and how can you tell if a trend is worth taking note of?
User avatar
ShaunWhite
Posts: 9731
Joined: Sat Sep 03, 2016 3:42 am

The only data that matters is the right-hand edge, the rest is coulda woulda shoulda. Mean reversion is a formidable opponent. That said I've spent countless hours looking backwards.

That's why I raised the topic of critical analysis because I'm trying to focus my attention on how I react to the market 'as is' rather than trying to predict it from what was. There are reams of papers about modelling and backtesting but almost nothing on how to measure how well you are actually executing these strategies or the basic setups like double tops/bottoms, breakouts, crossovers etc etc.

The two mugs on my desk illustrate the struggle, one is a "I {heart} Spreadsheets", the other says "Don't predict, respond". :?
User avatar
rinconpaul
Posts: 112
Joined: Wed Dec 03, 2014 10:39 pm

Take 5 times (10 times is better) one full odds cycle of a unique variable!

Break your data up into variables, or combinations thereof (field size & rank, say?)
e.g.: 6-horse field, rank 4, $10 back price (your unique variable)

Decimal odds of $10 have a 10% implied probability. Divide 100% by 10% = 10 (one cycle).
Now multiply by 5 (10 is better) = 50.
You need 50 records of that unique variable to test.
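The rule of thumb above can be sketched in a few lines of Python. `min_records` is a hypothetical helper name, and note that "one cycle" works out to the decimal odds themselves:

```python
def min_records(decimal_odds, multiplier=5):
    """Rule-of-thumb sample size for one unique variable.

    Decimal odds of $10 imply a 1/10 = 10% win probability,
    so one full 'cycle' is 1/0.10 = 10 races; multiply by 5
    (or 10 for more confidence) to get the minimum records.
    """
    implied_prob = 1.0 / decimal_odds      # e.g. $10 -> 0.10
    one_cycle = 1.0 / implied_prob         # equals decimal_odds
    return int(round(one_cycle * multiplier))

print(min_records(10))        # 50, as in the example above
print(min_records(10, 10))    # 100 with the safer 10x multiplier
```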

You could cheat a little and group prices by tick size?
e.g.: All runners with decimal odds from $4.10 to $6.00 average out at about a $5 price, i.e. 20% implied probability. Divide 100% by 20% = 5 (one cycle).
Now multiply by 10 (as you've cheated) = 50.
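The "cheat" of grouping a price band works the same way; a minimal sketch, assuming you take the band's simple average price (hypothetical function name):

```python
def min_records_banded(low_odds, high_odds, multiplier=10):
    """Sample size for a grouped price band, using the band's
    average price (the 'cheat', so use the safer 10x multiplier)."""
    avg_price = (low_odds + high_odds) / 2.0   # $4.10-$6.00 -> ~$5.05
    implied_prob = 1.0 / avg_price             # ~20% for a ~$5 price
    one_cycle = 1.0 / implied_prob             # equals avg_price
    return int(round(one_cycle * multiplier))

print(min_records_banded(4.10, 6.00))   # 50, matching the ~$5 example
```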

The higher the odds, the more data you need. In the graph I've plotted the desired minimum number of records using 5x (blue line) against actual number of records for each unique value (orange line).
I could rely more on the profit/loss of the price range 4.1 to 4.7 than on the range above 4.7.
RicHep365
Posts: 105
Joined: Thu Nov 23, 2017 9:42 am

ShaunWhite wrote:
Wed Jan 24, 2018 2:07 pm
The only data that matters is the right-hand edge, the rest is coulda woulda shoulda. Mean reversion is a formidable opponent. That said I've spent countless hours looking backwards.

That's why I raised the topic of critical analysis because I'm trying to focus my attention on how I react to the market 'as is' rather than trying to predict it from what was. There are reams of papers about modelling and backtesting but almost nothing on how to measure how well you are actually executing these strategies or the basic setups like double tops/bottoms, breakouts, crossovers etc etc.

The two mugs on my desk illustrate the struggle, one is a "I {heart} Spreadsheets", the other says "Don't predict, respond". :?
Unfortunately the algos aren't smart enough to respond (yet), so I just have to try and predict from historical data. All I can do is crunch numbers and ask 'what if?' I spend half my life buried in Excel and Python scripts. I keep finding combinations of settings that feel too good to be true, in which case I'm probably back-fitting too much. It's tough to find the balance.
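One common guard against back-fitting is to tune rules on one slice of history and then check them on a later, untouched slice. A minimal sketch, assuming each bet record carries a date and a profit/loss figure; `split_by_date`, `profit`, and the sample data are all hypothetical:

```python
def split_by_date(bets, cutoff):
    """Chronological train/test split: tune rules on bets before
    the cutoff, then validate on bets after it (never the reverse)."""
    train = [b for b in bets if b["date"] < cutoff]
    test = [b for b in bets if b["date"] >= cutoff]
    return train, test

def profit(bets, rule):
    """Total profit of the bets a candidate rule would have kept."""
    return sum(b["pnl"] for b in bets if rule(b))

# Toy data: a rule that looked great in-sample should also
# survive on the unseen half before you trust it.
bets = [
    {"date": "2017-03", "odds": 4.5, "pnl": 12.0},
    {"date": "2017-09", "odds": 5.2, "pnl": -3.0},
    {"date": "2018-01", "odds": 4.8, "pnl": 7.5},
]
train, test = split_by_date(bets, "2017-06")
rule = lambda b: 4.1 <= b["odds"] <= 6.0
print(profit(train, rule), profit(test, rule))   # 12.0 4.5
```

If the rule only profits on the training slice, that is the "trend turning on its head" effect the thread opened with.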
RicHep365
Posts: 105
Joined: Thu Nov 23, 2017 9:42 am

rinconpaul wrote:
Wed Jan 24, 2018 8:07 pm
Take 5 times (10 times is better) one full odds cycle of a unique variable!

Break your data up into variables or combinations thereof (field size & rank say?)
e.g: 6 horse field - rank 4 - $10 back price (your unique variable)

Decimal odds of $10 has 10% implied probability. Divide 100% by 10% = 10 (one cycle)
Now multiply by 5 (10 is better) = 50.
You need 50 records of that unique variable to test.

You could cheat a little and group prices by tick size?
e.g.: All runners with decimal odds from $4.10 to $6.00 average out at about a $5 price, i.e. 20% implied probability. Divide 100% by 20% = 5 (one cycle).
Now multiply by 10 (as you've cheated) = 50

The higher the odds, the more data you need. In the graph I've plotted the desired minimum number of records using 5x (blue line) against actual number of records for each unique value (orange line).
I could rely more on the profit/loss of the price range 4.1 to 4.7 than on the range above 4.7.
Interesting, this feels like too small a sample to me. I will produce this from my stats and see how it stacks up. Cheers.
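One way to see "how it stacks up" is to check whether the strike rate in a bucket actually differs from what the odds imply, rather than just eyeballing profit. A rough sketch using the normal approximation to the binomial (the function name and the worked numbers are illustrative, not from the thread):

```python
import math

def strike_rate_z(wins, bets, implied_prob):
    """Z-score of the observed strike rate vs. the market's implied
    probability (normal approximation to the binomial). Roughly,
    |z| > 2 suggests the difference isn't just noise."""
    observed = wins / bets
    se = math.sqrt(implied_prob * (1 - implied_prob) / bets)
    return (observed - implied_prob) / se

# e.g. 9 winners from 50 bets at $10 (10% implied probability)
print(round(strike_rate_z(9, 50, 0.10), 2))   # 1.89: suggestive, not conclusive
```

This also shows why 50 records is a bare minimum at long odds: nearly double the expected winners still only gives a z-score below 2.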
Post Reply

Return to “General discussion”