Data Analysis

A place to discuss anything.
Post Reply
RicHep365
Posts: 105
Joined: Thu Nov 23, 2017 9:42 am

I've always wondered how much data is enough data when looking for trends in horse betting. I have placed hundreds of thousands of bets over the years and I constantly analyse the data to seek out trends. I'm always finding new interesting patterns, which could be certain odds ranges not performing, bets placed within so many minutes of the race not performing, certain courses not yielding much profit, etc. But if I ever tweak anything, the trend always seems to turn on its head in the following 6 months and I end up reverting to the original set of rules.

So how much data is enough data and how can you tell if a trend is worth taking note of?
User avatar
ShaunWhite
Posts: 9731
Joined: Sat Sep 03, 2016 3:42 am

The only data that matters is the right-hand edge, the rest is coulda woulda shoulda. Mean reversion is a formidable opponent. That said I've spent countless hours looking backwards.

That's why I raised the topic of critical analysis because I'm trying to focus my attention on how I react to the market 'as is' rather than trying to predict it from what was. There are reams of papers about modelling and backtesting but almost nothing on how to measure how well you are actually executing these strategies or the basic setups like double tops/bottoms, breakouts, crossovers etc etc.

The two mugs on my desk illustrate the struggle, one is a "I {heart} Spreadsheets", the other says "Don't predict, respond". :?
User avatar
rinconpaul
Posts: 112
Joined: Wed Dec 03, 2014 10:39 pm

Take 5 times (10 times is better) one full odds cycle of a unique variable!

Break your data up into variables, or combinations thereof (field size & rank, say?)
e.g.: 6-horse field, rank 4, $10 back price (your unique variable)

Decimal odds of $10 have a 10% implied probability. Divide 100% by 10% = 10 (one cycle).
Now multiply by 5 (10 is better) = 50.
You need 50 records of that unique variable to test.
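The rule of thumb above can be sketched in a few lines of Python. `min_records` is a hypothetical helper name, and note that "one cycle" works out to the decimal odds themselves:

```python
def min_records(decimal_odds, multiplier=5):
    """Rule-of-thumb sample size for one unique variable.

    Decimal odds of $10 imply a 1/10 = 10% win probability,
    so one full 'cycle' is 1/0.10 = 10 races; multiply by 5
    (or 10 for more confidence) to get the minimum records.
    """
    implied_prob = 1.0 / decimal_odds      # e.g. $10 -> 0.10
    one_cycle = 1.0 / implied_prob         # equals decimal_odds
    return int(round(one_cycle * multiplier))

print(min_records(10))        # 50, as in the example above
print(min_records(10, 10))    # 100 with the safer 10x multiplier
```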

You could cheat a little and group prices by tick size?
e.g.: All runners with decimal odds from $4.10 to $6.00 average out at about a $5 price, i.e. 20% implied probability. Divide 100% by 20% = 5 (one cycle).
Now multiply by 10 (as you've cheated) = 50.
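The "cheat" of grouping a price band works the same way; a minimal sketch, assuming you take the band's simple average price (hypothetical function name):

```python
def min_records_banded(low_odds, high_odds, multiplier=10):
    """Sample size for a grouped price band, using the band's
    average price (the 'cheat', so use the safer 10x multiplier)."""
    avg_price = (low_odds + high_odds) / 2.0   # $4.10-$6.00 -> ~$5.05
    implied_prob = 1.0 / avg_price             # ~20% for a ~$5 price
    one_cycle = 1.0 / implied_prob             # equals avg_price
    return int(round(one_cycle * multiplier))

print(min_records_banded(4.10, 6.00))   # 50, matching the ~$5 example
```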

The higher the odds, the more data you need. In the graph I've plotted the desired minimum number of records using 5x (blue line) against actual number of records for each unique value (orange line).
I could rely more on the profit/loss of the price range 4.1 to 4.7 than on the range above 4.7.
RicHep365
Posts: 105
Joined: Thu Nov 23, 2017 9:42 am

ShaunWhite wrote:
Wed Jan 24, 2018 2:07 pm
The only data that matters is the right-hand edge, the rest is coulda woulda shoulda. Mean reversion is a formidable opponent. That said I've spent countless hours looking backwards.

That's why I raised the topic of critical analysis because I'm trying to focus my attention on how I react to the market 'as is' rather than trying to predict it from what was. There are reams of papers about modelling and backtesting but almost nothing on how to measure how well you are actually executing these strategies or the basic setups like double tops/bottoms, breakouts, crossovers etc etc.

The two mugs on my desk illustrate the struggle, one is a "I {heart} Spreadsheets", the other says "Don't predict, respond". :?
Unfortunately the algos aren't smart enough to respond (yet), so I just have to try and predict from historical data. All I can do is crunch numbers and ask 'what if?' I spend half my life buried in Excel and Python scripts. I keep finding combinations of settings that feel too good to be true, in which case I'm probably back-fitting too much. It's tough to find the balance.
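One common guard against back-fitting is to tune rules on one slice of history and then check them on a later, untouched slice. A minimal sketch, assuming each bet record carries a date and a profit/loss figure; `split_by_date`, `profit`, and the sample data are all hypothetical:

```python
def split_by_date(bets, cutoff):
    """Chronological train/test split: tune rules on bets before
    the cutoff, then validate on bets after it (never the reverse)."""
    train = [b for b in bets if b["date"] < cutoff]
    test = [b for b in bets if b["date"] >= cutoff]
    return train, test

def profit(bets, rule):
    """Total profit of the bets a candidate rule would have kept."""
    return sum(b["pnl"] for b in bets if rule(b))

# Toy data: a rule that looked great in-sample should also
# survive on the unseen half before you trust it.
bets = [
    {"date": "2017-03", "odds": 4.5, "pnl": 12.0},
    {"date": "2017-09", "odds": 5.2, "pnl": -3.0},
    {"date": "2018-01", "odds": 4.8, "pnl": 7.5},
]
train, test = split_by_date(bets, "2017-06")
rule = lambda b: 4.1 <= b["odds"] <= 6.0
print(profit(train, rule), profit(test, rule))   # 12.0 4.5
```

If the rule only profits on the training slice, that is the "trend turning on its head" effect the thread opened with.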
RicHep365
Posts: 105
Joined: Thu Nov 23, 2017 9:42 am

rinconpaul wrote:
Wed Jan 24, 2018 8:07 pm
Take 5 times (10 times is better) one full odds cycle of a unique variable!

Break your data up into variables or combinations thereof (field size & rank say?)
e.g: 6 horse field - rank 4 - $10 back price (your unique variable)

Decimal odds of $10 has 10% implied probability. Divide 100% by 10% = 10 (one cycle)
Now multiply by 5 (10 is better) = 50.
You need 50 records of that unique variable to test.

You could cheat a little and group prices by tick size?
e.g.: All runners with decimal odds from $4.10 to $6.00 average out at about a $5 price, i.e. 20% implied probability. Divide 100% by 20% = 5 (one cycle).
Now multiply by 10 (as you've cheated) = 50

The higher the odds, the more data you need. In the graph I've plotted the desired minimum number of records using 5x (blue line) against actual number of records for each unique value (orange line).
I could rely more on the profit/loss of the price range 4.1 to 4.7 than on the range above 4.7.
Interesting, this feels like too small a sample to me. I will produce this from my stats and see how it stacks up. Cheers.
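One way to see "how it stacks up" is to check whether the strike rate in a bucket actually differs from what the odds imply, rather than just eyeballing profit. A rough sketch using the normal approximation to the binomial (the function name and the worked numbers are illustrative, not from the thread):

```python
import math

def strike_rate_z(wins, bets, implied_prob):
    """Z-score of the observed strike rate vs. the market's implied
    probability (normal approximation to the binomial). Roughly,
    |z| > 2 suggests the difference isn't just noise."""
    observed = wins / bets
    se = math.sqrt(implied_prob * (1 - implied_prob) / bets)
    return (observed - implied_prob) / se

# e.g. 9 winners from 50 bets at $10 (10% implied probability)
print(round(strike_rate_z(9, 50, 0.10), 2))   # 1.89: suggestive, not conclusive
```

This also shows why 50 records is a bare minimum at long odds: nearly double the expected winners still only gives a z-score below 2.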
Post Reply

Return to “General discussion”