Sample size for backtesting

We've gone to the dogs.
gstar77
Posts: 34
Joined: Fri Dec 04, 2020 10:52 pm

Hello all,

When backtesting a system on the dogs, what would be a suitable sample size? I’ve collected data for the last 2,000 races — is this suitable, or is it considered too small a sample?

Thanks
User avatar
jimibt
Posts: 3675
Joined: Mon Nov 30, 2015 6:42 pm
Location: Narnia

gstar77 wrote:
Mon Jun 26, 2023 10:00 am
Hello all,

When backtesting a system on the dogs, what would be a suitable sample size? I’ve collected data for the last 2,000 races — is this suitable, or is it considered too small a sample?

Thanks
Although I no longer participate in the markets, I can say from experience that 2k greyhound races is a pretty small sample. Given that the UK produces around 120 races a day, this equates to only 16-20 days of racing. Take into consideration also that those 120 daily races are split across a number of classes, and you quickly see that you are probably only looking at around 250 A5 races in your sample (for example).

A good number of dedicated grey traders are probably using historical data going back at least 3 years (90-100k races), and although it sounds like overkill, you'll soon see that more really is more when it comes to flattening out localised variance. That said, it all depends on what you are looking for in your sample data.
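To put rough numbers on the sample-size point, here's a minimal sketch of how the noise in a measured strike rate shrinks with sample size. The 40% strike rate is an illustrative figure, not taken from any real data:

```python
# Standard error of an observed win rate: sqrt(p * (1 - p) / n).
# Shows why 2k races gives a much fuzzier picture than 100k.
import math

def strike_rate_se(p, n):
    """Standard error of an observed win rate p measured over n races."""
    return math.sqrt(p * (1 - p) / n)

for n in (2_000, 100_000):
    se = strike_rate_se(0.40, n)
    print(f"n={n}: a 40% strike rate is measured to within ~±{1.96 * se:.1%} (95% CI)")
```

With 2,000 races the 95% confidence band on a 40% strike rate is roughly ±2 percentage points — bigger than most realistic edges — while at 100,000 races it tightens to about ±0.3.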

horses for courses (or rather dogs for tracks :))

[edit] - in my new focus (forex and indices), I sample 5 years of raw tick data and build various models using an 80% in-sample / 20% out-of-sample split. I run these optimisations with the out-of-sample window positioned at various points on the timeline to ensure that I'm not overfitting any parameter set. It really is a mix of art and science, but (for me) it's more importantly about having as rich a dataset as possible, so that my backtested results produce a reasonable expectancy when they hit the live market.

[edit 2] - for all my trying, I was never able to come up with a scalable and accurate model for greys. I tried so many approaches and none stuck. That's not to say it's impossible, but it is possibly the toughest challenge I ever undertook. I went down to form and section level and still came up with zilch... that said, maybe I was being too anal!! :D
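The 80/20 in-sample/out-of-sample scheme with the window slid along the timeline can be sketched as a generic index-splitting helper. The model, loss function, and data are all placeholders here — only the splitting logic is shown:

```python
# Slide a 20% out-of-sample window to several positions on the timeline;
# everything outside the window is in-sample for that run.
def rolling_oos_splits(n_rows, oos_frac=0.2, n_positions=5):
    """Yield (in_sample, out_of_sample) index lists for each window position."""
    oos_len = int(n_rows * oos_frac)
    step = (n_rows - oos_len) // max(n_positions - 1, 1)
    for k in range(n_positions):
        start = k * step
        oos = set(range(start, start + oos_len))
        ins = [i for i in range(n_rows) if i not in oos]
        yield ins, sorted(oos)

# Fit on `ins`, evaluate on `oos` at each position; stable out-of-sample
# results across all positions suggest the parameter set isn't fitted
# to one particular period.
for ins, oos in rolling_oos_splits(1_000):
    pass  # train/score placeholder
```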
gstar77
Posts: 34
Joined: Fri Dec 04, 2020 10:52 pm

Hi Jimibt

Thanks for your reply; yes, I was thinking it was on the small side. At the moment I’m running a bot to collect the price on the fav grey 10 seconds before the off. I’m wondering if there’s a way to swing trade the natural variance in the markets. My thinking is that the results of backing the fav will meander above and below the zero-profit line, so is there an opportunity to, say, back all favs at the bottom of the curve and ride it back up towards the zero-profit line?

I’m not explaining it very well, I know, but have you tested this approach?

Thanks
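The equity-curve idea can at least be prototyped cheaply before collecting more data. Here's a hypothetical sketch with simulated, fairly-priced favourites — the win probability, odds, and window length are made-up numbers, not market estimates:

```python
# Flat 1-unit stakes on the favourite; optionally bet only while the
# cumulative P&L sits at or below its trailing average ("bottom of the curve").
import random

def simulate(n_races=2000, win_prob=0.40, odds=2.5, window=50,
             always=False, seed=1):
    rng = random.Random(seed)  # seeded so runs are reproducible
    pnl, curve = 0.0, []
    for _ in range(n_races):
        recent = curve[-window:]
        trailing = sum(recent) / len(recent) if recent else 0.0
        if always or pnl <= trailing:       # the "bottom of the curve" filter
            pnl += (odds - 1.0) if rng.random() < win_prob else -1.0
        curve.append(pnl)
    return pnl

print(simulate(seed=1), simulate(always=True, seed=1))
```

One caution with this toy model: if races are independent and fairly priced, the filter changes *which* races you bet on but not the expected value per bet, which is one reason equity-curve trading is hard to make pay unless results are genuinely autocorrelated.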
User avatar
jimibt
Posts: 3675
Joined: Mon Nov 30, 2015 6:42 pm
Location: Narnia

gstar77 wrote:
Mon Jun 26, 2023 11:20 am
Hi Jimibt

Thanks for your reply; yes, I was thinking it was on the small side. At the moment I’m running a bot to collect the price on the fav grey 10 seconds before the off. I’m wondering if there’s a way to swing trade the natural variance in the markets. My thinking is that the results of backing the fav will meander above and below the zero-profit line, so is there an opportunity to, say, back all favs at the bottom of the curve and ride it back up towards the zero-profit line?

I’m not explaining it very well, I know, but have you tested this approach?

Thanks
No, I'm afraid not. I tended to look for value bets based on the price at (for example) 3 minutes out and then see what had happened by the 60-second mark. The problem, from memory, was that liquidity and the spread were quite poor until the final 60 seconds, so the dynamics of the spread had to be pulled into the mix. As I said, I made a bit of a dog's ear of my modelling on greys, as I tried to incorporate SOOO many variables. Potentially, narrowing down to a small subset of parameters would be the way to go.

Although I looked at price over time, I didn't drill down to the granularity you are looking at, purely because I wanted to first establish a baseline of change inside the bigger timeframes. Maybe your 10-second gapping could show the variations you anticipate — you certainly won't do any worse than I did!! :) I was also using the API directly, rather than via BA, so potentially my focus was a bit different too.
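The spread point matters for any price-over-time comparison: a move smaller than the spread you'd have to cross is hard to trade. A tiny sketch with hypothetical snapshot prices (not real market data):

```python
# Compare the mid-price move between two snapshots against the spread
# at the later snapshot. Each snapshot is (best_back, best_lay).
def mid(back, lay):
    return (back + lay) / 2.0

def drift(snap_3m, snap_60s):
    """Return (mid-price move from ~3 min out to ~60 s out,
    back/lay spread at 60 s)."""
    move = mid(*snap_60s) - mid(*snap_3m)
    spread_60s = snap_60s[1] - snap_60s[0]
    return move, spread_60s

# Hypothetical numbers: price shortens from ~3.2 to ~2.85 while the
# spread tightens to 0.1 — here the move comfortably exceeds the spread.
move, spread = drift((3.0, 3.4), (2.8, 2.9))
```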
User avatar
ShaunWhite
Posts: 9731
Joined: Sat Sep 03, 2016 3:42 am

You should collect as much data as you can for each race. E.g. if you're just collecting the 10-second price on the fav, then if/when your idea fails you're back to square one for the next idea. And are you collecting the available volume too? Your backtest will be misleading if the best price only had 50p available and you had to go to second best to get a tenner on.
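A quick illustration of the 50p point — the price you actually get for a £10 order depends on the volume at each ladder level, not just the best price. The ladder levels below are made-up numbers:

```python
# Walk down the ladder, filling the order level by level, and return the
# volume-weighted price actually achieved.
def effective_back_price(ladder, stake):
    """ladder: list of (price, available_stake), best price first.
    Returns the volume-weighted fill price for `stake`, or None if the
    order cannot be fully matched."""
    remaining, weighted_cost = stake, 0.0
    for price, available in ladder:
        take = min(remaining, available)
        weighted_cost += take * price
        remaining -= take
        if remaining <= 0:
            return weighted_cost / stake
    return None  # not enough volume on the ladder to fill the order

# Best price 2.5 but only £0.50 there: a £10 order fills mostly at 2.48,
# for an effective price of roughly 2.478.
print(effective_back_price([(2.5, 0.5), (2.48, 8.0), (2.46, 20.0)], 10.0))
```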

Size-wise, 6 months is a decent amount, especially after it's been split into training and testing data. But you can rule out the things that are really bad with a month or two. I think it's worth investing in a couple of months of advanced historic data to get an idea of the info that's available. For instance, that will give you the entire traded ladder for every dog for the whole time the market is open, and the entire SP ladder too.

As Jimi alluded to, there are a lot of dedicated dog traders, all using the price data; straightforward edges on the fav aren't easy to come across, and margins are tiny. I've got 5 years of the entire API stream and a simulator, and I can't see anything simple based on things like you're describing. You aren't trying to beat the market, you're trying to beat everyone in the market, so unless you get tooled up to the extent they are, you'll always struggle to get your £1 out of the very few £s that have to be shared by everyone.

That's not meant to be demoralising, just realistic; otherwise you'll spend the next 6 months on something that's a waste of time. But spend 5 months getting seriously well equipped, and month 6 will see you in a position to be competitive.
User avatar
jimibt
Posts: 3675
Joined: Mon Nov 30, 2015 6:42 pm
Location: Narnia

ShaunWhite wrote:
Mon Jun 26, 2023 12:44 pm
... You aren't trying to beat the market you're trying to beat everyone in the market
if ever there was a t-shirt or tattoo that captured the quest, that's it!!
gstar77
Posts: 34
Joined: Fri Dec 04, 2020 10:52 pm

Thanks all for your replies👍
User avatar
ShaunWhite
Posts: 9731
Joined: Sat Sep 03, 2016 3:42 am

jimibt wrote:
Mon Jun 26, 2023 2:17 pm
ShaunWhite wrote:
Mon Jun 26, 2023 12:44 pm
... You aren't trying to beat the market you're trying to beat everyone in the market
if ever there was a t-shirt or tattoo that captured the quest, that's it!!
:) It's very Brad Goodman, like "Don't be a human being, be a human doing".
Screenshot_20230626_200517_Google.jpg
chasgeez
Posts: 18
Joined: Wed Jan 11, 2023 11:24 am

What's the best way to get historical BSP and none of the other data? I'm currently trying to do a sample of a year's worth of graded racing (ignoring the crazy sprints though!!)
User avatar
Dallas
Posts: 22730
Joined: Sun Aug 09, 2015 10:57 pm
Location: Working From Home

chasgeez wrote:
Thu Jul 06, 2023 9:15 pm
What's the best way to get historical BSP and none of the other data? I'm currently trying to do a sample of a year's worth of graded racing (ignoring the crazy sprints though!!)
https://promo.betfair.com/betfairsp/prices
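The files behind that link are plain daily CSVs, one row per runner. A sketch of loading one — the column names and sample rows below are assumptions based on the horse-racing files, so check an actual greyhound file from the index page for the real layout and filename pattern:

```python
# Parse a daily BSP CSV into typed dicts and pull out the winners.
# SAMPLE is a fabricated two-row example, not real data.
import csv, io

SAMPLE = """EVENT_ID,MENU_HINT,EVENT_NAME,EVENT_DT,SELECTION_ID,SELECTION_NAME,WIN_LOSE,BSP
123,GB / Romford 6th Jul,A5 480m,06-07-2023 19:04,456,Fast Dog,1,3.85
123,GB / Romford 6th Jul,A5 480m,06-07-2023 19:04,457,Slow Dog,0,6.20
"""

def load_bsp(text):
    """Read BSP CSV text into a list of dicts with numeric BSP/WIN_LOSE."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for r in rows:
        r["BSP"] = float(r["BSP"])
        r["WIN_LOSE"] = int(r["WIN_LOSE"])
    return rows

rows = load_bsp(SAMPLE)
winners = [r for r in rows if r["WIN_LOSE"] == 1]
```

For graded-only samples you could filter on the event/race name field (grade codes like A5 typically appear there), but verify against a downloaded file first.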
chasgeez
Posts: 18
Joined: Wed Jan 11, 2023 11:24 am

Dallas wrote:
Thu Jul 06, 2023 9:25 pm
chasgeez wrote:
Thu Jul 06, 2023 9:15 pm
What's the best way to get historical BSP and none of the other data? I'm currently trying to do a sample of a year's worth of graded racing (ignoring the crazy sprints though!!)
https://promo.betfair.com/betfairsp/prices
Thank you — fingers crossed BSP doesn't crush my plans.

Return to “Trading Greyhound racing”