Am I fooling myself?

boony
Posts: 6
Joined: Mon Nov 07, 2016 8:53 pm

So, I've processed another couple of thousand races, and my analysis brings me back to my original concern: is it a legitimate process to filter out courses?
[Attached chart: Untitled2.png]
Blue line is betting everything. Orange line is betting everything except races at the courses I decided were poor after looking at results after ~3300 races. Grey line is a new set of excluded courses which I determined by looking at the results after ~5300 races.

After the first 3300 races, you can see that the orange line moved about but ultimately flat-lined over the following 2000 races.

Obviously the grey line looks a lot better, but what will happen over the next 2000 races? Will it flat-line again and force me to come up with another set of course exclusions to make it profitable?

It seems to me that if the filters I come up with have no relevance to future races, then they're pointless and this whole process of back-testing and applying filters is flawed.
CallumPerry
Posts: 575
Joined: Wed Apr 19, 2017 5:12 pm
Location: Wolverhampton

I personally think at this stage you just go in with small stakes and try it out. Otherwise you run the risk of over-fitting. That looks as good an indicator as any I can imagine to give it a go with a couple of quid you don't mind losing. Track everything when you go live, and then after a few hundred markets you should see whether it looks promising; if it still works after a few thousand, you've cracked it!

May I ask, what programmes do you lot use to back-test? I'm just using Excel. Say I use a one-second logger and record all of the key information in the spreadsheet from 20 minutes out until 00:00:00 - that's 1,200 rows of data per market. If I record thousands like you lot have and try to get Excel to create charts and stuff, it would just freeze. I've seen a video before, I think it was on Nigelk's YouTube channel, where there was a loading bar. Is this something you lot use or is it a completely different programme? Point me in the right direction please!
boony
Posts: 6
Joined: Mon Nov 07, 2016 8:53 pm

CallumPerry wrote:
Sun Sep 16, 2018 7:07 pm
I personally think at this stage you just go in with small stakes and try it out. Otherwise you run the risk of over-fitting. That looks as good an indicator as any I can imagine to give it a go with a couple of quid you don't mind losing. Track everything when you go live, and then after a few hundred markets you should see whether it looks promising; if it still works after a few thousand, you've cracked it!

May I ask, what programmes do you lot use to back-test? I'm just using Excel. Say I use a one-second logger and record all of the key information in the spreadsheet from 20 minutes out until 00:00:00 - that's 1,200 rows of data per market. If I record thousands like you lot have and try to get Excel to create charts and stuff, it would just freeze. I've seen a video before, I think it was on Nigelk's YouTube channel, where there was a loading bar. Is this something you lot use or is it a completely different programme? Point me in the right direction please!
I've written a suite of programs.

The first one is the Logger: every half second it logs pretty much everything it gets from the Betfair API, starting at 30 mins before the scheduled off until the race is suspended after going in-play. So the amount of data is vast - hence slow to process.

Then I have the Simulator that mimics the Betfair API and can replay the data logged by the Logger.

Finally, a Strategy-Runner calls the Simulator as though it were Betfair to get data and allows me to test strategies. The strategy receives the data one row at a time, does its processing and then places the bets. The bet matching is simulated to mimic real life as closely as possible. My simulator outputs a results file that I load into Excel to do charts and other stuff.

I've tested loads of strategies but have only managed to get one that is profitable so far - I'm hoping this one will be my second. The beauty is that once the strategy is coded, it's a one-line code change to make it use the real API .. and the 'bot' is born :)
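
For illustration, a minimal Python sketch of that kind of separation (the names and structure here are invented for the example, not boony's actual code): the strategy only ever talks to a data-source object, so swapping the replay source for the live API is the one-line change mentioned above.

import csv
from dataclasses import dataclass
from typing import Iterator, Protocol


@dataclass
class Tick:
    """One logged market snapshot (illustrative fields only)."""
    market_id: str
    seconds_to_off: float
    best_back: float
    best_lay: float


class DataSource(Protocol):
    """Anything that can serve market snapshots: a live API wrapper or a replay."""
    def ticks(self, market_id: str) -> Iterator[Tick]: ...


class Simulator:
    """Replays ticks previously written by a logger (a CSV file here, for simplicity)."""
    def __init__(self, log_file: str) -> None:
        self.log_file = log_file

    def ticks(self, market_id: str) -> Iterator[Tick]:
        with open(self.log_file, newline="") as f:
            for row in csv.DictReader(f):
                if row["market_id"] == market_id:
                    yield Tick(row["market_id"], float(row["seconds_to_off"]),
                               float(row["best_back"]), float(row["best_lay"]))


class Strategy:
    """Receives the data one row at a time and decides what (simulated) bets to place."""
    def on_tick(self, tick: Tick) -> None:
        if tick.seconds_to_off < 60 and tick.best_back > 2.0:
            print(f"{tick.market_id}: would back at {tick.best_back}")


def run(source: DataSource, strategy: Strategy, market_id: str) -> None:
    for tick in source.ticks(market_id):
        strategy.on_tick(tick)


# Backtest: feed the strategy from the replay. Going live would mean passing a
# live-API data source here instead - the strategy itself doesn't change.
run(Simulator("logged_ticks.csv"), Strategy(), "1.23456789")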
Dublin_Flyer
Posts: 688
Joined: Sat Feb 11, 2012 10:39 am

Have you re-run the orange line including the bad races, to see whether the recent flat line was an arbitrary/seasonal fvck up? This month and next month are notorious for weird results with the changeover from flat to NH, and likewise in April/May from NH to flat.

5500 races is only about 3 or 3.5 months because of summer racing quantities, so it could be worth running it a good while longer to see if the present streak is a seasonal change/aberration. The last 250-or-so-race downhill streak only equates to about a 7 or 8 day period when there are 35-45 races daily - everyone has a bad week. Prior to the last week, its growth was significant for the number of races involved.

I'm of the view that backfitting your system, if you have a logical reason to do so, is OK; it's when you start superbackfitting that it screws up.
CallumPerry
Posts: 575
Joined: Wed Apr 19, 2017 5:12 pm
Location: Wolverhampton

That is seriously impressive stuff, boony... not going to lie, some people's knowledge and skills on this forum frighten me and my basic little robots. It's like I'm designing a really good paper airplane that is being thrown into a hurricane while other traders have made military-standard fighter jets with their own hands.

Hope all of your work pays off for you!
foxwood
Posts: 390
Joined: Mon Jul 23, 2012 2:54 pm

Dublin_Flyer wrote:
Sun Sep 16, 2018 8:37 pm
Have you re-run the orange line including the bad races, to see whether the recent flat line was an arbitrary/seasonal fvck up? This month and next month are notorious for weird results with the changeover from flat to NH, and likewise in April/May from NH to flat.

5500 races is only about 3 or 3.5 months because of summer racing quantities, so it could be worth running it a good while longer to see if the present streak is a seasonal change/aberration. The last 250-or-so-race downhill streak only equates to about a 7 or 8 day period when there are 35-45 races daily - everyone has a bad week. Prior to the last week, its growth was significant for the number of races involved.

I'm of the view that backfitting your system, if you have a logical reason to do so, is OK; it's when you start superbackfitting that it screws up.
+1 for seasonal variations - and variations within a season, i.e. start / middle / end are all a bit different as the market works out what this season's runners are like

+1 backfitting ok - all prediction is based on historical lessons - unless you are a seer :D
foxwood
Posts: 390
Joined: Mon Jul 23, 2012 2:54 pm

boony wrote:
Sun Sep 16, 2018 6:35 pm
So, I've processed another couple of thousand races, and my analysis brings me back to my original concern: is it a legitimate process to filter out courses?

It seems to me that if the filters I come up with have no relevance to future races, then they're pointless and this whole process of back-testing and applying filters is flawed.
The courses are different - they run different race types and classes with different ranges of typical prices, so a strategy that suits top-class races may not work at cold and wet Wolverhampton in winter. Whether course is relevant all depends on how you are slicing and dicing.

Try deriving a good-looking set of parameters from 2500 races. Then apply those parameters to the remaining 2500 to see what they "would have done". You are then seeing the outcome of your back-testing applied to "future" races. Be aware you may be mixing jumps/flat though, as Dublin pointed out, but the importance of that depends on what your strategy is - it may be unimportant or could be vital.
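
A minimal sketch of that split in Python/pandas, assuming the back-test results are one row per race in chronological order (the file and column names are made up for the example):

import pandas as pd

# Assumed columns: 'course' and 'profit' - one row per race, chronological order.
races = pd.read_csv("race_results.csv")
train, test = races.iloc[:2500], races.iloc[2500:]

# Choose the "bad" courses using only the first 2500 races...
bad_courses = train.groupby("course")["profit"].sum().loc[lambda s: s < 0].index

# ...then see what excluding them "would have done" on the unseen races.
filtered = test[~test["course"].isin(bad_courses)]
print("Unfiltered test P/L:", round(test["profit"].sum(), 2))
print("Filtered test P/L:  ", round(filtered["profit"].sum(), 2))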
spreadbetting
Posts: 3140
Joined: Sun Jan 31, 2010 8:06 pm

CallumPerry wrote:
Sun Sep 16, 2018 9:13 pm
That is seriously impressive stuff, boony... not going to lie, some people's knowledge and skills on this forum frighten me and my basic little robots. It's like I'm designing a really good paper airplane that is being thrown into a hurricane while other traders have made military-standard fighter jets with their own hands.

Hope all of your work pays off for you!
I used to feel the same, but I've been using the same simple techniques and spreadsheets I made years ago and they still work fine. No offence to the data gatherers, but they do all seem to assume their skill sets/backgrounds are going to crack the markets for them rather than thinking about how and why markets move the way they do.

Your little paper airplane may turn out to be a lot more profitable than some complicated lead balloon so don't assume the war is lost.
boony
Posts: 6
Joined: Mon Nov 07, 2016 8:53 pm

Callum, thanks for your nice comment, but I agree with Spreadbetting... Coding is what I do for a living, so it felt natural for me to approach it this way. How long it's taking me to get results compared to others makes me wonder if I haven't taken the wrong path. :)

Hope your bots continue to work
LinusP
Posts: 1871
Joined: Mon Jul 02, 2012 10:45 pm

boony wrote:
Mon Sep 17, 2018 4:54 pm
Callum, thanks for your nice comment, but I agree with Spreadbetting... Coding is what I do for a living, so it felt natural for me to approach it this way. How long it's taking me to get results compared to others makes me wonder if I haven't taken the wrong path. :)

Hope your bots continue to work
I found significant speed boosts in both runtime and development by combining live trading with my backtester. I now store bets in my live framework and calculate matching and profit/loss locally. When live, this data gets ignored but is available to paper trade or backtest with the flick of a switch. I use streaming, so when backtesting I just pass the framework the raw recorded data, either from Betfair or from what I have recorded myself, and as far as the framework is concerned it makes no difference.

With the above being single-threaded, I then either spin up multiple instances locally or, when doing a few months plus, use a mixture of AWS spot instances and AWS Fargate. With the latter I am looking at processing 12 months of in-play racing data in just over 20 minutes.

I am not sure how you are set up, but I went down the route of a separate program that acted like a local API and it ended up being a complete nightmare due to speed and development time.
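
As a rough illustration of that idea (entirely invented names, and a far simpler matching rule than a real queue-aware one): every bet is recorded and settled locally, and the same object is used whether the result is just kept for analysis while live or used as the paper/backtest outcome.

from dataclasses import dataclass, field


@dataclass
class Bet:
    price: float
    size: float
    side: str            # "BACK" or "LAY"
    matched: bool = False


@dataclass
class LocalBetStore:
    """Records every bet and settles it locally; informational when live, the result when simulating."""
    live: bool = False
    bets: list = field(default_factory=list)

    def place(self, price: float, size: float, side: str) -> Bet:
        bet = Bet(price, size, side)
        self.bets.append(bet)
        return bet

    def on_price(self, best_back: float, best_lay: float) -> None:
        # Crude rule: a back bet matches once the best back reaches its price,
        # a lay bet once the best lay falls to its price.
        for bet in self.bets:
            if bet.side == "BACK" and best_back >= bet.price:
                bet.matched = True
            elif bet.side == "LAY" and best_lay <= bet.price:
                bet.matched = True

    def profit(self, selection_won: bool) -> float:
        pnl = 0.0
        for bet in self.bets:
            if not bet.matched:
                continue
            if bet.side == "BACK":
                pnl += bet.size * (bet.price - 1) if selection_won else -bet.size
            else:
                pnl += -bet.size * (bet.price - 1) if selection_won else bet.size
        return pnl


# Paper-trading / backtest use; a live run would record the same data alongside real bets.
store = LocalBetStore(live=False)
store.place(price=3.5, size=2.0, side="BACK")
store.on_price(best_back=3.55, best_lay=3.6)
print(store.profit(selection_won=True))   # 5.0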
ShaunWhite
Posts: 9731
Joined: Sat Sep 03, 2016 3:42 am

CallumPerry wrote:
Sun Sep 16, 2018 7:07 pm
Say I use a one-second logger and record all of the key information in the spreadsheet from 20 minutes out until 00:00:00 - that's 1,200 rows of data per market. If I record thousands like you lot have and try to get Excel to create charts and stuff, it would just freeze.
It might be worth having your master set(s) of data and then pulling some of that out into a subset for each type of testing. That can be a range of dates, race types, volumes, or just the prices at certain times, random markets, etc. - whatever. When you think you have something, run it on your large set (or just the relevant bits of it). Excel might need to be left overnight, but if you get freezes it might be worth getting some more memory. It shouldn't freeze, but it might look like it has.

So collect all you can, but work on as little as possible. E.g. bin the horse names, bin the race-type cell if you know what type they all are, bin the course name if you don't need it. But obviously not in your MstDb, just in the EnqDb.

I don't know what you know, but even a small amount of VBA (start by recording macros) will help a lot with moving data from one sheet to another.
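
Stepping outside Excel for a moment, the same master-set/working-subset idea is only a few lines in Python/pandas if the data ever outgrows the spreadsheet (the file and column names below are made up for the example):

import pandas as pd

# Master set: everything the logger collected.
master = pd.read_csv("master_log.csv", parse_dates=["date"])

# Working subset: only the rows and columns this particular test needs.
subset = master.loc[
    (master["date"] >= "2018-06-01") & (master["race_type"] == "Flat"),
    ["market_id", "seconds_to_off", "best_back", "best_lay"],
]
subset.to_csv("enquiry_subset.csv", index=False)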
CallumPerry
Posts: 575
Joined: Wed Apr 19, 2017 5:12 pm
Location: Wolverhampton

I trialled an example sample earlier (nothing too big) and the memory was just about coping. I think I'll have to use my brother's computer for analysing big samples in the future, which will be no problem. I'll most likely invest in a new system once I start making a decent amount of consistent profit. Interesting to know how others do it, as always!
foxwood
Posts: 390
Joined: Mon Jul 23, 2012 2:54 pm

CallumPerry wrote:
Sun Sep 16, 2018 7:07 pm
Say I use a one-second logger and record all of the key information in the spreadsheet from 20 minutes out until 00:00:00 - that's 1,200 rows of data per market. If I record thousands like you lot have and try to get Excel to create charts and stuff, it would just freeze.
Export all your data into a giant table in one of the free SQL databases out there - it's not too difficult to get to grips with for what you seem to be doing.

You can then use Excel pivot tables and pivot charts directly on the SQL data, and add extra computed columns either in the SQL itself or as a formula in an extra column of the pivot table.

It allows you to filter and pull out samples quickly. With a little bit of learning you can also add formulae and build summaries in SQL, which run miles quicker than Excel trying to crunch the data.

Try it with one of your spreadsheets as a learning exercise - get to grips with it and it will save you untold hours in future.
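
One of the free options is SQLite, which ships with Python, so a first learning exercise could look something like the sketch below (table and column names invented for the example); the summary runs in the database and only the small result ever reaches Excel or the screen.

import sqlite3
import pandas as pd

con = sqlite3.connect("racing.db")

# One-off: load the logged spreadsheet/CSV data into a single big table.
pd.read_csv("master_log.csv").to_sql("ticks", con, if_exists="replace", index=False)

# Summaries run in SQL, so only the small result comes back for charting.
summary = pd.read_sql_query(
    """
    SELECT course,
           COUNT(DISTINCT market_id) AS markets,
           AVG(best_back)            AS avg_best_back
    FROM ticks
    GROUP BY course
    ORDER BY markets DESC
    """,
    con,
)
print(summary)
con.close()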
CallumPerry
Posts: 575
Joined: Wed Apr 19, 2017 5:12 pm
Location: Wolverhampton

That is a fantastic bit of advice, thank you! I've just spent about an hour on it and already it seems so straightforward; I can tell it's going to be something I use a lot in the next phase of my trading once I've gathered some more info. Really happy with my progress over the past month - all down to some brilliant advice and discussion on this forum!
ruthlessimon
Posts: 2094
Joined: Wed Mar 23, 2016 3:54 pm

foxwood wrote:
Sun Sep 16, 2018 9:16 pm
+1 for seasonal variations - and variations within season ie start / middle / end are all a bit different as the market works out what this season's runners are like
.. or is it the cyclical volume causing certain strategies to degrade?

Lifted from Dallas's blog post (https://www.betangel.com/blog_wp/2017/0 ... he-tunnel/):

[Image from the linked blog post]

When viewed alongside the race types, it does look kinda correlated.

[Image: race types]