Am I fooling myself?

The sport of kings.
boony
Posts: 6
Joined: Mon Nov 07, 2016 8:53 pm

ruthlessimon wrote:
Sat Sep 15, 2018 5:14 pm
boony wrote:
Sat Sep 15, 2018 2:54 am
I've been back-testing a strategy and it is showing a profit over the ~3300 races I've tested so far. However, the equity curve is "choppy" to say the least. If I filter out a particular set of courses it looks a lot better (see graph).
Is there data missing from the "everything" line?

Whenever I filter a variable (i.e. course), I always see a drop in frequency - so I'm interested in how you've got them to both match
Simon, there's no data missing. I have a row for each race in Excel, and one of the columns is the profit/loss for the race. I create a new column in Excel which is the 'everything' cumulative profit/loss. When I 'filter' I don't use the Excel filter; instead I have another column with a formula that drives the profit/loss for races at the filtered courses to zero. Then I can simply plot both columns on the same chart.
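For anyone who wants to reproduce the zero-out approach outside Excel, here is a minimal Python sketch. It assumes each race is a (course, profit/loss) pair in race order; the function name, course names and figures are made up for illustration and are not from the actual spreadsheet.

```python
from itertools import accumulate


def equity_curves(races, excluded_courses):
    """Return (everything, filtered) cumulative P/L lists of equal length.

    races: list of (course, profit_loss) tuples in race order.
    """
    all_pnl = [pnl for _, pnl in races]
    # Zero out, rather than drop, races at excluded courses so both
    # curves keep one point per race and share the same x-axis.
    filtered_pnl = [0.0 if course in excluded_courses else pnl
                    for course, pnl in races]
    return list(accumulate(all_pnl)), list(accumulate(filtered_pnl))


# Illustrative data only
races = [("Ascot", 2.0), ("Wolverhampton", -3.0),
         ("York", 1.5), ("Wolverhampton", -1.0)]
everything, filtered = equity_curves(races, {"Wolverhampton"})
```

Because excluded races contribute zero instead of being removed, the filtered curve simply goes flat over those races, which is why both lines have the same number of points on the chart.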

Thanks to all for the feedback - much appreciated.

I'm continuing to run the back-test on more races - I want to run against a full year. The problem is my back-testing software is painfully slow so the information I seek is going to take some time to obtain... and I'm impatient :)
ShaunWhite
Posts: 9731
Joined: Sat Sep 03, 2016 3:42 am

boony wrote:
Sun Sep 16, 2018 2:04 pm
The problem is my back-testing software is painfully slow so the information I seek is going to take some time to obtain... and I'm impatient :)
How long is a long time? I run my big backtests overnight. My 'full' test takes about 6 hours and is just about finishing when I get up again.
ruthlessimon
Posts: 2094
Joined: Wed Mar 23, 2016 3:54 pm

ShaunWhite wrote:
Sun Sep 16, 2018 5:36 pm
How long is a long time? I run my big backtests overnight. My 'full' test takes about 6 hours and is just about finishing when I get up again.
6hrs!?!

lol - I moan if it takes anything longer than a couple of minutes
boony
Posts: 6
Joined: Mon Nov 07, 2016 8:53 pm

ruthlessimon wrote:
Sun Sep 16, 2018 5:45 pm
ShaunWhite wrote:
Sun Sep 16, 2018 5:36 pm
How long is a long time? I run my big backtests overnight. My 'full' test takes about 6 hours and is just about finishing when I get up again.
6hrs!?!

lol - I moan if it takes anything longer than a couple of minutes
Lol

I kicked my back-test off at 01:30 on Friday. The 3300 races was a snapshot roughly 24 hours later.

Please tell me how you're doing it so quickly!!

I suspect it's the amount of data I'm processing that is the difference. I log full market depth from 30 mins out until market is suspended, including all the in-play data. My back-test then involves replaying all that data and simulating the bet placement and matching.
ruthlessimon
Posts: 2094
Joined: Wed Mar 23, 2016 3:54 pm

boony wrote:
Sun Sep 16, 2018 5:58 pm
Please tell me how you're doing it so quickly!!

I suspect it's the amount of data I'm processing that is the difference. I log full market depth from 30 mins out until market is suspended, including all the in-play data.
30mins out + inplay! Blimey yah that'd be a lot of data ;)

If I'm looking at a "specific group" - I will initially refine my full dataset (i.e. only Hcaps) - straight away that reduces the workload on Excel

But generally, I'll be working on 3mth samples, with only the top 4 runners - this usually (max) equals between 10,000 - 20,000 rows, 600 columns (5mins price, 5mins vol)

For me personally, the majority of speed issues seem to be related to inefficient formulas
boony
Posts: 6
Joined: Mon Nov 07, 2016 8:53 pm

So, I've processed another couple of thousand races and my analysis brings me back to my original concern as to whether it's a legit process to filter out courses.
[Attached chart: Untitled2.png]
Blue line is betting everything. Orange line is betting everything except races at the courses I decided were poor after looking at results after ~3300 races. Grey line is a new set of excluded courses which I determined looking at the results after ~5300 races.

After the first 3300 races, you can see that orange line moved about but ultimately flat-lined over the next 2000 races.

Obviously the grey line looks a lot better, but what will happen over the next 2000 races? Will it flat-line again and force me to come up with another set of course exclusions to make it profitable?

It seems to me that if the filters I come up with have no relevance on future races then they're pointless and this whole process of back-testing and applying filters is flawed.
CallumPerry
Posts: 575
Joined: Wed Apr 19, 2017 5:12 pm
Location: Wolverhampton

I personally think at this stage you just go in with small stakes and try it out. Otherwise you run the risk of over-fitting. That looks like as good an indicator as anything I can imagine to give it a go with a couple of quid you don't mind losing. Track everything when you go live and then after a few hundred markets you should see whether it looks promising; after a few thousand, if it still works, you've cracked it!

May I ask, what programmes do you lot use to back-test? I'm just using Excel. Say I use a one-second logger and record all of the key information in the spreadsheet from 20 minutes out until 00:00:00, that's 1,200 rows of data per market. If I recorded thousands like you lot have and tried to get Excel to create charts and stuff it would just freeze. I've seen a video before, I think it was on Nigelk's YouTube channel, where there was a loading bar. Is this something you lot use or is it a completely different programme? Point me in the right direction please!
boony
Posts: 6
Joined: Mon Nov 07, 2016 8:53 pm

CallumPerry wrote:
Sun Sep 16, 2018 7:07 pm
I personally think at this stage you just go in with small stakes and try it out. Otherwise you run the risk of over-fitting. That looks like as good an indicator as anything I can imagine to give it a go with a couple of quid you don't mind losing. Track everything when you go live and then after a few hundred markets you should see whether it looks promising; after a few thousand, if it still works, you've cracked it!

May I ask, what programmes do you lot use to back-test? I'm just using Excel. Say I use a one-second logger and record all of the key information in the spreadsheet from 20 minutes out until 00:00:00, that's 1,200 rows of data per market. If I recorded thousands like you lot have and tried to get Excel to create charts and stuff it would just freeze. I've seen a video before, I think it was on Nigelk's YouTube channel, where there was a loading bar. Is this something you lot use or is it a completely different programme? Point me in the right direction please!
I've written a suite of programs.

First one is the Logger; every half second it logs pretty much everything it gets from the Betfair API starting at 30 mins pre-scheduled off, until the race is suspended after going in-play. So the amount of data is vast - hence slow to process.

Then I have the Simulator that mimics the Betfair API and can replay the data logged by the Logger.

Finally a Strategy-Runner calls the Simulator as though it were Betfair to get data, which allows me to test strategies. The strategy receives the data one row at a time, does the processing and then places the bets. The bet matching is simulated to mimic real life as closely as possible. My simulator outputs a results file that I load into Excel to do charts and other stuff.

I've tested loads of strategies but have only managed to get one that is profitable so far - I'm hoping this one will be my second. The beauty is that once the strategy is coded, it's a one-line code change to make it use the real API .. and the 'bot' is born :)
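To give a rough idea of the replay loop, here is a heavily simplified Python sketch. It is not the actual code: the Tick schema, the function names and the matching rule are all assumptions made for illustration. In particular it treats a back bet as matched only if the requested price is available on the tick where it is placed, and it ignores queue position, partial matching and commission.

```python
from dataclasses import dataclass


@dataclass
class Tick:
    """One logged row for one runner (hypothetical schema)."""
    time: float        # seconds since logging started
    runner: str
    back_price: float  # best available back price at this tick


def replay(ticks, strategy):
    """Feed logged ticks to the strategy one row at a time and record bets."""
    bets = []
    for tick in ticks:
        order = strategy(tick)  # strategy returns (price, stake) or None
        if order is None:
            continue
        price, stake = order
        # Assumed matching rule: matched only if the price is on offer now.
        bets.append({"runner": tick.runner, "price": price, "stake": stake,
                     "matched": tick.back_price >= price})
    return bets


def settle(bets, winner):
    """Profit/loss of matched back bets, ignoring commission."""
    pnl = 0.0
    for bet in bets:
        if not bet["matched"]:
            continue
        if bet["runner"] == winner:
            pnl += bet["stake"] * (bet["price"] - 1.0)
        else:
            pnl -= bet["stake"]
    return pnl


def back_a_at_3(tick):
    """Toy strategy: ask for 3.0 on runner A with a stake of 2."""
    return (3.0, 2.0) if tick.runner == "A" else None


ticks = [Tick(0.0, "A", 3.0), Tick(0.5, "A", 2.8), Tick(1.0, "B", 5.0)]
bets = replay(ticks, back_a_at_3)
profit = settle(bets, winner="A")
```

The point of the design is that the strategy only ever sees one tick and returns an order, so the same function could in principle be driven by live data instead of a recorded list.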
Dublin_Flyer
Posts: 688
Joined: Sat Feb 11, 2012 10:39 am

Have you re-run and included the orange line including the bad races to see whether the recent flat line was an arbitrary/seasonal fvck up? This month and next month are notorious for weird results with the changeover from flat to NH and likewise in April/May from NH to flat.

5500 races is only about 3 or 3.5 months because of summer racing quantities, so it could be worth running it a good while longer to see if the present streak is a seasonal change/aberration. The last 250 or so downhill streak only equates to about a 7 or 8 day period when there's 35-45 races daily. Everyone has a bad week; prior to the last week its growth was significant for the number of races involved.

I'm of the view that backfitting your system if you have a logical reason to do so is ok, it's when you start superbackfitting that it screws up.
CallumPerry
Posts: 575
Joined: Wed Apr 19, 2017 5:12 pm
Location: Wolverhampton

That is seriously impressive stuff boony... not going to lie, some people's knowledge and skills on this forum frighten me and my basic little robots. It's like I'm designing a really good paper airplane that is being thrown into a hurricane when other traders have made military standard fighter jets with their own hands.

Hope all of your work pays off for you!
foxwood
Posts: 390
Joined: Mon Jul 23, 2012 2:54 pm

Dublin_Flyer wrote:
Sun Sep 16, 2018 8:37 pm
Have you re-run and included the orange line including the bad races to see whether the recent flat line was an arbitrary/seasonal fvck up? This month and next month are notorious for weird results with the changeover from flat to NH and likewise in April/May from NH to flat.

5500 races is only about 3 or 3.5 months because of summer racing quantities, so it could be worth running it a good while longer to see if the present streak is a seasonal change/aberration. The last 250 or so downhill streak only equates to about a 7 or 8 day period when there's 35-45 races daily. Everyone has a bad week; prior to the last week its growth was significant for the number of races involved.

I'm of the view that backfitting your system if you have a logical reason to do so is ok, it's when you start superbackfitting that it screws up.
+1 for seasonal variations - and variations within season ie start / middle / end are all a bit different as the market works out what this season's runners are like

+1 backfitting ok - all prediction is based on historical lessons - unless you are a seer :D
foxwood
Posts: 390
Joined: Mon Jul 23, 2012 2:54 pm

boony wrote:
Sun Sep 16, 2018 6:35 pm
So, I've processed another couple of thousand races and my analysis brings me back to my original concern as to whether it's a legit process to filter out courses.

It seems to me that if the filters I come up with have no relevance on future races then they're pointless and this whole process of back-testing and applying filters is flawed.
The courses are different - they run different race types and classes with different ranges of typical prices so a strategy that suits top class races may not work at cold and wet Wolverhampton in Winter. All depends on how you are slicing and dicing if course is relevant.

Try deriving a good looking set of parameters from 2500 races. Then apply those parameters to the remaining 2500 to see what they "would have done". You are then seeing the outcome of your back-testing applied to "future" races. Be aware you may be mixing jumps/flat though as Dublin pointed out but the importance of that depends on what your strategy is - may be unimportant or could be vital.
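A minimal Python sketch of that split, assuming the races are (course, profit/loss) pairs in date order. The rule used here to pick exclusions (drop any course that lost money in the first chunk) is only a stand-in for whatever criteria you actually use, and all names and numbers are illustrative.

```python
from collections import defaultdict


def derive_exclusions(races):
    """Courses with a negative total P/L in the given races (assumed rule)."""
    totals = defaultdict(float)
    for course, pnl in races:
        totals[course] += pnl
    return {course for course, total in totals.items() if total < 0}


def total_pnl(races, excluded=frozenset()):
    """Sum of P/L over races not run at an excluded course."""
    return sum(pnl for course, pnl in races if course not in excluded)


def out_of_sample(races, split):
    """Fit exclusions on races[:split], then score them on races[split:]."""
    train, test = races[:split], races[split:]
    excluded = derive_exclusions(train)
    return {
        "excluded": excluded,
        "test_unfiltered": total_pnl(test),
        "test_filtered": total_pnl(test, excluded),
    }


# Illustrative data only
races = [("A", 1.0), ("B", -2.0), ("A", 1.0), ("B", 3.0)]
result = out_of_sample(races, split=2)
```

In this toy data the exclusion derived from the first half throws away the profitable race in the second half, which is exactly the kind of thing the split is meant to expose. The filter only earns its place if "test_filtered" beats "test_unfiltered" on races it never saw.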
spreadbetting
Posts: 3140
Joined: Sun Jan 31, 2010 8:06 pm

CallumPerry wrote:
Sun Sep 16, 2018 9:13 pm
That is seriously impressive stuff boony... not going to lie, some people's knowledge and skills on this forum frighten me and my basic little robots. It's like I'm designing a really good paper airplane that is being thrown into a hurricane when other traders have made military standard fighter jets with their own hands.

Hope all of your work pays off for you!
I used to feel the same but I've been using the same simple techniques and spreadsheets I made years ago and they still work fine. No offence to the data gatherers but they do all seem to assume their skillset/backgrounds are going to crack the markets for them rather than thinking about how and why markets move the way they do.

Your little paper airplane may turn out to be a lot more profitable than some complicated lead balloon so don't assume the war is lost.
boony
Posts: 6
Joined: Mon Nov 07, 2016 8:53 pm

Callum, thanks for your nice comment but I agree with Spreadbetting... Coding is what I do for a living so it felt natural for me to approach it this way. How long it's taking me to get the results compared to others makes me wonder if I've taken the wrong path. :)

Hope your bots continue to work
LinusP
Posts: 1871
Joined: Mon Jul 02, 2012 10:45 pm

boony wrote:
Mon Sep 17, 2018 4:54 pm
Callum thanks for your nice comment but I agree with Spreadbetting... Coding is what I do for a living so it felt natural for me to approach it this way. Given how long it's taking me to get the results compared to others makes me wonder if I haven't taken the wrong path. :)

Hope your bots continue to work
I found significant speed boosts in both runtime and development by combining live trading with my backtester. I now store bets in my live framework and calculate matching and profit/loss locally. When live this data gets ignored but is available to paper trade or backtest with a flick of a switch. I use streaming so when backtesting I just pass it the raw recorded data either from Betfair or what I have recorded and as far as the framework is concerned it makes no difference.

With the above being single threaded I then either spin up multiple instances locally or when doing a few months plus I use a mixture of AWS spot instances and AWS Fargate. With the latter I am looking at processing 12 months of inplay racing data in just over 20 minutes.
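As a local-only illustration of fanning a single-threaded backtest out over many markets, here is a minimal Python sketch using the standard library's concurrent.futures. It is not my framework: run_one is a hypothetical function that backtests one market and returns its profit/loss, and the market IDs are made up. The AWS side is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def backtest_many(market_ids, run_one, executor):
    """Run run_one(market_id) for every market on the given executor.

    Each market is independent, so results are simply collected into a
    dict keyed by market ID. Executor.map preserves input order.
    """
    market_ids = list(market_ids)
    results = executor.map(run_one, market_ids)
    return dict(zip(market_ids, results))


def fake_backtest(market_id):
    """Stand-in for a real single-market backtest (hypothetical)."""
    return 0.5 if market_id.endswith("1") else -0.25


with ThreadPoolExecutor(max_workers=4) as pool:
    pnl_by_market = backtest_many(["1.101", "1.102", "1.111"],
                                  fake_backtest, pool)
total = sum(pnl_by_market.values())
```

A thread pool is used here only to keep the example short. For CPU-bound replay you would pass a ProcessPoolExecutor instead, in which case run_one has to be a top-level function so it can be pickled, and the call should sit under an `if __name__ == "__main__":` guard.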

I am not sure how you are set up, but I went down the route of a separate program that acted like a local API and it ended up being a complete nightmare due to speed and development time.