Hi all,
I started recording horse racing data a few weeks ago using Switesh's brilliant spreadsheet. I'm now looking to go a step further and move out of Excel and into a database. I'm going to start with MS Access to help improve my knowledge of database design before moving on to something more sophisticated when the data volume ramps up.
The issue I have is that I have no real experience with MS Access or databases in general, so I'm having to learn as I go (which is the fun part anyway). But I'm getting a little bit confused about the best way to structure the database to cover all of the data that can be captured, especially the pre-off data at each 1-second interval from 5:30 out to post time (i.e. back prices, lay prices, LTP, volume, WOM etc., as designed by Switesh himself). How do people generally structure the tables to capture this level of data? As someone who is experienced in Excel but not Access, I'm struggling with the adaptation!
I've split some of the data into separate tables, i.e. courses, race types, distances etc., but it's the detail that's throwing me... I'd really appreciate some advice and guidance from those who are far more experienced at this stuff than I am!
Thanks in advance
Tom
Database Structure for Horse Racing
This is how I structure mine, based on the free Betfair data.
Basically if you can achieve third normal form (google it) then you're pretty much there in structure. It's normalised to a certain level, which is where you don't want to have masses of duplicate information.
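To make "third normal form" concrete, here's a minimal sketch of the idea in Python with SQLite (standing in for Access purely so the example is runnable; the table and column names are just illustrative, not from anyone's actual database). Course details live in one place and races reference them by key, instead of repeating the course name on every race row:

```python
import sqlite3

# Illustrative 3NF sketch: course details are stored once in Courses,
# and Races references them by CourseID rather than repeating them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Courses (
    CourseID   INTEGER PRIMARY KEY,
    CourseName TEXT NOT NULL,
    Country    TEXT NOT NULL
);
CREATE TABLE Races (
    RaceID   INTEGER PRIMARY KEY,
    CourseID INTEGER NOT NULL REFERENCES Courses(CourseID),
    RaceDate TEXT NOT NULL,
    Distance TEXT
);
""")
conn.execute("INSERT INTO Courses VALUES (1, 'Ascot', 'UK')")
conn.execute("INSERT INTO Races VALUES (100, 1, '2021-03-14', '1m')")

# A join recovers the flat, spreadsheet-style view when you need it.
row = conn.execute("""
    SELECT r.RaceID, c.CourseName
    FROM Races r JOIN Courses c ON c.CourseID = r.CourseID
""").fetchone()
print(row)  # (100, 'Ascot')
```

The duplication saving is the point: if a course name changes, you update one row in Courses rather than thousands of race rows.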
So looking into normalisation has certainly helped my understanding of how to create separate tables. I'd appreciate a bit of help in understanding how that relates to the more detailed information that is captured on a second-by-second basis, i.e. I'm currently capturing the following information at one-second intervals for 5 mins pre-off. Is the most efficient way of capturing this to have separate tables for each item below, and then separate fields in each table for each second? Or have I overcomplicated it?
Lay Price 3
Lay Price 2
Lay Price 1
LTP
LTP IP
Back Price 1
Back Price 2
Back Price 3
Lay Money 3
Lay Money 2
Lay Money 1
Rnr Volume
Rnr Volume %
Back Money 1
Back Money 2
Back Money 3
WOM Lay
WOM Back
WOM Lay %
WOM Back %
TLindeth wrote: ↑Sun Mar 14, 2021 2:19 pm
So looking into normalisation has certainly helped my understanding of how to create separate tables. I'd appreciate a bit of help in understanding how that relates to the more detailed information that is captured on a second-by-second basis, i.e. I'm currently capturing the fields listed above at one-second intervals for 5 mins pre-off. Is the most efficient way of capturing this to have separate tables for each item, and then separate fields in each table for each second? Or have I overcomplicated it?

Great! Yes, I think you're overcomplicating things slightly. In this case the best thing to do is probably to create separate fields (in the same table) for each of the items you mentioned, and then insert a separate row for each second.
Also note re normalisation: it can be helpful if you're trying to minimise duplication, but has the downside that you'll have to do a lot more joins when you query. You might be better off using a flatter data structure, maybe something like a star schema.
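Harry's "one row per second" suggestion can be sketched like this, again using SQLite so the example runs (Access SQL differs slightly); the table and column names just mirror the fields listed above and are purely illustrative, as is the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One wide snapshot table: each captured metric is a column, and each
# (race, runner, second) combination gets its own row.
conn.execute("""
CREATE TABLE PriceSnapshots (
    RaceID     INTEGER NOT NULL,
    RunnerID   INTEGER NOT NULL,
    SecsToOff  INTEGER NOT NULL,   -- e.g. 330 counting down to 0
    BackPrice1 REAL, BackPrice2 REAL, BackPrice3 REAL,
    LayPrice1  REAL, LayPrice2  REAL, LayPrice3  REAL,
    BackMoney1 REAL, BackMoney2 REAL, BackMoney3 REAL,
    LayMoney1  REAL, LayMoney2  REAL, LayMoney3  REAL,
    LTP        REAL,
    RnrVolume  REAL,
    WomBackPct REAL, WomLayPct REAL,
    PRIMARY KEY (RaceID, RunnerID, SecsToOff)
)""")

# Two consecutive seconds of made-up data for one runner:
conn.executemany(
    "INSERT INTO PriceSnapshots "
    "(RaceID, RunnerID, SecsToOff, BackPrice1, LayPrice1, LTP) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [(100, 7, 330, 3.55, 3.60, 3.55),
     (100, 7, 329, 3.50, 3.55, 3.55)],
)
n = conn.execute("SELECT COUNT(*) FROM PriceSnapshots").fetchone()[0]
print(n)  # 2
```

The composite primary key stops you accidentally inserting the same second twice, and querying "price 30 seconds out across all races" becomes a simple WHERE clause rather than hunting across per-second fields.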
- ShaunWhite
- Posts: 9731
- Joined: Sat Sep 03, 2016 3:42 am
Small world indeed, I worked in station approach in Cheam for ages in 80s when Clarke & Tilley were there (who eventually became part of DST). I lived in South Croydon for 20yrs (22-42) so know your manor pretty well.
harry wrote: ↑Sun Mar 14, 2021 3:53 pm
Great! Yes, I think you're overcomplicating things slightly. In this case the best thing to do is probably to create separate fields (in the same table) for each of the items you mentioned, and then insert a separate row for each second. Also note re normalisation: it can be helpful if you're trying to minimise duplication, but has the downside that you'll have to do a lot more joins when you query. You might be better off using a flatter data structure, maybe something like a star schema.

Thanks Harry - this is an interesting concept (I've certainly underestimated this database design lark...). So in essence I could create a fact table which has something along the lines of:
Date
Time
RaceID
RunnerID
CourseID
RaceTypeID
etc., etc.
This central fact table would then link out to other tables holding the greater detail, i.e. a table for the pre-off data (structured as you mentioned), names of runners, distances, courses, etc.
Have I got that right? I'm sure I'll eventually get my head around this - you'll have to excuse my noddy questions until then, but thanks for your help!
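The fact table plan described above looks like a classic star schema. A minimal runnable sketch in SQLite (again standing in for Access; the names Facts, Courses, Runners and the sample values are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold the descriptive detail once each...
CREATE TABLE Courses (CourseID INTEGER PRIMARY KEY, CourseName TEXT);
CREATE TABLE Runners (RunnerID INTEGER PRIMARY KEY, RunnerName TEXT);

-- ...and the central fact table links everything by key,
-- one row per runner per second.
CREATE TABLE Facts (
    RaceDate  TEXT,
    SecsToOff INTEGER,
    RaceID    INTEGER,
    RunnerID  INTEGER REFERENCES Runners(RunnerID),
    CourseID  INTEGER REFERENCES Courses(CourseID),
    LTP       REAL
);

INSERT INTO Courses VALUES (1, 'Kempton');
INSERT INTO Runners VALUES (7, 'Dobbin');
INSERT INTO Facts  VALUES ('2021-03-14', 300, 100, 7, 1, 4.2);
""")

# One join per dimension recovers the readable view.
row = conn.execute("""
    SELECT c.CourseName, r.RunnerName, f.LTP
    FROM Facts f
    JOIN Courses c ON c.CourseID = f.CourseID
    JOIN Runners r ON r.RunnerID = f.RunnerID
""").fetchone()
print(row)  # ('Kempton', 'Dobbin', 4.2)
```

The appeal of the star shape is that every query is the fact table plus at most one join per dimension, which keeps things fast and simple even as the per-second data grows.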
ShaunWhite wrote: ↑Sun Mar 14, 2021 4:14 pm

Christ, it certainly is, Shaun! I'm just off Mulgrave Road, which is basically adjacent to that. I know South Croydon well - grew up playing football regularly in that area, and I'm still a regular in The Jolly Farmer just down the road in Purley too!
TLindeth wrote: ↑Sun Mar 14, 2021 4:20 pm

Haha, yeah, there's so much to it. I always find I end up spending a lot more time than I plan on the database side of things, and it takes me a few iterations to get it the way I want. The main thing is to get something reasonable that will work, start capturing the data, and then when you think of a better way to do it later you can migrate your old data into that. I think your plan sounds pretty good!
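Harry's "migrate your old data into the better design later" step is usually just a couple of INSERT ... SELECT statements. A small sketch under made-up names (OldSnapshots as the original flat capture, Courses/Snapshots as the redesign - none of these come from an actual database here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An "old" flat table captured before the redesign (made-up shape):
conn.executescript("""
CREATE TABLE OldSnapshots (Course TEXT, Runner TEXT, SecsToOff INTEGER, LTP REAL);
INSERT INTO OldSnapshots VALUES ('Ascot', 'Dobbin', 300, 4.2);

-- The redesigned tables:
CREATE TABLE Courses (CourseID INTEGER PRIMARY KEY, CourseName TEXT UNIQUE);
CREATE TABLE Snapshots (CourseID INTEGER, Runner TEXT, SecsToOff INTEGER, LTP REAL);
""")

# Step 1: populate the new dimension table from the distinct old values.
conn.execute("INSERT INTO Courses (CourseName) "
             "SELECT DISTINCT Course FROM OldSnapshots")

# Step 2: copy the detail rows across, resolving names to keys as we go.
conn.execute("""
INSERT INTO Snapshots (CourseID, Runner, SecsToOff, LTP)
SELECT c.CourseID, o.Runner, o.SecsToOff, o.LTP
FROM OldSnapshots o JOIN Courses c ON c.CourseName = o.Course
""")
row = conn.execute("SELECT * FROM Snapshots").fetchone()
print(row)  # (1, 'Dobbin', 300, 4.2)
```

So nothing captured in the meantime is wasted - the flat data maps onto the normalised schema whenever the new design settles down.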
Bloody Google. I've just spent the last 5 mins 'walking' down Mulgrave Rd re-living my old commute. I wasn't in the posh end down near Purley; I was up near the flyover (in The Waldrons). It was only estate agents that called it South Croydon, really.
harry wrote: ↑Sun Mar 14, 2021 5:09 pm
Thanks Harry - now I just need to put the plan into action. Maybe that will help my understanding of how this truly works...
ShaunWhite wrote: ↑Sun Mar 14, 2021 5:39 pm

Haha - Sutton has changed a fair bit since your commuting days. There are significant developments around the station now; it actually looks pretty impressive. Shame the high street hasn't really received the same attention. The same could be said for Croydon too!