Hello
I am trying to build some data / models using the Poisson distribution and xG, at which I am a complete noob, and would value some input.
Currently working on the Premier League before the start of the season, and I wonder if there is a wrong or right answer to this question:
When working out total home/away goals, how are newly promoted teams factored into the calculations? If I work out the average goals over a 38-game season for the PL and then over a 46-game season for the Championship, will my calculations still be valid or will they be slightly skewed?
Also, should the teams relegated from the PL still be included in the 17/18 numbers, or should I swap them out completely for the newly promoted teams?
Any thoughts greatly appreciated.
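On the 38 vs 46 game point, one way to look at it: if everything is expressed as goals per match rather than season totals, the season length drops out of the arithmetic. A minimal Python sketch, where the function names (`per_match_rate`, `attack_strength`) and all the numbers are made up purely for illustration:

```python
def per_match_rate(total_goals: int, matches_played: int) -> float:
    """Average goals per match, so the length of the season cancels out."""
    if matches_played <= 0:
        raise ValueError("matches_played must be positive")
    return total_goals / matches_played


def attack_strength(team_goals: int, team_matches: int,
                    league_goals: int, league_matches: int) -> float:
    """Team scoring rate relative to its own league's average rate."""
    team_rate = per_match_rate(team_goals, team_matches)
    league_rate = per_match_rate(league_goals, league_matches)
    return team_rate / league_rate


# Hypothetical home figures, NOT real league data.
# A 38-game season has 19 home games per team (380 matches in total);
# a 46-game season has 23 home games per team (552 matches in total).
pl_team = attack_strength(38, 19, 570, 380)
ch_team = attack_strength(46, 23, 828, 552)

print(pl_team, ch_team)
```

Both made-up teams score 2.0 per home game in leagues averaging 1.5, so they get the same relative strength despite the different season lengths. What this does not settle is whether a rate earned in the Championship carries over to the PL; that is a separate question from the number of games.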
Premier League Poisson Distribution - Help Please
- sionascaig
- Posts: 1088
- Joined: Fri Nov 20, 2015 9:38 am
Why don't you look at the previous season and build your model against that, then see if it delivers what you expect? Promoted / relegated teams can be included / excluded under different models, and you can then look at the difference in results / variance between the models.
I suspect you will just not have enough data over one season to get a robust model and in addition there will no doubt be new characteristics of the coming season which will significantly impact expectations from the model.
Are you sure you want to use a Poisson distribution: the events (I'm assuming goals) need to occur at a known constant rate and be independent of time since last event. Not sure you could argue that is true for footy?
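One rough way to sanity-check the Poisson assumption against your own data: a Poisson variable has variance equal to its mean, so the ratio of the two should sit near 1. The sketch below is Python, with hypothetical helper names (`poisson_pmf`, `dispersion_index`) and a made-up list of goal counts. It does not test independence over time, only this one consequence of the distribution.

```python
import math
from statistics import mean, pvariance


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson variable with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def dispersion_index(goal_counts: list[int]) -> float:
    """Variance-to-mean ratio; close to 1 if the counts look Poisson."""
    m = mean(goal_counts)
    if m == 0:
        raise ValueError("mean is zero, ratio undefined")
    return pvariance(goal_counts) / m


# Made-up goals-per-match sample, for illustration only.
sample = [0, 1, 1, 2, 0, 3, 1, 2, 1, 4]

rate = mean(sample)
print("estimated rate:", rate)
print("dispersion index:", dispersion_index(sample))
print("P(0 goals) under Poisson:", poisson_pmf(0, rate))
```

A ratio well above 1 (overdispersion) or well below 1 would be a hint that a plain Poisson is a poor fit for that data set.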
sionascaig wrote: ↑Fri Jul 13, 2018 2:47 pm
"Are you sure you want to use a Poisson distribution: the events (I'm assuming goals) need to occur at a known constant rate and be independent of time since last event. Not sure you could argue that is true for footy?"
It's not true for footy. I used it about 30 years ago to model and realised it came up short.
I've tinkered with models for this on and off for a while and can conclude that (in my case) they didn't produce a reliable hit rate. This could be down to varying conditions on the day, team changes, injuries during the match, etc. Bookies have analyst teams dedicated to this task and I'm certain they use a variety of models to approximate their correct score markets. You only have to compare those prices ahead of the game with the end result to see that they also consistently get it out of kilter.
that said, persistence is the key to innovation, so if you crack something that's > 50% correct, then you're onto something!!
I played with lots of parameters and would jump for joy when my 3-2 prediction on a game came out as 3-1. Other than that, I saw a lot of failed predictions and ended up relegating Poisson to the curiosity corner. There was a Pinnacle article a few years back that drilled down on the Poisson approach and offered up enhancements to the basics; that may be helpful if you can find it (I'll see if I can find it and post the link).
In short, I think more mileage can be had from looking at the OU markets and positioning yourself around the 70-minute mark with some sharp lay-based value metrics.
- Kafkaesque
- Posts: 886
- Joined: Fri Oct 06, 2017 10:20 am
smd100 wrote: ↑Fri Jul 13, 2018 2:26 pm
"Hello
I am trying to build some data / models using the Poisson distribution and xG, at which I am a complete noob, and would value some input.
Currently working on the Premier League before the start of the season, and I wonder if there is a wrong or right answer to this question:
When working out total home/away goals, how are newly promoted teams factored into the calculations? If I work out the average goals over a 38-game season for the PL and then over a 46-game season for the Championship, will my calculations still be valid or will they be slightly skewed?
Also, should the teams relegated from the PL still be included in the 17/18 numbers, or should I swap them out completely for the newly promoted teams?
Any thoughts greatly appreciated."
A number of thoughts:
- It might just be wires crossed on my end, but you need to start by getting your terminology straight to clarify what you're trying to do here. You mention xG and then talk about goals. xG is not based on goals, but on shots (and, in its most simplistic form, the likelihood of x number of goals in future matches).
- If you're looking to build on goals only, I'd say you have pretty much no chance of making a profitable model. How effective xG models actually are is very much up for debate, but imo xG has been shown beyond doubt to outperform goal models.
- If you're actually looking at xG, then the 17/18 numbers alone are not enough. It's way too small a sample size. You need the full set since Opta started recording, which is, if memory serves me correctly, about 8-9 seasons for the top 4-5 leagues, 5-6 seasons for another handful of leagues, and a few seasons for some medium-sized leagues.
- The PL/CH challenge doesn't just relate to the number of matches in a season, but also to the fact that promoted teams go from being a top team in their division to (in all likelihood) a bottom team. You'll struggle to use any kind of model for those teams, for either the first part of the season or the entirety of it, depending on who you ask.
- Poisson is generally considered to be ineffective for football for a few reasons. It can work, maybe, but it needs to be tweaked at the very least. Google the Dixon-Coles model and go down the deep, deep rabbit hole of people attempting to tweak and adjust Poisson for football, and then decide for yourself if you feel it's worthwhile.
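For a flavour of what the Dixon-Coles tweak looks like: its best-known piece is a correction factor applied to the four lowest scorelines (0-0, 0-1, 1-0, 1-1) on top of two independent Poissons. The Python sketch below shows only that factor. The rates and the `rho` value are made-up numbers, the function names are my own, and the full model (fitted attack/defence parameters, down-weighting of older matches) is not shown.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson variable with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def dc_tau(x: int, y: int, lam: float, mu: float, rho: float) -> float:
    """Low-score correction: x = home goals, y = away goals,
    lam / mu = home / away scoring rates, rho = dependence parameter."""
    if x == 0 and y == 0:
        return 1 - lam * mu * rho
    if x == 0 and y == 1:
        return 1 + lam * rho
    if x == 1 and y == 0:
        return 1 + mu * rho
    if x == 1 and y == 1:
        return 1 - rho
    return 1.0  # every other scoreline is left as plain Poisson


def dc_score_prob(x: int, y: int, lam: float, mu: float, rho: float) -> float:
    """Probability of the scoreline x-y with the correction applied."""
    return dc_tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)


# Hypothetical inputs for illustration only.
lam, mu, rho = 1.4, 1.1, -0.1
plain_00 = poisson_pmf(0, lam) * poisson_pmf(0, mu)
print("0-0 plain Poisson:", plain_00)
print("0-0 with correction:", dc_score_prob(0, 0, lam, mu, rho))
```

With a negative `rho` the 0-0 and 1-1 draws get a bit more probability and 1-0 / 0-1 a bit less, while the four adjustments cancel so the whole grid still sums to 1.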
- ShaunWhite
- Posts: 9731
- Joined: Sat Sep 03, 2016 3:42 am
The issue I see with correct score (CS) prediction is the low range of expected values (i.e. 0 - 4ish) and the disproportionate effect of variables (i.e. luck, decisions etc). Footy correct scores aren't many steps removed from binary outcomes, and a miss is as good as a mile.
If it was like basketball, with 50 'goals', then getting in the right zone or range of scores would be easier.
just mho.
Apart from the aforementioned random variables that crop up during the game, one thing that Poisson (imho) doesn't cater for is the dynamic of THOSE two teams playing each other. It's all very well looking at the respective home/away goal stats and apportioning them out. However, it just needs the two opposing teams to have a very similar game style (tiki-taka passing, midfield dominance etc) and you end up with a game where the two sides potentially cancel each other out, despite the model predicting a 4-1 home win etc...
As I said, you'll be standing on the shoulders of giants if you do crack it to anywhere near a personally acceptable level.
- Kafkaesque
- Posts: 886
- Joined: Fri Oct 06, 2017 10:20 am
ShaunWhite wrote: ↑Fri Jul 13, 2018 3:41 pm
"The issue I see with cs prediction is the low range of expected values (ie 0 - 4ish) and the disproportionate effect of variables (ie luck, decisions etc). Footy cs scores aren't many steps removed from binary outcomes and a miss is as good as a mile.
If it was like basketball with a 50 'goals' then getting in the right zone, or range of scores would be easier.
just mho."
Doesn't have to be CS though. It can just as easily be used for match odds, Asian handicaps, and over/under. Apart from that, your point is solid and is the reason Poisson doesn't work straight up. It cannot account for the good old football cliche of "goals change football matches". The dynamics of how much teams chase goals, or instead sit back and hold what they have, change too dramatically in certain types of matches and with the timing of certain goals.
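To show what using the same numbers for match odds and over/under means in practice, here is a minimal Python sketch that builds a grid of scoreline probabilities from two independent Poissons and sums the relevant cells. The function name `market_probs`, the 10-goal cap and the example rates are my own assumptions, and the independence between the two teams' goals is exactly the simplification being criticised in this thread.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson variable with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def market_probs(home_rate: float, away_rate: float,
                 max_goals: int = 10, line: float = 2.5) -> dict[str, float]:
    """Match odds and over/under probabilities from a scoreline grid.

    Assumes independent Poisson goal counts for each side and ignores
    scorelines above max_goals (the grid is renormalised to sum to 1).
    """
    home = draw = away = over = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
            if h + a > line:
                over += p
    total = home + draw + away
    over /= total
    return {
        "home": home / total,
        "draw": draw / total,
        "away": away / total,
        "over": over,
        "under": 1.0 - over,
    }


# Hypothetical expected-goal rates, for illustration only.
probs = market_probs(1.5, 1.1)
for name, p in probs.items():
    print(f"{name}: {p:.3f}  (fair decimal odds {1 / p:.2f})")
```

Because whole regions of the grid are summed, being a goal out on the exact scoreline matters far less here than it does in the CS market.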