Wednesday, October 3, 2012

Believing the numbers: when do goals for and against stabilise?

At this point of the year it's important that we try and establish what numbers can be believed. In other words: how small is too small for a sample size? Consider, for example, that through six gameweeks last season:

  • Aston Villa ranked 10th in goals scored at home and 3rd on the road. They wound up being the 2nd worst attacking side in the league, managing to notch just one goal more than Stoke. 
  • Tottenham placed 19th in home defense, shipping an average of 2.5 goals per game. They finished the year as one of the league's best defenses, ranking 4th at home and 5th on the road.
  • QPR conceded just one goal in their first three away games. They went on to concede 40 more in the next 16, at an alarming rate of 2.5 a game, enough to rank behind everyone outside of lowly Blackburn.

We could go on and on. The point being, six games is not enough to write teams off or buy into short-lived success. When, then, is long enough? Using data from the past two seasons (ideally we'd use a much larger sample but I don't have it to hand and, honestly, I don't want to do the legwork to get it in the right form), we can plot a team's goals scored/conceded per game on a weekly cumulative basis against their final attacking/defensive record and then try and locate when the GPG rate becomes sufficiently predictive (I'm arbitrarily setting this at 75%). Let's look at the trends:


The first thing to note is that defensive goals per game seem to stabilise at a similar rate whether at home or away, reaching 80% correlation somewhere around gameweek 12. That said, even at the stage we're currently at this year, it seems we have a decent set of data to predict year end results. That suggests there is a pretty good chance that the likes of West Brom and West Ham are for real, while formerly great defenses like Liverpool and City might continue to struggle. Of course, as noted in the introduction, some of these trends will reverse as, if the correlation holds, they're only ~65% predictive, but even so, we're already at a point where one can't say that City are great and West Ham are terrible simply because "that's the way it is".

The second point to note is that for goals scored, the results at home seem to be much more reliable than those on the road. Indeed, we see close to 80% correlation after just seven gameweeks at home while away data is all over the place, only settling down into the second half of the year. Without looking deeper into the numbers I'm not really sure why this is, other than the obvious fact that it's hard to score away from home so your denominator is smaller when calculating GPG. Even one-goal swings in that scenario will therefore have a large impact on this analysis. As a side note, we also observe that the r-squared of the home scoring data stabilises at 70% as early as gameweek 10, which suggests that not only is this data predictive of the overall picture, but it also applies to a decent proportion of teams. This suggests that teams like Fulham, Southampton and West Brom could, and possibly should, continue to have success based on their performances to date.

So what does this mean? For the next week or so, not too much. I'm still not ready to give up on City's defense and I'm still not willing to believe that Fulham are an elite attacking side at the Cottage. We should, however, be looking out for the below (approximate!) dates, at which we can assess where teams stand and, with some certainty, where they will finish up:

  • Home goals scored: Gameweek 7
  • Away goals scored: Gameweek 21
  • Home goals conceded: Gameweek 13
  • Away goals conceded: Gameweek 12
Try and keep these dates in mind before you buy your third Everton midfielder/forward or build your defense around a pair of West Ham defenders.
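
For anyone who wants to reproduce the idea, here is a minimal sketch of the approach described above: build each team's running goals per game, correlate it across teams against the final goals per game, and report the first game at which the correlation reaches the threshold. The function names and the toy numbers are made up for illustration (they are not the two seasons of data used for the graph), and "first week at or above the threshold" is just one way to read "stabilised".

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (nan if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return float("nan")
    return cov / (spread_x * spread_y)


def cumulative_gpg(goals):
    """Running goals-per-game after each match in the list."""
    running, total = [], 0
    for games_played, scored in enumerate(goals, start=1):
        total += scored
        running.append(total / games_played)
    return running


def stabilisation_week(goals_by_team, threshold=0.75):
    """First game number at which running GPG correlates with final GPG
    across teams at or above the threshold, or None if it never does."""
    running = {team: cumulative_gpg(g) for team, g in goals_by_team.items()}
    teams = list(running)
    n_games = min(len(r) for r in running.values())
    final = [running[team][n_games - 1] for team in teams]
    for week in range(n_games):
        to_date = [running[team][week] for team in teams]
        if pearson(to_date, final) >= threshold:
            return week + 1
    return None


# Hypothetical toy data: goals scored per home game for three invented teams.
toy_goals = {
    "Team A": [0, 3, 3, 3],
    "Team B": [2, 1, 1, 1],
    "Team C": [1, 0, 0, 0],
}
print(stabilisation_week(toy_goals, threshold=0.75))
```

Note that the final record includes the early games, so the correlation is guaranteed to climb towards 100% by the last gameweek; comparing the running rate against the rest of the season only would be a stricter test.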

22 comments:

chemikills said...

Top notch analysis, but I'm not sure how relevant the results are, tough to assess.

If you calculate the average number of shots that each attacking team makes within the 18 yard box, you can then average those figures across the fixtures that have passed. For example, Man City:

Opponent   SOT   LIV   QPR   STK   ARS   FUL   Ave.
Shots      6.9   10.2  6.5   6.6   8.8   8.2   7.9

Then compare that with the actual average shots conceded within the 18 yard box over the first 6 fixtures (4.8).
You have a legit reason to believe the Man City defence is comparatively better than the average, since they have conceded 3.1 fewer shots in the box each game than the average opponent (this measure has excluded H/A because of the small sample size).

Chris Glover said...

Not sure I follow. Totally agree with the shots in the box numbers and I believe I mentioned this in an earlier post as being the best indicator of future success available from Opta.

This post is more about why you shouldn't look at goal stats yet, but could feel more comfortable doing so after X weeks, depending on what you're looking at. Shot data is useful for sure but as with everything there are exceptions (Liverpool last year) so I'm just saying we can rely on goals scored/conceded a bit after a while.

I think there is a tendency in the stat community in general to overlook the obvious so this was a bit of a counter argument to the proposition that historical goals scored/conceded is totally useless. Not rocket science, agreed, but worth a quick post I thought.

Plus, you know, pretty graphs! :)

stuckert said...

Interesting article. Can I suggest there might actually be something else to draw out of your pretty graph?

Look at GW25 ... by that week, all four metrics have reached 90% of their predictive value. You could theoretically use that point in time as the week to use your wildcard. It would still give you about a third of the season left to score points, and you'd have a very good feeling for which teams should perform well over the remaining games.

Of course by that time, a lot of the key squad members will be set. But perhaps you could round out your team better and gain something like 5-10% per week over others in your league.

Just a thought.

Shreyansh Jain said...

Keeping 70 to 80% as the benchmark for "guessable" performance is actually optimal! Of course at 90% saturation you'd have a better chance, but that won't put you ahead of the pack. I'd rather go with the 15% lower correlation because it comes at an earlier gameweek.
Let's say the 15% loss in correlation is in direct proportion to points. That would mean that from gameweek 28 until the end of the season, I would have 15% fewer points each gameweek than someone else who goes with the 90% stats. But before that, other stats drop to an average low of 60%, starting gameweek 13 (which is again the average of 7, 21, 13, 12). So I gain a 15% higher correlation for 15 gameweeks and lose 15% correlation for 10 gameweeks. Really good choice, that 70-80% range. Love it.
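
The arithmetic in this comment can be written out as a rough sketch. It takes the comment's own assumptions at face value (a 15-point correlation gap that maps directly onto points, the 90% level arriving at gameweek 28, a 38-game season); the function and variable names are hypothetical.

```python
SEASON_LENGTH = 38  # Premier League gameweeks

# Approximate stabilisation gameweeks listed at the end of the post
stabilisation_weeks = [7, 21, 13, 12]


def net_advantage(early_week, late_week, gap, season_length=SEASON_LENGTH):
    """Gameweeks spent ahead minus gameweeks spent behind, weighted by the
    correlation gap (in percentage points). Assumes the gap is constant."""
    weeks_ahead = late_week - early_week       # acting early while others wait
    weeks_behind = season_length - late_week   # others now hold better information
    return gap * weeks_ahead - gap * weeks_behind


early_week = round(sum(stabilisation_weeks) / len(stabilisation_weeks))  # 13.25 -> 13
advantage = net_advantage(early_week, late_week=28, gap=15)
print(early_week, advantage)
```

A positive result only says the early mover spends more gameweeks ahead than behind under these assumptions; it says nothing about how correlation actually converts into fantasy points.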

stuckert said...

Yeah, I'm not arguing against the original post, just suggesting there may be two approaches: one more aggressive, and a second more conservative.

Actually I have long hoped for an analysis of wildcard strategies. Last year I bet on Chelsea early; finding that a disaster, I used my WC around GW3 or 4 to replace my Chelsea-heavy lineup with RVP and a few other options. I was able to come back to place in the top 3 in my league, but it took a year's effort and I never really challenged for the lead.

This year I am in first, and have not yet used my WC.

chemikills said...

@Chris I did write that up all fairly fast

Step 1) Get a table of the average quantity of shots each PL team has made within the 18 yard box
2) Insert these quantities into each fixture that has been played (i.e. the Man City example) and then average across all fixtures
3) Compare this value with the average quantity of shots 'conceded' in the 18 yard box for the specific team, in this example Man City

So my step 2 answer was 7.9, my step 3 answer was 4.8. Based on the assumption that offensive performances are the same each week (bad assumption but useful here) the Man City defence has conceded 3.1 fewer shots per game than we would have expected them to.

Also as it turns out so far they have conceded 1.6 fewer shots in the 18Y per game than last season
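
The three steps above can be sketched in a few lines using the Man City figures quoted in this thread. The variable and function names are illustrative, and the sketch keeps the stated assumption that each opponent shoots at its own season average every week.

```python
# Step 1: each opponent's average shots inside the 18 yard box
# (figures and team labels as quoted in the comment above)
opponent_box_shots = {
    "SOT": 6.9,
    "LIV": 10.2,
    "QPR": 6.5,
    "STK": 6.6,
    "ARS": 8.8,
    "FUL": 8.2,
}

# Step 3 input: what Man City actually conceded in the box per game
actual_conceded = 4.8


def expected_box_shots_conceded(opponent_averages):
    """Step 2: average the opponents' usual output across the fixtures played."""
    values = list(opponent_averages.values())
    return sum(values) / len(values)


expected = expected_box_shots_conceded(opponent_box_shots)
shots_saved = expected - actual_conceded

print(round(expected, 1), round(shots_saved, 1))
```

A positive `shots_saved` means the defence is allowing fewer shots in the box than its opponents usually manage.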

Gummi said...

Following up on the 70 - 90% correlation discussion this could also be a rule of thumb on how to play the game based on your goals.

If you are aiming to win the overall game, you need to be going for 70-80% correlation, at the latest.

If, however, you aim to win your not-so-competitive mini-league a better strategy could be to aim for a 90% correlation wildcard.

There's always the question of rising prices though and the likely inability to get the optimal team in around that time.

Those damn prices are rising so fast. I feel that there must have been a big change to the algorithm and it will have a bigger impact on our season than we realize at the moment.

Kalix said...

Chemikills...that's an interesting idea. Replacing our GPG numbers with shots taken and shots conceded. It would probably give a more accurate set of data, given that sometimes the scoreline doesn't always match each team's performance.

Very interesting indeed!

CDI said...

Gummi, there was no change to the price rise algorithm. If anything it's taking more NTI to achieve a price rise (22k NTI at the end of last season compared to 32k NTI this season so far). People forget that everyone had a free WC after the mess-up in GW1 last season, so there wasn't as much dead weight in people's teams as there is this season.

Bryan McKenna said...

For those of us who dread statistics/maths/Economics (okay, that last one may not really fit in here, but after a crash course in it recently, figured I'd throw it in!)

By game week 12/13, we can use this season's stats to predict how teams will do going forward with a *reasonable* amount of certainty? (bar away goals)


I was considering my WC for around week 11/12/13 as we have one in December and I never saw waiting for a DGW to be that attractive (they can be tricky).

So, dumb question ahoy, if I wanted to get ahead of the pack GW 11/12 would be the time? Can always use free transfers to sort out the inevitable mistakes, but by that time I think you can be fairly confident in which of the big hitters will perform and how the other sides may fare long term.

Or at least last year, it seemed by mid November everyone was catching onto the trends.

Bryan McKenna said...

That should be the winter WC in January of course, not December.

@shots_on_target said...

@Chris - I believe you are missing something very important in your analysis, and this is what chemikills is getting at. You point to Aston Villa last year as an example of how early season form based on goals scored after 6 weeks is a poor indicator of future goal-scoring, and you are right, and here's why. You first would need to incorporate the relative strength of Villa's opposition in those first 6 games: Ful, Bla, Wol, Eve, QPR, New. Mostly very poor teams defensively, with only Newcastle at this time defending well.

You would also, I believe, need to take into account likely regression and the underlying performance stats (again as per @ChemiKills). In those first 6 games Villa scored 7 goals but this was from just 15 shots on target, i.e. an unsustainable 50% conversion rate. Also, Villa's opposition in the first 6 GW conceded 43 goals from 130 SoT, a collective average of Goals Against of 1.2 and 3.7 SoT. Villa averaged against these teams 1.4 goals and a very poor 2.5 SoT / game. Considering average SoT per game over last season was 5 for home sides and 4 for the away team, Villa's avg. of 2.5 in those first games was actually a strong indicator of how very poor they would be throughout the season. Hope this makes sense.
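
As a rough illustration of the regression argument, the sketch below uses only the totals quoted in this comment (7 goals from 15 shots on target for Villa; 43 goals from 130 shots on target conceded by their opponents). The "regressed" figure is an added assumption for illustration: it supposes Villa had finished at the same rate those opponents typically conceded, which is not something the comment itself calculates.

```python
def conversion_rate(goals, shots_on_target):
    """Share of shots on target that end up as goals."""
    return goals / shots_on_target


# Totals quoted in the comment above
villa_goals, villa_sot, games_played = 7, 15, 6
opponents_goals_conceded, opponents_sot_conceded = 43, 130

villa_conversion = conversion_rate(villa_goals, villa_sot)
typical_conversion = conversion_rate(opponents_goals_conceded, opponents_sot_conceded)
villa_sot_per_game = villa_sot / games_played

# Assumption for illustration: Villa convert at the rate these opponents usually allow
regressed_goals = villa_sot * typical_conversion

print(f"Villa conversion:   {villa_conversion:.1%}")
print(f"Typical conversion: {typical_conversion:.1%}")
print(f"Villa SoT per game: {villa_sot_per_game:.1f}")
print(f"Goals at typical conversion: {regressed_goals:.1f}")
```

The low shots-on-target volume is the signal that persists; the high conversion rate is the part most likely to fade.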

Gummi said...

@CDI: Interesting, thanks for the clarification. Could it also be some other trend we haven't thought of?

E.g. are more people playing some kind of price game, trying to finish up with the most valuable team?

Or is a site like Fantasy Football Scout, which I believe is by far the most popular one on the subject, creating massive bandwagons?

CDI said...

It's possible that more people are playing the price game this season as they added the most valuable team league this season. I don't think that alone would cause the price fluctuations we are experiencing so far this season.

The FFS league only has around 13k members so I don't think they could move the markets that much.

I remember in GW1 players like Aguero, Dzeko etc all went from like 2% ownership to 30%+ while not experiencing any price rises as the prices were fixed for GW2. Players were able to get rid of all their GW1 mistakes and bring in the form players without triggering any price rises, which calmed the market a lot. Right now everyone is still scrambling to cut the dead weight and get the in-form players.

Chris Glover said...

From the overall comments here I think the point of the piece has been missed a bit, and seeing as most people seem to feel the same way, we can assume that's because I didn't communicate it very well.

The point of the piece was really to say that we shouldn't be giving up on City defenders or falling over ourselves to buy Fulham forwards purely based on the fact that they've done well to date.

By GW7 for 3/4 metrics we're seeing only 40-65% correlation which would leave somewhere between 8-12 teams who won't follow this trend. There are a number of ways we could try and weed these out, such as looking at shots on goal, strength of schedule etc but that wasn't my intention here.

My point was supposed to be as we approach GW10 and beyond, we do need to start considering that perhaps great attacking units/defenses of old can't be relied on based on name alone. That was really my only point and I appreciate it's not a deep one, hence the lack of write up.

I really like the idea of looking at shots on goal, particularly those inside the box which seem to correlate best to future goals, but that's a much much deeper level for another post. I have essentially done exactly what chemikills suggests in my goals forecast piece and I will soon add a new column which will show how many 'expected' goals a team "should" have scored or conceded based on their shot numbers. I also plan to add a similar idea for the captain picks.

I talked about the shot regression idea a couple of weeks back:
http://premierleaguefantasy.blogspot.ca/2012/09/judging-team-success-shots-inside-box.html

Thanks for the comments guys, it's getting really fun to write here and get everyone's feedback. On that note, @shots_on_target is launching a new forum to discuss this kind of thing which could be really cool. More on that to come soon I hope.

John Doe, 2008 said...

@Chris -

Great analysis but I do agree with what others (specifically chemikills and shotsontarget) have said. There are two fundamental problems in this piece:

1. Goals are simply too infrequent to be reliable for much of anything statistically at the 5, 6, or even 10 game week mark. There is some relationship certainly but it is weak due to the lack of positive outcomes available. Thus we should probably use SOG or some other metric that highly correlates with goals scored but yet occurs with much more frequency.

2. Unless I am mistaken, there is no adjustment for opponent strength in this analysis. That is a major flaw given the limited sample sizes at play here. A team like Southampton has had a killer schedule (@ MCI, @ EVE, @ ARS, v MUN), so one would expect their numbers to be severely skewed this early in the season, while they will regress towards the "truth" as the schedule strength normalizes over the course of the season. We need a way to analyze these factors.



Chris Glover said...

John - Again, read my response below. I thought it was a given that goal history is not predictive of future success, hence my opening paragraph about Villa. I am arguing the exact same point as everyone else here - I will have to go back and re-read it and perhaps I phrased something badly.

Regardless of shot totals you absolutely can rely on goal data history with some certainty once you get into the GW20+ period, and that makes sense as strength of schedule is no longer an issue and shot data will likely have levelled out.

To be clear, I am saying that using past GPG history is NOT viable for the short term. As I say, in the stat community this isn't exactly a breakthrough, but understand that a lot of people aren't using stats the way a few on here are, and I often read that team X is in good form etc. This post was really meant to quash that claim quickly, nothing more.

Chris Glover said...

Ok, just re-read the piece and I think the part that is throwing people off is where I label home scoring as stabilising at GW7. First of all, these are facts, so whether we can find exceptions or not, there was factually an 80% correlation between what teams did to that point and then what they did for the rest of the year. Second, I shouldn't have bothered with these labels, as I gave the impression I believe these totals were totally reliable at that stage. I don't. All I meant was that GPG has some use at an earlier stage for home teams, but I very much concede that A LOT more analysis is needed.

Apologies if all that wasn't clear from the original piece, my mistake for writing it late at night and then not proof reading.

The good news is that I am very much on the same page as chemikills and shots_on_target with regards to shot data, and as I say, I have already baked some of that into my GPG weekly analysis which I will probably expand on next week.

John Doe, 2008 said...

"To be clear, I am saying that using past GPG history is NOT viable for the short term. As I say, in the stat community this isn't exactly a breakthrough, but understand that a lot of people aren't using stats the way a few on here are, and I often read that team X is in good form etc. This post was really meant to quash that claim quickly, nothing more."

That helps clarify things a bit.

Hope my earlier post didn't come off as too critical. I think maybe my thirst for a definitive predictive model is making me a bit cranky! :)

Gummi said...

@Chris Glover: Perhaps some of the misunderstanding comes from the fact that the piece wasn't delving too deep and pointing out the accepted "truth" for us that use stats at all.

Not that it wasn't a good piece or relevant, it's just that I think we, well, at least me, are getting used to being schooled by your articles.

You often make me think of the Fantasy game in new light, so people perhaps sought out a "deeper" meaning than you were trying to convey.

@shots_on_target said...

I think Gummi is right and it's hard to pitch an article like this to a wide audience. And I think we all wanna contribute our own thoughts.

Would it be possible to do the same analysis for earlier seasons to see if the same GPG confidence is arrived at by certain gameweeks?

Great stuff Chris.