Sunday, September 9, 2012

Judging team success: shots inside the box

We've touched on 'predictive' stats for individual players on a number of occasions, noting in particular that shots on target and 'key passes' are generally the best indicators of future success. We are often guilty of using past results to predict the future, rather than the process which produced those results. I won't pretend to have much classical statistical training, but the typical example would be to compare this logic with predicting that the roulette wheel is 'due' to fall on red simply because the last 10 spins have been black. (I understand this kind of analysis can make many football fans uncomfortable, as their beloved stars are complex athletes, not machines, but we can often make broad generalisations which are still instructive.)

We generally focus on individuals in the fantasy world as knowing that, say, Arsenal are a good team isn't too helpful if we can't pinpoint individual players to target. However, early in the season it can be useful to look at team stats if they help open our eyes to new teams who might be better than expected and thus whose players we should be keeping an eye on.

Before digging into the specific teams, let's take a look at some of the best predictive stats for teams from last season:

Shots in general are a good indicator of goals (around 80% correlation) but, as one would expect, shots from inside the box are even better (85%). Again, I'm not a statistician, so the more learned among the readership might cringe at my remedial knowledge of the subject, but I believe an r-squared value of 0.72 (roughly the square of that 85% correlation) suggests that the best-fit line is somewhat reliable, and only a couple of teams fall significantly outside the expected range. The two Manchester teams are fairly self-explanatory: they have a group of the best players in the league and thus create the best chances and have the best players to convert them. Liverpool being such a disappointing outlier is less clear and is a question for another time. Without wishing to oversimplify things, a team racking up serious shot totals inside the box would be due a bit of attention, so let's take a peek at this year's data (even if it is a worryingly small sample size):

Everton leading the league is something of a surprise, but with players like Jelavic, Pienaar and Fellaini very much in the fantasy consciousness there isn't a great deal to learn here. Instead I want to quickly draw our attention to Wigan, who I have previously dismissed as not offering a great deal of fantasy intrigue. I revised that hasty position last week with the inclusion of Maloney on my personal shortlist, but I think this Wigan side still needs a touch more attention.

The Latics' possession numbers have been great - 57% overall (6th ranked), 61% in the opponent's half (4th) and 62% in the final third (3rd) - and without forcing a narrative here, a case could be made that Martinez is finally getting this team to play the way he intended. In Shaun Maloney they have a midfielder with some of the best underlying numbers around, while McCarthy and Gomez have shown promise when in the side. A lack of goal threat hurts that latter pair, so I'd only really consider Maloney for ownership at this point, but McCarthy has good assist potential which shouldn't be totally ignored for 5.0m.

Further upfield we have a blossoming strike force, with Di Santo in particular coming in with extremely surprising underlying data. His 9 shots rank 7th among all forwards and his 3 shots on target are right in line with league averages from the prior year, so there are no red flags to suggest these shots won't be converted at a sustainable rate going forward. On the downside (and there has to be a downside for a player costing just 5.5m) he hasn't created a single chance yet in three games, so the fantasy points look like they will largely be derived from goals rather than assists. That said, his 21 touches in the penalty area (ranking 5th) suggest a player very much involved in his team's play (rather than just earning chances by springing the offside trap), and so assist chances could develop in the future.

Arouna Kone's profile to date is somewhat the inverse of Di Santo's, with excellent assist potential (5 chances created) and so-so shot data (though his chance at goals looks better than Di Santo's for assists). Kone has a good scoring profile from across numerous leagues in Europe and there's good reason to think he can continue a portion of that success in the Premier League, but at 1.0m more than his teammate Di Santo I don't see enough distinction to make the upgrade.

Overall, Wigan's fixtures don't look great, with trips to United, Sunderland, Swansea and Spurs on the horizon. However, Maloney (5.0m) and Di Santo (5.5m) are cheap enough to spot start and home games against Fulham, Everton, West Ham, West Brom and Reading are promising enough to make that a real possibility. Maloney is currently owned by just 1% of managers while Di Santo comes in at 3%. Wigan aren't, and probably won't be, a trendy team and this keeps their fantasy ownership down. If their underlying stats hold up though there's good reason to think this team can climb the scoring charts.

Regressing goals based on shots inside the box
As a final point here I wanted to add a note on the possibility of regressing the current season data using historical conversion rates to see where a team might be getting particularly 'lucky' or 'unlucky' (for want of a better word) and thus might be due to regress to the mean in the coming weeks. In short, a goal was scored for every 5.8 shots taken inside the box last year, with the rate currently sat at 5.6 this season. It's probably too early to begin this kind of testing just yet as the sample sizes for the current year are minuscule, but to give an illustration of what to expect in future weeks, below is the 'regressed' data showing 'expected' goals based on each team's conversion rate and the league average conversion rate.

A positive number in a given column suggests that a team has been 'unlucky' to date and could see an improved rate of goal returns in the future if they continue to generate shots at this rate. A negative number therefore suggests the opposite: a team has converted opportunities at an unusually high rate and thus could be due for some negative regression if everything else remains equal. The three columns are described further below:
  • PY team - current year data is regressed using the conversion rate for a given team last season. This allows us to adjust for the fact that some teams will simply convert chances at a consistently higher rate than others. Some caution needs to be noted where a team's personnel has particularly changed (van Persie's departure from Arsenal would be the classic example here).
  • CY league avg - current year data regressed for each team using the average conversion rate for the league this year.
  • PY league avg - current year data regressed for each team using the average conversion rate for the league last year.
Without placing too much reliance on small sample sizes, I think it's safe to say that Swansea's potent attack will slow down while Chelsea and Fulham need to do more if they are going to continue to score at their current rate. On the positive side, Arsenal and Everton stand out as two teams whose goal totals should rise in the coming weeks, which aligns with what we have seen in their opening games (Arsenal in particular were impressive in the first two weeks without registering a single goal). As I say, we will return to this data in a few weeks when it should be a touch more reliable.
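For anyone who wants to follow the arithmetic behind the three columns, here is a rough sketch in Python. It assumes each column is simply 'expected' goals (shots inside the box divided by the relevant shots-per-goal rate) minus actual goals, which matches the sign convention described above. The 5.8 and 5.6 rates are the league figures quoted in this post; the team inputs in the example (29 shots, 2 goals, a prior-year rate of 5.0) and the function names are invented purely for illustration and are not taken from the real data.

```python
# Shots inside the box needed per goal, as quoted in the post.
PY_LEAGUE_SHOTS_PER_GOAL = 5.8  # last season's league average
CY_LEAGUE_SHOTS_PER_GOAL = 5.6  # this season's league average so far


def expected_goals(shots_in_box, shots_per_goal):
    """Goals a team 'should' have scored at a given shots-per-goal rate."""
    return shots_in_box / shots_per_goal


def regressed_columns(shots_in_box, goals, py_team_shots_per_goal):
    """Return expected-minus-actual goals for each of the three columns.

    Positive = 'unlucky' so far; negative = converting at an unusually
    high rate. This formula is an assumption about how the columns are
    built, not a confirmed description of the original spreadsheet.
    """
    rates = {
        "PY team": py_team_shots_per_goal,
        "CY league avg": CY_LEAGUE_SHOTS_PER_GOAL,
        "PY league avg": PY_LEAGUE_SHOTS_PER_GOAL,
    }
    return {
        column: round(expected_goals(shots_in_box, rate) - goals, 2)
        for column, rate in rates.items()
    }


# Hypothetical team: 29 shots in the box, 2 goals, and a prior-year team
# rate of 5.0 shots per goal. All three inputs are made up.
example = regressed_columns(shots_in_box=29, goals=2, py_team_shots_per_goal=5.0)
print(example)
```

With these made-up inputs every column comes out positive, which would mark the team as 'unlucky' to date; a side with the same 29 shots but 8 goals would show negative numbers instead.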


Ste said...

Awesome analysis Chris. You and I are clearly thinking alike. I really don't think the sample size is all that minuscule, to be fair.

After 3GW the number of goals scored admittedly is a terrible indicator of true form, but I don't believe the first level down of underlying stats (i.e. shots, shots in box, etc.) is all that bad. You can use the opening day 0-0 between ARS and SUN as an example of this. Off the top of my head, ARS had something like 23 attempts on goal compared to SUN's 4.

Objectively from the score line and in a vacuum you could argue SUN & ARS are equal but the underlying stats tell the real story. I reckon once each team has played 2 games at home and 2 away you can start to draw some conclusions.

Once again, great piece of work, look forward to seeing more soon.

Ste said...

One more thing...
I was looking at shots in box vs. goals scored myself earlier today and saw that the goal:sinbox ratio differs by position:
DEF - 1 : 10
MID - 1 : 5.7
FWD - 1 : 5.4

Mids & Fwds don't differ too much from the 11/12 league avg. of 5.8 but I think it's enough of a difference to be important.

Jamie McCarthy said...

Hey Chris (genuine long-time reader, first-time commenter),
I have always had a soft spot for Wigan (so much so that I'm probably the only Chelsea fan with a Victor Moses shirt). This season my search for the new N'Zogbia/Moses led me to own Beausejour from week one. Due to injuries and deep deployment it hasn't gone well. I'm loving the Wigan praise here and have had Maloney on my watchlist ready to move in as his replacement, his penalty duties only sweetening the deal.
I realise I don't really have a question. Um. Good job and keep it up? That'll do instead.

Bryan McKenna said...

Very insightful analysis, best FPL source that I check on a daily basis (or close enough) :-)

Gummi said...

Absolutely fantastic!

Thanks for the analysis. I would be interested in a "real" statistician's take on when the sample size is good enough.

In fact, it's more the quality of the samples, rather than the sample size, that controls the validity of the numbers. Having said that, more is never worse (as long as it doesn't bring other costs).

Bringing this back to football, I thought the graphics for Chelsea and Man City were interesting. This should put a bit of a damper on Hazard (a minimal one).

Am I correct in deducing that City's numbers, however, are indicative of their quality up front, as PY team looks the best?

Again, thanks for the excellent post.

Pulma said...

This is retarded good man, one of my favorite articles by you so far. Keep up the great work.

Pulma said...

I must add that you do have to consider the quality of the shooter in the box, as this indicator is always likely to stay high for both Liverpool and Arsenal with Suarez and Giroud being depended on for putting the ball in the back of the net.

Kevin Tan said...

Good post, Once again!!!
What do you think about Boyce Vs Maloney?

Mathias Johansson said...

Nice post - liked that last graph. I'm thinking about your first graph, if you remove the top teams (MCI, MUN, ARS, TOT, CHE and LIV for measure) I would say you would get a more horizontal tangent.

Does that really tell us anything valuable or relevant, one might argue (it could be seen as obvious)? Well, perhaps that outside the top teams it's more of a coin flip how the season turns out.

It sounds a bit simplistic for my taste, but much of that tangent's inclination comes from the big teams, so there is probably a measure of truth in it.