Friday, October 5, 2012

Gameweek 7 Preview

Clean Sheet Rankings

Captain Data
I've made a change to the captain stats this week, to account for the shot data we have, as discussed in previous posts. In the interest of time I won't go into the details here, but will add more backup over the weekend.

I imagine the main question will be Steve Fletcher's ranking, and he has indeed presented a tricky issue for the current data setup. Cr% is essentially a team's forecast goals for a week multiplied by the share of the team's goals an individual player accounts for. For Fletcher that share is 100%, which is clearly unsustainable, but how do we adjust for that? The new xP90 metric regresses data to average shot rates, so you will see Fletcher do worse there, but the adjustment is not as simple at the team level. If Fletcher hadn't scored against Wigan this week, there's no telling what would have happened, so you can't just say he 'should' have scored two goals so far and thus only accounts for 40% of Sunderland's goals.

So, I'm asking readers to suggest ideas on how to get around this. Luckily, it's a problem that will diminish with time, but for now we get the somewhat odd situation where the metric predicts Fletcher to score 0.9 goals this week, calculated simply as 0.9 x 100%. Clearly these stats are a work in progress.
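For anyone who wants to play with the numbers, here is a minimal sketch of the Cr% arithmetic as described above. The function names are made up for illustration, and the "5 of 5 goals" input is an assumption inferred from the remark that two goals would be 40% of Sunderland's total; only the 0.9 team forecast comes straight from the post.

```python
def goal_share(player_goals: int, team_goals: int) -> float:
    """Fraction of the team's goals scored by the player (0.0 if the team has none)."""
    if team_goals == 0:
        return 0.0
    return player_goals / team_goals


def cr_forecast(team_forecast_goals: float, player_goals: int, team_goals: int) -> float:
    """Forecast goals = team forecast for the week x player's share of team goals."""
    return team_forecast_goals * goal_share(player_goals, team_goals)


# 0.9 is the team forecast quoted in the post; 5 of 5 goals is an assumption.
fletcher = cr_forecast(0.9, player_goals=5, team_goals=5)

# The "what if he had only scored two" case, for comparison only.
what_if = cr_forecast(0.9, player_goals=2, team_goals=5)

print(f"current share: {fletcher:.2f} goals")
print(f"40% share:     {what_if:.2f} goals")
```

The gap between the two outputs shows how sensitive the metric is to a tiny sample of goals, which is the whole problem.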

We've had some good discussions on the earlier piece on stabilising data, so I encourage everyone to get involved in the comments, @plfantasy or on Facebook.


Bryan McKenna said...

Just to throw a wee curveball in, not that I'd expect you to include it in the analysis, but anyone considering the captaincy should also consider form in all comps, i.e. Europe.

All things told, I'm tempted to nudge Suarez ahead of Bale/Ba/Cazorla this week (on my own team). Has the form, stats and did not play too much midweek. Regression will get Ba at some point (I have him, but reluctant to armband).

Pulma said...

@Bryan McKenna
I ran the numbers on Ba for the first half of last season, when he was playing as the main striker: he took 3.2 shots per game (3.3 now) and scored with every 4th shot on average (every 3.3 shots this season). So even with me being bullish on him as long as he keeps his main striker spot, there will be a return to his averages at some point, and the games vs United and Sunderland will very likely be that time IMO, so I would keep the armband off him. Suarez or Bale are surely better options.

@shots_on_target said...

Hi Chris, good going. To get around the Fletcher problem, the opposite of last year's Suarez problem :), how about looking at each player's share of attacking play, weighting % shots in box and % SoT highly, with a lower weighting for total shots. For assists, you could factor in % KP and bias this with %PenBox Tches/Final 3rd touches to get a measure of whether they are creating chances near the box or not. This is how I've tried to approach it on my site. What do you think?
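One way to read the goal-threat half of that idea is as a weighted average of a player's shares of his team's totals. This is only a sketch: the weights and the example shares are made up, since the comment says "highly" and "lower" without giving numbers, and `attacking_share` is a hypothetical name.

```python
# Hypothetical weights: the comment only says "high" for shots in the box
# and shots on target, and "lower" for total shots.
WEIGHTS = {"shots_in_box": 0.4, "shots_on_target": 0.4, "total_shots": 0.2}


def attacking_share(shares: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of a player's shares of team totals (each between 0 and 1)."""
    total_weight = sum(weights.values())
    return sum(weights[k] * shares[k] for k in weights) / total_weight


# Made-up shares for an illustrative striker.
striker = {"shots_in_box": 0.30, "shots_on_target": 0.35, "total_shots": 0.25}
striker_share = attacking_share(striker)
print(f"{striker_share:.2f}")
```

The assist side (% KP biased by touches near the box) could follow the same pattern with its own set of weights.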

As for your attacking rankings, I have the same top 5 as you but in a slightly different order: 1) Che 2) Swa 3) Sot 4) City 5) WBA, although the latter 4 are all much the same. It's good news that we have fairly similar forecasts, and going forward I think it would be great if we could see where we all agree/differ and combine the best approaches.

Mitchell Stirling said...

I was worried I was going to have a bit of a nightmare last week. Firstly, I was on holiday, which meant I feared not being able to make changes and so on. I had to finally get rid of De Gea and by doing so had funds to remove Lescott for Baines for 4 points. Then after Berbatov's injury I was forced to wildcard a couple of weeks earlier than I'd have liked (I like to use the October or November international breaks to get some extra funds if possible).

I ended up with 58 points in the end and am back in the top 100k again* mainly as I happened upon Suarez last week but only vice-captained him.

My question is, should I look to get rid of Suarez after the next two games? Even if he doesn't give me two 2-pointers in the coming games, with his conversion rate what it is, should I consider that last week was good luck and move on as soon as I can, maybe to Adebayor once he has started a couple of games?

My team is currently Begovic, Foster / Baines, Cole (CHE), Davies (SWA), Hughes, McAuley / Bale, Ben Arfa, Pienaar, S. Cazorla, Snodgrass / Lambert, Podolski, Suarez

Are more people looking to have a 9-8-6 split up front rather than 11-7-6 (or 13-7-6) and keeping 2m for midfield like I have?

*Is it me or does the competition seem harder? I'm averaging almost 60 points a week, on target for almost 2200 by the end of the season, but I don't seem to be top of the mini-leagues I led last year with a lower average score. Anyone else found this?

Chris Glover said...

Bryan - 'form' is tricky. I did a very, very basic study last year which looked at the % of times a player scores having scored in the previous week. The conclusion was that the prior week's result gave little or no predictive value for the next. Of course, there is more to it than this and we'd need to adjust for all kinds of things like strength of opponent, good saves, total chances etc, but overall I think 'form' is measuring the result rather than the process. Form would, for example, put Fletcher at the very top of the rankings this week, which is surely not right?
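For the curious, that kind of check is easy to sketch: compare how often a player scores after a scoring week with how often he scores after a blank. This is not the original study, just an illustration of the comparison. The function name is hypothetical and the match log is made up, so the output says nothing about real players.

```python
def scoring_rates(scored: list[bool]) -> tuple[float, float]:
    """Return (rate of scoring after a scoring week, rate of scoring after a blank).

    `scored` is one player's week-by-week record, True if he scored that week.
    A rate is NaN when there are no qualifying weeks to measure it on.
    """
    after_goal = []
    after_blank = []
    # Pair each week with the following week.
    for prev, cur in zip(scored, scored[1:]):
        (after_goal if prev else after_blank).append(cur)

    def rate(outcomes: list[bool]) -> float:
        return sum(outcomes) / len(outcomes) if outcomes else float("nan")

    return rate(after_goal), rate(after_blank)


# Made-up ten-week log for illustration only.
log = [True, False, False, True, True, False, True, False, False, True]
rate_after_goal, rate_after_blank = scoring_rates(log)
print(f"after a goal:  {rate_after_goal:.2f}")
print(f"after a blank: {rate_after_blank:.2f}")
```

If 'form' carried real predictive value, the first rate would sit clearly above the second across a large sample of players; a single short log like this one is far too noisy to show anything.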

Pulma - good points. Ba is one of the key players coming out of my shot regression analysis who is set to regress in the coming weeks, all else being equal. I'll post that full list over the weekend.

Shots_on_target - good to hear our rankings are similar on the teams. This is one area where I feel my data still needs a lot of work but it's generally been ok. As for the Fletcher problem, do we all think % of goals scored/assisted is even a valuable measure? I use it because I found it the easiest way to account for strength of schedule this week, but I like your idea from an earlier post of looking at how many shots a defense gives up and adjusting accordingly. So instead of % of goals and assists we'd stick to % of shots and key passes which are more predictive, right? I like this idea a lot.

Bryan McKenna said...

Cheers for replies Pulma and Chris, always informative replies here :-)

True Chris in the case of Fletcher, can't really argue if you've done your homework on whether form really exists.

One way of looking at it may be underlying stats in all competitive comps? I'd guess those stats would just reinforce the current ones for the league though.

Bale played 90 last night, fit lad but it's a very slight concern. He will start, but not sure of the full 90.

John Doe, 2008 said...

"As for the Fletcher problem, do we all think % of goals scored/assisted is even a valuable measure?"

Personally, I don't. I look at the percentage of a player's involvement in activities that typically beget scoring (penalty touches, SOG, etc.). I feel this provides a much larger sample size and helps us better understand the contributions of an individual to a given team.

As an American, I am fond of, among other sports, baseball. Baseball has a vast and mature statistical community. Interestingly, on one such community, there have been numerous threads on the challenges of statistically modeling soccer, primarily because a scoring event occurs so infrequently (there was even a presentation on the topic at a national sabermetric conference). Seems most have identified the problem but few have publicly solved it, although I am certain the big clubs have proprietary modeling in place.

Cool stuff sabermetrics is.

Royce said...

To echo John Doe's comment, I agree the percentage of goals scored/assisted is not very useful. You'd rather look at the opportunities for scoring rather than the scores themselves.

There is a similar concept tracked in the NBA called Usage Rate, which attempts to measure the ratio of his team's offense that a player uses. The shooting number used for the stat is Field Goal Attempts (not Makes).

Chris Glover said...

John and Royce - I got into 'analytics' in football because of baseball too, and a lot of the ideas I try to use are generally adapted from existing concepts in baseball. Basketball analytics would probably be most useful for football given the nature of the game, but (a) I don't watch much basketball and (b) I believe a lot of their stuff is proprietary, so I haven't been able to apply it much here. I might try and look into it though.

As for the % created stat, I'm with you guys and I'm ready to eliminate or at least significantly reduce its impact on the weekly rankings. I'm going to do a separate thread on this over the weekend where I welcome all these helpful suggestions.

I like the idea of trying to forecast the number of shots a team will have in a given week, then looking at the % of these player x tends to get, or create. We could then either apply an individual's conversion rate for their career, or even for this season, or just use a league average where we don't know much about a player.

So we'd get something like (fake numbers): City average 12 shots in the box a game at home, Sunderland give up 14 on the road, so we'll forecast 13 for the game (not sure if that's overly simplistic, but stick with me for now). We can then say that Aguero accounts for 25% of City's shots so we'd forecast him to get 3.25 shots for the day. If historically he converts at a 40% rate we'd give him 1.3 goals for the week. If we were to use the league average rate of 33% we'd give him about 1.07. The immediate thoughts I have on this are:
- how do we split the difference between City's 12 shots for and Sunderland's 14 against? A simple 50:50 average? Ideally we'd do analysis as to which one tends to be more predictive but I'm not sure how much impact this will have.
- are shots in the box the best stat? I think shots on target correlates slightly better but I would venture that shots in the box are more predictable week-to-week and I like the slightly larger samples.
- career conversion rates for players sound great but I only have Opta data for last season and this one. We can get shot data from ESPN but as far as I can see you'd need to do that on a player by player basis which will be very time consuming. That's okay for captain picks but if we wanted to expand the model for all players it could be an issue. We also have the question of what to do with (a) conversion rate for other leagues and (b) conversion rates in the Prem but on a different team.
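To make the idea concrete, here is a rough sketch of the shot-based forecast using the same fake inputs (12, 14, 25%, 40%, 33%) and no rounding of the shot count. The function names and the `weight_for` parameter are hypothetical; the 50:50 default is just the simple average floated above, not something the data has confirmed.

```python
from typing import Optional

LEAGUE_AVG_CONVERSION = 0.33  # fake league-average rate from the example


def forecast_team_shots(shots_for: float, opp_shots_against: float,
                        weight_for: float = 0.5) -> float:
    """Blend a team's average shots with what the opponent usually concedes.

    weight_for=0.5 is the simple 50:50 average; it is an open question
    which side deserves more weight.
    """
    return weight_for * shots_for + (1 - weight_for) * opp_shots_against


def forecast_player_goals(team_shots: float, shot_share: float,
                          conversion: Optional[float] = None) -> float:
    """Player goals = team shots x player's share of shots x conversion rate.

    Falls back to the league-average rate when we know little about the player.
    """
    rate = LEAGUE_AVG_CONVERSION if conversion is None else conversion
    return team_shots * shot_share * rate


team_shots = forecast_team_shots(12, 14)      # City at home vs Sunderland away
player_shots = team_shots * 0.25              # Aguero's 25% share of shots
goals_own_rate = forecast_player_goals(team_shots, 0.25, conversion=0.40)
goals_league_rate = forecast_player_goals(team_shots, 0.25)

print(f"team shots:        {team_shots:.2f}")
print(f"player shots:      {player_shots:.2f}")
print(f"goals (own rate):  {goals_own_rate:.2f}")
print(f"goals (league):    {goals_league_rate:.2f}")
```

Changing `weight_for` is a quick way to see how much the 'for' versus 'against' question actually moves the final number.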

As I said, I'll do a large (rambling) post on this over the weekend, but hopefully the above will get the juices flowing for now. Thanks again for all the input this past week.