Wednesday, August 27, 2014

When to believe your eyes (or at least the data table)

When trying to obtain data for use in any kind of forecast you are faced with any number of questions with varying degrees of complication. Is the data from a reliable source? Do we have enough of it? How should it be interpreted? When is it stable enough to be relied upon?

That last question is where this post will focus. Generally, more data is better than less data. Nate Silver fans’ ears will prick up at that simplistic statement, as his excellent book The Signal and the Noise is full of instances where too much data can cloud our judgement, but for our purposes let’s say that when trying to judge the quality of a team you’d prefer to have data for 10 games rather than 5 (data from every game in Liverpool history would start to be too ‘noisy’, as Bill Shankly or Ian Rush have little bearing on the current crop of players).

After two games last season, Everton had amassed 42 shots (26 SiB), giving them a crazy 21 (13) per-game average. While the team played well the rest of the way, their season averages of 14 shots and 8 SiB per game were considerably below that initial surge, which could have led to a couple of panic buys as managers sought to get 'coverage' of 'must own' teams. Looking at this season to date, what should we make of West Ham's 35 shots (22 SiB) or Swansea's 15 (7) efforts?

More learned statisticians will likely be able to analyse this question with more certainty and skill, but for our purposes, we are just looking for a quick guideline as to when we can believe what the data is showing us.

For simplicity, I have plotted each team’s season average shot totals (both overall and those only in the box) against the rolling average on a gameweek-by-gameweek basis. These lines will obviously converge as the season progresses but the speed at which this happens is less obvious. The data is plotted below with some quick analysis below the chart:

By GW6, of the forty team/location pairs (20 teams each at home/away), 34 see their rolling average within just two shots of their final season average. Thus if at that point in the season a team had an average shot total of 10, we’d expect with some certainty that they would finish the season with an average between 8 and 12. The only notable departures were Sunderland at home, who fell from three strong performances (strangely including Arsenal and Liverpool) and a 21 shot average to just 14 on the season, and Liverpool at home, who improved throughout the year, taking their 15 shot average through GW6 to 21 by the time they fell just short in their title bid.
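For anyone who wants to run the same check on their own numbers, here is a rough sketch of the comparison described above. It assumes the 'rolling average' is simply the average of all games played to date, and the shot counts, team names and the `within_tolerance` helper are made up for illustration rather than taken from the actual data set.

```python
from itertools import accumulate


def cumulative_averages(shots_per_game):
    """Average shots per game through each game played (index 0 is game 1)."""
    running_totals = accumulate(shots_per_game)
    return [total / games for games, total in enumerate(running_totals, start=1)]


def within_tolerance(shots_per_game, gameweek, tolerance=2.0):
    """True if the average through `gameweek` is within `tolerance` shots
    of the average over the whole list (the 'final' season average)."""
    averages = cumulative_averages(shots_per_game)
    return abs(averages[gameweek - 1] - averages[-1]) <= tolerance


# Hypothetical shot counts for two team/location pairs (not real data).
season = {
    ("Team A", "home"): [21, 20, 22, 19, 23, 21, 10, 9, 11, 8, 10, 9],
    ("Team B", "away"): [10, 12, 9, 11, 10, 12, 11, 10, 9, 12, 10, 11],
}

# Pairs whose average after six games is already close to where they finish.
stable = [pair for pair, shots in season.items() if within_tolerance(shots, gameweek=6)]
print(stable)
```

In this toy example Team A starts hot and fades, so only Team B passes the two-shot test after six games.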

It's dangerous to draw too many conclusions through six weeks, especially when further splitting the data into home/away games, but ultimately we can't wait until we're absolutely sure (if that day ever even arrives) as decisions on transfers need to be made sooner rather than later. Still, six weeks feels like a good benchmark to start taking things a bit more seriously and putting some weight behind any big revisions to impressions you had coming into the year. From memory of Silver's aforementioned book, I think this is something akin to Bayesian inference, where our initial hypothesis should be impacted by new data but to varying degrees based on how strong our initial opinion was. Thus, if you loved Alexis Sanchez coming into the season, his somewhat disappointing three shots in two games should move the needle less than David Nugent's zero SiB, as you were probably less sure of the Leicester man's prospects initially (though even there, two weeks is probably too early to panic unless you've seen any real issues with his or Leicester's gameplan).
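To put some rough numbers on that idea, here is a minimal sketch of one simple way to blend a preseason opinion with early results: a weighted average where the prior counts as a certain number of 'equivalent games'. This is my own simplification rather than anything lifted from Silver's book, and the prior values and the `updated_estimate` function are hypothetical; only the 1.5 shots per game (three shots in two games) comes from the example above.

```python
def updated_estimate(prior_mean, prior_weight, observed_mean, games_played):
    """Blend a preseason expectation with observed data.

    prior_weight is expressed in 'equivalent games': the stronger the
    preseason opinion, the more games of evidence it takes to move it.
    """
    total_weight = prior_weight + games_played
    return (prior_weight * prior_mean + games_played * observed_mean) / total_weight


# Hypothetical priors, for illustration only.
# Strong opinion: expected 4 shots per game, worth 10 games of evidence.
strong = updated_estimate(prior_mean=4.0, prior_weight=10, observed_mean=1.5, games_played=2)
# Weak opinion: same expectation, but only worth 2 games of evidence.
weak = updated_estimate(prior_mean=4.0, prior_weight=2, observed_mean=1.5, games_played=2)

print(round(strong, 2), round(weak, 2))  # prints: 3.58 2.75
```

The same two disappointing games barely dent the strongly held view but pull the weakly held one most of the way towards the observed numbers, which is the behaviour described above.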

After being away for a few months I'm sure everyone is thrilled to read a piece which basically tells you what you already knew, but hey, I had limited data to play with while on a recent flight and this is what I managed to cobble together. This also ties in well with my plan to launch the new graphics and forecast tables right around the GW6 mark. Next up is some actual analysis of the new season. 

Tuesday, August 19, 2014

The more things change . . .

Personally, I've just wrapped up one of the best and almost certainly most important years of my life, having got married, travelled to four continents, finally got a new job and bought a house. For this blog though, the results have been less promising. I considered charting the quality of content here with my life developments or perhaps dousing the fire of my own work but given that I'm still travelling I'll stick to simple words for now.

Long story short, I've had priorities which have trumped this blog, which meant that (a) the weekly content has suffered (and stopped at the end of last season) and (b) I totally ignored this year's preseason activities. My policy has always been to only post things which are worth reading so I didn't ever want to mail anything in with out-of-date data or banal narratives. I wasn't sure I could make a quality preseason guide so I didn't, and I wasn't sure if I'd even be back for this year. Now it's started though, I got that familiar buzz on opening weekend - even if my team was assembled the night before - and so I've come to the conclusion I'm not yet ready to walk away.

There will be a couple of changes though. First, I'm hoping to move to a more 'graphic' based format, hosted on a new site that allows for a bit more flexibility. Second, I won't aim to put out weekly lineup lessons or 'fanning the flames' pieces, which take up masses of time and in all honesty become repetitive for you to read and me to write (no, you shouldn't buy the 19 year old right back who played once but scored with the only shot of his life). I will however continue to post written pieces where a particular player needs attention or where a new concept/trend arises.

There are a lot of good sites around which cover player fitness, team news and what I'll call 'standard' reporting and while I've never tried to offer great depth in those areas, I'm abandoning that area entirely now. I know less about the weekly ups and downs of football than most of you probably do as I'm simply not plugged into it 24/7 thanks to living in Canada. I no longer default to Sky Sports News as my background noise and I don't discuss Rooney's hamstring in the elevator at work anymore. The problem with this approach is that when the data tables show Stevan Jovetic as the best forecasted player for a given week, even though everyone knows there's a 99% chance he won't play, many people get confused/annoyed and complain. You can't please everyone though and there likely won't be comments on the new site anyway (I'm always on Twitter though for any fairer comments or queries).

So the plan for the next couple of weeks is to get the new graphics completed and launch the new site. That should nicely coincide with the time when we have some somewhat useful data (~GW5). In the meantime, I'll start getting back into the swing of things by highlighting some promising new players and offering caution on those whose early success looks unsustainable (basically a prolonged fanning the flames piece).

I've just realised that given my absence there could be no one reading this, but if you are, thanks for sticking with me and I hope I can reward that loyalty with a couple of useful tips in the coming season.