Saturday, September 26, 2015

Players' share of team totals

It wasn't really my intention to roll out pieces of the model in various stages, but I've been a bit slower than I hoped in finalising this year's version so wanted to at least present the different pieces as they're available. We first looked at the team +/- which gives an indication of how a team might perform in future weeks beyond a simple shots/game type metric which fails to adjust for strength of schedule. Next up is to look at the players' share of their team totals, which will help turn the forecast team data into something we can use for individuals.

This isn't a complex calculation, but a couple of points are worth noting:
  1. The calculation excludes any games the player misses and only uses team data from games they appear in. It isn't, therefore, the same as simply looking at a player's shots to date for the season divided by his team's total.
  2. I do not make an adjustment for minutes played, so players who make a lot of substitute appearances will suffer a clouded picture. If, for example, a player comes on for 10 minutes and registers his team's only shot inside the box during that spell, but the team had four before he came on, his percentage for the day will be registered as 20% even though in reality it should be 100%. This is an unfortunate drawback of relying on aggregated data which doesn't include sufficient data tags to identify when individual events happened.
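A minimal Python sketch of the two points above, for anyone who wants to reproduce the idea. The GameLine structure, its field names and all of the numbers are hypothetical and only there for illustration; this is not the actual model code.

```python
from dataclasses import dataclass


@dataclass
class GameLine:
    """One player's line for one team game (hypothetical structure)."""
    minutes: int      # minutes the player was on the pitch
    player_sib: int   # the player's shots inside the box
    team_sib: int     # the team's shots inside the box over the whole game


def share_of_team_total(games):
    """Player's share of team SiB, using only games he appeared in.

    No minutes adjustment is made, so a short substitute appearance is
    still measured against the team's full-game total (point 2 above).
    """
    appeared = [g for g in games if g.minutes > 0]
    team_total = sum(g.team_sib for g in appeared)
    if team_total == 0:
        return 0.0
    return sum(g.player_sib for g in appeared) / team_total


# Illustrative numbers only, not real data.
games = [
    GameLine(minutes=90, player_sib=3, team_sib=10),
    GameLine(minutes=0, player_sib=0, team_sib=12),  # missed game: excluded
    GameLine(minutes=10, player_sib=1, team_sib=5),  # the substitute example: 1 of 5
]

share = share_of_team_total(games)  # 4 / 15
naive = sum(g.player_sib for g in games) / sum(g.team_sib for g in games)  # 4 / 27

print(f"share in games played: {share:.1%}, naive season share: {naive:.1%}")
```

The gap between the two figures is the effect of point 1; the third game shows the substitute problem from point 2, where the player is credited with 20% for the day.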
With those caveats in mind, here is a quick visualisation of players' share of team totals to date.

Thursday, September 17, 2015

Fanning the Flames, Dousing the Fire Gameweek 5

I was away last week visiting beautiful Whistler (highly recommended to all) but we're now back on track with this week's Fanning the Flames piece. We're not too far removed from the Gameweek 3 piece so there's some familiar names and narratives here, but let's press on regardless.

Alexis Sanchez continues to produce shots at an alarming rate but simply cannot catch a break. The Chilean's 28 total shots and 21 SiB lead the league and come close to doubling the totals of his fellow midfielders. His production is a microcosm for Arsenal's "struggles" to date with the team leading the league in SiB by a distance (ARS 79, SOT 59, MCI 51) yet only having five goals to show for their efforts. Sanchez has had a large number of his efforts blocked which is somewhat tricky to explain, and a quick look at last year's stats tells a similar, if less dramatic, story (42/121). Last year also tells us that he was able to convert 13 of his 71 SiB into goals and so there's really no reason to expect his conversion rates to be particularly unusual. With seven SoT we'd expect him to have at least a couple of goals by now and we can be fairly comfortable that continued play like this will lead to such rewards in the future. Indeed with just 13% ownership, and a decent run of fixtures on deck after a trip to Stamford Bridge, he seems like a strong buy candidate.

Harry Kane has been somewhat maligned this year with some "one year wonder" whispers creeping into the darkest depths of internet comment sections (partly reflected in his early price drop). In truth his numbers are relatively good to date and the sticking point should be his price tag rather than his performances to date. If you liked him at 9.5m to start the year I haven't seen much to change that view, but I'd still personally find it hard to justify that investment over the likes of Lukaku, Benteke or Giroud.

It's a somewhat similar story with Wayne Rooney who's seen a lot of divestment already, although partly encouraged by his injury status of course. His data is fine and doesn't do much to alter one's opinion, but again, I didn't like him over the aforementioned players to start the year and the production doesn't sway me the other way either.

I don't think anyone is seriously fooled by Steven Naismith's haul this week so we'll pass on him but the next name is one that has piqued some interest. Dimitri Payet's ownership is up to 10% which is quite significant for a largely unknown asset with a mid-level price tag. If you see his shot line of 8 total, 4 SiB, 3 SoT and 3 goals and believe that warrants a 7.6m investment then I have a sizeable plot of Florida swamp land you might be interested in. Now, some of this investment may be based on his performances on the field (I haven't seen a full West Ham game yet this term) and he does bring a healthy assist threat to the party (16 created chances) so it's not that there isn't hope here, I'm just not sure he's all that much more than a promising lottery ticket, which could also be said of someone like Xherdan Shaqiri who's produced a similar shot line (7/4/1) in just over half as many minutes as Payet and comes at a lower cost and significantly lower ownership (3%). The standard line applies here: buy if you like the player but there's not enough here if you're just chasing the production.

There's a trio of surprise success stories with Bafetimbi Gomis, Riyad Mahrez and Callum Wilson enjoying explosive starts to the year. While those all appear on the list, the real story is not the gap between their xG and actual goals, but the fact that their xG numbers are terrific. At 3.0, 2.9 and 2.7 respectively, this trio have numbers which can go toe-to-toe with anyone in the league right now, and yet all come at a bargain price. Yes, everyone is on Mahrez already (43%) with Gomis (28%) and Wilson (26%) not too far behind but these ownership rates are not only justified but likely too low. These aren't just cheap players getting a bit lucky, they are performing at an elite level and the kind of flexibility you get by installing ~6.0m players who can start every single week is huge.

I think we can safely conclude that Arsenal are due for some better luck with no fewer than five players making an appearance here, having collectively "earned" an expected 5.8 assists but having actually been rewarded with just the one. Other than that quintet not too much stands out here.

David Silva has played well this year and has widely been singled out as the lynchpin in this impressive City side, and while that may be true, from a fantasy perspective his totals aren't quite as convincing. He's never been one for huge goal threat numbers (8 shots, 4 SiB and 2 SoT) but his assist totals are supposed to compensate for any deficit here yet 12 created chances place him alongside the likes of Nathan Redmond and Jose Jurado. Of course, we can reasonably suggest that Silva's created chances will likely be converted at a higher rate than that pair thanks to the presence of the likes of Sergio Aguero but when you consider that Yaya Toure has the same CC to date and Navas is just behind with 10, it starts to look like Silva is a touch overvalued (it should be noted he has played a game less than Yaya, which given the small samples here is not insignificant). Silva could well have a huge season and he remains one of the best in the league, but these numbers should give some pause before writing him in for a historic assists haul.

People seem to have avoided the Wes Hoolahan bandwagon as yet but the Marc Albrighton express is getting somewhat crowded. The Leicester man has created 10 chances and earned 3 assists for a 30% A/CC rate. Even if you believe Leicester are for real and will continue to create chances at a healthy clip, the rest of the team except Albrighton are seeing their chances converted at a 19% rate; significantly lower than the former Villa man. Given his profile as a wide player throwing balls into the box, it's tough to think his chances are subject to a better conversion rate than a carefully crafted through ball.

Wednesday, September 9, 2015

Which stats should we focus on?

With the proliferation of Opta and multiple news sources starting to dip their proverbial toes into the world of statistical analysis, casual fans have access to a greater depth of data than at any point before. Converting that data into useful information therefore comes more and more into focus and that’s why we need to review any proposed “advanced” metrics to ensure they remain relevant and as accurate as possible (all while acknowledging we are a long way from even touching the kind of analytics that are prevalent in other sports).

Before we go on it should also be noted that any reference to terms like “advanced” should be taken lightly. By “advanced” I mean, slightly more useful than looking at the “goals scored” chart and assuming that the past explains the future. I am not a stats professor nor even a student and more complex models surely exist which might shave a point or two off the margin of error from the analysis in these electronic pages. However, I believe the outputs are good enough to raise warning flags and give clues of over and undervalued assets, and that seems sufficient to warrant committing these words and numbers to my little corner of the Internet.

Which stats to look at
One of the challenges of the increased availability of data is which particular stats do we pay attention to? If we’re trying to forecast individual goals do we want to know how many total shots they have or just those on target? Do we care how often they play? How about their involvement throughout the game, using total passes or touches as a proxy?

The simple way I approach this is to look at the correlation between the individual stats and the goals scored on an individual level. It isn’t perfect, of course, and one could look to exclude players who didn’t play enough or who are obviously not fantasy targets (holding midfielders) but for simplicity, my sample is the entire population of players from the prior season. The obvious starting point is shots, based on the motivational-poster-inspired adage of players missing every shot they don’t take. Those results throw up some predictable but not irrelevant notes:

Total shots 86% correlation to goals
Shots outside the box 58%
Shots inside the box 88%
Shots on target 90%

Unsurprisingly, if you take a lot of shots you have more chance of scoring, and those odds increase if your shots are (a) closer to goal and (b) hit the target. These points sound obvious when written down but they bear repeating to avoid the potential pitfall of equating the value of a long range effort with those of a short range attempt. The final point to note here is that while shots on target show the best correlation, that event also occurs the least and thus can take a little longer to stabilize. For example, if we calculate the SoT and SiB per 90 minutes for players through four gameweeks of last season and compare the results with the final stats in those categories, we see a 50% correlation for SoT but a 65% for SiB. In other words, after four weeks we have a more solid grasp on how many shots inside a box a player will accrue versus his shots on target and thus forecast models may be a little more reliable. This is just one way of looking at the issue, of course, but it’s how I’ve cut the data up and represents one of the reasons I tend to gravitate towards SiB when making quick decisions about players, particularly in these early stretches of the season.
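For anyone wanting to repeat this kind of check, here is a minimal Python sketch. It assumes the percentages above are plain Pearson correlation coefficients between each stat and goals (the post doesn't specify the exact measure), and the player lines are invented purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical season lines, one entry per player (not real data).
season = {
    "total_shots": [95, 60, 42, 30, 12, 5],
    "shots_outside_box": [40, 35, 10, 18, 8, 4],
    "shots_inside_box": [55, 25, 32, 12, 4, 1],
    "shots_on_target": [38, 18, 20, 9, 3, 1],
}
goals = [17, 7, 9, 3, 1, 0]

# Correlate each stat with goals across the whole player pool.
for stat, values in season.items():
    print(f"{stat}: {pearson(values, goals):.0%}")
```

With a real data set the player pool would be every player from the prior season, as described above, rather than six made-up rows.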

After shots, what else might equate well with goals? More playing time? Increased involvement during games? Let’s take a look:

Minutes played 43%
Total passes completed 10%
Touches in the penalty box 82%

The first number is a bit misleading as you obviously wouldn’t want to completely ignore playing time when selecting your team (Leo Messi is unlikely to rack up many Premier League fantasy points despite his ability as he will log exactly zero minutes this season). The reason I included that number is to highlight that just because a player is playing, doesn’t mean they will have success. We often see debate about a player being “locked” into a starting lineup or suggestions that being captain guarantees him a starting spot and thus success. These factors don’t necessarily forecast glory for the given player.

As for involvement we can see that it’s both vital and essentially useless, depending on the type of action a player is involved with. As with most of this rudimentary analysis, I think we all know that the kind of touches the Michael Carricks and Claude Makeleles of the world produce are likely unlinked to goals (at least in terms of directly scoring them) yet we still often see the nice “action zone” maps highlighted on blogs or in the media as evidence that a player is somehow “running” his team and therefore likely to share in the spoils. It should go without saying: context is key.

Touches in the penalty box is a very nice stat as it combines a good overall correlation with goals (82%) while also stabilizing very quickly. Indeed, if we compare the average number of touches in the penalty box players accumulated after just four weeks of last season with their final season totals we see a very promising correlation of 73%. In other words, after just a handful of games we can already identify a stat in which we have relative confidence that it will not only lead to goals but will also continue to be generated at a reasonably consistent pace. For the record, a player's actual goals scored after the same number of weeks correlates just 44% with his future goal haul.

I always tend to dwell on goals analysis for two reasons. First, goals are the dominant part of the game and still dictate much of our fantasy success (few non-defenders can be productive without at least some contribution in the goals column). The second reason is that the picture with assists is much clearer thanks to Opta’s extremely useful – if potentially subjective – chances created stat. Last year saw a 95% correlation between that number and eventual assists, so there’s little reason to really push too much further into total passes, crosses or any other number of potential stats. There could well be value in looking at what type of created chance leads to more goals, or whether having better players around you leads to a better conversion rate, but for now, we can just note that to find assists, just chase the players creating the chances (again, not rocket science).

I hope this served as something of a primer for the season and a warning for new readers (or reminder for old readers) that the main conclusion is that this is all just the ramblings of a crazy guy with a spreadsheet.

Thursday, September 3, 2015

Team Plus-Minus Page

If you direct your attention to the navigation bar you will note that there is now a new link to the team plus-minus page. I've added a brief summary below but it's a pretty simple principle. One word of caution: the data is based solely on 2015-16 data so the sample sizes are ludicrously small. That said, these numbers (particularly with penalty box touches and total shots) do stabilise relatively quickly so it's worth at least having a glance at these numbers in the coming weeks if you are worried that your initial assessment of a team is a bit off (are Chelsea really struggling? Are Leicester for real?).

Plus-minus (+/-) is a very simple metric which simply tries to adjust stats to put them in a context of a team's opponents to date. For example, we might note that two teams have each registered five shots on goal per game and conclude they are of equal ability but if one team did it against Man City and Everton while the other did it against Norwich and Sunderland we would want to adjust our expectations accordingly. As with many "advanced" stats (and I'm using that term lightly here), the goal is simply to explain a concept we are all comfortable and familiar with but often ignore when looking at numbers, hence falling into traps of being overly influenced by stats which don't tell the whole story.

The calculation itself is very simple. Let's take Palace at home for this season to date (through GW4) as a quick example. More specifically, let's look at their Shots on Target plus-minus (hSoT +/-). Their two appearances at Selhurst Park can be summarised as below:

GW2 vs ARS, 4 SoT, Arsenal concede on average 2 SoT away from home, SoT +/- would be 100%: (4-2)/2
GW3 vs AVL, 6 SoT, Villa concede on average 4 SoT away from home, SoT +/- would be 50%: (6-4)/4
The team's total hSoT +/- is a simple average of the games to date, so 75%
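The Palace example translates into Python roughly as follows. The function names are mine, and the sketch simply takes the opponent's average conceded as a given number; how that average is built (for instance whether it includes the game in question) isn't covered here.

```python
def plus_minus(actual, opponent_avg_conceded):
    """Single-game +/-: how far above or below the opponent's norm the team was."""
    if opponent_avg_conceded <= 0:
        raise ValueError("opponent average must be positive")
    return (actual - opponent_avg_conceded) / opponent_avg_conceded


def team_plus_minus(games):
    """Simple average of the single-game +/- values to date."""
    values = [plus_minus(actual, opp_avg) for actual, opp_avg in games]
    return sum(values) / len(values)


# Palace home games through GW4, taken from the example above:
# (Palace SoT, opponent's average SoT conceded away from home)
palace_home_sot = [
    (4, 2),  # GW2 vs ARS -> 100%
    (6, 4),  # GW3 vs AVL -> 50%
]

hsot_plus_minus = team_plus_minus(palace_home_sot)
print(f"Palace hSoT +/-: {hsot_plus_minus:.0%}")  # prints "Palace hSoT +/-: 75%"
```

The same two functions work for penalty box touches, total shots and SiB by changing the input pairs.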

A quick word on the stats used. I've selected the four listed stats - penalty box touches, total shots, shots inside the box and shots on target - because they have shown the greatest correlation to goals scored over past seasons. Perhaps I'll do a separate post on that, though there isn't much to it. The correlation I speak of is very rudimentary and might make more learned statisticians cringe as it's closer to "draw a line of best fit in science class" than anything Nate Silver might put together but I think it does the trick to highlight the key stats.

One key point to keep in mind is that while the correlation strength over a season runs, from strongest to weakest, SoT, SiB, Shots and Penalty Box Touches, that is also the reverse order of the frequency with which those events happen and that can lead to wilder swings in the early part of the season. I haven't tried to pin down exactly when those numbers stabilize but I have found in the past that something like SoT can take a bit longer so exercise some caution there.

I will add the same numbers for defensive stats soon.