Expected Threat (xT) summary. Important details about the transition matrix. The best players in Europe’s TOP5 leagues by xT.

Mikhail Borodastov
18 min read · Jan 6, 2024


There are three ways to estimate the efficiency of attacking actions in the build-up phase:

  • metrics based on xG: xGChain, xGBuildup
  • metrics based on Markov chains (Markov processes): xT
  • metrics based on ML (machine learning): VAEP, OBV, PV, g+, EPV

Today we will look at the first two types using specific examples and highlight the strengths and weaknesses of each approach.

Here we go.

1. xG-based metrics

xGChain (xGC) is the sum of the xG values of all possession chains during a game that ended with a shot and in which a given player was involved. It is calculated for each player separately.

In other words, the result of each attack, in the form of its final xG, is credited to every player involved in the move. Then, for each player, all attacks are summed up, which gives that player’s xGChain for the game.

Below you can see a possession chain with its actions marked by the xGChain metric, taken from one of Barcelona’s possessions in the game against Real Madrid on October 28, 2023.

The showcased possession chain consists of 6 actions: Gündogan’s pass to Yamal, the Spaniard’s carry and subsequent pass to Lewandowski, Robert’s pass to Raphinha, the back pass to the Pole, and the final shot.

Next, for each player, the xGChain values of all possessions are summed up to obtain the final score. This metric is available for free on understat.com.

Usually xGChain is calculated only for open-play possession chains, which means that possessions following set pieces are not considered. The understat.com site mentioned above provides exactly this open-play xGChain, although this is not explicitly stated anywhere.

At the end of El Clasico the players had the following xGChain scores.

There is a clear imbalance in xGChain among the Real Madrid players. The reason is a long Madrid attack in stoppage time that ended with Jude Bellingham’s goal. Let’s have a look at the corresponding possession chain.

During that possession Real made 13 passes plus one crucial ball touch by Modric, which is a pass by its nature. Nine players were involved in the possession, and each of them got xGChain = 0.57.

We can evaluate the importance and contribution of each player to the goal in different ways, but one thing seems absolutely clear: the scores of Rudiger and especially Alaba should not be the same as that of Modric (number 10). The first two each made a single pass without pressure in the middle of the pitch, while Luka twice changed the direction of the attack with a pass after a carry and then assisted Jude Bellingham.

In addition to xGChain there is one more xG-based metric that you can also find on understat.com: xGBuildup. It is calculated almost the same way as xGChain, but excludes the two final actions within a possession (the assist and the shot).

Jude made just one shot in our example, so his contribution to the buildup was zero (xGBuildup = 0). Luka got nothing for his assist, but he performed other actions during the attack, so his xGBuildup stays the same as his xGChain (xGBuildup = 0.57).
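As a rough illustration of the two definitions, here is a minimal sketch of how xGChain and xGBuildup could be accumulated per player. It assumes each possession is an ordered list of (player, action) pairs ending with a shot of known xG. The data layout, the function name `chain_scores` and the shortened toy possession are assumptions for illustration, not understat.com’s actual implementation.

```python
from collections import defaultdict

def chain_scores(possessions):
    """possessions: list of (actions, shot_xg); actions is an ordered list of
    (player, action_type) pairs, the last one being the shot."""
    xg_chain = defaultdict(float)
    xg_buildup = defaultdict(float)
    for actions, shot_xg in possessions:
        # xGChain: every player involved gets the shot's xG once.
        for player in {p for p, _ in actions}:
            xg_chain[player] += shot_xg
        # xGBuildup: same, but ignoring the last two actions (assist, shot).
        for player in {p for p, _ in actions[:-2]}:
            xg_buildup[player] += shot_xg
    return dict(xg_chain), dict(xg_buildup)

# Toy possession (hypothetical, shortened) ending with a shot worth xG = 0.57
possession = [
    ("Alaba", "pass"),
    ("Modric", "carry"),
    ("Modric", "pass"),
    ("Carvajal", "pass"),
    ("Modric", "assist"),
    ("Bellingham", "shot"),
]
chain, buildup = chain_scores([(possession, 0.57)])
```

In this toy case every player ends up in `chain` with 0.57, while `buildup` contains Modric (for his earlier actions) but not Bellingham, mirroring the example in the text.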

Undoubtedly, both metrics help to some extent in evaluating players’ contributions to creating threat at the opponent’s goal. But both possession-chain examples showcase the limitations of these metrics:

  • only actions in possessions that end with a shot are scored
  • all players within a possession get the same assessments

2. Expected Threat

Let’s move on to xT. This metric estimates how the probability of scoring a goal (within the next N actions) changes when the ball is moved between two points on the pitch. Expected Threat can be used to assess passes and carries separately or together.

By summing up the values of this metric during the match, you can evaluate the contribution of each player to the potential threat created at the opponent’s goal. The values can also be summed over all players to get a final value for each team.

There are two fundamental differences from the previous metrics, both following from the definition above:

  • xT can evaluate any action that moves the ball on the pitch, regardless of whether the possession ends with a shot
  • xT provides an independent score for each action (Alaba and Rudiger will not get inflated scores for an insignificant contribution within Real’s possession in the example above; conversely, in a possession with a low final xG, a player can still score highly for actions that increase the threat in front of the opponent’s goal)

To simplify, we can think of xT as the change in the xG of a potential shot that might be taken (after N actions) when the ball is moved from point A to point B.

One possible way to use xT is to plot the cumulative sum of xT values for all players of each team over the course of the game.
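A minimal sketch of how such cumulative curves could be prepared before plotting, assuming each action is available as a (minute, team, xt) tuple. The function name `cumulative_xt` and the sample values are hypothetical and only illustrate the running-sum idea.

```python
from collections import defaultdict
from itertools import accumulate

def cumulative_xt(events):
    """events: iterable of (minute, team, xt) tuples.
    Returns {team: [(minute, running_total), ...]} ordered by minute."""
    by_team = defaultdict(list)
    for minute, team, xt in sorted(events):
        by_team[team].append((minute, xt))
    curves = {}
    for team, rows in by_team.items():
        totals = accumulate(xt for _, xt in rows)
        curves[team] = [(minute, total) for (minute, _), total in zip(rows, totals)]
    return curves

# Hypothetical values, for illustration only
events = [(3, "Barcelona", 0.25), (10, "Real Madrid", 0.5), (12, "Barcelona", 0.5)]
curves = cumulative_xt(events)
```

Each list in `curves` can then be passed to any plotting library as the x and y values of a step line, one line per team.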

The basis of xT is the transition matrix. This matrix is calculated from statistics on shots, goals and ball-moving actions (passes and carries) over some historical period.

The number of subsequent actions considered after the evaluated action is an important detail when preparing the transition matrix. The resulting xT value estimates the probability of scoring a goal only within that specific number of following actions.

Another key detail is the dimension of the transition matrix used to estimate the metric.

Without information about which matrix was used to produce the xT scores, it is often impossible to compare xT values correctly. Below we will consider some examples.

The following transition matrices can be found online:

Karun’s original transition matrix

Karun Singh is a data scientist working for Arsenal Football Club and the author of the Expected Threat model. A link to the original matrix is posted on his Twitter, but there are no details about the number of actions used for this matrix.

This matrix was calculated from EPL 2017–2018 data. Each cell of the matrix is the xT value of a specific zone, in other words the probability of scoring a goal within N subsequent actions when in possession of the ball in that zone.

To get the xT value for any pass or other movement of the ball from point A to point B, simply subtract the xT of the starting zone from the xT of the end zone. For example, if a successful pass was made from the corner area to the penalty area, the passer receives 0.11 - 0.04 = 0.07 xT for that action.
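The subtraction can be sketched in a few lines. This is a toy example, assuming coordinates on a 0–100 scale on both axes and a grid stored as a list of rows; the helper names `zone_index` and `action_xt` and the 2x2 grid are illustrative assumptions, and only the values 0.04 and 0.11 are taken from the example above.

```python
def zone_index(x, y, n_cols, n_rows, length=100.0, width=100.0):
    """Map pitch coordinates to the (row, col) of the grid cell containing them."""
    col = min(int(x / length * n_cols), n_cols - 1)
    row = min(int(y / width * n_rows), n_rows - 1)
    return row, col

def action_xt(grid, x0, y0, x1, y1):
    """xT of a successful move: value of the end zone minus the start zone."""
    n_rows, n_cols = len(grid), len(grid[0])
    r0, c0 = zone_index(x0, y0, n_cols, n_rows)
    r1, c1 = zone_index(x1, y1, n_cols, n_rows)
    return grid[r1][c1] - grid[r0][c0]

# Toy 2x2 grid: 0.04 (corner area) and 0.11 (penalty area) come from the
# example in the text, the other two cells are made-up placeholders.
grid = [
    [0.01, 0.04],
    [0.02, 0.11],
]
delta = action_xt(grid, 95, 5, 90, 60)  # corner area -> penalty area
```

With a real matrix only the `grid` contents change; the lookup and the subtraction stay the same.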

Below is another of Karun’s matrices, the one presented in the main article that first introduced the xT metric.

This matrix has a higher granularity and was obtained by estimating the probability of scoring a goal within 5 subsequent actions. Using this matrix to estimate xT for a pass from the right corner flag area into the part of the penalty area slightly closer to the corner flag, we obtain xT = 0.176 - 0.068 = 0.108.

The resulting value can be interpreted as follows: as a result of the pass, the probability of scoring a goal within the next 5 actions increased by 10.8 percentage points.

For simplicity, when explaining the metric to a wide audience, we can of course treat it as an indicator of how the probability of scoring changes when the ball is moved by a pass or a carry. The number of subsequent actions matters mainly at the stage of preparing the transition matrix.

You can see how the metric “works” more clearly in the interactive mode of Karun’s original article. You can change the number of actions for which the transition matrix is calculated and observe the change in probabilities for different areas of the football field.

If you move the slider from 5 to lower values, you will see that the xT values in the area close to the goal change only slightly, while those in the rest of the field change more significantly.

If we fix the number of actions at 3 and consider the same pass from the corner into the penalty area, the change in the probability of scoring a goal as a result of the pass increases to 0.133, or 13.3%.

All of the above indicates that it is important to state which transition matrix was used when reporting xT estimates. We see that the same pass, for transition matrices calculated from the same data but with different granularity and different numbers of subsequent actions, receives different xT estimates (7%, 10.8% and 13.3%).

Twelve transition matrix

Below is one more transition matrix, from @jernejfl (Data Scientist at Twelve football). This matrix has an even higher dimension: 21 by 17 cells.

There are several videos on YouTube where it is suggested to use this particular matrix to evaluate xT. However, I have not found any references or descriptions of the details of preparing the presented matrix anywhere.

In the video below, the author of the channel does not go into details and simply points out that statistics from a very large number of matches were used to build this matrix and that each cell is the probability of scoring a goal from the corresponding position.

If you compare similar areas in the matrix from the first author and the matrix from Twelve, you will see that the cells in the second one have lower values.

You can also note once again the effect of increasing granularity: the cell inside the penalty area from the previous example, with a value of 0.176 in the 5-action matrix, splits into two cells in the Twelve matrix (0.068 and 0.0143). As a result, the new xT value for a pass from the corner flag may differ by a factor of 2, depending on which of the two more granular zones the pass coordinates fall into.

The Athletic transition matrix

There is a 2021 article from The Athletic that published one more transition matrix, built on 3 EPL seasons (2018–2019, 2019–2020, 2020–2021).

The author provides a number of rankings based on the resulting xT metric; in particular, a ranking of the best Premier League players by xT per 90 minutes for passes and carries.

A separate ranking for the most threatening passers (xT for passes only).

And one more for the best carriers by xT.

I decided to create a new transition matrix and to use The Athletic’s rankings to validate the way I calculate the final xT values for players (a self-check).

Important details

The use of the previously discussed matrices is not entirely correct for the following reasons:

  • Karun’s matrices were built on EPL data for a single season (and are presumably outdated)
  • There are no details about the Twelve matrix (how many seasons, which leagues were used). It was published in 2021, so it appears to be outdated as well.

I have no data on how the matrix changes from season to season as statistics accumulate, but it seems that patterns of shots and movements on the pitch may change over time. An interesting article at MIT Sloan Sports Analytics 2021 showed how the average shot distance in the EPL changed over a 6-year interval. Since shot coordinates and shot statistics directly influence the xT calculated for each zone, the transition matrix should evolve as well.

It is also important to keep in mind that different authors may use different input data (data providers: StatsBomb, Opta, Wyscout), and the data may be preprocessed differently as well. As a result, the final matrices will differ from each other.

For illustrative purposes let’s compare two matrices with equal dimension from Karun and the Athletic.

The same comparison in relative terms. Each cell contains the percentage by which the value in Karun’s original matrix is greater or less than the corresponding value in The Athletic’s matrix.
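A minimal sketch of such a cell-by-cell relative comparison, assuming both matrices are stored as lists of rows with identical dimensions. The function name `relative_difference` and the tiny sample grids are hypothetical; they are not the published matrices.

```python
def relative_difference(a, b):
    """Cell-wise (a - b) / b * 100 for two grids of equal shape.
    Cells where b is zero are returned as None."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must have the same dimensions")
    return [
        [(va - vb) / vb * 100 if vb else None for va, vb in zip(ra, rb)]
        for ra, rb in zip(a, b)
    ]

# Hypothetical 1x2 grids, for illustration only
grid_a = [[0.02, 0.3]]
grid_b = [[0.01, 0.4]]
diff = relative_difference(grid_a, grid_b)  # roughly +100% and -25%
```

A positive cell means the first matrix assigns a higher xT to that zone than the second one, a negative cell means the opposite.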

This example shows that it is incorrect to compare xT values from different authors if they used different matrices.

Validating the way of calculating xT for passes and carries

Let’s take The Athletic’s matrix and try to reproduce the same rankings based on Opta event data.

I got the data from whoscored.com, but unfortunately it contains no carry events. Without this type of data it is not possible to build a complete transition matrix.

However, as you may have noticed in the possession chain examples, carries were shown there as well. I added these events myself by interpolating between two consecutive events with a gap between their coordinates (with full access to event data, you would not have to do this).

The same procedure was applied to the entire dataset, which made it possible to “restore” carries for all players.
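The idea of restoring carries can be sketched as follows. This is a simplified illustration rather than the exact procedure used for the article: it assumes events are chronologically ordered dicts with start and end coordinates already converted to meters, that a carry is attributed to the player of the later event, and that gaps shorter than an assumed `min_distance` threshold are ignored. All field names are hypothetical.

```python
import math

def recover_carries(events, min_distance=5.0):
    """events: chronologically ordered dicts with keys
    'team', 'player', 'x', 'y', 'end_x', 'end_y'.
    Returns a new list with inferred carries inserted into the gaps."""
    out = []
    for prev, nxt in zip(events, events[1:]):
        out.append(prev)
        same_team = prev["team"] == nxt["team"]
        gap = math.hypot(nxt["x"] - prev["end_x"], nxt["y"] - prev["end_y"])
        if same_team and gap > min_distance:
            # The ball travelled between the two events without a recorded
            # action, so treat the gap as a carry by the next player.
            out.append({
                "team": nxt["team"], "player": nxt["player"], "type": "carry",
                "x": prev["end_x"], "y": prev["end_y"],
                "end_x": nxt["x"], "end_y": nxt["y"],
            })
    out.extend(events[-1:])
    return out

# Hypothetical two-event sequence: A passes to B, B passes from 10 m further on
events = [
    {"team": "X", "player": "A", "type": "pass", "x": 40.0, "y": 30.0, "end_x": 50.0, "end_y": 30.0},
    {"team": "X", "player": "B", "type": "pass", "x": 60.0, "y": 30.0, "end_x": 80.0, "end_y": 40.0},
]
with_carries = recover_carries(events)
```

A real implementation would also need to handle failed passes, stoppages and changes of period, which is where most interpolation errors come from.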

Next, I selected only actions that move the ball (passes, carries and take-ons) performed in open play, kept only the successful ones, and removed all crosses.

Removing crosses is necessary to eliminate a strong bias in xT values towards full-backs and wingers, who deliver more crosses into the opponent’s penalty area from medium and long distances than other players.

All of the listed preprocessing and filtering steps are described in The Athletic’s article.
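The filtering steps listed above could look roughly like this. It is a minimal sketch, and the field names (`type`, `successful`, `open_play`, `is_cross`) are assumptions about how the event data might be stored, not the schema of any particular provider.

```python
MOVE_TYPES = {"pass", "carry", "take_on"}

def select_actions(events):
    """Keep successful open-play passes, carries and take-ons, without crosses."""
    return [
        e for e in events
        if e["type"] in MOVE_TYPES
        and e["successful"]
        and e["open_play"]
        and not e.get("is_cross", False)
    ]

# Hypothetical events, for illustration only
sample = [
    {"type": "pass", "successful": True, "open_play": True},
    {"type": "pass", "successful": True, "open_play": True, "is_cross": True},
    {"type": "pass", "successful": False, "open_play": True},
    {"type": "carry", "successful": True, "open_play": True},
    {"type": "pass", "successful": True, "open_play": False},
    {"type": "shot", "successful": True, "open_play": True},
]
kept = select_actions(sample)  # keeps the first pass and the carry
```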

The next step is to superimpose all the completed actions on the transition matrix: we subtract the xT value of the start zone from that of the end zone and obtain the final xT values.

I also calculated the full playing time for all games of the 2020–21 EPL season, including stoppage time. This is important because sources such as fbref.com and understat.com provide total minutes played without stoppage time. Over a season, this omits the equivalent of an additional 1–2 “played games” for many players (for this EPL season it could be more than 3–4 games). This in turn affects metrics adjusted per 90 minutes, so it is not entirely correct to use playing time from these sources.

Finally, we normalize the obtained xT values per 90 minutes played, keeping only those players who spent more than 900 minutes on the field. The results are shown below.
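The normalization itself is a one-line formula: xT per 90 = total xT / minutes played × 90. A minimal sketch, assuming per-player totals and minutes are stored in plain dicts (the function name `xt_per_90` and the sample numbers are hypothetical):

```python
def xt_per_90(total_xt, minutes, min_minutes=900):
    """total_xt, minutes: dicts keyed by player.
    Returns {player: xT per 90} for players with more than min_minutes played."""
    return {
        player: total_xt[player] / minutes[player] * 90
        for player in total_xt
        if minutes.get(player, 0) > min_minutes
    }

# Hypothetical totals: player A passes the 900-minute cut, player B does not
rates = xt_per_90({"A": 6.0, "B": 1.0}, {"A": 1800, "B": 450})
```

Because minutes sit in the denominator, undercounting them (for example by ignoring stoppage time) inflates every per-90 value.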

Comparison with original rankings from the Athletic.

The rankings are very similar overall. Almost all 20 players from The Athletic’s ranking appear in the new one. The one exception is Granit Xhaka, who was replaced by Rodrigo (Manchester City) in my ranking. We can also note a difference in the final place of Matic, who moved from the middle of the ranking to the bottom. Moreover, there is a slight shift of 1–2 hundredths of xT between the two lists.

Now let’s have a look at the same ranking for carries.

Comparison with original rankings from the Athletic.

Here the results are slightly less consistent with the original ranking. First of all, Sterling no longer clearly stands out from the other players: Raheem came second, behind Grealish. Instead of Curtis Jones and Aaron Connolly, Raphinha and Pedro Neto appeared in our ranking. It can also be noted that some players ended up in slightly different positions.

Despite the observed differences, we were able to recover carries quite well (18 out of 20 original players were included in the final ranking!). Keep in mind that almost any attempt to recover missing values by interpolation is accompanied by errors.

And if we take into account that the original event data itself may often have errors associated with poor-quality event marking, then it becomes obvious that it is almost impossible to obtain an ideal picture. (In fact, you can try to complicate the data recovery technique; perhaps I will return to this point in the future.)

Finally, let’s consider the last ranking, for carries and passes together.

One more comparison with the original rankings from The Athletic.

It can be seen that only one of the 20 players in The Athletic’s ranking is missing from the resulting ranking: instead of Curtis Jones, Harvey Barnes appeared in ours.

We should also note that the obtained values are shifted relative to the original estimates (in my ranking the values are lower by several hundredths).

But the overall picture is pretty similar:

  • first place, with a clear margin: Jack Grealish
  • the rest of the players, with the exception of Raheem Sterling and Mateo Kovacic, are located relatively close to their original positions

As a result, we can conclude that we were able to validate the technique of preprocessing event data and the subsequent evaluation of xT using the transition matrix.

Creating a new, up-to-date transition matrix

A detailed guide to creating a transition matrix can be found at the link.

The process and some details will be described briefly below:

  • Scrape data for the TOP5 leagues for the last 5 completed seasons (from 2018–19 to 2022–23) from whoscored.com
  • Recover carries for all matches (I used the carry event definition directly from Opta (Stats Perform): any movement of the ball of more than 5 meters; it can be found here)
  • Create a dataset of actions that move the ball on the pitch (passes, carries, take-ons), keeping only open-play actions. Also exclude crosses (a debatable point, but judging by the matrices from The Athletic and Twelve, we can confidently say that they also excluded crosses)
  • Create two separate datasets with shots and with goals (again keeping only open-play actions)
  • Carry out the series of calculations described in the example at the link above, from the Soccermatics course by David Sumpter, and obtain the final transition matrices (5 subsequent actions are considered during construction)
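The core of the last step can be sketched as follows. This is a simplified illustration of the general Expected Threat recursion, not the exact code from the course or the one used for this article. For each zone z it estimates the probability of shooting (s), of moving the ball (m), of scoring when shooting (g), and the probabilities T of moving from z to every other zone, then repeats the update xT(z) = s·g + m·Σ T(z→z')·xT(z') a fixed number of times. It assumes zones are already flattened to integer indices; the function name `compute_xt` and the argument layout are hypothetical.

```python
def compute_xt(moves, shots, goals, n_cells, n_iter=5):
    """moves: list of (start_cell, end_cell) for successful ball-moving actions;
    shots, goals: lists with the cell index each shot / goal came from.
    Returns a list with one xT value per cell."""
    move_cnt = [0] * n_cells
    shot_cnt = [0] * n_cells
    goal_cnt = [0] * n_cells
    trans_cnt = [[0] * n_cells for _ in range(n_cells)]
    for start, end in moves:
        move_cnt[start] += 1
        trans_cnt[start][end] += 1
    for cell in shots:
        shot_cnt[cell] += 1
    for cell in goals:
        goal_cnt[cell] += 1

    xt = [0.0] * n_cells
    for _ in range(n_iter):  # each pass looks one more action ahead
        new_xt = []
        for z in range(n_cells):
            total = move_cnt[z] + shot_cnt[z]
            if total == 0:
                new_xt.append(0.0)
                continue
            shoot_p = shot_cnt[z] / total
            move_p = move_cnt[z] / total
            goal_p = goal_cnt[z] / shot_cnt[z] if shot_cnt[z] else 0.0
            # Expected xT of the zone the ball reaches after a move from z.
            onward = sum(
                trans_cnt[z][t] / move_cnt[z] * xt[t] for t in range(n_cells)
            ) if move_cnt[z] else 0.0
            new_xt.append(shoot_p * goal_p + move_p * onward)
        xt = new_xt
    return xt

# Toy 2-zone pitch: zone 0 only moves the ball to zone 1,
# zone 1 only shoots and scores on half of its shots.
toy_moves = [(0, 1)] * 4
xt_one = compute_xt(toy_moves, shots=[1, 1], goals=[1], n_cells=2, n_iter=1)
xt_two = compute_xt(toy_moves, shots=[1, 1], goals=[1], n_cells=2, n_iter=2)
```

In the toy example zone 0 has no value after one iteration and only picks it up after the second, which illustrates why the number of subsequent actions mostly affects zones far from goal.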

I decided to build separate matrices for each of the TOP5 leagues and a final matrix for all 5 leagues (so the matrices can be compared with each other visually).

EPL

Bundesliga

LaLiga

Ligue 1

Serie A

The final matrix for the TOP5 leagues, which will be used further in all subsequent visualizations in this and other articles.

Application of the xT metric

Let’s come back to the last El Clasico and again consider the two possession chains, this time marking every action that progresses the ball with the xT metric.

If previously all attack participants received the same xGChain score, equal to the low xG value of a long-range shot, now we have a tool that allows us to assess the individual contribution of each attack participant to the created danger.

Yamal received xT = 0.032 for two actions moving the ball to the area in front of the penalty area. Gündogan, who started the attack with a timely pass to the wing, received xT = 0.009. Lewandowski’s pass to Raphinha and the back pass for Robert’s shot received lower xT scores: 0.006 and 0.007.

The results can be interpreted as follows: with his actions within this possession chain, Lamine Yamal increased the chances of scoring a goal (within the next several actions) by about 3 percentage points and made the greatest contribution among all participants of the attack according to the xT metric.

Next, we can use the xT metric to estimate all ball progressions of Barcelona players during the match and highlight those who most often delivered the ball from less dangerous zones to zones with a higher probability of a goal.

In the table below, all Barcelona players are sorted by their total xT for the match across the three types of actions that advance the ball (passes, carries and take-ons).

Below is the most “effective” Real Madrid possession according to the xGChain metric.

The Top 3 actions by xT in the considered possession formally included:

  • pre-assist pass from Carvajal to Modric in the penalty area (xT = 0.124). The xT metric is especially helpful for evaluating the effectiveness of pre-assist actions in the final part of the buildup.
  • Camavinga’s carry during the first phase of the attack (xT = 0.026)
  • Rudiger’s pass to Carvajal, after which the Spaniard passed to Modric in the penalty area (xT = 0.021)

Of course, there is a great temptation to evaluate Modric’s assist using the xT metric, for which the Croatian would have received a cosmic xT = 0.247. However, as noted above, Opta does not record such a touch as a pass but as a special BallTouch event, in which the ball moves or bounces to another player unintentionally.

Assessments of all actions within the considered chain can be found in the table below.

It should also be noted that Alaba’s pass has a negative value. This pass was highlighted when discussing the shortcomings of the xGChain metric, where it received the same high positive value as the other passes in the attack.

On the one hand, it seems normal to give a negative mark for a square pass (formally a little bit backwards). Usually, actions with negative values are excluded from consideration in the final xT rankings.

On the other hand, this assessment demonstrates perhaps the most obvious drawback of the xT metric: most back passes are assessed with negative values.

It is worth noting that another class of metrics based on the use of machine learning does not have this drawback. (We will consider this in following articles)

In the table below, all Real Madrid players are sorted by their total xT for the match (Modrić without xT for the BallTouch).

The best football players in Europe right now according to the xT metric (as of 06 Dec 2023)

As a final point on the xT metric, let’s consider the current rankings of the best football players in Europe in terms of the danger they create by moving the ball.

Again, we scrape the data from whoscored.com for the TOP5 leagues for the current season, take the transition matrix obtained earlier, calculate the actual playing time, and form the final ranking.

However, before showcasing the ranking, one note is worth making. The values of my transition matrix turned out to be closer to the original matrix from the author of the model than to the matrix from The Athletic. Accordingly, it would be incorrect to compare the rankings with each other.

In the Athletic’s rankings, Grealish’s maximum per 90 minutes was 0.34 xT. If I update the rating based on my transition matrix, Jack gets 0.83 xT.

The best football players in Europe in terms of creating danger through passes and carries in the TOP5 leagues.

The resulting rating has the following structure:

  • 7 players each from the Premier League and Ligue 1
  • 4 players each from La Liga and Serie A
  • 3 Bundesliga players

The best football players in Europe in terms of the danger they create through passes in the TOP5 leagues.

Structure:

  • 8 players from the Bundesliga (4 from Bayer Leverkusen!)
  • 5 players each from the Premier League and La Liga
  • 4 players from Ligue 1
  • 3 players from Serie A

The best football players in Europe in terms of the danger they create by carrying the ball in the TOP5 leagues.

Structure:

  • 9 Premier League players
  • 6 players from Ligue 1
  • 5 La Liga players
  • 4 players from Serie A
  • 1 player from Bundesliga

P.S.

In the next article we will look at building a passing map from event data, with the xT metric added as an additional layer.

Mikhail Borodastov

ML Product Manager 🚀 | ex- Data Scientist 📊 | Football Analytics Enthusiast ⚽