A Detailed Guide to Creating Advanced Pass Maps with Python and Matplotlib

Mikhail Borodastov
15 min read · Feb 22, 2024


In my earlier article, I conducted an exhaustive review of several templates for crafting pass maps, exploring the complex details and fine points of their design. In this follow-up, I intend to lay out a detailed, step-by-step blueprint for creating your own pass maps. To cap it off, I’ll share a link to a GitHub repository containing the code for the example discussed, allowing you to dive directly into practice.

In the ./data/ directory, you will find two files:

  • 1729486_Man_City_1_1_Chelsea.json: raw event data from the Manchester City vs. Chelsea match on February 17, 2024, in JSON format.
  • Man_City_1_1_Chelsea_17022024_opta_events_processed.csv: parsed event data organized into a table, with added assessments using the xT (expected threat) metric.

1. Begin by importing all the necessary libraries.

import os
import json
import pandas as pd
import numpy as np
from tqdm.notebook import tqdm
from scipy.stats import binned_statistic_2d
import matplotlib as mpl
from matplotlib import pyplot as plt
from matplotlib.patches import Arrow, ArrowStyle,FancyArrowPatch, Circle,FancyArrow
from mplsoccer.pitch import Pitch, VerticalPitch
from matplotlib.colors import Normalize
from matplotlib import cm
from highlight_text import fig_text, ax_text


import warnings
warnings.filterwarnings("ignore")

2. Reading the Data.

path1 = './data/Man_City_1_1_Chelsea_17022024_opta_events_processed.csv'
path2 = './data/1729486_Man_City_1_1_Chelsea.json'

events_df = pd.read_csv(path1)

with open(path2, 'r') as f:
    match_data = json.load(f)

3. For each team, create several dataframes with pass data and store the results in a dictionary.

  • passes_df_all - all passes
  • passes_df_short - all passes up to a fixed point in time (whichever comes first: the first red card, the first substitution, or the maximum minute of the match)
  • passes_df_suc - all successful passes
  • passes_df_suc_short - all successful passes up to a fixed point in time
res_dict = {}

teamIds = events_df['teamId'].unique()

for teamId in teamIds:

    mask = events_df['teamId'] == teamId
    df_ = events_df[mask]

    teamName = df_['teamName'].unique()[0]

    venue = 'home' if df_['h_a'].unique()[0] == 'h' else 'away'

    # first red card (or second yellow) for this team; NaN if there was none
    mask1 = df_['cardType'].isin(["SecondYellow", "Red"])
    first_red_card_minute = df_[mask1].minute.min()

    # first substitution made by either team; NaN if there was none
    mask2 = events_df['type'] == 'SubstitutionOn'
    first_sub_min = events_df[mask2].minute.min()

    max_minute = df_.minute.max()

    # np.nanmin ignores the NaN produced when an event type never occurred
    num_minutes = int(np.nanmin([first_sub_min, first_red_card_minute, max_minute]))

    passes_df = df_.reset_index(drop=True)
    passes_df['playerId'] = passes_df['playerId'].astype('Int64')
    passes_df = passes_df[passes_df['playerId'].notnull()]
    # the recipient is approximated by the player of the team's next event
    passes_df['passRecipientName'] = passes_df['playerName'].shift(-1)
    passes_df = passes_df[passes_df['passRecipientName'].notnull()]

    # DF with all passes
    mask1 = passes_df['type'] == 'Pass'
    passes_df_all = passes_df[mask1]

    # DF with all passes before num_minutes (additional filter on the first 11 players)
    mask2 = passes_df_all['minute'] < num_minutes
    players = passes_df_all[mask2]['playerName'].unique()
    mask3 = passes_df_all['playerName'].isin(players)
    passes_df_short = passes_df_all[mask2 & mask3]

    # DF with successful (completed) passes
    mask2 = passes_df_all['playerName'] != passes_df_all['passRecipientName']
    mask3 = passes_df_all['outcomeType'] == 'Successful'
    passes_df_suc = passes_df_all[mask2 & mask3]

    # DF with successful passes before num_minutes (additional filter on the first 11 players)
    mask2 = passes_df_suc['minute'] < num_minutes
    players = passes_df_suc[mask2]['playerName'].unique()
    mask3 = (passes_df_suc['playerName'].isin(players) &
             passes_df_suc['passRecipientName'].isin(players))
    passes_df_suc_short = passes_df_suc[mask2 & mask3]

    print('team: ', teamName)
    print('passes: ', passes_df_all.shape[0])
    print('suc passes: ', passes_df_suc.shape[0])
    print('last minute: min(first red / substitution / end game) = ', num_minutes)

    print('suc passes before last minute: ', passes_df_suc_short.shape[0])
    print('\n')

    res_dict[teamId] = {}

    res_dict[teamId]['passes_df_all'] = passes_df_all
    res_dict[teamId]['passes_df_short'] = passes_df_short
    res_dict[teamId]['passes_df_suc'] = passes_df_suc
    res_dict[teamId]['passes_df_suc_short'] = passes_df_suc_short
    res_dict[teamId]['minutes'] = num_minutes

Running the loop prints intermediate statistics for each team.

Let's verify our results by comparing them with data from fbref.

https://fbref.com/en/matches/78fb7fc2/Manchester-City-Chelsea-February-17-2024-Premier-League

  • Missing 1 unsuccessful pass for Manchester City
  • Missing 5 passes for Chelsea, including 2 successful and 3 unsuccessful ones

The observed discrepancies can be attributed to differences in data annotation among providers, as well as potential losses during processing. Overall, there are no significant deviations, so we can proceed further.
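
One detail of step 3 is worth illustrating before moving on: the recipient of a pass is not read from the data but approximated as the player of the team's next event, via shift(-1). Below is a minimal sketch on a made-up four-event table (the player names and outcomes are invented for illustration) showing what this approximation yields and why the playerName != passRecipientName filter is applied afterwards.

```python
import pandas as pd

# Toy event stream for one team, in chronological order (invented data).
events = pd.DataFrame({
    "playerName": ["A", "B", "B", "C"],
    "type": ["Pass", "Pass", "Pass", "Shot"],
    "outcomeType": ["Successful", "Unsuccessful", "Successful", "Successful"],
})

# The recipient is approximated by whoever acts in the next team event.
events["passRecipientName"] = events["playerName"].shift(-1)

# Drop the last event (it has no successor) and keep only passes.
passes = events[events["passRecipientName"].notnull() & (events["type"] == "Pass")]

# A completed pass needs a successful outcome and a different next actor.
completed = passes[
    (passes["outcomeType"] == "Successful")
    & (passes["playerName"] != passes["passRecipientName"])
]
pairs = list(zip(completed["playerName"], completed["passRecipientName"]))
print(pairs)
```

In this toy table, the second event is a failed pass by B followed by another event of B, so the inferred "recipient" is B himself. Both filters remove that row, leaving only the A to B and B to C passes.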

4. Prepare the dataframes with statistics that will be utilized in the map rendering process.

player_position

This is a dataframe containing coordinates for drawing. In this case, we’re adopting a straightforward method by calculating coordinates from passes_df_short, incorporating all passes—both successful and unsuccessful—up until a specified moment (for this game, it's the 63rd minute for both teams, corresponding to the time of the first substitution).

We will explore alternative calculation methods later and compare the outcomes. The median is used for our statistical calculations due to its robustness against outliers.
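
To make the robustness argument concrete, here is a small sketch with invented coordinates: four passes from midfield and one clearance from near the player's own goal line. The numbers are made up purely to show how a single outlier moves the mean but not the median.

```python
import pandas as pd

# Invented pass start locations for one player: four passes from midfield
# and one long clearance taken near his own goal line.
passes = pd.DataFrame({
    "playerName": ["P"] * 5,
    "x": [50.0, 52.0, 48.0, 51.0, 5.0],
    "y": [30.0, 32.0, 29.0, 31.0, 90.0],
})

# Compute both statistics per player for comparison.
summary = passes.groupby("playerName")[["x", "y"]].agg(["median", "mean"])
median_x = summary.loc["P", ("x", "median")]
mean_x = summary.loc["P", ("x", "mean")]
print(median_x, mean_x)
```

The median x stays at 50, in the middle of the cluster, while the mean is dragged down to about 41 by the single clearance.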

player_pass_count

This dataframe compiles statistics on all passes (player_pass_count_all), successful passes (player_pass_count_suc), and successful passes up to a fixed time limit (player_pass_count_suc_short). This dataset will be used to draw nodes on the map, with their size proportional to the number of passes made by the player.

pair_pass_count

This dataframe stores data on the passing statistics between each pair of players. Similarly, it consists of three components, mirroring the structure of the previous dataframe.

player_pass_value

This dataframe contains the cumulative xT value for successful actions taken during open play. It joins two columns, player_pass_value_suc and player_pass_value_suc_short, for all successful actions and for those filtered by time, respectively.

pair_pass_value

Similarly, it stores the xT values for all passes between each pair of players.

var = 'playerName'
var2 = 'passRecipientName'

for teamId in teamIds:

    passes_df_all = res_dict[teamId]['passes_df_all']
    passes_df_suc = res_dict[teamId]['passes_df_suc']
    passes_df_short = res_dict[teamId]['passes_df_short']
    passes_df_suc_short = res_dict[teamId]['passes_df_suc_short']

    # median start location of each player's passes up to the cutoff minute
    player_position = passes_df_short.groupby(var).agg({'x': ['median'], 'y': ['median']})

    player_position.columns = ['x', 'y']
    player_position.index.name = 'playerName'
    player_position.index = player_position.index.astype(str)

    # pass counts per player: all, successful, successful before the cutoff
    player_pass_count_all = passes_df_all.groupby(var).agg({'playerId':'count'}).rename(columns={'playerId':'num_passes_all'})
    player_pass_count_suc = passes_df_suc.groupby(var).agg({'playerId':'count'}).rename(columns={'playerId':'num_passes'})
    player_pass_count_suc_short = passes_df_suc_short.groupby(var).agg({'playerId':'count'}).rename(columns={'playerId':'num_passes2'})
    player_pass_count = player_pass_count_all.join(player_pass_count_suc).join(player_pass_count_suc_short)

    # directed key "passer_recipient" identifying each pair of players
    passes_df_all["pair_key"] = passes_df_all[var].astype(str) + "_" + passes_df_all[var2].astype(str)
    passes_df_suc["pair_key"] = passes_df_suc[var].astype(str) + "_" + passes_df_suc[var2].astype(str)
    passes_df_suc_short["pair_key"] = passes_df_suc_short[var].astype(str) + "_" + passes_df_suc_short[var2].astype(str)

    # pass counts per pair of players
    pair_pass_count_all = passes_df_all.groupby('pair_key').agg({'playerId':'count'}).rename(columns={'playerId':'num_passes_all'})
    pair_pass_count_suc = passes_df_suc.groupby('pair_key').agg({'playerId':'count'}).rename(columns={'playerId':'num_passes'})
    pair_pass_count_suc_short = passes_df_suc_short.groupby('pair_key').agg({'playerId':'count'}).rename(columns={'playerId':'num_passes2'})
    pair_pass_count = pair_pass_count_all.join(pair_pass_count_suc).join(pair_pass_count_suc_short)

    # cumulative xT per player
    player_pass_value_suc = (passes_df_suc.groupby(var)
                             .agg({'xT_added':'sum'})
                             .round(3)
                             .rename(columns={'xT_added':'pass_value'}))
    player_pass_value_suc_short = (passes_df_suc_short.groupby(var)
                                   .agg({'xT_added':'sum'})
                                   .round(3)
                                   .rename(columns={'xT_added':'pass_value2'}))
    player_pass_value = player_pass_value_suc.join(player_pass_value_suc_short)

    # cumulative xT per pair of players
    pair_pass_value_suc = (passes_df_suc.groupby(['pair_key'])
                           .agg({'xT_added':'sum'})
                           .round(3)
                           .rename(columns={'xT_added':'pass_value'}))
    pair_pass_value_suc_short = (passes_df_suc_short.groupby(['pair_key'])
                                 .agg({'xT_added':'sum'})
                                 .round(3)
                                 .rename(columns={'xT_added':'pass_value2'}))
    pair_pass_value = pair_pass_value_suc.join(pair_pass_value_suc_short)

    # swap x and y: nodes are later drawn with ax.plot directly on a vertical pitch
    player_position['z'] = player_position['x']
    player_position['x'] = player_position['y']
    player_position['y'] = player_position['z']

    res_dict[teamId]['player_position'] = player_position
    res_dict[teamId]['player_pass_count'] = player_pass_count
    res_dict[teamId]['pair_pass_count'] = pair_pass_count
    res_dict[teamId]['player_pass_value'] = player_pass_value
    res_dict[teamId]['pair_pass_value'] = pair_pass_value
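
To see what the pair-level aggregation produces, here is a toy sketch with five invented passes. For brevity it uses pandas named aggregation instead of the agg-plus-rename pattern above; the column names num_passes and pass_value mirror the ones used in this article, while the data is made up.

```python
import pandas as pd

# Five invented successful passes between three players.
passes = pd.DataFrame({
    "playerName":        ["A", "A", "B", "A", "C"],
    "passRecipientName": ["B", "B", "A", "C", "A"],
    "xT_added":          [0.01, 0.03, -0.02, 0.05, 0.00],
})

# Directed key: "A_B" and "B_A" are different edges.
passes["pair_key"] = passes["playerName"] + "_" + passes["passRecipientName"]

# One row per directed pair: number of passes and summed xT.
pair_stats = passes.groupby("pair_key").agg(
    num_passes=("xT_added", "size"),
    pass_value=("xT_added", "sum"),
).round(3)
print(pair_stats)
```

Because the key is directed, A to B and B to A end up in separate rows, which is what later allows two arrows to be drawn between the same pair of players.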

5. Visualization — the most extensive part in terms of code.

Creating a basic pass map is not overly challenging, with numerous examples available online. I selected the most informative template for my purposes, drawing inspiration from a template by The Athletic. The construction details of the template, excluding the code, are described in the article.

Before starting the visualization, I suggest using a pass map from The Athletic’s dashboard as a reference. While the quality may not be very high, it serves the purpose of roughly validating the obtained results. These are the kinds of maps we expect to produce (on both the left and right sides).

https://theathletic.com/live-blogs/manchester-city-chelsea-live-updates-premier-league-score-result/1eSIlP0mA78v/95veMydtAiyx/

5.1. Setting Parameters and Defining Utility Functions

Let’s establish a color scale to overlay an additional layer on the map, showcasing the efficiency metric for passes — xT. This colormap will be applied to both nodes and edges.

nodes_cmap = mpl.colors.LinearSegmentedColormap.from_list("", ['#E15A82',
'#EEA934',
'#F1CA56',
'#DCED69',
'#7FF7A8',
'#5AE1AC',
'#11C0A1'


])

nodes_cmap

Now, let’s specify five distinct colors for use in the legend.

# nodes_cmap is already a Colormap object; cm.get_cmap() was removed in Matplotlib 3.9
node_cmap = nodes_cmap

norm = Normalize(vmin=0, vmax=1)
node_color1 = node_cmap(norm(0))
node_color2 = node_cmap(norm(0.25))
node_color3 = node_cmap(norm(0.5))
node_color4 = node_cmap(norm(0.75))
node_color5 = node_cmap(norm(1))

Set up the variables that will be utilized for creating the visualization.

#nodes
min_node_size = 5
max_node_size = 35

max_player_count = 88
min_player_count = 1

max_player_value = 0.36
min_player_value = 0.01

#font
font_size = 8
font_color = 'black'

#edges arrow
min_edge_width = 0.5
max_edge_width = 5

head_length = 0.3
head_width = 0.1

max_pair_count = 16
min_pair_count = 1

min_pair_value = 0.01
max_pair_value = 0.085

min_passes = 5
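
The drawing code in section 5.2 relies on a small helper, change_range, that is not listed in this article. Below is a minimal sketch, assuming it is a linear rescaling from one interval to another with clamping at both ends (the "+" suffixes in the legend, such as "88+", suggest that values above the upper bound are capped). The exact version in the repository may differ.

```python
def change_range(value, old_range, new_range):
    """Linearly map value from old_range to new_range, clamping at both ends.

    Assumed behaviour, reconstructed from how the function is called.
    """
    old_min, old_max = old_range
    new_min, new_max = new_range
    scaled = (value - old_min) / (old_max - old_min) * (new_max - new_min) + new_min
    # Clamp so that out-of-range inputs do not produce oversized markers or lines.
    return max(new_min, min(new_max, scaled))


# A player halfway between 1 and 88 passes gets a marker halfway between 5 and 35.
print(change_range(44.5, (1, 88), (5, 35)))
# Values beyond the bounds are capped.
print(change_range(200, (1, 88), (5, 35)))
```

With this definition, the node size and edge width never leave the [min, max] ranges configured above, regardless of how extreme an individual player's numbers are.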

5.2 Drawing the Map

We’ll draw two vertical maps using the mplsoccer library and conduct some data preparation.

  • We will draw edges (arrows between players) only for pairs where the number of passes is at least the minimum threshold, min_passes = 5.
  • player_stats is the dataframe with the final statistics for drawing nodes
  • pair_stats2 is the dataframe with the final statistics for drawing edges.
plt.style.use('fivethirtyeight')

fig,ax = plt.subplots(1,2,figsize=(6,6), dpi=400)

teamId_home = events_df[events_df['h_a'] == 'h']['teamId'].unique()[0]
teamId_away = events_df[events_df['h_a'] == 'a']['teamId'].unique()[0]

for i, teamid in enumerate([teamId_home, teamId_away]):

    # define dataframes
    position = res_dict[teamid]['player_position']
    player_pass_count = res_dict[teamid]['player_pass_count']
    pair_pass_count = res_dict[teamid]['pair_pass_count']
    player_pass_value = res_dict[teamid]['player_pass_value']
    pair_pass_value = res_dict[teamid]['pair_pass_value']
    minutes_ = res_dict[teamid]['minutes']

    pitch = VerticalPitch(pitch_type='opta',
                          line_color='#7c7c7c',
                          goal_type='box',
                          linewidth=0.5,
                          pad_bottom=10)

    # plot vertical pitches
    pitch.draw(ax=ax[i], constrained_layout=False, tight_layout=False)

    # Step 1: processing for plotting edges
    pair_stats = pd.merge(pair_pass_count, pair_pass_value, left_index=True, right_index=True)
    pair_stats = pair_stats.sort_values('num_passes', ascending=False)
    pair_stats2 = pair_stats[pair_stats['num_passes'] >= min_passes]

    # Step 2: processing for plotting nodes
    player_stats = pd.merge(player_pass_count, player_pass_value, left_index=True, right_index=True)

    # Filter first 11 players
    mask = events_df['minute'] < minutes_
    players_ = list(set(events_df[mask]['playerName'].dropna()))

    mask_ = player_stats.index.map(lambda x: x in players_)
    player_stats = player_stats.loc[mask_]

    mask_ = pair_stats2.index.map(lambda x: (x.split('_')[0] in players_) & (x.split('_')[1] in players_))
    pair_stats2 = pair_stats2[mask_]

    ind = position.index.map(lambda x: x in players_)
    position = position.loc[ind]

This gives us the following picture.

Next, we add the code to draw the nodes.

    # Step 3: plotting nodes (continues inside the team loop)
    for var, row in player_stats.iterrows():
        player_x = position.loc[var]["x"]
        player_y = position.loc[var]["y"]

        num_passes = row["num_passes"]
        pass_value = row["pass_value"]

        # node size encodes the pass count, node color encodes the xT value
        marker_size = change_range(num_passes, (min_player_count, max_player_count), (min_node_size, max_node_size))
        norm = Normalize(vmin=min_player_value, vmax=max_player_value)
        node_color = node_cmap(norm(pass_value))

        # colored node on top of a slightly larger white node that acts as an outline
        ax[i].plot(player_x, player_y, '.', color=node_color, markersize=marker_size, zorder=5)
        ax[i].plot(player_x, player_y, '.', markersize=marker_size+2, zorder=4, color='white')

        # label: the player's name without the first word
        var_ = ' '.join(var.split(' ')[1:]) if len(var.split(' ')) > 1 else var
        ax[i].annotate(var_, xy=(player_x, player_y+5 if player_y > 48 else player_y - 5), ha="center", va="center", zorder=7,
                       fontsize=5,
                       color='black',
                       font='serif',
                       weight='heavy')

        # remember the marker size so arrows can stop short of the node
        player_stats.loc[var, 'marker_size'] = marker_size

Adding code to draw edges.

To control the offset between pairs of arrows relative to each other depending on the direction of drawing — horizontal or vertical — I use a slope parameter, which I calculate as the ratio of dy to dx.

To determine the drawing direction, I use the sign of dx and dy.

Depending on the orientation (vertical or horizontal) and direction (left, right), I add or subtract a shift to separate the arrows from each other.

    # Step 4: plotting edges (continues inside the team loop)
    for pair_key, row in pair_stats2.iterrows():
        player1, player2 = pair_key.split("_")

        player1_x = position.loc[player1]["x"]
        player1_y = position.loc[player1]["y"]

        player2_x = position.loc[player2]["x"]
        player2_y = position.loc[player2]["y"]

        num_passes = row["num_passes"]
        pass_value = row["pass_value"]

        line_width = change_range(num_passes, (min_pair_count, max_pair_count), (min_edge_width, max_edge_width))
        # note: the alpha range reuses the player-level value bounds
        alpha = change_range(pass_value, (min_player_value, max_player_value), (0.4, 1))

        norm = Normalize(vmin=min_pair_value, vmax=max_pair_value)
        edge_cmap = nodes_cmap  # already a Colormap; cm.get_cmap() was removed in Matplotlib 3.9
        edge_color = edge_cmap(norm(pass_value))
        # keep the RGB channels from the colormap and override the alpha channel
        arrow_color = tuple(edge_color[:3]) + (alpha,)

        x = player1_x
        y = player1_y
        dx = player2_x - player1_x
        dy = player2_y - player1_y
        rel = 68/105
        shift_x = 2
        shift_y = shift_x*rel

        # slope of the arrow in metres: y spans the 105 m length, x the 68 m width
        # (parentheses added so that the whole of dx*68/100 is the denominator)
        slope = round(abs((dy*105/100) / (dx*68/100)), 1)

        # offset the arrow so that A->B and B->A do not overlap
        if slope > 0.5:        # mostly vertical arrow: shift along x
            off_x, off_y = (shift_x, 0) if dy > 0 else (-shift_x, 0)
        elif slope >= 0:       # mostly horizontal arrow: shift along y
            off_x, off_y = (0, -shift_y) if dx > 0 else (0, shift_y)
        else:                  # slope is NaN (identical positions): nothing to draw
            continue

        ax[i].annotate("", xy=(x+dx+off_x, y+dy+off_y), xytext=(x+off_x, y+off_y), zorder=2,
                       arrowprops=dict(arrowstyle=f'->, head_length = {head_length}, head_width={head_width}',
                                       color=arrow_color,
                                       fc='blue',
                                       lw=line_width,
                                       shrinkB=player_stats.loc[player2, 'marker_size']/5))
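
The offset rule described above can also be isolated into a small helper, which makes it easier to check that the two arrows of a pair are pushed to opposite sides. This is only a sketch: arrow_offset is a hypothetical name, the slope is assumed to be the dy/dx ratio in metres on a 105 x 68 m pitch, and the rounding of the slope to one decimal is omitted.

```python
def arrow_offset(dx, dy, shift_x=2.0, rel=68 / 105, threshold=0.5):
    """Return (off_x, off_y) used to displace an arrow with direction (dx, dy).

    Hypothetical helper: x is assumed to span the 68 m width and y the
    105 m length of the pitch, both given in 0-100 Opta units.
    """
    shift_y = shift_x * rel
    if dx == 0 and dy == 0:
        return 0.0, 0.0  # both players share a position: nothing to separate
    # Slope in metres; a purely vertical arrow has an infinite slope.
    slope = float("inf") if dx == 0 else abs((dy * 105 / 100) / (dx * 68 / 100))
    if slope > threshold:
        # Mostly vertical arrow: shift sideways, sign depends on up/down.
        return (shift_x, 0.0) if dy > 0 else (-shift_x, 0.0)
    # Mostly horizontal arrow: shift up or down, sign depends on left/right.
    return (0.0, -shift_y) if dx > 0 else (0.0, shift_y)


# The arrow from A to B and the arrow from B to A get opposite offsets.
print(arrow_offset(0, 10), arrow_offset(0, -10))
print(arrow_offset(10, 1), arrow_offset(-10, -1))
```

Since reversing the direction flips the sign of both dx and dy, the two arrows between the same pair of players always land on opposite sides of the line connecting them.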

We obtain an almost finished map.

Comparing it to the map from The Athletic, there are minor discrepancies with some positions — Enzo Fernandez is shifted slightly closer to the center on our map, while Disasi is positioned slightly below Colwill.

However, overall, the maps are quite similar in terms of player position coordinates.

5.3 Adding a Legend and Additional Details

def add_details(ax_n):
    # Draws zone guide lines, the attack-direction arrow and the time caption on
    # a single pitch axis; intended to be called once per pitch inside the team
    # loop, e.g. add_details(ax[i]). `minutes_` is read from the enclosing scope.
    line_kw = dict(ls=':', dashes=(1, 3), color='gray', lw=0.4)

    # vertical guide lines
    for x_pos, pad in [(21, 19), (78.8, 19), (36.8, 8.5), (100-36.8, 8.5)]:
        ax_n.plot([x_pos, x_pos], [ax_n.get_ylim()[0]+pad, ax_n.get_ylim()[1]-pad], **line_kw)

    # horizontal guide lines
    for y_pos in [83, 67, 100-83, 100-67]:
        ax_n.plot([ax_n.get_xlim()[0]-4, ax_n.get_xlim()[1]+4], [y_pos, y_pos], **line_kw)

    # attack-direction arrow and its label
    head_length = 0.3
    head_width = 0.05
    ax_n.annotate(xy=(102, 58),
                  xytext=(102, 43), zorder=2,
                  text='',
                  ha='center',
                  arrowprops=dict(arrowstyle=f'->, head_length = {head_length}, head_width={head_width}',
                                  color='#7c7c7c',
                                  lw=0.5))
    ax_n.annotate(xy=(104, 48), zorder=2,
                  text='Attack',
                  ha='center',
                  color='#7c7c7c',
                  rotation=90,
                  size=5)

    # caption with the time window covered by the map
    ax_n.annotate(xy=(50, -5), zorder=2,
                  text=f'Passes from minutes 1 to {minutes_}',
                  ha='center',
                  color='#7c7c7c',
                  size=6)

font = 'serif'
fig_text(
x = 0.5, y = .90,
s = "Passing network for Man City 1 - 1 Chelsea",
weight = 'bold',
va = "bottom", ha = "center",
fontsize = 10, font=font )

fig_text(
x = 0.25, y = .855,
s = "Man City",
weight = 'bold',
va = "bottom", ha = "center",
fontsize = 8, font=font )
fig_text(
x = 0.73, y = .855,
s = "Chelsea",
weight = 'bold',
va = "bottom", ha = "center",
fontsize = 8, font=font )

fig_text(
x = 0.5, y = 0.875,
s = "Premier League | Season 2023-2024 | 2024-02-17 ",
va = "bottom", ha = "center",
fontsize = 6, font=font)

fig_text(
x = 0.87, y = -0.0,
s = "FOOTSCI",
va = "bottom", ha = "center", weight='bold',
fontsize = 12, font=font, color='black')


fig_text(
x = 0.14, y = .14,
s = "Pass count between",
va = "bottom", ha = "center",
fontsize = 6, font=font)

fig_text(
x = 0.38, y = .14,
s = "Pass value between (OP xT)",
va = "bottom", ha = "center",
fontsize = 6, font=font)

fig_text(
x = 0.61, y = .14,
s = "Player pass count",
va = "bottom", ha = "center",
fontsize = 6, font=font)

fig_text(
x = 0.84, y = .14,
s = "Player pass value (OP xT)",
va = "bottom", ha = "center",
fontsize = 6, font=font)

fig_text(
x = 0.41, y = .038,
s = "Low",
va = "bottom", ha = "center",
fontsize = 6, font=font)

fig_text(
x = 0.6, y = .038,
s = "High",
va = "bottom", ha = "center",
fontsize = 6, font=font)

fig_text(
x = 0.1, y = -0.0,
s = "t.me/footsci\nt.me/footsci_eng",
va = "bottom", ha = "center", weight='bold',
fontsize = 6, font=font, color='black')


fig_text(
x = 0.13, y = 0.07,
s = "5 to 16+",
va = "bottom", ha = "center",
fontsize = 5, font=font, color='black')

fig_text(
x = 0.37, y = 0.07,
s = "0 to 0.09+",
va = "bottom", ha = "center",
fontsize = 5, font=font, color='black')

fig_text(
x = 0.61, y = 0.07,
s = "1 to 88+",
va = "bottom", ha = "center",
fontsize = 5, font=font, color='black')

fig_text(
x = 0.84, y = 0.07,
s = "0.01 to 0.36+",
va = "bottom", ha = "center",
fontsize = 5, font=font, color='black')


head_length = 20
head_width = 20

x0 = 190
y0 = 280
dx = 60
dy = 120
shift_x = 70

x1 = 700
x2 = 1350
y2 = 340
shift_x2 = 70
radius = 20

x3 = 1800
shift_x3 = 100

color='black'

style = ArrowStyle('->', head_length=5, head_width=3)

arrow1 = FancyArrowPatch((x0,y0), (x0+dx, y0+dy), lw=0.5,
arrowstyle=style, color=color)
arrow2 = FancyArrowPatch((x0+shift_x,y0), (x0+dx+shift_x, y0+dy), lw=1.5,
arrowstyle=style, color=color)
arrow3 = FancyArrowPatch((x0+2*shift_x,y0), (x0+dx+2*shift_x, y0+dy), lw=2.5,
arrowstyle=style, color=color)


arrow4 = FancyArrowPatch((x1,y0), (x1+dx, y0+dy), lw=2.5,
arrowstyle=style, color=node_color1)
arrow5 = FancyArrowPatch((x1+shift_x,y0), (x1+dx+shift_x, y0+dy), lw=2.5,
arrowstyle=style, color=node_color2)
arrow6 = FancyArrowPatch((x1+2*shift_x,y0), (x1+dx+2*shift_x, y0+dy), lw=2.5,
arrowstyle=style, color=node_color3)
arrow7 = FancyArrowPatch((x1+3*shift_x,y0), (x1+dx+3*shift_x, y0+dy), lw=2.5,
arrowstyle=style, color=node_color4)
arrow8 = FancyArrowPatch((x1+4*shift_x,y0), (x1+dx+4*shift_x, y0+dy), lw=2.5,
arrowstyle=style, color=node_color5)

# arrow1 = FancyArrow(x0,y0,dx,dy, lw=0.5, color=color,head_width=head_width, head_length=head_length)
# arrow2 = FancyArrow(x0+shift_x,y0,dx,dy, lw=1, color=color,head_width=head_width, head_length=head_length)
# arrow3 = FancyArrow(x0+shift_x*2,y0,dx,dy, lw=2, color=color,head_width=head_width, head_length=head_length)

circle1 = Circle(xy=(x2, y2), radius=radius, edgecolor='black',fill=False)
circle2 = Circle(xy=(x2+shift_x2, y2), radius=radius*1.5, edgecolor='black',fill=False)
circle3 = Circle(xy=(x2+2.3*shift_x2, y2), radius=radius*2, edgecolor='black',fill=False)

circle4 = Circle(xy=(x3, y2), radius=radius*2, color=node_color1)
circle5 = Circle(xy=(x3 + shift_x3, y2), radius=radius*2, color=node_color2)
circle6 = Circle(xy=(x3 + 2*shift_x3, y2), radius=radius*2, color=node_color3)
circle7 = Circle(xy=(x3 + 3*shift_x3, y2), radius=radius*2, color=node_color4)
circle8 = Circle(xy=(x3 + 4*shift_x3, y2), radius=radius*2, color=node_color5)



fig.patches.extend([arrow1, arrow2, arrow3 ])
fig.patches.extend([arrow4, arrow5, arrow6, arrow7, arrow8 ])
fig.patches.extend([circle1 , circle2, circle3])
fig.patches.extend([circle4,circle5,circle6,circle7,circle8])


x4 = 1020
y4 = 180
dx = 350

arrow9 = FancyArrowPatch((x4,y4), (x4+dx, y4), lw=1,
arrowstyle=style, color='black')

fig.patches.extend([arrow9])


plt.tight_layout()
plt.subplots_adjust(wspace=0.1, hspace=0, bottom = 0.1)

Saving the obtained results.

os.makedirs('./img', exist_ok=True)  # make sure the output folder exists
fig.savefig('./img/pass_map.jpeg', bbox_inches='tight', dpi=400)

6. Comments on the resulting map.

Coordinates

We used the coordinates of players who made any passes, both successful and unsuccessful.

player_position = passes_df_short.groupby(var).agg({'x': ['median'], 'y': ['median']})

Some authors prefer to construct pass maps only from successful passes; as an alternative, passes_df_suc_short can therefore be used. However, accounting for unsuccessful passes can still be useful for more accurately determining the position on the field from which a player makes passes.

An additional nuance is that we do not account for the field coordinates where the player receives the ball, i.e., we assess the position on the field only by the locations from which passes were made. For forwards, this can have a negative impact.

The example of Haaland is illustrative. He made only 12 passes, 9 of which were successful. The median position for the Norwegian lies below the level of the left defender, which could give the misleading impression from the map that this position is his “average” coordinate.

But we know that Erling took 9 shots from the penalty area, which likely means he received the ball much higher up the field than his final position on the map suggests.

As an alternative approach, we could consider the average or median value for all points where a player either made or received a pass. In that case, player positions could be calculated as follows.

player_position = pd.DataFrame()
for player in passes_df_all['playerName'].unique():

    # select all passes up to minutes_ where the player either passed or received the ball
    # (the outer parentheses matter: & binds more tightly than |)
    mask = (
        ((passes_df_all['playerName'] == player) |
         (passes_df_all['passRecipientName'] == player)) &
        (passes_df_all['minute'] <= minutes_)
    )

    # option for counting only successfully received passes
    # mask = (
    #     ((passes_df_all['playerName'] == player) |
    #      ((passes_df_all['passRecipientName'] == player) &
    #       (passes_df_all['outcomeType'] == 'Successful'))) &
    #     (passes_df_all['minute'] <= minutes_)
    # )

    proxy = passes_df_all[mask].copy()

    # start point of the pass if the player is the passer, end point if he is the recipient
    is_passer = proxy['playerName'] == player
    proxy['real_x'] = np.where(is_passer, proxy['x'], proxy['endX'])
    proxy['real_y'] = np.where(is_passer, proxy['y'], proxy['endY'])

    df_ = pd.DataFrame({'x': [proxy['real_x'].median()],
                        'y': [proxy['real_y'].median()]}, index=[player])
    player_position = pd.concat([player_position, df_])
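
The effect of including reception points is easy to reproduce on a toy table. In the sketch below the three passes are invented: a forward H makes one pass from deep and receives two passes high up the pitch. touch_position is a hypothetical helper name, and the time filter is left out for brevity.

```python
import numpy as np
import pandas as pd

# Invented passes: H makes one pass from deep and receives two high up the pitch.
passes = pd.DataFrame({
    "playerName":        ["H", "M", "M"],
    "passRecipientName": ["M", "H", "H"],
    "x":    [40.0, 60.0, 65.0],
    "y":    [50.0, 45.0, 55.0],
    "endX": [55.0, 85.0, 90.0],
    "endY": [48.0, 50.0, 52.0],
})


def touch_position(df, player):
    """Median of the points where `player` released or received the ball."""
    involved = df[(df["playerName"] == player) | (df["passRecipientName"] == player)]
    is_passer = (involved["playerName"] == player).to_numpy()
    # Passer rows contribute the start point, recipient rows the end point.
    xs = np.where(is_passer, involved["x"], involved["endX"])
    ys = np.where(is_passer, involved["y"], involved["endY"])
    return float(np.median(xs)), float(np.median(ys))


# Position estimated from passes made only, versus passes made or received.
passes_only_x = passes.loc[passes["playerName"] == "H", "x"].median()
print(passes_only_x, touch_position(passes, "H"))
```

Using only the passes H made places him at x = 40, while adding the two receptions moves his median to x = 85, much closer to where a forward actually operates.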

The result is as follows.

We observe that on the left map, Haaland’s position has been significantly adjusted. On the right map, Enzo Fernandez’s position has shifted slightly to the left, and the median positions of Palmer and Gusto, who periodically joined the attack and received the ball high on the flank, have also changed.

The choice of approach depends on the objectives you aim to achieve with this visualization.

Nodes and Edges

The size of a node is proportional to the number of successful passes made, and the width of an edge is likewise proportional to the number of successful passes for each pair of players. In the example above, we used player_pass_count_suc and pair_pass_count_suc. This might seem like an imperfect solution because the visualization becomes somewhat inconsistent: the positions are calculated from coordinates up to the first substitution, while node sizes and colors are based on statistics for the entire match. In my view, the choice should be driven by the ultimate goal of the visualization.

If we aim to obtain the most comprehensive information about the game using a single pass map, then we can use the approach as in my example.

If we are prepared to use multiple maps for each specific time interval, then it is more appropriate to use the current statistics for nodes and edges, calculated for a fixed period of the match.

Below is a map for Manchester City before and after the sole substitution. The minimum number of passes has been reduced from 5 to 4.

In cases where there are several substitutions, aggregated statistics for short intervals of 5, 10, or 15 minutes often turn out to be quite unrepresentative, and the maps generated on their basis do not provide much value for analysis.

Below is an example set of maps for Chelsea. For maps from the second to the fourth, the minimum passing statistic between players for display is reduced to 1 pass. There is also a case where a player did not make any passes during the specified period of time — the second map, Gusto is missing.

It’s also important to note that when the match is divided into intervals, the thresholds for nodes and edges should be normalized to the corresponding time period rather than to the full 90-minute values indicated in the map legend.
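
One simple way to do this, assuming that pass counts accumulate roughly linearly with playing time (a simplification), is to scale the full-match bounds by the share of the match that the interval covers. scale_threshold is a hypothetical helper, and the 27-minute interval is just an example.

```python
def scale_threshold(full_match_value, interval_minutes, match_minutes=90):
    """Scale a count-type threshold linearly with the interval length.

    Assumes the quantity grows roughly in proportion to playing time.
    """
    return full_match_value * interval_minutes / match_minutes


# Full-match upper bounds used earlier in this article.
max_player_count = 88
max_pair_count = 16

interval = 27  # for example, from minute 63 to minute 90
scaled_player_max = round(scale_threshold(max_player_count, interval), 1)
scaled_pair_max = round(scale_threshold(max_pair_count, interval), 1)
print(scaled_player_max, scaled_pair_max)
```

The scaled values would then replace max_player_count and max_pair_count when drawing the map for that interval, and the legend labels should be updated to match.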

Below is a link to GitHub with the data and code for building the pass map discussed in the article.

I welcome any comments and feedback!

P.S. An article with an overview of the most popular templates:

Passing networks with expected threat (xT) layer. Walking through popular templates. Explaining the details.

GitHub

Twitter

Telegram

Linkedin
