Calculating Winning or Losing Streak of Players in Python DataFrame: A Step-by-Step Solution

Problem Description

In this article, we will discuss how to calculate the winning or losing streak of players in a given tennis match DataFrame. We have a DataFrame with columns tourney_date, player1_id, player2_id, and target. The target column represents whether player 1 won (1) or lost (0).

Introduction
Problem Context
Requirements and Assumptions
Step-by-Step Solution

Introduction

The problem requires us to calculate each player's winning or losing streak going into every match of a tennis match DataFrame. This involves iterating through the rows in order, recording both players' current streaks, updating those streaks from the match's target value, and finally joining the streak information with the original DataFrame.

Problem Context

We are working with a dataset tennis_data_processed that contains the fields tourney_date, player1_id, player2_id, and target. The target field indicates whether player 1 won (1) or lost (0). The sample is shown here as a dictionary, which Step 1 turns into a DataFrame.

tennis_data_processed = {
    'tourney_date': ['2022-10-31', '2023-02-06', '2023-02-06', '2023-02-06', '2023-02-06', '2023-02-20', '2023-02-20'],
    'player1_id': [100000, 123456, 100000, 100000, 345612, 432154, 100000],
    'player2_id': [209950, 100000, 543210, 876543, 100000, 100000, 929292],
    'target': [0, 0, 1, 1, 1, 1, 0]
}

Requirements and Assumptions

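We assume that each player_id value identifies exactly one player, whether it appears in player1_id or player2_id. Streaks are tracked per ID, so this is crucial for calculating them correctly. We also assume the rows are in chronological order, because each streak is built up in row order.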
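Because the result depends on these assumptions, it can help to check them before computing anything. The checks below are an illustrative sketch on the sample data, not part of the original solution; the variable names is_sorted, valid_target, and no_self_match are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "tourney_date": ["2022-10-31", "2023-02-06", "2023-02-06", "2023-02-06",
                     "2023-02-06", "2023-02-20", "2023-02-20"],
    "player1_id": [100000, 123456, 100000, 100000, 345612, 432154, 100000],
    "player2_id": [209950, 100000, 543210, 876543, 100000, 100000, 929292],
    "target": [0, 0, 1, 1, 1, 1, 0],
})

# Parse the dates so they compare chronologically rather than as strings
dates = pd.to_datetime(df["tourney_date"])

# Rows must already be in (non-strict) chronological order
is_sorted = dates.is_monotonic_increasing

# target may only be 0 (player 1 lost) or 1 (player 1 won)
valid_target = df["target"].isin([0, 1]).all()

# A player should never be listed against themselves
no_self_match = (df["player1_id"] != df["player2_id"]).all()

print(is_sorted, valid_target, no_self_match)
```

Several matches here share a date, so their relative order is simply the row order; if that order matters, the data would need a finer-grained timestamp or round indicator.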

Step-by-Step Solution

Step 1: Data Preparation

First, we import the necessary libraries, pandas (for data manipulation) and numpy (for numerical operations), and build a DataFrame from the sample data.

import pandas as pd
import numpy as np

# For convenience: df instead of tennis_data_processed
df = pd.DataFrame(tennis_data_processed)

Step 2: Initialize Dictionary to Track Streaks

We create a dictionary that tracks the current streak for each player, with every player starting at 0. Here df.filter(like="player") selects the columns whose names contain "player" (player1_id and player2_id), np.unique returns the sorted unique IDs across both columns, and dict.fromkeys maps each ID to an initial streak of 0.

cnt = dict.fromkeys(np.unique(df.filter(like="player")), 0)
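To see what this one-liner produces, the sketch below splits it into its parts on the sample player columns. The intermediate names player_cols and unique_ids are introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Same player columns as the sample data
df = pd.DataFrame({
    "player1_id": [100000, 123456, 100000, 100000, 345612, 432154, 100000],
    "player2_id": [209950, 100000, 543210, 876543, 100000, 100000, 929292],
})

player_cols = df.filter(like="player")  # columns whose name contains "player"
unique_ids = np.unique(player_cols)     # flattened, sorted, de-duplicated IDs
cnt = dict.fromkeys(unique_ids, 0)      # every player starts with a streak of 0

print(list(player_cols.columns))  # ['player1_id', 'player2_id']
print(len(cnt))                   # 8
```

Note that filter(like="player") matches any column name containing "player". If the DataFrame already had columns such as player1_streak, they would be swept up as well, so selecting the two ID columns by name may be the safer choice in that case.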

Step 3: Calculate Streaks for Each Player

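We iterate through the DataFrame row by row. For each match we first record both players' streaks going into the match, then update them: a positive value counts consecutive wins and a negative value counts consecutive losses. The winner's streak grows by one (or restarts at +1 after a loss), and the loser's streak drops by one (or restarts at -1 after a win).

streaks = []
for p1, p2, t in df[["player1_id", "player2_id", "target"]].to_numpy():
    # Record each player's streak going into this match
    streaks.append([cnt[p1], cnt[p2]])
    # target == 1 means player 1 won; otherwise player 2 won
    winner, loser = (p1, p2) if t == 1 else (p2, p1)
    # Extend a winning streak, or start a new one at +1
    cnt[winner] = max(cnt[winner], 0) + 1
    # Extend a losing streak, or start a new one at -1
    cnt[loser] = min(cnt[loser], 0) - 1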
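To make the intended update rule concrete, here is a minimal sketch for a single player, assuming a positive streak counts consecutive wins and a negative streak counts consecutive losses. The helper next_streak is a hypothetical name used only for this illustration.

```python
def next_streak(current: int, won: bool) -> int:
    """Return the player's streak after one more match."""
    if won:
        # Continue a winning streak, or restart at +1 after a loss
        return max(current, 0) + 1
    # Continue a losing streak, or restart at -1 after a win
    return min(current, 0) - 1


streak = 0
history = []
for won in [True, True, False, False, True]:
    streak = next_streak(streak, won)
    history.append(streak)

print(history)  # [1, 2, -1, -2, 1]
```

Clamping with max(current, 0) and min(current, 0) is what resets the counter when the result flips, so no separate branch for "streak broken" is needed.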

Step 4: Join Streak Information with Original DataFrame

Finally, we join the streak information with the original DataFrame.

out = df.join(pd.DataFrame(streaks, columns=["player1_streak", "player2_streak"]))

The resulting out DataFrame contains the original columns plus two new columns, player1_streak and player2_streak. Each holds the corresponding player's streak going into that match, so a player's first appearance always shows 0.
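The steps can be combined into one self-contained sketch and checked against the sample data. This version assumes that target == 1 means player 1 won and that rows are in chronological order; the function name add_streak_columns is hypothetical.

```python
import numpy as np
import pandas as pd


def add_streak_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return df plus each player's streak going into the match."""
    id_cols = ["player1_id", "player2_id"]
    cnt = dict.fromkeys(np.unique(df[id_cols]), 0)
    streaks = []
    for p1, p2, t in df[id_cols + ["target"]].to_numpy():
        streaks.append([cnt[p1], cnt[p2]])  # streaks before this match
        winner, loser = (p1, p2) if t == 1 else (p2, p1)
        cnt[winner] = max(cnt[winner], 0) + 1
        cnt[loser] = min(cnt[loser], 0) - 1
    cols = ["player1_streak", "player2_streak"]
    # Reuse df's index so the join aligns row by row
    return df.join(pd.DataFrame(streaks, columns=cols, index=df.index))


df = pd.DataFrame({
    "tourney_date": ["2022-10-31", "2023-02-06", "2023-02-06", "2023-02-06",
                     "2023-02-06", "2023-02-20", "2023-02-20"],
    "player1_id": [100000, 123456, 100000, 100000, 345612, 432154, 100000],
    "player2_id": [209950, 100000, 543210, 876543, 100000, 100000, 929292],
    "target": [0, 0, 1, 1, 1, 1, 0],
})

out = add_streak_columns(df)
print(out["player1_streak"].tolist())  # [0, 0, 1, 2, 0, 0, -2]
print(out["player2_streak"].tolist())  # [0, -1, 0, 0, 3, -1, 0]
```

Passing index=df.index when building the streak frame is a defensive choice: a plain join aligns on index labels, so a DataFrame that was filtered or re-sorted without resetting its index could otherwise end up with misaligned streaks.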

Last modified on 2024-09-04