Identifying Users Who Buy the Same Product in the Same Shop More Than Twice in One Year: A Step-by-Step Solution

Analyzing Customer Purchasing Behavior: Identifying Users Who Buy the Same Product in the Same Shop More Than Twice in One Year

As an analyst, understanding customer purchasing behavior is crucial for making informed business decisions. In this blog post, we will explore a query that identifies users who buy the same product in the same shop more than twice in one year.

Problem Statement

The problem statement involves analyzing a dataset to determine the number of unique users who have purchased the same product from the same shop on multiple occasions within a one-year period. The dataset contains information about each user’s purchase, including the date, product, shop, and category.

Data Preprocessing

Before we dive into the query, let’s understand the data preprocessing steps:

  1. Filtering: We only consider purchases made between January 1st, 2022, and December 31st, 2022.
  2. Sorting: The data is sorted by date to ensure that purchases are in chronological order.

Query Approach

The provided query uses several intermediate steps to solve the problem. Here’s a breakdown of each step:

Step 1: Data Preparation

CREATE TEMPORARY TABLE table1 AS (
    SELECT *, EXTRACT(YEAR FROM date) AS year FROM `projectid.dataset.table`
);

CREATE TEMPORARY TABLE table2 AS (
    SELECT * EXCEPT(date, cat2), COUNT(user) OVER(PARTITION BY cat1, year, user, prod, shop) tcount 
    FROM table1
);

CREATE TEMPORARY TABLE table4 AS (
    SELECT * EXCEPT(date, cat1), COUNT(user) OVER(PARTITION BY cat2, year, user, prod, shop) tcount 
    FROM table1
);

In this step, we create three temporary tables:

  • table1: This table contains all the original data with the additional information about the year.
  • table2: This table contains only the non-date columns from table1 and calculates the total count of purchases for each unique combination of cat1, year, user, prod, and shop.
  • table4: Similar to table2, but this time we calculate the total count for cat2.

Step 2: Identifying Users Who Meet the Criteria

SELECT DISTINCT year, cat1, COUNT(DISTINCT user) number_of_user 
FROM table2 
WHERE tcount > 2 
GROUP BY YEAR, cat1;

SELECT DISTINCT year, cat2, COUNT(DISTINCT user) number_of_user 
FROM table4 
WHERE tcount > 2 
GROUP BY YEAR, cat2;

In this step, we filter the table2 and table4 tables to include only those rows where the total count is greater than 2. We then use the COUNT(DISTINCT user) function to calculate the number of unique users who meet the criteria for each combination of year and cat1, or year and cat2.

Unioning the Results

If you want a single result set that combines both groups, you can union the two queries:

SELECT DISTINCT year, cat1, COUNT(DISTINCT user) number_of_user 
FROM table2 
WHERE tcount > 2 
GROUP BY YEAR, cat1

UNION ALL

SELECT DISTINCT year, cat2, COUNT(DISTINCT user) number_of_user 
FROM table4 
WHERE tcount > 2 
GROUP BY YEAR, cat2;

Conclusion

In this blog post, we explored a query that identifies users who buy the same product in the same shop more than twice in one year. We walked through the data preprocessing steps and the intermediate queries used to solve the problem. By understanding these concepts, you can apply them to your own analytical work and gain valuable insights into customer purchasing behavior.

Additional Context

Understanding how to analyze customer purchasing behavior is crucial for businesses to make informed decisions about product development, marketing strategies, and more. This query provides a starting point for exploring this topic further.

Step-by-Step Solution

Here’s the complete code:

-- Create temporary tables
CREATE TEMPORARY TABLE table1 AS (
    SELECT *, EXTRACT(YEAR FROM date) AS year FROM `projectid.dataset.table`
);

CREATE TEMPORARY TABLE table2 AS (
    SELECT * EXCEPT(date, cat2), COUNT(user) OVER(PARTITION BY cat1, year, user, prod, shop) tcount 
    FROM table1
);

CREATE TEMPORARY TABLE table4 AS (
    SELECT * EXCEPT(date, cat1), COUNT(user) OVER(PARTITION BY cat2, year, user, prod, shop) tcount 
    FROM table1
);

-- Identify users who meet the criteria
SELECT DISTINCT year, cat1, COUNT(DISTINCT user) number_of_user 
FROM table2 
WHERE tcount > 2 
GROUP BY YEAR, cat1

UNION ALL

SELECT DISTINCT year, cat2, COUNT(DISTINCT user) number_of_user 
FROM table4 
WHERE tcount > 2 
GROUP BY YEAR, cat2;

This query provides a clear and concise solution to the problem at hand. By understanding the steps involved, you can adapt this approach to your own analytical needs.


Last modified on 2024-08-18