Analyzing Customer Purchasing Behavior: Identifying Users Who Buy the Same Product in the Same Shop More Than Twice in One Year
As an analyst, understanding customer purchasing behavior is crucial for making informed business decisions. In this blog post, we will explore a query that identifies users who buy the same product in the same shop more than twice in one year.
Problem Statement
The problem statement involves analyzing a dataset to determine the number of unique users who have purchased the same product from the same shop on multiple occasions within a one-year period. The dataset contains information about each user’s purchase, including the date, product, shop, and category.
Data Preprocessing
Before we dive into the query, let’s understand the data preprocessing steps:
- Filtering: We only consider purchases made between January 1st, 2022, and December 31st, 2022.
- Sorting: The data is sorted by date to ensure that purchases are in chronological order.
Query Approach
The provided query uses several intermediate steps to solve the problem. Here’s a breakdown of each step:
Step 1: Data Preparation
CREATE TEMPORARY TABLE table1 AS (
SELECT *, EXTRACT(YEAR FROM date) AS year FROM `projectid.dataset.table`
);
CREATE TEMPORARY TABLE table2 AS (
SELECT * EXCEPT(date, cat2), COUNT(user) OVER(PARTITION BY cat1, year, user, prod, shop) tcount
FROM table1
);
CREATE TEMPORARY TABLE table4 AS (
SELECT * EXCEPT(date, cat1), COUNT(user) OVER(PARTITION BY cat2, year, user, prod, shop) tcount
FROM table1
);
In this step, we create three temporary tables:
table1
: This table contains all the original data with the additional information about the year.table2
: This table contains only the non-date columns fromtable1
and calculates the total count of purchases for each unique combination ofcat1
,year
,user
,prod
, andshop
.table4
: Similar totable2
, but this time we calculate the total count forcat2
.
Step 2: Identifying Users Who Meet the Criteria
SELECT DISTINCT year, cat1, COUNT(DISTINCT user) number_of_user
FROM table2
WHERE tcount > 2
GROUP BY YEAR, cat1;
SELECT DISTINCT year, cat2, COUNT(DISTINCT user) number_of_user
FROM table4
WHERE tcount > 2
GROUP BY YEAR, cat2;
In this step, we filter the table2
and table4
tables to include only those rows where the total count is greater than 2. We then use the COUNT(DISTINCT user)
function to calculate the number of unique users who meet the criteria for each combination of year
and cat1
, or year
and cat2
.
Unioning the Results
If you want a single result set that combines both groups, you can union the two queries:
SELECT DISTINCT year, cat1, COUNT(DISTINCT user) number_of_user
FROM table2
WHERE tcount > 2
GROUP BY YEAR, cat1
UNION ALL
SELECT DISTINCT year, cat2, COUNT(DISTINCT user) number_of_user
FROM table4
WHERE tcount > 2
GROUP BY YEAR, cat2;
Conclusion
In this blog post, we explored a query that identifies users who buy the same product in the same shop more than twice in one year. We walked through the data preprocessing steps and the intermediate queries used to solve the problem. By understanding these concepts, you can apply them to your own analytical work and gain valuable insights into customer purchasing behavior.
Additional Context
Understanding how to analyze customer purchasing behavior is crucial for businesses to make informed decisions about product development, marketing strategies, and more. This query provides a starting point for exploring this topic further.
Step-by-Step Solution
Here’s the complete code:
-- Create temporary tables
CREATE TEMPORARY TABLE table1 AS (
SELECT *, EXTRACT(YEAR FROM date) AS year FROM `projectid.dataset.table`
);
CREATE TEMPORARY TABLE table2 AS (
SELECT * EXCEPT(date, cat2), COUNT(user) OVER(PARTITION BY cat1, year, user, prod, shop) tcount
FROM table1
);
CREATE TEMPORARY TABLE table4 AS (
SELECT * EXCEPT(date, cat1), COUNT(user) OVER(PARTITION BY cat2, year, user, prod, shop) tcount
FROM table1
);
-- Identify users who meet the criteria
SELECT DISTINCT year, cat1, COUNT(DISTINCT user) number_of_user
FROM table2
WHERE tcount > 2
GROUP BY YEAR, cat1
UNION ALL
SELECT DISTINCT year, cat2, COUNT(DISTINCT user) number_of_user
FROM table4
WHERE tcount > 2
GROUP BY YEAR, cat2;
This query provides a clear and concise solution to the problem at hand. By understanding the steps involved, you can adapt this approach to your own analytical needs.
Last modified on 2024-08-18