Finding Active Customers by Month in BigQuery using SQL
In this article, we’ll explore how to count active customers per month in BigQuery using SQL. We’ll build a query that filters rows by date range and handles promotions whose dates cross month boundaries.
Understanding the Problem
The problem at hand is to retrieve the number of unique customer IDs (active customers) for each region, grouped by month, where a customer counts as active in a month if they have a promotion running at any point during it. The catch is that a promotion’s date range does not have to line up with calendar months: it can start before a month begins, end after it finishes, or span several months, so counting by start date alone misses some active customers.
Step 1: Preparing the Data
Before we dive into the SQL query, let’s assume we have a table named customers_table with columns for customer_id, region, and start_time (the start date of the promotion). We’ll also use end_time to track the end date of each promotion.
# customers_table
| customer_id | region | start_time | end_time |
|-------------|-----------|-------------|------------|
| 1 | A | 2022-06-01 | 2022-06-30 |
| 2 | B | 2022-07-01 | 2022-08-31 |
| ... | ... | ... | ... |
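To try the queries in this article without a real dataset, the table can be mocked up as a temporary table. This is only a sketch: the column types are assumptions (the article shows sample values but no schema), and TIMESTAMP is chosen because the later snippets wrap the columns in DATE().

```sql
-- Assumed schema: the article does not state the column types.
CREATE TEMP TABLE customers_table (
  customer_id INT64,
  region STRING,
  start_time TIMESTAMP,
  end_time TIMESTAMP
);

-- The two sample rows from the table above.
INSERT INTO customers_table (customer_id, region, start_time, end_time)
VALUES
  (1, 'A', TIMESTAMP '2022-06-01 00:00:00', TIMESTAMP '2022-06-30 00:00:00'),
  (2, 'B', TIMESTAMP '2022-07-01 00:00:00', TIMESTAMP '2022-08-31 00:00:00');
```

Temporary tables live only for the duration of the script or session, so the statements need to run together with whatever query reads from them.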
Step 2: Creating the Query
To find the count of active customers per month, we’ll use a CASE expression whose WHEN branches combine date comparisons with OR. Each group of comparisons is wrapped in parentheses so that AND and OR are evaluated in the intended order.
First Month (June)
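# WHEN for June: matches any promotion that overlaps June 2022
WHEN
(DATE(start_time) >= '2022-06-01' AND DATE(end_time) <= '2022-06-30')  # starts and ends in June
OR
(DATE(start_time) < '2022-06-01' AND DATE_TRUNC(DATE(end_time), MONTH) = '2022-06-01')  # starts earlier, ends in June
OR
(DATE(end_time) >= '2022-06-30' AND DATE_TRUNC(DATE(start_time), MONTH) = '2022-06-01')  # starts in June, ends on or after the 30th
OR
(DATE(start_time) < '2022-06-01' AND DATE(end_time) > '2022-06-30')  # starts before June and ends after it
THEN '2022-06-01'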
Second Month (July)
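# WHEN for July: matches any promotion that overlaps July 2022
WHEN
(DATE(start_time) >= '2022-07-01' AND DATE(end_time) <= '2022-07-31')  # starts and ends in July
OR
(DATE(start_time) < '2022-07-01' AND DATE_TRUNC(DATE(end_time), MONTH) = '2022-07-01')  # starts earlier, ends in July
OR
(DATE(end_time) >= '2022-07-31' AND DATE_TRUNC(DATE(start_time), MONTH) = '2022-07-01')  # starts in July, ends on or after the 31st
OR
(DATE(start_time) < '2022-07-01' AND DATE(end_time) > '2022-07-31')  # starts before July and ends after it
THEN '2022-07-01'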
Third Month (August)
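# WHEN for August: matches any promotion that overlaps August 2022
WHEN
(DATE(start_time) >= '2022-08-01' AND DATE(end_time) <= '2022-08-31')  # starts and ends in August
OR
(DATE(start_time) < '2022-08-01' AND DATE_TRUNC(DATE(end_time), MONTH) = '2022-08-01')  # starts earlier, ends in August
OR
(DATE(end_time) >= '2022-08-31' AND DATE_TRUNC(DATE(start_time), MONTH) = '2022-08-01')  # starts in August, ends on or after the 31st
OR
(DATE(start_time) < '2022-08-01' AND DATE(end_time) > '2022-08-31')  # starts before August and ends after it
THEN '2022-08-01'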
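Each of the monthly conditions asks the same question: does the promotion’s date range intersect the month? Two ranges intersect exactly when each one starts on or before the other ends, so the test can be written as a single comparison pair, which also covers a promotion that starts before the month and ends after it. The sketch below checks that compact form against a handful of made-up date ranges; the labels and dates are illustrative and not taken from customers_table.

```sql
-- Compact overlap test for June 2022:
--   promotion starts on or before the month's last day
--   AND promotion ends on or after the month's first day
SELECT
  label,
  start_date <= DATE '2022-06-30' AND end_date >= DATE '2022-06-01' AS active_in_june
FROM UNNEST([
  STRUCT('inside June' AS label, DATE '2022-06-05' AS start_date, DATE '2022-06-20' AS end_date),
  STRUCT('starts before, ends in June', DATE '2022-05-20', DATE '2022-06-10'),
  STRUCT('starts in June, ends after', DATE '2022-06-25', DATE '2022-07-05'),
  STRUCT('spans all of June', DATE '2022-05-15', DATE '2022-07-15'),
  STRUCT('entirely in July', DATE '2022-07-01', DATE '2022-08-31')
]);
```

Every row except 'entirely in July' should come back with active_in_june set to true. The same pair of comparisons works for any month once the two boundary dates are swapped in.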
Step 3: Final Query
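Now that we have the WHEN branches for each month, let’s look at the query that surrounds them. In its simplest form, shown here, the month comes straight from start_time; the CASE expression built from the branches above can take the place of that column when promotions cross month boundaries.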
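SELECT
  region,
  # Month the promotion starts in, as a 'YYYY-MM-DD' string (assumes start_time is a TIMESTAMP or DATETIME)
  CAST(DATE_TRUNC(DATE(start_time), MONTH) AS STRING) AS month,
  COUNT(DISTINCT customer_id) AS active_customers
FROM customers_table
GROUP BY region, month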
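One way the monthly branches might be slotted into this query is sketched below. It is an assumption about how the pieces fit together rather than something the text spells out, and it writes each month’s test in the compact form "starts on or before the month’s last day and ends on or after its first day" instead of the longer OR chains. It also assumes start_time and end_time are TIMESTAMP (or DATETIME) columns.

```sql
SELECT
  region,
  CASE
    -- A CASE expression returns the FIRST branch that matches, so a promotion
    -- that overlaps several months is labelled with the earliest one only.
    WHEN DATE(start_time) <= DATE '2022-06-30' AND DATE(end_time) >= DATE '2022-06-01'
      THEN '2022-06-01'
    WHEN DATE(start_time) <= DATE '2022-07-31' AND DATE(end_time) >= DATE '2022-07-01'
      THEN '2022-07-01'
    WHEN DATE(start_time) <= DATE '2022-08-31' AND DATE(end_time) >= DATE '2022-08-01'
      THEN '2022-08-01'
  END AS month,
  COUNT(DISTINCT customer_id) AS active_customers
FROM customers_table
-- Keep only promotions that touch June through August, so no row falls
-- through every branch and ends up in a NULL month.
WHERE DATE(start_time) <= DATE '2022-08-31'
  AND DATE(end_time) >= DATE '2022-06-01'
GROUP BY region, month
ORDER BY region, month;
```

Because of the first-match behaviour noted in the comment, this version still assigns each promotion to a single month; it suits data where promotions rarely run past a month boundary.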
Step 4: Explanation
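In this final query, we group the data by region and month, using the DATE_TRUNC function to reduce each start_time to the first day of its month. We then count the distinct customer_id values for each group, so a customer with several promotions in the same month is counted once.
Note that when promotions cross month boundaries, the way months are assigned matters. As written, the query places each promotion in the month it starts in, so a promotion running from July into August contributes only to July. Taking the month from the overlap conditions in Step 2 fixes the cases where a promotion starts before the month in question, but a CASE expression returns only the first branch that matches, so a promotion spanning several months is still counted in just one of them.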
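If a promotion should count toward every month it touches, one alternative is to expand each row into one row per month before grouping. The sketch below does this with GENERATE_DATE_ARRAY and UNNEST; it is a different technique from the CASE approach in this article, and the inline sample rows and TIMESTAMP column types are assumptions made for illustration.

```sql
WITH customers_table AS (
  -- Hypothetical sample rows mirroring the table in Step 1.
  SELECT 1 AS customer_id, 'A' AS region,
         TIMESTAMP '2022-06-01 00:00:00' AS start_time,
         TIMESTAMP '2022-06-30 00:00:00' AS end_time
  UNION ALL
  SELECT 2, 'B', TIMESTAMP '2022-07-01 00:00:00', TIMESTAMP '2022-08-31 00:00:00'
)
SELECT
  region,
  month,
  COUNT(DISTINCT customer_id) AS active_customers
FROM customers_table
-- One output row per calendar month between the promotion's first and last month.
CROSS JOIN UNNEST(
  GENERATE_DATE_ARRAY(
    DATE_TRUNC(DATE(start_time), MONTH),
    DATE_TRUNC(DATE(end_time), MONTH),
    INTERVAL 1 MONTH
  )
) AS month
GROUP BY region, month
ORDER BY region, month;
```

With the two sample rows, region A appears once for 2022-06-01 and region B appears for both 2022-07-01 and 2022-08-01, each with one active customer. This also avoids hard-coding a WHEN branch per month, at the cost of multiplying rows for long-running promotions.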
Conclusion
In conclusion, finding active customers per month in BigQuery using SQL comes down to testing whether each promotion’s date range overlaps the month in question. By using the DATE_TRUNC function and carefully written WHEN conditions inside a CASE expression, we can count active customers for each region and month.
Last modified on 2025-01-02