Finding Active Customers by Month in BigQuery using SQL

In this article, we’ll explore how to count active customers per month in BigQuery using SQL. We’ll walk through a query that filters rows by specific date ranges and handles promotions whose dates overlap more than one of those ranges.

Understanding the Problem

The goal is to retrieve the number of unique customer IDs (active customers) for each region and month, where a customer is active in a month if they have a promotion running during it. The catch is that a promotion does not always fit neatly inside one calendar month: it can start before the month, end after it, or both, so comparing a single date against the month is not enough to count active customers accurately.

Step 1: Preparing the Data

Before we dive into the SQL query, let’s assume we have a table named customers_table with columns for customer_id, region, and start_time (the start date of the promotion). We’ll also use end_time to track the end date of each promotion.

# customers_table
| customer_id | region | start_time | end_time   |
|-------------|--------|------------|------------|
| 1           | A      | 2022-06-01 | 2022-06-30 |
| 2           | B      | 2022-07-01 | 2022-08-31 |
| ...         | ...    | ...        | ...        |
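
To experiment without a real table, the two sample rows can be inlined as a common table expression. This is only a sketch: it assumes start_time and end_time are TIMESTAMP columns, which the article does not state (the later queries wrap both in DATE(), which would also fit DATETIME).

```sql
-- Hypothetical stand-in for customers_table, built from the two sample rows.
-- Assumption: start_time and end_time are TIMESTAMP values.
WITH customers_table AS (
  SELECT
    1 AS customer_id,
    'A' AS region,
    TIMESTAMP '2022-06-01' AS start_time,
    TIMESTAMP '2022-06-30' AS end_time
  UNION ALL
  SELECT 2, 'B', TIMESTAMP '2022-07-01', TIMESTAMP '2022-08-31'
)
SELECT *
FROM customers_table
ORDER BY customer_id;
```

The same WITH clause can be placed in front of any query in this article to run it against the sample data.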

Step 2: Creating the Query

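To find the count of active customers per month, we’ll build a CASE expression with one WHEN branch per month. Each branch tests whether a promotion’s date range overlaps that month by combining several conditions with OR. Every condition is wrapped in its own parentheses so that the grouping of the AND and OR operators is explicit.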

First Month (June)

# WHEN for June
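    WHEN
     (DATE(start_time) >= '2022-06-01' AND DATE(end_time) <= '2022-06-30')
     OR
     (DATE(start_time) < '2022-06-01' AND DATE_TRUNC(DATE(end_time), MONTH) = '2022-06-01')
     OR
     (DATE(end_time) >= '2022-06-30' AND DATE_TRUNC(DATE(start_time), MONTH) = '2022-06-01')
     OR
     # promotion starts before June and ends after it
     (DATE(start_time) < '2022-06-01' AND DATE(end_time) > '2022-06-30')
    THEN '2022-06-01'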

Second Month (July)

# WHEN for July
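    WHEN
     (DATE(start_time) >= '2022-07-01' AND DATE(end_time) <= '2022-07-31')
     OR
     (DATE(start_time) < '2022-07-01' AND DATE_TRUNC(DATE(end_time), MONTH) = '2022-07-01')
     OR
     (DATE(end_time) >= '2022-07-31' AND DATE_TRUNC(DATE(start_time), MONTH) = '2022-07-01')
     OR
     # promotion starts before July and ends after it
     (DATE(start_time) < '2022-07-01' AND DATE(end_time) > '2022-07-31')
    THEN '2022-07-01'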

Third Month (August)

# WHEN for August
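    WHEN
     (DATE(start_time) >= '2022-08-01' AND DATE(end_time) <= '2022-08-31')
     OR
     (DATE(start_time) < '2022-08-01' AND DATE_TRUNC(DATE(end_time), MONTH) = '2022-08-01')
     OR
     (DATE(end_time) >= '2022-08-31' AND DATE_TRUNC(DATE(start_time), MONTH) = '2022-08-01')
     OR
     # promotion starts before August and ends after it
     (DATE(start_time) < '2022-08-01' AND DATE(end_time) > '2022-08-31')
    THEN '2022-08-01'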
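
The month-specific branches above are all instances of one interval-overlap test: a promotion is active in a month if it starts on or before the month's last day and ends on or after the month's first day. The sketch below expresses that as a single comparison for an arbitrary month. The params CTE, the month_start name, and the inlined sample rows are illustrative assumptions, not part of the original query.

```sql
-- Sketch: one overlap test that works for any month.
-- Assumptions: TIMESTAMP columns; month_start is a hypothetical parameter.
WITH customers_table AS (
  SELECT
    1 AS customer_id,
    'A' AS region,
    TIMESTAMP '2022-06-01' AS start_time,
    TIMESTAMP '2022-06-30' AS end_time
  UNION ALL
  SELECT 2, 'B', TIMESTAMP '2022-07-01', TIMESTAMP '2022-08-31'
),
params AS (
  SELECT DATE '2022-07-01' AS month_start
)
SELECT
  c.customer_id,
  c.region,
  -- True when the promotion starts on or before the month's last day
  -- and ends on or after the month's first day.
  DATE(c.start_time) <= LAST_DAY(p.month_start, MONTH)
    AND DATE(c.end_time) >= p.month_start AS active_in_month
FROM customers_table AS c
CROSS JOIN params AS p
ORDER BY c.customer_id;
```

With the month set to July, only customer 2 from the sample rows is flagged as active, because customer 1's promotion ends on June 30.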

Step 3: Final Query

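With a WHEN branch for each month, the branches can be wrapped in a single CASE ... END expression and used as the month column. The query below shows the surrounding aggregation in its simplest form, where the month is derived from start_time alone. Substituting the CASE expression for that column buckets each promotion by the first month whose branch matches.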

SELECT
    region,
    # first day of the month in which the promotion starts, as a string
    CAST(DATE_TRUNC(DATE(start_time), MONTH) AS STRING) AS month,
    COUNT(DISTINCT customer_id) AS active_customers
FROM customers_table
GROUP BY region, month
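
As an illustration of how a CASE expression slots into this aggregation, here is a minimal sketch. To keep it short, each WHEN uses a compact overlap test (the promotion starts on or before the month's last day and ends on or after its first day) rather than the longer OR chains from Step 2. The inlined sample rows and the TIMESTAMP column types are assumptions.

```sql
-- Sketch: CASE expression used as the month bucket.
-- Assumption: TIMESTAMP columns; sample rows stand in for the real table.
WITH customers_table AS (
  SELECT
    1 AS customer_id,
    'A' AS region,
    TIMESTAMP '2022-06-01' AS start_time,
    TIMESTAMP '2022-06-30' AS end_time
  UNION ALL
  SELECT 2, 'B', TIMESTAMP '2022-07-01', TIMESTAMP '2022-08-31'
)
SELECT
  region,
  CASE
    WHEN DATE(start_time) <= DATE '2022-06-30' AND DATE(end_time) >= DATE '2022-06-01'
      THEN '2022-06-01'
    WHEN DATE(start_time) <= DATE '2022-07-31' AND DATE(end_time) >= DATE '2022-07-01'
      THEN '2022-07-01'
    WHEN DATE(start_time) <= DATE '2022-08-31' AND DATE(end_time) >= DATE '2022-08-01'
      THEN '2022-08-01'
  END AS month,  -- NULL for promotions outside June through August
  COUNT(DISTINCT customer_id) AS active_customers
FROM customers_table
GROUP BY region, month
ORDER BY region, month;
```

Because CASE returns the first matching branch, customer 2 (active from July through August) lands in the July bucket only. That is worth keeping in mind when promotions span more than one month.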

Step 4: Explanation

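In this final query, we group the data by region and month, using DATE_TRUNC to reduce each start_time to the first day of its month. We then count the distinct customer_id values in each group, so a customer with several promotions in the same region and month is counted once.

Note that this query assigns every promotion to the month in which it starts. A promotion that runs from July through August is therefore counted in July only. The same limitation applies to a CASE expression, which returns only its first matching branch. If a customer must be counted in every month a promotion touches, each promotion has to be matched against every month it overlaps rather than against a single bucket.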
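
One way to count a customer in every month a promotion touches is to generate one row per calendar month and keep the promotion/month pairs that overlap. The following is a minimal sketch of that idea, not the CASE-based approach described above. The months CTE, the three-month range, the inlined sample rows, and the TIMESTAMP column types are all assumptions made for illustration.

```sql
-- Sketch: match each promotion against every month it overlaps.
-- Assumption: TIMESTAMP columns; sample rows stand in for the real table.
WITH customers_table AS (
  SELECT
    1 AS customer_id,
    'A' AS region,
    TIMESTAMP '2022-06-01' AS start_time,
    TIMESTAMP '2022-06-30' AS end_time
  UNION ALL
  SELECT 2, 'B', TIMESTAMP '2022-07-01', TIMESTAMP '2022-08-31'
),
months AS (
  -- One row per month start: 2022-06-01, 2022-07-01, 2022-08-01.
  SELECT month_start
  FROM UNNEST(
    GENERATE_DATE_ARRAY(DATE '2022-06-01', DATE '2022-08-01', INTERVAL 1 MONTH)
  ) AS month_start
)
SELECT
  c.region,
  m.month_start AS month,
  COUNT(DISTINCT c.customer_id) AS active_customers
FROM customers_table AS c
CROSS JOIN months AS m
-- Keep only the (promotion, month) pairs whose date ranges overlap.
WHERE DATE(c.start_time) <= LAST_DAY(m.month_start, MONTH)
  AND DATE(c.end_time) >= m.month_start
GROUP BY c.region, m.month_start
ORDER BY c.region, m.month_start;
```

For the two sample rows this yields one active customer for region A in June and one for region B in each of July and August. The cross join grows with the number of months, so on a large table it may be worth restricting the month range to the reporting window.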

Conclusion

In conclusion, finding active customers per month in BigQuery comes down to testing each promotion’s date range against each month and counting distinct customer IDs per region and month. DATE_TRUNC and carefully constructed WHEN branches cover the different ways a promotion can overlap a month, but the counts are only accurate if promotions that span several months are counted in each month they touch.


Last modified on 2025-01-02