Filtering Records Based on Specific Conditions in SQL Using BigQuery Standard SQL and CTEs

Filtering Records Based on Specific Conditions in SQL
======================================================

SQL is a powerful language for managing and querying data in relational databases. When working with large datasets, you often need to filter records based on specific conditions, and sometimes the condition applies to a whole group of records rather than to each row on its own. In this article, we'll explore how to handle that case with BigQuery Standard SQL.

Problem Statement


Suppose you have a table named ticket_lc containing information about tickets. A ticket can have several records, and the table has several columns, including ticket_id and status. You want to retrieve the ticket_id of every ticket whose records cover exactly the statuses “assigned”, “closed”, and “resolved”. Each of the three must appear at least once, and no other status may appear.

Using BigQuery Standard SQL


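BigQuery is a cloud-based data warehouse that uses Standard SQL for querying. The #standardSQL prefix on the first line tells BigQuery to run the query as Standard SQL rather than legacy SQL. The following query groups the records by ticket and uses a HAVING clause to keep only the tickets that meet both conditions.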

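#standardSQL
SELECT ticket_id
FROM `project.dataset.ticket_lc`
GROUP BY ticket_id
-- Exactly three distinct statuses, compared case-insensitively...
HAVING COUNT(DISTINCT LOWER(status)) = 3
  -- ...and no record with a status outside the allowed list
  AND COUNTIF(LOWER(status) NOT IN ('assigned', 'closed', 'resolved')) = 0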

In this example, we’re using the following SQL keywords and functions:

  • COUNT(DISTINCT ...) counts the number of unique status values within each ticket_id group.
  • COUNTIF() counts the rows in the group for which its Boolean argument is true. Here, it counts the records whose status is not one of the allowed values (“assigned”, “closed”, “resolved”).
  • LOWER(status) converts the status value to lowercase so that the comparisons are case-insensitive.
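COUNTIF() is not available in every SQL engine, so here is a minimal local sketch of the same HAVING logic that runs in SQLite through Python's built-in sqlite3 module. It approximates the BigQuery query under stated assumptions: SQLite has no COUNTIF(), so SUM(CASE WHEN ... THEN 1 ELSE 0 END) stands in for it, and the helper name tickets_with_exact_statuses and the sample rows are hypothetical. The sketch also applies LOWER() inside COUNT(DISTINCT ...), so that “Assigned” and “assigned” count as one status. Without that, ticket 3 below would have three distinct values and slip through.

```python
import sqlite3

ALLOWED = ("assigned", "closed", "resolved")


def tickets_with_exact_statuses(rows):
    """Hypothetical helper: return ticket_ids whose statuses are exactly ALLOWED.

    SQLite stand-in for the BigQuery query. SQLite has no COUNTIF, so
    SUM(CASE WHEN ... THEN 1 ELSE 0 END) counts the disallowed rows instead.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ticket_lc (ticket_id INTEGER, status TEXT)")
    conn.executemany("INSERT INTO ticket_lc VALUES (?, ?)", rows)
    query = """
        SELECT ticket_id
        FROM ticket_lc
        GROUP BY ticket_id
        HAVING COUNT(DISTINCT LOWER(status)) = 3
           AND SUM(CASE WHEN LOWER(status) NOT IN (?, ?, ?) THEN 1 ELSE 0 END) = 0
        ORDER BY ticket_id
    """
    result = [row[0] for row in conn.execute(query, ALLOWED)]
    conn.close()
    return result


# Hypothetical sample rows: (ticket_id, status)
rows = [
    (1, "Assigned"), (1, "closed"), (1, "Resolved"),   # exactly the three
    (2, "Assigned"), (2, "Closed"), (2, "Pending"),    # one disallowed status
    (3, "Assigned"), (3, "assigned"), (3, "Closed"),   # only two after LOWER()
]
print(tickets_with_exact_statuses(rows))  # [1]
```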

Explanation


The query works as follows:

  1. It groups the records by the ticket_id column.
  2. For each group, it counts the number of unique status values using COUNT(DISTINCT ...).
  3. For each group, it also counts the records whose status is not one of the allowed values, using COUNTIF().
  4. The HAVING clause keeps a group only if the first count equals 3 and the second count is 0. Together, these mean every status in the group is one of the three allowed values and all three are present.

The result set contains only one row for each ticket_id group that meets both conditions.
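The two HAVING conditions together amount to a set comparison: if no record has a disallowed status, every distinct value is one of the three allowed ones, and a distinct count of 3 then means all three are present. As a sanity check of that reasoning (not something you would run in BigQuery), the Python sketch below compares both formulations on every possible group of one to four records. The helper names having_check and set_check are hypothetical.

```python
from itertools import product

ALLOWED = {"assigned", "closed", "resolved"}


def having_check(statuses):
    """Hypothetical helper: the two HAVING conditions for one ticket's statuses."""
    lowered = [s.lower() for s in statuses]
    distinct_count = len(set(lowered))                        # COUNT(DISTINCT LOWER(status))
    disallowed = sum(1 for s in lowered if s not in ALLOWED)  # COUNTIF(... NOT IN ...)
    return distinct_count == 3 and disallowed == 0


def set_check(statuses):
    """Hypothetical helper: the ticket's set of statuses equals the allowed set."""
    return {s.lower() for s in statuses} == ALLOWED


# Compare both checks on every non-empty group of up to 4 records drawn
# from the three allowed statuses plus one disallowed status.
alphabet = ["Assigned", "closed", "Resolved", "Pending"]
for n in range(1, 5):
    for group in product(alphabet, repeat=n):
        assert having_check(group) == set_check(group), group

total = sum(4 ** n for n in range(1, 5))
print(f"both checks agree on all {total} groups")  # both checks agree on all 340 groups
```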

Example Use Case


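Suppose we have the following sample data. The WITH clause defines a CTE with the same name as the table, so the query reads these inline rows instead of a real table and can be run as-is: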

#standardSQL
WITH `project.dataset.ticket_lc` AS (
  SELECT 101 ticket_id, 'Assigned' status UNION ALL
  SELECT 101, 'Pending' UNION ALL
  SELECT 101, 'Resolved' UNION ALL
  SELECT 101, 'Closed' UNION ALL
  SELECT 102, 'Assigned' UNION ALL
  SELECT 102, 'Resolved' UNION ALL
  SELECT 102, 'Closed' UNION ALL
  SELECT 103, 'Assigned' UNION ALL
  SELECT 103, 'Pending' UNION ALL
  SELECT 103, 'Pending' UNION ALL
  SELECT 103, 'Assigned' UNION ALL
  SELECT 103, 'Resolved' UNION ALL
  SELECT 103, 'Closed'
)
SELECT ticket_id
FROM `project.dataset.ticket_lc`
GROUP BY ticket_id
HAVING COUNT(DISTINCT LOWER(status)) = 3
  AND COUNTIF(LOWER(status) NOT IN ('assigned', 'closed', 'resolved')) = 0

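Running this query returns only ticket_id 102. Tickets 101 and 103 each include a “Pending” record, so they have four distinct statuses and a nonzero COUNTIF() result. Ticket 102 has exactly “Assigned”, “Resolved”, and “Closed”, so it is the only group that meets both conditions.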
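To see the two aggregates behind that result, here is a small Python sketch that computes them for each ticket in the sample rows from the CTE above. It mirrors COUNT(DISTINCT LOWER(status)) and COUNTIF(LOWER(status) NOT IN (...)) in plain Python rather than running BigQuery; the helper name group_aggregates is hypothetical.

```python
from collections import defaultdict

ALLOWED = {"assigned", "closed", "resolved"}

# The sample rows from the CTE above, as (ticket_id, status) pairs.
rows = [
    (101, "Assigned"), (101, "Pending"), (101, "Resolved"), (101, "Closed"),
    (102, "Assigned"), (102, "Resolved"), (102, "Closed"),
    (103, "Assigned"), (103, "Pending"), (103, "Pending"),
    (103, "Assigned"), (103, "Resolved"), (103, "Closed"),
]


def group_aggregates(rows):
    """Hypothetical helper: per ticket_id, (distinct lowercased statuses, disallowed rows)."""
    groups = defaultdict(list)
    for ticket_id, status in rows:
        groups[ticket_id].append(status.lower())
    return {
        ticket_id: (len(set(statuses)), sum(s not in ALLOWED for s in statuses))
        for ticket_id, statuses in sorted(groups.items())
    }


for ticket_id, (distinct, disallowed) in group_aggregates(rows).items():
    kept = distinct == 3 and disallowed == 0
    print(ticket_id, distinct, disallowed, "kept" if kept else "filtered out")
# 101 4 1 filtered out
# 102 3 0 kept
# 103 4 2 filtered out
```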

Alternative Approach using Common Table Expressions (CTEs)


Common Table Expressions (CTEs) let you give a name to an intermediate result, which can make longer queries easier to read. Here, a CTE holds the list of allowed statuses so that it is defined in one place:

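#standardSQL
WITH valid_status AS (
  -- One row per allowed status, in lowercase
  SELECT status
  FROM UNNEST(['assigned', 'closed', 'resolved']) AS status
)
SELECT t.ticket_id
FROM `project.dataset.ticket_lc` t
-- A record whose status is not in the list gets NULL in v.status
LEFT JOIN valid_status v
  ON LOWER(t.status) = v.status
GROUP BY t.ticket_id
HAVING COUNT(DISTINCT v.status) = (SELECT COUNT(*) FROM valid_status)
  AND COUNTIF(v.status IS NULL) = 0

In this example, the valid_status CTE uses UNNEST to turn the array of allowed statuses into one row per value. Each ticket record is then LEFT JOINed to that list on its lowercased status, so a record with any other status, such as “pending”, gets NULL in v.status. The HAVING clause keeps a ticket only if it matched every allowed status (the number of distinct matches equals the number of rows in valid_status) and no record went unmatched (COUNTIF(v.status IS NULL) = 0). For the sample data above, this also returns only ticket 102, and changing the allowed statuses only requires editing the array.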

Keeping the allowed values in a CTE is useful when the list is long, changes often, or needs to be referenced more than once in the same query.
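A CTE-based version of this filter can be checked locally as well. In the sketch below, a valid_status CTE lists the allowed statuses one per row, each record is LEFT JOINed to it on its lowercased status, and the HAVING clause requires every allowed status to be matched and no record to be left unmatched. It runs in SQLite through Python's sqlite3 module, which is an assumption of the sketch: SQLite has neither UNNEST nor COUNTIF, so a VALUES list replaces the array and SUM(CASE ...) replaces COUNTIF. The helper name run is hypothetical.

```python
import sqlite3

# SQLite stand-in: VALUES replaces UNNEST([...]), SUM(CASE ...) replaces COUNTIF.
QUERY = """
WITH valid_status(status) AS (
    VALUES ('assigned'), ('closed'), ('resolved')
)
SELECT t.ticket_id
FROM ticket_lc t
LEFT JOIN valid_status v
  ON LOWER(t.status) = v.status          -- unmatched records get v.status = NULL
GROUP BY t.ticket_id
HAVING COUNT(DISTINCT v.status) = (SELECT COUNT(*) FROM valid_status)
   AND SUM(CASE WHEN v.status IS NULL THEN 1 ELSE 0 END) = 0
ORDER BY t.ticket_id
"""


def run(rows):
    """Hypothetical helper: load (ticket_id, status) rows and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ticket_lc (ticket_id INTEGER, status TEXT)")
    conn.executemany("INSERT INTO ticket_lc VALUES (?, ?)", rows)
    result = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return result


# The sample rows from the article's CTE example.
rows = [
    (101, "Assigned"), (101, "Pending"), (101, "Resolved"), (101, "Closed"),
    (102, "Assigned"), (102, "Resolved"), (102, "Closed"),
    (103, "Assigned"), (103, "Pending"), (103, "Pending"),
    (103, "Assigned"), (103, "Resolved"), (103, "Closed"),
]
print(run(rows))  # [102]
```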

Conclusion


Filtering records based on specific conditions is a common requirement in data analysis and reporting. When the condition applies to a group of records rather than to individual rows, GROUP BY combined with a HAVING clause does the job. In this article, we've explored how to use BigQuery Standard SQL to keep only the tickets whose statuses are exactly “assigned”, “closed”, and “resolved”, using COUNT(DISTINCT ...), COUNTIF(), and Common Table Expressions (CTEs).


Last modified on 2025-03-25