Filtering Records Based on Specific Conditions in SQL
======================================================
SQL is a powerful language used to manage and manipulate data in relational databases. When working with large datasets, it’s essential to be able to filter records based on specific conditions. In this article, we’ll explore how to do just that using SQL.
Problem Statement
Suppose you have a table named ticket_lc containing information about tickets. The table has several columns, including ticket_id and status. You want to retrieve only the tickets whose status values are exactly “assigned”, “closed”, and “resolved”: all three must appear for the ticket, and no other status may appear.
Using BigQuery Standard SQL
BigQuery is a cloud-based data warehouse that uses Standard SQL for querying. The following example demonstrates how to use BigQuery Standard SQL to filter records based on specific conditions.
#standardSQL
SELECT ticket_id
FROM `project.dataset.ticket_lc`
GROUP BY ticket_id
HAVING COUNT(DISTINCT status) = 3
AND COUNTIF(LOWER(status) NOT IN ('assigned', 'closed', 'resolved')) = 0
In this example, we’re using the following SQL keywords and functions:
- COUNT(DISTINCT status) counts the number of unique values in the status column within each group.
- COUNTIF() counts the number of rows for which the condition in its argument is true. In this case, it counts the rows whose status is not one of the three allowed values.
- LOWER(status) converts the status value to lowercase so that the comparison with the allowed values is case-insensitive.
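To see what each expression evaluates to, here is a minimal sketch that computes the two aggregates per ticket instead of filtering on them. The inline sample CTE and the output column names distinct_statuses and other_status_rows are made up for illustration and are not part of the original table.

```sql
#standardSQL
-- Hypothetical inline rows standing in for the ticket_lc table.
WITH sample AS (
  SELECT 101 AS ticket_id, 'Assigned' AS status UNION ALL
  SELECT 101, 'Pending' UNION ALL
  SELECT 102, 'Closed'
)
SELECT
  ticket_id,
  -- Number of different status values seen for this ticket
  COUNT(DISTINCT status) AS distinct_statuses,
  -- Number of rows whose status is outside the allowed list
  COUNTIF(LOWER(status) NOT IN ('assigned', 'closed', 'resolved')) AS other_status_rows
FROM sample
GROUP BY ticket_id
ORDER BY ticket_id
-- ticket 101: distinct_statuses = 2, other_status_rows = 1
-- ticket 102: distinct_statuses = 1, other_status_rows = 0
```

Moving these expressions into a HAVING clause turns the per-ticket diagnostics into a filter.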
Explanation
The query works as follows:
- It groups the records by the ticket_id column.
- For each group, it counts the number of unique status values using COUNT(DISTINCT status).
- It checks whether this count is equal to 3. Together with the next check, this means all three allowed statuses are present in the group.
- It uses COUNTIF() to count the records in the group whose status is not one of the three allowed values.
- If that count is zero, every record in the group has an allowed status. A group is kept only when both checks pass.
The result set contains one row for each ticket_id group that meets both conditions.
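One caveat worth checking against your data: COUNT(DISTINCT status) compares the raw strings. If the same status is stored with different casing (for example 'Closed' and 'closed'), it counts as two distinct values, so a ticket could reach a count of 3 without actually having all three statuses. A hedged variant, assuming the default case-sensitive string comparison (no collation set on the column), normalizes the case inside the count as well:

```sql
#standardSQL
SELECT ticket_id
FROM `project.dataset.ticket_lc`
GROUP BY ticket_id
-- Lowercase before counting so 'Closed' and 'closed' count as one status
HAVING COUNT(DISTINCT LOWER(status)) = 3
  AND COUNTIF(LOWER(status) NOT IN ('assigned', 'closed', 'resolved')) = 0
```

When the stored casing is already consistent, this variant returns the same tickets as the original query.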
Example Use Case
Suppose we have the following sample data:
#standardSQL
WITH `project.dataset.ticket_lc` AS (
SELECT 101 ticket_id, 'Assigned' status UNION ALL
SELECT 101, 'Pending' UNION ALL
SELECT 101, 'Resolved' UNION ALL
SELECT 101, 'Closed' UNION ALL
SELECT 102, 'Assigned' UNION ALL
SELECT 102, 'Resolved' UNION ALL
SELECT 102, 'Closed' UNION ALL
SELECT 103, 'Assigned' UNION ALL
SELECT 103, 'Pending' UNION ALL
SELECT 103, 'Pending' UNION ALL
SELECT 103, 'Assigned' UNION ALL
SELECT 103, 'Resolved' UNION ALL
SELECT 103, 'Closed'
)
SELECT ticket_id
FROM `project.dataset.ticket_lc`
GROUP BY ticket_id
HAVING COUNT(DISTINCT status) = 3
AND COUNTIF( LOWER(status) NOT IN ('assigned', 'closed', 'resolved')) = 0
Running this query returns a single row with ticket_id equal to 102, since it’s the only group that meets both conditions. Tickets 101 and 103 are excluded because each of them also has a “Pending” status.
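The query above returns only ticket IDs. If the full records for the qualifying tickets are needed, one possible sketch is to use the grouped query as a subquery and filter the table on its result. With the sample data above, it would return the three rows of ticket 102.

```sql
#standardSQL
SELECT t.*
FROM `project.dataset.ticket_lc` AS t
WHERE t.ticket_id IN (
  -- Same grouped filter as before: tickets with exactly the three statuses
  SELECT ticket_id
  FROM `project.dataset.ticket_lc`
  GROUP BY ticket_id
  HAVING COUNT(DISTINCT status) = 3
    AND COUNTIF(LOWER(status) NOT IN ('assigned', 'closed', 'resolved')) = 0
)
ORDER BY t.ticket_id
```

To try it on the sample rows, prepend the same WITH clause used in the example.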
Alternative Approach using Common Table Expressions (CTEs)
For more complex queries, we can use Common Table Expressions (CTEs) to simplify the code and improve readability. Here’s an example:
#standardSQL
WITH valid_status AS (
  -- One row per allowed status, in lowercase
  SELECT status
  FROM UNNEST(['assigned', 'closed', 'resolved']) AS status
)
SELECT t.ticket_id, t.status
FROM `project.dataset.ticket_lc` t
WHERE LOWER(t.status) IN (SELECT status FROM valid_status)
In this example, we define a CTE named valid_status that holds the three valid statuses, one per row. We then use an IN clause with a subquery on that CTE to check whether each record’s status is in the list of valid statuses. Note that this query filters individual rows: it returns every record with a valid status, and it does not require a ticket to have all three statuses.
This approach can be useful when dealing with complex conditions or multiple conditions that need to be evaluated together.
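A list of valid statuses kept in a CTE can also drive the per-ticket check from the first example. The following is a sketch under assumptions, not a drop-in replacement: the CTE name valid_status and the alias v are illustrative, and the join treats a NULL status as invalid, which differs slightly from the COUNTIF version.

```sql
#standardSQL
WITH valid_status AS (
  -- One row per allowed status, in lowercase
  SELECT status
  FROM UNNEST(['assigned', 'closed', 'resolved']) AS status
)
SELECT t.ticket_id
FROM `project.dataset.ticket_lc` AS t
-- Rows with a status outside the list get v.status = NULL
LEFT JOIN valid_status AS v
  ON LOWER(t.status) = v.status
GROUP BY t.ticket_id
-- Every allowed status is present, and no row failed to match
HAVING COUNT(DISTINCT v.status) = (SELECT COUNT(*) FROM valid_status)
  AND COUNTIF(v.status IS NULL) = 0
```

With this layout, changing the set of required statuses only means editing the array in the CTE, since the expected count is derived from it.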
Conclusion
Filtering records based on specific conditions is a common requirement in data analysis and reporting. By understanding how to use SQL keywords and functions, we can efficiently retrieve relevant data from large datasets. In this article, we’ve explored how to use BigQuery Standard SQL to filter records using COUNT(DISTINCT status), COUNTIF(), and Common Table Expressions (CTEs).
Last modified on 2025-03-25