How to Fix Unexpected Results Using SQL Partitioning and COALESCE

Understanding the Difference Between Two Groups of Numbers Using SQL and Partitioning

In this article, we’ll delve into the world of SQL partitioning and explore how to use the SUM() function with a partition by clause to find the difference between two groups of numbers. We’ll examine a specific example from Stack Overflow where the author is using a join to combine data from two tables and applies a complex calculation to determine the burn-down percentage for each campaign.

Table Structure

Before we dive into the SQL query, let’s take a look at the table structure:

CREATE TABLE adjusted_loaded_count (
  calendardate DATE,
  campaign_name VARCHAR(255),
  loaded_count INT
);

CREATE TABLE adjusted_daily_calls_completed (
  calendardate DATE,
  campaign_name VARCHAR(255),
  completed_call_count INT
);

The adjusted_loaded_count table contains data on the number of calls made by each campaign, while the adjusted_daily_calls_completed table contains similar information for completed calls.

SQL Query

The author’s original query looks like this:

SELECT 
  lc.calendardate,
  lc.campaign_name,
  lc.loaded_count,
  dcc.calendardate,
  dcc.campaign_name,
  SUM(cast(lc.loaded_count as int) - cast(dcc.completed_call_count as int)) OVER (PARTITION BY lc.campaign_name ORDER BY lc.calendardate ASC) AS burn_down
FROM 
  adjusted_loaded_count as lc
  LEFT JOIN adjusted_daily_calls_completed as dcc ON 
    lc.calendardate = dcc.calendardate AND 
    lc.campaign_name = dcc.campaign_name
WHERE 
  lc.campaign_name IS NOT NULL;

This query joins the two tables on the calendardate and campaign_name columns, then applies a complex calculation to determine the burn-down percentage for each campaign.

The Issue

The author is getting unexpected results in the burn_down column. Instead of the expected values, they’re getting -54 and -55. This seems counterintuitive, given that the original query should be calculating the difference between the two groups of numbers.

The Problem with NULL Values

To understand why this might be happening, let’s take a closer look at how SQL handles NULL values. When a column is NULL, it means that there is no value present in that cell. In the context of our query, when we’re joining the two tables, if a row has a NULL value in one of the columns, the entire row will be considered NULL.

Treating NULL Values as Zeroes

The solution to this problem lies in treating NULL values as zeroes. We can do this using the COALESCE function, which returns the first non-NULL argument.

The Correct Query

Here’s the corrected query:

SELECT 
  lc.calendardate,
  lc.campaign_name,
  lc.loaded_count,
  dcc.calendardate,
  dcc.campaign_name,
  SUM(
    COALESCE(CAST(lc.loaded_count as int), 0) - 
    COALESCE(CAST(dcc.completed_call_count as int), 0)
  ) OVER (PARTITION BY lc.campaign_name ORDER BY lc.calendardate ASC) AS burn_down
FROM 
  adjusted_loaded_count as lc
  LEFT JOIN adjusted_daily_calls_completed as dcc ON 
    lc.calendardate = dcc.calendardate AND 
    lc.campaign_name = dcc.campaign_name
WHERE 
  lc.campaign_name IS NOT NULL;

By using COALESCE, we ensure that if a row has a NULL value in either the loaded_count or completed_call_count column, it will be treated as zero instead.

The Intermediate Results

To illustrate how this works, let’s take a look at some intermediate results:

+------------+-------------+----------+------------+-------------+--------+
| calendardate | campaign_name | loaded_count | calendardate | campaign_name | completed_call_count |
+------------+-------------+----------+------------+-------------+--------+
| 2023-09-05 | Ex-000010   | 62        | NULL       | NULL         | NULL      |
+------------+-------------+----------+------------+-------------+--------+

+------------+-------------+----------+------------+-------------+--------+
| calendardate | campaign_name | loaded_count | calendardate | campaign_name | completed_call_count |
+------------+-------------+----------+------------+-------------+--------+
| 2023-09-06 | Ex-000010   | 0          | 2023-09-06 | Ex-000010   | 54        |
+------------+-------------+----------+------------+-------------+--------+

+------------+-------------+----------+------------+-------------+--------+
| calendardate | campaign_name | loaded_count | calendardate | campaign_name | completed_call_count |
+------------+-------------+----------+------------+-------------+--------+
| 2023-09-08 | Ex-000010   | 0          | 2023-09-08 | Ex-000010   | 1          |
+------------+-------------+----------+------------+-------------+--------+

The Final Results

Now, let’s see what happens when we apply the SUM() function with a partition by clause:

+------------+-------------+----------+------------+-------------+--------+
| calendardate | campaign_name | burned_down |
+------------+-------------+----------+------------+-------------+--------+
| 2023-09-05 | Ex-000010   | NULL       | NULL      |
+------------+-------------+----------+------------+-------------+--------+

+------------+-------------+----------+------------+-------------+--------+
| calendardate | campaign_name | burned_down |
| 2023-09-06 | Ex-000010   | -54        |
+------------+-------------+----------+------------+-------------+--------+

+------------+-------------+----------+------------+-------------+--------+
| calendardate | campaign_name | burned_down |
| 2023-09-08 | Ex-000010   | -55        |
+------------+-------------+----------+------------+-------------+--------+

As we can see, the burned_down column is now calculated correctly. The value for row 1 is still NULL, but that’s because there was no corresponding row in the second table.

Conclusion

In this article, we’ve explored how to use SQL partitioning and the SUM() function with a partition by clause to find the difference between two groups of numbers. We’ve also discussed how NULL values can affect the results of our query and how we can treat them as zeroes using the COALESCE function.

By following these steps, you should be able to write your own SQL queries that take into account the differences between two groups of numbers, even when dealing with NULL values. Remember to always check for NULL values and consider using the COALESCE function to ensure accurate results.


Last modified on 2023-06-30