Calculating Percentages Between Two Columns in SQL Using PostgreSQL

Calculating percentages between two columns can be a useful operation in various data analysis tasks. In this article, we will explore how to achieve this using SQL.

Background and Prerequisites

To calculate percentages between two columns, you need to have the following:

A table with columns that represent the values for which you want to calculate the percentage
Basic knowledge of SQL syntax

In this article, we will focus on PostgreSQL as our target database system. While other databases like MySQL or SQLite might have similar operations, the syntax might differ.

Problem Description and Sample Data

Let’s consider a table that contains information about items delivered and opened over time:

Date	Delivered	Opened
01/04/2021	1	1
01/04/2021	1	1
01/04/2021	1	1
08/05/2021	1	1
08/05/2021	1	1
10/03/2021	1	1
10/03/2021	1	1

We want to calculate the percentage of opened items with respect to delivered items for each month.
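To follow along, the sample data can be loaded into a scratch table. The sketch below is only one way to do it: the table name deliveries and the column types are hypothetical, and it assumes the dates above are written as DD/MM/YYYY (the sample does not say which format is meant).

```sql
-- Hypothetical table name and column types; adjust to your schema.
CREATE TEMP TABLE deliveries (
    date      date,
    delivered integer,
    opened    integer
);

-- Assumption: the sample dates are written as DD/MM/YYYY.
INSERT INTO deliveries (date, delivered, opened) VALUES
    (to_date('01/04/2021', 'DD/MM/YYYY'), 1, 1),
    (to_date('01/04/2021', 'DD/MM/YYYY'), 1, 1),
    (to_date('01/04/2021', 'DD/MM/YYYY'), 1, 1),
    (to_date('08/05/2021', 'DD/MM/YYYY'), 1, 1),
    (to_date('08/05/2021', 'DD/MM/YYYY'), 1, 1),
    (to_date('10/03/2021', 'DD/MM/YYYY'), 1, 1),
    (to_date('10/03/2021', 'DD/MM/YYYY'), 1, 1);
```

If your dates are MM/DD/YYYY instead, change the format string passed to to_date() accordingly.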

Solution Overview

To solve this problem, we will use PostgreSQL’s window functions. A window function performs a calculation across a set of rows related to the current row, without collapsing those rows into a single output row.

We first compute ratio_opened, the proportion of opened items to delivered items within each month, and later multiply it by 100 to express it as a percentage:

SELECT
    opened,
    delivered,
    EXTRACT(MONTH FROM date) AS date_month,
    (
        -- date_trunc('month', ...) groups rows by year and month together
        SUM(opened) OVER (PARTITION BY date_trunc('month', date)) * 1.0 /
        SUM(delivered) OVER (PARTITION BY date_trunc('month', date))
    ) AS ratio_opened
FROM deliveries;  -- placeholder table name

This query returns every row of the table with its opened and delivered values, the month number, and the ratio_opened value calculated across all rows of the same month.
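If one summary row per month is preferred over one row per source row, a plain GROUP BY aggregate is an alternative to the window-function approach. The following is a minimal sketch of that swapped-in technique; the sample_rows data and the month_start, total_opened, and total_delivered names are illustrative and not part of the original table.

```sql
-- Illustrative inline data; the values are made up for the example.
WITH sample_rows (date, delivered, opened) AS (
    VALUES
        (DATE '2021-03-10', 1, 1),
        (DATE '2021-03-11', 1, 0),
        (DATE '2021-04-01', 1, 1)
)
SELECT
    date_trunc('month', date)          AS month_start,
    SUM(opened)                        AS total_opened,
    SUM(delivered)                     AS total_delivered,
    SUM(opened) * 1.0 / SUM(delivered) AS ratio_opened
FROM sample_rows
GROUP BY date_trunc('month', date)
ORDER BY month_start;
```

The window version is the better fit when the individual rows must stay visible next to the monthly ratio; the GROUP BY version is shorter when only the monthly figures matter.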

Explanation

The EXTRACT(MONTH FROM date) function extracts the month number from the date column. For partitioning we use date_trunc('month', date), which truncates each date to the first day of its month, so rows are grouped by year and month together. (The year(date) and month(date) functions found in MySQL and SQL Server do not exist in PostgreSQL.)

We calculate the ratio of opened items to delivered items for each partition using the following code:

SUM(opened) OVER (PARTITION BY date_trunc('month', date)) * 1.0 /
SUM(delivered) OVER (PARTITION BY date_trunc('month', date))

The OVER clause turns SUM() into a window function, and PARTITION BY restricts each sum to the rows that share the same truncated month.

PostgreSQL performs integer division when both operands are integers, so 1 / 2 yields 0. Multiplying one operand by a decimal constant such as 1.0 converts the expression to the numeric type, so the fractional part of the result is kept.
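The effect of integer division is easy to check in isolation. This small sketch needs no table; the column aliases are chosen only for illustration. In PostgreSQL the first column truncates to 0, while the other two keep the fractional part.

```sql
SELECT
    1 / 2          AS integer_division,  -- both operands are integers
    1 * 1.0 / 2    AS numeric_division,  -- 1.0 makes the expression numeric
    1::numeric / 2 AS cast_division;     -- explicit cast, same effect
```

An explicit cast such as ::numeric states the intent more directly than multiplying by 1.0, but both avoid the truncation.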

Desired Output Format

To express the ratio as a percentage, multiply it by 100:

SELECT
    opened,
    delivered,
    EXTRACT(MONTH FROM date) AS date_month,
    (SUM(opened) OVER (PARTITION BY date_trunc('month', date)) * 100.0 /
     SUM(delivered) OVER (PARTITION BY date_trunc('month', date))) AS opened_percentage
FROM deliveries;  -- placeholder table name

Using 100.0 in place of 1.0 both scales the ratio to a percentage and avoids integer division.
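Percentages are usually reported with a fixed number of decimals. As a sketch, ROUND() with a second argument can trim the numeric result to two places; the sample_rows data and the opened_percentage alias are illustrative assumptions.

```sql
-- Illustrative inline data: one of three delivered items was opened.
WITH sample_rows (date, delivered, opened) AS (
    VALUES
        (DATE '2021-03-10', 1, 1),
        (DATE '2021-03-11', 1, 0),
        (DATE '2021-03-12', 1, 0)
)
SELECT DISTINCT
    EXTRACT(MONTH FROM date) AS date_month,
    ROUND(
        100.0 * SUM(opened) OVER (PARTITION BY date_trunc('month', date))
              / SUM(delivered) OVER (PARTITION BY date_trunc('month', date)),
        2  -- keep two decimal places
    ) AS opened_percentage
FROM sample_rows;
```

With this data the single month comes out as 33.33. SELECT DISTINCT is used here only to collapse the identical per-row results into one line.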

However, if a month has no delivered items, the denominator is zero and PostgreSQL raises a division-by-zero error. To avoid this, turn a zero denominator into NULL with NULLIF(), then replace the resulting NULL with a default value:

SELECT
    opened,
    delivered,
    EXTRACT(MONTH FROM date) AS date_month,
    COALESCE(
        SUM(opened) OVER (PARTITION BY date_trunc('month', date)) * 1.0 /
        NULLIF(SUM(delivered) OVER (PARTITION BY date_trunc('month', date)), 0),
        0
    ) AS ratio_opened
FROM deliveries;  -- placeholder table name

NULLIF() returns NULL when the delivered total is 0, so the division yields NULL instead of an error. COALESCE() then replaces that NULL with a specific value (in this case, 0).
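In PostgreSQL a division by zero raises an error rather than producing NULL, so COALESCE() alone cannot catch it; the denominator has to be guarded first. The sketch below shows the difference between leaving the guarded result as NULL and defaulting it to 0. The sample_rows data and the ratio_or_null and ratio_or_zero aliases are illustrative.

```sql
-- Illustrative inline data; April has nothing delivered.
WITH sample_rows (date, delivered, opened) AS (
    VALUES
        (DATE '2021-03-10', 1, 1),
        (DATE '2021-04-01', 0, 0)
)
SELECT
    date_trunc('month', date) AS month_start,
    -- NULLIF turns a zero total into NULL, so the division returns NULL
    SUM(opened) * 1.0 / NULLIF(SUM(delivered), 0) AS ratio_or_null,
    -- COALESCE then maps that NULL to a default of 0
    COALESCE(SUM(opened) * 1.0 / NULLIF(SUM(delivered), 0), 0) AS ratio_or_zero
FROM sample_rows
GROUP BY date_trunc('month', date)
ORDER BY month_start;
```

Whether 0 or NULL is the better answer for an empty month depends on the report: 0 reads as "nothing was opened", while NULL reads as "no ratio exists".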

Handling NULL Values in the Delivered and Opened Columns

SUM() ignores NULL values, so a NULL in delivered or opened does not break the calculation by itself. It can skew the result, though: a row whose delivered value is NULL still contributes its opened value to the numerator. To count opened items only for rows where delivered is known:

SELECT
    opened,
    delivered,
    EXTRACT(MONTH FROM date) AS date_month,
    COALESCE(
        -- add opened only when the row has a known delivered value
        SUM(CASE WHEN delivered IS NOT NULL THEN opened ELSE 0 END) OVER (PARTITION BY date_trunc('month', date)) *
        1.0 /
        NULLIF(SUM(delivered) OVER (PARTITION BY date_trunc('month', date)), 0),
        0
    ) AS ratio_opened
FROM deliveries;  -- placeholder table name

The CASE expression inside SUM() contributes opened only for rows where delivered is not NULL. If every delivered value in a month is NULL, SUM(delivered) is NULL, the division yields NULL, and COALESCE() returns 0.
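How aggregates treat NULL can be verified on a few rows. This sketch uses illustrative inline data and hypothetical aliases; it relies on the standard behaviour that SUM() and COUNT(column) skip NULL inputs, and on PostgreSQL's FILTER clause as a shorter alternative to CASE.

```sql
-- Illustrative inline data with one NULL in each measure column.
WITH sample_rows (date, delivered, opened) AS (
    VALUES
        (DATE '2021-03-10', 1,    1),
        (DATE '2021-03-11', NULL, 1),    -- delivered is unknown
        (DATE '2021-03-12', 1,    NULL)  -- opened is unknown
)
SELECT
    SUM(delivered)   AS delivered_total,      -- NULL rows are skipped
    SUM(opened) FILTER (WHERE delivered IS NOT NULL)
                     AS opened_where_delivered_known,
    COUNT(*)         AS all_rows,             -- counts every row
    COUNT(delivered) AS rows_with_delivered   -- counts non-NULL values only
FROM sample_rows;
```

Here delivered_total is 2, opened_where_delivered_known is 1, all_rows is 3, and rows_with_delivered is 2.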

Handling NULL Values in Date Columns

Rows whose date is NULL (for example, records whose date has not been filled in yet) need no special treatment for correctness: PARTITION BY places all of them in a single partition of their own, so they do not affect the ratios of dated rows. If you want those rows to appear under an explicit placeholder instead of a NULL month, substitute a sentinel date in both the label and the partition:

SELECT
    opened,
    delivered,
    EXTRACT(MONTH FROM COALESCE(date, DATE '9999-12-31')) AS date_month,
    COALESCE(
        SUM(CASE WHEN delivered IS NOT NULL THEN opened ELSE 0 END) OVER w *
        1.0 /
        NULLIF(SUM(delivered) OVER w, 0),
        0
    ) AS ratio_opened
FROM deliveries  -- placeholder table name
WINDOW w AS (PARTITION BY date_trunc('month', COALESCE(date, DATE '9999-12-31')));

In this version, COALESCE() replaces a NULL date with a known sentinel value, and the WINDOW clause names the partition once so both sums share it. EXTRACT(MONTH ...) reports the sentinel rows as month 12, so also select EXTRACT(YEAR ...) if you need to tell them apart from a real December.
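If undated rows are not wanted in the result at all, a simpler option than a placeholder date is to filter them out before the window functions run. This is a sketch of that alternative, using illustrative inline data.

```sql
-- Illustrative inline data; the last row has no date.
WITH sample_rows (date, delivered, opened) AS (
    VALUES
        (DATE '2021-03-10', 1, 1),
        (DATE '2021-03-11', 1, 0),
        (NULL,              1, 1)
)
SELECT
    date,
    SUM(opened) OVER (PARTITION BY date_trunc('month', date)) * 1.0
        / NULLIF(SUM(delivered) OVER (PARTITION BY date_trunc('month', date)), 0)
        AS ratio_opened
FROM sample_rows
WHERE date IS NOT NULL;  -- applied before the window functions are evaluated
```

Because WHERE is evaluated before window functions, the undated row is excluded from both the output and the sums.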

Example Use Cases

This query can be used to find the proportion of opened items to total deliveries for each month across all records in a table.

For instance:

SELECT DISTINCT  -- one row per month instead of one per source row
    date_month,
    ratio_opened
FROM (
    SELECT
        EXTRACT(MONTH FROM COALESCE(date, DATE '9999-12-31')) AS date_month,
        COALESCE(
            SUM(CASE WHEN delivered IS NOT NULL THEN opened ELSE 0 END) OVER w *
            1.0 /
            NULLIF(SUM(delivered) OVER w, 0),
            0
        ) AS ratio_opened
    FROM deliveries  -- placeholder table name
    WINDOW w AS (PARTITION BY date_trunc('month', COALESCE(date, DATE '9999-12-31')))
) AS monthly  -- older PostgreSQL versions require an alias for a subquery
WHERE ratio_opened > 0.5;  -- keeps months where more than 50% of delivered items were opened

The window functions are evaluated in the subquery because their results cannot be referenced in the WHERE clause of the same query level; the outer query then filters on the computed ratio.
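When only monthly figures are needed, the same filter can be written without a subquery by aggregating with GROUP BY and filtering with HAVING. This is a sketch of that alternative formulation, not a replacement for the window approach; the sample_rows data and the month_start alias are illustrative.

```sql
-- Illustrative inline data: March has 1 of 2 opened, April has 1 of 1.
WITH sample_rows (date, delivered, opened) AS (
    VALUES
        (DATE '2021-03-10', 1, 1),
        (DATE '2021-03-11', 1, 0),
        (DATE '2021-04-01', 1, 1)
)
SELECT
    date_trunc('month', date) AS month_start,
    SUM(opened) * 1.0 / NULLIF(SUM(delivered), 0) AS ratio_opened
FROM sample_rows
GROUP BY date_trunc('month', date)
-- HAVING filters on aggregate results, which WHERE cannot do
HAVING SUM(opened) * 1.0 / NULLIF(SUM(delivered), 0) > 0.5
ORDER BY month_start;
```

With this data only April passes the filter, since March's ratio is exactly 0.5 and the comparison is strict.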

Handling Additional Records

The queries above use only the date, delivered, and opened columns of a single table. If your table has more columns, or you need to incorporate data from another table, you will need to adapt the query accordingly.

In general, you should be prepared to handle missing values and perform calculations accordingly, depending on how your dataset is structured.

Additional PostgreSQL Query Features

There are many additional features available in PostgreSQL that can help with this kind of problem. For example:

Joining two tables together: SELECT * FROM table1 JOIN table2 ON table1.column = table2.column;
Grouping rows by multiple columns: SELECT column1, column2, SUM(column3) FROM table GROUP BY column1, column2;
Filtering results with conditions (e.g., WHERE clause): SELECT * FROM table WHERE condition;

It’s worth noting that these features can be used to further adapt and refine the code provided in this example.

Last modified on 2025-04-17