Calculating Percentages Between Two Columns in SQL
Calculating percentages between two columns can be a useful operation in various data analysis tasks. In this article, we will explore how to achieve this using SQL.
Background and Prerequisites
To calculate percentages between two columns, you need to have the following:
- A table with columns that represent the values for which you want to calculate the percentage
- Basic knowledge of SQL syntax
In this article, we will focus on PostgreSQL as our target database system. While other databases like MySQL or SQLite might have similar operations, the syntax might differ.
Problem Description and Sample Data
Let’s consider a table that contains information about items delivered and opened over time:
Date | Delivered | Opened |
---|---|---|
01/04/2021 | 1 | 1 |
01/04/2021 | 1 | 1 |
01/04/2021 | 1 | 1 |
08/05/2021 | 1 | 1 |
08/05/2021 | 1 | 1 |
10/03/2021 | 1 | 1 |
10/03/2021 | 1 | 1 |
We want to calculate the percentage of opened items with respect to delivered items for each month.
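To follow along, the sample data can be loaded into a small table. The sketch below rests on two assumptions that are not stated in the article: the table is called deliveries (a hypothetical name), and the dates are written as DD/MM/YYYY.

```sql
-- Hypothetical table name; the article does not name the table.
CREATE TABLE deliveries (
    date      date,
    delivered integer,
    opened    integer
);

-- Assumes the sample dates are DD/MM/YYYY; to_date makes that explicit.
INSERT INTO deliveries (date, delivered, opened) VALUES
    (to_date('01/04/2021', 'DD/MM/YYYY'), 1, 1),
    (to_date('01/04/2021', 'DD/MM/YYYY'), 1, 1),
    (to_date('01/04/2021', 'DD/MM/YYYY'), 1, 1),
    (to_date('08/05/2021', 'DD/MM/YYYY'), 1, 1),
    (to_date('08/05/2021', 'DD/MM/YYYY'), 1, 1),
    (to_date('10/03/2021', 'DD/MM/YYYY'), 1, 1),
    (to_date('10/03/2021', 'DD/MM/YYYY'), 1, 1);
```

If the dates are actually MM/DD/YYYY, only the format string needs to change.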
Solution Overview
To solve this problem, we will use PostgreSQL’s window functions (also called analytic functions). A window function performs a calculation across a set of rows related to the current row, without collapsing those rows into a single output row.
We first calculate ratio_opened, the proportion of opened items to delivered items for each month. Multiplying this ratio by 100 later gives the percentage in our desired output format:
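SELECT
    opened,
    delivered,
    EXTRACT(MONTH FROM date) AS "date_month",
    (
        -- date_trunc('month', date) keys each row by year and month;
        -- PostgreSQL has no year() or month() functions
        SUM(opened) OVER (PARTITION BY date_trunc('month', date)) * 1.0 /
        SUM(delivered) OVER (PARTITION BY date_trunc('month', date))
    ) AS ratio_opened
FROM deliveries;  -- placeholder table name; "table" is a reserved word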
This query returns the opened and delivered values of every row, along with the ratio_opened value calculated for that row's month.
Explanation
The EXTRACT(MONTH FROM date) function extracts the month part from the date column for display. To partition the data by both year and month, we use date_trunc('month', date), which maps every date to the first day of its month. PostgreSQL does not provide the year(date) and month(date) functions found in MySQL or SQL Server.
We calculate the ratio of opened items to delivered items for each partition using the following code:
SUM(opened) OVER (PARTITION BY date_trunc('month', date)) * 1.0 /
SUM(delivered) OVER (PARTITION BY date_trunc('month', date))
The OVER clause turns SUM() into a window function, and the PARTITION BY clause restricts each sum to the rows that share the same year and month.
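PostgreSQL has no year() or month() functions, so one way to obtain a single key that combines year and month is date_trunc. The following sketch uses illustrative dates that are not taken from the sample table:

```sql
-- Illustrative dates only; they are not taken from the sample table.
SELECT
    date_trunc('month', DATE '2021-04-01')::date AS first_of_month,  -- 2021-04-01
    date_trunc('month', DATE '2021-04-17')::date AS mid_month,       -- 2021-04-01
    date_trunc('month', DATE '2022-04-17')::date AS next_year;       -- 2022-04-01
```

Dates from the same month of the same year map to the same value, while the same month in a different year maps to a different one, which is the behavior we want from a partition key.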
Because PostgreSQL performs integer division when both operands are integers, we multiply one of the values by a decimal constant (1.0 in our example). The division then produces a fractional result instead of being truncated to a whole number.
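The difference is easy to see with literal values. This is a minimal illustration in PostgreSQL; the column aliases are made up for the example:

```sql
SELECT
    1 / 2          AS integer_division,  -- 0: both operands are integers
    1 * 1.0 / 2    AS numeric_division,  -- 0.5: the 1.0 promotes the expression to numeric
    1::numeric / 2 AS cast_division;     -- 0.5: an explicit cast works as well
```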
Desired Output Format
To convert the ratio into a percentage value, multiply it by 100:
SELECT
    opened,
    delivered,
    EXTRACT(MONTH FROM date) AS "date_month",
    (100.0 * SUM(opened) OVER (PARTITION BY date_trunc('month', date)) /
     SUM(delivered) OVER (PARTITION BY date_trunc('month', date))) AS ratio_opened_percentage
FROM deliveries;  -- placeholder table name; "table" is a reserved word
Multiplying by 100.0 instead of 1.0 yields the percentage directly and still avoids integer division.
However, if there are no delivered items for a particular month in our dataset, the denominator is zero and PostgreSQL raises a division-by-zero error. To avoid this issue:
SELECT
    opened,
    delivered,
    EXTRACT(MONTH FROM date) AS "date_month",
    COALESCE(
        100.0 * SUM(opened) OVER (PARTITION BY date_trunc('month', date)) /
        NULLIF(SUM(delivered) OVER (PARTITION BY date_trunc('month', date)), 0),
        0
    ) AS ratio_opened_percentage
FROM deliveries;  -- placeholder table name; "table" is a reserved word
NULLIF() turns a zero denominator into NULL, which makes the whole division NULL instead of raising an error. We then use the COALESCE() function to replace that NULL with a specific value (in this case, 0).
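Note that COALESCE() only replaces NULL values; it cannot catch a division-by-zero error. A common pattern is to wrap the denominator in NULLIF() first. A minimal sketch with literal values, using made-up column aliases:

```sql
SELECT
    NULLIF(0, 0)                         AS zero_becomes_null,  -- NULL
    10 * 1.0 / NULLIF(0, 0)              AS null_ratio,         -- NULL, no division-by-zero error
    COALESCE(10 * 1.0 / NULLIF(0, 0), 0) AS safe_ratio;         -- 0
```

Whether 0 is the right fallback depends on the report; leaving the result as NULL makes "no deliveries" distinguishable from "nothing opened".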
Handling NULL Values in Numeric Columns
If there are NULL values in the delivered or opened columns, SUM() skips them, so they do not break the division by themselves. Two cases still need care: a month in which every delivered value is NULL, where the sum itself is NULL, and rows where delivered is NULL but opened is not, which would inflate the ratio. To handle such data:
SELECT
    opened,
    delivered,
    EXTRACT(MONTH FROM date) AS "date_month",
    COALESCE(
        -- count opened only on rows that have a delivered value
        100.0 * SUM(CASE WHEN delivered IS NOT NULL THEN opened ELSE 0 END)
            OVER (PARTITION BY date_trunc('month', date)) /
        NULLIF(SUM(delivered) OVER (PARTITION BY date_trunc('month', date)), 0),
        0
    ) AS ratio_opened_percentage
FROM deliveries;  -- placeholder table name; "table" is a reserved word
We use a CASE expression within the SUM() function so that opened is only counted on rows where delivered is not NULL. The denominator needs no CASE because SUM() already ignores NULL values, and COALESCE() covers a month in which nothing was delivered.
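It helps to know how aggregates treat NULL values: SUM() and COUNT(column) skip them, while COUNT(*) counts every row. The following sketch uses an inline VALUES list rather than the article's table:

```sql
-- Inline data for illustration: two numbers and one NULL.
SELECT
    SUM(x)   AS sum_skipping_nulls,  -- 3: the NULL row is ignored
    COUNT(*) AS all_rows,            -- 3
    COUNT(x) AS non_null_rows        -- 2
FROM (VALUES (1), (NULL), (2)) AS t(x);

-- When every input is NULL, SUM() returns NULL rather than 0.
SELECT SUM(x) AS sum_of_only_nulls   -- NULL
FROM (VALUES (NULL::integer)) AS t(x);
```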
Handling NULL Values in Date Columns
If you also want to handle cases where date is NULL (which could happen for records entered in the future), you might need additional logic:
SELECT
    opened,
    delivered,
    EXTRACT(MONTH FROM COALESCE(date, DATE '9999-12-31')) AS "date_month",
    COALESCE(
        100.0 * SUM(CASE WHEN delivered IS NOT NULL THEN opened ELSE 0 END)
            OVER (PARTITION BY date_trunc('month', COALESCE(date, DATE '9999-12-31'))) /
        NULLIF(SUM(delivered)
            OVER (PARTITION BY date_trunc('month', COALESCE(date, DATE '9999-12-31'))), 0),
        0
    ) AS ratio_opened_percentage
FROM deliveries;  -- placeholder table name; "table" is a reserved word
In this modified version, we use the COALESCE() function to replace any NULL values in the date column with a known sentinel value, both in the displayed month and in the PARTITION BY clause. Rows without a date therefore form their own group and do not affect the calculations for real months. Note that the sentinel shows up as month 12 in the date_month column.
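Before settling on a sentinel date, it can be useful to check how many rows are affected at all. A quick sketch, again assuming a hypothetical table named deliveries:

```sql
SELECT
    COUNT(*)                             AS total_rows,
    COUNT(*) FILTER (WHERE date IS NULL) AS rows_without_date
FROM deliveries;  -- hypothetical table name
```

If rows_without_date is small and those rows are not meaningful for the report, excluding them with WHERE date IS NOT NULL may be simpler than carrying a sentinel through the query.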
Example Use Cases
This query can be used to find the proportion of opened items to total deliveries for each month across all records in a table.
For instance:
SELECT DISTINCT  -- collapse the repeated per-row values
    "date_month",
    ratio_opened_percentage
FROM (
    SELECT
        opened,
        delivered,
        EXTRACT(MONTH FROM COALESCE(date, DATE '9999-12-31')) AS "date_month",
        COALESCE(
            100.0 * SUM(CASE WHEN delivered IS NOT NULL THEN opened ELSE 0 END)
                OVER (PARTITION BY date_trunc('month', COALESCE(date, DATE '9999-12-31'))) /
            NULLIF(SUM(delivered)
                OVER (PARTITION BY date_trunc('month', COALESCE(date, DATE '9999-12-31'))), 0),
            0
        ) AS ratio_opened_percentage
    FROM deliveries  -- placeholder table name; "table" is a reserved word
) AS monthly  -- older PostgreSQL versions require an alias on a subquery in FROM
WHERE ratio_opened_percentage > 50;  -- keeps months where opened items exceed 50% of deliveries
The percentage is computed in a subquery because window functions are not allowed in a WHERE clause; the outer query then filters on the computed value.
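When only one row per month is needed, plain aggregation is an alternative to window functions. The sketch below swaps in GROUP BY and HAVING; it assumes a hypothetical table named deliveries, and the output column names are made up for the example.

```sql
SELECT
    date_trunc('month', date)::date AS month_start,
    SUM(opened)                     AS total_opened,
    SUM(delivered)                  AS total_delivered,
    -- NULLIF avoids a division-by-zero error for months without deliveries
    ROUND(100.0 * SUM(opened) / NULLIF(SUM(delivered), 0), 2) AS opened_percentage
FROM deliveries  -- hypothetical table name
GROUP BY date_trunc('month', date)
-- HAVING filters on the aggregated value; the select-list alias cannot be used here
HAVING 100.0 * SUM(opened) / NULLIF(SUM(delivered), 0) > 50
ORDER BY month_start;
```

With the sample data every month yields 100, since each row has opened and delivered both equal to 1. Months without deliveries produce a NULL percentage and are dropped by the HAVING condition.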
Handling Additional Records
The queries above assume a single table that contains only the date, delivered, and opened columns. If your table has more columns, or if you need to incorporate data from another table, you will need to adapt the select list, the joins, and the PARTITION BY clause accordingly.
In general, check how your dataset represents missing values before relying on the calculated percentages, and handle them explicitly as shown in the sections above.
Additional PostgreSQL Query Features
There are many additional features available in PostgreSQL that can help with this kind of problem. For example:
- Joining two tables together:
SELECT * FROM table1 JOIN table2 ON table1.column1 = table2.column1;
- Grouping rows by multiple columns:
SELECT column1, column2, SUM(column3) FROM table_name GROUP BY column1, column2;
- Filtering results with conditions (e.g., a WHERE clause):
SELECT * FROM table_name WHERE condition;
It’s worth noting that these features can be used to further adapt and refine the code provided in this example.
Last modified on 2025-04-17