SQL Athena: Counting Business Days Between Two Dates
Introduction
In this article, we’ll explore how to count business days between two dates in Amazon Athena, a serverless interactive query service for data stored in Amazon S3. We’ll build the count from three built-in SQL functions and explain each one along the way.
Background Information
Amazon Athena is a serverless query engine that’s designed for fast and cost-effective analysis of data stored in Amazon S3. It supports a wide range of data formats, including CSV, JSON, Parquet, and ORC. When it comes to querying dates and time intervals, Athena has several built-in functions that can help us achieve our goals.
Day_of_Week Function
One of the most important functions for date-related queries is the day_of_week function. It returns the ISO day of the week as an integer from 1 (Monday) to 7 (Sunday), so Saturday is 6 and Sunday is 7. We’ll use this function to filter non-business days out of our sequence.
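A quick way to confirm the numbering is to run day_of_week on a few dates whose weekdays are known. The sketch below uses illustrative dates chosen for this check (1 October 2021 was a Friday); they are not part of any real dataset.

```sql
-- day_of_week follows ISO numbering: 1 = Monday ... 7 = Sunday
SELECT d,
       day_of_week(d) AS dow
FROM (
    VALUES
        (date '2021-10-01'),  -- Friday   -> 5
        (date '2021-10-02'),  -- Saturday -> 6
        (date '2021-10-03'),  -- Sunday   -> 7
        (date '2021-10-04')   -- Monday   -> 1
) AS t(d);
```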
Sequence Function
Another essential function for our query is the sequence function. Given a start date, an end date, and a step interval, it returns an array of dates from start to end, with both endpoints included. In our case, we generate every calendar day in the range and filter out the weekends afterwards.
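As a minimal illustration, assuming a one-day step and literal dates picked only for the example, sequence can be called on its own to see the array it produces:

```sql
-- Returns an array of five dates, 2021-10-01 through 2021-10-05,
-- because both the start and the end date are included.
SELECT sequence(date '2021-10-01', date '2021-10-05', interval '1' day) AS days;
```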
Cardinality Function
The cardinality function returns the number of elements in an array (or a map). We’ll use it to count the dates that remain in our sequence after filtering.
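A small sketch of cardinality, first on a literal array and then on a date sequence; the values are arbitrary examples:

```sql
SELECT cardinality(ARRAY[10, 20, 30]) AS n_elements,  -- 3
       cardinality(
           sequence(date '2021-10-01', date '2021-10-05', interval '1' day)
       ) AS n_days;                                   -- 5 (both endpoints counted)
```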
Querying Business Days
To count business days, we generate every date between the start and end dates with the sequence function, drop the weekend dates using the day_of_week function inside filter, and then count what remains with the cardinality function. Let’s take a look at the SQL code:
WITH dataset(start_date, end_date) AS (
    VALUES
        (date '2021-10-01', date '2021-10-05'),
        (date '2021-10-01', date '2021-10-03'),
        (date '2021-10-02', date '2021-10-10'),
        (date '2021-10-02', date '2021-10-08'),
        (date '2021-10-02', date '2021-10-05')
)
-- count the weekdays in each inclusive date range
SELECT start_date,
       end_date,
       cardinality(filter(
           sequence(start_date, end_date, interval '1' day),
           d -> day_of_week(d) NOT IN (6, 7)  -- drop Saturday (6) and Sunday (7)
       )) AS business_days
FROM dataset;
Explanation
In this code snippet:
- We define a dataset CTE with five rows, each containing a start date and an end date.
- The query selects the start date, the end date, and the number of business days, which is computed with the cardinality function.
- Inside the filter call:
  - The sequence function generates one date per day from the start date to the end date, inclusive.
  - The lambda d -> day_of_week(d) not in (6,7) calls day_of_week on each date and keeps only the dates that are not Saturday (6) or Sunday (7).
- The final result set contains three columns: start date, end date, and business days.
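To check a single range by hand, one option is to expand the sequence into rows with UNNEST and look at the weekday number of each date. This is only a verification sketch for the first sample row; the alias names t, d, and dow are arbitrary.

```sql
-- One row per calendar day in the range, with its ISO weekday number
SELECT d,
       day_of_week(d) AS dow,
       day_of_week(d) NOT IN (6, 7) AS is_business_day
FROM UNNEST(
    sequence(date '2021-10-01', date '2021-10-05', interval '1' day)
) AS t(d)
ORDER BY d;
```

The range runs Friday, Saturday, Sunday, Monday, Tuesday, so three of the five days are business days. By the same reasoning, the five sample rows should yield 3, 1, 5, 5, and 2 business days respectively.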
Counting Working Days
Counting working days, meaning business days that also exclude public holidays and other special non-working dates, is harder than counting business days. No built-in function knows which dates are holidays, so this requires a separate calendar (holiday) table to look them up. The approach is useful for more complex use cases, but it adds complexity to the query.
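One possible shape for such a query is sketched below. The holidays table, its holiday_date column, and the single date in it are hypothetical stand-ins defined inline for illustration; in practice the table would be maintained separately. The sketch expands each range into rows and uses a LEFT JOIN to discard dates that match a holiday.

```sql
WITH dataset(start_date, end_date) AS (
    VALUES (date '2021-10-01', date '2021-10-15')
),
-- hypothetical holiday calendar; replace with a real table
holidays(holiday_date) AS (
    VALUES (date '2021-10-11')
)
SELECT ds.start_date,
       ds.end_date,
       count(*) AS working_days
FROM dataset ds
CROSS JOIN UNNEST(
    sequence(ds.start_date, ds.end_date, interval '1' day)
) AS t(d)
LEFT JOIN holidays h
       ON h.holiday_date = t.d
WHERE day_of_week(t.d) NOT IN (6, 7)  -- drop weekends
  AND h.holiday_date IS NULL          -- drop dates found in the holiday table
GROUP BY ds.start_date, ds.end_date;
```

With this sample data the range contains 11 weekdays, one of which is listed as a holiday, leaving 10 working days. Note that a range with no working days at all produces no output row in this form, rather than a row with a count of 0.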
Conclusion
In this article, we explored how to count business days between two dates in Amazon Athena using SQL queries. We discussed the key building blocks: the day_of_week, sequence, and cardinality functions. By combining them with filter, we can generate every date in a range, keep only the weekdays, and count them.
Common Use Cases
- Counting working days: This approach is more complex, but it’s useful for scenarios where you need to exclude holidays and special events.
- Counting weekend days: If you only want to count weekend days (Saturday and Sunday), change the boolean expression in the filter call from day_of_week(d) not in (6,7) to day_of_week(d) in (6,7), so that those days are kept instead of excluded.
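The weekend variant can be sketched as follows, using one of the sample ranges from earlier with literal dates:

```sql
-- 2021-10-02 to 2021-10-10 contains two Saturdays and two Sundays
SELECT cardinality(filter(
           sequence(date '2021-10-02', date '2021-10-10', interval '1' day),
           d -> day_of_week(d) IN (6, 7)  -- keep Saturdays and Sundays only
       )) AS weekend_days;  -- 4
```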
Best Practices
- Use meaningful variable names for clarity and readability.
- Consider caching intermediate results to avoid redundant computations.
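- Optimize queries by using appropriate data types (for example, storing dates as date values rather than strings) and general query optimization techniques such as columnar formats and partitioning. Athena does not offer traditional database indexes.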
Last modified on 2024-09-14