Function to Add Weekdays from Date in Databricks using SQL
Introduction
In this article, we’ll explore how to create a generic function in Databricks that adds a number of weekdays to a date. We’ll delve into the challenges of referencing outer query expressions outside of WHERE/HAVING clauses and provide solutions to overcome these limitations.
Main Issue
The main issue here is that Databricks does not support referencing dt_initial
directly in the WHERE clause when it’s not already present in the table being filtered.
CREATE OR REPLACE FUNCTION add_weekdays(dt_initial DATE, num_days INT)
RETURNS DATE
RETURN (
WITH days AS (
SELECT EXPLODE(SEQUENCE(0, num_days * 2)) AS day_offset
),
valid_days AS (
SELECT DATE_ADD(dt_initial, day_offset) AS business_date,
ROW_NUMBER() OVER (ORDER BY day_offset) AS rn
FROM days
WHERE WEEKDAY(DATE_ADD(dt_initial, day_offset)) BETWEEN 0 AND 4
)
SELECT business_date FROM valid_days WHERE rn = num_days + 1 limit 1
);
This approach will not work due to the error UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.CORRELATED_REFERENCE
which indicates that expressions referencing the outer query are not supported outside of WHERE/HAVING clauses.
Solution
One way to get around this limitation is by including an additional common table expression and performing a join so that all necessary columns are present in the table being filtered.
CREATE OR REPLACE FUNCTION ADD_WEEKDAYS(dt_initial DATE, num_days INT)
RETURNS DATE
RETURN (
WITH initials AS (
SELECT dt_initial AS initial_date
),
days AS (
SELECT EXPLODE(SEQUENCE(0, num_days * 2)) AS day_offset
),
valid_days AS (
SELECT DATE_ADD(initial_date, day_offset) AS business_date,
ROW_NUMBER() OVER (ORDER BY day_offset) AS rn
-- perform a join so that all columns are present
FROM initials, days
WHERE WEEKDAY(DATE_ADD(initial_date, day_offset)) BETWEEN 0 AND 4
)
SELECT MAX(business_date) FROM valid_days WHERE rn = num_days + 1
);
Additional Considerations
To make the function more robust, we should also consider the following:
- Add support for adding zero days to handle cases where no weekdays need to be added.
- Include a mechanism to skip weekends when adding weekdays to the date.
- Perform extensive testing with various input scenarios.
Bonus: Python Scalar Function Implementation
Here’s an example of how you can implement this function in Python, which is not limited by the SQL restrictions:
CREATE OR REPLACE FUNCTION ADD_WEEKDAYS(
X DATE COMMENT 'DATE TO ADD N WEEKDAYS',
N INT COMMENT 'INTEGER WEEKDAYS TO ADD TO X'
)
RETURNS DATE
LANGUAGE PYTHON
COMMENT 'ADDS N WEEKDAYS TO A DATE (SKIPS WEEKENDS)'
AS $$
from datetime import timedelta
def add_weekdays(x, n):
"""Adds n weekdays to a date (skips weekends)"""
current_date = x
added_days = 0
while added_days < n:
current_date += timedelta(days=1)
if current_date.weekday() < 5: # Monday is weekday 0 and Sunday is weekday 6
added_days += 1
return current_date
return add_weekdays(X, N) if X and N else None
$$
Conclusion
In this article, we explored how to create a function in Databricks that adds weekdays to a date. We discussed the challenges of referencing outer query expressions outside of WHERE/HAVING clauses and provided solutions using common table expressions with joins. Additionally, we examined how implementing the same functionality in Python can provide more flexibility and robustness.
Last modified on 2024-04-05