Filtering a Datetime Column for Hours Interval in Pandas
When working with datetime data in pandas, you often need to filter rows by time of day, regardless of the date. This article shows how to do that with the pandas library.
Introduction to Datetime Data in Pandas
Before we dive into filtering datetime columns, let’s first look at how datetime data works in pandas. Python’s datetime module provides classes for manipulating dates and times. pandas builds on them with its own Timestamp type and datetime64 columns, which represent specific points in time.
When working with datetime data, it’s essential to understand the different components of a datetime object:
- Year: represents the year
- Month: represents the month (1-12)
- Day: represents the day of the month
- Hour: represents the hour (0-23)
- Minute: represents the minute (0-59)
- Second: represents the second (0-59)
To create a datetime object, use the datetime class from the datetime module:
from datetime import datetime
# Create a datetime object for January 1st, 2022 at 12:00 PM
dt = datetime(2022, 1, 1, 12, 0)
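Each component listed above is available as an attribute of the object. A minimal sketch, reusing the same example value:

```python
from datetime import datetime

# Same example value as above: January 1st, 2022 at 12:00 PM
dt = datetime(2022, 1, 1, 12, 0)

# Date components
print(dt.year, dt.month, dt.day)      # 2022 1 1

# Time components (second defaults to 0 when omitted)
print(dt.hour, dt.minute, dt.second)  # 12 0 0
```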
In pandas, you can convert a column of strings or integers to datetimes with the to_datetime function:
import pandas as pd
# Create a DataFrame with a datetime column
df = pd.DataFrame({'datetimecolumn': ['2022-01-01 12:00', '2022-01-02 13:30']})
# Convert the datetime column to datetime objects
df['datetimecolumn'] = pd.to_datetime(df['datetimecolumn'])
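The integer case works a little differently from the string case, because pandas needs to know what the integers count. The sketch below assumes the integers are Unix epoch seconds (hence unit="s"). The sample values and the "not a date" string are made up for illustration:

```python
import pandas as pd

# Strings are parsed into datetimes
from_strings = pd.to_datetime(pd.Series(["2022-01-01 12:00", "2022-01-02 13:30"]))

# Integers are interpreted relative to the Unix epoch; unit="s" means seconds
from_ints = pd.to_datetime(pd.Series([1640995200, 1641081600]), unit="s")
print(from_ints.iloc[0])  # 2022-01-01 00:00:00

# errors="coerce" turns unparseable values into NaT instead of raising
with_bad = pd.to_datetime(pd.Series(["2022-01-01 12:00", "not a date"]), errors="coerce")
print(with_bad.isna().tolist())  # [False, True]
```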
Converting a Datetime Column to a Time Portion
When filtering by hours, you need only the time-of-day portion of each value. You can get it by subtracting each value’s date, floored to midnight with dt.floor('D'), from the original value:
# Extract the time portion from the datetime column
s = df['datetimecolumn']
m = (s - s.dt.floor('D')).between(pd.Timedelta('12:00:00'), pd.Timedelta('18:00:00'))
Here, dt.floor('D') rounds each value down to midnight of the same date. Subtracting that from the original value removes the year, month, and day, leaving a Timedelta that holds the time elapsed since midnight (hours, minutes, and seconds).
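To make the intermediate values concrete, here is a small sketch that prints each step for one row. The names midnight and time_of_day are introduced only for this illustration:

```python
import pandas as pd

s = pd.to_datetime(pd.Series(["2022-01-01 12:00", "2022-01-02 13:30"]))

# Floor each timestamp to the start of its day
midnight = s.dt.floor("D")
print(midnight.iloc[1])     # 2022-01-02 00:00:00

# The difference is a Timedelta: time elapsed since midnight
time_of_day = s - midnight
print(time_of_day.iloc[1])  # 0 days 13:30:00
```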
The resulting time of day is then compared with the desired interval using the between method, which includes both endpoints by default:
# Create a boolean mask for rows whose time of day is between 12:00 and 18:00
m = (s - s.dt.floor('D')).between(pd.Timedelta('12:00:00'), pd.Timedelta('18:00:00'))
The resulting boolean mask m marks the rows that fall within the desired time interval.
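Whether the endpoints count matters when a row falls exactly on 12:00 or 18:00. The sketch below compares the default behavior with a half-open interval. It assumes pandas 1.3 or later, where the inclusive parameter of between accepts the strings "both", "neither", "left", and "right". The sample times are made up for illustration:

```python
import pandas as pd

s = pd.to_datetime(pd.Series(["2022-01-01 11:59", "2022-01-01 12:00", "2022-01-01 18:00"]))
tod = s - s.dt.floor("D")  # time of day as Timedelta

start, end = pd.Timedelta(hours=12), pd.Timedelta(hours=18)

# Default: both endpoints included, so 12:00 and 18:00 both match
closed = tod.between(start, end)
print(closed.tolist())     # [False, True, True]

# inclusive="left": 12:00 matches, 18:00 does not
half_open = tod.between(start, end, inclusive="left")
print(half_open.tolist())  # [False, True, False]
```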
Filtering Rows Based on the Boolean Mask
Finally, you can use the boolean mask to filter the original DataFrame:
# Filter the DataFrame using the boolean mask
df_filtered = df[m]
This returns a new DataFrame containing only the rows whose time of day falls between 12:00 and 18:00, inclusive.
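As a point of comparison, pandas also offers DataFrame.between_time, which selects rows by time of day when the datetimes are in the index. This is an alternative to the mask approach described in this article, not a replacement for it. A minimal sketch with made-up sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "datetimecolumn": pd.to_datetime(
        ["2022-01-01 09:15", "2022-01-02 13:30", "2022-01-03 18:00"]
    ),
    "valuecolumn": [10, 20, 30],
})

# between_time works on a DatetimeIndex, so move the column into the index first,
# then restore it as a regular column afterwards
by_time = (
    df.set_index("datetimecolumn")
      .between_time("12:00", "18:00")
      .reset_index()
)
print(by_time["valuecolumn"].tolist())  # [20, 30]
```

The mask approach keeps the original index and column layout untouched, which can be more convenient when the datetime column is one of several columns you filter on.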
Example Use Case
Let’s create a sample DataFrame with a datetime column and apply the filter end to end:
import pandas as pd
# Create a sample DataFrame with datetime columns
df = pd.DataFrame({
'datetimecolumn': ['2022-01-01 12:00', '2022-01-02 13:30', '2022-01-03 14:45'],
'valuecolumn': [10, 20, 30]
})
# Convert the datetime column to datetime objects
df['datetimecolumn'] = pd.to_datetime(df['datetimecolumn'])
# Extract the time portion from the datetime column
s = df['datetimecolumn']
m = (s - s.dt.floor('D')).between(pd.Timedelta('12:00:00'), pd.Timedelta('18:00:00'))
# Filter the DataFrame using the boolean mask
df_filtered = df[m]
print(df_filtered)
Output:
       datetimecolumn  valuecolumn
0 2022-01-01 12:00:00           10
1 2022-01-02 13:30:00           20
2 2022-01-03 14:45:00           30
All three sample rows have a time of day between 12:00 and 18:00, so the filter keeps every row.
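To see the filter actually drop rows, the sample needs some times outside the interval. The sketch below repeats the same steps on made-up data that includes an early-morning row and a row just past 18:00:

```python
import pandas as pd

df = pd.DataFrame({
    "datetimecolumn": ["2022-01-01 08:00", "2022-01-02 12:00",
                       "2022-01-03 18:30", "2022-01-04 17:59"],
    "valuecolumn": [1, 2, 3, 4],
})
df["datetimecolumn"] = pd.to_datetime(df["datetimecolumn"])

# Time of day as a Timedelta since midnight
s = df["datetimecolumn"]
m = (s - s.dt.floor("D")).between(pd.Timedelta("12:00:00"), pd.Timedelta("18:00:00"))

# Only the 12:00 and 17:59 rows remain; the original index labels are preserved
df_filtered = df[m]
print(df_filtered["valuecolumn"].tolist())  # [2, 4]
print(df_filtered.index.tolist())           # [1, 3]
```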
Conclusion
Filtering a datetime column by an hours interval is a common task in data analysis. By extracting the time of day from each value as a Timedelta, you can build a boolean mask with between and use it to filter rows with vectorized pandas operations. The same pattern works for any interval within a single day: change the two Timedelta bounds to match the hours you need.
Last modified on 2025-03-22