Change Point Detection using ChangeFinder: A Python Implementation
Change point detection is a statistical technique for identifying points where the behavior of a time series changes significantly. In this blog post, we will explore how to implement change point detection using the ChangeFinder library in Python.
Introduction to ChangeFinder
ChangeFinder is an open-source Python library for online change point detection. It processes a time series one observation at a time and returns a change score for each point. The library implements the ChangeFinder algorithm, which fits sequentially discounting autoregressive (SDAR) models in two stages with a smoothing step in between, so older observations gradually lose influence as new data arrives.
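To give a feel for what a change score is, here is a minimal sketch that scores each value by its negative log-likelihood under a running Gaussian whose mean and variance are updated with a discounting rate. This is a simplified illustration in plain Python, not the model the library fits internally; the function name discounted_scores and the default r=0.05 are assumptions made for this example.

```python
import math

def discounted_scores(values, r=0.05):
    """Score each value by its negative log-likelihood under a running Gaussian.

    The mean and variance are updated with discounting rate r, so older
    observations fade out. Simplified illustration only: this is NOT the
    model ChangeFinder fits internally.
    """
    mean, var = values[0], 1.0
    scores = []
    for x in values:
        # Score first, using statistics learned from earlier values only.
        nll = 0.5 * math.log(2 * math.pi * var) + (x - mean) ** 2 / (2 * var)
        scores.append(nll)
        # Then fold the new value into the discounted mean and variance.
        mean = (1 - r) * mean + r * x
        var = max((1 - r) * var + r * (x - mean) ** 2, 1e-6)
    return scores

series = [0.0] * 30 + [5.0] * 30  # the mean shifts at index 30
scores = discounted_scores(series)
print(scores.index(max(scores)))  # 30
```

The score peaks exactly where the level shifts, because that value is the least likely under the statistics learned so far.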
Assumptions
Before we dive into the implementation, let’s make some assumptions:
- We have a time series dataset y that we want to analyze.
- We assume that the data is normally distributed and has no missing values.
- We are interested in detecting changes in the mean or variance of the data.
Installing ChangeFinder
To use ChangeFinder, you need to install it first. The package is published on PyPI as changefinder, so you can install it with pip:
pip install changefinder
Importing Libraries
Before we start implementing change point detection, let’s import the necessary libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from changefinder import ChangeFinder
Loading Data
Let’s assume that we have a CSV file, templeture_data_.csv, containing our time series data. We can load it using pandas:
df_templeture = pd.read_csv('templeture_data_.csv')
index = pd.date_range(start='2019-11-11 22:00:00', periods=len(df_templeture), freq='min')
y = pd.Series(df_templeture['templeture'].values, index=index)
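If you do not have a CSV file at hand, a synthetic series with a known level shift is a convenient way to try the rest of the steps. The sketch below uses only the standard library; the function name make_shifted_series and all of its parameters are hypothetical and chosen for illustration.

```python
import random

def make_shifted_series(n=200, shift_at=100, shift=3.0, seed=0):
    """Gaussian noise whose mean jumps by `shift` from index `shift_at` onward."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [
        rng.gauss(0.0, 1.0) + (shift if i >= shift_at else 0.0)
        for i in range(n)
    ]

values = make_shifted_series()
first_half_mean = sum(values[:100]) / 100
second_half_mean = sum(values[100:]) / 100
```

The resulting list can be wrapped in a pd.Series with a pd.date_range index in the same way as the CSV column above.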
Preprocessing Data
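Before we apply change point detection, let’s preprocess our data by calculating the first-order difference. Series.diff() leaves a NaN in the first position because that value has no predecessor, so we drop it before scoring:
y_diff = y.diff().dropna()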
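For readers who want to see the arithmetic, the first-order difference can be sketched in plain Python. Note one difference from pandas: Series.diff() keeps the original length and puts NaN in the first slot, whereas this hypothetical helper simply returns one element fewer.

```python
def first_difference(values):
    """Return values[i] - values[i-1] for i >= 1 (one element shorter than the input)."""
    return [b - a for a, b in zip(values, values[1:])]

diffs = first_difference([20.0, 20.5, 20.5, 23.0])
print(diffs)  # [0.5, 0.0, 2.5]
```

Differencing removes a slowly drifting level, so a sudden jump in the original series shows up as a single large value in the differenced one.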
Applying Change Point Detection
Now it’s time to apply change point detection. We create an instance of the ChangeFinder class with three parameters: r is the discounting rate (smaller values forget the past more slowly), order is the order of the autoregressive model (in this case, 1), and smooth is the width of the window used to smooth the scores:
cf = ChangeFinder(r=0.01, order=1, smooth=7)
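We can then feed the data to the update method one observation at a time and collect the scores:
result = np.empty(len(y_diff))
for i, d in enumerate(y_diff):
    # Depending on the installed version, update() may return the score alone or a (score, prediction) tuple
    out = cf.update(d)
    result[i] = out[0] if isinstance(out, tuple) else out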
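The scores are continuous values, so turning them into discrete change points needs a decision rule. One simple option, sketched here, is to flag each index where the score first rises above a threshold. The helper flag_change_points and the threshold value are assumptions for illustration; a sensible threshold has to be chosen by inspecting the scores for your own data.

```python
def flag_change_points(scores, threshold):
    """Return indices where the score rises above `threshold` (rising edges only)."""
    flagged = []
    above = False
    for i, s in enumerate(scores):
        is_above = s > threshold  # NaN compares as False, so NaN scores are never flagged
        if is_above and not above:
            flagged.append(i)
        above = is_above
    return flagged

flags = flag_change_points([0.1, 0.2, 3.5, 4.0, 0.3, 2.9], threshold=2.0)
print(flags)  # [2, 5]
```

Reporting only the rising edge avoids flagging every point of a sustained peak as a separate change.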
Visualizing Results
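Finally, let’s visualize our results by plotting the change score together with the differenced series:
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(result, label="score")
ax.legend(loc="upper left")
ax2 = ax.twinx()
# Plot the raw values so both curves share the same positional x-axis
ax2.plot(y_diff.values, alpha=0.3, label="observation")
ax2.legend(loc="upper right")
plt.show()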
Discussion
The resulting plot shows the change score on the left axis and the differenced observations on the right axis. The score indicates how strongly the data deviates from what the model has learned so far, so peaks mark likely change points.
If the plot shows only a few data points, the most likely cause is a NaN in the input rather than the smooth=7 parameter. Series.diff() leaves a NaN in the first position, and once a NaN enters the model’s running statistics, every later score becomes NaN as well. The first few scores are typically 0.0 while the internal buffers fill up, and matplotlib skips the NaN values that follow, which leaves only those initial points visible. Dropping the NaN before the loop, for example with y.diff().dropna(), avoids this.
Once the scores are valid, the parameters of the ChangeFinder class control how they behave. Note that order must be at least 1. Raising it, for example to 12, lets the autoregressive model look further back, which can help when the series has longer-range structure:
cf = ChangeFinder(r=0.01, order=12, smooth=7)
Whether this improves the result depends on the data, so it is worth comparing the score curves for a few settings.
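When scores vanish from a plot, it is also worth checking whether a NaN reached the model. The sketch below shows why a single NaN is enough: in any recursively updated statistic, here a discounted mean, the NaN is carried into every later value. This is a plain-Python illustration under the assumption that the library updates its statistics recursively, as discounting models do; discounted_mean is a hypothetical helper, not part of changefinder.

```python
import math

def discounted_mean(values, r=0.01):
    """Running mean where each new value gets weight r and the past gets 1 - r."""
    mean = 0.0
    history = []
    for x in values:
        mean = (1 - r) * mean + r * x  # a NaN in x makes mean NaN from here on
        history.append(mean)
    return history

with_nan = discounted_mean([float("nan"), 1.0, 2.0, 3.0])
without_nan = discounted_mean([1.0, 2.0, 3.0])
print(all(math.isnan(m) for m in with_nan))     # True
print(any(math.isnan(m) for m in without_nan))  # False
```

A quick check such as y_diff.isna().any() before the scoring loop catches this early.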
Conclusion
In this blog post, we demonstrated how to implement change point detection using ChangeFinder in Python. We stated our assumptions, installed and imported the necessary libraries, loaded our data, preprocessed it by calculating the first-order difference, computed change scores, visualized the results, and discussed potential issues and solutions.
By following these steps and adjusting the parameters of the ChangeFinder class, you should be able to detect changes in your own time series data.
Last modified on 2025-01-27