Filtering Conditionally, Where If a Value is Exceeded in Column A, Further Observations Beyond the Respective Timestamp (Column B) are Dropped
Introduction
When working with time-series data, it is common to need everything up to the moment a threshold is first crossed and nothing after it. In this case, we’re working with fish telemetry data, and our goal is to filter out observations that occur once a fish has been predated, which shows up as a jump in the temperature reading. We’ll walk through one approach in R, using the tidyverse (dplyr for the data manipulation and tidyr for its fill function) to keep the code short.
Dataset Overview
The dataset consists of fish telemetry detections: the timestamp of each detection (DATE.TIME), the unique ID of the acoustic tag implanted in a fish (FISH.TAG), the unique sensor ID (SENSOR.ID), the recorded temperature or depth value (SENSOR.VALUE), and the type of sensor (SENSOR). Our task is to build a filter that, for each fish, removes observations from both the temperature and the depth sensor once a temperature reading above 30 °C has been recorded.
Step 1: Tagging Predation Values
To begin, we’ll tag the predation values by creating a new column called predated. This column is set to "predated" whenever the sensor type is "temp" and the recorded value exceeds 30 °C; otherwise it is NA.
library(tidyverse)
fishdat <- tibble::tribble(
  ~DATE.TIME, ~FISH.TAG, ~SENSOR.ID, ~SENSOR.VALUE, ~SENSOR,
  "2019-06-18 20:19:41", 1, 65, 9, "temp",
  "2019-06-18 20:20:51", 1, 65, 37, "temp",
  "2019-06-18 20:19:22", 1, 66, 1, "depth",
  "2019-06-18 20:21:16", 1, 66, 0, "depth",
  "2019-06-18 22:27:40", 2, 21, 35, "temp",
  "2019-06-18 22:33:57", 2, 21, 38, "temp",
  "2019-06-18 22:27:10", 2, 22, 0, "depth",
  "2019-06-19 3:18:17", 2, 22, 13, "depth"
)
fishdat_marked <-
  fishdat %>%
  mutate(predated = ifelse(SENSOR == "temp" & SENSOR.VALUE > 30,
                           "predated",
                           NA_character_))
print(fishdat_marked)
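As a quick check on the tagging rule, it can help to look at when each fish first crosses the threshold. The following is a minimal sketch, not part of the original workflow: it uses only the temperature rows of the sample data, and the object names temps and first_predation and the column name FIRST.PREDATION are hypothetical names chosen for illustration.

```r
library(dplyr)
library(tibble)

# Temperature rows from the sample data above
temps <- tribble(
  ~DATE.TIME,            ~FISH.TAG, ~SENSOR.VALUE, ~SENSOR,
  "2019-06-18 20:19:41", 1,          9,            "temp",
  "2019-06-18 20:20:51", 1,         37,            "temp",
  "2019-06-18 22:27:40", 2,         35,            "temp",
  "2019-06-18 22:33:57", 2,         38,            "temp"
)

first_predation <- temps %>%
  # Parse the text timestamps so min() compares real date-times
  mutate(DATE.TIME = as.POSIXct(DATE.TIME,
                                format = "%Y-%m-%d %H:%M:%S",
                                tz = "UTC")) %>%
  # Keep only the readings that satisfy the tagging rule
  filter(SENSOR == "temp", SENSOR.VALUE > 30) %>%
  group_by(FISH.TAG) %>%
  # Earliest over-threshold reading per fish (hypothetical column name)
  summarise(FIRST.PREDATION = min(DATE.TIME), .groups = "drop")

print(first_predation)
```

Each fish that ever exceeds 30 °C gets one row here; a fish that never does would simply be absent from the summary.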
Step 2: Cascading Down Predation Marker
Next, we’ll cascade the predation marker down within each fish. After grouping by fish and sorting each fish’s detections chronologically, fill copies "predated" into every later row of that fish, regardless of sensor type. The first over-threshold reading and everything after it are therefore flagged, while earlier rows stay NA.
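fishdat_filled <-
  fishdat_marked %>%
  group_by(FISH.TAG) %>% ## for each fish
  ## sort by the parsed timestamp: DATE.TIME is stored as text, and text
  ## sorting misorders unpadded hours such as "3:18:17"
  arrange(as.POSIXct(DATE.TIME, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
          .by_group = TRUE) %>%
  fill(predated, .direction = "down")
print(fishdat_filled)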
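To see what the downward fill does in isolation, here is a small sketch on made-up data. The toy tibble and its column names are hypothetical and exist only for illustration.

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: a single marker in the middle of a column of NAs
toy <- tibble(
  step   = 1:5,
  marker = c(NA, NA, "predated", NA, NA)
)

# fill() copies the last non-missing value into the NAs that follow it;
# NAs that come before the first non-missing value are left untouched
toy_filled <- fill(toy, marker, .direction = "down")

print(toy_filled)
```

Rows 1 and 2 keep their NA, while rows 3 to 5 all carry the marker. This is why the data must be sorted chronologically within each fish before filling.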
Step 3: Filtering Observations
Finally, we’ll drop the flagged observations by keeping only the rows where predated is still NA. Note that this also removes the first over-threshold temperature reading itself, so the result contains only the detections recorded before predation.
fishdat_filtered <-
  fishdat_filled %>%
  filter(is.na(predated))
print(fishdat_filtered)
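The three steps can also be collapsed into a single grouped filter. The sketch below is an alternative to the fill-based approach rather than the method described above: it uses cumsum to count how many over-threshold readings have been seen so far for each fish, and keeps rows only while that count is zero. It runs on fish 1 from the sample data, and the object names fishdat_small and before_predation are hypothetical.

```r
library(dplyr)
library(tibble)

# Fish 1 from the sample data above
fishdat_small <- tribble(
  ~DATE.TIME,            ~FISH.TAG, ~SENSOR.VALUE, ~SENSOR,
  "2019-06-18 20:19:41", 1,          9,            "temp",
  "2019-06-18 20:20:51", 1,         37,            "temp",
  "2019-06-18 20:19:22", 1,          1,            "depth",
  "2019-06-18 20:21:16", 1,          0,            "depth"
)

before_predation <- fishdat_small %>%
  # Parse the text timestamps so sorting is truly chronological
  mutate(DATE.TIME = as.POSIXct(DATE.TIME,
                                format = "%Y-%m-%d %H:%M:%S",
                                tz = "UTC")) %>%
  group_by(FISH.TAG) %>%
  arrange(DATE.TIME, .by_group = TRUE) %>%
  # Running count of over-threshold temperature readings per fish;
  # zero means no predation has been observed yet
  filter(cumsum(SENSOR == "temp" & SENSOR.VALUE > 30) == 0) %>%
  ungroup()

print(before_predation)
```

For fish 1 this keeps the depth reading at 20:19:22 and the 9 °C reading at 20:19:41, matching the fill-based result. If the triggering detection itself should be kept, one option would be to lag the condition by one row before accumulating it.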
Conclusion
In this article, we walked through one way to drop observations once a threshold has been exceeded in another column. Tagging the over-threshold readings, filling the tag down within each fish with the fill function from tidyr, and filtering on the result keeps the code short and readable. The final dataset contains, for each fish, only the observations recorded before its first instance of predation.
Code Used
tibble::tribble: used to create a sample dataset
tidyverse: leveraged for data manipulation using pipes (%>%)
mutate: used to create new columns in the dataset
group_by and arrange: used to group observations by fish ID and arrange them chronologically
fill: used to cascade down the predation marker
filter: used to remove observations after the first instance of predation
Last modified on 2024-10-27