Filtering Conditionally: Dropping Observations Beyond the First Predation Timestamp in R

Conditional Filtering: Once a Value in Column A Exceeds a Threshold, Drop All Observations From That Timestamp (Column B) Onward

Introduction

When working with time-series data, we often need to discard everything recorded after some threshold is first crossed. Here we’re working with fish telemetry data, and our goal is to filter out the observations for each fish that occur once predation has been detected. We’ll walk through one three-step approach in R, using the tidyverse (dplyr for the data manipulation and tidyr for its fill function) to keep the code short.

Dataset Overview

The sample dataset consists of fish telemetry data: the timestamp of detection (DATE.TIME), a unique ID for the acoustic tag implanted in a fish (FISH.TAG), a unique sensor ID (SENSOR.ID), the recorded temperature or depth value (SENSOR.VALUE), and the type of sensor used (SENSOR). A temperature reading above 30 °C is taken as a sign that the fish has been predated. Our task is to create a filter that, for each fish, removes observations from both the temperature and depth sensors once that threshold has been exceeded.

Step 1: Tagging Predation Values

To begin, we’ll tag the predation values by creating a new column called predated in our dataset. This column is set to "predated" whenever the sensor type is "temp" and the temperature value exceeds 30 °C; otherwise it is NA.

library(tidyverse)

fishdat <- tibble::tribble(
  ~DATE.TIME,        ~FISH.TAG, ~SENSOR.ID, ~SENSOR.VALUE, ~SENSOR,
  "2019-06-18 20:19:41",   1,      65,            9,     "temp",
  "2019-06-18 20:20:51",   1,      65,            37,    "temp",
  "2019-06-18 20:19:22",   1,      66,            1,    "depth",
  "2019-06-18 20:21:16",   1,      66,            0,    "depth",
  "2019-06-18 22:27:40",   2,      21,           35,     "temp",
  "2019-06-18 22:33:57",   2,      21,           38,     "temp",
  "2019-06-18 22:27:10",   2,      22,            0,    "depth",
  "2019-06-19 3:18:17",    2,      22,           13,    "depth"
)

fishdat_marked <- 
  fishdat %>% 
  mutate(predated = ifelse(SENSOR == "temp" & SENSOR.VALUE > 30,
                           "predated",
                           NA_character_))

print(fishdat_marked)
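DATE.TIME is stored as text in the sample data above, and one value has an unpadded hour ("3:18:17"). The short sketch below, which uses two made-up timestamps and assumes UTC as the time zone, shows why it is safer to parse such values before sorting on them: text is compared character by character, which does not always match chronological order.

```r
# Two illustrative timestamps stored as text; the second has an unpadded hour.
stamps <- c("2019-06-19 10:00:00", "2019-06-19 3:18:17")

# Sorting as text compares character by character, so "1..." comes before "3...".
text_order <- sort(stamps)

# Parsing to POSIXct first (the UTC time zone is an assumption) sorts chronologically.
parsed <- as.POSIXct(stamps, tz = "UTC")
time_order <- stamps[order(parsed)]

print(text_order)
print(time_order)
```

In this toy case the text sort puts 10:00 ahead of 3:18, while the parsed sort puts 3:18 first.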

Step 2: Cascading Down Predation Marker

Next, we’ll cascade the predation marker down. Within each fish, we sort the observations chronologically and then fill the marker downward, so that every observation recorded after the first predation reading is also marked as predated, whichever sensor it came from. Observations recorded before that reading keep NA.

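fishdat_filled <- 
  fishdat_marked %>% 
  group_by(FISH.TAG) %>% ## for each fish
  ## parse the text timestamps so the sort is chronological (UTC is an assumption)
  arrange(as.POSIXct(DATE.TIME, tz = "UTC"), .by_group = TRUE) %>% 
  fill(predated, .direction = "down") ## carry the marker to all later rows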

print(fishdat_filled)
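To see what the fill step does in isolation, here is a minimal sketch on a made-up five-row table (the fish and marker columns are illustrative, not part of the telemetry data). It also shows why the grouping matters: the marker is carried down within a fish but does not spill over into the next one.

```r
library(dplyr)
library(tidyr)

# Toy data: fish 1 is marked on its second row, fish 2 is never marked.
toy <- tibble::tibble(
  fish   = c(1, 1, 1, 2, 2),
  marker = c(NA, "predated", NA, NA, NA)
)

toy_filled <- toy %>%
  group_by(fish) %>%                      # fill() respects the grouping
  fill(marker, .direction = "down") %>%   # carry the last non-NA value downward
  ungroup()

print(toy_filled)
```

The row before the marker stays NA, the rows after it within fish 1 are filled, and fish 2 is left untouched.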

Step 3: Filtering Observations

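Finally, we’ll filter out the observations recorded from the first instance of predation onward by keeping only the rows where the predated column is still NA. Note that this also removes the temperature reading that triggered the marker, and that a fish whose first temperature reading already exceeds the threshold keeps only the rows recorded before that reading.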

fishdat_filtered <- 
  fishdat_filled %>% 
  filter(is.na(predated))

print(fishdat_filtered)
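As an alternative sketch (not the approach used in the steps above), the same result can be reached without a text marker by using dplyr's cumany, which turns TRUE at the first TRUE and stays TRUE afterwards. The example uses one made-up fish, and the column names hot, seen and seen_before are hypothetical. It also shows a variant that keeps the triggering reading itself, in case you want the data through the first predation timestamp rather than strictly before it.

```r
library(dplyr)

# One illustrative fish; timestamps are parsed up front (UTC is an assumption).
obs <- tibble::tibble(
  FISH.TAG     = c(1, 1, 1, 1),
  DATE.TIME    = as.POSIXct(c("2019-06-18 20:19:22", "2019-06-18 20:19:41",
                              "2019-06-18 20:20:51", "2019-06-18 20:21:16"),
                            tz = "UTC"),
  SENSOR       = c("depth", "temp", "temp", "depth"),
  SENSOR.VALUE = c(1, 9, 37, 0)
)

flagged <- obs %>%
  group_by(FISH.TAG) %>%
  arrange(DATE.TIME, .by_group = TRUE) %>%
  mutate(
    hot         = SENSOR == "temp" & SENSOR.VALUE > 30,  # threshold exceeded on this row
    seen        = cumany(hot),                           # TRUE from the first hot row onward
    seen_before = cumany(lag(hot, default = FALSE))      # TRUE only after the first hot row
  ) %>%
  ungroup()

before_predation  <- filter(flagged, !seen)         # drops the triggering row as well
through_predation <- filter(flagged, !seen_before)  # keeps the triggering row

print(before_predation)
print(through_predation)
```

Which variant is appropriate depends on whether the reading that reveals predation should count as valid data for the tagged fish.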

Conclusion

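In this article, we walked through a three-step approach to filtering observations once a threshold value has been exceeded in one column: mark the rows that exceed the threshold, carry that marker down through all later rows for the same fish, and keep only the unmarked rows. Using tidyr’s fill function keeps the code short and avoids writing an explicit loop. The final dataset contains, for each fish, only the observations recorded before its first instance of predation.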

Code Used

  • tibble::tribble: used to create a sample dataset
  • tidyverse: leveraged for data manipulation using pipes (%>%)
  • mutate: used to create new columns in the dataset
  • group_by and arrange: used to group observations by fish ID and arrange them chronologically
  • fill: used to cascade down the predation marker
  • filter: used to remove observations after the first instance of predation

Last modified on 2024-10-27