Refactoring Code for Subset Generation: A Step-by-Step Approach in R

Based on your original code and the provided solution, here is a refactored version that extracts the rows surrounding each run of -180 longitudes:

# subset 20 rows before and 20 rows after each run of -180 longitudes
n <- nrow(df)                      # number of rows; length(df) would count columns
inPlay <- which(df$lon == -180)    # row indices where lon is -180
stopifnot(length(inPlay) > 0)      # nothing to subset without at least one -180

# Sample Size
S <- 20

# a gap other than 1 between successive indices separates two runs
diffPlay <- diff(inPlay)
stop <- inPlay[c(which(diffPlay != 1), length(inPlay))]   # last row of each run
start <- inPlay[c(1, which(diffPlay != 1) + 1)]           # first row of each run

subsetsList <- lapply(seq_along(start), function(i) {
    # clamp the window to the bounds of the data frame
    from <- max(1, start[[i]] - S)
    to <- min(n, stop[[i]] + S)
    
    cat("i is ", i, "\tPlus=", stop[[i]] - start[[i]], "\t(from, to) = (", from, ", ", to, ")\tDIFF=", to - from, "\n")
    
    # subset the rows
    df[from:to, ]
})

# label each subset with the rows of the run it surrounds
names(subsetsList) <- ifelse(stop > start, paste0("Rows", start, "_to_", stop), paste0("Row", start))

# have a look at the results
subsetsList

In this refactored code:

  • We use which on df$lon == -180 to get the row indices where -180s occur. These indices are the anchors for our subsets.
  • diff(inPlay) gives the gap between successive -180 rows. Wherever the gap is not 1, one run of consecutive -180s ends and the next begins; start and stop record where each run begins and ends.
  • Each subset extends from S rows before the start of a run to S rows after its end, so the whole run is included however long it is.
  • max(1, ...) and min(n, ...) clamp the range to the data frame, so a run near the first or last row yields a shorter subset instead of an out-of-range index.
  • We use lapply to apply this subsetting to every run, which returns a list with one data frame per run.
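
To check the logic on something small, the steps above can be sketched as a function and run on toy data. This is only an illustration under assumptions: the helper name subset_around_runs, its value and S arguments, and the toy data frame are all hypothetical and not part of the original code. Passing value = 180 would apply the same windowing to +180 longitudes.

```r
# Hypothetical helper (not from the original code): return a window of
# S rows around each run of consecutive rows where df$lon == value.
subset_around_runs <- function(df, value = -180, S = 20) {
    n <- nrow(df)
    hits <- which(df$lon == value)
    if (length(hits) == 0) return(list())     # no matching rows, nothing to subset

    breaks <- which(diff(hits) != 1)           # positions in hits where a run ends
    run_first <- hits[c(1, breaks + 1)]        # first row of each run
    run_last <- hits[c(breaks, length(hits))]  # last row of each run

    out <- lapply(seq_along(run_first), function(i) {
        from <- max(1, run_first[i] - S)       # clamp to the first row
        to <- min(n, run_last[i] + S)          # clamp to the last row
        df[from:to, , drop = FALSE]
    })
    names(out) <- paste0("Rows", run_first, "_to_", run_last)
    out
}

# Toy data (assumed): 30 rows, with lon == -180 on rows 2-3 and on row 20
toy <- data.frame(id = 1:30, lon = 0)
toy$lon[c(2, 3, 20)] <- -180

res <- subset_around_runs(toy, value = -180, S = 5)
names(res)          # "Rows2_to_3" "Rows20_to_20"
sapply(res, nrow)   # 8 rows for the first run, 11 for the second
```

The first subset is shorter because the run starts at row 2, so the window is clamped at row 1 instead of reaching 5 rows back.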

Last modified on 2023-10-17