Modifying the keySearch() Function to Handle NAs in R and O*NET Database Search

Understanding the Issue with Modifying a Keyword Search Function to Handle NAs

In this blog post, we’ll delve into the technical details of modifying a keyword search function so that it either skips NA (missing) values or reports them in its output when a row does not contain a job title.

The problem arises from the fact that the original keySearch() function returns an error when it encounters a row with missing data. To address this issue, we’ll need to modify the function to handle these cases correctly.

Background and Context

To understand the context of this problem, let’s first explore the ONETr package and its functions. The ONETr package provides access to the Occupational Information Network (O*NET) database, which contains information about various occupations in the United States.

The keySearch() function searches the O*NET database for a keyword or occupational title. It requests an XML response from the O*NET web service and converts it into a data frame containing each matching job title and its SOC (Standard Occupational Classification) code.

In the original onet.sum function, we convert the search result to a list with as.list() and then extract the first “title” and “code” values by indexing (obj1[["title"]][1] and obj1[["code"]][1], respectively). We then combine these two values into a one-row data frame using cbind().
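To make the indexing concrete, here is a small sketch that uses a made-up data frame in place of a real search result. The codes and titles are placeholders chosen for illustration, not actual O*NET output.

```r
# Hypothetical stand-in for a search result (placeholder values, not O*NET data)
res <- data.frame(
  code = c("00-0000.01", "00-0000.02"),
  title = c("Example Title A", "Example Title B"),
  stringsAsFactors = FALSE
)

obj1 <- as.list(res)              # named list with one element per column
job.title <- obj1[["title"]][1]   # first (best-matching) title
soc.code <- obj1[["code"]][1]     # SOC code of that title

# cbind() builds a 1 x 2 character matrix; as.data.frame() turns it into one row
obj4 <- as.data.frame(cbind(job.title, soc.code))
print(obj4)
```

Only the first element of each column is kept, so any further matches in the search result are discarded.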

However, when we include NA values in the input vector x, the keySearch() function returns an error. This is because the function expects a valid keyword or occupational title as input.
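Two base R behaviours plausibly contribute to the failure. The sketch below illustrates them in isolation and is not a trace of the package internals: paste() turns NA into the literal text "NA" when the request URL is built, and a later is.na() check on an empty (NULL) result gives if() a zero-length condition. The URL prefix is the one used in the function shown later in this post.

```r
base_url <- "https://services.onetcenter.org/ws/mnm/search?keyword="

# paste() silently converts a missing value into the text "NA"
url <- paste(base_url, NA, sep = "")
print(url)

# is.na(NULL) has length zero, and if() rejects a zero-length condition,
# so the error handler runs here
outcome <- tryCatch(
  if (is.na(NULL)) "missing" else "present",
  error = function(e) "error"
)
print(outcome)
```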

Modifying the keySearch() Function to Handle NAs

To handle this issue, we need a version of keySearch() that skips NA inputs instead of sending them to the API. Here’s an updated version, defined under the name jobSearch():

### Updated keySearch() Function
library(RCurl) # getURL()
library(XML)   # xmlParse(), getNodeSet(), xmlToDataFrame()

jobSearch <- function(x, type = "keyword") {
  # Skip the API call when the input is missing
  if (is.na(x)) {
    return(NULL)
  }

  # cacheEnv is the environment the original function reads credentials from;
  # it must be visible from wherever jobSearch() is defined
  creds <- get("creds", envir = cacheEnv)
  output <- getURL(paste("https://services.onetcenter.org/ws/mnm/search?keyword=",
      x, sep = ""), userpwd = paste(creds[[1]], ":", creds[[2]], sep = ""),
      httpauth = 1L)

  if (grepl("Authorization Error", output)) {
    message("Your API credentials are invalid. Please enter valid HTTPS credentials using setCreds().")
    return(NULL)
  } else if (grepl("total=\"0\"", output)) {
    message("Your keyword returned no results. Please try another keyword or occupational title.")
    return(NULL)
  }

  output <- xmlParse(output)
  keyOutput <- xmlToDataFrame(nodes = getNodeSet(output, "//career"),
      stringsAsFactors = FALSE)

  # Guard against a response that parsed to zero rows
  if (nrow(keyOutput) == 0) {
    message("No results found for keyword: ", x)
    return(NULL)
  }

  # Replace missing SOC codes with the string "NA"
  keyOutput$code[is.na(keyOutput$code)] <- "NA"
  keyOutput[, 1:2]
}

In this updated version, the function returns NULL as soon as it sees an NA input, so no request is made for a missing job title. The branches for invalid credentials and empty results also return NULL explicitly, which gives the caller a single value to check for.

We also check that the keyOutput data frame is not empty before returning its first two columns, and any missing SOC codes in it are replaced with the string “NA” using is.na().
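An alternative to editing the search function itself is to wrap it. The sketch below shows that different technique: safe_search() is a hypothetical helper, and stub_search() is a stand-in that mimics a search function failing on NA, so the example runs without API credentials. The returned values are placeholders.

```r
# Stand-in for a search function that errors on missing input (hypothetical)
stub_search <- function(x) {
  if (is.na(x)) stop("keyword is missing")
  data.frame(code = "00-0000.00", title = paste("Result for", x),
             stringsAsFactors = FALSE)
}

# Hypothetical wrapper: skip NA inputs and turn any error into NULL
safe_search <- function(x, search_fun) {
  if (length(x) != 1 || is.na(x)) {
    return(NULL)
  }
  tryCatch(search_fun(x), error = function(e) {
    message("Search failed for '", x, "': ", conditionMessage(e))
    NULL
  })
}

results <- lapply(c("psychologist", NA), safe_search, search_fun = stub_search)
str(results)
```

The wrapper leaves the search function untouched, which may be preferable when the function comes from a package you do not want to fork.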

Updated onet.sum Function

With the updated keySearch() function, we can modify the original onet.sum function to handle NA values correctly. Here’s the updated version:

### Updated onet.sum Function
onet.sum <- function(x) {
  # Skip the search for NA input; jobSearch() may also return NULL on failure
  res <- if (is.na(x)) NULL else jobSearch(x)

  # No usable result: return a placeholder row so every input keeps a row
  if (is.null(res)) {
    return(data.frame(job.title = "N/A", soc.code = NA_character_,
                      stringsAsFactors = FALSE))
  }

  job.title <- res[["title"]][1] # pull best-matching title
  soc.code <- res[["code"]][1] # pull best-matching title's SOC code

  data.frame(job.title, soc.code, stringsAsFactors = FALSE)
}

# Now test it
library(ONETr)
library(tidyverse)

setCreds("user", "pass")

final_data <- lapply(c("psychologist", "social worker", NA), onet.sum) %>% bind_rows()

In this updated version, onet.sum skips the search when x is NA and checks whether jobSearch() returned NULL. In either case it returns a placeholder row with “N/A” as the job title and NA as the SOC code.

Because every input produces exactly one row, bind_rows() yields a data frame with as many rows as the input vector.
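To check the NA handling without credentials, the same pattern can be exercised against a stub. In this sketch fake_search() and summarise_one() are hypothetical names, the returned values are placeholders, and do.call(rbind, ...) is used in place of bind_rows() so that only base R is needed.

```r
# Stub that returns NULL for NA input, as a search function might (hypothetical)
fake_search <- function(x) {
  if (is.na(x)) return(NULL)
  data.frame(code = "00-0000.00", title = paste("Title for", x),
             stringsAsFactors = FALSE)
}

# One output row per input, with a placeholder row when nothing was found
summarise_one <- function(x) {
  res <- fake_search(x)
  if (is.null(res)) {
    return(data.frame(job.title = "N/A", soc.code = NA_character_,
                      stringsAsFactors = FALSE))
  }
  data.frame(job.title = res$title[1], soc.code = res$code[1],
             stringsAsFactors = FALSE)
}

inputs <- c("psychologist", "social worker", NA)
final_data <- do.call(rbind, lapply(inputs, summarise_one))
print(final_data)
```

Keeping a placeholder row for the NA input means the result can be joined back to the original data by row position.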

Conclusion

By modifying the keySearch() function to handle NA values, we can ensure that onet.sum returns a result for every input, even when some rows have no job title. Missing inputs no longer stop the search over the remaining keywords in the O*NET database.

The sample call above, which runs onet.sum over a vector containing an NA, can be used to test the updated functions with your own credentials.


Last modified on 2025-01-10