Extracting Dynamic JSON Attributes from BigQuery with Temporary Functions

BigQuery Dynamic JSON attributes as columnar data

In this article, we will explore how to extract dynamic JSON attributes from a table in Google BigQuery. We will discuss the challenges of working with nested JSON objects and present a solution using dynamic JSON path extraction.

Problem Statement

Suppose you have a table with one of the columns containing JSON data. The goal is to extract additional columns from this JSON data, without knowing the key names in advance. This requires a flexible approach that can dynamically generate the necessary queries.

The provided Stack Overflow post presents an approach using temporary functions to achieve this goal.

Background

BigQuery supports various data types, including JSON and nested arrays. When working with JSON data, we often need to extract specific attributes or values without knowing their structure in advance.

To accomplish this task, we can use BigQuery’s JSON_EXTRACT function, which extracts a value from an object using a JSON path expression. However, when the key names are not known in advance, this approach becomes challenging.

Solution Overview

The solution presented in the Stack Overflow post uses two temporary functions to achieve dynamic JSON attribute extraction:

  1. json_keys Function: This function takes an input string and returns an array of keys present in the JSON object.
  2. json_path Function: This function takes a JSON string, a JSON path expression, and returns the value extracted from the JSON object using the specified path.

json_keys Function

The json_keys function uses JavaScript to parse the input JSON string and extract an array of keys. The implementation is as follows:

create temp function json_keys(input string) returns array<string> language js as """
  return Object.keys(JSON.parse(input));
"""

This function works by:

  • Parsing the input JSON string into a JavaScript object using JSON.parse.
  • Using the Object.keys method to extract an array of keys from the parsed object.
  • Returning this array of keys.

json_path Function

The json_path function takes a JSON string and a JSON path expression as inputs. It uses JavaScript to parse the input JSON string, apply the specified path expression, and return the extracted value.

The implementation is as follows:

create temp function json_path(json string, json_path string)
returns string
language js as """
    try { var parsed = JSON.parse(json);
        return JSON.stringify(jsonPath(parsed, json_path));
    } catch (e) { return null }
"""

This function works by:

  • Parsing the input JSON string into a JavaScript object using JSON.parse.
  • Using a recursive approach to apply the specified JSON path expression.
  • If the expression is valid, returning the extracted value as a string. Otherwise, returning null.

Query Implementation

The query implementation involves two main steps:

  1. Unnesting and Unnaming: The first part of the query uses UNNEST to expand the JSON array into separate rows. Then, it uses UNNAMES to rename the columns.
  2. Dynamic JSON Path Extraction: The second part of the query applies the dynamic JSON path extraction using the json_path function.

The implementation is as follows:

select t.* except(col_c), key, trim(json_path(col_c, '$.' || key), '"[]') value
from your_table t,
unnest(json_keys(col_c)) key 

This query works by:

  • Iterating over the array of keys extracted from the col_c column using UNNEST.
  • Applying the dynamic JSON path extraction using the json_path function.
  • Returning the original row data (t.* except(col_c)) and the extracted value.

Example Use Case

Suppose we have a table with the following structure:

+---------+--------+-----------------------+
| Col_A  | Col_B | Col_C                 |
+---------+--------+-----------------------+
|    1   | Abc1  | '{"a": "a_val", "b":...|
|    2   | Abc2  | '{"a": "a_val2", "c"...|
|    3   | Abc3  | '{"b": "b_val3", "c":"...|
+---------+--------+-----------------------+

The query implementation can extract the additional columns A, B, and C from the JSON data using dynamic path extraction.

Conclusion

In this article, we explored the challenge of extracting dynamic JSON attributes from a table in Google BigQuery. We presented a solution using temporary functions to achieve flexible JSON path extraction. This approach enables you to dynamically generate queries based on real keys present in your JSON data.

While this solution assumes knowledge of key names in advance, it can be extended to accommodate cases where the key names are not known.


Last modified on 2023-11-09