BigQuery Dynamic JSON attributes as columnar data
In this article, we will explore how to extract dynamic JSON attributes from a table in Google BigQuery. We will discuss the challenges of working with nested JSON objects and present a solution using dynamic JSON path extraction.
Problem Statement
Suppose you have a table with one of the columns containing JSON data. The goal is to extract additional columns from this JSON data, without knowing the key names in advance. This requires a flexible approach that can dynamically generate the necessary queries.
The provided Stack Overflow post presents an approach using temporary functions to achieve this goal.
Background
BigQuery supports various data types, including JSON and nested arrays. When working with JSON data, we often need to extract specific attributes or values without knowing their structure in advance.
To accomplish this task, we can use BigQuery’s JSON_EXTRACT
function, which extracts a value from an object using a JSON path expression. However, when the key names are not known in advance, this approach becomes challenging.
Solution Overview
The solution presented in the Stack Overflow post uses two temporary functions to achieve dynamic JSON attribute extraction:
json_keys
Function: This function takes an input string and returns an array of keys present in the JSON object.json_path
Function: This function takes a JSON string, a JSON path expression, and returns the value extracted from the JSON object using the specified path.
json_keys
Function
The json_keys
function uses JavaScript to parse the input JSON string and extract an array of keys. The implementation is as follows:
create temp function json_keys(input string) returns array<string> language js as """
return Object.keys(JSON.parse(input));
"""
This function works by:
- Parsing the input JSON string into a JavaScript object using
JSON.parse
. - Using the
Object.keys
method to extract an array of keys from the parsed object. - Returning this array of keys.
json_path
Function
The json_path
function takes a JSON string and a JSON path expression as inputs. It uses JavaScript to parse the input JSON string, apply the specified path expression, and return the extracted value.
The implementation is as follows:
create temp function json_path(json string, json_path string)
returns string
language js as """
try { var parsed = JSON.parse(json);
return JSON.stringify(jsonPath(parsed, json_path));
} catch (e) { return null }
"""
This function works by:
- Parsing the input JSON string into a JavaScript object using
JSON.parse
. - Using a recursive approach to apply the specified JSON path expression.
- If the expression is valid, returning the extracted value as a string. Otherwise, returning
null
.
Query Implementation
The query implementation involves two main steps:
- Unnesting and Unnaming: The first part of the query uses
UNNEST
to expand the JSON array into separate rows. Then, it usesUNNAMES
to rename the columns. - Dynamic JSON Path Extraction: The second part of the query applies the dynamic JSON path extraction using the
json_path
function.
The implementation is as follows:
select t.* except(col_c), key, trim(json_path(col_c, '$.' || key), '"[]') value
from your_table t,
unnest(json_keys(col_c)) key
This query works by:
- Iterating over the array of keys extracted from the
col_c
column usingUNNEST
. - Applying the dynamic JSON path extraction using the
json_path
function. - Returning the original row data (
t.* except(col_c)
) and the extracted value.
Example Use Case
Suppose we have a table with the following structure:
+---------+--------+-----------------------+
| Col_A | Col_B | Col_C |
+---------+--------+-----------------------+
| 1 | Abc1 | '{"a": "a_val", "b":...|
| 2 | Abc2 | '{"a": "a_val2", "c"...|
| 3 | Abc3 | '{"b": "b_val3", "c":"...|
+---------+--------+-----------------------+
The query implementation can extract the additional columns A
, B
, and C
from the JSON data using dynamic path extraction.
Conclusion
In this article, we explored the challenge of extracting dynamic JSON attributes from a table in Google BigQuery. We presented a solution using temporary functions to achieve flexible JSON path extraction. This approach enables you to dynamically generate queries based on real keys present in your JSON data.
While this solution assumes knowledge of key names in advance, it can be extended to accommodate cases where the key names are not known.
Last modified on 2023-11-09