Creating a JSON List from Multiple Table Rows in BigQuery Using Array Aggregation and Struct

Table of Contents

  1. Introduction
  2. Understanding the Problem
  3. BigQuery SQL: A Solution for Converting Tables to JSON Lists
  4. Example Walkthrough
  5. Error Handling: What Happens When the Data Doesn’t Fit?
  6. Conclusion

Introduction

BigQuery, Google’s data warehousing platform, offers a powerful way to store and process large datasets. Extracting data in a specific shape can still be challenging, especially when several rows have to be collapsed into a single nested value.

In this article, we’ll explore how to create a JSON list from multiple table rows using BigQuery SQL. We’ll group rows by order number, build one record per row with the STRUCT constructor, collect those records with the ARRAY_AGG aggregate function, and serialize the result with TO_JSON_STRING.

Understanding the Problem

Consider an ecommerce dataset stored in BigQuery, in which each row represents one item in an order. The goal is to create a JSON list for every order, containing product information (e.g., product_id and quantity) in the following format:

[ {
  "product_id": "ABC",
  "quantity": 1
}, {
  "product_id": "DEF",
  "quantity": 2
} ]

This requires selecting the relevant columns from each row, grouping the rows by order number, and then serializing each group as a JSON list.

BigQuery SQL: A Solution for Converting Tables to JSON Lists

Grouping Rows by Order Number

The first step in creating a JSON list is to group the rows by order number with the GROUP BY clause. The query below groups the rows and serializes each group in a single statement:

SELECT
  order_number,
  TO_JSON_STRING(ARRAY_AGG(STRUCT(product AS product_id, quantity))) AS json_value
FROM orders
GROUP BY order_number;

Using Array Aggregation and Struct

The heart of this query is the combination of ARRAY_AGG, STRUCT, and TO_JSON_STRING. Here’s how it works:

  • STRUCT(product AS product_id, quantity) builds one value per input row: a record with two named fields, product_id and quantity. STRUCT is a constructor, not an aggregate function, and it does not create a table.
  • ARRAY_AGG(...) collects those records into an array, producing one array per order_number group. Each element in the array represents a single row from the original data.
  • TO_JSON_STRING(...) serializes the array as a JSON-formatted string, in which each record becomes a JSON object.
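To make the role of STRUCT concrete, here is a minimal sketch that stops before the aggregation step. The sample rows are inlined in a CTE (an assumption made so the query runs without an existing orders table), and the column names item and item_json are illustrative only.

```sql
-- Sample rows are inlined so the query runs without an existing table.
WITH orders AS (
  SELECT '001' AS order_number, 'ABC' AS product, 1 AS quantity UNION ALL
  SELECT '001', 'DEF', 2 UNION ALL
  SELECT '002', 'GHI', 3
)
SELECT
  order_number,
  -- One STRUCT per input row; nothing has been aggregated yet.
  STRUCT(product AS product_id, quantity) AS item,
  -- Serializing a single STRUCT yields a single JSON object.
  TO_JSON_STRING(STRUCT(product AS product_id, quantity)) AS item_json
FROM orders;
```

The query returns three rows, one per input row. Wrapping the STRUCT in ARRAY_AGG and adding GROUP BY order_number is what turns these per-row records into one array per order.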

However, if the table may contain duplicate rows, or you want to inspect the individual items before aggregating, it can be more convenient to first group by order number, product, and quantity. This yields one row per distinct item:

SELECT
  order_number,
  product AS product_id,
  quantity
FROM orders
GROUP BY order_number, product, quantity;

You can then use this result as a subquery, group it again by order_number, and convert each group into a JSON array.
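One way to finish this two-step approach is sketched here. The inline sample data, including the deliberately duplicated row, and the CTE name items are assumptions for illustration, not part of the original dataset.

```sql
WITH orders AS (
  SELECT '001' AS order_number, 'ABC' AS product, 1 AS quantity UNION ALL
  SELECT '001', 'ABC', 1 UNION ALL   -- exact duplicate of the row above
  SELECT '001', 'DEF', 2 UNION ALL
  SELECT '002', 'GHI', 3
),
items AS (
  -- Step 1: one row per distinct (order, product, quantity) combination.
  SELECT order_number, product AS product_id, quantity
  FROM orders
  GROUP BY order_number, product, quantity
)
-- Step 2: group again, this time per order, and serialize each group.
SELECT
  order_number,
  TO_JSON_STRING(
    ARRAY_AGG(STRUCT(product_id, quantity) ORDER BY product_id)
  ) AS json_value
FROM items
GROUP BY order_number
ORDER BY order_number;
```

Because the first step removes the duplicate, order 001 ends up with two elements in its array rather than three.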

Example Walkthrough

Let’s examine how this query works using an example dataset. Suppose we have the following data:

Order Number   Product   Quantity
001            ABC       1
001            DEF       2
002            GHI       3

If we run the query:

SELECT
  order_number,
  TO_JSON_STRING(ARRAY_AGG(STRUCT(product AS product_id, quantity))) AS json_value
FROM orders
GROUP BY order_number;

The result would be:

Order Number   json_value
001            [{"product_id":"ABC","quantity":1},{"product_id":"DEF","quantity":2}]
002            [{"product_id":"GHI","quantity":3}]
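The walkthrough can be reproduced without creating a table by inlining the sample rows, as in the sketch below. It also adds an ORDER BY inside ARRAY_AGG, which is an addition to the query above: without it, BigQuery does not guarantee the order of elements within each array. Storing order_number as a string is an assumption made here to preserve the leading zeros.

```sql
WITH orders AS (
  SELECT '001' AS order_number, 'ABC' AS product, 1 AS quantity UNION ALL
  SELECT '001', 'DEF', 2 UNION ALL
  SELECT '002', 'GHI', 3
)
SELECT
  order_number,
  TO_JSON_STRING(
    -- ORDER BY inside the aggregate fixes the element order per array.
    ARRAY_AGG(STRUCT(product AS product_id, quantity) ORDER BY product)
  ) AS json_value
FROM orders
GROUP BY order_number
ORDER BY order_number;
```

TO_JSON_STRING also accepts an optional second argument, pretty_print; passing true produces indented, multi-line output closer to the format shown at the start of this article.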

Error Handling: What Happens When the Data Doesn’t Fit?

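One potential issue with this approach is an order that doesn’t have any products. If the order has no rows in the orders table at all, it never forms a group and is simply absent from the result. If it does have a row whose product is NULL (for example, because the rows were produced by a LEFT JOIN from a table of order headers), STRUCT still builds a non-NULL record whose fields are NULL, so the array contains a placeholder element instead of being empty.

For example, if we had:

Order Number   Product   Quantity
001            NULL      NULL
002            GHI       3

The query from the walkthrough would return [{"product_id":null,"quantity":null}] for order 001, even though it doesn’t have any products.

To avoid this issue, we can turn rows without a product into NULL, drop them with IGNORE NULLS, and return '[]' explicitly when nothing is left:

SELECT
  order_number,
  -- COUNT skips NULLs, so 0 means the order has no products.
  IF(COUNT(product) = 0, '[]',
     TO_JSON_STRING(ARRAY_AGG(
       CASE WHEN product IS NOT NULL THEN STRUCT(product AS product_id, quantity) ELSE NULL END
       IGNORE NULLS))) AS json_value
FROM orders
GROUP BY order_number;

This version of the query returns an empty array for an order without products. The IGNORE NULLS modifier matters: without it, the NULL produced by the CASE expression would itself become an element of the array.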
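To see how an order without products can reach the aggregation in the first place, here is a sketch that joins item rows onto a hypothetical order_headers table holding one row per order. That table, its contents, and the item_count column are assumptions for illustration.

```sql
WITH order_headers AS (        -- hypothetical table: one row per order
  SELECT '001' AS order_number UNION ALL
  SELECT '002'
),
orders AS (                    -- item rows; order 001 has none
  SELECT '002' AS order_number, 'GHI' AS product, 3 AS quantity
)
SELECT
  h.order_number,
  -- COUNT skips NULLs, so this is 0 for an order with no items.
  COUNT(o.product) AS item_count,
  TO_JSON_STRING(
    ARRAY_AGG(STRUCT(o.product AS product_id, o.quantity AS quantity))
  ) AS json_value
FROM order_headers AS h
LEFT JOIN orders AS o
  ON o.order_number = h.order_number
GROUP BY h.order_number
ORDER BY h.order_number;
```

The LEFT JOIN keeps order 001 as a single row whose item columns are all NULL, so its json_value contains one object with null fields while item_count is 0. Checking a NULL-skipping count like this is one way to detect the case before serializing.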

Conclusion

In this article, we explored how to create a JSON list from multiple table rows using BigQuery SQL. We grouped rows by order number, built one record per row with STRUCT, collected the records with ARRAY_AGG, and serialized each array with TO_JSON_STRING. We also looked at how rows with a NULL product affect the output.


Last modified on 2025-03-12