Joining Three Tables in PostgreSQL: A Step-by-Step Guide to Returning Nested JSON Data

Joining Three Tables in a PostgreSQL Function: Returning Nested JSON Data

As the number of tables and relationships between them increases, querying data from multiple tables can become increasingly complex. In this article, we will explore how to create a PostgreSQL function that joins three tables and returns an array of nested JSON data.

Understanding the Problem

In the provided Stack Overflow question, we have three tables: projects, outputs, and components. Each table has a one-to-many relationship with another table. The goal is to join these three tables in a way that allows us to return an array of results that contain nested JSON data.

Table Definitions

For better understanding, let’s define the tables and their relationships:

-- Create the projects table
CREATE TABLE projects (
    id UUID PRIMARY KEY,
    name TEXT NOT NULL
);

-- Create the outputs table with a foreign key referencing projects
CREATE TABLE outputs (
    id UUID PRIMARY KEY,
    name TEXT NOT NULL,
    project_id UUID REFERENCES projects(id)
);

-- Create the components table with a foreign key referencing outputs
CREATE TABLE components (
    id UUID PRIMARY KEY,
    name TEXT NOT NULL,
    output_id UUID REFERENCES outputs(id)
);

Current SQL Query

The original SQL query joins the first two tables and returns an array of results that look like this:

[
  {
    "id": "d47d0a2d-fe9a-4867-b39c-d328627358fa",
    "name": "Test project",
    "outputs": [
      {
        "id": "19c81778-f327-4cc4-bae4-2fc4879ed938",
        "name": "Test output"
      }
    ]
  }
]

The Challenge: Joining Three Tables

To achieve the desired result, we need to join the components table with the outputs and projects tables. The resulting query should look like this:

[
  {
    "id": "d47d0a2d-fe9a-4867-b39c-d328627358fa",
    "name": "Test project",
    "outputs": [
      {
        "id": "19c81778-f327-4cc4-bae4-2fc4879ed938",
        "name": "Test output",
        "components": [
          {
            "id": "80f48b3b-53aa-458c-8c90-50afac077eac",
            "name": "Test component"
          }
        ]
      }
    ]
  }
]

The Solution

To join the three tables, we can extend our current SQL query using a combination of LEFT JOIN and json_agg. Here’s an example of how to achieve this:

SELECT
  json_agg(project_structure) AS json_object
FROM
  (
    SELECT
      json_build_object(
        'id', p.id,
        'name', p.name,
        'outputs', json_agg(output_structure)
      ) AS project_structure
    FROM
      projects p
      LEFT JOIN (
        SELECT
          o.project_id,
          json_build_object(
            'id', o.id,
            'name', o.name,
            'components', json_agg(c.cj)
          ) AS output_structure
        FROM
          outputs o
          LEFT JOIN (
            SELECT
              output_id,
              json_build_object('id', id, 'name', name) AS cj
            FROM
              components
          ) c ON c.output_id = o.id
        GROUP BY
          o.id
      ) o ON o.project_id = p.id
    GROUP BY
      p.id
  ) project_structure;

How it Works

Let’s break down the solution step by step:

  1. We start with a subquery that joins the outputs table with the components table using a LEFT JOIN. This allows us to include components in our results.
  2. Inside this subquery, we use another json_agg function to group the components data together for each output record.
  3. The outer query then joins the projects table with the subquery’s result set using a LEFT JOIN. This allows us to include project data in our final results.
  4. We use another json_agg function to group the project data together for each output and component.

Example Use Case

To demonstrate how this query can be used, let’s assume we have the following data:

-- Insert data into projects table
INSERT INTO projects (id, name) VALUES 
('d47d0a2d-fe9a-4867-b39c-d328627358fa', 'Test Project');

-- Insert data into outputs table
INSERT INTO outputs (id, project_id, name) VALUES 
(19c81778-f327-4cc4-bae4-2fc4879ed938, 'd47d0a2d-fe9a-4867-b39c-d328627358fa', 'Test Output'),
(20c81778-f327-4cc4-bae4-2fc4879ed939, 'd47d0a2d-fe9a-4867-b39c-d328627358fa', 'Another Test Output');

-- Insert data into components table
INSERT INTO components (id, output_id, name) VALUES 
(80f48b3b-53aa-458c-8c90-50afac077eac, 19c81778-f327-4cc4-bae4-2fc4879ed938, 'Test Component'),
(20f48b3b-53aa-458c-8c90-50afac077eac, 20c81778-f327-4cc4-bae4-2fc4879ed939, 'Another Test Component');

Running this query will produce the following result:

[
  {
    "id": "d47d0a2d-fe9a-4867-b39c-d328627358fa",
    "name": "Test Project",
    "outputs": [
      {
        "id": "19c81778-f327-4cc4-bae4-2fc4879ed938",
        "name": "Test Output",
        "components": [
          {
            "id": "80f48b3b-53aa-458c-8c90-50afac077eac",
            "name": "Test Component"
          }
        ]
      },
      {
        "id": "20c81778-f327-4cc4-bae4-2fc4879ed939",
        "name": "Another Test Output",
        "components": [
          {
            "id": "20f48b3b-53aa-458c-8c90-50afac077eac",
            "name": "Another Test Component"
          }
        ]
      }
    ]
  }
]

This result shows the project, output, and component data in a nested format.


Last modified on 2024-09-04