Creating Nested JSON Files from Pandas DataFrames in Python: A Step-by-Step Guide

Creating a Nested JSON File from a Pandas DataFrame in Python

==============================================

In this article, we will explore how to create a nested JSON file from a Pandas DataFrame in Python. We’ll cover the basics of Pandas, JSON, and Python’s string formatting capabilities.

Introduction to Pandas

Pandas is a powerful library for data manipulation and analysis in Python. It provides data structures such as Series (1-dimensional labeled array) and DataFrames (2-dimensional labeled data structure with columns of potentially different types).

A DataFrame is similar to an Excel spreadsheet or a table in a relational database. It consists of rows and columns, where each column represents a variable and each row represents an observation.

Introduction to JSON

JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy to read and write. It is commonly used for exchanging data between web servers and web applications.

A JSON object consists of key-value pairs, where keys are strings and values can be strings, numbers, booleans, arrays, or other JSON objects.

Introduction to Python’s String Formatting Capabilities

Python has a powerful string formatting system that allows us to insert values into strings. We’ll use this feature to create the desired JSON template.

Creating a Nested JSON File from a Pandas DataFrame

To create a nested JSON file from a Pandas DataFrame, we need to:

Create a JSON template with placeholders for the data.
Iterate over each row in the DataFrame and replace the placeholders with the corresponding values.
Convert the resulting string to a Python object using json.loads().
Store the result in a list.

Step-by-Step Example

Here’s an example code snippet that demonstrates how to create a nested JSON file from a Pandas DataFrame:

d = """{
    "Ord" : "%s",
    "MOT" : "%s",
    "MVT" : "%s",
    "CUST" : "%s",
    "milestone" : {
        "creation" : {
            "sla" : "%s",
            "plan" : "%s",
            "proposed" : "%s"
        },
        "Pickup" : {
            "sla" : "%s",
            "plan" : "%s",
            "proposed" : "%s"
        }
    }
}
"""

js = []

for item in df.values:
    js.append(json.loads(d%tuple(item.tolist())))

print(json.dumps(js))

In this example, we define a JSON template d with placeholders for the data. We then iterate over each row in the DataFrame df, replace the placeholders with the corresponding values using string formatting (%s), and convert the resulting string to a Python object using json.loads().

Output

The output of this code snippet is a list of JSON objects, where each object represents a single row in the DataFrame. The nested structure is preserved, as expected:

[{"Ord": "a", "MOT": "TR", "MVT": "TT", "CUST": "DEA", "milestone": {"creation": {"sla": "12-3-2020", "plan": "12-3-2020", "proposed": "12-3-2020"}, "Pickup": {"sla": "14-3-2020", "plan": "14-3-2020", "proposed": "14-3-2020"}}}, {"Ord": "b", "MOT": "ZR", "MVT": "TD", "CUST": "DET", "milestone": {"creation": {"sla": "15-3-2020", "plan": "15-3-2020", "proposed": "15-3-2020"}, "Pickup": {"sla": "16-3-2020", "plan": "16-3-2020", "proposed": "16-3-2020"}}}]

This output can be used as-is, or further processed to suit your needs.

Conclusion

In this article, we explored how to create a nested JSON file from a Pandas DataFrame in Python. We covered the basics of Pandas, JSON, and Python’s string formatting capabilities, and demonstrated a step-by-step example of how to achieve this task. With this knowledge, you should be able to create your own nested JSON files from Pandas DataFrames with ease.

Additional Tips

Make sure to handle missing values in the DataFrame before creating the JSON file.
Use json.dumps() to convert the list of JSON objects to a single string, if needed.
Consider using libraries like pandas_json or pandas_xml for more advanced JSON and XML handling capabilities.

Last modified on 2024-08-26