Translating API JSON to pandas DataFrame: A Step-by-Step Guide


Overview of the Problem

This guide shows how to convert an API’s JSON response into a pandas DataFrame when the response contains nested data structures. The API in question returns a JSON document in which lists and dictionaries are nested inside one another.

Background Information

To tackle this issue, it’s crucial to understand the basics of JSON, pandas DataFrames, and the json_normalize function from pandas. JSON (JavaScript Object Notation) is a lightweight data interchange format that’s widely used for transferring data between systems or applications. A pandas DataFrame is a data structure designed to efficiently store and manipulate tabular data in Python. The json_normalize function turns semi-structured JSON into a flat table: nested dictionaries become columns with dotted names, and a nested list can be expanded into one row per element.
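
As a small illustration of the flattening behavior, the sketch below runs json_normalize on two records invented for this example (the id and price fields are hypothetical and not part of the API discussed here).

```python
import pandas as pd

# Hypothetical records, invented for illustration only.
records = [
    {"id": 1, "price": {"amount": 100, "currency": "ZAR"}},
    {"id": 2, "price": {"amount": 250, "currency": "ZAR"}},
]

# Nested dictionaries become columns with dotted names.
flat = pd.json_normalize(records)
print(flat.columns.tolist())  # ['id', 'price.amount', 'price.currency']
```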

API JSON Response

The provided API JSON response contains the following main elements:

total_results: An integer that represents the total number of results returned.
page_size and page_number: Variables indicating the current page size and page number, respectively.
offers: A list containing dictionaries representing individual offers.

Each offer dictionary has a variety of fields, including but not limited to:

tsin_id, offer_id, sku, barcode, and so forth, which are typically used as identifiers or labels.
Some fields (leadtime_stock, stock_at_takealot, stock_on_way) are nested: each holds a list whose elements are themselves dictionaries.

Current Implementation

The provided code converts the JSON response into a pandas DataFrame. It handles the top-level fields of each offer, but without further parameters json_normalize leaves each nested list as a raw Python list in a single cell.

import pandas as pd
import requests as rq

# API information
url = "https://seller-api.takealot.com/v2"
endpoint = "/offers?"
api_key = "Key xyz"  # placeholder; substitute your own key

header = {"Authorization": api_key}

full_url = url + endpoint

response = rq.get(full_url, headers=header, timeout=30)
response.raise_for_status()  # stop early on an HTTP error

# Parse the JSON body and convert the list of offers to a DataFrame
info = response.json()

df = pd.json_normalize(info["offers"])
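
To see the limitation without calling the API, here is a minimal sketch using a made-up payload shaped like the response described earlier. All values are invented, and the keys inside leadtime_stock are an assumption based on the field names used in this guide.

```python
import pandas as pd

# Hypothetical payload; the real response will contain more fields.
info = {
    "total_results": 2,
    "page_size": 100,
    "page_number": 1,
    "offers": [
        {"tsin_id": 111, "offer_id": 1, "sku": "A-1",
         "leadtime_stock": [{"merchant_warehouse": "W1", "quantity_available": 5}]},
        {"tsin_id": 222, "offer_id": 2, "sku": "B-2", "leadtime_stock": []},
    ],
}

df = pd.json_normalize(info["offers"])

# The leadtime_stock column still holds raw Python lists, one per offer.
print(df["leadtime_stock"].tolist())
```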

Solution Overview

To address the issue of translating nested fields into a uniform DataFrame format, we will leverage the json_normalize function and adjust its parameters to accommodate the nested structures encountered in the API response.

Solution Steps

Step 1: Identify Nested Fields

First, identify which fields contain nested data. In this case, leadtime_stock, stock_at_takealot, and stock_on_way each hold a list of dictionaries.
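
If the field list is not known in advance, the nested fields can be found by inspection. The following is a rough sketch on invented sample offers; it treats a column as nested when any of its cells holds a list.

```python
import pandas as pd

# Hypothetical offers, invented for illustration only.
offers = [
    {"tsin_id": 111, "sku": "A-1",
     "leadtime_stock": [{"quantity_available": 5}], "stock_on_way": []},
    {"tsin_id": 222, "sku": "B-2", "leadtime_stock": [], "stock_on_way": []},
]
df = pd.json_normalize(offers)

# A column counts as nested if any of its cells holds a list.
list_columns = [
    col for col in df.columns
    if df[col].apply(lambda value: isinstance(value, list)).any()
]
print(list_columns)  # ['leadtime_stock', 'stock_on_way']
```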

Step 2: Adjust json_normalize Parameters

To properly flatten these nested structures into a DataFrame, we need to set the record_path and meta parameters of json_normalize. The record_path parameter names the key (or the sequence of keys forming a path) inside each input record that holds the list to expand, so every element of that list becomes a row. The meta parameter lists the parent-level fields to copy onto each of those rows.
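
The interplay of the two parameters is easiest to see on a tiny input. The sketch below uses invented offers, and the keys inside leadtime_stock are assumed from the field names mentioned in this guide.

```python
import pandas as pd

# Hypothetical offers, invented for illustration only.
offers = [
    {"offer_id": 1, "sku": "A-1",
     "leadtime_stock": [
         {"merchant_warehouse": "W1", "quantity_available": 5},
         {"merchant_warehouse": "W2", "quantity_available": 3},
     ]},
    {"offer_id": 2, "sku": "B-2",
     "leadtime_stock": [{"merchant_warehouse": "W1", "quantity_available": 7}]},
]

rows = pd.json_normalize(
    offers,
    record_path="leadtime_stock",  # the list to expand into rows
    meta=["offer_id", "sku"],      # parent fields repeated on each row
)
print(rows)
```

The result has three rows, two for the first offer and one for the second, with the record keys first and the meta fields after them.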

Step 3: Use json_normalize with Correct Parameters

pd.json_normalize(info['offers'], record_path='leadtime_stock',
                  meta=['tsin_id', 'offer_id', 'sku', 'barcode',
                        'product_label_number', 'selling_price', 'rrp', 'leadtime_days'])

Note that record_path points at the nested list (leadtime_stock), so each dictionary in that list becomes its own row, with its keys (presumably including merchant_warehouse and quantity_available) as columns. The scalar offer fields go in meta, which repeats them on every row expanded from the same offer. Passing scalar fields such as tsin_id to record_path does not work, because a list given to record_path is read as a single path of nested keys rather than as a set of columns.

However, this does not yet match the desired output format. It yields one row per leadtime_stock entry rather than one row per offer, it covers only one of the nested lists, and offers whose leadtime_stock list is empty produce no rows at all.

Step 4: Adjust for Desired Output Format

To get the desired output format, where tsin_id to stock_cover_days are all column headers with corresponding values, the columns from the different nested lists need to line up against the same offer. json_normalize accepts only one record_path per call, so each nested list is normalized separately, with the same identifier fields in meta, and the results are then merged on those identifiers.

For instance, to combine leadtime_stock and stock_at_takealot:

id_cols = ['tsin_id', 'offer_id', 'sku']
# One call per nested list; record_prefix keeps same-named keys apart
leadtime = pd.json_normalize(info['offers'], record_path='leadtime_stock',
                             meta=id_cols, record_prefix='leadtime_stock.')
takealot = pd.json_normalize(info['offers'], record_path='stock_at_takealot',
                             meta=id_cols, record_prefix='stock_at_takealot.')
df = leadtime.merge(takealot, on=id_cols, how='outer')

Each json_normalize call expands one nested list into rows, one row per dictionary in the list, and meta copies the identifier fields onto each row so that the two results can be merged. stock_on_way can be handled with a third call in the same way. If an offer has several entries in both lists, the merge produces every pairing of them, so aggregate first if a single row per offer is required.
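
One way to keep every offer in the result, including offers whose nested lists are empty, is to start from the flat per-offer table and left-join each expanded list onto it. The following is a sketch under stated assumptions: the sample offers are invented, expand is a hypothetical helper written for this example, and offer_id is assumed to identify an offer uniquely.

```python
import pandas as pd

# Hypothetical offers, invented for illustration only.
offers = [
    {"offer_id": 1, "sku": "A-1",
     "leadtime_stock": [{"merchant_warehouse": "W1", "quantity_available": 5}],
     "stock_at_takealot": [{"quantity_available": 2}]},
    {"offer_id": 2, "sku": "B-2", "leadtime_stock": [], "stock_at_takealot": []},
]
nested_fields = ["leadtime_stock", "stock_at_takealot"]


def expand(records, list_field, id_field="offer_id"):
    """Expand one nested list into rows, prefixing its columns with the field name."""
    frame = pd.json_normalize(
        records,
        record_path=list_field,
        meta=[id_field],
        record_prefix=f"{list_field}.",
    )
    # Meta columns come back as object dtype; convert so merge keys share a dtype.
    frame[id_field] = pd.to_numeric(frame[id_field])
    return frame


# Scalar columns only, one row per offer.
base = pd.json_normalize(offers).drop(columns=nested_fields)

combined = base
for field in nested_fields:
    combined = combined.merge(expand(offers, field), on="offer_id", how="left")

print(combined)
```

The second offer survives the join with missing values in the expanded columns. An offer with several entries in a list would still occupy several rows.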

Step 5: Final Adjustments

After normalizing, you may need to rename columns or convert data types before analysis or visualization. Columns produced through meta, for example, come back with the generic object dtype even when they hold numbers.
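
A minimal sketch of such a cleanup follows. The frame is built by hand to mimic normalized output, and both the values and the new column name leadtime_quantity are invented for illustration.

```python
import pandas as pd

# Hypothetical normalized output; meta-style columns are given object dtype on purpose.
rows = pd.DataFrame({
    "quantity_available": [5, 3],
    "offer_id": pd.Series([1, 1], dtype=object),
    "selling_price": pd.Series([199, 199], dtype=object),
})

tidy = (
    rows.rename(columns={"quantity_available": "leadtime_quantity"})
        .astype({"offer_id": "int64", "selling_price": "float64"})
)
print(tidy.dtypes)
```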

Conclusion

Translating JSON into a pandas DataFrame can be challenging when dealing with nested structures. By carefully choosing the parameters of json_normalize, it’s possible to transform complex API responses into DataFrames that are easily manipulable and understandable.

Last modified on 2024-03-20