Translating API JSON to DataFrame
Overview of the Problem
The problem presented is how to translate an API’s JSON response into a pandas DataFrame, specifically dealing with nested data structures. The API in question has a complex JSON structure that contains various lists and dictionaries.
Background Information
To tackle this issue, it’s crucial to understand the basics of JSON, pandas DataFrames, and the json_normalize function from pandas. JSON (JavaScript Object Notation) is a lightweight data interchange format that’s widely used for transferring data between systems or applications. Pandas DataFrames are a data structure designed to efficiently store and manipulate tabular data in Python. The json_normalize function flattens nested dictionaries into the columns of a DataFrame and, given the right parameters, expands nested lists into rows.
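As a minimal sketch of that flattening behaviour, the snippet below runs json_normalize on two small records. The records and their field names are made up for illustration and are not taken from the API response.

```python
import pandas as pd

# Hypothetical records: "price" is a nested dictionary
records = [
    {"id": 1, "price": {"amount": 100, "currency": "ZAR"}},
    {"id": 2, "price": {"amount": 250, "currency": "ZAR"}},
]

# Nested dictionary keys become dotted column names
flat = pd.json_normalize(records)
print(flat.columns.tolist())  # ['id', 'price.amount', 'price.currency']
```

Each nested key ends up as its own column, named with the parent key, a dot, and the child key.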
API JSON Response
The provided API JSON response contains the following main elements:
- total_results: An integer that represents the total number of results returned.
- page_size and page_number: Variables indicating the current page size and page number, respectively.
- offers: A list containing dictionaries representing individual offers.
Each offer dictionary has a variety of fields, including but not limited to:
- tsin_id, offer_id, sku, barcode, and so forth, which are typically used as identifiers or labels.
- Some fields contain nested structures: leadtime_stock, stock_at_takealot, and stock_on_way are lists that consist of dictionaries themselves.
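To make the shape concrete, here is an illustrative payload with the same outline. All values are invented, and the keys inside the nested lists (merchant_warehouse, warehouse, quantity_available) are assumptions made for this sketch rather than a statement of what the API returns.

```python
# Illustrative payload only: all values are invented, and the keys inside
# the nested lists are assumptions made for this sketch.
sample_response = {
    "total_results": 2,
    "page_size": 100,
    "page_number": 1,
    "offers": [
        {
            "tsin_id": 111, "offer_id": 1001, "sku": "SKU-A", "barcode": "6000000000011",
            "leadtime_stock": [{"merchant_warehouse": "WH-1", "quantity_available": 5}],
            "stock_at_takealot": [{"warehouse": "CPT", "quantity_available": 3}],
            "stock_on_way": [],
        },
        {
            "tsin_id": 222, "offer_id": 1002, "sku": "SKU-B", "barcode": "6000000000028",
            "leadtime_stock": [],
            "stock_at_takealot": [{"warehouse": "JHB", "quantity_available": 7}],
            "stock_on_way": [],
        },
    ],
}

offers = sample_response["offers"]
print(len(offers))  # 2

# Reaching a nested value takes two extra hops: list index, then key
first_entry = offers[0]["leadtime_stock"][0]
print(first_entry["quantity_available"])  # 5
```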
Current Implementation
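The provided code attempts to convert the JSON response into a pandas DataFrame. However, json_normalize without extra arguments only flattens nested dictionaries; fields that hold lists of dictionaries, such as leadtime_stock, stay as Python lists inside single cells.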
import pandas as pd
from pandas import json_normalize
import requests as rq
from datetime import datetime

# API information
url = "https://seller-api.takealot.com/v2"
endpoint = "/offers"
api_key = "Key xyz"
header = {
    'Authorization': api_key
}
full_url = url + endpoint
response = rq.get(full_url, headers=header, timeout=30)
response.raise_for_status()  # stop early on an HTTP error

# convert to dataframe
info = response.json()  # parse the JSON body into Python objects
df = json_normalize(info["offers"])
print(datetime.now().strftime('%H:%M:%S'))
Solution Overview
To address the issue of translating nested fields into a uniform DataFrame format, we will leverage the json_normalize function and adjust its parameters to accommodate the nested structures encountered in the API response.
Solution Steps
Step 1: Identify Nested Fields
First, identify which fields contain nested data. In this case, leadtime_stock, stock_at_takealot, and stock_on_way each hold a list of dictionaries.
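When the response has many fields, the nested ones can also be found programmatically. The helper below is a rough sketch: find_list_columns is a hypothetical name, and the sample offers (including the keys inside the lists) are invented for illustration.

```python
import pandas as pd

def find_list_columns(df):
    """Return the names of columns in which at least one cell holds a list."""
    return [
        col for col in df.columns
        if df[col].map(lambda value: isinstance(value, list)).any()
    ]

# Hypothetical offers; the inner keys are assumptions
offers = [
    {"offer_id": 1001, "sku": "SKU-A",
     "leadtime_stock": [{"merchant_warehouse": "WH-1", "quantity_available": 5}],
     "stock_at_takealot": [{"warehouse": "CPT", "quantity_available": 3}]},
    {"offer_id": 1002, "sku": "SKU-B",
     "leadtime_stock": [],
     "stock_at_takealot": []},
]

df = pd.json_normalize(offers)
print(find_list_columns(df))  # ['leadtime_stock', 'stock_at_takealot']
```

The columns it reports are the ones that still need record_path to be expanded into rows.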
Step 2: Adjust json_normalize Parameters
To flatten these nested structures into a DataFrame, we need to set the record_path and meta parameters of json_normalize. The record_path parameter is the key (or list of keys forming a path) that leads to the list of records inside each element of the input data; every item in that list becomes one row. The meta parameter lists the fields of the parent element that should be repeated as columns on each of those rows.
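A small sketch of how the two parameters interact, using invented offers whose inner keys are assumptions:

```python
import pandas as pd

# Hypothetical offers; the keys inside leadtime_stock are assumptions
offers = [
    {"offer_id": 1001, "sku": "SKU-A",
     "leadtime_stock": [{"merchant_warehouse": "WH-1", "quantity_available": 5},
                        {"merchant_warehouse": "WH-2", "quantity_available": 2}]},
    {"offer_id": 1002, "sku": "SKU-B", "leadtime_stock": []},
]

# One row per dictionary in leadtime_stock; offer_id and sku are repeated on each row
rows = pd.json_normalize(offers, record_path="leadtime_stock", meta=["offer_id", "sku"])
print(rows.columns.tolist())
# ['merchant_warehouse', 'quantity_available', 'offer_id', 'sku']
print(len(rows))  # 2
```

Note that the second offer has an empty leadtime_stock list and therefore contributes no rows at all, which is worth keeping in mind when counting offers afterwards.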
Step 3: Use json_normalize with Correct Parameters
pd.json_normalize(info['offers'], record_path='leadtime_stock',
                  meta=['tsin_id', 'offer_id', 'sku', 'barcode',
                        'product_label_number', 'selling_price', 'rrp', 'leadtime_days'])
Note that each dictionary inside leadtime_stock becomes a separate row in the DataFrame, with its own keys (such as merchant_warehouse and quantity_available) as columns, and the offer-level fields listed in meta are repeated on every row.
However, this covers only one nested list, so it does not yet match the desired output format, in which the other nested structures appear alongside it.
Step 4: Adjust for Desired Output Format
To get the desired output format, where tsin_id to stock_cover_days are all column headers with corresponding values, we need to ensure that the columns align properly. json_normalize accepts only one record_path per call, so each nested list is normalized in its own call, with a common identifier such as offer_id kept in meta, and the results are then joined on that identifier.
For instance, to line up leadtime_stock and stock_at_takealot side by side:
leadtime = pd.json_normalize(info['offers'], record_path='leadtime_stock',
                             meta=['tsin_id', 'offer_id', 'sku'], record_prefix='leadtime_stock.')
at_takealot = pd.json_normalize(info['offers'], record_path='stock_at_takealot',
                                meta=['offer_id'], record_prefix='stock_at_takealot.')
# join the two tables on the shared offer identifier
df = leadtime.merge(at_takealot, on='offer_id', how='outer')
Each call turns the dictionaries of one nested list into rows, and record_prefix keeps the column names of the two lists apart. The meta parameter carries offer_id into both tables so that merge can match them. An offer with several entries in both lists yields one row per combination, and stock_on_way can be handled the same way.
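The following is a runnable sketch of normalizing two nested lists separately and joining them on a shared identifier. The helper name flatten_list_field, the sample offers, and the keys inside the nested lists are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical offers; the keys inside the nested lists are assumptions
offers = [
    {"tsin_id": 111, "offer_id": 1001, "sku": "SKU-A",
     "leadtime_stock": [{"merchant_warehouse": "WH-1", "quantity_available": 5}],
     "stock_at_takealot": [{"warehouse": "CPT", "quantity_available": 3},
                           {"warehouse": "JHB", "quantity_available": 7}]},
    {"tsin_id": 222, "offer_id": 1002, "sku": "SKU-B",
     "leadtime_stock": [{"merchant_warehouse": "WH-1", "quantity_available": 1}],
     "stock_at_takealot": []},
]

def flatten_list_field(offers, field, meta):
    """Expand one nested list into rows, prefixing its columns with the field name."""
    return pd.json_normalize(
        offers, record_path=field, meta=meta, record_prefix=f"{field}."
    )

leadtime = flatten_list_field(offers, "leadtime_stock", ["tsin_id", "offer_id", "sku"])
at_takealot = flatten_list_field(offers, "stock_at_takealot", ["offer_id"])

# An outer join keeps offers that appear in only one of the two tables
combined = leadtime.merge(at_takealot, on="offer_id", how="outer")
print(combined)
```

With this sample, offer 1001 appears twice (once per stock_at_takealot entry) and offer 1002 appears once with missing values in the stock_at_takealot columns. If that row multiplication is unwanted, keeping the tables separate may be the simpler design.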
Step 5: Final Adjustments
After normalizing, you might need to adjust column names or data types if necessary for your final analysis or visualization tasks.
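As a sketch of what such adjustments might look like, the snippet below starts from a hand-built frame that stands in for normalized output. The column names and values are hypothetical. Columns produced through meta typically come back with object dtype, which is why a numeric conversion is shown.

```python
import pandas as pd

# Hypothetical stand-in for the output of an earlier json_normalize call
df = pd.DataFrame({
    "leadtime_stock.quantity_available": [5, 2],
    "offer_id": pd.Series([1001, 1001], dtype="object"),
    "selling_price": pd.Series(["199", "199"], dtype="object"),
})

# Replace dots so the column names are easier to type and export
df.columns = df.columns.str.replace(".", "_", regex=False)

# Convert columns that should be numeric; unparseable values become NaN
for col in ["offer_id", "selling_price"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes)
```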
Conclusion
Translating JSON into a pandas DataFrame can be challenging when dealing with nested structures. By carefully choosing the parameters of json_normalize, it’s possible to transform complex API responses into DataFrames that are easily manipulable and understandable.
Last modified on 2024-03-20