Creating a Structured Data Frame from Multiple Arrays and Lists
In this article, we will explore how to create a structured data frame using multiple arrays and lists in Python. We’ll use the pandas library to achieve this.
Introduction
When working with large datasets, it’s common to have multiple arrays or lists that need to be combined into a single structure. This gets harder when the inputs mix data types and shapes, for example lists of labels alongside two-dimensional NumPy arrays of values. The approach below first builds every combination of labels and then attaches the array values as columns.
Sample Data
To illustrate this concept, let’s consider an example where we have three lists and two arrays:
- distributors: a list of distributor names
- products: a list of product names
- tips: a list of tip categories, i.e. product types (e.g., “fruit”, “vegetables”, etc.)
- actual_prix and prix_prox_year: two arrays that hold the actual and predicted prices, with one row per product and one column per distributor
import numpy as np

distributors = ['d1', 'd2', 'd3', 'd4', 'd5']
products = ['apple', 'carrot', 'potato', 'avocado', 'pumkie', 'banana',
            'kiwi', 'lettuce', 'tomato', 'pees', 'pear', 'berries', 'strawberries',
            'blueberries', 'boxes']
tips = ['fruit', 'vegetables', 'random']

# Both arrays have shape (15, 5): one row per product, one column per distributor.
actual_prix = np.arange(15*5).reshape(15,5)
prix_prox_year = np.random.rand(15,5)
Creating the Data Frame
To create a structured data frame from these arrays and lists, we can use the product function from the itertools library. This function generates the Cartesian product of the input iterables: every possible combination that takes one element from each, with the last iterable varying fastest.
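To make the ordering concrete, here is a minimal sketch on two short, made-up lists (letters and numbers are illustrative names, not part of the sample data):

```python
from itertools import product

# Hypothetical miniature inputs, chosen only to keep the output short.
letters = ['a', 'b']
numbers = [1, 2, 3]

# product() behaves like nested for-loops: the last iterable varies fastest.
combos = list(product(letters, numbers))
print(combos)
# [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3)]
```

Applied to the three lists from the sample data, the same call yields 15 × 3 × 5 = 225 tuples, and the data frame code relies on exactly this ordering.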
from itertools import product
import pandas as pd

df = (pd.DataFrame([*product(products, tips, distributors)],
                   columns=['Products', 'Type', 'Distributor'])
      .assign(Actual=np.tile(actual_prix, len(tips)).ravel(),
              Next_year=np.tile(prix_prox_year, len(tips)).ravel()))
Here’s a breakdown of what happens in the code:
- We import the necessary libraries: pandas for data manipulation and numpy for numerical operations.
- We define the input lists and arrays.
- We use the product function to generate all possible combinations of product names, tip categories, and distributor names.
- We create a pandas DataFrame from these combinations using the pd.DataFrame() constructor.
- We assign column names to the resulting DataFrame.
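As an aside, pandas can build the same label combinations on its own. The sketch below is an alternative to the article's itertools approach, not a replacement for it, and uses shortened, hypothetical versions of the lists to keep the result small. It relies on pd.MultiIndex.from_product and MultiIndex.to_frame:

```python
import pandas as pd

# Hypothetical miniature inputs standing in for the article's lists.
products = ['apple', 'carrot']
tips = ['fruit', 'vegetables']
distributors = ['d1', 'd2', 'd3']

# from_product builds the Cartesian product in the same order as
# itertools.product: the last list varies fastest.
index = pd.MultiIndex.from_product(
    [products, tips, distributors],
    names=['Products', 'Type', 'Distributor'],
)

# Turn the index levels into ordinary columns with a default integer index.
keys = index.to_frame(index=False)
print(keys.shape)  # (12, 3)
```

Whether this reads better than the list-of-tuples version is a matter of taste; it mainly saves the explicit columns argument.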
Assigning Additional Columns
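To complete our data frame, we need two more columns: Actual and Next_year. These hold the actual and predicted prices for each product and distributor, and the .assign() call in the code above adds them. Each price array has shape (15, 5), so np.tile(array, len(tips)) repeats every row three times side by side, giving shape (15, 15), and .ravel() flattens the result row by row into 225 values whose order matches the rows produced by product. The full expression is repeated here: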
df = (pd.DataFrame([*product(products, tips, distributors)],
                   columns=['Products', 'Type', 'Distributor'])
      .assign(Actual=np.tile(actual_prix, len(tips)).ravel(),
              Next_year=np.tile(prix_prox_year, len(tips)).ravel()))
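The effect of np.tile followed by ravel is easier to see on a tiny array. The following is a minimal sketch with a made-up 2 × 3 price grid (prices and n_types are illustrative names, with n_types standing in for len(tips)); it also checks the result against an equivalent np.repeat formulation:

```python
import numpy as np

# Hypothetical grid: 2 products (rows) x 3 distributors (columns).
prices = np.array([[10, 11, 12],
                   [20, 21, 22]])
n_types = 2  # stands in for len(tips)

# Tiling with a scalar repeats along the last axis, so each product's
# row of prices appears n_types times side by side: shape (2, 6).
tiled = np.tile(prices, n_types)

# Row-major flattening gives product 0 for every type, then product 1, ...
flat = tiled.ravel()
print(flat.tolist())
# [10, 11, 12, 10, 11, 12, 20, 21, 22, 20, 21, 22]

# Equivalent formulation: insert a "type" axis and repeat along it.
alt = np.repeat(prices[:, None, :], n_types, axis=1).ravel()
print(np.array_equal(flat, alt))  # True
```

The flattened order (product, then type, then distributor) is the same order in which product(products, tips, distributors) emits its tuples, which is why the values line up with the label columns.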
Printing the Data Frame
Finally, we can print the resulting data frame to verify its contents.
print(df)
The output will be a structured data frame with 225 rows (15 products × 3 types × 5 distributors) and five columns: Products, Type, Distributor, Actual, and Next_year.
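Beyond printing, it can be worth checking that every price ended up next to the right labels. The sketch below rebuilds the article's data frame and compares each row against the source arrays; the lookup dictionaries product_pos and distributor_pos are helper names introduced here for illustration:

```python
from itertools import product

import numpy as np
import pandas as pd

distributors = ['d1', 'd2', 'd3', 'd4', 'd5']
products = ['apple', 'carrot', 'potato', 'avocado', 'pumkie', 'banana',
            'kiwi', 'lettuce', 'tomato', 'pees', 'pear', 'berries',
            'strawberries', 'blueberries', 'boxes']
tips = ['fruit', 'vegetables', 'random']
actual_prix = np.arange(15 * 5).reshape(15, 5)
prix_prox_year = np.random.rand(15, 5)

df = (pd.DataFrame([*product(products, tips, distributors)],
                   columns=['Products', 'Type', 'Distributor'])
      .assign(Actual=np.tile(actual_prix, len(tips)).ravel(),
              Next_year=np.tile(prix_prox_year, len(tips)).ravel()))

# One row per (product, type, distributor) combination.
print(df.shape)  # (225, 5)

# Map each label back to its row/column position in the price arrays.
product_pos = {name: i for i, name in enumerate(products)}
distributor_pos = {name: j for j, name in enumerate(distributors)}
rows = df['Products'].map(product_pos).to_numpy()
cols = df['Distributor'].map(distributor_pos).to_numpy()

# Every row's prices should equal the array entry for its labels.
assert (df['Actual'].to_numpy() == actual_prix[rows, cols]).all()
assert (df['Next_year'].to_numpy() == prix_prox_year[rows, cols]).all()
```

If the arrays were laid out differently, for instance with distributors as rows, these assertions would fail and flag the misalignment.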
Example Output
Here’s an example of what the first rows of the output might look like (the Next_year values are random, so they will differ from run to run):
Products | Type | Distributor | Actual | Next_year |
---|---|---|---|---|
apple | fruit | d1 | 0 | 0.391903 |
apple | fruit | d2 | 1 | 0.378865 |
apple | fruit | d3 | 2 | 0.056134 |
apple | fruit | d4 | 3 | 0.623146 |
apple | fruit | d5 | 4 | 0.879184 |
… (and so on for all combinations of products, tips, and distributors)
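If repeatable Next_year values are wanted, one option is a seeded NumPy generator in place of np.random.rand. This is a sketch under that assumption; the seed 0 is arbitrary, and the numbers it produces will not match the table above, which came from an unseeded run:

```python
import numpy as np

# Seed 0 is an arbitrary choice made for this sketch.
rng = np.random.default_rng(0)
prix_prox_year = rng.random((15, 5))  # same shape as the article's array

# Re-creating the generator with the same seed reproduces the same values.
again = np.random.default_rng(0).random((15, 5))
print(np.array_equal(prix_prox_year, again))  # True
```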
Conclusion
In this article, we demonstrated how to create a structured data frame from multiple arrays and lists in Python. We used itertools.product to build every combination of labels, pandas to hold those combinations as rows, and np.tile with ravel to line the price arrays up with those rows. The same pattern can be adapted whenever values stored in a grid need to be expanded into one row per combination of labels.
Last modified on 2024-05-02