Reshaping a DataFrame with Added Attributes
Pandas DataFrames are powerful data structures in Python used to store and manipulate tabular data. They provide an efficient way to perform various operations, such as filtering, sorting, grouping, and merging, on datasets.
In this article, we will explore how to reshape a DataFrame by adding new attributes. We will use the pandas_datareader library to fetch stock closing prices from Yahoo Finance, which we will then manipulate using Pandas functions.
Background
The pandas_datareader library retrieves financial and economic data from various sources, including Yahoo Finance and Quandl. The DataReader function takes the ticker symbols to fetch (a single symbol or a list), the name of the data source, and the start and end dates of the range we want.
DataReader returns the fetched data as a Pandas DataFrame, which we assign with stocks = data.DataReader(tickers,'yahoo',start_date,end_date). We then select the closing prices with the ['Close'] key.
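To make the starting shape concrete without a network call, here is a minimal offline sketch. It assumes the multi-ticker result has two column levels, which we name Attributes and Symbols here. The Close values are rounded from this article's output, and the Volume values are made-up placeholders, so treat the frame as a hypothetical stand-in for the real DataReader result.

```python
import pandas as pd

# Hypothetical stand-in for the DataReader result (not real fetched data).
# Assumption: columns have two levels, the attribute and the ticker symbol.
columns = pd.MultiIndex.from_product(
    [["Close", "Volume"], ["AAPL", "AMZN"]], names=["Attributes", "Symbols"]
)
stocks = pd.DataFrame(
    [
        [240.91, 1907.70, 100, 200],  # Volume numbers are placeholders
        [244.93, 1918.83, 110, 210],
    ],
    index=pd.DatetimeIndex(["2020-04-01", "2020-04-02"], name="Date"),
    columns=columns,
)

# Selecting the top-level key drops that level and leaves one column per symbol.
stocks_close = stocks["Close"]
print(stocks_close)
```

The selection is still wide: dates in the index, one column per symbol.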
The Problem
The goal is a long-format DataFrame with one row per date and symbol:
Date | Symbols | Close |
---|---|---|
2020-04-01 | AAPL | 240.91 |
2020-04-01 | AMZN | 1907.70 |
… |
The issue is that stocks['Close'] comes back in wide format: the dates form the index and each symbol has its own column, which is not suitable for our needs.
Solution
To reshape the DataFrame, we can use the stack function, which moves column labels into the row index. Its argument is the column level to stack (in this case 0, the only level, which holds the symbols). Because the columns have a single level, the result is a Series with a two-level MultiIndex: the first level is the original index (Date), and the second level is the symbols.
We can then reset the index using the reset_index function to flatten the multi-index into separate columns. Finally, we can rename the columns to match our desired output format.
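The two steps can be tried on a small hand-built frame. This is a minimal sketch, assuming a wide table with a Date index and a column axis named Symbols. The values are rounded from this article's output, and the variable names are illustrative.

```python
import pandas as pd

# Small wide table: dates in the index, one column per symbol.
close = pd.DataFrame(
    {"AAPL": [240.91, 244.93], "AMZN": [1907.70, 1918.83]},
    index=pd.DatetimeIndex(["2020-04-01", "2020-04-02"], name="Date"),
)
close.columns.name = "Symbols"

# stack moves the column labels into the index: a Series indexed by (Date, Symbols).
stacked = close.stack(0)

# reset_index turns both index levels into ordinary columns;
# the unnamed values end up in a column labelled 0.
flat = stacked.reset_index()
print(flat)
```

Each date now appears once per symbol, which is the long layout we are after.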
Code
Here is the code that performs these steps:
from pandas_datareader import data
# Define the tickers and date range
tickers = ['FB', 'AAPL', 'AMZN', 'RSP']
start_date, end_date = "2020-04-01", "2020-04-10"
# Fetch the data from Yahoo Finance
stocks = data.DataReader(tickers,'yahoo',start_date,end_date)
# Access the closing prices
stocks_close = stocks['Close']
# Stack the symbol columns into the index
s1 = stocks_close.stack(0) # Series with a (Date, Symbols) MultiIndex
# Reset the index to flatten the multi-index into separate columns
s2 = s1.reset_index() # default integer index; dates repeat once per symbol
print(s2)
Output
The output will be:
Date Symbols 0
0 2020-04-01 AAPL 240.910004
1 2020-04-01 AMZN 1907.699951
2 2020-04-01 FB 159.600006
3 2020-04-01 RSP 79.879997
4 2020-04-02 AAPL 244.929993
5 2020-04-02 AMZN 1918.829956
6 2020-04-02 FB 158.190002
7 2020-04-02 RSP 81.139999
8 2020-04-03 AAPL 241.410004
9 2020-04-03 AMZN 1906.589966
10 2020-04-03 FB 154.179993
11 2020-04-03 RSP 79.830002
12 2020-04-06 AAPL 262.470001
13 2020-04-06 AMZN 1997.589966
14 2020-04-06 FB 165.550003
15 2020-04-06 RSP 85.870003
16 2020-04-07 AAPL 259.429993
17 2020-04-07 AMZN 2011.599976
18 2020-04-07 FB 168.830002
19 2020-04-07 RSP 86.559998
20 2020-04-08 AAPL 266.070007
21 2020-04-08 AMZN 2043.000000
22 2020-04-08 FB 174.279999
23 2020-04-08 RSP 90.139999
24 2020-04-09 AAPL 267.989990
25 2020-04-09 AMZN 2042.760010
26 2020-04-09 FB 175.190002
27 2020-04-09 RSP 92.220001
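As a cross-check, the same long layout can also be produced with melt instead of stack. This is an alternative technique, not the one used above, sketched on a small hand-built frame with values rounded from the output. The explicit sort is only there to reproduce the date-then-symbol ordering.

```python
import pandas as pd

close = pd.DataFrame(
    {"AAPL": [240.91, 244.93], "AMZN": [1907.70, 1918.83]},
    index=pd.DatetimeIndex(["2020-04-01", "2020-04-02"], name="Date"),
)
close.columns.name = "Symbols"

# melt needs Date as a regular column, so move it out of the index first.
long_df = (
    close.reset_index()
    .melt(id_vars="Date", var_name="Symbols", value_name="Close")
    .sort_values(["Date", "Symbols"])
    .reset_index(drop=True)
)
print(long_df)
```

One practical difference is that melt names the value column directly, so no later rename of the 0 column is needed.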
Renaming the Columns
To rename the columns, we can assign a list of new names to the columns attribute. The list must contain one name per existing column, in order:
s2.columns = ['Dates', 'Symbols', 'Close']
print(s2)
This will output:
Dates Symbols Close
0 2020-04-01 AAPL 240.910004
1 2020-04-01 AMZN 1907.699951
2 2020-04-01 FB 159.600006
3 2020-04-01 RSP 79.879997
4 2020-04-02 AAPL 244.929993
5 2020-04-02 AMZN 1918.829956
6 2020-04-02 FB 158.190002
7 2020-04-02 RSP 81.139999
8 2020-04-03 AAPL 241.410004
9 2020-04-03 AMZN 1906.589966
10 2020-04-03 FB 154.179993
11 2020-04-03 RSP 79.830002
12 2020-04-06 AAPL 262.470001
13 2020-04-06 AMZN 1997.589966
14 2020-04-06 FB 165.550003
15 2020-04-06 RSP 85.870003
16 2020-04-07 AAPL 259.429993
17 2020-04-07 AMZN 2011.599976
18 2020-04-07 FB 168.830002
19 2020-04-07 RSP 86.559998
20 2020-04-08 AAPL 266.070007
21 2020-04-08 AMZN 2043.000000
22 2020-04-08 FB 174.279999
23 2020-04-08 RSP 90.139999
24 2020-04-09 AAPL 267.989990
25 2020-04-09 AMZN 2042.760010
26 2020-04-09 FB 175.190002
27 2020-04-09 RSP 92.220001
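Assigning a full list works, but it silently mislabels columns if their order ever changes. Two alternatives that name columns explicitly are sketched below on a small hand-built frame. The variable names are illustrative, and the values are rounded from the output above.

```python
import pandas as pd

close = pd.DataFrame(
    {"AAPL": [240.91, 244.93], "AMZN": [1907.70, 1918.83]},
    index=pd.DatetimeIndex(["2020-04-01", "2020-04-02"], name="Date"),
)
close.columns.name = "Symbols"
stacked = close.stack(0)

# Option 1: name the value column while resetting the index.
by_name = stacked.reset_index(name="Close")

# Option 2: rename selected columns with an old-to-new mapping.
renamed = stacked.reset_index().rename(columns={"Date": "Dates", 0: "Close"})

print(by_name.columns.tolist())
print(renamed.columns.tolist())
```

The mapping form touches only the columns it mentions, so it keeps working if more columns are added later.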
Conclusion
In this article, we explored how to reshape a DataFrame by adding new attributes using Pandas functions. We used the stack function to turn the wide table of closing prices into a Series with a (Date, Symbols) multi-index and then reset the index to flatten the multi-index into separate columns. Finally, we renamed the columns to match our desired output format.
Last modified on 2025-03-04