Printing DataFrames in Jupyter Notebook Side by Side
For data scientists, working in Jupyter notebooks is an everyday part of the job. A common requirement is to display multiple dataframes side by side for comparison or analysis. In this article, we’ll explore how to achieve this using Python and the popular pandas library.
Understanding Jupyter Notebook
Before diving into the code, let’s understand what a Jupyter notebook is. A Jupyter notebook is an interactive computing environment that allows users to create and execute code in a flexible and visual way. It’s essentially a web-based interface for working with code, where you can write, run, and visualize your code all in one place.
Using the side_by_side Function
The question mentions a function called side_by_side, which prints dataframes side by side. This function is not part of the pandas library; it is a custom helper built on a pandas utility. Let’s take a closer look at what it does.
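def side_by_side(*objs, **kwds):
    # adjoin is an internal pandas helper, so its import path may change between versions
    from pandas.io.formats.printing import adjoin
    space = kwds.get('space', 7)  # blank columns between adjacent objects
    # Split each object's repr into a list of lines, one list per object
    reprs = [repr(obj).split('\n') for obj in objs]
    print(adjoin(space, *reprs))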
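As you can see, this function takes a variable number of positional arguments (*objs) and keyword arguments (**kwds). It reads the optional space keyword (default 7), splits the repr of each object into lines, and passes those lists of lines to the adjoin function from pandas, which lays them out next to each other.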
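As a quick illustration, here is a minimal usage sketch. It assumes the internal import path pandas.io.formats.printing.adjoin is available in your pandas version. The name side_by_side_text and the sample dataframes are made up for this example; the helper returns the text instead of printing it so the result can be inspected.

```python
import pandas as pd
# Internal pandas helper; its location is an assumption and may differ between versions.
from pandas.io.formats.printing import adjoin


def side_by_side_text(*objs, space=7):
    """Hypothetical variant of side_by_side that returns the text instead of printing it."""
    reprs = [repr(obj).split('\n') for obj in objs]
    return adjoin(space, *reprs)


# Two small sample dataframes (illustrative data only)
left = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
right = pd.DataFrame({'C': [5, 6]})

text = side_by_side_text(left, right, space=4)
print(text)
```

Each output line contains one row of the first dataframe followed by the matching row of the second.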
Adjoining DataFrames
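The adjoin function takes several lists of strings, one list of lines per object, and glues them together row by row: the first lines of every list form the first output line, the second lines form the second, and so on. Each list is padded to a common width, plus space extra columns, so the blocks stay aligned. The pandas source is not reproduced here; the following simplified version only illustrates the idea.
def adjoin(space, *lists):
    # Simplified sketch of the idea; not the pandas source.
    # Each argument is a list of text lines, one list per object.
    height = max(len(lines) for lines in lists)
    columns = []
    for i, lines in enumerate(lists):
        # Every block except the last gets `space` extra columns of padding.
        gap = space if i < len(lists) - 1 else 0
        width = max(len(line) for line in lines) + gap
        padded = [line.ljust(width) for line in lines]
        # Shorter blocks are filled with blank lines at the bottom.
        padded += [' ' * width] * (height - len(lines))
        columns.append(padded)
    # Glue the i-th line of every block into one output line.
    return '\n'.join(''.join(row) for row in zip(*columns))
This version pads every line of a block to that block's widest line, adds the gap after every block except the last, and fills shorter blocks with blank lines so that all blocks have the same height. It then joins the lines at the same position into one output line.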
Creating a Custom Function
While the side_by_side function is not part of pandas, we can create our own custom function to achieve similar results.
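import pandas as pd

def print_dataframes_side_by_side(*objs, **kwds):
    space = kwds.get('space', 7)
    # Keep only the DataFrames that have at least one column
    frames = [obj for obj in objs
              if isinstance(obj, pd.DataFrame) and len(obj.columns) > 0]
    if not frames:
        return
    # Render each frame as '|'-separated text lines: header first, then one line per row
    blocks = []
    for df in frames:
        lines = ['|'.join(str(col) for col in df.columns)]
        for row in df.itertuples(index=False):
            lines.append('|'.join(str(value) for value in row))
        # Pad every line to the width of the widest line in this block
        width = max(len(line) for line in lines)
        blocks.append([line.ljust(width) for line in lines])
    # Print line by line; shorter frames contribute blank padding
    height = max(len(block) for block in blocks)
    for i in range(height):
        parts = [block[i] if i < len(block) else ' ' * len(block[0])
                 for block in blocks]
        print((' ' * space).join(parts).rstrip())
This function first renders each dataframe as a block of text lines: a header line with the column names, followed by one line per row, with values separated by |. It pads each block to a fixed width, then prints the blocks line by line, separated by space blank columns, so that the dataframes appear next to each other. Arguments that are not dataframes, and dataframes without columns, are skipped.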
Using the print_dataframes_side_by_side Function
To use this function, simply call it with your dataframes as arguments.
import pandas as pd
# Create two sample dataframes
df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
df2 = pd.DataFrame({
    'C': [7, 8, 9],
    'D': [10, 11, 12]
})
# Create a list of dataframes
dataframes = [df1, df2]
# Print the dataframes side by side
print_dataframes_side_by_side(*dataframes, space=5)
This will print the two dataframes side by side while preserving their formatting.
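If the goal is to keep pandas' own column alignment and index labels, one option is to build on the public DataFrame.to_string method rather than formatting cells by hand. The sketch below is one possible approach, not the method from the original question. The helper name frames_side_by_side and the sample data are hypothetical, and it returns the text so the caller decides whether to print it.

```python
import pandas as pd


def frames_side_by_side(*frames, space=5):
    """Hypothetical helper: join DataFrame.to_string() outputs line by line."""
    # One list of text lines per dataframe, formatted by pandas itself
    blocks = [frame.to_string().split('\n') for frame in frames]
    height = max(len(block) for block in blocks)
    padded = []
    for block in blocks:
        # Pad lines to the block's widest line and add blank lines for shorter frames
        width = max(len(line) for line in block)
        rows = [line.ljust(width) for line in block]
        rows += [' ' * width] * (height - len(block))
        padded.append(rows)
    gap = ' ' * space
    # Join the i-th line of every block; strip trailing padding
    return '\n'.join(gap.join(row).rstrip() for row in zip(*padded))


# Sample dataframes with different numbers of rows (illustrative data only)
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'C': [7, 8], 'D': [10, 11]})

result = frames_side_by_side(df1, df2, space=5)
print(result)
```

Because each block is padded to its own width, dataframes with different numbers of rows or columns still line up; the shorter one just leaves blank space below its last row.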
Conclusion
In this article, we explored how to print multiple dataframes side by side in Jupyter Notebook using Python and pandas. We looked at the side_by_side helper, which relies on pandas' adjoin function, and at a custom function called print_dataframes_side_by_side that accepts any number of dataframes plus an optional space keyword argument.
By following these steps, you can easily print multiple dataframes in Jupyter Notebook side by side while maintaining their formatting.
Last modified on 2024-03-06