Printing DataFrames in Jupyter Notebook Side by Side with Custom Functionality

For data scientists, working in Jupyter notebooks is an essential part of the job. A common need when working with dataframes is to display several of them side by side for comparison or analysis. In this article, we’ll explore how to achieve this using Python and the popular pandas library.

Understanding Jupyter Notebook

Before diving into the code, let’s understand what a Jupyter notebook is. A Jupyter notebook is an interactive computing environment that allows users to create and execute code in a flexible and visual way. It’s essentially a web-based interface for working with code, where you can write, run, and visualize your code all in one place.

Using the side_by_side Function

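One way to do this is a small helper function called side_by_side. It is not part of the pandas library itself; it is a custom function built on adjoin, a formatting helper that lives in the internal module pandas.io.formats.printing. Because that module is not part of the public API, the import path may differ between pandas versions. Let’s take a closer look at what this function does.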

def side_by_side(*objs, **kwds):
    from pandas.io.formats.printing import adjoin
    space = kwds.get('space', 7)
    reprs = [repr(obj).split('\n') for obj in objs]
    print(adjoin(space, *reprs))

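This function accepts any number of objects (*objs) and keyword arguments (**kwds), of which only space is used: the number of blank characters placed between neighbouring objects, 7 by default. Each object’s repr is split into a list of lines, and the adjoin function from pandas then lays those lists out next to each other.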
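As a quick check, the helper can be called on two small dataframes. This is a minimal sketch that assumes a pandas version in which adjoin is importable from pandas.io.formats.printing; the sample data is made up for illustration.

```python
import pandas as pd
from pandas.io.formats.printing import adjoin  # internal module, path may vary by version


def side_by_side(*objs, **kwds):
    space = kwds.get("space", 7)
    reprs = [repr(obj).split("\n") for obj in objs]
    print(adjoin(space, *reprs))


df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df2 = pd.DataFrame({"C": [7, 8, 9], "D": [10, 11, 12]})

# Both tables are printed on the same lines, with extra blanks between them
side_by_side(df1, df2, space=4)
```

Each printed line holds one line of df1 followed by the matching line of df2, so the header rows and the index labels of both tables line up horizontally.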

Adjoining DataFrames

The adjoin function glues several blocks of text together horizontally. Each block is passed as a list of lines. Every block except the last is padded to the width of its longest line plus space extra characters, and the blocks are then joined line by line. This is what places the dataframes next to each other while keeping each table’s own alignment. The following simplified re-implementation illustrates the idea; it is not the pandas source.

def adjoin(space, *lists):
    # Each argument is one block of text, given as a list of lines
    n_rows = max(len(lines) for lines in lists)
    columns = []
    for i, lines in enumerate(lists):
        # Pad to the block's widest line, plus `space` for all but the last block
        width = max(len(line) for line in lines)
        if i < len(lists) - 1:
            width += space
        padded = [line.ljust(width) for line in lines]
        # Shorter blocks get blank lines so that every block has n_rows lines
        padded += [' ' * width] * (n_rows - len(lines))
        columns.append(padded)
    # Join the blocks row by row
    return '\n'.join(''.join(row) for row in zip(*columns))

This function works by padding every line of a block to the same width, so that the next block always starts in the same column. It then joins the blocks one row at a time, which keeps the original layout of each block intact.
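To see the effect of adjoin in isolation, the pandas helper can be called directly on short lists of lines. The sketch below assumes a pandas version in which adjoin is importable from pandas.io.formats.printing, as in side_by_side above; the sample lists are made up for illustration.

```python
from pandas.io.formats.printing import adjoin  # internal module, path may vary by version

left = ["a", "bb", "ccc"]
right = ["x", "y", "z"]

# The left block is padded to its widest line (3 characters) plus space=2
joined = adjoin(2, left, right)
print(joined)
# a    x
# bb   y
# ccc  z
```

The second block starts in the same column on every line, which is exactly the property that keeps two dataframe reprs aligned.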

Creating a Custom Function

While the side_by_side function is not part of pandas, we can create our own custom function to achieve similar results.

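import pandas as pd

def print_dataframes_side_by_side(*objs, **kwds):
    space = kwds.get('space', 7)
    # Keep only the dataframes; any other argument is ignored
    frames = [obj for obj in objs if isinstance(obj, pd.DataFrame)]
    if not frames:
        return
    n_rows = max(len(df) for df in frames)

    # Build one table of strings per dataframe: column names first, then the rows
    tables = []
    for df in frames:
        rows = df.astype(str).values.tolist()
        # Shorter dataframes are padded with blank cells
        rows += [[''] * df.shape[1]] * (n_rows - len(rows))
        tables.append([[str(col) for col in df.columns]] + rows)

    # Size each column to fit its name and its widest value
    widths = [[max(len(row[j]) for row in table) for j in range(len(table[0]))]
              for table in tables]

    # Print the header row, then each data row, with `space` blanks between dataframes
    for r in range(n_rows + 1):
        blocks = ['|'.join(cell.rjust(w) for cell, w in zip(table[r], ws))
                  for table, ws in zip(tables, widths)]
        print((' ' * space).join(blocks).rstrip())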

This function first collects the column names of each dataframe, which form the header row. It then prints the data one row at a time, separating the cells within a dataframe with | and using space to control the blank padding. Unlike side_by_side, it does not reuse the repr of each dataframe, so the index is not printed.

Using the print_dataframes_side_by_side Function

To use this function, simply call it with your dataframes as arguments.

import pandas as pd

# Create two sample dataframes
df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

df2 = pd.DataFrame({
    'C': [7, 8, 9],
    'D': [10, 11, 12]
})

# Create a list of dataframes
dataframes = [df1, df2]

# Print the dataframes side by side
print_dataframes_side_by_side(*dataframes, space=5)

This prints the two dataframes side by side, with space=5 controlling the amount of blank padding in the output.
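If depending on an internal pandas helper is a concern, a similar layout can be sketched with public API only. The example below is a minimal sketch, not a definitive implementation: render_side_by_side is a hypothetical name, and it assumes that DataFrame.to_string() produces the table text to be laid out. It also handles dataframes with different numbers of rows.

```python
import pandas as pd


def render_side_by_side(*frames, space=4):
    """Return the text of several dataframes laid out next to each other.

    Hypothetical helper: the name and the `space` default are illustrative.
    """
    # DataFrame.to_string() renders the full table, index included, as text
    blocks = [frame.to_string().split("\n") for frame in frames]
    if not blocks:
        return ""
    height = max(len(block) for block in blocks)
    widths = [max(len(line) for line in block) for block in blocks]

    rows = []
    for i in range(height):
        cells = []
        for block, width in zip(blocks, widths):
            # Blocks with fewer lines are padded with blanks at the bottom
            line = block[i] if i < len(block) else ""
            cells.append(line.ljust(width))
        rows.append((" " * space).join(cells).rstrip())
    return "\n".join(rows)


left = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
right = pd.DataFrame({"C": [7, 8]})
print(render_side_by_side(left, right, space=4))
```

Returning a string instead of printing directly makes the helper easier to reuse, for example when writing the comparison to a log file.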

Conclusion

In this article, we explored how to print multiple dataframes side by side in Jupyter Notebook using Python and pandas. We looked at side_by_side, which relies on the adjoin helper from pandas, and wrote a custom function called print_dataframes_side_by_side that accepts any number of dataframes plus an optional space keyword argument. The custom function collects the column names of each dataframe, prints them as a header row, and then prints the data row by row.

By following these steps, you can easily print multiple dataframes in Jupyter Notebook side by side while maintaining their formatting.


Last modified on 2024-03-06