Understanding Series and DataFrames in Pandas
Pandas is a powerful library for data manipulation and analysis in Python. At its core, it provides two primary data structures: Series (one-dimensional labeled array) and DataFrame (two-dimensional labeled data structure with columns of potentially different types).
In this article, we will look at pandas Series and DataFrames, how a Series relates to the DataFrame it was selected from, and a handful of common operations on both.
What is a Pandas Series?
A pandas Series is a one-dimensional labeled array. It's similar to an Excel column or a NumPy array with labels. Each element in the Series has a label associated with it; labels are usually unique, although pandas does not require this.
import pandas as pd
# Create a simple Series
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(s)
Output:
a 1
b 2
c 3
d 4
e 5
dtype: int64
As we can see, the first argument in the pd.Series constructor is an array of values, and the second argument is a list of labels. These labels are used as the index of the Series.
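To make the role of the labels concrete, here is a minimal sketch that looks up the same element by label and by position. The variable names by_label and by_position are chosen for this example only.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Label-based lookup goes through the index labels
by_label = s.loc['b']

# Position-based lookup ignores the labels and counts from 0
by_position = s.iloc[1]

print(by_label, by_position)

# A slice by label includes both endpoints
print(s.loc['b':'d'].tolist())
```

Using .loc and .iloc explicitly keeps it clear whether a lookup is meant by label or by position.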
What is a Pandas DataFrame?
A pandas DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It’s similar to an Excel spreadsheet or a table in a relational database.
import pandas as pd
# Create a simple DataFrame
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})
print(df)
Output:
col1 col2
0 1 a
1 2 b
2 3 c
In this example, the first argument in the pd.DataFrame constructor is a dictionary where keys are column names and values are arrays of values. The second argument, index, can be a list of labels for the rows; when it is omitted, as here, pandas numbers the rows starting from 0.
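As an illustration of the index argument, the following sketch builds the same DataFrame with explicit row labels. The labels 'x', 'y', and 'z' are assumptions made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']},
    index=['x', 'y', 'z'],  # row labels, invented for illustration
)

# The row labels replace the default 0, 1, 2 numbering
print(df.index.tolist())

# A single cell can be addressed by row label and column name
print(df.loc['y', 'col2'])

# Each column keeps its own dtype
print(df.dtypes)
```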
Accessing Parent DataFrame from Series
Now that we have an understanding of pandas Series and DataFrames, let's explore how a Series relates to the DataFrame it was created from.
When you select a single column from a DataFrame using square brackets [], pandas returns a Series. The Series has the same index as the DataFrame and takes the column label as its name, but it does not keep a public reference back to the DataFrame.
import pandas as pd
# Create a simple DataFrame
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})
# Create a Series by indexing into the DataFrame
s = df['col1']
print(s)
Output:
0 1
1 2
2 3
Name: col1, dtype: int64
As we can see, the s Series carries the DataFrame's row index (0, 1, 2) and holds the values of the 'col1' column. The column label is kept as the name of the Series, which is why the output ends with Name: col1.
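A short sketch can confirm what a column selection carries over from the DataFrame: the column label becomes the Series name, and the row labels are the same as the DataFrame's.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})
s = df['col1']

# The column label is stored as the name of the Series
print(s.name)

# The Series has the same row labels as the DataFrame
print(s.index.equals(df.index))

# The name is enough to select the column again from df
print(df[s.name].equals(s))
```

The name only records which column label was used; looking the column up again still requires holding on to df yourself.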
Is There a parent Attribute on a Series?
However, what if you want to get from a Series back to the DataFrame it was selected from? It is tempting to reach for a parent attribute, but pandas does not provide one. A Series does not keep a public reference to the DataFrame it came from, so hasattr(s, 'parent') is False and s.parent raises an AttributeError.
import pandas as pd
# Create a simple DataFrame
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})
# Create a Series by indexing into the DataFrame
s = df['col1']
# Check whether the Series exposes a parent attribute
print(hasattr(s, 'parent'))
Output:
False
As we can see, the s Series has no parent attribute. If you need the DataFrame later, keep your own reference to it (here, df) next to the Series.
Making the Signature of a Method Easier
Now that we have seen how a Series relates to its DataFrame, let's talk about making the signature of a method easier. Suppose you have a function foobar that takes a DataFrame and a column name as arguments, and you want to make it more convenient to call with a single argument.
import pandas as pd
def foobar(data: pd.DataFrame, column: str):
    # sum() stands in for whatever operation you need
    return data[column].sum()
# Now let's modify the function to take the column itself as a Series
def foobar(column: pd.Series):
    # The Series already holds the values, so no DataFrame lookup is needed
    return column.sum()
In this modified version of foobar, the caller passes the column itself, for example foobar(df['col1']). Because a Series has no parent attribute, the function cannot reach the other columns of the DataFrame; if it needs them, keep the DataFrame as an explicit argument.
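If a function needs both a column and the DataFrame it belongs to while still taking a single argument, one option is to bundle the two explicitly. The sketch below is only an illustration: ColumnRef and total are hypothetical names invented here and are not part of pandas.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical helper (not part of pandas): a DataFrame plus one column name."""
    frame: pd.DataFrame
    name: str

    @property
    def series(self) -> pd.Series:
        # Look the column up on demand so it always reflects the frame
        return self.frame[self.name]


def total(ref: ColumnRef) -> int:
    # sum() stands in for whatever operation the function needs
    return int(ref.series.sum())


df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})
ref = ColumnRef(df, 'col1')

print(total(ref))
print(ref.frame is df)  # the DataFrame stays reachable through the helper
```

The trade-off is one extra object to construct at the call site in exchange for a single-argument signature.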
Handling Missing Values
Another common use case when working with DataFrames is handling missing values.
Missing values in pandas are usually represented by NaN (Not a Number). You can create them with None, with np.nan from NumPy, or with the pd.NA constant.
import pandas as pd
# Create a simple DataFrame with missing values
df = pd.DataFrame({'col1': [1, 2, None, 4], 'col2': ['a', 'b', None, 'd']})
print(df)
Output:
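col1 col2
0 1.0 a
1 2.0 b
2 NaN None
3 4.0 d
Note that col1 is shown as floating point, because NaN is a float value. Depending on the pandas version, the missing string in col2 may be printed as None or NaN.
To handle missing values, you can use the dropna method or the fillna method.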
import pandas as pd
# Create a simple DataFrame with missing values
df = pd.DataFrame({'col1': [1, 2, None, 4], 'col2': ['a', 'b', None, 'd']})
print(df.dropna()) # drop rows with missing values
Output:
col1 col2
0 1.0 a
1 2.0 b
3 4.0 d
import pandas as pd
# Create a simple DataFrame with missing values
df = pd.DataFrame({'col1': [1, 2, None, 4], 'col2': ['a', 'b', None, 'd']})
print(df.fillna(0)) # replace missing values with 0
Output:
col1 col2
0 1.0 a
1 2.0 b
2 0.0 0
3 4.0 d
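Beyond the two calls above, it is often useful to count the missing values first and to treat each column differently. The following is a minimal sketch; the fill values 0 and 'unknown' are arbitrary choices made for this example.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, None, 4], 'col2': ['a', 'b', None, 'd']})

# Count missing values per column
missing_per_column = df.isna().sum()
print(missing_per_column.to_dict())

# Drop a row only when 'col1' is missing
kept = df.dropna(subset=['col1'])
print(len(kept))

# Fill each column with a value that suits its type
filled = df.fillna({'col1': 0, 'col2': 'unknown'})
print(filled.loc[2].tolist())
```

Passing a dictionary to fillna avoids writing a number into a text column, which is what fillna(0) does.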
Value Counting
Counting how often each value occurs is another common operation when working with DataFrames.
To count values, you can use the value_counts method.
import pandas as pd
# Create a simple DataFrame
df = pd.DataFrame({'col1': ['a', 'b', 'a', 'c', 'b', 'd']})
print(df['col1'].value_counts())
Output:
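col1
a 2
b 2
c 1
d 1
Name: count, dtype: int64
This is the output in pandas 2.0 and later. Older versions print the same counts but omit the col1 header line and end with Name: col1, dtype: int64.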
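value_counts also accepts a few options that change what is counted. The sketch below, using a small Series invented for this example, shows dropna=False to include missing values and normalize=True to return fractions instead of counts.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', None, 'b', 'a'])

# Missing values are skipped by default
counts = s.value_counts()
print(counts['a'], counts['b'])

# dropna=False adds a row for the missing values
with_missing = s.value_counts(dropna=False)
print(int(with_missing.sum()))

# normalize=True returns each value's share of the non-missing entries
shares = s.value_counts(normalize=True)
print(round(float(shares['a']), 2))
```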
Sorting
Sorting is another common operation when working with DataFrames.
To sort by the values in one or more columns, use the sort_values method. To sort by the row labels, use the sort_index method.
import pandas as pd
# Create a simple DataFrame
df = pd.DataFrame({'col1': [3, 2, 1], 'col2': ['d', 'b', 'a']})
print(df.sort_values('col1'))
Output:
col1 col2
2 1 a
1 2 b
0 3 d
import pandas as pd
# Create a simple DataFrame
df = pd.DataFrame({'col1': [3, 2, 1], 'col2': ['d', 'b', 'a']})
print(df.sort_index())
Output:
col1 col2
0 3 d
1 2 b
2 1 a
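Sorting by several columns and in mixed directions follows the same pattern. In this sketch, built on a small DataFrame invented for the example, the original row labels travel with the rows, which is what lets sort_index restore the original order.

```python
import pandas as pd

df = pd.DataFrame({'col1': [2, 1, 2, 1], 'col2': ['b', 'd', 'a', 'c']})

# Sort by col1 descending, then by col2 ascending within ties
ordered = df.sort_values(['col1', 'col2'], ascending=[False, True])
print(ordered.index.tolist())

# reset_index(drop=True) renumbers the rows from 0
renumbered = ordered.reset_index(drop=True)
print(renumbered['col2'].tolist())

# sort_index puts the rows back in their original order
print(ordered.sort_index().equals(df))
```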
Merging
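Combining DataFrames is another common operation.
To join two DataFrames on shared key columns, use the merge function or method. To stack DataFrames on top of each other (or side by side), use the concat function. By default, merge performs an inner join on the columns the two DataFrames have in common, so only keys present in both are kept.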
import pandas as pd
# Create a simple DataFrame
df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'col1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'col2': ['x', 'y', 'z']})
print(pd.merge(df1, df2))
Output:
key col1 col2
0 a 1 x
1 b 2 y
import pandas as pd
# Create a simple DataFrame
df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'col1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'col2': ['x', 'y', 'z']})
print(pd.concat([df1, df2]))
Output:
key col1 col2
0 a 1.0 NaN
1 b 2.0 NaN
2 c 3.0 NaN
0 a NaN x
1 b NaN y
2 d NaN z
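The how and indicator parameters of merge, and ignore_index for concat, cover two frequent variations. The sketch below reuses df1 and df2 from above: it keeps every key from both sides and records where each row came from, then stacks the rows with a fresh index.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'col1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'col2': ['x', 'y', 'z']})

# Outer join: keep every key; _merge records the source of each row
outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)
print(outer[['key', '_merge']])

# Stack the rows and renumber them from 0 instead of repeating 0, 1, 2
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked.shape)
```

Passing on='key' explicitly also documents the join column, rather than relying on merge to infer it from the shared column names.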
Grouping
Grouping is another common operation when working with DataFrames.
To group, you can use the groupby method followed by an aggregation such as sum, or the pivot_table method, which groups and aggregates in one call (using the mean by default).
import pandas as pd
# Create a simple DataFrame
df = pd.DataFrame({'key': ['a', 'b', 'c'], 'col1': [1, 2, 3], 'col2': [4, 5, 6]})
print(df.groupby('key').sum())  # groupby alone returns a GroupBy object, so aggregate it
Output:
col1 col2
key
a 1 4
b 2 5
c 3 6
import pandas as pd
# Create a simple DataFrame
df = pd.DataFrame({'key': ['a', 'b', 'c'], 'col1': [1, 2, 3], 'col2': [4, 5, 6]})
print(df.pivot_table(values='col1', index='key'))
Output:
col1
key
a 1.0
b 2.0
c 3.0
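Grouping becomes more interesting when keys repeat. The following sketch uses a small DataFrame invented for this example, with named aggregation to apply a different function to each column; the output column names col1_sum and col2_mean are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'key': ['a', 'b', 'a', 'b'],
    'col1': [1, 2, 3, 4],
    'col2': [10, 20, 30, 40],
})

# One aggregation per column, using named aggregation
summary = df.groupby('key').agg(
    col1_sum=('col1', 'sum'),
    col2_mean=('col2', 'mean'),
)
print(summary)

# pivot_table with an explicit aggfunc gives the same sums for col1
pivot = df.pivot_table(values='col1', index='key', aggfunc='sum')
print(pivot)
```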
Creating Missing Values with np.nan
Missing values can also be written explicitly with np.nan, which requires importing NumPy. Because NaN is a floating-point value, a numeric column that contains it is stored as float64.
import numpy as np
import pandas as pd
# Create a simple DataFrame with one missing value
df = pd.DataFrame({'col1': [1, 2, np.nan]})
print(df)
Output:
col1
0 1.0
1 2.0
2 NaN
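To see how the choice of missing-value marker affects the dtype, here is a minimal sketch comparing np.nan with pd.NA in a nullable integer column. The variable names floats and nullable are made up for this example.

```python
import numpy as np
import pandas as pd

# np.nan forces the column to float64
floats = pd.Series([1, 2, np.nan])

# pd.NA with the nullable 'Int64' dtype keeps the values as integers
nullable = pd.Series([1, 2, pd.NA], dtype='Int64')

print(floats.dtype, nullable.dtype)

# isna() detects the missing entry in both cases
print(floats.isna().tolist())
print(nullable.isna().tolist())
```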
Last modified on 2025-04-28