Searching an Entire DataFrame for a Specific Value
When working with dataframes in pandas, it’s not uncommon to need to search for specific values within the dataframe. In this article, we’ll explore how to achieve this using the contains
function and return the value next to each match.
Understanding the Problem
Let’s start by looking at the sample dataset provided:
Protocol Number: xx-yzm2
Section Major Task Budget
1 Study Setup 25303.18
2 Study Setup Per-Location 110037.8
3 Site Identified by CRO 29966.25
4 Pre-study Site Visit (PSSV) 130525.92
We want to search for the keyword 'protocol'
and return the value next to each match. This means we need to find all rows where the value in a specific column contains the keyword.
The Solution
One way to achieve this is by using the apply
function along with a lambda function that checks if the specified column contains the keyword. We’ll also use the np.where
function to get the indices of the matching rows and columns.
Here’s how you can do it:
import pandas as pd
import numpy as np
data = {0: ['Protocol Number:', np.nan, 'Section Major', '1', '2', '3', '4'],
1: ['xx-yzm2', np.nan, 'Task', 'Study Setup', 'Study Setup Per-Location',
'Site Identified by CRO', 'Pre-study Site Visit (PSSV)'],
2: [np.nan, np.nan, 'Budget', '25303.18', '110037.8', '29966.25', '130525.92']}
df = pd.DataFrame(data)
0 1 2
0 Protocol Number: xx-yzm2 NaN
1 NaN NaN NaN
2 Section Major Task Budget
3 1 Study Setup 25303.18
4 2 Study Setup Per-Location 110037.8
5 3 Site Identified by CRO 29966.25
6 4 Pre-study Site Visit (PSSV) 130525.92
keyword = 'protocol'
# case-insensitive: case=False
# row: array([0], dtype=int64), col: array([0], dtype=int64)
row, col = np.where(df.apply(lambda x: x.astype(str).str.contains(keyword, case=False)))
result = df.iat[row[0],col[0]+1]
print(result)
# xx-yzm2
Why case=False
?
In the lambda function, we’re using the str.contains
method to check if the specified column contains the keyword. By default, this method is case-sensitive, meaning it will only match if the exact same characters appear in both strings (ignoring case). To make it case-insensitive, we set case=False
. This allows us to find matches regardless of whether they’re uppercase or lowercase.
Handling Multiple Matches
If you want to get all matches instead of just the first one, you can use a loop. Here’s how:
for i in range(len(row)):
if not col[i]+1 == len(df.columns):
print(df.iat[row[i],col[i]+1])
else:
# error handle, you're keyword was found in last column,
# i.e. there is no `next` col
pass
This code will print the value next to each match found in the dataframe.
Error Handling
As you can see, if the keyword is found in a column that has only one element (i.e., it’s not followed by another column), this code will throw an error because col[i]+1
would be equal to the total number of columns. To handle such cases, we could add some basic error checking:
for i in range(len(row)):
if col[i] == len(df.columns)-1:
print(df.iat[row[i],col[i]])
Conclusion
In this article, we explored how to search an entire dataframe for a specific value. We used the contains
function and returned the value next to each match by using the apply
, np.where
, and indexing functions in pandas.
Last modified on 2024-12-11