Mastering Multiple Condition Filtering in Pandas DataFrames: A Comprehensive Guide

by liuqiyue

How to Filter Pandas DataFrame with Multiple Conditions

In data analysis, filtering data is a crucial step to extract relevant information from a large dataset. Pandas, a powerful data manipulation library in Python, provides a straightforward way to filter dataframes based on multiple conditions. In this article, we will discuss how to filter a pandas dataframe with multiple conditions, using practical examples to illustrate the process.

First, let’s start by importing the pandas library and creating a sample dataframe. We will use this dataframe to demonstrate the filtering process.

```python
import pandas as pd

# Create a sample dataframe
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000]
}

df = pd.DataFrame(data)
print(df)
```

The output of the above code will be:

```
      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
3    David   40   80000
4      Eve   45   90000
```

Now, let’s discuss how to filter this dataframe based on multiple conditions. We can use the `query()` method or boolean indexing to achieve this.

Using `query()` Method

The `query()` method allows us to filter a dataframe based on multiple conditions using a string expression. Here’s an example:

```python
filtered_df = df.query('Age > 30 and Salary > 70000')
print(filtered_df)
```

The output will be:

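```
    Name  Age  Salary
3  David   40   80000
4    Eve   45   90000
```

In the above example, we filtered the dataframe to include only the rows where the age is greater than 30 and the salary is greater than 70000. Charlie is older than 30 but is excluded, because a salary of exactly 70000 does not satisfy the strict `>` comparison.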
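The condition string passed to `query()` can also refer to ordinary Python variables by prefixing them with `@`, which avoids hard-coding thresholds in the string. The following is a minimal sketch that rebuilds the sample dataframe so it runs on its own; the variable names `min_age` and `min_salary` and the second filter are illustrative assumptions, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000],
})

# Hypothetical thresholds kept in variables instead of the query string
min_age = 30
min_salary = 70000

# '@name' inside the string refers to a Python variable in the calling scope
both = df.query('Age > @min_age and Salary > @min_salary')

# 'or' keeps rows that satisfy at least one of the conditions
either = df.query('Age < 30 or Salary >= 80000')

print(both['Name'].tolist())    # ['David', 'Eve']
print(either['Name'].tolist())  # ['Alice', 'David', 'Eve']
```

Keeping the thresholds in variables makes the same query string reusable when the cut-off values change.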

Using Boolean Indexing

Boolean indexing is another way to filter a dataframe based on multiple conditions. Here’s an example:

```python
filtered_df = df[(df['Age'] > 30) & (df['Salary'] > 70000)]
print(filtered_df)
```

The output will be the same as the previous example:

```
    Name  Age  Salary
3  David   40   80000
4    Eve   45   90000
```

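In this example, each comparison produces a boolean Series, and the `&` operator combines the two element-wise into a single boolean mask. Applying that mask to the dataframe keeps only the rows where it is `True`. The parentheses around each comparison are required because `&` binds more tightly than `>`, and Python's `and` keyword cannot be used here because it does not operate element-wise on a Series.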
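Boolean masks can also be combined with `|` (OR) and `~` (NOT), and the Series methods `between()` and `isin()` give compact range and membership tests. The sketch below is self-contained and reuses the sample data; the specific filters and the names `either`, `not_both`, and `mid_career` are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000],
})

# Each comparison yields a boolean Series; & combines them element-wise
mask = (df['Age'] > 30) & (df['Salary'] > 70000)
print(mask.tolist())  # [False, False, False, True, True]

# | means OR: younger than 30, or earning at least 80000
either = df[(df['Age'] < 30) | (df['Salary'] >= 80000)]

# ~ negates a mask: rows that do NOT satisfy both conditions
not_both = df[~mask]

# between() is inclusive on both ends by default; isin() tests membership
mid_career = df[df['Age'].between(30, 40) & df['Name'].isin(['Bob', 'David', 'Eve'])]

print(either['Name'].tolist())      # ['Alice', 'David', 'Eve']
print(not_both['Name'].tolist())    # ['Alice', 'Bob', 'Charlie']
print(mid_career['Name'].tolist())  # ['Bob', 'David']
```

Storing a mask in a variable, as with `mask` here, makes it easy to inspect the intermediate result or reuse it in several filters.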

In this article, we have discussed two ways to filter a pandas dataframe with multiple conditions: the `query()` method and boolean indexing. Both return the same rows for the same conditions, so the choice is mostly a matter of readability: `query()` tends to read more cleanly for long conditions, while boolean indexing keeps everything in ordinary Python expressions. Happy filtering!
