How to Delete Rows in Pandas Based on Condition
Deleting rows in a pandas DataFrame based on a specific condition is a common task in data analysis. Pandas provides several ways to filter out the rows that meet certain criteria. In this article, we will look at three of them: the `drop()` method, boolean indexing, and the `query()` method.
One straightforward way to delete rows in a pandas DataFrame is the `drop()` method. `drop()` removes rows by their index labels rather than by a condition, so you first select the rows that match the condition and then pass their labels to `drop()`. Here's an example:
```python
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['John', 'Jane', 'Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']}
df = pd.DataFrame(data)

# Delete rows where Age is greater than 35:
# find the index labels of the matching rows, then drop them
df = df.drop(df[df['Age'] > 35].index)
print(df)
```
In the above example, we have a DataFrame with columns 'Name', 'Age', and 'City', and we want to delete rows where 'Age' is greater than 35. The expression `df[df['Age'] > 35].index` selects the labels of those rows, and `drop()` returns a new DataFrame without them, which we assign back to `df`.
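`drop()` works the same way when the deletion rule involves more than one column. The sketch below uses the same sample data and removes rows where 'Age' is greater than 35 or 'City' is 'Chicago' (the Chicago rule is an illustrative assumption, not part of the example above). Each comparison is wrapped in parentheses because `|` binds more tightly than `>` and `==`. The `reset_index(drop=True)` call is optional; it only renumbers the remaining rows from 0.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
})

# Index labels of rows to delete: older than 35 OR living in Chicago (illustrative rule).
# Parentheses are required because | has higher precedence than > and ==.
to_drop = df[(df['Age'] > 35) | (df['City'] == 'Chicago')].index

# drop() removes those labels; reset_index(drop=True) renumbers the survivors 0..n-1
result = df.drop(to_drop).reset_index(drop=True)
print(result)
```

With this data, only John and Jane remain.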
Another way to delete rows based on a condition is boolean indexing. This approach involves creating a boolean mask (a Series of `True`/`False` values, one per row) and using it to select the rows to keep. Here's an example:
```python
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['John', 'Jane', 'Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']}
df = pd.DataFrame(data)
# Boolean mask that is True for the rows to keep (Age <= 35)
mask = df['Age'] <= 35
# Indexing with the mask keeps the True rows and deletes the rest
df = df[mask]
print(df)
```
In this example, we create a boolean mask `mask` that is `True` for rows where 'Age' is less than or equal to 35. Indexing the DataFrame with `df[mask]` keeps the rows where the mask is `True` and discards the others, which deletes every row where 'Age' is greater than 35.
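If it is more natural to describe the rows you want to delete than the rows you want to keep, you can build the mask for the deletion condition and negate it with `~`. A minimal sketch, assuming the goal is to drop everyone living in Houston or Phoenix (a rule chosen only for illustration), using `Series.isin()` to test membership:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
})

# Mask describing the rows to DELETE (illustrative rule: these two cities)
delete_mask = df['City'].isin(['Houston', 'Phoenix'])

# ~ negates the mask element-wise, so only rows NOT matching the rule are kept
kept = df[~delete_mask]
print(kept)
```

Writing the condition for the rows to delete and negating it keeps the code close to how the requirement is usually stated.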
You can also use the `query()` method, which takes the condition as a string expression. This often reads more naturally than bracket indexing, especially when the condition involves several columns. Here's an example:
```python
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['John', 'Jane', 'Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']}
df = pd.DataFrame(data)
# Delete rows where Age is greater than 35 using query()
df = df.query('Age <= 35')
print(df)
```
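In this example, `query()` keeps only the rows that satisfy the expression `'Age <= 35'`, so every row where 'Age' is greater than 35 is deleted. Like the other approaches, it returns a new DataFrame rather than modifying `df` in place, which is why the result is assigned back to `df`.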
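`query()` can also refer to Python variables by prefixing them with `@`, which avoids building the expression string by hand. In the sketch below, `max_age` and `excluded` are hypothetical names introduced for illustration. The expression keeps rows with 'Age' at most 35 whose 'City' is not in the excluded list, and so deletes everything else.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
})

max_age = 35                  # hypothetical threshold held in a Python variable
excluded = ['Los Angeles']    # hypothetical list of cities to remove

# @name refers to a Python variable in the calling scope;
# "not in" tests membership against the list
result = df.query('Age <= @max_age and City not in @excluded')
print(result)
```

Here John and Alice remain: Bob and Charlie fail the age test, and Jane is removed by the city test.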
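In conclusion, you can delete rows from a pandas DataFrame based on a condition with `drop()`, boolean indexing, or `query()`. `drop()` fits when you have, or can compute, the index labels to remove; boolean indexing works with any mask you can build; and `query()` expresses the condition as a readable string. All three return a new DataFrame by default, so assign the result back, as in the examples above, to keep the change.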
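To check that the three approaches agree, the sketch below applies each of them to the sample DataFrame and compares the results with `DataFrame.equals()`, which returns `True` when both DataFrames have the same shape and the same elements (including matching column dtypes). All three keep the rows with index labels 0, 1, and 2, so both comparisons should print `True`.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
})

# The three approaches from this article, each deleting rows where Age > 35
via_drop = df.drop(df[df['Age'] > 35].index)
via_mask = df[df['Age'] <= 35]
via_query = df.query('Age <= 35')

# equals() compares shape, index labels, and element values/dtypes
print(via_drop.equals(via_mask), via_mask.equals(via_query))
```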