In R’s dplyr, the count() function is a convenient way to count occurrences of unique values in one or more columns. Pandas offers several ways to get the same result, with some differences in approach.
.groupby().size()
Basic Counting
import pandas as pd

# Create sample dataframe
df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'subcategory': ['X', 'X', 'Y', 'Z', 'X', 'Z']
})

# Count occurrences of 'category'
count_df = df.groupby('category').size().reset_index(name='count')
count_df
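As a side note, the same table can probably be built without reset_index() by passing as_index=False to groupby(). The sketch below assumes pandas 1.1 or later, where size() then returns a DataFrame whose counts column is named 'size'. The name alt_df is illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'subcategory': ['X', 'X', 'Y', 'Z', 'X', 'Z']
})

# as_index=False keeps 'category' as a regular column instead of the index
alt_df = (
    df.groupby('category', as_index=False)
      .size()
      .rename(columns={'size': 'count'})  # match the 'count' name used above
)

print(alt_df)
```

Which form to use is mostly a matter of taste: reset_index(name='count') names the column in one step, while as_index=False avoids the intermediate Series.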
Counting Multiple Columns
You can count by multiple columns, just as dplyr::count(fruit, color) does in R:
# Count by both 'category' and 'subcategory'
count_multi = df.groupby(['category', 'subcategory']).size().reset_index(name='count')
count_multi
  category subcategory  count
0        A           X      2
1        A           Y      1
2        B           X      1
3        B           Z      1
4        C           Z      1
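dplyr's count() also accepts sort = TRUE to put the largest groups first. The groupby result above is ordered by the group keys instead, so a rough equivalent is to sort on the count column afterwards. This is a minimal sketch; sorted_multi is an illustrative name, and mergesort is chosen here only so that tied rows keep their original order.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'subcategory': ['X', 'X', 'Y', 'Z', 'X', 'Z']
})

count_multi = df.groupby(['category', 'subcategory']).size().reset_index(name='count')

# Largest groups first; a stable sort keeps ties in key order
sorted_multi = (
    count_multi.sort_values('count', ascending=False, kind='mergesort')
               .reset_index(drop=True)
)

print(sorted_multi)
```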
.value_counts()
For a single column, .value_counts() offers a more concise approach:
# Using value_counts() for single column
count_series = df['category'].value_counts().reset_index()
count_series
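One caveat: the column names produced by value_counts().reset_index() differ between pandas versions. Before 2.0 they come out as 'index' and 'category'; from 2.0 on they are 'category' and 'count'. The sketch below names both columns explicitly so the result should look the same either way. The name count_named is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'subcategory': ['X', 'X', 'Y', 'Z', 'X', 'Z']
})

count_named = (
    df['category'].value_counts()
                  .rename_axis('category')     # name the index holding the unique values
                  .reset_index(name='count')   # name the column holding the counts
)

print(count_named)
```

Unlike groupby().size(), value_counts() orders the rows by count, largest first, which is closer to dplyr's count(sort = TRUE).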
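Calling .value_counts() on the whole DataFrame counts each unique combination of values across all columns:
df.value_counts().reset_index()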
  category subcategory  count
0        A           X      2
1        A           Y      1
2        B           X      1
3        B           Z      1
4        C           Z      1
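When the DataFrame has more columns than you want to count by, DataFrame.value_counts() takes a subset argument listing the columns to use. The following is a small sketch assuming pandas 1.1 or later; by_category is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'subcategory': ['X', 'X', 'Y', 'Z', 'X', 'Z']
})

# Count by 'category' only, ignoring 'subcategory'
by_category = df.value_counts(subset=['category']).reset_index(name='count')

print(by_category)
```

This plays the role of selecting the counting columns in dplyr::count(), without having to subset the DataFrame first.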