19  Counting in Pandas

In R’s dplyr, the count() function is a convenient way to count occurrences of unique values in one or more columns. Pandas offers several ways to get the same result; they differ mainly in syntax and in how the output is sorted and labeled.

19.1 .groupby().size()

19.1.1 Basic Counting

import pandas as pd

# Create sample dataframe
df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'subcategory': ['X', 'X', 'Y', 'Z', 'X', 'Z']
})

# Count occurrences of 'category'
count_df = df.groupby('category').size().reset_index(name='count')
count_df
  category  count
0        A      3
1        B      2
2        C      1
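
As a variation on the pattern above, the sketch below passes as_index=False to .groupby() so that 'category' stays a regular column and no .reset_index() call is needed. It assumes a reasonably recent pandas release, where the resulting count column is named 'size'; the rename step and the variable name count_alt are illustrative choices, not part of the original example.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'subcategory': ['X', 'X', 'Y', 'Z', 'X', 'Z']
})

# as_index=False keeps 'category' as a column; .size() then returns a
# DataFrame whose count column is called 'size', so rename it to 'count'
count_alt = (
    df.groupby('category', as_index=False)
      .size()
      .rename(columns={'size': 'count'})
)
print(count_alt)
```

Both forms should give the same table; which one reads better is largely a matter of taste.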

19.1.2 Counting Multiple Columns

You can also count combinations of several columns, much as dplyr::count(fruit, color) does in R. Pass a list of column names to .groupby():

# Count by both 'category' and 'subcategory'
count_multi = df.groupby(['category', 'subcategory']).size().reset_index(name='count')
count_multi
  category subcategory  count
0        A           X      2
1        A           Y      1
2        B           X      1
3        B           Z      1
4        C           Z      1
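
To get closer to the feel of dplyr::count(), the multi-column pattern can be wrapped in a small helper. The function below is a hypothetical sketch: count_by, its sort flag, and its name argument are made up for illustration and are not part of pandas. The sort option is only loosely modeled on dplyr's sort = TRUE.

```python
import pandas as pd

def count_by(df, *cols, sort=False, name='count'):
    """Hypothetical dplyr-style helper: count rows per combination of cols."""
    out = df.groupby(list(cols)).size().reset_index(name=name)
    if sort:
        # Largest groups first; mergesort is stable, so tied groups
        # keep the key order produced by groupby
        out = (
            out.sort_values(name, ascending=False, kind='mergesort')
               .reset_index(drop=True)
        )
    return out

df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'subcategory': ['X', 'X', 'Y', 'Z', 'X', 'Z']
})

result = count_by(df, 'category', 'subcategory', sort=True)
print(result)
```

Calling count_by(df, 'category') reproduces the single-column case, so one helper covers both examples in this section.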

19.2 .value_counts()

For a single column, .value_counts() offers a more concise approach:

# Using value_counts() for a single column;
# reset_index() turns the resulting Series into a DataFrame
value_count_df = df['category'].value_counts().reset_index()
value_count_df
  category  count
0        A      3
1        B      2
2        C      1
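
Two details of .value_counts() are easy to miss, and the sketch below illustrates them. First, the result is ordered by count (largest first) rather than by label, unlike .groupby().size(). Second, the column names produced by a bare .reset_index() depend on the pandas version: the 'category' and 'count' labels shown above match pandas 2.0 and later, while older releases label the columns differently. Setting the names explicitly with rename_axis() and reset_index(name=...) is one way to sidestep that; the variable names here are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'subcategory': ['X', 'X', 'Y', 'Z', 'X', 'Z']
})

# Default ordering: most frequent value first
counts = df['category'].value_counts()

# Reorder by label instead of by frequency
by_key = counts.sort_index()

# Proportions instead of raw counts
shares = df['category'].value_counts(normalize=True)

# Name the label column and the count column explicitly, so the result
# does not depend on version-specific defaults
tidy = counts.rename_axis('category').reset_index(name='count')
print(tidy)
```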

Called on the whole DataFrame, .value_counts() counts unique combinations of all columns:

df.value_counts().reset_index()
  category subcategory  count
0        A           X      2
1        A           Y      1
2        B           X      1
3        B           Z      1
4        C           Z      1
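
Because DataFrame.value_counts() uses every column by default, it stops being useful once the frame contains columns that are not part of the grouping. The sketch below adds a hypothetical 'value' column (not in the original example) to show the effect, then restricts the count to chosen columns with the subset parameter.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'subcategory': ['X', 'X', 'Y', 'Z', 'X', 'Z'],
    # Hypothetical extra column, added only for this illustration
    'value': [10, 20, 30, 40, 50, 60],
})

# Every row is unique once 'value' is included, so each count is 1
all_cols = df.value_counts()

# subset limits the count to the listed columns
pairs = (
    df.value_counts(subset=['category', 'subcategory'])
      .reset_index(name='count')
)
print(pairs)
```

The same result can be obtained by selecting the columns first, as in df[['category', 'subcategory']].value_counts().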