19 Counting in Pandas

In R’s dplyr, the count() function is a convenient way to count occurrences of unique values in one or more columns. Pandas offers several ways to get the same result. They differ mainly in output format and in how the result is sorted.

19.1 `.groupby().size()`

19.1.1 Basic Counting

import pandas as pd

# Create sample dataframe
df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'subcategory': ['X', 'X', 'Y', 'Z', 'X', 'Z']
})

# Count occurrences of 'category'
count_df = df.groupby('category').size().reset_index(name='count')
count_df

	category	count
0	A	3
1	B	2
2	C	1
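dplyr's count() also accepts sort = TRUE to put the most frequent values first. A rough pandas equivalent chains .sort_values() onto the counted frame. This sketch uses the subcategory column so that the reordering is visible:

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'subcategory': ['X', 'X', 'Y', 'Z', 'X', 'Z']
})

# Count each subcategory, then order by the count column (largest first),
# similar in spirit to dplyr's count(df, subcategory, sort = TRUE)
count_sorted = (
    df.groupby('subcategory')
      .size()
      .reset_index(name='count')
      .sort_values('count', ascending=False, ignore_index=True)
)
print(count_sorted)
```

Here X (3) comes first, followed by Z (2) and Y (1). ignore_index=True renumbers the rows 0, 1, 2 after sorting.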

19.1.2 Counting Multiple Columns

You can count by multiple columns, just like dplyr's count(df, category, subcategory):

# Count by both 'category' and 'subcategory'
count_multi = df.groupby(['category', 'subcategory']).size().reset_index(name='count')
count_multi

	category	subcategory	count
0	A	X	2
1	A	Y	1
2	B	X	1
3	B	Z	1
4	C	Z	1
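Missing values are handled differently. dplyr's count() keeps NA as its own group. groupby() drops rows whose key is missing unless you pass dropna=False, which has been available since pandas 1.1. Below is a small sketch that uses a hypothetical copy of the data (df_na) in which one subcategory is missing:

```python
import pandas as pd

# Hypothetical variant of the sample data; None is treated as a missing value
df_na = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'subcategory': ['X', 'X', 'Y', None, 'X', 'Z']
})

# Default behaviour: the row with a missing subcategory is left out
dropped = df_na.groupby(['category', 'subcategory']).size().reset_index(name='count')

# dropna=False keeps the missing key as its own group, closer to dplyr's count()
kept = df_na.groupby(['category', 'subcategory'], dropna=False).size().reset_index(name='count')

print(dropped)
print(kept)
```

The default result has four groups totalling 5 rows. With dropna=False, an extra (B, NaN) group brings the total back to 6.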

19.2 `.value_counts()`

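For a single column, .value_counts() is more concise. By default it sorts the result by count in descending order; pass sort=False to turn this off. In pandas 2.0 and later, .reset_index() names the counts column 'count':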

# Using value_counts() for single column
count_series = df['category'].value_counts().reset_index()
count_series

	category	count
0	A	3
1	B	2
2	C	1
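To get proportions instead of raw counts, pass normalize=True to value_counts(). In dplyr, the same result usually takes an extra mutate(prop = n / sum(n)) step. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'subcategory': ['X', 'X', 'Y', 'Z', 'X', 'Z']
})

# normalize=True returns each category's share of the rows instead of a count
props = df['category'].value_counts(normalize=True)
print(props)
```

With six rows, A gets 0.5, B about 0.333 and C about 0.167, and the shares sum to 1.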

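Calling .value_counts() on a whole DataFrame (available since pandas 1.1) counts each unique combination of values across all of its columns. Here it reproduces the multi-column groupby() result, sorted by count in descending order:

# Count unique combinations across all columns
df.value_counts().reset_index()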

	category	subcategory	count
0	A	X	2
1	A	Y	1
2	B	X	1
3	B	Z	1
4	C	Z	1
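DataFrame.value_counts() uses every column by default. As a result, an extra column such as a row identifier makes every row unique. The subset parameter restricts the count to the columns you list. In this sketch, the id column and the df_ids name are hypothetical, added only to show the effect:

```python
import pandas as pd

# Sample data plus a hypothetical 'id' column that is unique per row
df_ids = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'subcategory': ['X', 'X', 'Y', 'Z', 'X', 'Z']
})

# Counting over all columns: 'id' makes each row distinct, so every count is 1
all_columns = df_ids.value_counts()

# Counting only the listed columns gives back the category/subcategory table
by_pair = df_ids.value_counts(subset=['category', 'subcategory']).reset_index()
print(all_columns.max())
print(by_pair)
```

Selecting the columns first, as in df_ids[['category', 'subcategory']].value_counts(), gives the same counts.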