11 Pandas - Mutate

import pandas as pd

Key pandas techniques to mimic dplyr::mutate():

Direct Column Assignment
assign() Method
apply() Method
Vectorized Operations
Chaining with pipe()

11.1 Direct Column Assignment

# Sample DataFrame
df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [5, 6, 7, 8]
})

# Create a new column by adding existing columns (similar to dplyr::mutate)
df['new_col'] = df['col1'] + df['col2']
df

	col1	col2	new_col
0	1	5	6
1	2	6	8
2	3	7	10
3	4	8	12

11.2 Using the `assign()` Method

# assign() returns a new DataFrame with columns added or overwritten (similar to dplyr::mutate)
df = df.assign(
    new_col = df['col1'] + df['col2'],
    new_col2 = df['col1'] * df['col2']
)
df

	col1	col2	new_col	new_col2
0	1	5	6	5
1	2	6	8	12
2	3	7	10	21
3	4	8	12	32
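
Within a single assign() call, a later keyword can use a column created by an earlier one if its value is passed as a callable. This is closer to how dplyr::mutate() lets later expressions build on earlier ones. A minimal sketch, assuming pandas 0.23 or later on Python 3.6+; the column names `total` and `share` are illustrative and do not appear in the examples above:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [5, 6, 7, 8]
})

# Each lambda receives the DataFrame as built so far,
# so `share` can refer to `total` from the same call.
result = df.assign(
    total=lambda d: d['col1'] + d['col2'],
    share=lambda d: d['col1'] / d['total'],
)
result
```

Because assign() returns a new object, `df` itself is left unchanged unless the result is assigned back to it.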

11.3 Using the `apply()` Method

If you need to apply a more complex function to rows or columns, you can use apply(). With axis=1 the function is called once per row and receives that row as a Series, which is useful for row-wise logic or custom functions that aren’t easily vectorized.

# Apply a custom function row by row (similar to dplyr::mutate); this overwrites new_col
df['new_col'] = df.apply(lambda row: row['col1'] + row['col2'] * 2, axis=1)
df

	col1	col2	new_col	new_col2
0	1	5	11	5
1	2	6	14	12
2	3	7	17	21
3	4	8	20	32
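
The lambda above is simple arithmetic, so it does not show where apply() is really needed. The sketch below uses a named function with branching logic instead. The function `label_row`, its labelling rule, and the `label` column are hypothetical and chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [5, 6, 7, 8]
})

def label_row(row):
    """Hypothetical rule: classify a row from both of its columns."""
    if row['col1'] % 2 == 0 and row['col2'] > 6:
        return 'even_high'
    if row['col1'] % 2 == 0:
        return 'even_low'
    return 'odd'

# axis=1 passes each row to label_row as a Series
df['label'] = df.apply(label_row, axis=1)
df
```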

11.4 Using Vectorized Operations for Efficiency

Pandas supports vectorized operations: arithmetic on whole columns is applied element-wise, without apply() or explicit loops. This is usually much faster than row-wise apply() and is the preferred approach when the operation is arithmetic or otherwise straightforward.

# Vectorized division of one column by another; this overwrites new_col
df['new_col'] = df['col1'] / df['col2']
df

	col1	col2	new_col	new_col2
0	1	5	0.200000	5
1	2	6	0.333333	12
2	3	7	0.428571	21
3	4	8	0.500000	32
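
To make the comparison with apply() concrete, the row-wise expression col1 + col2 * 2 can be written directly on the columns. A minimal sketch that computes it both ways and checks that the results agree; the variable names `row_wise`, `vectorized`, and `same` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [5, 6, 7, 8]
})

# Row-wise: the lambda runs once per row
row_wise = df.apply(lambda row: row['col1'] + row['col2'] * 2, axis=1)

# Vectorized: one expression over whole columns
vectorized = df['col1'] + df['col2'] * 2

# Element-wise comparison of the two results
same = bool((row_wise == vectorized).all())
same
```

Both versions give the same values; the vectorized form avoids a Python-level function call per row.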

11.5 Chaining with the `pipe()` Method

# Chaining operations with pipe (similar to R's %>%)
df = (df
      .pipe(lambda x: x.assign(new_col=x['col1'] + x['col2']))
      .pipe(lambda x: x.assign(new_col2=x['col1'] * x['col2']))
     )

df

	col1	col2	new_col	new_col2
0	1	5	6	5
1	2	6	8	12
2	3	7	10	21
3	4	8	12	32
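
pipe() also forwards extra positional and keyword arguments to the function it calls, so named, reusable steps can replace the lambdas. A minimal sketch; the helper functions `add_sum` and `add_product` are hypothetical and not part of pandas:

```python
import pandas as pd

def add_sum(data, left, right, name):
    """Hypothetical helper: add a column holding left + right."""
    return data.assign(**{name: data[left] + data[right]})

def add_product(data, left, right, name):
    """Hypothetical helper: add a column holding left * right."""
    return data.assign(**{name: data[left] * data[right]})

df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [5, 6, 7, 8]
})

# pipe(func, *args, **kwargs) calls func(df, *args, **kwargs)
result = (df
          .pipe(add_sum, 'col1', 'col2', name='new_col')
          .pipe(add_product, 'col1', 'col2', name='new_col2')
         )
result
```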