11  Pandas - Mutate

import pandas as pd

Key pandas techniques to mimic dplyr::mutate():

  1. Direct Column Assignment
  2. assign() Method
  3. apply() Method
  4. Vectorized Operations
  5. pipe() Method

11.1 Direct Column Assignment

# Sample DataFrame
df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [5, 6, 7, 8]
})

# Create a new column by adding existing columns (similar to dplyr::mutate)
df['new_col'] = df['col1'] + df['col2']
df
   col1  col2  new_col
0     1     5        6
1     2     6        8
2     3     7       10
3     4     8       12

11.2 Using the assign() Method

# Using assign() to add or modify multiple columns (similar to dplyr::mutate)
df = df.assign(
    new_col=df['col1'] + df['col2'],
    new_col2=df['col1'] * df['col2']
)
df
   col1  col2  new_col  new_col2
0     1     5        6         5
1     2     6        8        12
2     3     7       10        21
3     4     8       12        32
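
One detail brings assign() closer to dplyr::mutate(): a keyword argument may be a callable, which receives the DataFrame as built so far, so a later column can depend on one created earlier in the same call. The following is a minimal sketch of that pattern. The column names total and share are illustrative only, and the ordering behaviour assumes pandas 0.23 or later.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [5, 6, 7, 8]
})

# Each lambda receives the DataFrame as built so far, so `share`
# can refer to `total` even though both are created in the same call.
# (`total` and `share` are illustrative names, not from the example above.)
result = df.assign(
    total=lambda d: d['col1'] + d['col2'],
    share=lambda d: d['col1'] / d['total'],
)

# assign() returns a new DataFrame; the original df is left unchanged.
print(result)
print(list(df.columns))
```

Unlike direct column assignment, this leaves df untouched unless the result is assigned back to it.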

11.3 Using the apply() Method

For logic that is hard to express as column arithmetic, use apply(). With axis=1 it calls your function once per row, which suits row-wise operations and custom functions that are not easily vectorized, at the cost of being slower than vectorized code on large DataFrames.

# Apply a custom function row by row (axis=1); this overwrites the existing new_col
df['new_col'] = df.apply(lambda row: row['col1'] + row['col2'] * 2, axis=1)
df
   col1  col2  new_col  new_col2
0     1     5       11         5
1     2     6       14        12
2     3     7       17        21
3     4     8       20        32
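
The lambda above is plain arithmetic, so as a sketch of the kind of row-wise logic apply() is better suited to, here is a custom function with branching. The function name label_row and its thresholds are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [5, 6, 7, 8]
})

def label_row(row):
    """Hypothetical rule: classify a row by the sum of its two columns."""
    total = row['col1'] + row['col2']
    if total < 8:
        return 'low'
    if total < 12:
        return 'mid'
    return 'high'

# axis=1 passes each row to label_row as a Series
df['label'] = df.apply(label_row, axis=1)
print(df)
```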

11.4 Using Vectorized Operations for Efficiency

pandas supports vectorized operations: arithmetic and comparisons on whole columns are applied element-wise, with no need for apply() or explicit loops. Because the work runs in optimized compiled code rather than in a Python-level loop, this is usually much faster and is the natural choice whenever the operation is arithmetic or otherwise straightforward.

# Vectorized operations to create a new column
df['new_col'] = df['col1'] / df['col2']
df
   col1  col2   new_col  new_col2
0     1     5  0.200000         5
1     2     6  0.333333        12
2     3     7  0.428571        21
3     4     8  0.500000        32
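
To make the comparison with apply() concrete, the sketch below rewrites the row-wise expression from section 11.3 as a single column expression and checks that both give the same values. The final step, which uses numpy.where to vectorize a simple condition, and the column name size are additions for illustration rather than part of the original example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [5, 6, 7, 8]
})

# Row-wise version: calls the lambda once per row
via_apply = df.apply(lambda row: row['col1'] + row['col2'] * 2, axis=1)

# Vectorized version: one expression over whole columns
via_vector = df['col1'] + df['col2'] * 2

# Check that both approaches produce the same values
same = bool((via_apply == via_vector).all())
print(same)

# A simple condition can also be vectorized with numpy.where
# ('size' is an illustrative column name)
df['size'] = np.where(df['col1'] + df['col2'] >= 10, 'big', 'small')
print(df)
```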

11.5 Using the pipe() Method

# Chaining operations with pipe (similar to R's %>%)
df = (df
      .pipe(lambda x: x.assign(new_col=x['col1'] + x['col2']))
      .pipe(lambda x: x.assign(new_col2=x['col1'] * x['col2']))
     )

df
   col1  col2  new_col  new_col2
0     1     5        6         5
1     2     6        8        12
2     3     7       10        21
3     4     8       12        32
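
The same chain can also be written with named functions instead of lambdas, since pipe() passes the DataFrame as the first argument and forwards any further arguments to the function. This is a minimal sketch; the helpers add_sum and add_product are hypothetical and exist only for this illustration.

```python
import pandas as pd

def add_sum(data, left, right, name):
    """Hypothetical helper: add a column holding left + right."""
    return data.assign(**{name: data[left] + data[right]})

def add_product(data, left, right, name):
    """Hypothetical helper: add a column holding left * right."""
    return data.assign(**{name: data[left] * data[right]})

df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [5, 6, 7, 8]
})

# pipe() calls func(df, *args, **kwargs) and returns its result
result = (df
          .pipe(add_sum, 'col1', 'col2', name='new_col')
          .pipe(add_product, 'col1', 'col2', name='new_col2')
         )
print(result)
```

Named steps like these tend to be easier to reuse and test than inline lambdas, which is one reason to reach for pipe() over repeated assign() calls.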