import numpy as np
import pandas as pd

13 How to Vectorize
A function written to handle one value at a time can be applied to whole numpy arrays or pandas columns in several ways, without looping through the elements one by one in your own code. Truly vectorized operations are generally much faster because they run in optimized C or Fortran code under the hood.
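As a minimal sketch of the difference, the example below computes the same squares once with an explicit Python loop and once with a single array expression. The array `values` and both result names are illustrative assumptions, not part of the original example.

```python
import numpy as np

# Hypothetical sample data, used only for illustration.
values = np.arange(1, 6)  # array([1, 2, 3, 4, 5])

# Element by element: a Python-level loop that handles one value at a time.
squares_loop = np.array([v ** 2 for v in values])

# Vectorized: one expression applied to the whole array in compiled code.
squares_vectorized = values ** 2

print(np.array_equal(squares_loop, squares_vectorized))  # True
```

Both versions give the same numbers; the second one simply leaves the looping to numpy.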
13.1 Use numpy.vectorize()
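numpy.vectorize() wraps the function so that it operates element-wise on arrays and can be called like a vectorized function. Note that the NumPy documentation describes it as provided primarily for convenience, not performance: the implementation is essentially a for loop.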
# Non-vectorized function
def my_func(x):
    return f"{x}^2 = {x**2}"

# Use numpy.vectorize() to vectorize the function
vectorized_func = np.vectorize(my_func)

# Sample DataFrame
df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
})
df["new_col"] = vectorized_func(df["col1"])
df

|   | col1 | new_col |
|---|---|---|
| 0 | 1 | 1^2 = 1 |
| 1 | 2 | 2^2 = 4 |
| 2 | 3 | 3^2 = 9 |
| 3 | 4 | 4^2 = 16 |
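To make the "wrapper around a loop" behavior concrete, here is a small sketch that compares the output of numpy.vectorize() with a plain list comprehension over the same values. Passing `otypes=[object]` is an optional choice made for this sketch; it declares the output type up front so that numpy does not have to infer it by calling the function on the first element. The names `values`, `result`, and `looped` are illustrative.

```python
import numpy as np

def my_func(x):
    return f"{x}^2 = {x**2}"

values = np.array([1, 2, 3, 4])

# Declare the output type explicitly (optional; an assumption of this sketch).
vectorized_func = np.vectorize(my_func, otypes=[object])
result = vectorized_func(values)

# The same strings produced by an ordinary Python loop over the elements.
looped = [my_func(v) for v in values.tolist()]

print(list(result) == looped)  # True
```

The call syntax is tidier with numpy.vectorize(), but the function is still invoked once per element.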
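13.2 Use pandas .apply()
Calling .apply() on a single column (a Series) invokes the function once for each value; a Series has only one axis, so no axis argument is needed. It is convenient, but the function still runs in Python once per element rather than as compiled, vectorized code.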
# Non-vectorized function
def my_func(x):
    return f"{x}^2 = {x**2}"

df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
})

# Apply the function element-wise using apply()
df['new_col'] = df['col1'].apply(my_func)
df

|   | col1 | new_col |
|---|---|---|
| 0 | 1 | 1^2 = 1 |
| 1 | 2 | 2^2 = 4 |
| 2 | 3 | 3^2 = 9 |
| 3 | 4 | 4^2 = 16 |
# With Pipe
df = (df
    .pipe(lambda d: d.assign(new_col2=d['col1'].apply(my_func))))
df

|   | col1 | new_col | new_col2 |
|---|---|---|---|
| 0 | 1 | 1^2 = 1 | 1^2 = 1 |
| 1 | 2 | 2^2 = 4 | 2^2 = 4 |
| 2 | 3 | 3^2 = 9 | 3^2 = 9 |
| 3 | 4 | 4^2 = 16 | 4^2 = 16 |
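For this particular function, the same strings can also be built from whole-column operations instead of a per-element call. The sketch below is one possible rewrite, not the only one; the names `applied`, `col`, and `combined` are illustrative assumptions.

```python
import pandas as pd

def my_func(x):
    return f"{x}^2 = {x**2}"

df = pd.DataFrame({"col1": [1, 2, 3, 4]})

# Element-wise: my_func is called once per value.
applied = df["col1"].apply(my_func)

# Column-wise: convert and concatenate entire columns at once.
col = df["col1"]
combined = col.astype(str) + "^2 = " + (col ** 2).astype(str)

print(applied.tolist() == combined.tolist())  # True
```

Whether the column-wise version is worth it depends on the function; when no such rewrite exists, .apply() remains a reasonable fallback.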
13.3 Use numpy.where() for conditional logic
If your non-vectorized function contains conditional logic, you can often replace it with numpy.where(), which is a vectorized alternative to if-else statements.
def my_func2(x):
    if x > 2:
        return x ** 2
    else:
        return x + 2

You can refactor this into a vectorized version using numpy.where():
# Vectorized conditional logic with numpy.where()
df['new_col'] = np.where(df['col1'] > 2, df['col1'] ** 2, df['col1'] + 2)
df

|   | col1 | new_col | new_col2 |
|---|---|---|---|
| 0 | 1 | 3 | 1^2 = 1 |
| 1 | 2 | 4 | 2^2 = 4 |
| 2 | 3 | 9 | 3^2 = 9 |
| 3 | 4 | 16 | 4^2 = 16 |
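A quick way to gain confidence in such a refactor is to run both versions on the same column and compare the results. The following is a minimal check along those lines; the names `elementwise` and `vectorized` are illustrative.

```python
import numpy as np
import pandas as pd

def my_func2(x):
    if x > 2:
        return x ** 2
    else:
        return x + 2

df = pd.DataFrame({"col1": [1, 2, 3, 4]})

# Original function, called once per value.
elementwise = df["col1"].apply(my_func2)

# Vectorized rewrite: condition, value if true, value if false.
vectorized = np.where(df["col1"] > 2, df["col1"] ** 2, df["col1"] + 2)

print(vectorized.tolist())                          # [3, 4, 9, 16]
print(elementwise.tolist() == vectorized.tolist())  # True
```

One difference to keep in mind: numpy.where() computes both branch expressions for every row and then selects between them, whereas the if-else version evaluates only one branch per value.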