16  Nested DF

To unnest a column containing nested pandas Series (or lists, dictionaries, or similar structures) into individual rows or columns, pandas offers explode() for rows and apply(pd.Series) combined with pd.concat() for columns. This operation is similar to R’s tidyr::unnest() function.

Here’s how you can unnest Series in a pandas DataFrame:


16.1 Unnesting a DataFrame Containing Series or Dicts

import pandas as pd

# Sample DataFrame with nested Series
data = {
    "id": [1, 2, 3],
    "nested_series": [pd.Series([10, 20, 30]), pd.Series([40, 50]), pd.Series([60])]
}

df = pd.DataFrame(data)
df
   id                nested_series
0   1  0 10 1 20 2 30 dtype: int64
1   2       0 40 1 50 dtype: int64
2   3            0 60 dtype: int64
data2 = {
    "id": [1, 2, 3],
    "nested_dict": [{"A": 1, "B": 2}, {"A": 2, "B": 4},
    {"A": 5, "B": 10}]
}

df2 = pd.DataFrame(data2)
df2
   id        nested_dict
0   1   {'A': 1, 'B': 2}
1   2   {'A': 2, 'B': 4}
2   3  {'A': 5, 'B': 10}

16.1.1 Unnest into Rows

To unnest the nested_series column into rows, use explode(). It expands each nested structure into separate rows.

# Unnest the nested_series into separate rows
df_unnested = df.explode('nested_series').reset_index(drop=True)
df_unnested
   id nested_series
0   1            10
1   1            20
2   1            30
3   2            40
4   2            50
5   3            60
# Unnest the nested_dict column into separate rows (explode() keeps only the dict keys)
df_unnested2 = df2.explode('nested_dict').reset_index(drop=True)
df_unnested2
   id nested_dict
0   1           A
1   1           B
2   2           A
3   2           B
4   3           A
5   3           B

Explanation:

  • explode() splits the Series in the nested_series column into individual rows.
  • reset_index() is used to reindex the resulting DataFrame for cleaner output.
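
explode() iterates over a dict's keys, so the values in nested_dict are dropped in df_unnested2. One way to keep them is to convert each dict into a list of (key, value) pairs before exploding. The following is a minimal sketch under that assumption; the names pairs, kv, and df2_long, and the key and value column names, are illustrative and do not come from the original example.

```python
import pandas as pd

df2 = pd.DataFrame({
    "id": [1, 2, 3],
    "nested_dict": [{"A": 1, "B": 2}, {"A": 2, "B": 4}, {"A": 5, "B": 10}],
})

# Replace each dict with a list of (key, value) tuples so explode() keeps both parts
pairs = df2.assign(nested_dict=df2["nested_dict"].apply(lambda d: list(d.items())))

# One row per (key, value) tuple; ignore_index=True renumbers the rows from 0
exploded = pairs.explode("nested_dict", ignore_index=True)

# Split the tuples into two named columns (hypothetical names: key, value)
kv = pd.DataFrame(exploded["nested_dict"].tolist(),
                  columns=["key", "value"], index=exploded.index)
df2_long = pd.concat([exploded.drop(columns="nested_dict"), kv], axis=1)

print(df2_long)
```

The result has one row per id and key, with the corresponding value alongside it.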

16.1.2 Unnest into Columns

16.1.2.1 Series

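If you want to unnest the Series into multiple columns, expand them with .apply(pd.Series) and attach the result to the remaining columns with pd.concat. Series shorter than the longest one are padded with NaN, which is why the integer values are displayed as floats.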

# Unnest the nested_series into separate columns
df_unnested_cols = pd.concat(
    [df.drop(columns='nested_series'), df['nested_series'].apply(pd.Series)], axis=1
    )
df_unnested_cols
   id     0     1     2
0   1  10.0  20.0  30.0
1   2  40.0  50.0   NaN
2   3  60.0   NaN   NaN
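
The float columns above are a side effect of padding with NaN. If the values should remain integers, one option is pandas' nullable Int64 dtype, which represents missing entries as <NA>. A minimal sketch, assuming every nested value is a whole number; expanded and df_wide are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "nested_series": [pd.Series([10, 20, 30]), pd.Series([40, 50]), pd.Series([60])],
})

# Expand each Series into columns; shorter Series are padded with NaN (float64)
expanded = df["nested_series"].apply(pd.Series)

# Cast to the nullable integer dtype: values stay integers, gaps become <NA>
expanded = expanded.astype("Int64")

df_wide = pd.concat([df.drop(columns="nested_series"), expanded], axis=1)

print(df_wide)
print(df_wide.dtypes)
```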

16.1.2.2 Dict

pd.concat(
    [df2.drop(columns="nested_dict"),
    df2['nested_dict'].apply(pd.Series)], axis=1
)
   id  A   B
0   1  1   2
1   2  2   4
2   3  5  10

Explanation:

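  • df['nested_series'].apply(pd.Series) converts each Series in the column into one row of a new DataFrame, with one column per index label (for nested_dict, one column per key).
  • pd.concat(..., axis=1) places these new columns next to the remaining columns of the original DataFrame.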
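
For a column of dicts, building a DataFrame directly from the list of dicts is a common alternative to apply(pd.Series), and it tends to be faster on large frames because it avoids constructing one Series per row. A minimal sketch, assuming every cell holds a dict; expanded and df2_wide are illustrative names.

```python
import pandas as pd

df2 = pd.DataFrame({
    "id": [1, 2, 3],
    "nested_dict": [{"A": 1, "B": 2}, {"A": 2, "B": 4}, {"A": 5, "B": 10}],
})

# Build the new columns in one step from the list of dicts;
# passing index=df2.index keeps the rows aligned with the original frame
expanded = pd.DataFrame(df2["nested_dict"].tolist(), index=df2.index)

# join() aligns on the index, so each id stays next to its own A and B values
df2_wide = df2.drop(columns="nested_dict").join(expanded)

print(df2_wide)
```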

16.1.3 Unnest into a Long Format

To keep track of where each value sat inside its original Series, unnest with explode() and then add a position column.

# Add an identifier column for position in nested Series
df_long = df.explode('nested_series').reset_index(drop=True)
df_long['position'] = df_long.groupby('id').cumcount() + 1

df_long
   id nested_series  position
0   1            10         1
1   1            20         2
2   1            30         3
3   2            40         1
4   2            50         2
5   3            60         1

Explanation:

  • explode() unnests the Series into rows.
  • groupby('id').cumcount() numbers the rows within each id starting from 0; adding 1 gives each value's 1-based position within its original Series.
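
Grouping by id gives correct positions only when each id belongs to a single original row. Because explode() repeats the original index label for every element it produces, grouping on that index is one way to number elements per source row even when ids repeat. A minimal sketch; the sample frame below, with a repeated id and plain lists instead of Series, is hypothetical.

```python
import pandas as pd

# Hypothetical data: id 1 appears in two different rows
df_dup = pd.DataFrame({
    "id": [1, 1, 3],
    "nested_series": [[10, 20, 30], [40, 50], [60]],
})

# Keep the original index for now: exploded rows share their source row's label
exploded = df_dup.explode("nested_series")

# Count within each original row (index level 0) rather than within each id
exploded["position"] = exploded.groupby(level=0).cumcount() + 1

# Only now replace the repeated index with a clean 0..n-1 range
df_long_safe = exploded.reset_index(drop=True)

print(df_long_safe)
```

With groupby('id') the second row's values would have continued the count from the first row (4 and 5) instead of restarting at 1.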

16.1.4 Summary of Methods:

Goal                      Method                             Example
Unnest into rows          explode()                          df.explode('column')
Unnest into columns       apply(pd.Series) + pd.concat()     pd.concat([...], axis=1)
Unnest into long format   explode() + groupby().cumcount()   df.explode('column').reset_index()
