1  Data Type

Here is the mapping of R data types to their equivalent Python and pandas data types:

R Data Type  Description                         Python Data Type       pandas Data Type
-----------  ----------------------------------  ---------------------  ---------------------------------
character    String values                       str                    object or string
factor       Categorical data with fixed levels  str or pd.Categorical  category (pd.Categorical)
double       Decimal numbers (floating-point)    float                  float64
integer      Whole numbers                       int                    int64
Date         Date values (without time)          datetime.date          datetime64[ns] (time at midnight)
POSIXct      Date-time values                    datetime.datetime      datetime64[ns]
logical      Boolean values (TRUE/FALSE)         bool                   bool
NA / NULL    Missing or null values              None, numpy.nan        NaN, None

Below are detailed examples and explanations for each data type.


1.1 Character → String (str or object)

  • R: "Hello", "abc"
  • Python: str
  • pandas: In pandas 2.x and earlier, string columns are stored as object by default (a generic dtype that can hold any Python object). Since pandas 1.0, you can also opt in to the dedicated string dtype.

import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie']})
print(df.dtypes)  # dtype will show 'object'
name    object
dtype: object

You can also use the newer string dtype:

df = pd.DataFrame({'name': pd.Series(['Alice', 'Bob', 'Charlie'], dtype='string')})
print(df.dtypes)  # dtype will show 'string' (printed as string[python] in the output below)
name    string[python]
dtype: object
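
To illustrate one practical difference between the two representations, here is a minimal sketch that converts an object column to the string dtype. The series name names and its values are made up for illustration, and dtype=object is set explicitly so the starting point does not depend on the pandas version.

```python
import pandas as pd

# An object column with one missing entry (illustrative data)
names = pd.Series(['Alice', None, 'Charlie'], dtype=object)

# Convert to the dedicated string dtype; the missing entry is kept as missing
names_str = names.astype('string')

# String methods skip over the missing entry instead of raising an error
upper = names_str.str.upper()
lengths = names_str.str.len()

print(names_str.dtype)
print(upper.tolist())
print(lengths.tolist())
```

The closest R counterparts are toupper() and nchar(), which also work element by element on a character vector.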

1.2 Factor → Categorical (Categorical)

  • R: factor(c("low", "medium", "high"))
  • Python / pandas: pd.Categorical

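In pandas, categorical data is represented by the category dtype, created here with pd.Categorical. It suits columns that take a limited set of distinct values (e.g., “low”, “medium”, “high”). Passing ordered=True corresponds to an ordered factor in R; note that when no levels are given, R sorts them alphabetically, so the level order is spelled out explicitly below.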

df = pd.DataFrame({
    'rating': pd.Categorical(
        ['low', 'medium', 'high', 'medium'],
        categories=['low', 'medium', 'high'],  # explicit level order
        ordered=True,
    )
})

print(df)
print(df.dtypes)  # dtype will show 'category'
   rating
0     low
1  medium
2    high
3  medium
rating    category
dtype: object
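
As a rough analogue of levels() and as.integer() on an R factor, the sketch below inspects the categories and integer codes of the same rating data and uses the ordering in a comparison. The variable names are illustrative only.

```python
import pandas as pd

rating = pd.Series(pd.Categorical(
    ['low', 'medium', 'high', 'medium'],
    categories=['low', 'medium', 'high'],
    ordered=True,
))

# Comparable to levels(x) in R
levels = list(rating.cat.categories)

# Comparable to as.integer(x) in R, except that pandas codes start at 0
codes = rating.cat.codes.tolist()

# Ordered categories support comparisons against a category label
at_least_medium = (rating >= 'medium').tolist()

print(levels)           # ['low', 'medium', 'high']
print(codes)            # [0, 1, 2, 1]
print(at_least_medium)  # [False, True, True, True]
```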

1.3 Double → Float (float64)

  • R: 3.14, 2.718
  • Python: float
  • pandas: float64

Floating-point numbers in pandas are represented as float64 by default.

df = pd.DataFrame({'pi': [3.14, 2.718, 1.618]})
print(df.dtypes)  # dtype will show 'float64'
pi    float64
dtype: object
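
Both R doubles and float64 are IEEE 754 double-precision values, so the usual rounding caveats carry over. The following sketch, with made-up column names a and b, shows why exact equality on floats can surprise and how numpy.isclose offers a tolerance-based check (similar in spirit to all.equal() in R).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0.1, 0.5], 'b': [0.2, 0.25]})
total = df['a'] + df['b']

# Exact comparison: 0.1 + 0.2 is not exactly 0.3 in binary floating point
exact = (total == pd.Series([0.3, 0.75])).tolist()

# Tolerance-based comparison
close = np.isclose(total, [0.3, 0.75]).tolist()

print(exact)  # [False, True]
print(close)  # [True, True]
```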

1.4 Integer → Integer (int64)

  • R: 1L, 2L, 3L (a plain 1 is a double in R; the L suffix makes it an integer)
  • Python: int
  • pandas: int64

In pandas, whole-number columns are inferred as int64 by default. R integers are 32-bit, so the pandas default covers a wider range.

df = pd.DataFrame({'count': [1, 2, 3, 4]})
print(df.dtypes)  # dtype will show 'int64'
count    int64
dtype: object
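
If the width of the integer matters, or if an integer column has to hold missing values, the dtype can be chosen explicitly. This is a small sketch with illustrative names; Int64 (capital I) is the pandas nullable integer dtype.

```python
import pandas as pd

counts = pd.Series([1, 2, 3, 4])

# Match R's 32-bit integer width explicitly
counts32 = counts.astype('int32')

# Nullable integer dtype: keeps whole numbers alongside a missing value
with_missing = pd.Series([1, None, 3], dtype='Int64')

print(counts32.dtype)      # int32
print(with_missing.dtype)  # Int64
print(with_missing.sum())  # 4 (the missing value is skipped)
```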

1.5 Date → datetime.date / datetime64[ns]

  • R: as.Date("2021-01-01")
  • Python: datetime.date
  • pandas: datetime64[ns] (always stores both a date and a time; date-only values simply get a time of midnight).

df = pd.DataFrame({'date': pd.to_datetime(['2021-01-01', '2021-02-01', '2021-03-01'])})
print(df)
print(df.dtypes)  # dtype will show 'datetime64[ns]'
        date
0 2021-01-01
1 2021-02-01
2 2021-03-01
date    datetime64[ns]
dtype: object

If you need just the date part without the time, you can extract it with .dt.date. Note that this converts the column to object dtype, with each value stored as a Python datetime.date:

df['date_only'] = df['date'].dt.date
print(df['date_only'].head())  # dtype becomes 'object'; each value is a datetime.date
0    2021-01-01
1    2021-02-01
2    2021-03-01
Name: date_only, dtype: object
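
Because .dt.date leaves the datetime dtype behind, an alternative worth knowing is .dt.normalize(), which resets the time to midnight but keeps a datetime column. The sketch below contrasts the two on illustrative timestamps; it is one possible approach, not the only one.

```python
import datetime
import pandas as pd

ts = pd.Series(pd.to_datetime(['2021-01-01 08:15:00', '2021-02-01 23:59:59']))

# Keeps a datetime dtype; the time of day is reset to 00:00:00
midnight = ts.dt.normalize()

# Switches to object dtype holding datetime.date values
as_dates = ts.dt.date

print(midnight.dt.hour.tolist())  # [0, 0]
print(as_dates.dtype)             # object
print(type(as_dates.iloc[0]))     # <class 'datetime.date'>
```

Keeping the datetime dtype preserves access to the .dt accessor and to date arithmetic on the whole column.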

1.6 POSIXct / Date-Time → datetime.datetime / datetime64[ns]

  • R: as.POSIXct("2021-01-01 12:34:56")
  • Python: datetime.datetime
  • pandas: datetime64[ns]

For datetime values, pandas uses the datetime64[ns] dtype, which can store both date and time.

df = pd.DataFrame({'timestamp': pd.to_datetime(['2021-01-01 12:34:56', '2021-02-01 14:30:00'])})
print(df)
print(df.dtypes)  # dtype will show 'datetime64[ns]'
            timestamp
0 2021-01-01 12:34:56
1 2021-02-01 14:30:00
timestamp    datetime64[ns]
dtype: object
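
POSIXct values in R can carry a time zone, and pandas timestamps can too. As a hedged illustration, the sketch below assumes the example timestamps are in UTC (the original does not say) and converts them to New York time; the zone names are chosen only for the example.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(['2021-01-01 12:34:56', '2021-02-01 14:30:00']))

# Attach a time zone to naive timestamps (assumption: they are UTC)
utc = ts.dt.tz_localize('UTC')

# Convert the same instants to another zone
new_york = utc.dt.tz_convert('America/New_York')

print(utc.dt.hour.tolist())       # [12, 14]
print(new_york.dt.hour.tolist())  # [7, 9]
```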

1.7 Logical → Boolean (bool)

  • R: TRUE, FALSE
  • Python: bool
  • pandas: bool

In pandas, boolean values are represented with the bool dtype.

df = pd.DataFrame({'is_active': [True, False, True]})
print(df.dtypes)  # dtype will show 'bool'
is_active    bool
dtype: object
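
As with logical vectors in R, a bool column can be summed, averaged, and used as a row filter. Here is a minimal sketch; the user column is added purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['a', 'b', 'c'],
    'is_active': [True, False, True],
})

# True counts as 1, so sum() counts the True values (like sum(x) in R)
n_active = df['is_active'].sum()

# Boolean mask selects rows; ~ negates it (like ! in R)
active = df[df['is_active']]
inactive = df[~df['is_active']]

print(n_active)                  # 2
print(active['user'].tolist())   # ['a', 'c']
print(inactive['user'].tolist()) # ['b']
```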

1.8 NA / NULL → None or numpy.nan

  • R: NA (a missing value inside a vector), NULL (the absence of an object)
  • Python: None or numpy.nan
  • pandas: NaN or None

In pandas, missing values are typically represented by numpy.nan for numeric data, None (or NaN) for object or string data, and NaT for datetime data. The isna() method detects all of these.

import numpy as np

df = pd.DataFrame({'numbers': [1, np.nan, 3], 'names': ['Alice', None, 'Charlie']})
print(df)
print(df.dtypes)  # 'numbers' is 'float64' (NaN forces the integers to float), 'names' is 'object'
   numbers    names
0      1.0    Alice
1      NaN     None
2      3.0  Charlie
numbers    float64
names       object
dtype: object
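
The usual R helpers is.na(), na.omit(), and replacing NA values have close pandas counterparts. The sketch below applies them to the same example frame; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'numbers': [1, np.nan, 3], 'names': ['Alice', None, 'Charlie']})

# Count missing values per column (comparable to colSums(is.na(df)) in R)
missing_counts = df.isna().sum()

# Replace missing numbers with 0
filled = df['numbers'].fillna(0)

# Drop rows that contain any missing value (comparable to na.omit(df) in R)
complete = df.dropna()

print(missing_counts.to_dict())  # {'numbers': 1, 'names': 1}
print(filled.tolist())           # [1.0, 0.0, 3.0]
print(len(complete))             # 2
```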