Here is the mapping of R data types to their equivalent Python and pandas data types:
| R Data Type | Description | Python Data Type | pandas Data Type |
|---|---|---|---|
| character | String values | str | object or string |
| factor | Categorical data with fixed levels | str or Categorical | Categorical |
| double | Decimal numbers (floating-point) | float | float64 |
| integer | Whole numbers | int | int64 |
| date | Date values (without time) | datetime.date | datetime64[ns] (with date only) |
| POSIXct / date.time | Date-time values | datetime.datetime | datetime64[ns] |
| logical | Boolean values (TRUE/FALSE) | bool | bool |
| NA / NULL | Missing or null values | None, numpy.nan | NaN, None |

Below are detailed examples and explanations for each data type.

character (str or object)

- R example: "Hello", "abc"
- Python type: str
- pandas dtype: object or string

By default, strings in pandas are stored as object (a generic type that can hold anything). However, starting with pandas 1.0, you can explicitly use the string dtype.

    import pandas as pd
    df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie']})
    print(df.dtypes)  # dtype will show 'object'

Output:

    name    object
    dtype: object

You can also use the newer string dtype:

    df = pd.DataFrame({'name': pd.Series(['Alice', 'Bob', 'Charlie'], dtype='string')})
    print(df.dtypes)  # dtype will show 'string'

Output:

    name    string[python]
    dtype: object
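
An existing column can also be converted after the fact. The following is a minimal sketch, assuming a pandas version that provides the string dtype (1.0 or later); the column name and values are illustrative only.

```python
import pandas as pd

# Illustrative data; the column starts out with pandas' default dtype for strings.
df = pd.DataFrame({'name': ['Alice', 'Bob', None]})

# Convert to the dedicated string dtype; the missing value becomes pd.NA.
df['name'] = df['name'].astype('string')

# String methods are available through the .str accessor
# and propagate missing values instead of raising.
upper = df['name'].str.upper()

print(df.dtypes)
print(upper)
```

Using the dedicated dtype makes it explicit that the column holds text only, whereas an object column can silently mix strings with other Python objects.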

factor (Categorical)

- R example: factor(c("low", "medium", "high"))
- pandas type: pd.Categorical

In pandas, categorical data is represented using the Categorical data type. This is useful for columns that take a limited set of distinct values (e.g., "low", "medium", "high").

    df = pd.DataFrame({
        'rating': pd.Categorical(['low', 'medium', 'high', 'medium'], categories=['low', 'medium', 'high'], ordered=True)
    })
    print(df)
    print(df.dtypes)  # dtype will show 'category'

Output:

       rating
    0     low
    1  medium
    2    high
    3  medium
    rating    category
    dtype: object
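
For readers used to inspecting factor levels in R, the sketch below shows how the categories, the underlying integer codes, and ordered comparisons might be accessed through the `.cat` accessor. The variable names are illustrative.

```python
import pandas as pd

rating = pd.Series(pd.Categorical(
    ['low', 'medium', 'high', 'medium'],
    categories=['low', 'medium', 'high'],
    ordered=True,
))

# Roughly comparable to levels() in R.
levels = list(rating.cat.categories)

# Zero-based integer codes (as.integer() on an R factor is one-based).
codes = rating.cat.codes.tolist()

# Ordered categoricals support comparisons against a category label.
at_least_medium = rating[rating >= 'medium'].tolist()

print(levels)
print(codes)
print(at_least_medium)
```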

double (float64)

- R example: 3.14, 2.718
- Python type: float
- pandas dtype: float64

Floating-point numbers in pandas are represented as float64 by default.

    df = pd.DataFrame({'pi': [3.14, 2.718, 1.618]})
    print(df.dtypes)  # dtype will show 'float64'

Output:

    pi    float64
    dtype: object

integer (int64)

- R example: 1, 2, 3
- Python type: int
- pandas dtype: int64

In pandas, integer columns are represented as int64.

    df = pd.DataFrame({'count': [1, 2, 3, 4]})
    print(df.dtypes)  # dtype will show 'int64'

Output:

    count    int64
    dtype: object
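
One caveat worth illustrating: an int64 column cannot hold NaN, so a missing value turns the column into float64 unless a nullable integer dtype is requested. This is a minimal sketch, assuming a pandas version that provides the nullable Int64 dtype; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Plain Python integers are stored as int64.
counts = pd.Series([1, 2, 3])

# A missing value forces the column to float64.
with_nan = pd.Series([1, np.nan, 3])

# The nullable Int64 dtype (capital "I") keeps integers and stores the gap as <NA>.
nullable = pd.Series([1, None, 3], dtype='Int64')

print(counts.dtype, with_nan.dtype, nullable.dtype)
print(nullable)
```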

date (datetime.date / datetime64[ns])

- R example: as.Date("2021-01-01")
- Python type: datetime.date
- pandas dtype: datetime64[ns] (can store both date and time, but you can ignore the time part if you only need dates)

Dates can be parsed into this dtype with pd.to_datetime:

    df = pd.DataFrame({'date': pd.to_datetime(['2021-01-01', '2021-02-01', '2021-03-01'])})
    print(df)
    print(df.dtypes)  # dtype will show 'datetime64[ns]'

Output:

            date
    0 2021-01-01
    1 2021-02-01
    2 2021-03-01
    date    datetime64[ns]
    dtype: object

If you need to store just the date part without the time, you can extract it with the .dt.date accessor. Note that the result is a column of Python datetime.date objects, so its dtype is object rather than datetime64[ns]:

    df['date_only'] = df['date'].dt.date
    print(df['date_only'].head())  # dtype is now 'object'; each value is a datetime.date

Output:

    0    2021-01-01
    1    2021-02-01
    2    2021-03-01
    Name: date_only, dtype: object
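
If the column should stay a datetime64 dtype (for example, to keep using the `.dt` accessor or date arithmetic), one possible alternative is `.dt.normalize()`, which resets the time to midnight instead of converting to Python objects. A small sketch with illustrative values:

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(['2021-01-01 08:15:00', '2021-02-01 23:59:59']))

# .dt.date yields Python datetime.date objects, so the dtype becomes object.
as_objects = ts.dt.date

# .dt.normalize() sets the time part to midnight and keeps a datetime64 dtype.
normalized = ts.dt.normalize()

print(as_objects.dtype)
print(normalized)
```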

POSIXct (datetime.datetime / datetime64[ns])

- R example: as.POSIXct("2021-01-01 12:34:56")
- Python type: datetime.datetime
- pandas dtype: datetime64[ns]

For datetime values, pandas uses the datetime64[ns] dtype, which can store both date and time.

    df = pd.DataFrame({'timestamp': pd.to_datetime(['2021-01-01 12:34:56', '2021-02-01 14:30:00'])})
    print(df)
    print(df.dtypes)  # dtype will show 'datetime64[ns]'

Output:

                timestamp
    0 2021-01-01 12:34:56
    1 2021-02-01 14:30:00
    timestamp    datetime64[ns]
    dtype: object
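
Because the date and time are stored together, individual components can be pulled out through the `.dt` accessor. The sketch below also attaches a time zone, since POSIXct values in R usually carry one; the choice of UTC here is purely an assumption for illustration.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(['2021-01-01 12:34:56', '2021-02-01 14:30:00']))

# Extract individual components of each timestamp.
years = ts.dt.year.tolist()
hours = ts.dt.hour.tolist()

# Attach a time zone to naive timestamps (UTC is an illustrative choice).
utc = ts.dt.tz_localize('UTC')

print(years, hours)
print(utc)
```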

logical (bool)

- R example: TRUE, FALSE
- Python type: bool
- pandas dtype: bool

In pandas, boolean values are represented with the bool dtype.

    df = pd.DataFrame({'is_active': [True, False, True]})
    print(df.dtypes)  # dtype will show 'bool'

Output:

    is_active    bool
    dtype: object
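
As in R, a boolean column can be used directly to select rows or to count matches. A minimal sketch; the `user` column is a hypothetical addition for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['a', 'b', 'c'],          # hypothetical column for illustration
    'is_active': [True, False, True],
})

# Use the boolean column as a row mask (similar to logical indexing in R).
active = df[df['is_active']]

# ~ negates the mask, like ! in R.
inactive = df[~df['is_active']]

# True counts as 1, so sum() gives the number of active rows.
n_active = int(df['is_active'].sum())

print(active)
print(inactive)
print(n_active)
```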

NA / NULL (None or numpy.nan)

- R example: NA, NULL
- Python type: None or numpy.nan
- pandas representation: NaN or None

In pandas, missing values are typically represented by numpy.nan for numeric data and None for object or string data.

    import numpy as np
    df = pd.DataFrame({'numbers': [1, np.nan, 3], 'names': ['Alice', None, 'Charlie']})
    print(df)
    print(df.dtypes)  # 'numbers' is 'float64', 'names' is 'object'

Output:

       numbers    names
    0      1.0    Alice
    1      NaN     None
    2      3.0  Charlie
    numbers    float64
    names       object
    dtype: object
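
Detecting and handling these missing values might look like the following sketch, which reuses the same illustrative data. The fill values (0 and 'unknown') are arbitrary placeholders, not a recommendation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'numbers': [1, np.nan, 3], 'names': ['Alice', None, 'Charlie']})

# isna() flags both NaN and None, similar in spirit to is.na() in R.
missing_counts = df.isna().sum()

# Keep only rows without any missing value.
complete_rows = df.dropna()

# Or replace missing values column by column with placeholder values.
filled = df.fillna({'numbers': 0, 'names': 'unknown'})

print(missing_counts)
print(complete_rows)
print(filled)
```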