import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie']})
print(df.dtypes)  # dtype will show 'object'
name object
dtype: object
Here is the mapping of R data types to their equivalent Python and pandas
data types:
R Data Type | Description | Python Data Type | pandas Data Type |
---|---|---|---|
character | String values | str | object or string |
factor | Categorical data with fixed levels | str or Categorical | Categorical |
double | Decimal numbers (floating-point) | float | float64 |
integer | Whole numbers | int | int64 |
date | Date values (without time) | datetime.date | datetime64[ns] (with date only) |
POSIXct / date.time | Date-time values | datetime.datetime | datetime64[ns] |
logical | Boolean values (TRUE/FALSE) | bool | bool |
NA / NULL | Missing or null values | None, numpy.nan | NaN, None |
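
As a quick way to see several rows of this mapping at once, the following minimal sketch builds one small DataFrame and prints its dtypes. The column names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# One illustrative column per kind of value (column names are made up).
df = pd.DataFrame({
    "name": ["Alice", "Bob"],                                # character
    "rating": pd.Categorical(["low", "high"]),               # factor
    "score": [3.14, 2.718],                                  # double
    "count": [1, 2],                                         # integer
    "joined": pd.to_datetime(["2021-01-01", "2021-02-01"]),  # date / POSIXct
    "is_active": [True, False],                              # logical
})

# Each column reports the pandas dtype that was inferred for it.
print(df.dtypes)
```
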
Below are detailed examples and explanations for each data type.
character (str or object)
R: "Hello", "abc"
Python: str
pandas: strings are stored as object (a generic type that can hold anything). However, starting with pandas 1.0, you can explicitly use the string dtype.

import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie']})
print(df.dtypes)  # dtype will show 'object'
name object
dtype: object
You can also use the newer string dtype:

df = pd.DataFrame({'name': pd.Series(['Alice', 'Bob', 'Charlie'], dtype='string')})
print(df.dtypes)  # dtype will show 'string'
name string[python]
dtype: object
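
An existing column can also be converted to the string dtype after the fact. The sketch below assumes a hypothetical Series with one missing entry and shows that missing-value checks and the `.str` methods still work after the conversion.

```python
import pandas as pd

# Hypothetical data: three names, one of them missing.
names_default = pd.Series(["Alice", None, "Charlie"])  # dtype inferred by pandas
names_string = names_default.astype("string")          # explicit string dtype

print(names_string)
print(names_string.isna().tolist())       # which entries are missing
print(names_string.str.upper().tolist())  # string methods skip the missing entry
```
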
factor (Categorical)
R: factor(c("low", "medium", "high"))
pandas: pd.Categorical
In Python, categorical data is represented using the pandas Categorical data type. This is useful for variables that take a limited set of distinct values (e.g., "low", "medium", "high").

df = pd.DataFrame({
    'rating': pd.Categorical(
        ['low', 'medium', 'high', 'medium'],
        categories=['low', 'medium', 'high'],
        ordered=True,
    )
})
print(df)
print(df.dtypes)  # dtype will show 'category'
rating
0 low
1 medium
2 high
3 medium
rating category
dtype: object
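
Because the categories above are declared with `ordered=True`, the column behaves much like an ordered factor in R. The following sketch, using the same values, shows how to inspect the levels and the underlying integer codes and how to compare against a level. Note that pandas codes are zero-based, unlike R's one-based factor codes.

```python
import pandas as pd

rating = pd.Series(pd.Categorical(
    ["low", "medium", "high", "medium"],
    categories=["low", "medium", "high"],
    ordered=True,
))

levels = list(rating.cat.categories)   # roughly what levels() returns in R
codes = rating.cat.codes.tolist()      # zero-based integer code per value
above_low = (rating > "low").tolist()  # comparisons work because ordered=True
highest = rating.max()                 # largest category present in the data

print(levels, codes, above_low, highest)
```
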
double (float64)
R: 3.14, 2.718
Python: float
pandas: float64
Floating-point numbers in pandas are represented as float64 by default.

df = pd.DataFrame({'pi': [3.14, 2.718, 1.618]})
print(df.dtypes)  # dtype will show 'float64'
pi float64
dtype: object
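
The float64 default also applies when whole numbers and decimals are mixed in one column. A minimal sketch with made-up values, which also shows an explicit conversion to the smaller float32 type:

```python
import pandas as pd

# Mixing whole numbers and decimals yields a single floating-point column.
mixed = pd.Series([1, 2.5, 3])
print(mixed.dtype)

# An explicit cast trades precision for memory.
small = mixed.astype("float32")
print(small.dtype)
```
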
integer (int64)
R: 1L, 2L, 3L (a bare 1 is stored as a double in R)
Python: int
pandas: int64
In pandas, integer columns are represented as int64 by default.

df = pd.DataFrame({'count': [1, 2, 3, 4]})
print(df.dtypes)  # dtype will show 'int64'
count int64
dtype: object
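
One difference from R is worth a short illustration: an R integer vector can contain NA, while the default pandas int64 dtype cannot hold missing values. The sketch below, with made-up values, shows that a missing entry turns the column into float64 unless the nullable `Int64` dtype (capital I) is requested.

```python
import pandas as pd

# With default inference, a missing value forces a floating-point column.
with_gap_default = pd.Series([1, None, 3])
print(with_gap_default.dtype)

# The nullable integer dtype keeps whole numbers alongside a missing value.
with_gap_nullable = pd.Series([1, None, 3], dtype="Int64")
print(with_gap_nullable.dtype)

# Aggregations skip the missing entry by default.
total = int(with_gap_nullable.sum())
print(total)
```
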
date (datetime.date / datetime64[ns])
R: as.Date("2021-01-01")
Python: datetime.date
pandas: datetime64[ns] (can store both date and time, but you can ignore the time part if you only need dates).

df = pd.DataFrame({'date': pd.to_datetime(['2021-01-01', '2021-02-01', '2021-03-01'])})
print(df)
print(df.dtypes)  # dtype will show 'datetime64[ns]'
date
0 2021-01-01
1 2021-02-01
2 2021-03-01
date datetime64[ns]
dtype: object
If you need to store just the date part without the time, you can extract it with the .dt.date accessor. The result is a column of Python datetime.date objects, so its dtype is object rather than datetime64[ns]:

df['date_only'] = df['date'].dt.date
print(df['date_only'].head())  # dtype is 'object'; each value is a datetime.date
0 2021-01-01
1 2021-02-01
2 2021-03-01
Name: date_only, dtype: object
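
If the column should stay a datetime column, for example to keep the `.dt` accessor available, one alternative is to reset the time to midnight instead of converting to `datetime.date` objects. This is a sketch of that alternative on made-up timestamps, not the only way to handle date-only data.

```python
import datetime
import pandas as pd

stamps = pd.Series(pd.to_datetime(["2021-01-01 12:34:56", "2021-02-01 14:30:00"]))

# normalize() keeps the datetime dtype and sets every time to 00:00:00.
midnight = stamps.dt.normalize()

# .dt.date returns plain datetime.date objects in an object column.
as_dates = stamps.dt.date

print(midnight.dtype, as_dates.dtype)
```
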
POSIXct / date.time (datetime.datetime / datetime64[ns])
R: as.POSIXct("2021-01-01 12:34:56")
Python: datetime.datetime
pandas: datetime64[ns]
For datetime values, pandas uses the datetime64[ns] dtype, which can store both date and time.

df = pd.DataFrame({'timestamp': pd.to_datetime(['2021-01-01 12:34:56', '2021-02-01 14:30:00'])})
print(df)
print(df.dtypes)  # dtype will show 'datetime64[ns]'
timestamp
0 2021-01-01 12:34:56
1 2021-02-01 14:30:00
timestamp datetime64[ns]
dtype: object
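
Once a column holds datetime values, its parts can be read through the `.dt` accessor, and a time zone can be attached, which is loosely comparable to the `tz` attribute of a POSIXct value in R. A short sketch using the same two timestamps:

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(["2021-01-01 12:34:56", "2021-02-01 14:30:00"]))

hours = ts.dt.hour.tolist()    # hour of each timestamp
months = ts.dt.month.tolist()  # month number of each timestamp

# Timestamps parsed from plain strings carry no time zone; attach UTC explicitly.
ts_utc = ts.dt.tz_localize("UTC")

print(hours, months)
print(ts_utc)
```
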
logical (bool)
R: TRUE, FALSE
Python: bool
pandas: bool
In pandas, boolean values are represented with the bool dtype.

df = pd.DataFrame({'is_active': [True, False, True]})
print(df.dtypes)  # dtype will show 'bool'
is_active bool
dtype: object
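
A bool column is most often used as a filter or counted, much like a logical vector in R. The following sketch adds a hypothetical `user` column to make the filtering visible.

```python
import pandas as pd

# The 'user' column is made up so that the filtered rows are easy to identify.
df = pd.DataFrame({"user": ["a", "b", "c"], "is_active": [True, False, True]})

active = df[df["is_active"]]            # keep rows where the flag is True
inactive = df[~df["is_active"]]         # ~ negates the boolean column
n_active = int(df["is_active"].sum())   # True counts as 1, as with sum() in R

print(active)
print(n_active)
```
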
NA / NULL (None or numpy.nan)
R: NA, NULL
Python: None or numpy.nan
pandas: NaN or None
In pandas, missing values are typically represented by numpy.nan for numeric data and None for object or string data.

import numpy as np

df = pd.DataFrame({'numbers': [1, np.nan, 3], 'names': ['Alice', None, 'Charlie']})
print(df)
print(df.dtypes)  # 'numbers' is 'float64', 'names' is 'object'
numbers names
0 1.0 Alice
1 NaN None
2 3.0 Charlie
numbers float64
names object
dtype: object
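
Whichever marker is stored, pandas treats both as missing, so the same functions detect and handle them. A minimal sketch on the same data, roughly analogous to `is.na()` and `na.omit()` in R:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"numbers": [1, np.nan, 3], "names": ["Alice", None, "Charlie"]})

missing_per_column = df.isna().sum()  # count of missing values in each column
filled = df["numbers"].fillna(0)      # replace missing numbers with 0
complete_rows = df.dropna()           # drop rows containing any missing value

print(missing_per_column)
print(filled.tolist())
print(complete_rows)
```
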