Let’s use this as an excuse to look quickly at what happens when we try the same task in both systems.
For our task we picked the painful exercise of directly reading a 50,000,000 row by 50 column data set into memory on a machine with only 8GB of RAM.
The Pandas package takes around 6 minutes to read the data, after which one is ready to work.
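For readers who want to try a similar measurement on their own hardware, here is a minimal Python sketch of timing a full in-memory CSV read with pandas. It is not the script behind the numbers above. It assumes pandas and NumPy are installed, and it writes a small synthetic file instead of the 50,000,000 row data set. The 100,000 by 50 size and the helper names `make_synthetic_csv` and `time_csv_read` are hypothetical.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def make_synthetic_csv(path, n_rows, n_cols, seed=0):
    """Write an n_rows x n_cols table of random floats (hypothetical stand-in data)."""
    rng = np.random.default_rng(seed)
    data = rng.random((n_rows, n_cols))
    columns = [f"x{i}" for i in range(n_cols)]
    pd.DataFrame(data, columns=columns).to_csv(path, index=False)


def time_csv_read(path):
    """Read a CSV fully into memory with pandas; return (frame, elapsed seconds)."""
    start = time.perf_counter()
    frame = pd.read_csv(path)
    elapsed = time.perf_counter() - start
    return frame, elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "synthetic.csv")
        # Hypothetical small size; the experiment in the text used 50,000,000 x 50.
        make_synthetic_csv(path, n_rows=100_000, n_cols=50)
        frame, elapsed = time_csv_read(path)
        # deep=True asks pandas for a fuller accounting of the frame's memory use.
        mb = frame.memory_usage(deep=True).sum() / 1e6
        print(f"read {frame.shape} in {elapsed:.2f}s, about {mb:.0f} MB in memory")
```

The `memory_usage(deep=True)` figure is a rough way to see how close a given read comes to the machine's available RAM.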
R’s readr::read_csv() fails with out-of-memory messages. So if your view of R is “base R only”, or “base R plus tidyverse only”, or “tidyverse only”, then reading this file is a “hard task.”
With such a narrow view, one would have no choice but to move to Python to get the job done.
Or, we could remember data.table (which is obviously not part of the tidyverse). data.table has been a best practice in R for around 12 years. It can read the data and be ready to work in R in under a minute.
In conclusion, to get things done in a pinch: learn Python or learn data.table. And, in my opinion, “tidyverse first teaching” (commonly code for “tidyverse only teaching”) may not serve the R community well in the long run.
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.