A Comment on Data Science Integrated Development Environments

One point in a recent note on doing data science in Python struck us, because it differs from our experience:

A development environment [for Python] specifically tailored to the data science sector on the level of RStudio, for example, does not (yet) exist.

“What’s the Best Statistical Software? A Comparison of R, Python, SAS, SPSS and STATA”, Amit Ghosh

Actually, Python has a large number of very capable integrated development environments, some of which are specifically tailored for data science. Please read on for a small list of tools, and my recommendations for a specific “data science in Python” toolchain.

Off the top of my head I remember the following Python tools:

  • PyCharm, both Community Edition “The Python IDE for Professional Developers”, and Professional Edition “For both Scientific and Web Python development. With HTML, JS, and SQL support”. This IDE has amazing refactoring and completion abilities, and automatically critiques your code against the PEP 8 code style recommendations.
  • Black “The uncompromising code formatter”.
  • JupyterLab “a web-based interactive development environment for Jupyter notebooks, code, and data” (the successor to Jupyter Notebook and IPython Notebook).
  • The Anaconda distribution, a great package set and package manager.
  • Spyder “a powerful scientific environment written in Python”.
  • Apache Zeppelin “Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more”.
  • VS Code for Python, a Python IDE based on Visual Studio Code.
  • PyDev, an Eclipse-based Python IDE.
  • elpy “Emacs Python Development Environment”.
  • Dash “a framework for building analytical web applications”.

My current “data science in Python” go-to tools are: PyCharm, JupyterLab, Black, and Anaconda. PyCharm is one of the best IDEs I have seen. JupyterLab notebooks are good for capturing reproducible research and mixing documentation and code. Black greatly improves your code, and Anaconda makes environment management easy.
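
To make the “mixing documentation and code” point concrete: a Jupyter notebook file is a JSON document holding a list of cells, each marked as markdown or code. The following is only a minimal sketch of how those cells could be flattened into a plain script using the standard library; the function name `notebook_to_script` and the tiny example notebook are hypothetical, made up for illustration. In practice `jupyter nbconvert --to script` is the usual tool for this.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Flatten a notebook's cells into a script (illustrative sketch only).

    Code cells are copied as-is; markdown cells become comment lines.
    """
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append(source)
        elif cell.get("cell_type") == "markdown":
            chunks.append("\n".join("# " + line for line in source.splitlines()))
    return "\n\n".join(chunks) + "\n"


# A hypothetical two-cell notebook, built inline so the example is self-contained.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Load data\n", "Read the CSV."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1 + 1\n", "print(x)"]},
    ],
})

script = notebook_to_script(example)
print(script)
```

The documentation travels with the code as comments, which is roughly the trade-off a notebook makes in the other direction: prose first, code embedded in it.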

jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

4 replies

  1. I use both R/RStudio and Python/Spyder/Jupyter Notebook/Anaconda. I definitely agree that RStudio is a much easier environment to work in. In one place I get literate programming (R Markdown with previews), an environment viewer, full package/library management, and file management. To do the same for Python requires Spyder, Jupyter Notebook, AND Anaconda Navigator (plus the OS file manager). If any tool integrates this (i.e. an IDE) I would love to know, because I am trying to convert a data analysis course from R to Python in the next couple of years, and this fact (and that I really do have to teach a lot of Python up front) is going to cause me to cut coverage and capability.

    1. I would suggest giving PyCharm a try. It unfortunately doesn’t emphasize the literate programming integration. And you would still lose some time to having multiple tools up. But your class would likely get the time back due to PyCharm’s more powerful help, completion, checking, and code editing/refactoring tools. I’ve found PyCharm to be a more capable IDE than RStudio, once you take the time to learn the features.

      I would say the RStudio offering is a bit quicker to pick up, and is a more integrated, broader offering. However, the Python offerings have the possibility of being deeper and more capable on some of the tasks. In particular PyCharm has a lot going for it once you get familiar with it. That being said, I am still a bit more comfortable with RMarkdown than with JupyterLab, but for me they are close enough.

  2. The Python IDEs are just not good for EDA imo. I use Jupyter but am then left with the extra step of converting the notebook to a script when it’s time to productionalize. It’s also nice to have package management, git and the other goodies in RStudio.

    1. For EDA projects I tend to prefer R. I also prefer RMarkdown to JupyterLab.

      For supervised machine learning in production, I’ve had good luck with Python. For working on code I prefer PyCharm over RStudio.

      It is great to have both systems.
