
More on Parameterized Jupyter

I’d like to share a great new feature in the wvpy package (available at PyPI).

This package is useful for converting Jupyter notebooks to/from Python, and also for rendering many parameterized notebooks. The idea is to make Jupyter notebooks easier to use in production.

The latest feature is an extension of notebook parameterization. The existing init_code and output_suffix features allow adding arbitrary code to notebooks and saving multiple renders of the same notebook under different (non-colliding!) names. The new sheet_vars feature allows the insertion of arbitrary data into notebook renders, in addition to the earlier code insertion facility.
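To make the idea of per-render parameters concrete, here is a minimal sketch in plain Python. It makes no wvpy calls. The dictionary layout, the `chunk` helper, the `_batch_k` suffix pattern, and the `file_names` key are all assumptions made for illustration; only the names output_suffix and sheet_vars come from the features described above.

```python
def chunk(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical input files, named only for this sketch.
file_names = [f"fname_{i}.txt" for i in range(1, 7)]

# One specification per render: a distinct suffix so outputs do not collide,
# and a data payload that differs from render to render.
render_specs = [
    {
        "output_suffix": f"_batch_{k}",
        "sheet_vars": {"file_names": batch},
    }
    for k, batch in enumerate(chunk(file_names, 2), start=1)
]

for spec in render_specs:
    print(spec["output_suffix"], spec["sheet_vars"]["file_names"])
```

Each entry pairs a unique output name with the data one render should see, which is the information a driver needs to launch many renders of the same notebook.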

Let’s work through this with an example. We start with a notebook we wish to render with different parameters. For example, suppose each render processes a few files, and we want to split the work across many renders to parallelize the task. Our example task notebook is here:





The notebook refers to a variable named sheet_vars that is, at this point, undefined. To debug this notebook we would define this variable and run the notebook in JupyterLab, VSCode, or other tools. When moving to production we would remove the debug setting and use wvpy to run the processes.
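A debug setting of this kind might look like the following sketch. The `file_names` key and the loop body are assumptions for illustration; the actual task notebook may structure sheet_vars differently.

```python
# Debug-only cell, meant to be removed before production runs.
# The key "file_names" is a hypothetical choice for this sketch.
sheet_vars = {"file_names": ["fname_1.txt", "fname_2.txt"]}

# The rest of the notebook then reads its inputs from sheet_vars.
for file_name in sheet_vars["file_names"]:
    print(f"would process {file_name}")
```

Keeping the debug assignment in a single cell at the top makes it easy to delete before the notebook is handed to the production driver.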

We would then use a process similar to the following notebook to run our jobs.





The user’s job is to define the “Jupyter tasks” and the rest is handled by wvpy. The first task renders as follows.





The data is moved from the driver to the task notebook through a temporary pickle file. The wvpy package inserts the pickle loading code at the top of the notebook. Notice this notebook processes "fname_1.txt" and "fname_2.txt". In production we are likely running notebooks largely for their side effects (reading, processing, and writing data), not for the HTML results.
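The general hand-off technique can be sketched with the standard library alone. This is not wvpy's implementation: the helper names `write_sheet_vars` and `make_loading_code`, the `file_names` key, and the use of `exec` to stand in for a notebook's first cell are all assumptions made for illustration.

```python
import os
import pickle
import tempfile


def write_sheet_vars(sheet_vars):
    """Driver side: pickle the data to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".pickle")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(sheet_vars, f)
    return path


def make_loading_code(path):
    """Build source text that, run as a first cell, defines sheet_vars."""
    return (
        "import pickle\n"
        f"with open({path!r}, 'rb') as _f:\n"
        "    sheet_vars = pickle.load(_f)\n"
    )


task_vars = {"file_names": ["fname_1.txt", "fname_2.txt"]}  # hypothetical key
pickle_path = write_sheet_vars(task_vars)

# Stand-in for the notebook: run the generated loading code in a fresh namespace.
notebook_namespace = {}
try:
    exec(make_loading_code(pickle_path), notebook_namespace)
finally:
    os.remove(pickle_path)  # the pickle is only needed for the hand-off

loaded = notebook_namespace["sheet_vars"]
print(loaded["file_names"])
```

Passing data through a file rather than pasting it into the notebook source means any picklable Python object can be handed over, not just values that have a clean literal representation.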

However, if we want cleaner HTML results, we can turn off input cell rendering, as we see in the second result here:





All of the above examples are available here. I have used this lightweight system successfully in a number of projects, and hope you find it useful in your work.


jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
