Jupyter Notebooks

Jupyter notebooks are a convenient way to develop and debug machine learning models, visualize the behavior of trained models, or even manage the training lifecycle of a model manually. PEDL makes it easy to launch and manage notebooks. By default, each notebook is assigned a single GPU, but this can be changed; see CPU-Only Notebooks.

PEDL will schedule a Jupyter notebook in a containerized environment on the cluster and proxy HTTP requests to and from the notebook container through the PEDL master. The lifecycle management of Jupyter notebooks in PEDL is left up to the user -- once a Jupyter notebook has been scheduled onto the cluster, it will remain scheduled indefinitely until the user explicitly shuts down the notebook. Once a notebook has been terminated, it is not possible to reactivate it. However, new notebooks can easily be configured to restore the state of a previous notebook -- see Saving and Restoring Notebook State for more information.

Quick Start

To launch a notebook, start by installing the PEDL command line interface on a development machine.

Once the CLI is installed, try launching your first notebook with the pedl notebook start command:

$ pedl notebook start
Scheduling notebook unique-oyster (id: 5b2a9ea4-a6bb-4d2b-b42b-25e4064a3220)...
[DOCKER BUILD 🔨] Step 1/11 : FROM nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04
[DOCKER BUILD 🔨]  ---> 9918ba890dca
[DOCKER BUILD 🔨] Step 2/11 : RUN rm /etc/apt/sources.list.d/*
[DOCKER BUILD 🔨] Successfully tagged nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04-73bf63cc864088137a477ce62f39ffe8
[PEDL] 2019-04-04T17:53:22.076591700Z [I 17:53:22.075 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
[PEDL] 2019-04-04T17:53:23.067911400Z [W 17:53:23.067 NotebookApp] All authentication is disabled.  Anyone who can connect to this server will be able to run code.
[PEDL] 2019-04-04T17:53:23.073644300Z [I 17:53:23.073 NotebookApp] Serving notebooks from local directory: /
disconnecting websocket
Jupyter Notebook is running at: http://localhost:8080/proxy/5b2a9ea4-a6bb-4d2b-b42b-25e4064a3220-notebook-0/lab/tree/Notebook.ipynb?reset

After the notebook has been scheduled onto the cluster, the PEDL CLI will open a web browser window pointed to that notebook's URL. Back in the terminal, you can use the pedl notebook list command to see that this notebook is one of those currently RUNNING on the PEDL cluster:

$ pedl notebook list
 Id                                   | Entry Point                                            | Registered Time              | State
 0f519413-2411-4b3c-adbc-9b1b60c96156 | ['jupyter', 'notebook', '--config', '/etc/jupyter.py'] | 2019-04-04T17:52:48.1961129Z | RUNNING
 5b2a9ea4-a6bb-4d2b-b42b-25e4064a3220 | ['jupyter', 'notebook', '--config', '/etc/jupyter.py'] | 2019-04-04T17:53:20.387903Z  | RUNNING
 66da599e-62d2-4c2d-91c4-01a04045e4ab | ['jupyter', 'notebook', '--config', '/etc/jupyter.py'] | 2019-04-04T17:52:58.4573214Z | RUNNING

Since the lifecycle management of Jupyter notebooks in PEDL is left up to the user, this notebook will continue running until it is explicitly shut down. To terminate the notebook, you can use the pedl notebook kill command:

$ pedl notebook kill 5b2a9ea4-a6bb-4d2b-b42b-25e4064a3220
Killed notebook 5b2a9ea4-a6bb-4d2b-b42b-25e4064a3220
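Because notebooks keep running until they are killed, it can help to pull the ids of running notebooks out of the listing. The following is a minimal sketch, not part of the PEDL CLI: the helper name running_notebook_ids is hypothetical, and it assumes the table layout shown above ('|'-separated columns, id first, state last). It reads a sample listing copied from the output above, so it runs without a cluster.

```shell
# Print the id of every notebook whose State column is RUNNING.
# Assumption: columns are separated by '|', with the id first and the state last.
running_notebook_ids() {
  awk -F'|' 'NF >= 4 {
    id = $1; state = $NF
    gsub(/[ \t]+/, "", id)      # strip padding around the id
    gsub(/[ \t]+/, "", state)   # strip padding around the state
    if (state == "RUNNING") print id
  }'
}

# Sample listing copied from the output above (header plus two rows).
sample_file=$(mktemp)
cat > "$sample_file" << 'EOF'
 Id                                   | Entry Point                                            | Registered Time              | State
 0f519413-2411-4b3c-adbc-9b1b60c96156 | ['jupyter', 'notebook', '--config', '/etc/jupyter.py'] | 2019-04-04T17:52:48.1961129Z | RUNNING
 5b2a9ea4-a6bb-4d2b-b42b-25e4064a3220 | ['jupyter', 'notebook', '--config', '/etc/jupyter.py'] | 2019-04-04T17:53:20.387903Z  | RUNNING
EOF

running_notebook_ids < "$sample_file"
```

On a live cluster the same helper could be fed directly, as in pedl notebook list | running_notebook_ids, and each printed id passed to pedl notebook kill. If the listing format differs from the sample, the awk filter would need adjusting.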

Notebook Configuration

PEDL makes it easy to modify the dependencies that are installed into the notebook's environment:

$ pedl notebook start --config environment.tensorflow=1.13.1

More generally, notebooks accept an optional notebook configuration that controls the notebook's environment. In addition to the --config flag, configuration may also be supplied via a YAML file (--config-file):

$ cat > config.yaml << EOL
description: test-notebook
resources:
  slots: 2
environment:
  python: "3.6.9"
  tensorflow: "1.13.1"
  keras: "2.2.4"
bind_mounts:
  - host_path: /data/notebook_scratch
    container_path: /scratch
EOL
$ pedl notebook start --config-file config.yaml

See the Notebook Configuration section for full documentation of the supported configuration options. Note that notebooks share the same configuration schema as commands.

CPU-Only Notebooks

By default, each notebook is assigned a single GPU. This is useful for some notebook tasks (e.g., training a deep learning model) but unnecessary for others (e.g., analyzing the training metrics of a previously trained model). To launch a notebook that does not use any GPUs, set resources.slots to 0:

$ pedl notebook start --config resources.slots=0

CPU-only notebooks will be scheduled on a randomly chosen agent.
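The same setting can also be kept in a configuration file rather than passed on the command line. This is a sketch under one assumption: that the dotted option resources.slots corresponds to a nested resources / slots key in the YAML file. The file name cpu-notebook.yaml and the description value are illustrative only.

```shell
# Write a notebook configuration that requests zero GPU slots.
# Assumption: resources.slots maps to a nested "resources:" / "slots:" key.
cat > cpu-notebook.yaml << 'EOL'
description: cpu-only-notebook
resources:
  slots: 0
EOL

# Launch only when the pedl CLI is on PATH, so the sketch is safe to run anywhere.
if command -v pedl > /dev/null 2>&1; then
  pedl notebook start --config-file cpu-notebook.yaml
else
  echo "pedl CLI not found; wrote cpu-notebook.yaml only"
fi
```

Keeping the setting in a file makes it easy to reuse the same CPU-only configuration for every analysis notebook.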

Using the CLI in Notebooks

The PEDL CLI is installed into notebook containers by default. This allows users to interact with PEDL from inside a notebook -- e.g., to launch new deep learning workloads or examine the metrics from an active or historical PEDL experiment. For example, to list PEDL experiments from inside a notebook, run the notebook command !pedl experiment list.

Saving and Restoring Notebook State


It is only possible to save and restore notebook state on PEDL clusters that are configured with a shared filesystem available to all agents.

To ensure that your work is saved even if your notebook is terminated, launch notebooks with a shared filesystem directory bind-mounted into the notebook container, and work on files inside the bind-mounted directory. For example, a user jimmy with a shared filesystem home directory at /shared/home/jimmy could use the following configuration to launch a notebook:

$ cat > config.yaml << EOL
bind_mounts:
  - host_path: /shared/home/jimmy
    container_path: /shared/home/jimmy
EOL
$ pedl notebook start --config-file config.yaml

Working on a notebook file within the bind-mounted shared directory ensures that your code and Jupyter checkpoints are saved on the shared filesystem rather than on an ephemeral container filesystem. If your notebook is terminated, launching another notebook and loading the previous notebook file will effectively restore the session of your previous notebook. To restore the full notebook state (in addition to code), you can use Jupyter's File > Revert to Checkpoint functionality.
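When several users follow this pattern, the configuration file can be generated per user. The sketch below is illustrative and rests on two assumptions: shared home directories live under /shared/home/<user>, as in the jimmy example, and the mount list belongs under a top-level bind_mounts key. The helper name write_notebook_config is hypothetical.

```shell
# Write a notebook config that bind-mounts a user's shared home directory
# at the same path inside the notebook container.
# Assumptions: homes live under /shared/home/<user>; mounts go under "bind_mounts".
write_notebook_config() {
  user=$1       # user whose shared home directory is mounted
  out=$2        # path of the config file to write
  cat > "$out" << EOL
bind_mounts:
  - host_path: /shared/home/$user
    container_path: /shared/home/$user
EOL
}

write_notebook_config jimmy config.yaml
cat config.yaml
```

The generated file can then be passed to pedl notebook start --config-file config.yaml. Mounting the directory at the same path inside the container keeps notebook file paths identical across sessions, which makes reloading a previous notebook file straightforward.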


By default, JupyterLab will take a checkpoint every 120 seconds in an .ipynb_checkpoints folder in the same directory as the notebook file. To modify this setting, click on Settings > Advanced Settings Editor, and change the value of "autosaveInterval" under Document Manager.