Jupyter Notebooks are a convenient way to develop and debug machine learning models, visualize the behavior of trained models, or even manage the training lifecycle of a model manually. Determined makes it easy to launch and manage notebooks.
Determined will schedule a Jupyter notebook in a containerized environment on the cluster and proxy HTTP requests to and from the notebook container through the Determined master.
The lifecycle management of Jupyter notebooks in Determined is left up to the user: once a notebook has been scheduled onto the cluster, it remains scheduled until the user explicitly shuts it down. Once a notebook has been terminated, it cannot be reactivated.
Because notebooks run in a containerized environment, they cannot persist files modified within the container unless they are configured to use a shared file system. See Saving and Restoring Notebook State for more information.
Notebooks do not persist files by default. If a failure occurs (e.g., the agent hosting the notebook crashes), the content of the notebook will not be saved.
Working with Notebooks
There are two ways to access notebooks in Determined: the command-line interface (CLI) and the WebUI. To install the CLI, see Install Determined CLI.
The following command will automatically start a notebook with a single GPU and open it in your browser.
det notebook start
The --context option adds a folder or file to the notebook environment, allowing its contents to be accessed from within the notebook:
det notebook start --context folder/file
The --config-file option creates a notebook with an environment specified by a configuration file:
det notebook start --config-file config.yaml
For more information on how to write the notebook configuration file, see Notebook Configuration.
Other Useful Commands
A full list of notebook-related commands can be found by running:
det notebook --help
To view all running notebooks:
det notebook list
To kill a notebook, you need its ID, which can be found using the det notebook list command:
det notebook kill <id>
Notebooks can also be started from the WebUI. Click the “Tasks” tab to see a list of the tasks currently running on the cluster.
From here, you can find running notebooks and reopen, kill, or view logs for each one.
To create a new notebook, click “Launch Notebook”. If you would like to use a CPU-only notebook, click the dropdown arrow and select “Launch CPU-only Notebook”.
Notebooks may be supplied an optional notebook configuration to control aspects of the notebook’s environment. For example, to launch a notebook that uses two GPUs:
$ det notebook start --config resources.slots=2
In addition to the --config flag, configuration may also be supplied via a YAML file with --config-file:
$ cat > config.yaml <<EOL
description: test-notebook
resources:
  slots: 2
bind_mounts:
  - host_path: /data/notebook_scratch
    container_path: /scratch
EOL
$ det notebook start --config-file config.yaml
See Determined Task Configuration for details on the supported configuration options.
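A configuration file like the one above can also be generated by a small shell helper, which is convenient when the slot count or mount paths vary between launches. The sketch below is illustrative only: the function name write_notebook_config and the file name notebook-config.yaml are hypothetical, and it uses only the description, resources.slots, and bind_mounts fields already shown.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Determined CLI): writes a notebook
# configuration file using only the fields shown in the example above.
write_notebook_config() {
    out_file=$1        # where to write the YAML
    slots=$2           # number of GPUs (resources.slots)
    host_path=$3       # directory on the agent host
    container_path=$4  # where that directory appears inside the container

    cat > "$out_file" <<EOF
description: test-notebook
resources:
  slots: $slots
bind_mounts:
  - host_path: $host_path
    container_path: $container_path
EOF
}

write_notebook_config notebook-config.yaml 2 /data/notebook_scratch /scratch

# Print the generated file for review before launching.
cat notebook-config.yaml

# Then launch with: det notebook start --config-file notebook-config.yaml
```

Passing 0 as the slot count would produce a configuration for a notebook that uses no GPUs.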
Finally, to configure notebooks to run a predefined set of commands at startup, you can use a startup hook along with the --context option:
$ mkdir my_context_dir
$ echo "pip3 install pandas" > my_context_dir/startup-hook.sh
$ det notebook start --context my_context_dir
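A startup hook is an ordinary shell script, so it can hold more than one command. As a minimal sketch, the snippet below builds a context directory whose startup-hook.sh installs two packages; the second package is only a placeholder, and the hook is written but not launched here.

```shell
#!/bin/sh
# Build a context directory containing a startup hook.
# The directory name and the package choices are illustrative placeholders.
context_dir=my_context_dir
mkdir -p "$context_dir"

# Quote the heredoc delimiter so nothing is expanded while writing the hook.
cat > "$context_dir/startup-hook.sh" <<'EOF'
# Commands to run when the notebook starts.
pip3 install pandas
pip3 install matplotlib
EOF

# Show what will be shipped with the notebook.
cat "$context_dir/startup-hook.sh"

# Then launch with: det notebook start --context my_context_dir
```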
Example: CPU-Only Notebooks
By default, each notebook is assigned a single GPU. This is appropriate for some uses of notebooks
(e.g., training a deep learning model) but unnecessary for other tasks (e.g., analyzing the training
metrics of a previously trained model). To launch a notebook that does not use any GPUs, set resources.slots to 0:
$ det notebook start --config resources.slots=0
Saving and Restoring Notebook State
It is only possible to save and restore notebook state on Determined clusters that are configured with a shared filesystem available to all agents.
To ensure that your work is saved even if your notebook is terminated, launch all notebooks with a shared filesystem directory bind-mounted into the notebook container, and work on files inside the bind-mounted directory.
By default, clusters launched by det deploy aws/gcp up create a network file system that is shared by all the agents and automatically mounted into notebook containers.
For example, a user jimmy with a shared filesystem home directory at /shared/home/jimmy could use the following configuration to launch a notebook:
$ cat > config.yaml << EOL
bind_mounts:
  - host_path: /shared/home/jimmy
    container_path: /shared/home/jimmy
EOL
$ det notebook start --config-file config.yaml
On a cluster started with det deploy local cluster-up, a user can add the --auto-bind-mount flag, which mounts the user’s home directory into the task containers by default:
$ det deploy local cluster-up --auto-bind-mount="/shared/home/jimmy"
$ det notebook start
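Before relying on a bind mount, it can help to confirm from a terminal inside the notebook that the directory exists and is writable. The function below is a rough sketch with a hypothetical name, check_shared_dir. It only tests the path it is given and does not verify that the directory is actually backed by a shared filesystem.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given directory exists and a
# file can be created in it.
check_shared_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir" >&2
        return 1
    fi
    # Try to create and remove a small probe file. The subshell keeps a
    # failed redirection from terminating the calling shell.
    probe="$dir/.write_probe.$$"
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
        return 0
    fi
    echo "not writable: $dir" >&2
    return 1
}

# Example call. The temporary directory is only a stand-in; inside a
# notebook you would pass the bind-mounted path instead.
check_shared_dir "${TMPDIR:-/tmp}"
```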
Working on a notebook file within the shared bind mounted directory will ensure that your code and
Jupyter checkpoints are saved on the shared filesystem rather than an ephemeral container
filesystem. If your notebook gets terminated, launching another notebook and loading the previous
notebook file will effectively restore the session of your previous notebook. To restore the full
notebook state (in addition to code), you can use Jupyter’s Revert to Checkpoint feature.
By default, JupyterLab takes a checkpoint every 120 seconds in an .ipynb_checkpoints folder in the same directory as the notebook file. To modify this setting, open the Advanced Settings Editor and change the value of the autosave interval setting (autosaveInterval in the Document Manager settings).
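Checkpoint files live in hidden .ipynb_checkpoints folders, so they are easy to overlook. As a small sketch, the helper below (the name list_checkpoints is hypothetical) lists the checkpoint files under a directory, which can be a quick way to confirm that checkpoints are landing on the shared filesystem rather than inside the container.

```shell
#!/bin/sh
# Hypothetical helper: print every file stored inside a .ipynb_checkpoints
# folder below the given directory (default: current directory).
list_checkpoints() {
    find "${1:-.}" -type f -path '*/.ipynb_checkpoints/*' | sort
}

# Example call against the current directory; prints nothing if there are
# no checkpoints yet.
list_checkpoints .
```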