Notebooks
Jupyter Notebooks are a convenient way to develop and debug machine learning models, visualize the behavior of trained models, or even manage the training lifecycle of a model manually. Determined makes it easy to launch and manage notebooks.
Determined schedules a Jupyter notebook in a containerized environment on the cluster and proxies HTTP requests to and from the notebook container through the Determined master. Lifecycle management of Jupyter notebooks in Determined is left to the user: once a notebook has been scheduled onto the cluster, it remains scheduled until the user explicitly shuts it down. A terminated notebook cannot be reactivated. However, a new notebook can easily be configured to restore the state of a previous notebook; see Saving and Restoring Notebook State for more information.
Working with Notebooks
There are two ways to access notebooks in Determined: the command-line interface (CLI) and the WebUI. To install the CLI, see Install Determined CLI.
Command Line
The following command will automatically start a notebook with a single GPU and open it in your browser.
det notebook start
The --context option adds a folder or file to the notebook environment, allowing its contents to be accessed from within the notebook.
det notebook start --context folder/file
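As a minimal sketch of preparing a context folder before launching, the script below creates a directory with one file and passes it to --context. The directory name notebook-context and the file helpers.py are hypothetical placeholders, and the launch step only runs if the det CLI is found on the PATH.

```shell
#!/bin/sh
# Hypothetical context directory holding files the notebook should see.
context_dir="${TMPDIR:-/tmp}/notebook-context"
mkdir -p "$context_dir"

# An illustrative helper module; the file name and contents are placeholders.
cat > "$context_dir/helpers.py" <<'EOF'
def greet(name):
    return f"hello, {name}"
EOF

# Launch only when the det CLI is installed; otherwise just report the folder.
if command -v det >/dev/null 2>&1; then
    det notebook start --context "$context_dir"
else
    echo "det CLI not found; prepared $context_dir"
fi
```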
The --config-file option can be used to create a notebook with an environment specified by a configuration file.
det notebook start --config-file config.yaml
For more information on how to write the notebook configuration file, see Notebook Configuration.
Warning
In Determined, notebook state is not persistent by default. If a failure occurs (e.g., the agent hosting the notebook crashes), the content of the notebook will not be saved. To ensure that notebook state is persisted, see Saving and Restoring Notebook State.
Other Useful Commands
A full list of notebook commands can be found by running:
det notebook --help
To view all running notebooks:
det notebook list
To kill a notebook, you need its ID, which can be found using the list command.
det notebook kill <id>
WebUI
Notebooks can also be started from the WebUI. From the Determined home page, click the notebooks tab on the left-hand side to open the notebooks page.
This page lists running notebooks. For each notebook, you can reopen it, kill it, or view its logs.
To create a new notebook, click “Launch new notebook”. If you would like to use a CPU-only notebook, click the dropdown arrow and select “Launch new CPU-only notebook”.
Notebook Configuration
An optional notebook configuration can be supplied to control aspects of the notebook’s environment. For example, to launch a notebook that uses two GPUs:
$ det notebook start --config resources.slots=2
In addition to the --config flag, configuration may also be supplied via a YAML file (--config-file):
$ cat > config.yaml <<EOL
description: test-notebook
resources:
  slots: 2
bind_mounts:
  - host_path: /data/notebook_scratch
    container_path: /scratch
EOL
$ det notebook start --config-file config.yaml
See Command and Notebook Configuration for details on the supported configuration options.
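The configuration-file workflow above could be wrapped in a small script. This is only a sketch: write_notebook_config is a hypothetical helper name, the output path is an assumption, and the script uses only the description and resources.slots fields and the --config-file option shown in this section. The launch step runs only if the det CLI is found on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: write a notebook config with a description and slot count.
write_notebook_config() {
    description="$1"
    slots="$2"
    outfile="$3"
    cat > "$outfile" <<EOF
description: ${description}
resources:
  slots: ${slots}
EOF
}

# Assumed output location; any writable path works.
config_file="${TMPDIR:-/tmp}/notebook-config.yaml"
write_notebook_config "test-notebook" 2 "$config_file"

# Launch only when the det CLI is installed; otherwise just report the file.
if command -v det >/dev/null 2>&1; then
    det notebook start --config-file "$config_file"
else
    echo "det CLI not found; wrote $config_file"
fi
```

Generating the file from a function keeps the YAML indentation in one place, which matters because slots must be nested under resources.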
Example: CPU-Only Notebooks
By default, each notebook is assigned a single GPU. This is appropriate for some uses of notebooks (e.g., training a deep learning model) but unnecessary for other tasks (e.g., analyzing the training metrics of a previously trained model). To launch a notebook that does not use any GPUs, set resources.slots to 0:
$ det notebook start --config resources.slots=0
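Because the slot count is the only difference between a CPU-only and a GPU notebook, the choice can be expressed as one parameter. The sketch below uses a hypothetical helper, notebook_command, that validates the slot count and prints the corresponding command rather than running it. The default of 1 mirrors the single-GPU default described above.

```shell
#!/bin/sh
# Hypothetical helper: print the det command for a given slot count.
# Rejects anything that is not a non-negative integer.
notebook_command() {
    slots="${1:-1}"
    case "$slots" in
        *[!0-9]*)
            echo "slots must be a non-negative integer" >&2
            return 1
            ;;
    esac
    echo "det notebook start --config resources.slots=${slots}"
}

notebook_command 0   # prints: det notebook start --config resources.slots=0
notebook_command 2   # prints: det notebook start --config resources.slots=2
```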
Saving and Restoring Notebook State
Warning
It is only possible to save and restore notebook state on Determined clusters that are configured with a shared filesystem available to all agents.
To ensure that your work is saved even if your notebook gets terminated, we recommend launching all notebooks with a shared filesystem directory bind-mounted into the notebook container and working on files inside the bind-mounted directory. For example, a user jimmy with a shared filesystem home directory at /shared/home/jimmy could use the following configuration to launch a notebook:
$ cat > config.yaml << EOL
bind_mounts:
  - host_path: /shared/home/jimmy
    container_path: /shared/home/jimmy
EOL
$ det notebook start --config-file config.yaml
Working on a notebook file within the shared bind-mounted directory will ensure that your code and Jupyter checkpoints are saved on the shared filesystem rather than an ephemeral container filesystem. If your notebook gets terminated, launching another notebook and loading the previous notebook file will effectively restore the session of your previous notebook. To restore the full notebook state (in addition to code), you can use Jupyter’s File > Revert to Checkpoint functionality.
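The bind-mount configuration for jimmy can be generalized so that each user mounts their own shared directory. The following is a sketch under stated assumptions: SHARED_HOME is a hypothetical environment variable naming a directory on the shared filesystem, and the fallback to $HOME is for illustration only, since a local home directory is not necessarily visible to every agent.

```shell
#!/bin/sh
# Assumption: SHARED_HOME names a directory on the shared filesystem.
# The fallback to $HOME (or /tmp) is illustrative, not a recommendation.
shared_home="${SHARED_HOME:-${HOME:-/tmp}}"
config_file="${TMPDIR:-/tmp}/notebook-bind-mount.yaml"

# Mount the directory at the same path inside the container,
# as in the jimmy example.
cat > "$config_file" <<EOF
bind_mounts:
  - host_path: ${shared_home}
    container_path: ${shared_home}
EOF

# Launch only when the det CLI is installed; otherwise just report the file.
if command -v det >/dev/null 2>&1; then
    det notebook start --config-file "$config_file"
else
    echo "det CLI not found; wrote $config_file"
fi
```

Using the same path on the host and in the container means notebook files keep the same location across sessions, so a replacement notebook can open them without any path changes.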
Note
By default, JupyterLab will take a checkpoint every 120 seconds in an .ipynb_checkpoints folder in the same directory as the notebook file. To modify this setting, click on Settings > Advanced Settings Editor and change the value of "autosaveInterval" under Document Manager.