Commands and Shells

In addition to structured model training workloads, which are handled using experiments, Determined also supports more free-form tasks using commands and shells. Commands and shells enable developers to use a Determined cluster and its GPUs without having to write code conforming to the trial APIs. Commands are useful for running existing code in a batch manner; shells provide access to the cluster in the form of interactive SSH sessions.

This document provides an overview of the most common CLI commands related to shells and commands; see Command-line Interface for full documentation.

Getting Started

Commands

Command-related CLI commands start with det command (which can be abbreviated to det cmd). The main subcommand is det cmd run, which runs a command in the cluster and streams its output. For example, the following CLI command uses nvidia-smi to display information about the GPUs available to tasks in the container:

det cmd run nvidia-smi

More complex commands, including shell constructs, can be run as well, as long as they are quoted to prevent interpretation by the local shell:

det cmd run 'for x in a b c; do echo $x; done'
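To see why the quoting matters, the sketch below replaces det cmd run with a hypothetical stand-in function, show_args, that prints each argument it receives in brackets. This shows exactly what the local shell hands to the CLI. With single quotes, the whole loop arrives as one argument with $x unexpanded, so it is interpreted by the shell in the container. Without quotes, the local shell expands variables such as $HOME before the CLI ever sees them.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `det cmd run`: prints every argument it
# receives on its own line, wrapped in brackets.
show_args() {
  printf '[%s]\n' "$@"
}

# Single-quoted: the loop is passed as ONE argument, with $x left
# unexpanded for the shell inside the container to evaluate.
show_args 'for x in a b c; do echo $x; done'

# Unquoted variable: the LOCAL shell expands $HOME first, so the container
# would receive your local home directory path rather than its own.
show_args echo $HOME
```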

det cmd run streams output from the command until it finishes. However, the command keeps executing and occupying cluster resources even if the CLI is interrupted or killed (e.g., by pressing Control-C). To stop the command or view further output from it, you need its UUID, which appears in the output of the original det cmd run and in the output of det cmd list. Once you have the UUID, you can:

  • run det cmd logs <UUID> to view a snapshot of the logs;

  • run det cmd logs -f <UUID> to view the current logs and continue streaming future output;

  • run det cmd kill <UUID> to stop the command.
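Because every follow-up command needs the UUID, it can help to extract it from CLI output rather than copying it by hand. The following is a minimal sketch: first_uuid is a hypothetical helper (not part of the Determined CLI) that prints the first string on standard input matching the standard 8-4-4-4-12 hex UUID format. It relies only on that format, not on the exact layout of the CLI's output. The example usage in the comments assumes that the command you want is the first one listed by det cmd list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first UUID (8-4-4-4-12 hex digits)
# found on standard input, or nothing if there is none.
first_uuid() {
  grep -oE '[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}' \
    | head -n 1
}

# Example use (requires a configured Determined CLI; assumes the command
# you want is the first one listed):
#   uuid="$(det cmd list | first_uuid)"
#   det cmd logs -f "$uuid"    # follow its output
#   det cmd kill "$uuid"       # stop it and free its resources
```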

Shells

Shell-related CLI commands start with det shell. To start a persistent SSH server container in the Determined cluster and connect an interactive session to it, use det shell start:

det shell start

After starting a server with det shell start, you can make another independent connection to the same server by running det shell open <UUID>. The UUID can be obtained from the output of either the original det shell start command or det shell list:

$ det shell list
 Id                                   | Owner      | Description                  | State   | Exit Status
--------------------------------------+------------+------------------------------+---------+---------------
 d75c3908-fb11-4fa5-852c-4c32ed30703b | determined | Shell (annually-alert-crane) | RUNNING | N/A
$ det shell open d75c3908-fb11-4fa5-852c-4c32ed30703b
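When several shells are running, it can be easier to find one by its description (such as annually-alert-crane above) than by scanning for its UUID. The sketch below defines a hypothetical helper, shell_id_for, which reads table output in the format shown above (Id | Owner | Description | State | Exit Status), skips the header and separator lines, and prints the Id of the first row whose Description contains the given text. It assumes that column layout; if the CLI's table format differs, the field numbers would need adjusting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given `det shell list`-style output on stdin, print
# the Id of the first row whose Description column contains "$1".
# Assumes columns: Id | Owner | Description | State | Exit Status.
shell_id_for() {
  awk -F'|' -v want="$1" '
    NR > 2 && index($3, want) > 0 {   # skip header (line 1) and separator (line 2)
      gsub(/ /, "", $1)                # trim the padding around the Id
      print $1
      exit
    }'
}

# Example use (requires a configured Determined CLI):
#   det shell open "$(det shell list | shell_id_for annually-alert-crane)"
```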

You can optionally pass extra options to the SSH client when using det shell start or det shell open by placing them after --. For example, the following command starts a new shell and forwards port 8080 on the local machine to port 8080 in the container:

det shell start -- -L8080:localhost:8080
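Since everything after -- goes to the SSH client, you can forward several ports at once by repeating the standard OpenSSH -L option. The sketch below uses a hypothetical helper, forward_args, which emits one -L<port>:localhost:<port> option per port it is given, mirroring the form of the example above. The port numbers in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one OpenSSH local-forward option per port,
# forwarding each local port to the same port inside the container.
forward_args() {
  local port
  for port in "$@"; do
    printf '%s\n' "-L${port}:localhost:${port}"
  done
}

# Example use (requires a configured Determined CLI). The command
# substitution is deliberately unquoted so each option becomes its own
# argument to ssh:
#   det shell start -- $(forward_args 8080 6006)
```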

To stop the SSH server container and free up cluster resources, run det shell kill <UUID>.

Context Directories

The -c <directory> option makes commands and shells much more useful: it tells Determined to transfer files from a directory on the local machine (the “context directory”) into the container. The contents of the context directory are placed in /run/determined/workdir within the container before the command or shell starts running. /run/determined/workdir is also the initial working directory for commands, so they can access files from the context using relative paths.

$ mkdir context
$ echo 'print("hello world")' > context/run.py
$ det cmd run -c context python run.py

The total size of the files in the context directory must be less than 95 MB. Larger files, such as datasets, must be mounted into the container (see next section), downloaded after the container starts, or included in a custom Docker image.
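Because an oversized context directory is only caught at submission time, a quick local check can save a round trip. The following is a minimal sketch of such a check. context_size_ok is a hypothetical helper, and it assumes that "95 MB" means 95 × 1024 × 1024 bytes; the exact byte count Determined enforces may differ. It sums the sizes of all regular files under the directory, including hidden ones, and succeeds only if the total is below that limit.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a context directory.
# ASSUMPTION: the 95 MB limit is interpreted as 95 * 1024 * 1024 bytes.
context_size_ok() {
  local dir="$1"
  local limit=$((95 * 1024 * 1024))
  local total
  # Sum the bytes of every regular file under $dir (hidden files included).
  total=$(find "$dir" -type f -exec cat {} + | wc -c | tr -d ' ')
  echo "context size: ${total} bytes (limit: ${limit})"
  [ "$total" -lt "$limit" ]
}

# Example use (requires a configured Determined CLI):
#   context_size_ok context && det cmd run -c context python run.py
```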

Advanced Configuration

Additional configuration settings for both commands and shells can be set using the --config and --config-file options. Commonly useful settings include:

  • bind_mounts: Specifies directories to be bind-mounted into the container from the host machine. (Because this setting takes structured values, it must be specified in a config file passed with --config-file.)

  • resources.slots: Specifies the number of slots the container will have access to. (Distributed commands and shells are not supported: all slots are allocated on a single machine, so requesting more slots than are available on any one machine will prevent the container from being scheduled.)

  • environment.image: Specifies a custom Docker image to use for the container.

  • description: Specifies a description for the command or shell to distinguish it from others.
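As an illustration, the sketch below writes a config file that sets all four settings above, using a heredoc so the example is self-contained. Every value is a placeholder: the file name my-config.yaml, the image name, the paths, and the description are hypothetical. The bind_mounts field names (host_path, container_path, read_only) reflect Determined's bind-mount configuration as I understand it, and the dotted key=value form shown for --config is likewise an assumption. Both are worth checking against the configuration reference before use.

```shell
#!/usr/bin/env bash
# Write an example command/shell config file. All values are placeholders.
cat > my-config.yaml <<'EOF'
description: dataset exploration        # distinguishes this task from others
resources:
  slots: 2                               # all slots come from a single machine
environment:
  image: my-registry/my-image:latest     # hypothetical custom Docker image
bind_mounts:                             # structured value, so it needs a config file
  - host_path: /data/datasets            # directory on the host machine (placeholder)
    container_path: /mnt/datasets        # where it appears inside the container
    read_only: true
EOF

# Example use (requires a configured Determined CLI):
#   det cmd run --config-file my-config.yaml ls /mnt/datasets
#   det shell start --config-file my-config.yaml
# ASSUMPTION: simple scalar settings can also be given inline, e.g.:
#   det cmd run --config resources.slots=2 nvidia-smi
```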