Commands and Shells

In addition to structured model training workloads, which are handled using experiments, Determined also supports more free-form tasks using commands and shells. Commands and shells enable developers to use a Determined cluster and its GPUs without having to write code conforming to the trial APIs. Commands are useful for running existing code in a batch manner; shells provide access to the cluster in the form of interactive SSH sessions.

This document provides an overview of the most common CLI commands related to shells and commands; see Command-line Interface (CLI) for full documentation.

Getting Started

Commands

Command-related CLI commands start with det command (which can be abbreviated to det cmd). The main subcommand is det cmd run, which runs a command in the cluster and streams its output. For example, the following CLI command uses nvidia-smi to display information about the GPUs available to tasks in the container:

det cmd run nvidia-smi

More complex commands including shell constructs can be run as well, as long as they are quoted to prevent interpretation by the local shell:

det cmd run 'for x in a b c; do echo $x; done'

det cmd run will stream output from the command until it finishes, but the command will continue executing and occupying cluster resources even if the CLI is interrupted or killed (e.g., due to Control-C being pressed). In order to stop the command or view further output from it, you’ll need its UUID, which can be obtained from the output of either the original det cmd run or det cmd list. Once you have the UUID, run det cmd logs <UUID> to view a snapshot of logs, det cmd logs -f <UUID> to view the current logs and continue streaming future output, or det cmd kill <UUID> to stop the command.

Shells

Shell-related CLI commands start with det shell. To start a persistent SSH server container in the Determined cluster and connect an interactive session to it, use det shell start:

det shell start

After starting a server with det shell start, you can make another independent connection to the same server by running det shell open <UUID>. The UUID can be obtained from the output of either the original det shell start command or det shell list:

$ det shell list
 Id                                   | Owner      | Description                  | State   | Exit Status
--------------------------------------+------------+------------------------------+---------+---------------
 d75c3908-fb11-4fa5-852c-4c32ed30703b | determined | Shell (annually-alert-crane) | RUNNING | N/A
$ det shell open d75c3908-fb11-4fa5-852c-4c32ed30703b

Optionally, you can provide extra options to pass to the SSH client when using det shell start or det shell open by including them after --. For example, this command will start a new shell and forward a port from the local machine to the container:

det shell start -- -L8080:localhost:8080

In order to stop the SSH server container and free up cluster resources, run det shell kill <UUID>.

Context Directories

By using the -c <directory> option, files are transferred from a directory on the local machine (the “context directory”) to the container. The contents of the context directory are placed into the working directory within the container before the command or shell starts running, so files in the context can be easily accessed using relative paths.

$ mkdir context
$ echo 'print("hello world")' > context/run.py
$ det cmd run -c context python run.py

The total size of the files in the context directory must be less than 95 MB. Larger files, such as datasets, must be mounted into the container (see next section), downloaded after the container starts, or included in a custom Docker image.

Advanced Configuration

Additional configuration settings for both commands and shells can be set using the --config and --config-file options. Commonly useful settings include:

  • bind_mounts: Specifies directories to be bind-mounted into the container from the host machine. (Due to the structured values required for this setting, it needs to be specified in a config file.)

  • resources.slots: Specifies the number of slots the container will have access to. (Distributed commands and shells are not supported; all slots will be on one machine and attempting to use more slots than are available on one machine will prevent the container from being scheduled.)

  • environment.image: Specifies a custom Docker image to use for the container.

  • description: Specifies a description for the command or shell to distinguish it from others.

IDE integration

Determined shells can be used in the popular IDEs similarly to a common remote SSH host.

Visual Studio Code

  1. Make sure Visual Studio Code Remote - SSH extension is installed.

  2. Start a new shell and get its SSH command by running:

    det shell start --show-ssh-command
    

    You can also get the SSH command for an existing shell using:

    det shell show_ssh_command <SHELL UUID>
    
  3. Copy the SSH command, then select Remote-SSH: Add new SSH Host... from the Command Palette in VS Code, and paste the copied SSH command when prompted. Finally, you’ll be asked to pick a config file to use. The default option should work for most users.

  4. The remote host will now be available in the VS Code Remote Explorer. For further detail, please refer to the official documentation.

PyCharm

  1. PyCharm Professional is required for remote Python interpreters via SSH.

  2. Start a new shell and get its SSH command by running:

    det shell start --show-ssh-command
    

    You can also get the SSH command for an existing shell using:

    det shell show_ssh_command <SHELL UUID>
    
  3. As of this writing, PyCharm doesn’t support providing custom options in the SSH commands via the UI, so you’ll need to supply them via an entry in your ssh_config file, commonly located at ~/.ssh/config on Linux and macOS systems. Determined SSH command line will have the following pattern:

    ssh -o "ProxyCommand=<YOUR PROXY COMMAND>" -o StrictHostKeyChecking=no -tt -o IdentitiesOnly=yes -i <YOUR KEY PATH> -p <YOUR PORT NUMBER> <YOUR USERNAME>@<YOUR SHELL HOSTNAME>
    

    You’ll need to add the following to your SSH config:

    Host <YOUR SHELL HOSTNAME>
    HostName <YOUR SHELL HOSTNAME>
    ProxyCommand <YOUR PROXY COMMAND>
    StrictHostKeyChecking no
    IdentitiesOnly yes
    IdentityFile <YOUR KEY PATH>
    Port <YOUR PORT NUMBER>
    User <YOUR USERNAME>
    
  4. In PyCharm, open Settings/Preferences > Tools > SSH Configurations. Click the plus icon to add a new configuration. Enter YOUR HOST NAME, YOUR PORT NUMBER, and YOUR USERNAME in the corresponding fields. Then switch Authentication type dropdown to OpenSSH config and authentication agent. Save the new configuration by clicking “OK”.

  5. Use the new SSH configuration to setup a remote interpreter by following the official documentation.