Commands and Shells
In addition to structured model training workloads, which are handled using experiments, Determined also supports more free-form tasks using commands and shells. Commands and shells enable developers to use a Determined cluster and its GPUs without having to write code conforming to the trial APIs. Commands are useful for running existing code in a batch manner; shells provide access to the cluster in the form of interactive SSH sessions.
This document provides an overview of the most common CLI commands related to shells and commands; see Command-line Interface for full documentation.
Command-related CLI commands start with
det command (which can be abbreviated as
det cmd). The main subcommand is
det cmd run,
which runs a command in the cluster and streams its output. For example,
the following CLI command uses
nvidia-smi to display information
about the GPUs available to tasks in the container:
det cmd run nvidia-smi
More complex commands including shell constructs can be run as well, as long as they are quoted to prevent interpretation by the local shell:
det cmd run 'for x in a b c; do echo $x; done'
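The effect of the quoting can be seen without a cluster. The sketch below uses a hypothetical run_remote function, which simply hands its argument to a fresh sh process, as a stand-in for det cmd run; it is an illustration of local-shell expansion, not of how Determined executes commands internally.

```shell
# Hypothetical stand-in for "det cmd run": passes its single argument to a
# fresh shell, roughly as a quoted command would be run in the container.
run_remote() { sh -c "$1"; }

# A variable that exists only in the local shell.
x=local

# Single quotes: $x is left alone, so the loop variable is expanded "remotely".
single=$(run_remote 'for x in a b c; do echo $x; done')

# Double quotes: the local shell substitutes $x before the command is sent.
double=$(run_remote "for x in a b c; do echo $x; done")

printf 'single-quoted:\n%s\n' "$single"
printf 'double-quoted:\n%s\n' "$double"
```

With single quotes the loop prints a, b, and c; with double quotes it prints the local value three times, which is rarely what is intended.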
det cmd run will stream output from the command until it finishes,
but the command will continue executing and occupying cluster resources
even if the CLI is interrupted or killed (e.g., due to Control-C being
pressed). In order to stop the command or view further output from it,
you’ll need its UUID, which can be obtained from the output of either
det cmd run or
det cmd list. Once you have the UUID, use
det cmd logs <UUID> to view a snapshot of logs,
det cmd logs -f <UUID> to view the current logs and continue streaming future
logs, and det cmd kill <UUID> to stop the command.
Shell-related CLI commands start with
det shell. To start a
persistent SSH server container in the Determined cluster and connect an
interactive session to it, use
det shell start:
det shell start
After starting a server with
det shell start, you can make another
independent connection to the same server by running
det shell open
<UUID>. The UUID can be obtained from the output of either the
det shell start command or
det shell list:
$ det shell list
 Id                                   | Owner      | Description                  | State   | Exit Status
--------------------------------------+------------+------------------------------+---------+---------------
 d75c3908-fb11-4fa5-852c-4c32ed30703b | determined | Shell (annually-alert-crane) | RUNNING | N/A
$ det shell open d75c3908-fb11-4fa5-852c-4c32ed30703b
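When scripting, the UUID can be pulled out of the listing instead of being copied by hand. The following is a minimal sketch: running_ids is a hypothetical helper, and it assumes the table layout matches the sample listing above (pipe-separated columns, with State in the fourth column), which may differ between Determined versions.

```shell
# Hypothetical helper: print the Id of every RUNNING row in a table
# shaped like the "det shell list" output shown above.
running_ids() {
    awk -F'|' '
        NR > 2 {                                   # skip header and separator rows
            for (i = 1; i <= NF; i++)
                gsub(/^ +| +$/, "", $i)            # trim column padding
            if ($4 == "RUNNING") print $1
        }
    '
}

# Sample output copied from the listing above, so the sketch runs without
# a cluster.
sample=$(cat <<'EOF'
 Id                                   | Owner      | Description                  | State   | Exit Status
--------------------------------------+------------+------------------------------+---------+---------------
 d75c3908-fb11-4fa5-852c-4c32ed30703b | determined | Shell (annually-alert-crane) | RUNNING | N/A
EOF
)

uuid=$(printf '%s\n' "$sample" | running_ids)
echo "$uuid"
```

Against a live cluster the same helper would be fed from the real command, for example det shell list | running_ids, and the result passed to det shell open or det shell kill.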
Optionally, you can provide extra options to pass to the SSH client when using
det shell start or
det shell open by including them after
--. For example, this command will start a new shell and forward a
port from the local machine to the container:
det shell start -- -L8080:localhost:8080
To stop the SSH server container and free up cluster resources, run
det shell kill <UUID>.
Commands and shells become much more powerful with the use of the
-c <directory> option, which tells Determined to transfer files from a
directory on the local machine (the “context directory”) to the
container. The contents of the context directory are placed into the
/run/determined/workdir directory within the container before the
command or shell starts running.
/run/determined/workdir is also the
initial working directory for commands, so they can easily access files
from the context using relative paths.
$ mkdir context
$ echo 'print("hello world")' > context/run.py
$ det cmd run -c context python run.py
The total size of the files in the context directory must be less than 95 MB. Larger files, such as datasets, must be mounted into the container (see next section), downloaded after the container starts, or included in a custom Docker image.
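A quick local check can catch an oversized context directory before submission. The sketch below is illustrative only: context_fits is a hypothetical helper, it treats 1 MB as 1024 KiB (an assumption, since the exact unit behind the 95 MB limit is not stated here), and it relies on du, which reports disk usage rather than exact file sizes.

```shell
# Hypothetical pre-flight check: succeed only if the directory is smaller
# than the limit (default 95, taken from the limit described above).
# du -sk reports usage in 1 KiB blocks, which can slightly overstate
# the sum of the file sizes.
context_fits() {
    dir=$1
    limit_mb=${2:-95}
    size_kb=$(du -sk "$dir" | awk '{print $1}')
    [ "$size_kb" -lt $((limit_mb * 1024)) ]
}

# Recreate the small context directory from the example above.
mkdir -p context
echo 'print("hello world")' > context/run.py

if context_fits context; then
    echo "context directory is small enough to submit"
else
    echo "context directory is too large; mount or download the data instead" >&2
fi
```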
Additional configuration settings for both commands and shells can be
set using the
--config and --config-file options. Commonly
useful settings include:
bind_mounts: Specifies directories to be bind-mounted into the container from the host machine. (Due to the structured values required for this setting, it needs to be specified in a config file.)
resources.slots: Specifies the number of slots the container will have access to. (Distributed commands and shells are not supported; all slots will be on one machine and attempting to use more slots than are available on one machine will prevent the container from being scheduled.)
environment.image: Specifies a custom Docker image to use for the container.
description: Specifies a description for the command or shell to distinguish it from others.
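As a sketch of how these settings might be combined, the snippet below writes a YAML config file with a heredoc. The file name, image name, paths, and description are placeholders, and the bind_mounts field names (host_path, container_path, read_only) are an assumption about the configuration schema that should be checked against the configuration reference.

```shell
# Write an illustrative config file; every value below is a placeholder.
cat > cmd-config.yaml <<'EOF'
description: gpu-check
resources:
  slots: 1                  # all slots are allocated on a single machine
environment:
  image: my-registry.example.com/my-image:latest
bind_mounts:                # field names assumed; verify against the reference
  - host_path: /data
    container_path: /data
    read_only: true
EOF

# Show what was written.
cat cmd-config.yaml
```

The file could then be supplied when launching a task, for example det cmd run --config-file cmd-config.yaml nvidia-smi.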