Determined Task Configuration¶
The behavior of Determined tasks, such as TensorBoards, notebooks, commands, and shells, can be influenced by setting a variety of configuration variables. Configuration settings can be specified by passing a YAML configuration file when launching the workload via the Determined CLI:
$ det tensorboard start experiment_id --config-file=my_config.yaml
$ det notebook start --config-file=my_config.yaml
$ det cmd run --config-file=my_config.yaml ...
$ det shell start --config-file=my_config.yaml
Configuration variables can also be set directly on the command line when any Determined task, except a TensorBoard, is launched:
$ det notebook start --config resources.slots=2
$ det cmd run --config description="determined_command" ...
$ det shell start --config resources.priority=1
Options set via --config
take precedence over values specified in
the configuration file. Configuration settings are compatible with any
Determined task unless otherwise specified.
Configuration Settings¶
The following configuration settings are supported:
description
: A human-readable description of the command/notebook. This does not need to be unique. The default description consists of a timestamp and the entrypoint of the command.environment
: Specifies the environment of the container that is used to execute the command/notebook.image
: Specifies a Docker image to use when executing the workload. The image must be available viadocker pull
to every Determined agent machine in the cluster. Users can customize the image used for GPU vs. CPU agents by specifying a dict with two keys,cpu
andgpu
. Defaults todeterminedai/environments:py-3.6.9-pytorch-1.4-tf-1.15-cpu-0.8.0
for CPU agents anddeterminedai/environments:cuda-10.0-pytorch-1.4-tf-1.15-gpu-0.8.0
for GPU agents.force_pull_image
: Forcibly pull the image from the Docker registry and bypass the Docker cache. Defaults tofalse
.environment_variables
: Specifies a list of environment variables for the container. Each element of the list should be a string of the formNAME=VALUE
. See Environment Variables for more details. Users can customize environment variables for GPU vs. CPU agents differently by specifying a dict with two keys,cpu
andgpu
.pod_spec
: Only applicable when running Determined on Kubernetes. Applies a pod spec to the pods that are launched by Determined for this task. See Specifying Custom Pod Specs for details.registry_auth
: Specifies the Docker registry credentials to use when pulling a Docker image, if needed.username
(required)password
(required)server
(optional)email
(optional)
resources
: The resources Determined allows a command/notebook to use.slots
: Specifies the number of slots to use for the command/notebook. The default value is1
. The maximum value is the number of slots on the agent in the cluster with the most slots. For example, Determined will be unable to schedule a command that requests 4 slots if the Determined cluster is composed of agents with 2 slots each. The number of slots for TensorBoard is fixed at0
and may not be changed.agent_label
: If set, the command/notebook will _only_ be scheduled on agents that have the given label set. If this is not set (the default behavior), the command/notebook will only be scheduled on unlabeled agents. An agent’s label can be configured via thelabel
field in the agent configuration.shm_size
: The size in bytes of/dev/shm
for trial containers. Defaults to4294967296
(4GiB). If set, this value overrides the value specified in the master configuration.priority
: The priority assigned to this experiment. Smaller priority assignment indicates higher priority. Only applicable when using thepriority
scheduler.
bind_mounts
: Specifies a collection of directories that are bind-mounted into the Docker containers for execution. This can be used to allow commands to access additional data that is not contained in the command context. This field should consist of an array of entries. Note that users should ensure that the specified host paths are accessible on all agent hosts (e.g., by configuring a network file system appropriately). Defaults to an empty list.host_path
: (required) The file system path on each agent to use. Must be an absolute filepath.container_path
: (required) The file system path in the container to use. May be a relative filepath, in which case it will be mounted relative to the working directory inside the container. It is not allowed to mount directly into the working directory (container_path == "."
) to reduce the risk of cluttering the host filesystem.read_only
: Whether the bind-mount should be a read-only mount. Defaults tofalse
.propagation
: (Advanced users only) Optional propagation behavior for replicas of the bind-mount. Defaults torprivate
.
tensorboard_args
: Lists optional arguments for launching TensorBoard. Each element of the list should be a string of the formNAME=VALUE
.