Determined Task Configuration
The behavior of Determined tasks, such as TensorBoards, notebooks, commands, and shells, can be influenced by setting a variety of configuration variables. These configuration variables are similar but not identical to the configuration options supported by experiments.
Configuration settings can be specified by passing a YAML configuration file when launching the workload via the Determined CLI:
$ det tensorboard start experiment_id --config-file=my_config.yaml
$ det notebook start --config-file=my_config.yaml
$ det cmd run --config-file=my_config.yaml ...
$ det shell start --config-file=my_config.yaml
Configuration variables can also be set directly on the command line when any Determined task, except a TensorBoard, is launched:
$ det notebook start --config resources.slots=2
$ det cmd run --config description="determined_command" ...
$ det shell start --config resources.priority=1
Options set via --config take precedence over values specified in the configuration file. Configuration settings are compatible with any Determined task unless otherwise specified.
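As an illustration of the precedence rule, here is a minimal sketch of what my_config.yaml might contain. The description value is hypothetical; the nesting of slots under resources follows the resources.slots form used on the command line above.

```yaml
# my_config.yaml (illustrative values only)
description: example-notebook   # hypothetical description
resources:
  slots: 1                      # value from the file
```

Assuming the two flags are combined in one invocation, launching with --config-file=my_config.yaml together with --config resources.slots=2 would request 2 slots, because the command-line option overrides the value in the file.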
The following configuration settings are supported:
description: A human-readable description of the task. This does not need to be unique. The default description consists of a timestamp and the entrypoint of the command.
environment: Specifies the environment of the container that is used to execute the task.
image: Specifies a Docker image to use when executing the workload. The image must be available via docker pull to every Determined agent machine in the cluster. Users can customize the image used for GPU vs. CPU agents by specifying a dict with two keys, cpu and gpu. Defaults to determinedai/environments:py-3.7-pytorch-1.7-tf-1.15-cpu-0.10.0 for CPU agents and determinedai/environments:cuda-10.2-pytorch-1.7-tf-1.15-gpu-0.10.0 for GPU agents.
force_pull_image: Forcibly pull the image from the Docker registry and bypass the Docker cache. Defaults to
environment_variables: Specifies a list of environment variables for the container. Each element of the list should be a string of the form NAME=VALUE. See Environment Variables for more details. Users can customize environment variables for GPU vs. CPU agents by specifying a dict with two keys, cpu and gpu.
pod_spec: Only applicable when running Determined on Kubernetes. Applies a pod spec to the pods that are launched by Determined for this task. See Specifying Custom Pod Specs for details.
registry_auth: Specifies the Docker registry credentials to use when pulling a Docker image, if needed.
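A sketch of an environment section that uses a different image for CPU and GPU agents. The image names are the defaults listed above. The cpu key is assumed by analogy with gpu, the boolean value for force_pull_image is an assumption about its type, and the variable names and values are hypothetical.

```yaml
environment:
  image:
    # Per-agent-type images; the cpu key name is an assumption.
    cpu: determinedai/environments:py-3.7-pytorch-1.7-tf-1.15-cpu-0.10.0
    gpu: determinedai/environments:cuda-10.2-pytorch-1.7-tf-1.15-gpu-0.10.0
  # Bypass the Docker cache and pull from the registry (assumed boolean).
  force_pull_image: true
  # Each entry is a NAME=VALUE string; these names are hypothetical.
  environment_variables:
    - DATA_DIR=/data
    - LOG_LEVEL=debug
```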
resources: The resources Determined allows a task to use.
slots: Specifies the number of slots to use for the task. The default value is 1. The maximum value is the number of slots on the agent in the cluster with the most slots. For example, Determined will be unable to schedule a task that requests 4 slots if the Determined cluster is composed of agents with 2 slots each. The number of slots for TensorBoard is fixed at 0 and may not be changed.
agent_label: If set, the task will only be scheduled on agents that have the given label set. If this is not set (the default behavior), the task will only be scheduled on unlabeled agents. An agent's label can be configured via the label field in the agent configuration.
shm_size: The size in bytes of /dev/shm for task containers. Defaults to 4294967296 (4 GiB). If set, this value overrides the value specified in the master configuration.
priority: The priority assigned to this task. Tasks with smaller priority values are scheduled before tasks with larger priority values. Only applicable when using the priority scheduler. Refer to Scheduling for more information.
resource_pool: The resource pool where this task will be scheduled. If no resource pool is specified, CPU-only tasks will be scheduled in the default CPU pool, while GPU-using tasks will be scheduled in the default GPU pool. Refer to Resource Pools for more information.
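The resources settings might be combined as in the sketch below. The nesting of slots and priority under resources matches the resources.slots and resources.priority forms shown earlier; placing shm_size and resource_pool at the same level is an assumption, and the pool name is hypothetical.

```yaml
resources:
  # Must not exceed the slot count of the largest agent in the cluster.
  slots: 2
  # 8 GiB expressed in bytes (8 * 1024^3); overrides the master setting.
  shm_size: 8589934592
  # Only meaningful under the priority scheduler; smaller values run first.
  priority: 1
  # Hypothetical pool name.
  resource_pool: example-gpu-pool
```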
bind_mounts: Specifies a collection of directories that are bind-mounted into the Docker containers for execution. This can be used to allow commands to access additional data that is not contained in the command context. This field should consist of an array of entries. Note that users should ensure that the specified host paths are accessible on all agent hosts (e.g., by configuring a network file system appropriately). Defaults to an empty list.
host_path: (required) The file system path on each agent to use. Must be an absolute filepath.
container_path: (required) The file system path in the container to use. May be a relative filepath, in which case it will be mounted relative to the working directory inside the container. It is not allowed to mount directly into the working directory (container_path == ".") to reduce the risk of cluttering the host filesystem.
read_only: Whether the bind-mount should be a read-only mount. Defaults to
propagation: (Advanced users only) Optional propagation behavior for replicas of the bind-mount. Defaults to
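A sketch of a bind_mounts array with one absolute and one relative container path. All paths are hypothetical, and the host paths are assumed to exist on every agent.

```yaml
bind_mounts:
  # Read-only dataset directory, mounted at an absolute container path.
  - host_path: /mnt/datasets
    container_path: /data
    read_only: true
  # Relative container path: mounted under the container's working directory.
  - host_path: /mnt/scratch
    container_path: scratch
```

The second entry uses a relative container_path other than ".", since mounting directly into the working directory is not allowed.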
tensorboard_args: Lists optional arguments for launching TensorBoard. Each element of the list should be a string of the form