Agent Configuration Reference¶
config_file
: Path to the agent configuration file. Normally this should only be set via an environment variable or command-line option. Defaults to/etc/determined/agent.yaml
.master_host
(required): The hostname or IP address of the Determined master.master_port
: The port of the Determined master. Defaults to443
if TLS is enabled and80
otherwise.agent_id
: The ID of this agent; defaults to the hostname of the current machine. Agent IDs must be unique within a cluster.container_master_host
: Master hostname that containers started by this agent will connect to. Defaults to the value ofmaster_host
.container_master_port
: Master port that containers started by this agent will connect to. Defaults to the value ofmaster_port
.resource_pool
: Which resource pool the agent should join. Defaults to the value ofdefault
, which will work if and only if there is a resource pool nameddefault
. For more information please see Resource Pools.visible_gpus
: The GPUs that should be exposed as slots by the agent. A comma-separated list of GPUs, each specified by a 0-based index, UUID, PCI bus ID, or board serial number. The 0-based index of NVIDIA GPUs or AMD GPUs can be obtained via thenvidia-smi
orrocm-smi
commands.slot_type
: The slot type that should be exposed. Dynamic agents having GPUs will be configured tocuda
, agents without GPUs withcpu_slots_allowed: true
provisioner option will be configured tocpu
, andnone
otherwise. For static agents this field defaults toauto
.auto
: Automatically detects the slot type. The agent will detect if there are NVIDIA GPUs or AMD GPUs. If there are GPUs, it maps each GPU to one slot. Otherwise, it maps all the CPUs to a slot.none
: The agent will not create any slots for detected devices.cuda
: The agent will map each detected NVIDIA GPU to a slot. Prior to 0.17.6, this option was calledgpu
.cpu
: Map all the CPUs to a slot, even when GPUs are present.rocm
: The agent will map each detected ROCm AMD GPU to a slot.
http_proxy
: The HTTP proxy address for the agent’s containers.https_proxy
: The HTTPS proxy address for the agent’s containers.ftp_proxy
: The FTP proxy address for the agent’s containers.no_proxy
: The addresses that the agent’s containers should not proxy.security
: Security-related configuration settings.tls
: Configuration settings for TLS.enabled
: Whether to use TLS to connect to the master. Defaults tofalse
.skip_verify
: Skip verifying the master certificate when using TLS. Defaults tofalse
. Enabling this setting will reduce the security of your Determined cluster.master_cert
: CA cert file for the master when using TLS.master_cert_name
: A hostname for which the master’s TLS certificate is valid, if the value of themaster_host
option is an IP address or is not contained in the certificate.client_cert
/client_key
: Paths to files containing the client TLS certificate and key to use when connecting to the master.
fluent
: fluentd settings.image
: Docker image to use for the managed Fluent Bit daemon. Defaults tofluent/fluent-bit:1.9.3
.port
: TCP port for the Fluent Bit daemon to listen on. Defaults to 24224. When running multiple agents on the same node, should be unique.container_name
: Name for the Fluent Bit container. Defaults todetermined-fluent
. When running multiple agents on the same node, should be unique.
agent_reconnect_attempts
: Maximum number of times agent will attempt to reconnect to master on connection failure. Defaults to 5.agent_reconnect_backoff
: Time interval between reconnection attempts, in seconds. Defaults to 5 seconds.container_auto_remove_disabled
(debug): Whether to disable settingAutoRemove
flag on task containers. Defaults to false.hooks
: Configuration for commands to run when certain events occur. The value of each option in this section is an array of strings specifying the command and its arguments.on_connection_lost
: A command to run when the agent fails to either connect to the master on startup or reconnect after a loss of connection. (When reconnecting, the agent will make several attempts as specified by theagent_reconnect_attempts
andagent_reconnect_backoff
configuration options.)In order to shut down the machine on which the agent is running, set this to
["sudo", "shutdown", "now"]
, or just["shutdown", "now"]
if the agent is running as root. Additional system configuration may be required in order to allow the agent to execute the command from inside a Docker container or without the need to enter a password.
label
: This field has been deprecated and will be ignored. Useresource_pool
instead.