YAML

YAML is a markup language often used for configuration. Determined uses YAML for configuring tasks such as experiments and notebooks, as well as configuring the Determined cluster as a whole. This guide describes a subset of YAML that we recommend for use with Determined. This is not a full description of YAML; see the specification or other online guides for more details.

Overview

A value in YAML can be a scalar (null or a number, string, or Boolean) or a collection (an array or map). Collections can contain other collections nested to any depth (though Determined’s YAML files generally have a fairly fixed structure).

A comment in a YAML file starts with a # character and extends to the end of the line.

If you are familiar with JSON, you can think of YAML as an alternative way of expressing JSON objects that is meant to be easier for humans to read and write, since it allows comments and has fewer markup characters around the content.

Maps

Maps represent unordered mappings from strings to YAML values. A map is written as a sequence of key-value pairs. Each key is followed by a colon and the corresponding value. The value can be on the same line as the key if it is a scalar (in which case it must be preceded by a space) or on subsequent lines (in which case it must be indented, conventionally by two spaces).

We use a map in the experiment configuration to configure hyperparameters:

hyperparameters:
  base_learning_rate: 0.001
  weight_cost: 0.0001
  global_batch_size: 64
  n_filters1: 40
  n_filters2: 40

The snippet above describes a map with one key, hyperparameters; the corresponding value is itself a map whose keys are base_learning_rate, weight_cost, etc.

Arrays

An array contains multiple other YAML values in some order. An array is written as a sequence of values, each one preceded by a hyphen and a space. The hyphens for one list must all be indented by the same amount.

We use an array in the experiment configuration to configure environment variables:

environment:
  environment_variables:
    - A=A
    - B=B
    - C=C

Scalars

Scalars generally behave naturally: null, true, 2.718, and "foo" all have the same meanings that they would in JSON (and many programming languages). However, YAML allows strings to be unquoted: foo is the same as "foo". This behavior is often convenient, but it can lead to unexpected behavior when small edits to a value change its type. For example, the following YAML block represents a list containing several values whose types are listed in the comments:

- true          # Boolean
- grue          # string

- 0.0           # number
- 0.0.          # string

- foo: bar      # map
- foo:bar       # string
- foo bar       # string

Putting It All Together

A Determined configuration file consists of a YAML object with a particular structure: a map at the top level that is expected to have certain keys, with the value for each key expected to have a certain structure in turn.

In this example experiment configuration, we use numbers, strings, maps, and an array:

name: mnist_tf_const
data:
  base_url: https://s3-us-west-2.amazonaws.com/determined-ai-datasets/mnist/
  training_data: train-images-idx3-ubyte.gz
  training_labels: train-labels-idx1-ubyte.gz
  validation_set_size: 10000
hyperparameters:
  base_learning_rate: 0.001
  weight_cost: 0.0001
  global_batch_size: 64
  n_filters1: 40
  n_filters2: 40
searcher:
  name: single
  metric: error
  max_length:
    batches: 500
  smaller_is_better: true
environment:
  environment_variables:
    - A=A
    - B=B
    - C=C

Reference