Environment Configuration

Determined launches workloads using Docker containers. The container configuration is referred to as the environment.

There are three methods to customize the environment that workloads execute in:

  1. Environment variables

  2. Specifying a startup hook (startup-hook.sh)

  3. Using a custom Docker image

Environment Variables

For both trial runners and commands, Determined allows users to configure the environment variables inside the container through the environment.environment_variables field of the experiment config. The field takes a list of strings of the form NAME=VALUE:

environment:
  environment_variables:
    - A=hello world
    - B=$A
    - C=${B}
    # `A`, `B`, and `C` will each have the value `hello world` in the container.

Variables are set sequentially, in the order listed, so the order matters for any variable whose value depends on the expansion of another variable.
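The effect of ordering can be sketched with plain shell assignments. This is only an illustration and assumes that expansion behaves like ordinary shell variable expansion; it is not Determined's implementation, and the names X and Y are hypothetical:

```shell
# Illustration only: ordinary shell assignments, not Determined's own code.
unset A B C X Y

# Same order as the configuration example: each variable is defined
# before it is referenced.
A="hello world"
B=$A
C=${B}
echo "C=$C"      # prints: C=hello world

# Reversed order (hypothetical names X and Y): Y is expanded before X
# has a value, so Y ends up empty.
Y=$X
X="hello world"
echo "Y=[$Y]"    # prints: Y=[]
```

Listing a variable before the variables it depends on is therefore likely to produce an empty or stale value.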

Proxy variables set in this way will take precedence over those set using the agent configuration.

Startup Hooks

If a file named startup-hook.sh exists at the top level of your model definition directory, Determined will automatically execute this file during the startup of every Docker container. This occurs before any Python interpreters are launched or any deep learning operations are performed; this allows the startup hook to customize the container environment, install additional dependencies, download data sets, or do practically anything else that you can do in a shell script.

Note

Startup hooks are not cached and run before the start of every workload. Hence, performing expensive or long-running operations in a startup hook can result in poor performance.

Here is an example of a startup hook that installs the wget utility and the Python package pandas:

apt-get update && apt-get install -y wget
python3.6 -m pip install pandas
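Because the hook runs before every workload, it can help to skip steps whose result is already present, for example when a custom image already ships the tool. The following is a minimal sketch of that idea; the helper names have_cmd and ensure_cmd are hypothetical and not part of Determined, and the sketch does not make the hook cached:

```shell
# Hypothetical helpers for a startup-hook.sh; not provided by Determined.

# Succeeds if the named command is already on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Runs the given install command only when the named tool is missing.
# Usage: ensure_cmd <tool> <install command...>
ensure_cmd() {
    tool=$1
    shift
    if have_cmd "$tool"; then
        echo "$tool already present, skipping install"
    else
        "$@"
    fi
}
```

In a startup hook, this could be called as ensure_cmd wget sh -c 'apt-get update && apt-get install -y wget', so that the package index is only refreshed when wget is actually missing.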

The Iris example contains a TensorFlow Keras model that uses a startup hook to install an additional Python dependency.

Container Images

Determined provides a set of officially supported Docker images. These are the default images used to launch containers for experiments, commands, and any other workflow in the Determined system.

Default Images

In the current version of Determined, experiments and commands are executed in containers that provide the following:

  • Ubuntu 18.04

  • CUDA 10.0

  • Python 3.6.9

  • TensorFlow 1.15.0

  • PyTorch 1.4.0

Determined will automatically select GPU-specific versions of each library when running on agents with GPUs.

In addition to the above settings, all trial runner containers are launched with additional Determined-specific harness code that orchestrates model training and evaluation in the container. Trial runner containers are also loaded with the experiment’s model definition and values of the hyperparameters for the current trial.

Note

The default images are determinedai/environments:cuda-10.0-pytorch-1.4-tf-1.15-gpu-0.7.0 and determinedai/environments:py-3.6.9-pytorch-1.4-tf-1.15-cpu-0.7.0 for GPU and CPU respectively.

TensorFlow 2 Images

Determined also supports TensorFlow 2.2 and has a Docker image you can use for experiments and commands containing the following:

  • Ubuntu 18.04

  • CUDA 10.1

  • Python 3.6.9

  • TensorFlow 2.2.0

  • PyTorch 1.4.0

This image can be selected in your experiment configuration as follows:

environment:
  image:
    gpu: "determinedai/environments:cuda-10.1-pytorch-1.4-tf-2.2-gpu-0.7.0"
    cpu: "determinedai/environments:py-3.6.9-pytorch-1.4-tf-2.2-cpu-0.7.0"

Custom Images

While the official images contain all the dependencies needed for basic deep learning workloads, many workloads have extra dependencies. If those extra dependencies are quick to install, consider using a startup hook. If installing them via startup-hook.sh would take too long, we suggest building your own Docker image and publishing it to a Docker registry such as Docker Hub. We recommend that custom images use one of the official Determined images as a base image (using the FROM instruction).

Warning

Do not install the TensorFlow, PyTorch, Horovod, or Apex packages in a custom image; doing so will conflict with the base packages that are installed in Determined’s official environments.

Here is an example of a Dockerfile that installs both conda- and pip-based dependencies:

FROM determinedai/environments:cuda-10.0-pytorch-1.4-tf-1.15-gpu-0.7.0
RUN apt-get update && apt-get install -y unzip python-opencv graphviz
COPY environment.yml /tmp/environment.yml
COPY pip_requirements.txt /tmp/pip_requirements.txt
RUN conda env update --name base --file /tmp/environment.yml && \
    conda clean --all --force-pkgs-dirs --yes
RUN eval "$(conda shell.bash hook)" && \
    conda activate base && \
    pip install --requirement /tmp/pip_requirements.txt

Assuming this image has been published to a public repository on Docker Hub, you can configure an experiment, command, or notebook to use the image as follows:

environment:
  image: "my-user-name/my-repo-name:my-tag"

where my-user-name is your Docker Hub user, my-repo-name is the name of the Docker Hub repository, and my-tag is the image tag to use (e.g., latest).
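Building and publishing such an image might look like the sketch below. It assumes the Docker CLI is installed, that a Dockerfile is in the current directory, and that docker login has already been run; the variable names and the publish_image function are hypothetical:

```shell
# Placeholder values taken from the text above; substitute your own.
DOCKER_USER="my-user-name"
DOCKER_REPO="my-repo-name"
DOCKER_TAG="my-tag"
IMAGE="${DOCKER_USER}/${DOCKER_REPO}:${DOCKER_TAG}"

# Build from the Dockerfile in the current directory, then push the image.
# Wrapped in a function so that nothing is built or pushed until it is called.
publish_image() {
    docker build -t "$IMAGE" . &&
    docker push "$IMAGE"
}

echo "Image reference: $IMAGE"
```

Calling publish_image then builds and pushes the image, and the printed reference is the value to place in the image field.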

If your image has been published to a private Docker Hub repository, you can also specify the credentials to use to access the repository:

environment:
  image: "my-user-name/my-repo-name:my-tag"
  registry_auth:
    username: my-user-name
    password: my-password

If your image has been published to a private Docker Registry, specify the registry path as part of the image field:

environment:
  image: "myregistry.local:5000/my-user-name/my-repo-name:my-tag"

Images will be fetched via HTTPS by default. An HTTPS proxy can be configured using the https_proxy field as part of the agent configuration.

Your custom image and credentials can also be set as the defaults for all tasks launched in Determined. This can be done under image and registry_auth in the Master Configuration. The master must be restarted for these changes to take effect.