Environment Configuration
Determined launches workloads using Docker containers. The container configuration is referred to as the environment.
There are three methods to customize the environment that workloads execute in:
Environment variables
Specifying a startup hook (startup-hook.sh)
Using a custom Docker image
Environment Variables
For both trial runners and commands, Determined allows users to
configure the environment variables inside the container through the
environment.environment_variables configuration field of the
experiment config. The field takes a list of strings, each of the form
NAME=VALUE:
environment:
  environment_variables:
    - A=hello world
    - B=$A
    - C=${B}
    # `A`, `B`, and `C` will each have the value `hello world` in the container.
Variables are set sequentially, in the order listed, which affects variables that depend on the expansion of other variables.
Proxy variables set in this way will take precedence over those set using the agent configuration.
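The effect of sequential expansion can be modeled with a short Python sketch. This is only an illustration of the behavior described above, not Determined's implementation; the function name expand_sequentially is hypothetical, and leaving undefined references untouched is an assumption of this model rather than documented behavior.

```python
from string import Template


def expand_sequentially(entries):
    """Expand NAME=VALUE entries in order; each value sees only earlier names."""
    env = {}
    for entry in entries:
        name, _, raw_value = entry.partition("=")
        # Template understands both $NAME and ${NAME}. safe_substitute leaves
        # unknown names as-is (an assumption of this sketch).
        env[name] = Template(raw_value).safe_substitute(env)
    return env


resolved = expand_sequentially(["A=hello world", "B=$A", "C=${B}"])
print(resolved)  # A, B, and C all map to 'hello world'
```

In this model, listing B=$A before A=hello world would leave B unexpanded, which is why the order of the entries matters.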
Startup Hooks
If a file named startup-hook.sh exists at the top level of your
model definition directory, Determined will automatically execute this
file during the startup of every Docker container. This occurs before
any Python interpreters are launched or any deep learning operations are
performed; this allows the startup hook to customize the container
environment, install additional dependencies, download data sets, or do
practically anything else that you can do in a shell script.
Note
Startup hooks are not cached and run before the start of every workload. Hence, performing expensive or long-running operations in a startup hook can result in poor performance.
Here is an example of a startup hook that installs the wget utility
and the Python package pandas:
apt-get update && apt-get install -y wget
python3.6 -m pip install pandas
The Iris example contains a
TensorFlow Keras model that uses a startup hook to install an additional
Python dependency.
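Because startup hooks run before every workload, it can help to skip installs that are already satisfied by the image. The following is a minimal sketch of a helper that a hook might call before invoking pip; the function name missing_packages is hypothetical, and the check only handles top-level module names.

```python
import importlib.util


def missing_packages(required):
    """Return the top-level module names from `required` that cannot be imported."""
    return [name for name in required if importlib.util.find_spec(name) is None]


# Only the names reported here would need to be installed by the hook.
to_install = missing_packages(["pandas"])
print(to_install)
```

A hook could then run pip install only for the names in to_install, keeping container startup fast when the dependencies are already present.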
Container Images
Determined provides a set of officially supported Docker images. These are the default images used to launch containers for experiments, commands, and any other workflow in the Determined system.
Default Images
In the current version of Determined, experiments and commands are executed in containers with the following software:
Ubuntu 18.04
CUDA 10.0
Python 3.6.9
TensorFlow 1.15.0
PyTorch 1.4.0
Determined will automatically select GPU-specific versions of each library when running on agents with GPUs.
In addition to the above settings, all trial runner containers are launched with additional Determined-specific harness code that orchestrates model training and evaluation in the container. Trial runner containers are also loaded with the experiment’s model definition and values of the hyperparameters for the current trial.
Note
The default images are determinedai/environments:cuda-10.0-pytorch-1.4-tf-1.15-gpu-0.7.0 and determinedai/environments:py-3.6.9-pytorch-1.4-tf-1.15-cpu-0.7.0 for GPU and CPU, respectively.
TensorFlow 2 Images
Determined also supports TensorFlow 2.2 and has a Docker image you can use for experiments and commands containing the following:
Ubuntu 18.04
CUDA 10.1
Python 3.6.9
TensorFlow 2.2.0
PyTorch 1.4.0
This image can be selected in your experiment configuration as follows:
environment:
  image:
    gpu: "determinedai/environments:cuda-10.1-pytorch-1.4-tf-2.2-gpu-0.7.0"
    cpu: "determinedai/environments:py-3.6.9-pytorch-1.4-tf-2.2-cpu-0.7.0"
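The image field appears in two shapes in this document: a mapping with gpu and cpu keys, as above, and a single string, as in the Custom Images section. The sketch below shows one way a tool could resolve either shape to a concrete image name. The function select_image is hypothetical, and treating a single string as applying to both GPU and CPU agents is an assumption of this sketch, not a statement about Determined's internals.

```python
def select_image(image_field, has_gpu):
    """Return the image name for a task, given an `environment.image` value."""
    if isinstance(image_field, str):
        # Assumption: a single string is used regardless of agent type.
        return image_field
    return image_field["gpu" if has_gpu else "cpu"]


tf2_images = {
    "gpu": "determinedai/environments:cuda-10.1-pytorch-1.4-tf-2.2-gpu-0.7.0",
    "cpu": "determinedai/environments:py-3.6.9-pytorch-1.4-tf-2.2-cpu-0.7.0",
}
print(select_image(tf2_images, has_gpu=False))
```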
Custom Images
While the official images contain all the dependencies needed for basic
deep learning workloads, many workloads have extra dependencies. If
those extra dependencies are quick to install, you may want to consider
using a startup hook. For situations where
installing dependencies via startup-hook.sh would take too long, we
suggest building your own Docker image and publishing it to a Docker
registry like Docker Hub. We recommend
that custom images use one of the official Determined images as a base
image (using the FROM instruction).
Warning
Do not install the TensorFlow, PyTorch, Horovod, or Apex packages in a custom image; doing so will conflict with the base packages that are installed into Determined’s official environments.
Here is an example of a Dockerfile that installs both conda- and
pip-based dependencies.
FROM determinedai/environments:cuda-10.0-pytorch-1.4-tf-1.15-gpu-0.7.0
RUN apt-get update && apt-get install -y unzip python-opencv graphviz
COPY environment.yml /tmp/environment.yml
COPY pip_requirements.txt /tmp/pip_requirements.txt
RUN conda env update --name base --file /tmp/environment.yml && \
    conda clean --all --force-pkgs-dirs --yes
RUN eval "$(conda shell.bash hook)" && \
    conda activate base && \
    pip install --requirement /tmp/pip_requirements.txt
Assuming this image has been published to a public repository on Docker Hub, you can configure an experiment, command, or notebook to use the image as follows:
environment:
  image: "my-user-name/my-repo-name:my-tag"
where my-user-name is your Docker Hub user, my-repo-name is the
name of the Docker Hub repository, and my-tag is the image tag to
use (e.g., latest).
If your image has been published to a private Docker Hub repository, you can also specify the credentials to use to access the repository:
environment:
  image: "my-user-name/my-repo-name:my-tag"
  registry_auth:
    username: my-user-name
    password: my-password
If your image has been published to a private Docker Registry, specify the registry path as
part of the image field:
environment:
  image: "myregistry.local:5000/my-user-name/my-repo-name:my-tag"
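To make the structure of such an image name concrete, here is a simplified sketch that splits a reference into registry, repository, and tag. It follows the usual Docker convention that a leading component containing a dot or colon (or equal to localhost) names a registry, with docker.io and latest as the defaults. The function split_image_reference is hypothetical, and digests (name@sha256:...) are not handled.

```python
def split_image_reference(image):
    """Split an image name into (registry, repository, tag). Simplified sketch."""
    first, slash, rest = image.partition("/")
    # A leading component counts as a registry host only if it looks like one.
    if slash and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = "docker.io", image

    # A tag is a ":" suffix on the final path component.
    last_component = remainder.rsplit("/", 1)[-1]
    if ":" in last_component:
        repository, _, tag = remainder.rpartition(":")
    else:
        repository, tag = remainder, "latest"
    return registry, repository, tag


print(split_image_reference("myregistry.local:5000/my-user-name/my-repo-name:my-tag"))
```

The port in myregistry.local:5000 is kept with the registry rather than being mistaken for a tag, because only a colon in the final path component is treated as a tag separator.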
Images will be fetched via HTTPS by default. An HTTPS proxy can be
configured using the https_proxy field as part of the agent
configuration.
Your custom image and credentials can also be set as the defaults for
all tasks launched in Determined. This can be done under image and
registry_auth in the Master Configuration. The master must be
restarted for this change to take effect.