Quick Start Chapter 3: PEDL Commands

A command is a task that is executed in a containerized environment on a PEDL cluster. Any code, binaries, or scripts you can run on your local machine can be executed on a PEDL cluster by prefixing the original command with pedl cmd run.

Commands are a great way to run workflows that may not easily fit into the standard PEDL experiment workflow described above, while still getting the benefits of PEDL features such as resource scheduling and dependency management.

To run your first PEDL command, start with the shell command echo:

$ pedl cmd run echo hello world
Created command civil-oryx (id: 14f88590-4373-4b25-9a4d-a0beea5ff40d)
Scheduling...
Command scheduled ✅
...
[PEDL] 2019-02-11T19:13:57.542148900Z hello world
[PEDL] finished command 14f88590-4373-4b25-9a4d-a0beea5ff40d: task exited successfully with a zero exit code

If the command is passed as one quoted string, that string is interpreted as the argument to sh -c. This makes it possible to chain a sequence of commands:

$ pedl cmd run "echo stage-1 && echo stage-2 && echo stage-3"
Created command clever-viper (id: 6a226ee9-2d86-4e24-a3df-369c64bb3685)
Scheduling...
Command scheduled ✅
...
[PEDL] 2019-02-11T19:14:44.257477100Z stage-1
[PEDL] 2019-02-11T19:14:44.257548300Z stage-2
[PEDL] 2019-02-11T19:14:44.257569900Z stage-3
[PEDL] finished command 6a226ee9-2d86-4e24-a3df-369c64bb3685: task exited successfully with a zero exit code
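
The quotes matter because the local shell parses the command line before pedl ever sees it. The sketch below illustrates this locally, without a PEDL cluster. show_args is a hypothetical stand-in for pedl cmd run, made up for this example: it only reports the arguments it receives.

```shell
# Hypothetical stand-in for `pedl cmd run`: it only reports its arguments,
# so this example runs locally and never contacts a PEDL cluster.
show_args() { printf 'received %d argument(s): %s\n' "$#" "$*"; }

# Unquoted: the local shell splits at && and runs the second echo itself.
unquoted=$(show_args echo stage-1 && echo stage-2)

# Quoted: the whole sequence arrives as one argument, suitable for sh -c.
quoted=$(show_args "echo stage-1 && echo stage-2")

printf '%s\n' "$unquoted" "$quoted"
```

In the unquoted case only echo stage-1 reaches the stand-in, and stage-2 is printed by the local shell. With a real cluster, that second echo would presumably run on your own machine rather than in the command's container.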

File System Access

Commands run as isolated containers on a PEDL agent node. As a result, they do not have access to the local file system of the machine where pedl cmd run is executed. This ensures that commands do not accidentally depend on an individual user's development environment. To make local files accessible to a PEDL command, specify a context directory. The contents of the context directory are uploaded to the PEDL cluster and included in the command's container. For example, a Python script can be run as follows:

$ mkdir -p /tmp/context
$ echo "print('hello world')" > /tmp/context/hello.py
$ pedl cmd run --context /tmp/context python hello.py
Created command strong-cow (id: 8db7e0d5-a6bb-4d6a-bda8-281855ba40e2)
Scheduling...
Command scheduled ✅
...
[PEDL] 2019-01-16T22:32:44.360129700Z hello world
[PEDL] finished command 8db7e0d5-a6bb-4d6a-bda8-281855ba40e2: task exited successfully with a zero exit code

Any directory on your local file system can be used as the command context by passing it to the --context flag. The context directory is also the command's initial working directory, so files in it can be referenced by relative path. If --context is not specified, the context is empty and no files are added to the command container.

Warning

The maximum allowed size of the command context is 96 MB. Restrict it to source code, configuration files, and small artifacts. If a command needs access to larger resources (e.g., training sets), store that data on a distributed file system and bind-mount the file system into the command container.
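
As a rough pre-flight check, the size of a context directory can be estimated locally before submitting a command. The sketch below is an approximation under stated assumptions: it treats 96 MB as 96 * 1024 KB, measures disk usage with du (which may differ from how PEDL counts), and ignores any .pedlignore exclusions. The variable names are chosen for this example only.

```shell
# Build a small throwaway context directory so the example is self-contained.
CONTEXT_DIR=$(mktemp -d)
echo "print('hello world')" > "$CONTEXT_DIR/hello.py"

# Assumption: 96 MB is read as 96 * 1024 KB; PEDL's exact accounting may differ.
LIMIT_KB=$((96 * 1024))

# du -sk prints the total disk usage in kilobytes, followed by the path.
size_kb=$(du -sk "$CONTEXT_DIR" | awk '{print $1}')

if [ "$size_kb" -le "$LIMIT_KB" ]; then
    context_status="within limit"
else
    context_status="too large"
fi
echo "context uses ${size_kb} KB: ${context_status}"
```

To check a real context, point CONTEXT_DIR at the directory you intend to pass to --context instead of the temporary one.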

Hint

A .pedlignore file at the top level of the command context directory can be used to exclude files. The .pedlignore file uses the same syntax as .gitignore.
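
For illustration, the sketch below creates a context directory with a .pedlignore file at its top level. The patterns are examples only (Python bytecode and a hypothetical data/ directory), written in .gitignore syntax as described above.

```shell
# Throwaway context directory for the example.
ctx=$(mktemp -d)

# The quoted heredoc delimiter keeps the shell from expanding the patterns.
cat > "$ctx/.pedlignore" << 'EOL'
# Illustrative patterns in .gitignore syntax.
__pycache__/
*.pyc
data/
EOL

cat "$ctx/.pedlignore"
```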

Dependencies

PEDL makes it easy to change the dependencies a command runs with. For example, to run a command with a specific TensorFlow version:

$ pedl cmd run --config environment.tensorflow=1.10.0 python -c 'import tensorflow as tf; print(tf.__version__)'

More generally, a command can be given an optional command configuration that controls how it is executed. In addition to the --config flag, configuration can be supplied in a YAML file via --config-file:

$ cat > config.yaml << EOL
description: test-command
resources:
  slots: 1
environment:
  python: 3.6.9
  tensorflow: 1.10.0
  keras: 2.2.4
EOL
$ pedl cmd run --config-file config.yaml python -c 'import keras, tensorflow as tf; print(keras.__version__, tf.__version__)'

The configuration associated with each command is always stored in the PEDL database so that every command can be logged and reproduced in the future. All command configuration values are optional; unspecified values are given defaults. See the Configuration section for full documentation of the configuration schema.