TensorBoard

TensorBoard is a widely used tool for visualizing and inspecting deep learning models. PEDL makes it easy to use TensorBoard to examine a single PEDL experiment or to compare multiple experiments.

TensorBoard instances can be launched via the WebUI or the CLI. To launch TensorBoard instances from the CLI, first install the PEDL CLI on your development machine.

Analyzing Experiments

To launch TensorBoard to analyze a single PEDL experiment, use pedl tensorboard start <experiment-id>:

$ pedl tensorboard start 7
Scheduling TensorBoard (rarely-cute-man) (id: aab49ba5-3357-4145-861c-7e6ff2d702c5)...
TensorBoard (rarely-cute-man) was assigned to an agent...
Scheduling tensorboard tensorboard (id: c68c9fc9-7eed-475b-a50f-fd78406d7c83)...
TensorBoard is running at: http://localhost:8080/proxy/c68c9fc9-7eed-475b-a50f-fd78406d7c83-tensorboard-0/
disconnecting websocket

The PEDL master schedules a TensorBoard instance in the cluster. The PEDL CLI waits until the instance is running and then opens the TensorBoard web interface in a local browser window.

To list scheduled and running TensorBoard instances, run the following command:

$ pedl tensorboard list
 Id                                   | Owner   | Description                         | State      | Experiment Id   | Trial Ids   | Exit Status
--------------------------------------+---------+-------------------------------------+------------+-----------------+-------------+---------------------------------
 aab49ba5-3357-4145-861c-7e6ff2d702c5 | pedl    | TensorBoard (rarely-cute-man)       | RUNNING    | 7               | N/A         | N/A
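The table printed by pedl tensorboard list is pipe-delimited, so a small helper can turn it into records for scripting. This sketch assumes the layout shown above: a header row, a dashed separator row, and then one row per instance. The column names it produces come from that example header; the function name is hypothetical.

```python
from typing import Dict, List


def parse_tensorboard_list(table: str) -> List[Dict[str, str]]:
    """Turn `pedl tensorboard list` output into one dict per TensorBoard instance."""
    lines = [line for line in table.splitlines() if line.strip()]
    if not lines:
        return []
    # The first non-empty line is the header row, e.g. "Id | Owner | ... | Exit Status".
    headers = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for line in lines[1:]:
        # Skip the "-----+-----" separator row under the header.
        if set(line.strip()) <= set("-+"):
            continue
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(headers, cells)))
    return rows
```

Each record is keyed by the header text, so row["State"] and row["Experiment Id"] hold the values from the corresponding columns.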

TensorBoard can also be used to analyze multiple PEDL experiments. To launch TensorBoard for multiple experiments, use pedl tensorboard start <experiment-id> <experiment-id> ....

Note

Initially, TensorBoard may not show any metrics when the browser window opens. Data becomes available after a trial completes a step. Because TensorBoard pulls metrics from persistent storage, it may take up to 5 minutes for TensorBoard to receive data and render visualizations.

Analyzing Specific Trials

PEDL also supports using TensorBoard to analyze specific trials from one or more PEDL experiments. This is useful if an experiment has many trials but you want to compare only a small number of them. It can also be used to compare trials from different experiments.

To launch TensorBoard to analyze specific trials, use pedl tensorboard start --trial-ids <trial_id 1> <trial_id 2> ....
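When the experiment or trial IDs come from another script, building the argument list programmatically avoids quoting mistakes. This is a minimal sketch that only assembles the two command forms described in this section: experiment IDs as positional arguments, or trial IDs after --trial-ids. The helper name is hypothetical, and it deliberately accepts only one form per call; whether the CLI accepts both together is not covered here.

```python
from typing import List, Sequence


def tensorboard_start_argv(
    experiment_ids: Sequence[int] = (), trial_ids: Sequence[int] = ()
) -> List[str]:
    """Build argv for `pedl tensorboard start` (hypothetical helper)."""
    # This sketch uses exactly one of the two documented forms per call.
    if bool(experiment_ids) == bool(trial_ids):
        raise ValueError("pass either experiment_ids or trial_ids, not both or neither")
    argv = ["pedl", "tensorboard", "start"]
    if experiment_ids:
        # Form 1: pedl tensorboard start <experiment-id> <experiment-id> ...
        argv += [str(i) for i in experiment_ids]
    else:
        # Form 2: pedl tensorboard start --trial-ids <trial_id 1> <trial_id 2> ...
        argv += ["--trial-ids", *[str(i) for i in trial_ids]]
    return argv
```

The resulting list can be passed directly to subprocess.run, for example subprocess.run(tensorboard_start_argv(trial_ids=[3, 11]), check=True).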

Data in TensorBoard

This section summarizes how PEDL captures data from TensorFlow models. For a more in-depth discussion of how TensorBoard visualizes data, see the TensorBoard documentation.

TensorBoard visualizes data captured during a TensorFlow run. The data is captured in tfevent files, which are produced by writing TensorFlow summaries to disk via a tf.summary.FileWriter. PEDL provides support in each supported deep learning framework for writing metrics as tfevent files and uploading them. See below for details on how to configure PEDL with TensorBoard for your framework.

FileWriters write log files, called tfevent files, to a directory known as the logdir. TensorBoard watches this directory for changes and updates accordingly. The logdir supported by PEDL is /tmp/tensorboard. When a trial is configured with PEDL TensorBoard support, all tfevent files that the trial writes to /tmp/tensorboard are uploaded to persistent storage.
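To check what a trial has written to the logdir, you can list the event files under it. TensorFlow names these files with the prefix events.out.tfevents, so this sketch collects every file under /tmp/tensorboard (including subdirectories) whose name contains "tfevents". It only inspects the local directory; it is not PEDL's upload mechanism, and the function name is hypothetical.

```python
from pathlib import Path
from typing import List, Union

# The logdir supported by PEDL, as stated in this section.
PEDL_LOGDIR = Path("/tmp/tensorboard")


def find_tfevent_files(logdir: Union[str, Path] = PEDL_LOGDIR) -> List[Path]:
    """Return all tfevent files under logdir, including subdirectories, sorted."""
    root = Path(logdir)
    if not root.is_dir():
        return []
    # TensorFlow event files are named "events.out.tfevents.<timestamp>.<hostname>...".
    return sorted(p for p in root.rglob("*tfevents*") if p.is_file())
```

Calling find_tfevent_files() inside a trial container would show which files are candidates for upload; an empty list usually means no summaries have been written yet.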

PEDL Batch Metrics

At the end of every PEDL step, batch metrics are collected and stored in the database. This gives a granular view of model metrics over time. Batch metrics appear in TensorBoard under the PEDL group. The x-axis of these plots corresponds to the batch number. For example, a point at step 5 of the plot is the metric associated with the fifth batch seen.
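The mapping from PEDL steps to the batch-number x-axis can be made concrete with a small sketch. It assumes, hypothetically, that batch metrics arrive as one list of per-batch values per completed step, in training order. Each value is then plotted at its 1-based global batch number, so the first batch of the second step immediately follows the last batch of the first step.

```python
from typing import List, Sequence, Tuple


def batch_metric_points(
    steps: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Flatten per-step batch metrics into (batch_number, value) plot points.

    `steps[k]` holds the per-batch values recorded in step k+1, in order.
    Batch numbers start at 1, so x = 5 is the fifth batch seen overall.
    """
    points = []
    batch_number = 0
    for step_values in steps:
        for value in step_values:
            batch_number += 1
            points.append((batch_number, value))
    return points
```

With two steps of three and two batches, batch_metric_points([[0.9, 0.8, 0.7], [0.6, 0.5]]) places the second step's first value at x = 4, not at x = 1 or x = 2.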

Framework-specific Configuration

The following examples demonstrate how to configure TensorBoard for each framework.

TensorFlow Keras

To add TensorBoard support for a TFKerasTrial, add the pedl.frameworks.tensorflow.TFKerasTensorBoard callback to your trial:

from pedl.frameworks.tensorflow import TFKerasTensorBoard

class MyModel(TFKerasTrial):
    ...
    def keras_callbacks(self, hparams):
        return [TFKerasTensorBoard()]

Keras Simple Trial

To add TensorBoard support for a Keras Simple Trial, add the pedl.frameworks.tensorflow.KerasTensorBoard callback to your trial:

from pedl.frameworks.tensorflow import KerasTensorBoard

model = ...

model.fit(..., callbacks=[KerasTensorBoard()])

Note

The logdir argument to TFKerasTensorBoard and KerasTensorBoard is fixed to /tmp/tensorboard. If logdir is passed, the value will be ignored.

Estimator

There is no configuration necessary for trials using the EstimatorTrial class.

Tensorpack

To add TensorBoard support for a TensorpackTrial, add the pedl.frameworks.tensorflow.TFEventWriter callback to your trial:

from pedl.frameworks.tensorflow import TFEventWriter
from pedl.frameworks.tensorflow.tensorpack_trial import TensorpackTrial

class MyModel(TensorpackTrial):
    ...
    def tensorpack_monitors(self, hparams):
        return [TFEventWriter()]

Lifecycle Management

Once a TensorBoard instance has been scheduled onto the cluster, it remains running until you explicitly terminate it. To terminate it, use pedl tensorboard kill <tensorboard-id>:

$ pedl tensorboard kill aab49ba5-3357-4145-861c-7e6ff2d702c5

To open a web browser window connected to a previously launched TensorBoard instance, use pedl tensorboard open. To view the logs of an existing TensorBoard instance, use pedl tensorboard logs.
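Because instances keep running until they are killed, it can help to clean them up in bulk. This sketch combines two commands from this section: it reads the output of pedl tensorboard list, keeps rows whose State column is RUNNING, and calls pedl tensorboard kill on each Id. It assumes the table layout shown in the list example earlier in this section, and the function names are hypothetical. With dry_run=True it still runs the list command but only prints the kill commands.

```python
import subprocess
from typing import List


def running_tensorboard_ids(list_output: str) -> List[str]:
    """Return IDs of RUNNING instances from `pedl tensorboard list` output."""
    # Keep only table rows; the "----+----" separator contains no "|".
    lines = [line for line in list_output.splitlines() if "|" in line]
    if not lines:
        return []
    headers = [cell.strip() for cell in lines[0].split("|")]
    id_col, state_col = headers.index("Id"), headers.index("State")
    ids = []
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) == len(headers) and cells[state_col] == "RUNNING":
            ids.append(cells[id_col])
    return ids


def kill_running_tensorboards(dry_run: bool = False) -> List[List[str]]:
    """Kill every RUNNING TensorBoard instance and return the kill commands."""
    listing = subprocess.run(
        ["pedl", "tensorboard", "list"], capture_output=True, text=True, check=True
    ).stdout
    commands = [
        ["pedl", "tensorboard", "kill", tb_id]
        for tb_id in running_tensorboard_ids(listing)
    ]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # show the command without executing it
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Filtering on the State column keeps the script from issuing kill commands for instances that are already stopped.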

Implementation Details

PEDL schedules TensorBoard instances in containers that run on agent machines. The PEDL master will proxy HTTP requests to and from the TensorBoard container. Although TensorBoard instances are hosted on agent machines, they do not occupy GPUs.