Note
Click here to download the full example code
Native API: Basics¶
First, lets consider what it looks like to train a very simple model on MNIST
using tf.keras
, taken directly from TensorFlow documentation.
import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential(
[
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation="relu"),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation="softmax"),
]
)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=1)
Here is what it looks like to train the exact same model using the Native API to launch an experiment on a Determined cluster.
import determined as det
from determined import experimental
from determined.experimental.keras import init
config = {
"searcher": {"name": "single", "metric": "val_acc", "max_steps": 5},
"hyperparameters": {"global_batch_size": 32},
}
# When running from this code from a notebook, add a `command` argument to
# init() specifying the notebook file name.
context = init(config, mode=experimental.Mode.CLUSTER, context_dir=".")
model = tf.keras.models.Sequential(
[
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation="relu"),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation="softmax"),
]
)
model = context.wrap_model(model)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=5)
Paste the code above into a Python file tf_keras_native.py and run it as a Python script.
Note
Before submitting any experiments using the Native API, make sure the DET_MASTER environment variable is configured to connect to the appropriate IP address.
$ python tf_keras_native.py
You can also use any environment that supports Python to launch an experiment with this code, such as a Jupyter notebook or an IDE.
Let’s walk through some of the concepts introduced by the Native API.
Configuration¶
config = {
"searcher": {"name": "single", "metric": "val_acc", "max_steps": 5},
"hyperparameters": {"global_batch_size": 16},
}
Configuring any experiment for use with Determined cluster requires an Experiment Configuration. In the Native API, this represented as a Python dictionary. There are two required fields are required for every configuration submitted via the Native API:
searcher
:This field describes how many different Trials (models) are going to be trained. In this case, we’ve specified to train a
"single"
model for five training steps.hyperparameters
:This field describes the hyperparameters used.
global_batch_size
is a required hyperparameter for every experiment – we’ll revisit this requirement in Native API: Distributed Training.
Context¶
context = init(config, mode=experimental.Mode.CLUSTER, context_dir=".")
determined.experimental.keras.init() is the function that initializes the Determined training context. We can think of it as the moment in the training script where Determined will “assume control” of the execution of your code. It has two required arguments in addition to the configuration:
mode
(determined.experimenal.Mode
):determined.experimenal.Mode.CLUSTER
will submit the experiment to a Determined cluster.determined.experimenal.Mode.LOCAL
will execute a minimal training loop in your local Python environment.context_dir
(str
):Specifies the location of the code you want submitted to the cluster. This is required by Determined to execute your training script in a remote environment. In the common case, “.” submits your entire working directory to the Determined cluster.
Wrap Model (tf.keras
only)¶
model = context.wrap_model(model)
In the case of tf.keras
, we will need to use the wrap_model
API to
make the Determined context aware of the model we want to train with. After
calling wrap_model
, we proceed with the compile()
and fit()
interfaces defined by TensorFlow to begin training our model remotely.
Next Steps: