The Native API allows developers to seamlessly move between training in
a local development environment and training at cluster-scale on a
Determined cluster. It also provides an interface to train
tf.estimator models using idiomatic framework patterns, reducing
(or eliminating) the effort to port model code for use with Determined.
In this guide, we’ll cover what happens under the hood when a user submits an experiment to Determined using the Native API. This topic guide is meant as deep-dive for advanced users. For a tutorial on getting started with the Native API, see the Native API Tutorial.
Create an Experiment via
A user can submit an experiment from Python by executing a Python script that conforms to the Native API:
context = det.experimental.keras.init(config, context_dir=".") model = ... model = context.wrap_model(model) model.compile(...) model.fit(...)
init() APIs require a context directory (
argument. The context directory specifies the root directory of the code
containing the native implementation – for most users, this is the
current working directory (
init() also accepts two boolean
local=Falsewill submit the experiment to a Determined cluster.
local=Truewill execute the training loop in your local Python environment (although currently, local training is not implemented, so you must also set
test=True). Defaults to False.
test=Truewill execute a minimal training loop rather than a full experiment. This can be useful for porting or debugging a model because many common errors will surface quickly. Defaults to False.
Determined requires that the Native Python script contains a high-level training loop that it can intercept. Currently, the following training loop functions are supported:
To learn more about the
init() APIs, see:
Life of a Native API Experiment¶
The diagram above demonstrates the flow of execution when an experiment is created in cluster mode.
The Python script will first be executed in the User Environment, such
as a Python virtualenv or a Jupyter notebook where the
Python package is installed. When the code is executed,
create an experiment by submitting the contents of the context directory
and the experiment configuration to the
Determined master. Note that any code that comes after the
call (typically involving defining the model and training loop) is not
executed in the User Environment. In the case of
building and compiling the model is only done on the Determined cluster.
The Determined master will schedule the new workload on an agent
machine. The agent will launch a task container and initialize a Trial
Runner Environment; in this environment, the user script will be
re-executed from the beginning using runpy.run_path. In the
trial runner environment,
init will return a
determined.NativeContext object that holds information
specific to that trial (e.g. hyperparameter choices) and continue
executing. Once the script hits the training loop function (in this
tf.keras.Models.fit), Determined will launch into the managed
Any user code that occurs after the training loop has started will never be executed!