Organizing Models in the Model Registry
Determined includes built-in support for a model registry, which makes it easy to organize trained models and their versions. Common use cases for the model registry include:
Grouping related checkpoints together, including checkpoints across experiments.
Storing metadata about a model that is specific to your problem or organization. Examples include references to production systems, dataset links, Git links, and metrics calculated outside of Determined.
Retrieving the latest version of a model for downstream tasks like serving or batch inference.
The model registry contains a set of models. Each model has a unique name and zero or more model versions. A model version consists of a version number and a checkpoint, which represents the state of a trained model.
The model registry is designed to be flexible, and the best way to use it depends on your organization’s requirements and workflow. For example, one approach is to define a model for each high-level task you want to use machine learning for (e.g., “object-detection” or “sentiment-analysis”). Each version of this model would then correspond to a new approach to solving that task. Note that different versions of a model might come from different experiments, use different network architectures, or even use different deep learning frameworks. Another approach is to register a model named after a specific architecture, such as “FasterRCNN”, and ensure that each version of the model uses that network architecture.
Managing Models
A model has a unique name, an optional description, user-defined metadata, and zero or more model versions. A model’s metadata can contain arbitrary information about the model. The following is an example JSON representation of a model for illustration.
{
"mnist_cnn": {
"description": "a character recognition model",
"metadata": {
"dataset_url": "http://yann.lecun.com/exdb/mnist/",
"git_url": "http://github.com/user/repo"
},
"versions": []
}
}
Registering Models
A model can be added to the registry via the Python API, REST API, or CLI. This guide will cover the Python and CLI methods. For information on the REST API, see the Swagger API documentation.
The following example demonstrates how to add a new model to the registry; create_model() returns an instance of the Model class. The new model will not have any versions (model checkpoints) associated with it; adding versions to a model is described below.
from determined.experimental import Determined
model = Determined().create_model(
"model_name",
description="optional description",
metadata={"optional": "JSON serializable dictionary"}
)
Similarly, you can create a model from the CLI using the following command.
det model create <model_name>
Querying Models
The following example returns models registered in Determined as a list of Model objects. Models can be sorted by name, description, creation time, and last updated time. Additionally, models can be filtered by name or description via the Python API. For sorting and ordering options, see ModelSortBy and ModelOrderBy respectively.
from determined.experimental import Determined, ModelOrderBy, ModelSortBy
d = Determined()
all_models = d.get_models()
chronological_sort = d.get_models(sort_by=ModelSortBy.CREATION_TIME)
# Find all models with "mnist" in their name. Some possible model names
# are "mnist_pytorch", "mnist_cnn", "mnist", etc.
mnist_models = d.get_models(name="mnist")
# Find all models whose description contains "ocr".
ocr_models = d.get_models(description="ocr")
Similarly, you can list models from the CLI using the following command.
det model list --sort-by={name,description,creation_time,last_updated_time} --order-by={asc,desc}
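To make the filtering and sorting behavior described above concrete, here is a minimal local sketch that mimics it on plain dictionaries. It does not call Determined: the helper filter_and_sort and the sample records are hypothetical, and the substring match is an assumption drawn from the comments in the Python example (whether the registry matches case-sensitively is not stated here).

```python
from typing import Dict, List, Optional

# Hypothetical records standing in for registered models.
SAMPLE_MODELS: List[Dict[str, str]] = [
    {"name": "mnist_pytorch", "description": "ocr baseline",
     "creation_time": "2021-03-01T10:00:00Z"},
    {"name": "object-detection", "description": "FasterRCNN detector",
     "creation_time": "2021-01-15T09:30:00Z"},
    {"name": "mnist_cnn", "description": "a character recognition model",
     "creation_time": "2021-02-10T12:00:00Z"},
]


def filter_and_sort(
    models: List[Dict[str, str]],
    name: Optional[str] = None,
    description: Optional[str] = None,
    sort_by: str = "name",
    descending: bool = False,
) -> List[Dict[str, str]]:
    """Keep models whose fields contain the given substrings, then sort."""
    matches = [
        m for m in models
        if (name is None or name in m["name"])
        and (description is None or description in m["description"])
    ]
    # ISO 8601 timestamps in the same timezone sort correctly as strings.
    return sorted(matches, key=lambda m: m[sort_by], reverse=descending)


mnist_names = [m["name"] for m in filter_and_sort(SAMPLE_MODELS, name="mnist")]
oldest_first = [
    m["name"] for m in filter_and_sort(SAMPLE_MODELS, sort_by="creation_time")
]
print(mnist_names)   # ['mnist_cnn', 'mnist_pytorch']
print(oldest_first)  # ['object-detection', 'mnist_cnn', 'mnist_pytorch']
```

In practice the registry performs this work itself; the sketch only illustrates what a name or description filter is expected to select.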
The following snippet queries for a single model by name.
from determined.experimental import Determined
model = Determined().get_model("model_name")
The CLI equivalent is below. By default, the describe command also prints information about the latest version of the model.
det model describe <model_name>
Modifying Model Metadata
Currently, model metadata can only be edited via the Python API. The following example demonstrates how to use this API.
from determined.experimental import Determined
model = Determined().get_model("model_name")
# Metadata is merged with existing metadata.
model.add_metadata({"key": "value"})
model.add_metadata({"metrics": {"test_set_loss": 0.091}})
# Result: {"key": "value", "metrics": {"test_set_loss": 0.091}}.
# Only top-level keys are merged. The following statement will replace the
# previous value of the "metrics" key.
model.add_metadata({"metrics": {"test_set_acc": 0.97}})
# Result: {"key": "value", "metrics": {"test_set_acc": 0.97}}.
model.remove_metadata(["key"])
# Result: {"metrics": {"test_set_acc": 0.97}}.
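The top-level merge behavior described in the comments above can be reproduced locally with ordinary dictionaries. The following is a sketch of those semantics only, not Determined's implementation; merge_top_level and drop_keys are hypothetical helper names introduced for illustration.

```python
from typing import Any, Dict, List


def merge_top_level(current: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `current` with top-level keys from `new` applied.

    Nested dictionaries are replaced wholesale, not merged recursively.
    """
    merged = dict(current)
    merged.update(new)
    return merged


def drop_keys(current: Dict[str, Any], keys: List[str]) -> Dict[str, Any]:
    """Return a copy of `current` without the listed top-level keys."""
    return {k: v for k, v in current.items() if k not in keys}


metadata: Dict[str, Any] = {}
metadata = merge_top_level(metadata, {"key": "value"})
metadata = merge_top_level(metadata, {"metrics": {"test_set_loss": 0.091}})
after_merge = dict(metadata)

# The second "metrics" value replaces the first; test_set_loss is lost.
metadata = merge_top_level(metadata, {"metrics": {"test_set_acc": 0.97}})
after_replace = dict(metadata)

metadata = drop_keys(metadata, ["key"])
print(metadata)  # {'metrics': {'test_set_acc': 0.97}}
```

One consequence worth noting: to keep existing nested values, read the current nested dictionary, update it locally, and write the whole dictionary back under its top-level key.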
Managing Model Versions
Once a model has been added to the registry, you can add one or more checkpoints to it. These registered checkpoints are known as model versions. Version numbers are assigned by the registry; version numbers start at 1 and increment each time a new model version is registered.
The following JSON document shows an example model with a single registered version.
{
"mnist_cnn": {
"description": "a character recognition model",
"metadata": {
"dataset_url": "http://yann.lecun.com/exdb/mnist/",
"git_url": "http://github.com/user/repo"
},
"versions": [
{
"version_number": 1,
"checkpoint": {
"uuid": "6a24d772-f1f7-4655-9061-22d582afd96c",
"experiment_config": { "...": "..." },
"experimentId": 1,
"trialId": 1,
"hparams": { "...": "..." },
"batchNumber": 100,
"resources": { "...": "..." },
"metadata": {},
"framework": "tensorflow-1.14.0",
"format": "h5",
"metrics": { "...": "..." }
}
}
]
}
}
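The relationship between a model, its versions, and the registry-assigned version numbers can be sketched with a small in-memory stand-in. ToyModel and ToyModelVersion are hypothetical classes written for this illustration; they are not part of Determined, and the sketch assumes versions are never deleted, so the next number is simply the current count plus one.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyModelVersion:
    version_number: int
    checkpoint_uuid: str


@dataclass
class ToyModel:
    name: str
    description: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)
    versions: List[ToyModelVersion] = field(default_factory=list)

    def register_version(self, checkpoint_uuid: str) -> ToyModelVersion:
        """Attach a checkpoint and assign the next version number."""
        # Numbers start at 1 and grow by one per registered version.
        version = ToyModelVersion(len(self.versions) + 1, checkpoint_uuid)
        self.versions.append(version)
        return version


model = ToyModel("mnist_cnn", description="a character recognition model")
first = model.register_version("6a24d772-f1f7-4655-9061-22d582afd96c")
# Placeholder UUID for a second, hypothetical checkpoint.
second = model.register_version("00000000-0000-0000-0000-000000000000")
print(first.version_number, second.version_number)  # 1 2
```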
Creating Versions
The following snippet registers a new version of a model.
register_version() returns an updated Checkpoint object representing the new model version.
from determined.experimental import Determined
d = Determined()
exp_id = 1  # Placeholder; use the ID of one of your own experiments.
checkpoint = d.get_experiment(exp_id).top_checkpoint()
model = d.get_model("model_name")
model_version = model.register_version(checkpoint.uuid)
Similarly, a new model version can be registered using the CLI as follows:
det model register <model_name> <checkpoint_uuid>
Accessing Versions
The example below demonstrates how to retrieve versions of a model from the registry. If no version number is specified, the most recent version of the model is returned. get_version() returns an instance of Checkpoint; as shown in the example, this makes it easy to perform common operations like downloading the checkpoint to local storage or loading the trained model into memory.
from determined.experimental import Determined
model = Determined().get_model("model_name")
specific_version = model.get_version(3)
latest_version = model.get_version()
# Depending on the framework used to create the checkpoint, calling
# load() on a model version (checkpoint) will return either a TensorFlow or
# PyTorch object representing the trained model.
tf_or_torch_model = latest_version.load()
The following example lists all the versions of a model. By default, model versions are returned in descending order such that the most recent versions are returned first.
from determined.experimental import Determined
model = Determined().get_model("model_name")
model_versions = model.get_versions()
The CLI equivalent is as follows:
det model list-versions <model_name>
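The two retrieval rules described in this section (the most recent version is returned when no number is given, and listings are newest first) can be sketched locally as follows. The helpers pick_version and newest_first are hypothetical, the UUIDs are placeholders, and returning None for a missing version is an assumption of this sketch, not documented registry behavior.

```python
from typing import Dict, List, Optional, Union

Version = Dict[str, Union[int, str]]

# Hypothetical version records, deliberately out of order.
VERSIONS: List[Version] = [
    {"version_number": 2, "checkpoint_uuid": "placeholder-uuid-2"},
    {"version_number": 1, "checkpoint_uuid": "placeholder-uuid-1"},
    {"version_number": 3, "checkpoint_uuid": "placeholder-uuid-3"},
]


def pick_version(
    versions: List[Version], version_number: Optional[int] = None
) -> Optional[Version]:
    """Return the requested version, or the highest-numbered one by default."""
    if not versions:
        return None
    if version_number is None:
        return max(versions, key=lambda v: v["version_number"])
    for version in versions:
        if version["version_number"] == version_number:
            return version
    return None  # Assumption: a missing version yields None in this sketch.


def newest_first(versions: List[Version]) -> List[Version]:
    """Return versions sorted by version number, most recent first."""
    return sorted(versions, key=lambda v: v["version_number"], reverse=True)


latest = pick_version(VERSIONS)
specific = pick_version(VERSIONS, 1)
ordered_numbers = [v["version_number"] for v in newest_first(VERSIONS)]
print(latest["version_number"], ordered_numbers)  # 3 [3, 2, 1]
```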
Next Steps
determined.experimental: The reference documentation for this API.