Organizing Models in the Model Registry
Determined includes built-in support for a model registry, which makes it easy to organize trained models and their respective versions. Common use cases for the model registry include:
Grouping related checkpoints together, including checkpoints across experiments.
Storing metadata about a model that is specific to your problem or organization. Examples include references to production systems, dataset links, Git links, and metrics calculated outside of Determined.
Retrieving the latest version of a model for downstream tasks like serving or batch inference.
The model registry contains a set of models. Each model has a unique name and zero or more model versions. A model version consists of a version number and a checkpoint, which represents the state of a trained model.
The model registry is designed to be flexible, and the best way to use it depends on your organization’s requirements and workflow. For example, one approach is to define a model for each high-level task you want to use machine learning for (e.g., “object-detection” or “sentiment-analysis”). Each version of this model would then correspond to a new approach to solving that task. Note that different versions of a model might come from different experiments, use different network architectures, or even use different deep learning frameworks. Another approach is to register a model named “FasterRCNN” and ensure that each version of the model uses that network architecture.
Managing Models
A model has a unique name, an optional description, user-defined metadata, and zero or more model versions. A model’s metadata can contain arbitrary information about the model. The following is an example JSON representation of a model for illustration.
{
  "mnist_cnn": {
    "description": "a character recognition model",
    "metadata": {
      "dataset_url": "http://yann.lecun.com/exdb/mnist/",
      "git_url": "http://github.com/user/repo"
    },
    "versions": []
  }
}
Registering Models
A model can be added to the registry via the Python API, REST API, or CLI. This guide will cover the Python and CLI methods. For information on the REST API, see the Swagger API documentation.
The following example demonstrates how to add a new model to the registry; create_model() returns an instance of the Model class. The new model will not have any versions (model checkpoints) associated with it; adding versions to a model is described below.
from determined.experimental import Determined
model = Determined().create_model(
    "model_name",
    description="optional description",
    metadata={"optional": "JSON serializable dictionary"},
)
Similarly, you can create a model from the CLI using the following command.
det model create <model_name>
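Because model metadata is described as a JSON-serializable dictionary, it can help to check a dictionary locally before registering a model. The following is a minimal sketch of such a check; the check_metadata helper is hypothetical and is not part of the Determined API.

```python
import json


def check_metadata(metadata):
    """Return metadata unchanged if it is a JSON-serializable dict, else raise.

    Hypothetical helper for illustration; not part of Determined.
    """
    if not isinstance(metadata, dict):
        raise TypeError("metadata must be a dictionary")
    try:
        json.dumps(metadata)
    except (TypeError, ValueError) as err:
        raise ValueError(f"metadata is not JSON serializable: {err}") from err
    return metadata


ok = check_metadata({"dataset_url": "http://yann.lecun.com/exdb/mnist/"})

try:
    check_metadata({"tags": {"a", "b"}})  # a Python set cannot be encoded as JSON
    rejected = False
except ValueError:
    rejected = True
```

The value returned by check_metadata could then be passed as the metadata argument of create_model().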
Querying Models
The following example returns models registered in Determined as a list of Model objects. Models can be sorted by name, description, creation time, and last updated time. Additionally, models can be filtered by name or description via the Python API. For sorting and ordering options, see ModelSortBy and ModelOrderBy respectively.
from determined.experimental import Determined, ModelOrderBy, ModelSortBy
d = Determined()
all_models = d.get_models()
chronological_sort = d.get_models(sort_by=ModelSortBy.CREATION_TIME)
# Find all models with "mnist" in their name. Some possible model names
# are "mnist_pytorch", "mnist_cnn", "mnist", etc.
mnist_models = d.get_models(name="mnist")
# Find all models whose description contains "ocr".
ocr_models = d.get_models(description="ocr")
Similarly, you can list models from the CLI using the following command.
det model list --sort-by={name,description,creation_time,last_updated_time} --order-by={asc,desc}
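To make the filtering behavior described above concrete, the following sketch mimics it on an in-memory list. It assumes a plain, case-sensitive substring match and sorting by name only; the matching rules applied by the Determined master may differ. FakeModel and find_models are hypothetical stand-ins, not Determined classes or functions.

```python
from collections import namedtuple

# Stand-in for a registered model; this is NOT Determined's Model class.
FakeModel = namedtuple("FakeModel", ["name", "description"])

registry = [
    FakeModel("mnist_pytorch", "ocr digits, PyTorch"),
    FakeModel("object-detection", "FasterRCNN baseline"),
    FakeModel("mnist_cnn", "a character recognition model"),
    FakeModel("mnist", "ocr digits"),
]


def find_models(models, name=None, description=None, descending=False):
    """Keep models whose fields contain the given substrings, sorted by name."""
    hits = [
        m for m in models
        if (name is None or name in m.name)
        and (description is None or description in m.description)
    ]
    return sorted(hits, key=lambda m: m.name, reverse=descending)


mnist_names = [m.name for m in find_models(registry, name="mnist")]
ocr_names = [m.name for m in find_models(registry, description="ocr")]
```

In this sketch, the name filter "mnist" matches "mnist", "mnist_cnn", and "mnist_pytorch", while the description filter "ocr" matches only the two models whose descriptions contain that substring.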
The following snippet queries for a single model by name.
from determined.experimental import Determined
model = Determined().get_model("model_name")
The CLI equivalent is below. By default, the describe command also prints information about the latest version of the model.
det model describe <model_name>
Modifying Model Metadata
Currently, model metadata can only be edited via the Python API. The following example demonstrates how to use this API.
from determined.experimental import Determined
model = Determined().get_model("model_name")
# Metadata is merged with existing metadata.
model.add_metadata({"key": "value"})
model.add_metadata({"metrics": {"test_set_loss": 0.091}})
# Result: {"key": "value", "metrics": {"test_set_loss": 0.091}}.
# Only top-level keys are merged. The following statement will replace the
# previous value of the "metrics" key.
model.add_metadata({"metrics": {"test_set_acc": 0.97}})
# Result: {"key": "value", "metrics": {"test_set_acc": 0.97}}.
model.remove_metadata(["key"])
# Result: {"metrics": {"test_set_acc": 0.97}}.
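The top-level merge semantics shown above can be reproduced with plain dictionaries. The following sketch is an illustration only, not Determined's implementation; the functions add_metadata, remove_metadata, and merge_nested are hypothetical local helpers. It also shows one way to preserve nested values: merge the nested dictionary yourself before writing it back.

```python
def add_metadata(current, new):
    """Return a copy of current with new's top-level keys merged in (shallow)."""
    merged = dict(current)
    merged.update(new)  # a nested dictionary is replaced as a whole, not merged
    return merged


def remove_metadata(current, keys):
    """Return a copy of current without the listed top-level keys."""
    return {k: v for k, v in current.items() if k not in keys}


def merge_nested(current, key, new_values):
    """Build an update for one top-level key that keeps its existing nested entries."""
    return {key: {**current.get(key, {}), **new_values}}


metadata = add_metadata({}, {"key": "value"})
metadata = add_metadata(metadata, {"metrics": {"test_set_loss": 0.091}})

# Shallow merge: the previous "metrics" value is replaced.
replaced = add_metadata(metadata, {"metrics": {"test_set_acc": 0.97}})

# Merging the nested dictionary first keeps both metrics.
kept = add_metadata(metadata, merge_nested(metadata, "metrics", {"test_set_acc": 0.97}))

removed = remove_metadata(replaced, ["key"])
```

With the real API, the same idea would amount to building the combined "metrics" dictionary on the client and passing it to a single add_metadata() call.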
Managing Model Versions
Once a model has been added to the registry, you can add one or more checkpoints to it. These registered checkpoints are known as model versions. Version numbers are assigned by the registry; version numbers start at 1 and increment each time a new model version is registered.
The following JSON document shows an example model with a single registered version.
{
  "mnist_cnn": {
    "description": "a character recognition model",
    "metadata": {
      "dataset_url": "http://yann.lecun.com/exdb/mnist/",
      "git_url": "http://github.com/user/repo"
    },
    "versions": [
      {
        "version_number": 1,
        "checkpoint": {
          "uuid": "6a24d772-f1f7-4655-9061-22d582afd96c",
          "experiment_config": { "...": "..." },
          "experimentId": 1,
          "trialId": 1,
          "hparams": { "...": "..." },
          "batchNumber": 100,
          "resources": { "...": "..." },
          "metadata": {},
          "framework": "tensorflow-1.14.0",
          "format": "h5",
          "metrics": { "...": "..." }
        }
      }
    ]
  }
}
Creating Versions
The following snippet registers a new version of a model. register_version() returns an updated Checkpoint object representing the new model version.
from determined.experimental import Determined
d = Determined()
# exp_id is the ID of an existing experiment; replace it with your own.
checkpoint = d.get_experiment(exp_id).top_checkpoint()
model = d.get_model("model_name")
model_version = model.register_version(checkpoint.uuid)
Similarly, a new model version can be registered using the CLI as follows:
det model register <model_name> <checkpoint_uuid>
Accessing Versions
The example below demonstrates how to retrieve versions of a model from the registry. If no version number is specified, the most recent version of the model is returned. get_version() returns an instance of Checkpoint; as shown in the example, this makes it easy to perform common operations like downloading the checkpoint to local storage or loading the trained model into memory.
from determined.experimental import Determined
model = Determined().get_model("model_name")
specific_version = model.get_version(3)
latest_version = model.get_version()
# Depending on the framework used to create the checkpoint, calling
# load() on a model version (checkpoint) will return either a TensorFlow or
# PyTorch object representing the trained model.
tf_or_torch_model = latest_version.load()
The following example lists all the versions of a model. By default, model versions are returned in descending order such that the most recent versions are returned first.
from determined.experimental import Determined
model = Determined().get_model("model_name")
model_versions = model.get_versions()
The CLI equivalent is as follows:
det model list-versions <model_name>
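The versioning rules described in this section (numbers start at 1, increment on each registration, the latest version is returned when no number is given, and listings are newest first) can be summarized with a small in-memory sketch. ToyModel is a hypothetical stand-in written for illustration; it is not Determined's Model class and stores plain tuples rather than Checkpoint objects.

```python
class ToyModel:
    """In-memory stand-in for a registry model (not Determined's Model class)."""

    def __init__(self, name):
        self.name = name
        self._versions = []  # (version_number, checkpoint_uuid), oldest first

    def register_version(self, checkpoint_uuid):
        """Assign the next version number; numbering starts at 1."""
        version_number = len(self._versions) + 1
        self._versions.append((version_number, checkpoint_uuid))
        return version_number

    def get_version(self, version_number=None):
        """Return one version; with no number, return the most recent one."""
        if not self._versions:
            return None
        if version_number is None:
            return self._versions[-1]
        for entry in self._versions:
            if entry[0] == version_number:
                return entry
        return None

    def get_versions(self):
        """Return all versions, most recent first."""
        return list(reversed(self._versions))


model = ToyModel("mnist_cnn")
for uuid in ["ckpt-a", "ckpt-b", "ckpt-c"]:  # hypothetical checkpoint UUIDs
    model.register_version(uuid)

latest = model.get_version()
second = model.get_version(2)
numbers = [number for number, _ in model.get_versions()]
```

Here latest refers to version 3, second to version 2, and numbers lists the versions from newest to oldest.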
Next Steps
determined.experimental: The reference documentation for this API.