Organizing Models in the Model Registry

Determined includes built-in support for a model registry, which makes it easy to organize trained models and their respective versions. Common use cases for the model registry include:

  • Grouping related checkpoints together, including checkpoints across experiments.

  • Storing metadata about a model that is specific to your problem or organization. Examples include references to production systems, dataset links, Git links, and metrics calculated outside of Determined.

  • Retrieving the latest version of a model for downstream tasks like serving or batch inference.

The model registry contains a set of models. Each model has a unique name and zero or more model versions. A model version consists of a version number and a checkpoint, which represents the state of a trained model.

The model registry is designed to be flexible, and the best way to use it depends on your organization’s requirements and workflow. For example, one approach is to define a model for each high-level task you want to solve with machine learning (e.g., “object-detection” or “sentiment-analysis”). Each version of such a model would then correspond to a new approach to solving that task. Different versions of a model might come from different experiments, use different network architectures, or even use different deep learning frameworks. Another approach is to register a model named after a specific architecture, such as “FasterRCNN”, and ensure that every version of the model uses that network architecture.

Managing Models

A model has a unique name, an optional description, user-defined metadata, and zero or more model versions. A model’s metadata can contain arbitrary information about the model. The following is an example JSON representation of a model for illustration.

{
  "mnist_cnn": {
    "description": "a character recognition model",
    "metadata": {
      "dataset_url": "http://yann.lecun.com/exdb/mnist/",
      "git_url": "http://github.com/user/repo"
    },
    "versions": []
  }
}

Registering Models

A model can be added to the registry via the Python API, REST API, or CLI. This guide covers the Python and CLI methods. For information on the REST API, see the Swagger API documentation.

The following example demonstrates how to add a new model to the registry; create_model() returns an instance of the Model class. The new model will not have any versions (model checkpoints) associated with it; adding versions to a model is described below.

from determined.experimental import Determined

model = Determined().create_model(
    "model_name",
    description="optional description",
    metadata={"optional": "JSON serializable dictionary"},
)

Similarly, you can create a model from the CLI using the following command.

det model create <model_name>

Querying Models

The following example returns the models registered in Determined as a list of Model objects. Models can be sorted by name, description, creation time, or last updated time. Through the Python API, models can also be filtered by name or description. For the available sorting and ordering options, see ModelSortBy and ModelOrderBy, respectively.

from determined.experimental import Determined, ModelSortBy

d = Determined()

all_models = d.get_models()

chronological_sort = d.get_models(sort_by=ModelSortBy.CREATION_TIME)

# Find all models with "mnist" in their name. Some possible model names
# are "mnist_pytorch", "mnist_cnn", "mnist", etc.
mnist_models = d.get_models(name="mnist")

# Find all models whose description contains "ocr".
ocr_models = d.get_models(description="ocr")

Similarly, you can list models from the CLI using the following command.

det model list --sort-by={name,description,creation_time,last_updated_time} --order-by={asc,desc}

The following snippet queries for a single model by name.

from determined.experimental import Determined

model = Determined().get_model("model_name")

The CLI equivalent is below. By default, the describe command also prints information about the latest version of the model.

det model describe <model_name>

Modifying Model Metadata

Currently, model metadata can only be edited via the Python API. The following example demonstrates how to use this API.

from determined.experimental import Determined

model = Determined().get_model("model_name")

# Metadata is merged with existing metadata.
model.add_metadata({"key": "value"})
model.add_metadata({"metrics": {"test_set_loss": 0.091}})

# Result: {"key": "value", "metrics": {"test_set_loss": 0.091}}.

# Only top-level keys are merged. The following statement will replace the
# previous value of the "metrics" key.
model.add_metadata({"metrics": {"test_set_acc": 0.97}})

# Result: {"key": "value", "metrics": {"test_set_acc": 0.97}}.

model.remove_metadata(["key"])

# Result: {"metrics": {"test_set_acc": 0.97}}.
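The merge behavior described in the comments above corresponds to a shallow dictionary update. The following sketch reproduces it with plain Python dictionaries. It does not call Determined, and the functions add_metadata and remove_metadata defined here are local stand-ins for illustration, not the library's implementation.

```python
from typing import Any, Dict, List


def add_metadata(current: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Shallow merge: top-level keys in `new` replace those in `current`."""
    merged = dict(current)
    merged.update(new)
    return merged


def remove_metadata(current: Dict[str, Any], keys: List[str]) -> Dict[str, Any]:
    """Return a copy of `current` without the listed top-level keys."""
    return {k: v for k, v in current.items() if k not in keys}


metadata: Dict[str, Any] = {}
metadata = add_metadata(metadata, {"key": "value"})
metadata = add_metadata(metadata, {"metrics": {"test_set_loss": 0.091}})
# "metrics" is replaced as a whole, so "test_set_loss" is dropped.
metadata = add_metadata(metadata, {"metrics": {"test_set_acc": 0.97}})
metadata = remove_metadata(metadata, ["key"])
print(metadata)
```

If you need to keep existing nested values, one option is to read the current value, update it locally, and pass the combined dictionary under the same top-level key.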

Managing Model Versions

Once a model has been added to the registry, you can add one or more checkpoints to it. These registered checkpoints are known as model versions. Version numbers are assigned by the registry; version numbers start at 1 and increment each time a new model version is registered.
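As a rough mental model of the numbering rule above, the following self-contained sketch keeps versions in a list and assigns the next number on registration. The classes RegistryModel and ModelVersion are hypothetical and exist only for this illustration; they are not part of Determined's API, and the second checkpoint UUID is a made-up placeholder.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ModelVersion:
    version_number: int
    checkpoint_uuid: str


@dataclass
class RegistryModel:
    name: str
    description: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)
    versions: List[ModelVersion] = field(default_factory=list)

    def register_version(self, checkpoint_uuid: str) -> ModelVersion:
        # Version numbers start at 1 and increase by one per registration.
        version = ModelVersion(len(self.versions) + 1, checkpoint_uuid)
        self.versions.append(version)
        return version

    def latest_version(self) -> Optional[ModelVersion]:
        # A model may have zero versions, in which case there is no latest one.
        return self.versions[-1] if self.versions else None


model = RegistryModel("mnist_cnn", description="a character recognition model")
first = model.register_version("6a24d772-f1f7-4655-9061-22d582afd96c")
second = model.register_version("hypothetical-checkpoint-uuid")
print(first.version_number, second.version_number)
```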

The following JSON document shows an example model with a single registered version.

{
  "mnist_cnn": {
    "description": "a character recognition model",
    "metadata": {
      "dataset_url": "http://yann.lecun.com/exdb/mnist/",
      "git_url": "http://github.com/user/repo"
    },
    "versions": [
      {
        "version_number": 1,
        "checkpoint": {
          "uuid": "6a24d772-f1f7-4655-9061-22d582afd96c",
          "experiment_config": { "...": "..." },
          "experimentId": 1,
          "trialId": 1,
          "hparams": { "...": "..." },
          "batchNumber": 100,
          "resources": { "...": "..." },
          "metadata": {},
          "framework": "tensorflow-1.14.0",
          "format": "h5",
          "metrics": { "...": "..." }
        }
      }
    ]
  }
}
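Given a model serialized in the illustrative shape above, selecting the newest version amounts to taking the entry with the highest version_number. The sketch below assumes that exact layout, which is shown for illustration and may not match what the REST API returns. The second version and its UUID are invented for the example, and latest_checkpoint_uuid is a hypothetical helper.

```python
import json

document = json.loads("""
{
  "mnist_cnn": {
    "description": "a character recognition model",
    "metadata": {},
    "versions": [
      {"version_number": 1,
       "checkpoint": {"uuid": "6a24d772-f1f7-4655-9061-22d582afd96c"}},
      {"version_number": 2,
       "checkpoint": {"uuid": "hypothetical-second-uuid"}}
    ]
  }
}
""")


def latest_checkpoint_uuid(doc, model_name):
    """Return the checkpoint UUID of the highest-numbered version, or None."""
    versions = doc[model_name]["versions"]
    if not versions:
        return None
    newest = max(versions, key=lambda v: v["version_number"])
    return newest["checkpoint"]["uuid"]


print(latest_checkpoint_uuid(document, "mnist_cnn"))
```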

Creating Versions

The following snippet registers a new version of a model. register_version() returns an updated Checkpoint object representing the new model version.

from determined.experimental import Determined

d = Determined()

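# exp_id is a placeholder for the ID of the experiment whose best
# checkpoint you want to register.
checkpoint = d.get_experiment(exp_id).top_checkpoint()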

model = d.get_model("model_name")

model_version = model.register_version(checkpoint.uuid)

Similarly, a new model version can be registered using the CLI as follows:

det model register-version <model_name> <checkpoint_uuid>

Accessing Versions

The example below demonstrates how to retrieve versions of a model from the registry. If no version number is specified, the most recent version of the model is returned. get_version() returns an instance of Checkpoint; as shown in the example, this makes it easy to perform common operations like downloading the checkpoint to local storage or loading the trained model into memory.

from determined.experimental import Determined

model = Determined().get_model("model_name")

specific_version = model.get_version(3)
latest_version = model.get_version()

# Depending on the framework used to create the checkpoint, calling
# load() on a model version (checkpoint) will return either a TensorFlow or
# PyTorch object representing the trained model.
tf_or_torch_model = latest_version.load()

The following example lists all the versions of a model. By default, model versions are returned in descending order such that the most recent versions are returned first.

from determined.experimental import Determined

model = Determined().get_model("model_name")

model_versions = model.get_versions()

The CLI equivalent is as follows:

det model list-versions <model_name>

Next Steps