Report Metrics

The Core API makes it easy to report training and validation metrics to the master during training with only a few new lines of code.

  1. For this example, create a new training script by copying the script from Getting Started.

  2. Begin by importing the determined module:

    import determined as det
  3. Enable logging, using det.LOG_FORMAT as the log format. This enables useful log messages from the determined library, and det.LOG_FORMAT enables filter-by-level in the WebUI.

        logging.basicConfig(level=logging.DEBUG, format=det.LOG_FORMAT)
        # Log at different levels to demonstrate filter-by-level in the WebUI.
        logging.debug("debug-level message")
        logging.info("info-level message")
        logging.warning("warning-level message")
        logging.error("error-level message")
  4. In your if __name__ == "__main__" block, wrap the entire execution of main() within the scope of determined.core.init(), which prepares resources for training and cleans them up afterward. Add the core_context as a new argument to main() because the Core API is accessed through the core_context object.

        with det.core.init() as core_context:
            main(core_context=core_context, increment_by=1)
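  5. Within main(), add two calls: (1) core_context.train.report_training_metrics(), which reports training metrics periodically during training, and (2) core_context.train.report_validation_metrics(), which reports validation metrics every time validation runs.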

    def main(core_context, increment_by):
        x = 0
        for batch in range(100):
            x += increment_by
            steps_completed = batch + 1
            logging.info(f"x is now {x}")
            # NEW: report training metrics.
            if steps_completed % 10 == 0:
                core_context.train.report_training_metrics(
                    steps_completed=steps_completed, metrics={"x": x}
                )
        # NEW: report a "validation" metric at the end.
        core_context.train.report_validation_metrics(
            steps_completed=steps_completed, metrics={"x": x}
        )

    The report_validation_metrics() call typically happens after the validation step; however, this example does not demonstrate actual validation.

  6. Create a 1_metrics.yaml file with an entrypoint invoking the new file. You can copy the 0_start.yaml configuration file and change the first couple of lines:

    name: core-api-stage-1
    entrypoint: python3
  7. Run the code using the command:

    det e create 1_metrics.yaml . -f
  8. You can now navigate to the new experiment in the WebUI and view the plot populated with the training and validation metrics.
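Before submitting an experiment, the reporting cadence from steps 4 and 5 can be checked locally. The sketch below is not the Determined API: fake_core_init(), RecordingCoreContext, and RecordingTrainContext are hypothetical stand-ins that only record the calls main() makes, whereas the real det.core.init() talks to the master. With increment_by=1, the loop should record training metrics at every tenth step and a single validation metric at step 100.

```python
import contextlib
import logging
from dataclasses import dataclass, field


# Hypothetical stand-ins (NOT part of the determined library): they only
# record the reporting calls so the cadence can be inspected without a cluster.
@dataclass
class RecordingTrainContext:
    training: list = field(default_factory=list)
    validation: list = field(default_factory=list)

    def report_training_metrics(self, steps_completed, metrics):
        self.training.append((steps_completed, dict(metrics)))

    def report_validation_metrics(self, steps_completed, metrics):
        self.validation.append((steps_completed, dict(metrics)))


@dataclass
class RecordingCoreContext:
    train: RecordingTrainContext = field(default_factory=RecordingTrainContext)


@contextlib.contextmanager
def fake_core_init():
    """Hypothetical substitute for det.core.init(): set up, yield, always clean up."""
    context = RecordingCoreContext()
    try:
        yield context
    finally:
        # Runs even if main() raises, like the cleanup step 4 relies on.
        logging.info("cleaning up")


def main(core_context, increment_by):
    # Same loop as step 5.
    x = 0
    for batch in range(100):
        x += increment_by
        steps_completed = batch + 1
        logging.info(f"x is now {x}")
        if steps_completed % 10 == 0:
            core_context.train.report_training_metrics(
                steps_completed=steps_completed, metrics={"x": x}
            )
    core_context.train.report_validation_metrics(
        steps_completed=steps_completed, metrics={"x": x}
    )


with fake_core_init() as core_context:
    main(core_context=core_context, increment_by=1)

print([step for step, _ in core_context.train.training])
# [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(core_context.train.validation)
# [(100, {'x': 100})]
```

Replacing fake_core_init() with det.core.init() should restore the real reporting path; the try/finally in the stub is only meant to illustrate why step 4 wraps main() in a with block.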

The complete training script and 1_metrics.yaml listings used in this example can be found in the core_api.tgz download or in the GitHub repository.
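The logging setup in step 3 depends on each record carrying its level name, which is what makes filtering by level possible. As a rough local illustration using only Python's standard library (not the WebUI itself), the sketch below logs the same four messages at two thresholds. ASSUMED_LOG_FORMAT is a hypothetical stand-in; the actual value of det.LOG_FORMAT is defined by the determined library.

```python
import io
import logging

# Hypothetical stand-in for det.LOG_FORMAT; it only guarantees the level name
# appears in each line.
ASSUMED_LOG_FORMAT = "%(levelname)s: %(message)s"


def emit_demo_messages(level):
    """Log one message per level and return the lines that pass `level`."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(ASSUMED_LOG_FORMAT))

    logger = logging.getLogger(f"demo-{level}")
    logger.setLevel(level)
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo out of the root logger

    logger.debug("debug-level message")
    logger.info("info-level message")
    logger.warning("warning-level message")
    logger.error("error-level message")

    logger.removeHandler(handler)
    return stream.getvalue().splitlines()


print(emit_demo_messages(logging.DEBUG))
# ['DEBUG: debug-level message', 'INFO: info-level message',
#  'WARNING: warning-level message', 'ERROR: error-level message']
print(emit_demo_messages(logging.WARNING))
# ['WARNING: warning-level message', 'ERROR: error-level message']
```

Using level=logging.DEBUG, as step 3 does, lets every message through, so any narrowing by level can happen later when viewing the logs.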