Report Metrics

The Core API makes it easy to report training and validation metrics to the master during training with only a few new lines of code.

  1. For this example, create a new training script called 1_metrics.py by copying the 0_start.py script from Getting Started.

  2. Begin by importing the determined module:

    import determined as det
    
    
  3. Enable logging, using det.LOG_FORMAT as the log format. This turns on useful log messages from the determined library, and det.LOG_FORMAT lets the WebUI filter log lines by level.

        logging.basicConfig(level=logging.DEBUG, format=det.LOG_FORMAT)
        # Log at different levels to demonstrate filter-by-level in the WebUI.
        logging.debug("debug-level message")
        logging.info("info-level message")
        logging.warning("warning-level message")
        logging.error("error-level message")
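    Filter-by-level works because each formatted log line carries its level name. The following standalone sketch demonstrates the same mechanism with the standard logging module; the LOG_FORMAT string here is a simplified stand-in, not the actual det.LOG_FORMAT value:

    ```python
    import io
    import logging

    # Stand-in for det.LOG_FORMAT (hypothetical): embedding the level name in
    # each line is what makes filtering by level possible downstream.
    LOG_FORMAT = "%(levelname)s: %(name)s: %(message)s"

    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))

    logger = logging.getLogger("demo")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)  # messages below INFO are dropped

    logger.debug("debug-level message")  # filtered out by the level setting
    logger.info("info-level message")    # kept, prefixed with "INFO:"

    output = stream.getvalue()
    ```

    Only the info-level line survives, and its "INFO:" prefix is what a viewer such as the WebUI can filter on.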
    
  4. In your if __name__ == "__main__" block, wrap the entire execution of main() within the scope of determined.core.init(), which prepares resources for training and cleans them up afterward. Add the core_context as a new argument to main() because the Core API is accessed through the core_context object.

        with det.core.init() as core_context:
            main(core_context=core_context, increment_by=1)
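    Because det.core.init() is a context manager, resources are prepared on entry and cleaned up on exit even if main() raises. The same pattern can be illustrated with a stand-in context manager (fake_init and events are hypothetical names, not part of the Determined API):

    ```python
    from contextlib import contextmanager

    events = []

    @contextmanager
    def fake_init():
        # Stands in for det.core.init(): set up, yield a context object,
        # and always clean up afterward.
        events.append("setup")
        try:
            yield "core_context"
        finally:
            events.append("cleanup")

    def main(core_context, increment_by):
        events.append(f"ran with {core_context}")

    with fake_init() as core_context:
        main(core_context=core_context, increment_by=1)
    ```

    After the with block exits, events shows the setup/run/cleanup ordering that the real det.core.init() guarantees.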
    
  5. Within main(), add two calls: one to report training metrics, made periodically during training, and one to report validation metrics, made every time a validation runs.

    def main(core_context, increment_by):
        x = 0
        for batch in range(100):
            x += increment_by
            steps_completed = batch + 1
            time.sleep(.1)
            logging.info(f"x is now {x}")
            # NEW: report training metrics.
            if steps_completed % 10 == 0:
                core_context.train.report_training_metrics(
                    steps_completed=steps_completed, metrics={"x": x}
                )
        # NEW: report a "validation" metric at the end.
        core_context.train.report_validation_metrics(
            steps_completed=steps_completed, metrics={"x": x}
        )
    

    The report_validation_metrics() call typically happens after the validation step; however, actual validation is not demonstrated in this example.
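    You can sanity-check the reporting cadence without a cluster by running main() against a small test double that records the calls core_context.train would receive. The _FakeTrain and _FakeContext names below are hypothetical test helpers, and the sleep and logging lines are omitted for brevity:

    ```python
    class _FakeTrain:
        """Test double recording what would be reported to the master."""

        def __init__(self):
            self.training = []
            self.validation = []

        def report_training_metrics(self, steps_completed, metrics):
            self.training.append((steps_completed, metrics))

        def report_validation_metrics(self, steps_completed, metrics):
            self.validation.append((steps_completed, metrics))

    class _FakeContext:
        def __init__(self):
            self.train = _FakeTrain()

    def main(core_context, increment_by):
        x = 0
        for batch in range(100):
            x += increment_by
            steps_completed = batch + 1
            # Report training metrics every 10 steps.
            if steps_completed % 10 == 0:
                core_context.train.report_training_metrics(
                    steps_completed=steps_completed, metrics={"x": x}
                )
        # Report a "validation" metric once at the end.
        core_context.train.report_validation_metrics(
            steps_completed=steps_completed, metrics={"x": x}
        )

    ctx = _FakeContext()
    main(ctx, increment_by=1)
    ```

    With increment_by=1, the double records ten training reports (at steps 10, 20, ..., 100) and a single validation report at step 100.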

  6. Create a 1_metrics.yaml file with an entrypoint invoking the new 1_metrics.py file. You can copy the 0_start.yaml configuration file and change the first couple of lines:

    name: core-api-stage-1
    entrypoint: python3 1_metrics.py
    
  7. Run the code using the command:

    det e create 1_metrics.yaml . -f
    
  8. You can now navigate to the new experiment in the WebUI and view the plot populated with training and validation metrics.

The complete 1_metrics.py and 1_metrics.yaml listings used in this example can be found in the core_api.tgz download or in the GitHub repository.