Report Metrics
The Core API makes it easy to report training and validation metrics to the master during training with only a few new lines of code.
For this example, create a new training script called `1_metrics.py` by copying the `0_start.py` script from Getting Started. Begin by importing the `determined` module:

```python
import determined as det
```
Enable `logging`, using `det.LOG_FORMAT` as the log format. This enables useful log messages from the `determined` library, and `det.LOG_FORMAT` enables filter-by-level in the WebUI.

```python
logging.basicConfig(level=logging.DEBUG, format=det.LOG_FORMAT)

# Log at different levels to demonstrate filter-by-level in the WebUI.
logging.debug("debug-level message")
logging.info("info-level message")
logging.warning("warning-level message")
logging.error("error-level message")
```
In your `if __name__ == "__main__"` block, wrap the entire execution of `main()` within the scope of `determined.core.init()`, which prepares resources for training and cleans them up afterward. Add `core_context` as a new argument to `main()` because the Core API is accessed through the `core_context` object.

```python
if __name__ == "__main__":
    with det.core.init() as core_context:
        main(core_context=core_context, increment_by=1)
```
Within `main()`, add two calls: (1) report training metrics periodically during training and (2) report validation metrics every time a validation runs.

```python
def main(core_context, increment_by):
    x = 0
    for batch in range(100):
        x += increment_by
        steps_completed = batch + 1
        time.sleep(0.1)
        logging.info(f"x is now {x}")
        # NEW: report training metrics.
        if steps_completed % 10 == 0:
            core_context.train.report_training_metrics(
                steps_completed=steps_completed, metrics={"x": x}
            )
    # NEW: report a "validation" metric at the end.
    core_context.train.report_validation_metrics(
        steps_completed=steps_completed, metrics={"x": x}
    )
```
The `report_validation_metrics()` call typically happens after the validation step; however, actual validation is not demonstrated in this example.

Create a `1_metrics.yaml` file with an `entrypoint` invoking the new `1_metrics.py` file. You can copy the `0_start.yaml` configuration file and change the first couple of lines:

```yaml
name: core-api-stage-1
entrypoint: python3 1_metrics.py
```
Run the code using the command:

```shell
det e create 1_metrics.yaml . -f
```
You can now navigate to the new experiment in the WebUI and view the plot populated with the training and validation metrics.
The complete `1_metrics.py` and `1_metrics.yaml` listings used in this example can be found in the `core_api.tgz` download or in the GitHub repository.