Custom Searcher Reference
determined.searcher.LocalSearchRunner
- class determined.searcher.LocalSearchRunner(search_method: determined.searcher._search_method.SearchMethod, searcher_dir: Optional[pathlib.Path] = None, session: Optional[determined.common.api._session.Session] = None)
LocalSearchRunner performs a search for optimal hyperparameter values, applying the provided SearchMethod. It is executed locally and interacts with a Determined cluster, where it starts a multi-trial experiment. It then reacts to event notifications coming from the running experiments by forwarding them to event handler methods in your SearchMethod implementation and sending the returned operations back to the experiment.
- run(exp_config: Union[Dict[str, Any], str], model_dir: Optional[str] = None, includes: Optional[Iterable[Union[str, pathlib.Path]]] = None) -> int
Run custom search.
- Parameters
exp_config (dictionary, string) – experiment config filename (.yaml) or a dict.
model_dir (string) – directory containing model definition.
includes (Iterable[Union[str, pathlib.Path]], optional) – Additional files or directories to include in the model definition. (default: None)
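As a rough sketch of how the constructor and run() might be used together, the helper below builds an experiment config as a dict and hands it to LocalSearchRunner.run(). The config keys (name, entrypoint, and the searcher fields name, metric, smaller_is_better, unit), the entrypoint value, the "searcher_state" directory, and the helper names are illustrative assumptions rather than part of this reference, so check them against the experiment configuration reference. The determined import is deferred so that the config-building part runs without a cluster.

```python
import pathlib
from typing import Any, Dict


def build_exp_config(name: str, metric: str) -> Dict[str, Any]:
    """Build a minimal experiment config dict (all keys are assumptions)."""
    return {
        "name": name,
        "entrypoint": "model_def:MyTrial",  # hypothetical entrypoint
        "searcher": {
            "name": "custom",
            "metric": metric,
            "smaller_is_better": True,
            "unit": "batches",
        },
    }


def run_local_search(search_method: Any, model_dir: str) -> int:
    """Start the search locally; needs a reachable Determined cluster."""
    from determined import searcher  # deferred: only needed when actually running

    runner = searcher.LocalSearchRunner(
        search_method,
        searcher_dir=pathlib.Path("searcher_state"),  # hypothetical directory
    )
    exp_config = build_exp_config("custom-search", "validation_loss")
    # run() accepts a dict or a .yaml filename and is documented to return an int.
    return runner.run(exp_config, model_dir=model_dir)


# Building the config does not require a cluster.
config = build_exp_config("custom-search", "validation_loss")
```

Passing a dict keeps the config next to the search code; passing a .yaml filename instead may be more convenient when the same config is also used from the command line.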
determined.searcher.RemoteSearchRunner
- class determined.searcher.RemoteSearchRunner(search_method: determined.searcher._search_method.SearchMethod, context: determined.core._context.Context)
RemoteSearchRunner performs a search for optimal hyperparameter values, applying the provided SearchMethod (you will subclass SearchMethod and provide an instance of the derived class). RemoteSearchRunner executes on-cluster: it runs a meta-experiment using Core API.
- run(exp_config: Union[Dict[str, Any], str], model_dir: Optional[str] = None, includes: Optional[Iterable[Union[str, pathlib.Path]]] = None) -> int
Run custom search as a Core API experiment (on-cluster).
- Parameters
exp_config (dictionary, string) – experiment config filename (.yaml) or a dict.
model_dir (string) – directory containing model definition.
includes (Iterable[Union[str, pathlib.Path]], optional) – Additional files or directories to include in the model definition. (default: None)
determined.searcher.SearchMethod
- class determined.searcher.SearchMethod
The implementation of a custom hyperparameter tuning algorithm.
To implement your specific hyperparameter tuning approach, subclass SearchMethod, overriding the event handler methods.
Each event handler, except progress(), returns a list of operations (List[Operation]) that will be submitted to the master for processing.
Currently, we support the following Operation types:
Create - starts a new trial with a unique trial id and a set of hyperparameter values.
ValidateAfter - sets the number of steps (i.e., batches or epochs) after which a validation is run for a trial with a given id.
Progress - updates the progress of the multi-trial experiment to the master.
Close - closes a trial with a given id.
Shutdown - closes the experiment.
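To make the order of these operations concrete, here is a minimal sketch that lists what a one-trial search would emit over its lifetime. The dataclasses are stand-ins that only mirror the constructor arguments documented in this reference, so the sketch runs without the determined package; single_trial_lifecycle is a hypothetical helper, and the Progress operation is left out for brevity.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Dict, List, Union


# Stand-ins mirroring the documented constructor arguments (not the real classes).
@dataclass
class Create:
    request_id: uuid.UUID
    hparams: Dict[str, Any]


@dataclass
class ValidateAfter:
    request_id: uuid.UUID
    length: int


@dataclass
class Close:
    request_id: uuid.UUID


@dataclass
class Shutdown:
    cancel: bool = False
    failure: bool = False


Operation = Union[Create, ValidateAfter, Close, Shutdown]


def single_trial_lifecycle(hparams: Dict[str, Any], max_length: int) -> List[Operation]:
    """List, in order, the operations a one-trial search would emit."""
    request_id = uuid.uuid4()
    # Emitted from initial_operations().
    ops: List[Operation] = [Create(request_id, hparams)]
    # First from on_trial_created(), then from on_validation_completed().
    for length in range(1, max_length + 1):
        ops.append(ValidateAfter(request_id, length))
    # Emitted from on_validation_completed() once max_length is reached.
    ops.append(Close(request_id))
    # Emitted from on_trial_closed() when no more trials are wanted.
    ops.append(Shutdown())
    return ops


trace = single_trial_lifecycle({"lr": 0.01}, max_length=3)
names = [type(op).__name__ for op in trace]
```

In a real search each of these operations is returned from a different event handler at a different time; the flat list only illustrates the ordering.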
Note
Do not modify the searcher_state passed into event handlers.
- abstract initial_operations(searcher_state: determined.searcher._search_method.SearcherState) -> List[determined.searcher._search_method.Operation]
Returns a list of initial operations that the custom hyperparameter search should perform. This is called by the Custom Searcher SearchRunner to initialize the trials.
Example:
    def initial_operations(self, _: searcher.SearcherState) -> List[searcher.Operation]:
        ops: List[searcher.Operation] = []
        N = 100
        hparams = {
            # ...
        }
        for _ in range(0, N):
            create = searcher.Create(
                request_id=uuid.uuid4(),
                hparams=hparams,
                checkpoint=None,
            )
            ops.append(create)
        return ops
- Parameters
searcher_state (SearcherState) – Read-only current searcher state
- Returns
Initial list of Operation to start the hyperparameter search
- Return type
List[Operation]
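In practice each initial trial usually gets its own hyperparameter values. The sketch below shows one way to sample them, assuming a hypothetical search space (learning_rate, hidden_size) and using a stand-in Create dataclass with the documented arguments so it runs without the determined package.

```python
import random
import uuid
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Create:
    """Stand-in for searcher.Create with the documented arguments."""
    request_id: uuid.UUID
    hparams: Dict[str, Any]
    checkpoint: Optional[Any] = None


def sample_hparams(rng: random.Random) -> Dict[str, Any]:
    """Draw one combination from a hypothetical search space."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform between 1e-4 and 1e-1
        "hidden_size": rng.choice([64, 128, 256]),
    }


def initial_creates(num_trials: int, seed: int = 0) -> List[Create]:
    """Build one Create per trial, each with a fresh request ID."""
    rng = random.Random(seed)  # seeded so the sampled values are reproducible
    return [
        Create(request_id=uuid.uuid4(), hparams=sample_hparams(rng), checkpoint=None)
        for _ in range(num_trials)
    ]


ops = initial_creates(4)
```

Seeding the generator makes the sampled values reproducible across restarts, while uuid.uuid4() keeps each request ID unique.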
- load(path: pathlib.Path) -> Tuple[determined.searcher._search_method.SearcherState, int]
Loads searcher state and method-specific state.
- load_method_state(path: pathlib.Path) -> None
Loads method-specific search state.
- abstract on_trial_closed(searcher_state: determined.searcher._search_method.SearcherState, request_id: uuid.UUID) -> List[determined.searcher._search_method.Operation]
Informs the searcher that a trial has been closed as a result of a Close operation.
Example:
    def on_trial_closed(
        self, searcher_state: SearcherState, request_id: uuid.UUID
    ) -> List[Operation]:
        # trials_created and trials_closed are sets of request IDs, so compare their sizes.
        if len(searcher_state.trials_created) < self.max_num_trials:
            hparams = {
                # ...
            }
            return [
                searcher.Create(
                    request_id=uuid.uuid4(),
                    hparams=hparams,
                    checkpoint=None,
                )
            ]
        if len(searcher_state.trials_closed) >= self.max_num_trials:
            return [searcher.Shutdown()]
        return []
- Parameters
searcher_state (SearcherState) – Read-only current searcher state
request_id (uuid.UUID) – Request UUID of the Trial that was closed
- Returns
List of Operation to run after closing the given trial
- Return type
List[Operation]
- abstract on_trial_created(searcher_state: determined.searcher._search_method.SearcherState, request_id: uuid.UUID) -> List[determined.searcher._search_method.Operation]
Informs the searcher that a trial has been created as a result of a Create operation.
Example:
    def on_trial_created(
        self, _: SearcherState, request_id: uuid.UUID
    ) -> List[Operation]:
        return [
            searcher.ValidateAfter(
                request_id=request_id,
                length=1,  # Run for one unit of time (epoch, etc.)
            )
        ]
In this example, we are choosing to deterministically train for one unit of time.
- Parameters
searcher_state (SearcherState) – Read-only current searcher state
request_id (uuid.UUID) – Request UUID of the Trial that was created
- Returns
List of Operation to run upon creation of the given trial
- Return type
List[Operation]
- abstract on_trial_exited_early(searcher_state: determined.searcher._search_method.SearcherState, request_id: uuid.UUID, exited_reason: determined.searcher._search_method.ExitedReason) -> List[determined.searcher._search_method.Operation]
Informs the searcher that a trial has exited earlier than expected.
Example:
    def on_trial_exited_early(
        self,
        searcher_state: SearcherState,
        request_id: uuid.UUID,
        exited_reason: ExitedReason,
    ) -> List[Operation]:
        if exited_reason == searcher.ExitedReason.USER_CANCELED:
            return [searcher.Shutdown(cancel=True)]
        if exited_reason == searcher.ExitedReason.INVALID_HP:
            return [searcher.Shutdown(failure=True)]
        # failures is a set of request IDs, so compare its size.
        if len(searcher_state.failures) >= self.max_failures:
            return [searcher.Shutdown(failure=True)]
        return []
Note
The trial has already been internally closed when this callback is run. You do not need to explicitly issue a Close operation.
- Parameters
searcher_state (SearcherState) – Read-only current searcher state
request_id (uuid.UUID) – Request UUID of the Trial that exited early
exited_reason (ExitedReason) – The reason that the trial exited early
- Returns
List of Operation to run in response to the given trial exiting early
- Return type
List[Operation]
- abstract on_validation_completed(searcher_state: determined.searcher._search_method.SearcherState, request_id: uuid.UUID, metric: Any, train_length: int) -> List[determined.searcher._search_method.Operation]
Informs the searcher that the validation workload has completed after training for train_length units. It returns any new operations as a result of this workload completing.
Example:
    def on_validation_completed(
        self,
        searcher_state: SearcherState,
        request_id: uuid.UUID,
        metric: Any,
        train_length: int,
    ) -> List[Operation]:
        if train_length < self.max_train_length:
            return [
                searcher.ValidateAfter(
                    request_id=request_id,
                    length=train_length + 1,  # Run an additional unit of time
                )
            ]
        return [searcher.Close(request_id=request_id)]
- Parameters
searcher_state (SearcherState) – Read-only current searcher state
request_id (uuid.UUID) – Request UUID of the Trial that was trained
metric (Any) – Metric data returned by the trial
train_length (int) – The cumulative units of time that the trial has finished training for (epochs, etc.)
- Returns
List of Operation to run upon completion of training for the given trial
- Return type
List[Operation]
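A search method typically needs to remember the metrics it has seen, and that memory belongs in the SearchMethod itself rather than in SearcherState. The sketch below is one possible way to track the best trial so far. BestTrialTracker and metric_as_float are hypothetical helpers, and because metric is typed as Any, the handling of both a bare number and a dict of named metrics is an assumption.

```python
import uuid
from typing import Any, Optional


def metric_as_float(metric: Any, name: str) -> float:
    """Accept either a bare number or a dict of named metrics (an assumption)."""
    if isinstance(metric, dict):
        return float(metric[name])
    return float(metric)


class BestTrialTracker:
    """Method-specific state kept on the SearchMethod, not on SearcherState."""

    def __init__(self, smaller_is_better: bool = True) -> None:
        self.smaller_is_better = smaller_is_better
        self.best_request_id: Optional[uuid.UUID] = None
        self.best_value: Optional[float] = None

    def update(self, request_id: uuid.UUID, value: float) -> bool:
        """Record a validation result; return True if it is the best so far."""
        if self.best_value is None:
            improved = True
        elif self.smaller_is_better:
            improved = value < self.best_value
        else:
            improved = value > self.best_value
        if improved:
            self.best_request_id = request_id
            self.best_value = value
        return improved


tracker = BestTrialTracker(smaller_is_better=True)
trial_a, trial_b = uuid.uuid4(), uuid.uuid4()
first = tracker.update(trial_a, metric_as_float({"validation_loss": 0.5}, "validation_loss"))
second = tracker.update(trial_b, metric_as_float(0.7, "validation_loss"))
```

Inside on_validation_completed(), the return value of update() could then inform whether to continue a trial with ValidateAfter or to Close it.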
- abstract progress(searcher_state: determined.searcher._search_method.SearcherState) -> float
Returns experiment progress as a float between 0 and 1.
Example:
    def progress(self, searcher_state: SearcherState) -> float:
        # trials_closed is a set of request IDs, so use its size.
        return len(searcher_state.trials_closed) / float(self.max_num_trials)
- Parameters
searcher_state (SearcherState) – Read-only current searcher state
- Returns
Experiment progress as a float between 0 and 1.
- Return type
float
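Counting only closed trials makes progress jump in steps. A smoother estimate can also credit partially trained trials through trial_progress, as in this minimal sketch. The SearcherState dataclass here is a stand-in with the fields documented for determined.searcher.SearcherState, search_progress is a hypothetical helper, and whether failed trials should count toward progress is left as a design choice.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SearcherState:
    """Stand-in with the fields documented for determined.searcher.SearcherState."""
    failures: Set[uuid.UUID] = field(default_factory=set)
    trial_progress: Dict[uuid.UUID, float] = field(default_factory=dict)
    trials_closed: Set[uuid.UUID] = field(default_factory=set)
    trials_created: Set[uuid.UUID] = field(default_factory=set)


def search_progress(state: SearcherState, max_num_trials: int) -> float:
    """Closed trials count fully; trials still open count by their own progress."""
    if max_num_trials <= 0:
        return 1.0
    closed = len(state.trials_closed)
    partial = sum(
        p for rid, p in state.trial_progress.items() if rid not in state.trials_closed
    )
    # Clamp so the result stays within the documented 0 to 1 range.
    return min(1.0, (closed + partial) / max_num_trials)


t1, t2 = uuid.uuid4(), uuid.uuid4()
state = SearcherState(
    trials_created={t1, t2},
    trials_closed={t1},
    trial_progress={t1: 1.0, t2: 0.5},
)
value = search_progress(state, max_num_trials=4)  # (1 + 0.5) / 4
```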
- save(searcher_state: determined.searcher._search_method.SearcherState, path: pathlib.Path, *, experiment_id: int) -> None
Saves the searcher state and the search method state. It will be called by the SearchRunner after receiving operations from the SearchMethod.
- save_method_state(path: pathlib.Path) -> None
Saves method-specific state.
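A pair of save_method_state() and load_method_state() overrides could look like the sketch below. It assumes that path is a directory supplied by the runner, and the file name, the JSON format, and the JsonMethodState class with its fields are all illustrative choices rather than anything this reference prescribes.

```python
import json
import pathlib
import tempfile
from typing import Any, Dict


class JsonMethodState:
    """Holds method-specific state and persists it as JSON."""

    STATE_FILE = "method_state.json"  # hypothetical file name

    def __init__(self) -> None:
        self.method_state: Dict[str, Any] = {"trials_launched": 0, "best_loss": None}

    def save_method_state(self, path: pathlib.Path) -> None:
        # Assumes path is a directory; create it if it does not exist yet.
        path.mkdir(parents=True, exist_ok=True)
        with (path / self.STATE_FILE).open("w") as f:
            json.dump(self.method_state, f)

    def load_method_state(self, path: pathlib.Path) -> None:
        with (path / self.STATE_FILE).open("r") as f:
            self.method_state = json.load(f)


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    original = JsonMethodState()
    original.method_state.update(trials_launched=3, best_loss=0.25)
    original.save_method_state(pathlib.Path(tmp))

    restored = JsonMethodState()
    restored.load_method_state(pathlib.Path(tmp))
```

JSON keeps the saved state human-readable but only handles basic types; state that contains UUIDs or custom objects would need converting first, or a different serializer.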
determined.searcher.SearcherState
- class determined.searcher.SearcherState
Custom Searcher State.
Search runners maintain this state that can be used by a SearchMethod to inform event handling. In other words, this state can be taken into account when deciding which operations to return from your event handler. Do not modify SearcherState in your SearchMethod. If your hyperparameter tuning algorithm needs additional state variables, add those variables to your SearchMethod implementation.
- failures
set of failed trials
- Type
Set[uuid.UUID]
- trial_progress
progress of each trial as a number between 0.0 and 1.0
- Type
Dict[uuid.UUID, float]
- trials_closed
set of completed trials
- Type
Set[uuid.UUID]
- trials_created
set of created trials
- Type
Set[uuid.UUID]
determined.searcher.Operation
- class determined.searcher.Operation
Abstract base class for all Operations
determined.searcher.Close
- class determined.searcher.Close(request_id: uuid.UUID)
Operation for closing the specified trial
determined.searcher.Progress
- class determined.searcher.Progress(progress: float)
Operation for signalling the relative progress of the hyperparameter search between 0 and 1
determined.searcher.Create
- class determined.searcher.Create(request_id: uuid.UUID, hparams: Dict[str, Any], checkpoint: Optional[determined.common.experimental.checkpoint._checkpoint.Checkpoint])
Operation for creating a trial with a specified combination of hyperparameter values
determined.searcher.ValidateAfter
- class determined.searcher.ValidateAfter(request_id: uuid.UUID, length: int)
Operation signaling the trial to train until its total units trained equals the specified length, where the units (batches, epochs, etc.) are specified in the searcher section of the experiment configuration
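Because length is the total amount trained rather than an increment, a search method that thinks in "train N more units" steps has to accumulate them before building ValidateAfter operations. The hypothetical helper below sketches that conversion.

```python
from typing import List


def cumulative_lengths(increments: List[int]) -> List[int]:
    """Turn per-step training increments into the totals ValidateAfter expects."""
    totals: List[int] = []
    running = 0
    for inc in increments:
        if inc <= 0:
            raise ValueError("each increment must be positive")
        running += inc
        totals.append(running)
    return totals


# Train 100 units, then 100 more, then 200 more.
schedule = cumulative_lengths([100, 100, 200])
```

Each value in schedule would be passed as the length of successive ValidateAfter operations for the same request_id.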
determined.searcher.Shutdown
- class determined.searcher.Shutdown(cancel: bool = False, failure: bool = False)
Operation for shutting the experiment down
determined.searcher.ExitedReason
- class determined.searcher.ExitedReason(value)
The reason why a trial exited early
The following reasons are supported:
ERRORED: The Trial encountered an exception
USER_CANCELED: The Trial was manually closed by the user
INVALID_HP: The hyperparameters the trial was created with were invalid