Optimisers

SGD

class DLL.DeepLearning.Optimisers.SGD(learning_rate=0.001, momentum=0.9)[source]

Bases: BaseOptimiser

Stochastic gradient descent optimiser with momentum. A first-order method: it does not use second-derivative information (the Hessian matrix) and therefore requires little memory.

Parameters:
  • learning_rate (float, optional) – The learning rate of the optimiser. Must be positive. Defaults to 0.001.

  • momentum (float, optional) – Determines how long the previous gradients affect the current direction. Must be in range [0, 1). Defaults to 0.9.

initialise_parameters(model_parameters)[source]

Initialises the optimiser with the parameters that need to be optimised.

Parameters:

model_parameters (list[torch.Tensor]) – The parameters that will be optimised. Must be a list or a tuple of torch tensors.

update_parameters()[source]

Takes a step towards the optimum for each parameter.
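
A minimal usage sketch of the interface above (not taken from the library documentation). The toy quadratic objective, the parameter tensor w, and the assumption that the optimiser reads gradients from each registered tensor's .grad attribute and updates the tensors in place are all illustrative:

    import torch
    from DLL.DeepLearning.Optimisers import SGD

    # One parameter tensor to optimise; the toy objective sum(w ** 2) has its minimum at w = 0.
    w = torch.tensor([3.0, -2.0])

    optimiser = SGD(learning_rate=0.1, momentum=0.9)
    optimiser.initialise_parameters([w])   # register the parameters to optimise

    for _ in range(100):
        w.grad = 2 * w                     # gradient of sum(w ** 2), written by hand
                                           # (assumption: the optimiser reads tensor.grad)
        optimiser.update_parameters()      # take one step towards the optimum

    print(w)                               # w should now be much closer to zero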

ADAM

class DLL.DeepLearning.Optimisers.ADAM(learning_rate=0.001, beta1=0.9, beta2=0.999, weight_decay=0, amsgrad=False)[source]

Bases: BaseOptimiser

The adaptive moment estimation optimiser. Very robust and requires little tuning of its hyperparameters. A first-order method: it does not use second-derivative information (the Hessian matrix) and therefore requires little memory. Based on Algorithm 1 of this paper.

Parameters:
  • learning_rate (float, optional) – The learning rate of the optimiser. Must be positive. Defaults to 0.001.

  • beta1 (float, optional) – Determines how long the previous gradients affect the current step direction. Must be in range [0, 1). Defaults to 0.9.

  • beta2 (float, optional) – Determines how long the previous squared gradients affect the current step direction. Must be in range [0, 1). Defaults to 0.999.

  • weight_decay (float, optional) – The strength of the regularisation (weight decay) applied to the weights; 0 disables it. Must be in range [0, 1). Defaults to 0.

  • amsgrad (bool, optional) – If True, the AMSGrad variant of the algorithm is used. Defaults to False.

initialise_parameters(model_parameters)[source]

Initialises the optimiser with the parameters that need to be optimised.

Parameters:

model_parameters (list[torch.Tensor]) – The parameters that will be optimised. Must be a list or a tuple of torch tensors.

update_parameters()[source]

Takes a step towards the optimum for each parameter.
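
A hedged construction sketch (not from the library documentation) showing the hyperparameters listed above. The training loop itself follows the same initialise_parameters / update_parameters pattern as the SGD sketch, again assuming gradients are supplied through each tensor's .grad attribute:

    import torch
    from DLL.DeepLearning.Optimisers import ADAM

    w = torch.zeros(10)                    # illustrative parameter tensor

    # The defaults are usually a reasonable starting point; weight_decay and amsgrad are optional.
    optimiser = ADAM(learning_rate=0.001, beta1=0.9, beta2=0.999, weight_decay=0.01, amsgrad=True)
    optimiser.initialise_parameters([w])

    w.grad = torch.randn(10)               # placeholder gradient (assumed to be read from .grad)
    optimiser.update_parameters()          # one ADAM step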

ADAGRAD

class DLL.DeepLearning.Optimisers.ADAGRAD(learning_rate=0.001, lr_decay=0, weight_decay=0)[source]

Bases: BaseOptimiser

The adaptive gradient optimiser. A first-order method: it does not use second-derivative information (the Hessian matrix) and therefore requires little memory.

Parameters:
  • learning_rate (float, optional) – The learning rate of the optimiser. Must be positive. Defaults to 0.001.

  • lr_decay (float, optional) – Determines how fast the learning rate decays during training. Must be non-negative. Defaults to 0.

  • weight_decay (float, optional) – The strength of the regularisation (weight decay) applied to the weights; 0 disables it. Must be in range [0, 1). Defaults to 0.

initialise_parameters(model_parameters)[source]

Initialises the optimiser with the parameters that need to be optimised.

Parameters:

model_parameters (list[torch.Tensor]) – The parameters that will be optimised. Must be a list or a tuple of torch tensors.

update_parameters()[source]

Takes a step towards the optimum for each parameter.
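
A hedged construction sketch (not from the library documentation); the training loop is identical in shape to the SGD sketch above, with gradients assumed to be supplied through each tensor's .grad attribute:

    import torch
    from DLL.DeepLearning.Optimisers import ADAGRAD

    w = torch.zeros(5)                     # illustrative parameter tensor

    # lr_decay gradually reduces the effective learning rate as training progresses.
    optimiser = ADAGRAD(learning_rate=0.01, lr_decay=0.001, weight_decay=0)
    optimiser.initialise_parameters([w])

    w.grad = torch.randn(5)                # placeholder gradient (assumed to be read from .grad)
    optimiser.update_parameters()          # one ADAGRAD step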

ADADELTA

class DLL.DeepLearning.Optimisers.ADADELTA(learning_rate=0.001, rho=0.9, weight_decay=0)[source]

Bases: BaseOptimiser

The Adadelta optimiser. A first-order method: it does not use second-derivative information (the Hessian matrix) and therefore requires little memory.

Parameters:
  • learning_rate (float, optional) – The learning rate of the optimiser. Must be positive. Defaults to 0.001.

  • rho (float, optional) – Determines how long the previous gradients affect the current step direction. Must be in range [0, 1). Defaults to 0.9.

  • weight_decay (float, optional) – The strength of the regularisation (weight decay) applied to the weights; 0 disables it. Must be in range [0, 1). Defaults to 0.

initialise_parameters(model_parameters)[source]

Initialises the optimiser with the parameters that need to be optimised.

Parameters:

model_parameters (list[torch.Tensor]) – The parameters that will be optimised. Must be a list or a tuple of torch tensors.

update_parameters()[source]

Takes a step towards the optimum for each parameter.
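
A hedged construction sketch (not from the library documentation); the usage pattern is the same as for the optimisers above, and the gradient handling via .grad is an assumption:

    import torch
    from DLL.DeepLearning.Optimisers import ADADELTA

    w = torch.zeros(5)                     # illustrative parameter tensor

    # rho controls how quickly the running averages forget old gradient information.
    optimiser = ADADELTA(learning_rate=0.001, rho=0.9, weight_decay=0)
    optimiser.initialise_parameters([w])

    w.grad = torch.randn(5)                # placeholder gradient (assumed to be read from .grad)
    optimiser.update_parameters()          # one ADADELTA step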

RMSPROP

class DLL.DeepLearning.Optimisers.RMSPROP(learning_rate=0.001, alpha=0.99, momentum=0, weight_decay=0, centered=False)[source]

Bases: BaseOptimiser

The Root Mean Square Propagation optimiser. An improvement over the ADAGRAD optimiser that addresses its diminishing learning rate problem. A first-order method: it does not use second-derivative information (the Hessian matrix) and therefore requires little memory.

Parameters:
  • learning_rate (float, optional) – The learning rate of the optimiser. Must be positive. Defaults to 0.001.

  • alpha (float, optional) – The smoothing constant of the running average of squared gradients. Defaults to 0.99.

  • momentum (float, optional) – Determines how long the previous gradients affect the current direction. Must be in range [0, 1). Defaults to 0.

  • weight_decay (float, optional) – The strength of the regularisation (weight decay) applied to the weights; 0 disables it. Must be in range [0, 1). Defaults to 0.

  • centered (bool, optional) – If True, the centered variant of the algorithm is used, in which the gradient is normalised by an estimate of its variance. Defaults to False.

initialise_parameters(model_parameters)[source]

Initialises the optimiser with the parameters that need to be optimised.

Parameters:

model_parameters (list[torch.Tensor]) – The parameters that will be optimised. Must be a list or a tuple of torch tensors.

update_parameters()[source]

Takes a step towards the optimum for each parameter.
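
A hedged construction sketch (not from the library documentation) showing the optional momentum and centered behaviour; gradient handling through .grad is an assumption, as in the sketches above:

    import torch
    from DLL.DeepLearning.Optimisers import RMSPROP

    w = torch.zeros(5)                     # illustrative parameter tensor

    # centered=True additionally tracks the mean gradient; momentum adds inertia to the steps.
    optimiser = RMSPROP(learning_rate=0.001, alpha=0.99, momentum=0.9, weight_decay=0, centered=True)
    optimiser.initialise_parameters([w])

    w.grad = torch.randn(5)                # placeholder gradient (assumed to be read from .grad)
    optimiser.update_parameters()          # one RMSPROP step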

LBFGS

class DLL.DeepLearning.Optimisers.LBFGS(loss, learning_rate=0.001, history_size=10, maxiterls=20)[source]

Bases: BaseOptimiser

Limited-memory Broyden-Fletcher-Goldfarb-Shanno optimiser. A second-order (quasi-Newton) method that approximates the Hessian matrix from recent changes in position and gradient, and therefore requires more memory than first-order methods.

Parameters:
  • loss (Callable[[], float]) – The target function. For a deep learning model, one could use e.g. lambda: model.loss.loss(model.predict(x_train), y_train).

  • learning_rate (float, optional) – The learning rate of the optimiser. Must be positive. Defaults to 0.001.

  • history_size (int, optional) – The number of old changes in position and gradient stored. Must be a non-negative integer. Defaults to 10.

  • maxiterls (int, optional) – The maximum number of iterations in the line search. Must be a non-negative integer. Defaults to 20.

initialise_parameters(model_parameters)[source]

Initialises the optimiser with the parameters that need to be optimised.

Parameters:

model_parameters (list[torch.Tensor]) – The parameters that will be optimised. Must be a list or a tuple of torch tensors.

update_parameters()[source]

Takes a step towards the optimum for each parameter.
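
A hedged sketch of how the zero-argument loss callable and the optimiser interface might be wired together on a standalone least-squares problem. The data, the analytic gradient, the chosen hyperparameter values, and the assumption that gradients are supplied through each tensor's .grad attribute are all illustrative, not part of the documented API:

    import torch
    from DLL.DeepLearning.Optimisers import LBFGS

    # Toy least-squares problem: recover w_true from noiseless observations y = X @ w_true.
    X = torch.randn(50, 3)
    w_true = torch.tensor([1.0, -2.0, 0.5])
    y = X @ w_true

    w = torch.zeros(3)                     # parameters to optimise

    def loss():
        # Zero-argument callable returning the current objective value, as required above.
        return float(((X @ w - y) ** 2).mean())

    # A unit learning rate is a common choice when a line search is performed.
    optimiser = LBFGS(loss, learning_rate=1.0, history_size=10, maxiterls=20)
    optimiser.initialise_parameters([w])

    for _ in range(20):
        w.grad = 2 * X.T @ (X @ w - y) / X.shape[0]   # analytic gradient of the mean squared error
                                                      # (assumption: the optimiser reads tensor.grad)
        optimiser.update_parameters()

    print(w)                               # estimated parameters after the L-BFGS steps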