Optimisers

SGD

class DLL.DeepLearning.Optimisers.SGD(learning_rate=0.001, momentum=0.9)[source]

Bases: BaseOptimiser

Stochastic gradient descent optimiser with momentum. A first-order method: it does not use second-derivative information (the Hessian matrix) and therefore requires little memory.

Parameters:
  • learning_rate (float, optional) – The learning rate of the optimiser. Must be positive. Defaults to 0.001.

  • momentum (float, optional) – Determines how long the previous gradients affect the current direction. Must be in range [0, 1). Defaults to 0.9.

initialise_parameters(model_parameters)[source]

Initialises the optimiser with the parameters that need to be optimised.

Parameters:

model_parameters (list[torch.Tensor]) – The parameters that will be optimised. Must be a list or a tuple of torch tensors.

update_parameters()[source]

Takes a step towards the optimum for each parameter.
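
A minimal usage sketch of the interface above (not taken from the library documentation). The toy quadratic objective, the parameter tensor w, and the assumption that the optimiser reads gradients from each registered tensor's .grad attribute and updates the tensors in place are all illustrative:

    import torch
    from DLL.DeepLearning.Optimisers import SGD

    # One parameter tensor to optimise; the toy objective sum(w ** 2) has its minimum at w = 0.
    w = torch.tensor([3.0, -2.0])

    optimiser = SGD(learning_rate=0.1, momentum=0.9)
    optimiser.initialise_parameters([w])   # register the parameters to optimise

    for _ in range(100):
        w.grad = 2 * w                     # gradient of sum(w ** 2), written by hand
                                           # (assumption: the optimiser reads tensor.grad)
        optimiser.update_parameters()      # take one step towards the optimum

    print(w)                               # w should now be much closer to zero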

ADAM

class DLL.DeepLearning.Optimisers.ADAM(learning_rate=0.001, beta1=0.9, beta2=0.999, weight_decay=0, amsgrad=False)[source]

Bases: BaseOptimiser

The adaptive moment estimation optimiser. Very robust and requires little tuning of its hyperparameters. A first-order method: it does not use second-derivative information (the Hessian matrix) and therefore requires little memory. Based on Algorithm 1 of this paper.

Parameters:
  • learning_rate (float, optional) – The learning rate of the optimiser. Must be positive. Defaults to 0.001.

  • beta1 (float, optional) – Determines how long the previous gradients affect the current step direction. Must be in range [0, 1). Defaults to 0.9.

  • beta2 (float, optional) – Determines how long the previous squared gradients affect the current step direction. Must be in range [0, 1). Defaults to 0.999.

  • weight_decay (float, optional) – The strength of the regularisation (weight decay) applied to the weights; 0 disables it. Must be in range [0, 1). Defaults to 0.

  • amsgrad (bool, optional) – If True, the AMSGrad variant of the algorithm is used. Defaults to False.

initialise_parameters(model_parameters)[source]

Initialises the optimiser with the parameters that need to be optimised.

Parameters:

model_parameters (list[torch.Tensor]) – The parameters that will be optimised. Must be a list or a tuple of torch tensors.

update_parameters()[source]

Takes a step towards the optimum for each parameter.
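
A hedged construction sketch (not from the library documentation) showing the hyperparameters listed above. The training loop itself follows the same initialise_parameters / update_parameters pattern as the SGD sketch, again assuming gradients are supplied through each tensor's .grad attribute:

    import torch
    from DLL.DeepLearning.Optimisers import ADAM

    w = torch.zeros(10)                    # illustrative parameter tensor

    # The defaults are usually a reasonable starting point; weight_decay and amsgrad are optional.
    optimiser = ADAM(learning_rate=0.001, beta1=0.9, beta2=0.999, weight_decay=0.01, amsgrad=True)
    optimiser.initialise_parameters([w])

    w.grad = torch.randn(10)               # placeholder gradient (assumed to be read from .grad)
    optimiser.update_parameters()          # one ADAM step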

ADAGRAD

class DLL.DeepLearning.Optimisers.ADAGRAD(learning_rate=0.001, lr_decay=0, weight_decay=0)[source]

Bases: BaseOptimiser

The adaptive gradient optimiser. A first-order method: it does not use second-derivative information (the Hessian matrix) and therefore requires little memory.

Parameters:
  • learning_rate (float, optional) – The learning rate of the optimiser. Must be positive. Defaults to 0.001.

  • lr_decay (float, optional) – Determines how fast the learning rate decays during training. Must be non-negative. Defaults to 0.

  • weight_decay (float, optional) – The strength of the regularisation (weight decay) applied to the weights; 0 disables it. Must be in range [0, 1). Defaults to 0.

initialise_parameters(model_parameters)[source]

Initialises the optimiser with the parameters that need to be optimised.

Parameters:

model_parameters (list[torch.Tensor]) – The parameters that will be optimised. Must be a list or a tuple of torch tensors.

update_parameters()[source]

Takes a step towards the optimum for each parameter.
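
A hedged construction sketch (not from the library documentation); the training loop is identical in shape to the SGD sketch above, with gradients assumed to be supplied through each tensor's .grad attribute:

    import torch
    from DLL.DeepLearning.Optimisers import ADAGRAD

    w = torch.zeros(5)                     # illustrative parameter tensor

    # lr_decay gradually reduces the effective learning rate as training progresses.
    optimiser = ADAGRAD(learning_rate=0.01, lr_decay=0.001, weight_decay=0)
    optimiser.initialise_parameters([w])

    w.grad = torch.randn(5)                # placeholder gradient (assumed to be read from .grad)
    optimiser.update_parameters()          # one ADAGRAD step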

ADADELTA

class DLL.DeepLearning.Optimisers.ADADELTA(learning_rate=0.001, rho=0.9, weight_decay=0)[source]

Bases: BaseOptimiser

The Adadelta optimiser. A first-order method: it does not use second-derivative information (the Hessian matrix) and therefore requires little memory.

Parameters:
  • learning_rate (float, optional) – The learning rate of the optimiser. Must be positive. Defaults to 0.001.

  • rho (float, optional) – Determines how long the previous gradients affect the current step direction. Must be in range [0, 1). Defaults to 0.9.

  • weight_decay (float, optional) – The strength of the regularisation (weight decay) applied to the weights; 0 disables it. Must be in range [0, 1). Defaults to 0.

initialise_parameters(model_parameters)[source]

Initialises the optimiser with the parameters that need to be optimised.

Parameters:

model_parameters (list[torch.Tensor]) – The parameters that will be optimised. Must be a list or a tuple of torch tensors.

update_parameters()[source]

Takes a step towards the optimum for each parameter.
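
A hedged construction sketch (not from the library documentation); the usage pattern is the same as for the optimisers above, and the gradient handling via .grad is an assumption:

    import torch
    from DLL.DeepLearning.Optimisers import ADADELTA

    w = torch.zeros(5)                     # illustrative parameter tensor

    # rho controls how quickly the running averages forget old gradient information.
    optimiser = ADADELTA(learning_rate=0.001, rho=0.9, weight_decay=0)
    optimiser.initialise_parameters([w])

    w.grad = torch.randn(5)                # placeholder gradient (assumed to be read from .grad)
    optimiser.update_parameters()          # one ADADELTA step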

RMSPROP

class DLL.DeepLearning.Optimisers.RMSPROP(learning_rate=0.001, alpha=0.99, momentum=0, weight_decay=0, centered=False)[source]

Bases: BaseOptimiser

The Root Mean Square Propagation optimiser. An improvement over the ADAGRAD optimiser that addresses its diminishing learning rate problem. A first-order method: it does not use second-derivative information (the Hessian matrix) and therefore requires little memory.

Parameters:
  • learning_rate (float, optional) – The learning rate of the optimiser. Must be positive. Defaults to 0.001.

  • alpha (float, optional) – The smoothing constant of the running average of squared gradients. Defaults to 0.99.

  • momentum (float, optional) – Determines how long the previous gradients affect the current direction. Must be in range [0, 1). Defaults to 0.

  • weight_decay (float, optional) – The strength of the regularisation (weight decay) applied to the weights; 0 disables it. Must be in range [0, 1). Defaults to 0.

  • centered (bool, optional) – If True, the centered variant of the algorithm is used, in which the gradient is normalised by an estimate of its variance. Defaults to False.

initialise_parameters(model_parameters)[source]

Initialises the optimiser with the parameters that need to be optimised.

Parameters:

model_parameters (list[torch.Tensor]) – The parameters that will be optimised. Must be a list or a tuple of torch tensors.

update_parameters()[source]

Takes a step towards the optimum for each parameter.
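
A hedged construction sketch (not from the library documentation) showing the optional momentum and centered behaviour; gradient handling through .grad is an assumption, as in the sketches above:

    import torch
    from DLL.DeepLearning.Optimisers import RMSPROP

    w = torch.zeros(5)                     # illustrative parameter tensor

    # centered=True additionally tracks the mean gradient; momentum adds inertia to the steps.
    optimiser = RMSPROP(learning_rate=0.001, alpha=0.99, momentum=0.9, weight_decay=0, centered=True)
    optimiser.initialise_parameters([w])

    w.grad = torch.randn(5)                # placeholder gradient (assumed to be read from .grad)
    optimiser.update_parameters()          # one RMSPROP step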

LBFGS

class DLL.DeepLearning.Optimisers.LBFGS(loss, learning_rate=0.001, history_size=10, maxiterls=20)[source]

Bases: BaseOptimiser

Limited-memory Broyden-Fletcher-Goldfarb-Shanno optimiser. A second-order (quasi-Newton) method that approximates the Hessian matrix from recent changes in position and gradient, and therefore requires more memory than first-order methods.

Parameters:
  • loss (Callable[[], float]) – The target function. For a deep learning model, one could use e.g. lambda: model.loss.loss(model.predict(x_train), y_train).

  • learning_rate (float, optional) – The learning rate of the optimiser. Must be positive. Defaults to 0.001.

  • history_size (int, optional) – The number of old changes in position and gradient stored. Must be a non-negative integer. Defaults to 10.

  • maxiterls (int, optional) – The maximum number of iterations in the line search. Must be a non-negative integer. Defaults to 20.

initialise_parameters(model_parameters)[source]

Initialises the optimiser with the parameters that need to be optimised.

Parameters:

model_parameters (list[torch.Tensor]) – The parameters that will be optimised. Must be a list or a tuple of torch tensors.

update_parameters()[source]

Takes a step towards the optimum for each parameter.
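
A hedged sketch of how the zero-argument loss callable and the optimiser interface might be wired together on a standalone least-squares problem. The data, the analytic gradient, the chosen hyperparameter values, and the assumption that gradients are supplied through each tensor's .grad attribute are all illustrative, not part of the documented API:

    import torch
    from DLL.DeepLearning.Optimisers import LBFGS

    # Toy least-squares problem: recover w_true from noiseless observations y = X @ w_true.
    X = torch.randn(50, 3)
    w_true = torch.tensor([1.0, -2.0, 0.5])
    y = X @ w_true

    w = torch.zeros(3)                     # parameters to optimise

    def loss():
        # Zero-argument callable returning the current objective value, as required above.
        return float(((X @ w - y) ** 2).mean())

    # A unit learning rate is a common choice when a line search is performed.
    optimiser = LBFGS(loss, learning_rate=1.0, history_size=10, maxiterls=20)
    optimiser.initialise_parameters([w])

    for _ in range(20):
        w.grad = 2 * X.T @ (X @ w - y) / X.shape[0]   # analytic gradient of the mean squared error
                                                      # (assumption: the optimiser reads tensor.grad)
        optimiser.update_parameters()

    print(w)                               # estimated parameters after the L-BFGS steps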