Optimisers
SGD
- class DLL.DeepLearning.Optimisers.SGD(learning_rate=0.001, momentum=0.9)[source]
Bases: BaseOptimiser
Stochastic gradient descent optimiser with momentum. A first-order method: it does not use second-derivative information, i.e. the Hessian matrix, and therefore requires little memory.
- Parameters:
learning_rate (float, optional) – The learning rate of the optimiser. Must be positive. Defaults to 0.001.
momentum (float, optional) – Determines how long the previous gradients affect the current direction. Must be in range [0, 1). Defaults to 0.9.
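A minimal construction sketch, assuming the class is importable from the documented module path; only the constructor arguments shown come from the signature above, and attaching the optimiser to a model is left to the rest of the training setup.

    from DLL.DeepLearning.Optimisers import SGD

    # SGD with momentum: previous update directions keep influencing the
    # current step for longer when momentum is closer to 1.
    optimiser = SGD(learning_rate=0.01, momentum=0.9)

    # momentum=0 recovers plain stochastic gradient descent.
    plain_sgd = SGD(learning_rate=0.01, momentum=0.0)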
ADAM
- class DLL.DeepLearning.Optimisers.ADAM(learning_rate=0.001, beta1=0.9, beta2=0.999, weight_decay=0, amsgrad=False)[source]
Bases: BaseOptimiser
The adaptive moment estimation optimiser. Very robust and requires little tuning of its hyperparameters. A first-order method: it does not use second-derivative information, i.e. the Hessian matrix, and therefore requires little memory. Based on Algorithm 1 of this paper.
- Parameters:
learning_rate (float, optional) – The learning rate of the optimiser. Must be positive. Defaults to 0.001.
beta1 (float, optional) – Determines how long the previous gradients affect the current step direction. Must be in range [0, 1). Defaults to 0.9.
beta2 (float, optional) – Determines how long the previous squared gradients affect the current step direction. Must be in range [0, 1). Defaults to 0.999.
weight_decay (float, optional) – The strength of the regularisation applied to the weights. Must be in range [0, 1). Defaults to 0 (no regularisation).
amsgrad (bool, optional) – If True, the AMSGrad variant of the algorithm is used. Defaults to False.
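A construction sketch under the same assumption about the import path; the defaults usually need little tuning, and the second call shows the documented weight_decay and amsgrad options.

    from DLL.DeepLearning.Optimisers import ADAM

    # The defaults (learning_rate=0.001, beta1=0.9, beta2=0.999) are robust.
    optimiser = ADAM()

    # AMSGrad variant with a small weight decay for regularisation.
    regularised = ADAM(learning_rate=0.001, weight_decay=0.01, amsgrad=True)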
ADAGRAD
- class DLL.DeepLearning.Optimisers.ADAGRAD(learning_rate=0.001, lr_decay=0, weight_decay=0)[source]
Bases: BaseOptimiser
The adaptive gradient optimiser. A first-order method: it does not use second-derivative information, i.e. the Hessian matrix, and therefore requires little memory.
- Parameters:
learning_rate (float, optional) – The learning rate of the optimiser. Must be positive. Defaults to 0.001.
lr_decay (float, optional) – Determines how fast the learning rate decreases. Must be non-negative. Defaults to 0.
weight_decay (float, optional) – The strength of the regularisation applied to the weights. Must be in range [0, 1). Defaults to 0 (no regularisation).
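A construction sketch, again assuming the documented import path; lr_decay and weight_decay are optional and default to 0.

    from DLL.DeepLearning.Optimisers import ADAGRAD

    # Per-parameter adaptive learning rates; lr_decay shrinks the base
    # learning rate over time and weight_decay adds regularisation.
    optimiser = ADAGRAD(learning_rate=0.01, lr_decay=0.001, weight_decay=0.01)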
ADADELTA
- class DLL.DeepLearning.Optimisers.ADADELTA(learning_rate=0.001, rho=0.9, weight_decay=0)[source]
Bases: BaseOptimiser
The Adadelta optimiser. A first-order method: it does not use second-derivative information, i.e. the Hessian matrix, and therefore requires little memory.
- Parameters:
learning_rate (float, optional) – The learning rate of the optimiser. Must be positive. Defaults to 0.001.
rho (float, optional) – Determines how long the previous gradients affect the current step direction. Must be in range [0, 1). Defaults to 0.9.
weight_decay (float, optional) – The strength of the regularisation applied to the weights. Must be in range [0, 1). Defaults to 0 (no regularisation).
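A construction sketch assuming the documented import path; rho controls how quickly the running averages of past gradients decay.

    from DLL.DeepLearning.Optimisers import ADADELTA

    # Default configuration.
    optimiser = ADADELTA()

    # Slower-decaying running averages and some weight regularisation.
    smoother = ADADELTA(learning_rate=0.001, rho=0.95, weight_decay=0.01)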
RMSPROP
- class DLL.DeepLearning.Optimisers.RMSPROP(learning_rate=0.001, alpha=0.99, momentum=0, weight_decay=0, centered=False)[source]
Bases: BaseOptimiser
The Root Mean Square Propagation optimiser. An improvement over the ADAGRAD optimiser that solves its diminishing learning rate problem. A first-order method: it does not use second-derivative information, i.e. the Hessian matrix, and therefore requires little memory.
- Parameters:
learning_rate (float, optional) – The learning rate of the optimiser. Must be positive. Defaults to 0.001.
alpha (float, optional) – The smoothing constant used for the running average of squared gradients. Defaults to 0.99.
momentum (float, optional) – Determines how long the previous gradients affect the current direction. Must be in range [0, 1). Defaults to 0.
weight_decay (float, optional) – The strength of the regularisation applied to the weights. Must be in range [0, 1). Defaults to 0 (no regularisation).
centered (bool, optional) – If True, a centred version of the algorithm is used, which normalises the gradient by an estimate of its variance. Defaults to False.
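A construction sketch assuming the documented import path; the second call enables the documented momentum and centered options.

    from DLL.DeepLearning.Optimisers import RMSPROP

    # Standard RMSPROP with the default smoothing constant alpha=0.99.
    optimiser = RMSPROP(learning_rate=0.001)

    # Centred variant with momentum.
    centred = RMSPROP(learning_rate=0.001, momentum=0.9, centered=True)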
LBFGS
- class DLL.DeepLearning.Optimisers.LBFGS(loss, learning_rate=0.001, history_size=10, maxiterls=20)[source]
Bases: BaseOptimiser
Limited-memory Broyden-Fletcher-Goldfarb-Shanno optimiser. A second-order method: it approximates the Hessian matrix from recent changes in position and gradient, and therefore requires more memory than first-order methods.
- Parameters:
loss (Callable[[], float]) – The target function. For a deep learning model, one could use e.g. lambda: model.loss.loss(model.predict(x_train), y_train).
learning_rate (float, optional) – The learning rate of the optimiser. Must be positive. Defaults to 0.001.
history_size (int, optional) – The number of old changes in position and gradient stored. Must be a non-negative integer. Defaults to 10.
maxiterls (int, optional) – The maximum number of iterations in the line search. Must be a non-negative integer. Defaults to 20.
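A sketch of constructing the optimiser with a zero-argument loss callable of the form suggested above; model, x_train and y_train are placeholders for an already-built model and its training data.

    from DLL.DeepLearning.Optimisers import LBFGS

    # The callable re-evaluates the objective on demand; model, x_train and
    # y_train are assumed to exist already (placeholders in this sketch).
    loss_fn = lambda: model.loss.loss(model.predict(x_train), y_train)

    optimiser = LBFGS(loss_fn, learning_rate=0.001, history_size=10, maxiterls=20)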