Core layers

Dense

class DLL.DeepLearning.Layers.Dense(output_shape, bias=True, initialiser=<DLL.DeepLearning.Initialisers._Xavier_Glorot.Xavier_Uniform object>, activation=None, normalisation=None, **kwargs)[source]

Bases: BaseLayer

The basic dense linear layer.

Parameters:
  • output_shape (tuple[int] or int) – The output shape of the model, not containing the batch_size dimension. Must contain non-negative integers. If an int, the returned shape is (n_samples, int). If the length is zero, the returned tensor is of shape (n_samples,). Otherwise the returned tensor is of shape (n_samples, *output_shape).

  • initialiser (Initialisers, optional) – The initialisation method for the model's weights. Defaults to Xavier_uniform.

  • activation (Activation layers | None, optional) – The activation used after this layer. If set to None, no activation is used. Defaults to None. If both an activation and a regularisation layer are used, the regularisation is applied first in the forward propagation.

  • normalisation (Regularisation layers | None, optional) – The regularisation layer used after this layer. If set to None, no regularisation is used. Defaults to None. If both an activation and a regularisation layer are used, the regularisation is applied first in the forward propagation.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layer. Also calculates the gradients of the loss function with respect to the model parameters.

Parameters:

dCdy (torch.Tensor of shape (n_samples,) if len(layer.output_shape) == 0 else (n_samples, output_shape)) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (n_samples, layer.input_shape[0])

forward(input, training=False, **kwargs)[source]

Applies the basic linear transformation

\[\begin{split}\begin{align*} y_{lin} = xW + b,\\ y_{reg} = f(y_{lin}),\\ y_{activ} = g(y_{reg}), \end{align*}\end{split}\]

where \(f\) is the possible regularisation function and \(g\) is the possible activation function.

Parameters:
  • input (torch.Tensor of shape (n_samples, n_features)) – The input to the dense layer. Must be a torch.Tensor of the specified shape given by layer.input_shape.

  • training (bool, optional) – The boolean flag deciding if the model is in training mode. Defaults to False.

Returns:

The output tensor after the transformations with the specified shape.

Return type:

torch.Tensor of shape (n_samples,) if len(layer.output_shape) == 0 else (n_samples, output_shape)
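
As a sanity check of the equations above, a minimal plain-PyTorch sketch of the forward transformation (the names x, W, b, the shapes and the ReLU choice for g are assumptions for the example; DLL creates and stores its own parameters):

    import torch

    n_samples, n_features, n_out = 32, 16, 8
    x = torch.randn(n_samples, n_features)   # input of shape (n_samples, n_features)
    W = torch.randn(n_features, n_out)       # weight matrix, e.g. Xavier-initialised in DLL
    b = torch.zeros(n_out)                   # bias vector, used when bias=True

    y_lin = x @ W + b                        # y_lin = xW + b
    y_reg = y_lin                            # f: the possible regularisation (identity here)
    y_activ = torch.relu(y_reg)              # g: the possible activation (ReLU as an example)
    print(y_activ.shape)                     # torch.Size([32, 8])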

Conv2D

class DLL.DeepLearning.Layers.Conv2D(kernel_size, output_depth, initialiser=<DLL.DeepLearning.Initialisers._Xavier_Glorot.Xavier_Uniform object>, activation=None, normalisation=None, **kwargs)[source]

Bases: BaseLayer

The convolutional layer for a neural network.

Parameters:
  • kernel_size (int) – The kernel size used for the model. The kernel is automatically square. Must be a positive integer.

  • output_depth (int) – The output depth of the layer. Must be a positive integer.

  • initialiser (Initialisers, optional) – The initialisation method for the model's weights. Defaults to Xavier_uniform.

  • activation (Activation layers | None, optional) – The activation used after this layer. If set to None, no activation is used. Defaults to None. If both an activation and a regularisation layer are used, the regularisation is applied first in the forward propagation.

  • normalisation (Regularisation layers | None, optional) – The regularisation layer used after this layer. If set to None, no regularisation is used. Defaults to None. If both an activation and a regularisation layer are used, the regularisation is applied first in the forward propagation.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layer. Also calculates the gradients of the loss function with respect to the model parameters.

Parameters:

dCdy (torch.Tensor of shape (n_samples, output_depth, output_height, output_width)) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (n_samples, input_depth, input_height, input_width)

forward(input, training=False, **kwargs)[source]

Applies the convolutional transformation.

\[\begin{split}\begin{align*} y_{i, j} &= \text{bias}_j + \sum_{k = 1}^{\text{d_in}} \text{kernel}(j, k) \star \text{input}(i, k),\\ y_{reg_{i, j}} &= f(y_{i, j}),\\ y_{activ_{i, j}} &= g(y_{reg_{i, j}}), \end{align*}\end{split}\]

where \(\star\) is the cross-correlation operator, \(\text{d_in}\) is the input_depth, \(i\in [1,\dots, \text{batch_size}]\), \(j\in[1,\dots, \text{output_depth}]\), \(f\) is the possible regularisation function and \(g\) is the possible activation function.

Parameters:
  • input (torch.Tensor of shape (n_samples, input_depth, input_height, input_width)) – The input to the layer. Must be a torch.Tensor of the specified shape.

  • training (bool, optional) – The boolean flag deciding if the model is in training mode. Defaults to False.

Returns:

The output tensor after the transformations with the specified shape.

Return type:

torch.Tensor of shape (n_samples, output_depth, height - kernel_size + 1, width - kernel_size + 1)
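
The transformation and its output shape can be reproduced with torch.nn.functional.conv2d; a minimal sketch (the kernel tensor and shapes are assumptions for the example, not DLL's stored parameters):

    import torch
    import torch.nn.functional as F

    n_samples, input_depth, height, width = 4, 3, 28, 28
    kernel_size, output_depth = 5, 8
    x = torch.randn(n_samples, input_depth, height, width)
    kernel = torch.randn(output_depth, input_depth, kernel_size, kernel_size)
    bias = torch.zeros(output_depth)

    # "valid" cross-correlation: no padding, stride 1
    y = F.conv2d(x, kernel, bias)
    print(y.shape)  # torch.Size([4, 8, 24, 24]), i.e. height - kernel_size + 1 = 24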

Flatten

class DLL.DeepLearning.Layers.Flatten(**kwargs)[source]

Bases: BaseLayer

The flattening layer.

backward(dCdy, **kwargs)[source]

Reshapes the gradient to the original shape.

Parameters:

dCdy (torch.Tensor of shape (n_samples, product_of_other_dimensions)) – The gradient given by the next layer.

Returns:

The reshaped gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (n_samples, *layer.input_shape)

forward(input, **kwargs)[source]

Flattens the input tensor into a 2-dimensional tensor.

Parameters:

input (torch.Tensor of shape (n_samples, ...)) – The input to the layer. Must be a torch.Tensor of the specified shape given by layer.input_shape.

Returns:

The output tensor after flattening the input tensor.

Return type:

torch.Tensor of shape (n_samples, product_of_other_dimensions)
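
The operation amounts to a single reshape; a minimal sketch:

    import torch

    x = torch.randn(32, 3, 28, 28)   # (n_samples, ...)
    y = x.reshape(x.shape[0], -1)    # (n_samples, product_of_other_dimensions)
    print(y.shape)                   # torch.Size([32, 2352])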

LSTM

class DLL.DeepLearning.Layers.LSTM(output_shape, hidden_size, return_last=True, initialiser=<DLL.DeepLearning.Initialisers._Xavier_Glorot.Xavier_Uniform object>, activation=None, normalisation=None, **kwargs)[source]

Bases: BaseLayer

The long short-term memory layer for neural networks.

Parameters:
  • output_shape (tuple[int]) – The output shape returned by the forward method. Must be a tuple containing non-negative ints. Based on the length of the tuple and the return_last parameter, the returned tensor is of shape (n_samples,), (n_samples, sequence_length), (n_samples, n_features) or (n_samples, sequence_length, n_features).

  • hidden_size (int) – The number of features in the hidden state vector. Must be a positive integer.

  • return_last (bool) – Determines if only the last element or the whole sequence is returned.

  • initialiser (Initialisers, optional) – The initialisation method for the model's weights. Defaults to Xavier_uniform.

  • activation (Activation layers | None, optional) – The activation used after this layer. If set to None, no activation is used. Defaults to None. If both an activation and a regularisation layer are used, the regularisation is applied first in the forward propagation.

  • normalisation (Regularisation layers | None, optional) – The regularisation layer used after this layer. If set to None, no regularisation is used. Defaults to None. If both an activation and a regularisation layer are used, the regularisation is applied first in the forward propagation.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layer. Also calculates the gradients of the loss function with respect to the model parameters.

Parameters:

dCdy (torch.Tensor of the same shape as returned from the forward method) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (batch_size, sequence_length, input_size)

forward(input, training=False, **kwargs)[source]

Calculates the forward propagation of the model using the equations

\[\begin{split}\begin{align*} f_t &= \sigma(W_fx_t + U_fh_{t - 1} + b_f),\\ i_t &= \sigma(W_ix_t + U_ih_{t - 1} + b_i),\\ o_t &= \sigma(W_ox_t + U_oh_{t - 1} + b_o),\\ \widetilde{c}_t &= \text{tanh}(W_cx_t + U_ch_{t - 1} + b_c),\\ c_t &= f_t\odot c_{t - 1} + i_t\odot\widetilde{c}_t,\\ h_t &= o_t\odot\text{tanh}(c_t),\\ y_t &= W_yh_t + b_y,\\ y_{reg} &= f(y) \text{ or } f(y_\text{sequence_length}),\\ y_{activ} &= g(y_{reg}), \end{align*}\end{split}\]

where \(t\in[1,\dots, \text{sequence_length}]\), \(x\) is the input, \(h_t\) is the hidden state, \(W\) and \(U\) are the weight matrices, \(b\) are the biases, \(f\) is the possible regularisation function and \(g\) is the possible activation function. Also \(\odot\) represents the Hadamard or element-wise product and \(\sigma\) represents the sigmoid function.

Parameters:
  • input (torch.Tensor of shape (batch_size, sequence_length, input_size)) – The input to the layer. Must be a torch.Tensor of the specified shape given by layer.input_shape.

  • training (bool, optional) – The boolean flag deciding if the model is in training mode. Defaults to False.

Returns:

The output tensor after the transformations with the specified shape.

The return shape of the method depends on the parameters:

  • len(LSTM.output_shape) == 0 and LSTM.return_last: (n_samples,)

  • len(LSTM.output_shape) == 1 and LSTM.return_last: (n_samples, LSTM.output_shape[0])

  • len(LSTM.output_shape) == 1 and not LSTM.return_last: (n_samples, sequence_length)

  • len(LSTM.output_shape) == 2 and not LSTM.return_last: (n_samples, sequence_length, LSTM.output_shape[1])

Return type:

torch.Tensor
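
A minimal plain-PyTorch sketch of the recurrence defined by the gate equations above (weight names, shapes and the transposed weight layout are assumptions for the example; DLL initialises and stores its own parameters):

    import torch

    batch_size, sequence_length, input_size, hidden_size = 8, 20, 10, 16
    x = torch.randn(batch_size, sequence_length, input_size)

    # one weight pair (W, U) and bias per gate: forget, input, output, candidate
    W = {g: torch.randn(input_size, hidden_size) * 0.1 for g in "fioc"}
    U = {g: torch.randn(hidden_size, hidden_size) * 0.1 for g in "fioc"}
    b = {g: torch.zeros(hidden_size) for g in "fioc"}

    h = torch.zeros(batch_size, hidden_size)   # hidden state h_0
    c = torch.zeros(batch_size, hidden_size)   # cell state c_0
    for t in range(sequence_length):
        x_t = x[:, t]
        f_t = torch.sigmoid(x_t @ W["f"] + h @ U["f"] + b["f"])   # forget gate
        i_t = torch.sigmoid(x_t @ W["i"] + h @ U["i"] + b["i"])   # input gate
        o_t = torch.sigmoid(x_t @ W["o"] + h @ U["o"] + b["o"])   # output gate
        c_tilde = torch.tanh(x_t @ W["c"] + h @ U["c"] + b["c"])  # candidate cell state
        c = f_t * c + i_t * c_tilde                               # Hadamard-product update
        h = o_t * torch.tanh(c)

    print(h.shape)  # torch.Size([8, 16]) -- last hidden state, as with return_last=True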

MaxPooling2D

class DLL.DeepLearning.Layers.MaxPooling2D(pool_size, **kwargs)[source]

Bases: Activation

The max pooling layer for a neural network.

Parameters:

pool_size (int) – The pooling size used for the model. The pooling kernel is automatically square. Must be a positive integer.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layer.

Parameters:

dCdy (torch.Tensor of shape (n_samples, output_depth, output_height, output_width)) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (n_samples, input_depth, input_height, input_width)

forward(input, **kwargs)[source]

Applies the max pooling transformation.

Parameters:

input (torch.Tensor of shape (n_samples, input_depth, height, width)) – The input to the layer. Must be a torch.Tensor of the specified shape.

Returns:

The output tensor after the transformations with the specified shape.

Return type:

torch.Tensor of shape (n_samples, output_depth, height // layer.pool_size, width // layer.pool_size)
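
The transformation and its output shape can be checked against torch.nn.functional.max_pool2d; a minimal sketch:

    import torch
    import torch.nn.functional as F

    pool_size = 2
    x = torch.randn(4, 3, 28, 28)               # (n_samples, input_depth, height, width)
    y = F.max_pool2d(x, kernel_size=pool_size)  # square window, stride defaults to kernel_size
    print(y.shape)                              # torch.Size([4, 3, 14, 14]), i.e. height // pool_size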

RNN

class DLL.DeepLearning.Layers.RNN(output_shape, hidden_size, return_last=True, initialiser=<DLL.DeepLearning.Initialisers._Xavier_Glorot.Xavier_Uniform object>, activation=None, normalisation=None, **kwargs)[source]

Bases: BaseLayer

The recurrent neural network layer.

Parameters:
  • output_shape (tuple[int]) – The output shape returned by the forward method. Must be a tuple containing non-negative ints. Based on the length of the tuple and the return_last parameter, the returned tensor is of shape (n_samples,), (n_samples, sequence_length), (n_samples, n_features) or (n_samples, sequence_length, n_features).

  • hidden_size (int) – The number of features in the hidden state vector. Must be a positive integer.

  • return_last (bool) – Determines if only the last element or the whole sequence is returned.

  • initialiser (Initialisers, optional) – The initialisation method for the model's weights. Defaults to Xavier_uniform.

  • activation (Activation layers | None, optional) – The activation used after this layer. If set to None, no activation is used. Defaults to None. If both an activation and a regularisation layer are used, the regularisation is applied first in the forward propagation.

  • normalisation (Regularisation layers | None, optional) – The regularisation layer used after this layer. If set to None, no regularisation is used. Defaults to None. If both an activation and a regularisation layer are used, the regularisation is applied first in the forward propagation.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layer. Also calculates the gradients of the loss function with respect to the model parameters.

Parameters:

dCdy (torch.Tensor of the same shape as returned from the forward method) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (n_samples, sequence_length, input_size)

forward(input, training=False, **kwargs)[source]

Calculates the forward propagation of the model using the equation

\[\begin{split}\begin{align*} h_t &= \text{tanh}(x_tW_{ih}^T + h_{t - 1}W_{hh}^T + b_h),\\ y_{t} &= h_tW_o^T + b_o,\\ y_{reg} &= f(y) \text{ or } f(y_\text{sequence_length}),\\ y_{activ} &= g(y_{reg}), \end{align*}\end{split}\]

where \(t\in[1,\dots, \text{sequence_length}]\), \(x\) is the input, \(h_t\) is the hidden state, \(W_{ih}\) is the input-to-hidden weight matrix, \(W_{hh}\) is the hidden-to-hidden weight matrix, \(b_h\) is the hidden bias, \(W_o\) is the output weight matrix, \(b_o\) is the output bias, \(f\) is the possible regularisation function and \(g\) is the possible activation function.

Parameters:
  • input (torch.Tensor of shape (batch_size, sequence_length, input_size)) – The input to the layer. Must be a torch.Tensor of the specified shape given by layer.input_shape.

  • training (bool, optional) – The boolean flag deciding if the model is in training mode. Defaults to False.

Returns:

The output tensor after the transformations with the specified shape.

The return shape of the method depends on the parameters:

  • len(RNN.output_shape) == 0 and RNN.return_last: (n_samples,)

  • len(RNN.output_shape) == 1 and RNN.return_last: (n_samples, RNN.output_shape[0])

  • len(RNN.output_shape) == 1 and not RNN.return_last: (n_samples, sequence_length)

  • len(RNN.output_shape) == 2 and not RNN.return_last: (n_samples, sequence_length, RNN.output_shape[1])

Return type:

torch.Tensor
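
A minimal plain-PyTorch sketch of the recurrence above (weight names and shapes are assumptions for the example; DLL initialises and stores its own parameters):

    import torch

    batch_size, sequence_length, input_size, hidden_size, output_size = 8, 20, 10, 16, 4
    x = torch.randn(batch_size, sequence_length, input_size)

    W_ih = torch.randn(hidden_size, input_size) * 0.1   # input-to-hidden weights
    W_hh = torch.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden weights
    b_h = torch.zeros(hidden_size)                      # hidden bias
    W_o = torch.randn(output_size, hidden_size) * 0.1   # output weights
    b_o = torch.zeros(output_size)                      # output bias

    h = torch.zeros(batch_size, hidden_size)
    outputs = []
    for t in range(sequence_length):
        h = torch.tanh(x[:, t] @ W_ih.T + h @ W_hh.T + b_h)  # h_t
        outputs.append(h @ W_o.T + b_o)                      # y_t

    y_all = torch.stack(outputs, dim=1)  # (batch_size, sequence_length, output_size), return_last=False
    y_last = outputs[-1]                 # (batch_size, output_size), return_last=True
    print(y_all.shape, y_last.shape)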

Bidirectional

class DLL.DeepLearning.Layers.Bidirectional(layer, **kwargs)[source]

Bases: BaseLayer

The bidirectional wrapper for LSTM or RNN layers.

Parameters:

layer (DLL.DeepLearning.Layers.RNN or LSTM object) – The input is passed to this layer in forward and reverse. The results of each pass are concatenated together along the feature axis.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layer. Also calculates the gradients of the loss function with respect to the model parameters.

Parameters:

dCdy (torch.Tensor of the same shape as returned from the forward method) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (n_samples, sequence_length, input_size)

forward(input, training=False, **kwargs)[source]

Computes the forward values of the RNN or LSTM layer for both the normal input and the reversed input and concatenates the results along the feature axis.

Parameters:
  • input (torch.Tensor of shape (batch_size, sequence_length, input_size)) – The input to the layer. Must be a torch.Tensor of the specified shape given by layer.input_shape.

  • training (bool, optional) – The boolean flag deciding if the model is in training mode. Defaults to False.

Returns:

The output tensor after the transformations with the specified shape.

Return type:

torch.Tensor of shape (n_samples, 2 * RNN.output_shape[-1]) or (n_samples, sequence_length, 2 * RNN.output_shape[-1])
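
Conceptually, the wrapper runs the wrapped layer on the sequence and on its time-reversal and concatenates the results on the feature axis; a minimal sketch with a hypothetical run_rnn helper standing in for the wrapped layer, and assuming the reversed output is re-aligned to the original time order before concatenation:

    import torch

    def run_rnn(x):
        # stand-in for the wrapped RNN/LSTM forward pass; returns (batch, seq_len, features)
        return torch.cumsum(x, dim=1)

    x = torch.randn(8, 20, 10)                          # (batch_size, sequence_length, input_size)
    forward_out = run_rnn(x)
    backward_out = run_rnn(torch.flip(x, dims=[1]))     # same layer on the reversed sequence
    backward_out = torch.flip(backward_out, dims=[1])   # re-align to the original time order
    y = torch.cat([forward_out, backward_out], dim=-1)  # concatenate along the feature axis
    print(y.shape)                                      # torch.Size([8, 20, 20]) -- doubled features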

get_nparams()[source]
summary(offset='')[source]

MultiHeadAttention

class DLL.DeepLearning.Layers.MultiHeadAttention(output_shape, n_heads=1, use_mask=True, dropout=0.0, activation=None, normalisation=None, **kwargs)[source]

Bases: BaseLayer

The multi head attention layer.

Parameters:
  • output_shape (tuple[int]) – The output_shape of the model, not containing the batch_size dimension. Must be a tuple of positive integers. The returned tensor is of shape (n_samples, seq_len, output_shape) if len(output_shape) == 2 else (n_samples, seq_len).

  • n_heads (int) – The number of heads used in the layer. The output dimension must be divisible by n_heads.

  • use_mask (bool) – Determines if a mask is used to make the model only consider past tokens. Must be a boolean.

  • dropout (float) – The probability of a node being dropped out. Must be in range [0, 1). Defaults to 0.0.

  • activation (Activation layers | None, optional) – The activation used after this layer. If set to None, no activation is used. Defaults to None. If both an activation and a regularisation layer are used, the regularisation is applied first in the forward propagation.

  • normalisation (Regularisation layers | None, optional) – The regularisation layer used after this layer. If set to None, no regularisation is used. Defaults to None. If both an activation and a regularisation layer are used, the regularisation is applied first in the forward propagation.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layer.

Parameters:

dCdy (torch.Tensor of shape (n_samples, seq_len, output_dim) if len(output_shape) == 2 else (n_samples, seq_len)) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (n_samples, seq_len, output_dim)

forward(input, training=False, **kwargs)[source]

Applies the attention mechanism on multiple heads.

\[\begin{split}\begin{align*} y_{\text{MultiHead}} &= \text{Concat}(head_1, \dots, head_{\text{n_heads}}),\\ y_{reg} &= f(y_{\text{MultiHead}}),\\ y_{activ} &= g(y_{reg}), \end{align*}\end{split}\]

where \(head_i = \text{Attention}(Q, K, V) = \text{softmax}(\frac{QK^T}{\sqrt{\text{output_dim}}})V\), \(f\) is the possible regularisation function and \(g\) is the possible activation function. \(Q, K\) and \(V\) are the query, key and value matrices, which are obtained from the input by transforming it with a linear layer and splitting the result on the feature axis.

Parameters:
  • input (torch.Tensor of shape (n_samples, seq_len, output_shape)) – The input to the layer. Must be a torch.Tensor of the specified shape given by layer.input_shape.

  • training (bool, optional) – The boolean flag deciding if the model is in training mode. Defaults to False.

Returns:

The output tensor after the transformations with the specified shape.

Return type:

torch.Tensor of shape (n_samples, seq_len, output_shape[-1]) if len(output_shape) == 2 else (n_samples, seq_len)
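
A minimal plain-PyTorch sketch of the masked multi-head attention described above (the combined QKV projection, the shapes and the per-head sqrt(head_dim) scaling are assumptions for the example; the formula above writes the scale as sqrt(output_dim)):

    import torch

    n_samples, seq_len, output_dim, n_heads = 8, 20, 32, 4
    head_dim = output_dim // n_heads   # the output dimension must be divisible by n_heads
    x = torch.randn(n_samples, seq_len, output_dim)

    # one linear layer produces Q, K and V, which are split on the feature axis
    W_qkv = torch.randn(output_dim, 3 * output_dim) * 0.1
    q, k, v = (x @ W_qkv).chunk(3, dim=-1)

    # split each of Q, K, V into n_heads heads: (n_samples, n_heads, seq_len, head_dim)
    def split_heads(t):
        return t.reshape(n_samples, seq_len, n_heads, head_dim).transpose(1, 2)
    q, k, v = split_heads(q), split_heads(k), split_heads(v)

    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5  # QK^T / sqrt(d)
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))    # use_mask=True: hide future tokens
    heads = torch.softmax(scores, dim=-1) @ v           # softmax(...)V per head
    y = heads.transpose(1, 2).reshape(n_samples, seq_len, output_dim)  # Concat(head_1, ...)
    print(y.shape)  # torch.Size([8, 20, 32])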

Identity

class DLL.DeepLearning.Layers.Identity(**kwargs)[source]

Bases: Activation

The identity layer.

backward(dCdy, **kwargs)[source]

Returns the gradient.

Parameters:

dCdy (torch.Tensor) – The gradient given by the next layer.

Returns:

The same tensor as the input gradient

Return type:

torch.Tensor

forward(input, training=False, **kwargs)[source]

Returns the input.

Parameters:
  • input (torch.Tensor of shape (n_samples, n_features)) – The input to the layer. Must be a torch.Tensor of the specified shape given by layer.input_shape.

  • training (bool, optional) – The boolean flag deciding if the model is in training mode. Defaults to False.

Returns:

The same tensor as the input

Return type:

torch.Tensor

Add

class DLL.DeepLearning.Layers.Add(layer1, layer2, activation=None, normalisation=None, **kwargs)[source]

Bases: BaseLayer

The addition layer.

Parameters:
  • layer1 (DLL.DeepLearning.Layers.BaseLayer object) – The first layer the input is passed to. The results of the two layers are added together. The input and output shapes of the layers must be the same.

  • layer2 (DLL.DeepLearning.Layers.BaseLayer object) – The second layer the input is passed to. The results of the two layers are added together. The input and output shapes of the layers must be the same.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layers. Also calculates the gradients of the loss function with respect to the model parameters.

Parameters:

dCdy (torch.Tensor of the same shape as returned from the forward method) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of the specified shape

forward(input, training=False, **kwargs)[source]

Computes the forward values of the input layers and adds them together.

Parameters:
  • input (torch.Tensor) – The input to the layer. Must be a torch.Tensor of the specified shape given by layer.input_shape.

  • training (bool, optional) – The boolean flag deciding if the model is in training mode. Defaults to False.

Returns:

The output tensor after the transformations with the specified shape.

Return type:

torch.Tensor
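
A minimal sketch of the two-branch combination, with hypothetical stand-in functions in place of the wrapped layers; with an Identity second branch this is a residual connection:

    import torch

    def branch1(x):   # stand-in for layer1.forward
        return torch.relu(x)

    def branch2(x):   # stand-in for layer2.forward
        return x      # e.g. an Identity layer, giving a residual connection

    x = torch.randn(8, 16)
    y = branch1(x) + branch2(x)  # both branches see the same input; outputs must share a shape
    print(y.shape)               # torch.Size([8, 16])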

get_nparams()[source]
summary(offset='')[source]

LayerList

class DLL.DeepLearning.Layers.LayerList(*args, **kwargs)[source]

Bases: BaseLayer

The list of consecutive layers.

Parameters:

*args (DLL.DeepLearning.Layers.BaseLayer objects) – An arbitrary amount of consecutive layers.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layers. Also calculates the gradients of the loss function with respect to the model parameters.

Parameters:

dCdy (torch.Tensor of the same shape as returned from the forward method) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of the specified shape

forward(input, training=False, **kwargs)[source]

Computes the forward values of the input layers.

Parameters:
  • input (torch.Tensor) – The input to the layer. Must be a torch.Tensor of the specified shape given by layer.input_shape.

  • training (bool, optional) – The boolean flag deciding if the model is in training mode. Defaults to False.

Returns:

The output tensor after the layers with the specified shape.

Return type:

torch.Tensor

get_nparams()[source]
summary(offset='')[source]

Reshape

class DLL.DeepLearning.Layers.Reshape(output_shape, **kwargs)[source]

Bases: BaseLayer

The reshape layer.

Parameters:

output_shape (tuple[int] or int) – The output shape of the model, not containing the batch_size dimension. Must be a positive integer or a tuple of positive integers.

backward(dCdy, **kwargs)[source]

Reshapes the gradient to the original shape.

Parameters:

dCdy (torch.Tensor of shape (n_samples, *layer.output_shape)) – The gradient given by the next layer.

Returns:

The reshaped gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (n_samples, *layer.input_shape)

forward(input, **kwargs)[source]

Reshapes the input into the output_shape.

Parameters:

input (torch.Tensor of shape (n_samples, *layer.input_shape)) – The input to the layer. Must be a torch.Tensor of the specified shape given by layer.input_shape.

Returns:

The output tensor after reshaping the input tensor.

Return type:

torch.Tensor of shape (n_samples, *layer.output_shape)

DenseKAN

class DLL.DeepLearning.Layers.DenseKAN(output_shape, n_basis_funcs=10, bounds=(-1, 1), basis_func_degree=3, initialiser=<DLL.DeepLearning.Initialisers._Xavier_Glorot.Xavier_Uniform object>, activation=None, normalisation=None, **kwargs)[source]

Bases: BaseLayer

The dense Kolmogorov-Arnold network layer. The implementation is based on this paper and this article.

Parameters:
  • output_shape (tuple[int] or int) – The output shape of the model, not containing the batch_size dimension. Must contain non-negative integers. If an int, the returned shape is (n_samples, int). If the length is zero, the returned tensor is of shape (n_samples,). Otherwise the returned tensor is of shape (n_samples, *output_shape).

  • n_basis_funcs (int, optional) – The number of basis functions used for fitting. If 1, only SiLU is used. Otherwise one SiLU and n_basis_funcs - 1 B-spline basis functions are used. Must be a positive integer. Defaults to 10.

  • bounds (tuple[int], optional) – The theoretical min and max of the data that will be passed to the forward method. Must be a tuple containing two integers. Defaults to (-1, 1).

  • basis_func_degree (int, optional) – The degree of the B-spline basis functions. Must be positive. Defaults to 3.

  • initialiser (Initialisers, optional) – The initialisation method for the model's weights. Defaults to Xavier_uniform.

  • activation (Activation layers | None, optional) – The activation used after this layer. If set to None, no activation is used. Defaults to None. If both an activation and a regularisation layer are used, the regularisation is applied first in the forward propagation.

  • normalisation (Regularisation layers | None, optional) – The regularisation layer used after this layer. If set to None, no regularisation is used. Defaults to None. If both an activation and a regularisation layer are used, the regularisation is applied first in the forward propagation.

Note

n_basis_funcs and basis_func_degree should be different to avoid certain errors.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layer. Also calculates the gradients of the loss function with respect to the model parameters.

Parameters:

dCdy (torch.Tensor of shape (n_samples,) if len(layer.output_shape) == 0 else (n_samples, output_shape)) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (n_samples, layer.input_shape[0])

forward(input, training=False, **kwargs)[source]

Applies the forward equation of the Kolmogorov-Arnold network.

Parameters:
  • input (torch.Tensor of shape (n_samples, n_features)) – The input to the dense layer. Must be a torch.Tensor of the specified shape given by layer.input_shape.

  • training (bool, optional) – The boolean flag deciding if the model is in training mode. Defaults to False.

Returns:

The output tensor after the transformations with the specified shape.

Return type:

torch.Tensor of shape (n_samples,) if len(layer.output_shape) == 0 else (n_samples, output_shape)
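
A minimal sketch of the KAN idea above: each (input feature, output unit) pair learns a weighted sum of fixed univariate basis functions. For brevity, Gaussian bumps stand in for the B-spline basis that DLL uses alongside SiLU; all names and shapes are assumptions for the example:

    import torch
    import torch.nn.functional as F

    n_samples, n_features, n_out, n_basis_funcs = 32, 4, 3, 6
    low, high = -1.0, 1.0                    # "bounds": the assumed data range
    x = torch.randn(n_samples, n_features).clamp(low, high)

    # basis functions: one SiLU plus fixed local bumps on [low, high]
    # (DLL uses B-splines for the local basis; Gaussian bumps stand in here)
    centres = torch.linspace(low, high, n_basis_funcs - 1)
    silu = F.silu(x).unsqueeze(-1)                                # (n_samples, n_features, 1)
    bumps = torch.exp(-((x.unsqueeze(-1) - centres) / 0.5) ** 2)  # (n_samples, n_features, n_basis_funcs - 1)
    basis = torch.cat([silu, bumps], dim=-1)                      # (n_samples, n_features, n_basis_funcs)

    # each (input feature, output unit) pair has its own basis coefficients
    coeffs = torch.randn(n_features, n_out, n_basis_funcs) * 0.1
    y = torch.einsum("sfb,fob->so", basis, coeffs)                # sum the learned univariate functions
    print(y.shape)                                                # torch.Size([32, 3])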

Activation layers

ReLU

class DLL.DeepLearning.Layers.Activations.ReLU(**kwargs)[source]

Bases: Activation

The basic rectified linear unit activation function.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layer.

Parameters:

dCdy (torch.Tensor of the same shape as returned from the forward method) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (batch_size, …)

forward(input, **kwargs)[source]

Calculates the following function for every element of the input matrix:

\[\text{ReLU}(x) = \text{max}(0, x).\]
Parameters:

input (torch.Tensor of shape (batch_size, ...)) – The input to the layer. Must be a torch.Tensor of any shape.

Returns:

The output tensor after applying the activation function of the same shape as the input.

Return type:

torch.Tensor
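
A minimal sketch of the forward value and the corresponding backward mask:

    import torch

    x = torch.tensor([[-1.5, 0.0, 2.0]])
    y = torch.clamp(x, min=0)   # ReLU(x) = max(0, x) -> [[0.0, 0.0, 2.0]]

    dCdy = torch.ones_like(x)   # gradient from the next layer
    dCdx = dCdy * (x > 0)       # backward: the gradient passes only where x > 0
    print(y, dCdx)              # tensor([[0., 0., 2.]]) tensor([[0., 0., 1.]])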

Sigmoid

class DLL.DeepLearning.Layers.Activations.Sigmoid(**kwargs)[source]

Bases: Activation

The sigmoid activation function.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layer.

Parameters:

dCdy (torch.Tensor of the same shape as returned from the forward method) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (batch_size, …)

forward(input, **kwargs)[source]

Calculates the following function for every element of the input matrix:

\[\sigma(x) = \frac{1}{1 + e^{-x}}.\]
Parameters:

input (torch.Tensor of shape (batch_size, ...)) – The input to the layer. Must be a torch.Tensor of any shape.

Returns:

The output tensor after applying the activation function of the same shape as the input.

Return type:

torch.Tensor

SoftMax

class DLL.DeepLearning.Layers.Activations.SoftMax(dim=-1, **kwargs)[source]

Bases: Activation

The softmax activation function.

Parameters:

dim (int) – The dimension on which the softmax is calculated. If the data is (n_samples, n_channels, n_features) and one wants to calculate the softmax on the channels, one should select dim=1 or dim=-2.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layer.

Parameters:

dCdy (torch.Tensor of the same shape as returned from the forward method) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (n_samples, n_features)

forward(input, **kwargs)[source]

Calculates the following function for every element of the input matrix:

\[\text{Softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}},\]

where \(K\) is the number of features of the input.

Parameters:

input (torch.Tensor of shape (n_samples, n_features)) – The input to the layer. Must be a torch.Tensor of the specified shape.

Returns:

The output tensor after applying the activation function of the same shape as the input.

Return type:

torch.Tensor
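
A minimal sketch of the formula, using the usual max-subtraction trick for numerical stability (an implementation detail assumed here, not stated above; it does not change the result):

    import torch

    x = torch.tensor([[1.0, 2.0, 3.0],
                      [1.0, 1.0, 1.0]])
    # subtracting the row-wise max keeps exp() from overflowing
    z = x - x.max(dim=-1, keepdim=True).values
    y = z.exp() / z.exp().sum(dim=-1, keepdim=True)
    print(y)              # each row sums to 1
    print(y.sum(dim=-1))  # tensor([1., 1.])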

Tanh

class DLL.DeepLearning.Layers.Activations.Tanh(**kwargs)[source]

Bases: Activation

The hyperbolic tangent activation function.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layer.

Parameters:

dCdy (torch.Tensor of the same shape as returned from the forward method) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (batch_size, …)

forward(input, **kwargs)[source]

Calculates the hyperbolic tangent function for every element of the input matrix.

Parameters:

input (torch.Tensor of shape (batch_size, ...)) – The input to the layer. Must be a torch.Tensor of any shape.

Returns:

The output tensor after applying the activation function of the same shape as the input.

Return type:

torch.Tensor

Regularisation layers

Dropout

class DLL.DeepLearning.Layers.Regularisation.Dropout(p=0.5, **kwargs)[source]

Bases: BaseRegularisation

The dropout layer for neural networks.

Parameters:

p (float, optional) – The probability of a node being dropped out. Must be strictly between 0 and 1. Defaults to 0.5.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layer.

Parameters:

dCdy (torch.Tensor of the same shape as returned from the forward method) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (batch_size, channels, …)

forward(input, training=False, **kwargs)[source]

Sets some values of the input to zero with probability p.

Parameters:
  • input (torch.Tensor of shape (batch_size, channels, ...)) – The input to the layer. Must be a torch.Tensor of the specified shape given by layer.input_shape.

  • training (bool, optional) – The boolean flag deciding if the model is in training mode. Defaults to False.

Returns:

The output tensor after the transformation with the same shape as the input.

Return type:

torch.Tensor
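
A minimal sketch assuming the common inverted-dropout convention, in which the surviving values are rescaled by 1/(1 - p) during training so that inference needs no scaling (the documentation above does not state which scaling convention DLL uses):

    import torch

    p = 0.5
    x = torch.randn(8, 16)

    # training mode: drop each value with probability p, rescale the survivors
    mask = (torch.rand_like(x) > p).float()
    y_train = x * mask / (1 - p)

    # inference mode (training=False): the input passes through unchanged
    y_eval = x
    print(y_train.shape, y_eval.shape)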

BatchNormalisation

class DLL.DeepLearning.Layers.Regularisation.BatchNorm(patience=0.9, **kwargs)[source]

Bases: BaseRegularisation

The batch normalisation layer for neural networks.

Parameters:

patience (float, optional) – The factor deciding how fast the mean and variance are updated during training. Must be strictly between 0 and 1. Defaults to 0.9.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layer. Also calculates the gradients of the loss function with respect to the model parameters.

Parameters:

dCdy (torch.Tensor of the same shape as returned from the forward method) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (batch_size, channels, …)

forward(input, training=False, **kwargs)[source]

Normalises the input to have zero mean and unit variance with the following equation:

\[y = \gamma\frac{x - \mathbb{E}[x]}{\sqrt{\text{var}(x) + \epsilon}} + \beta,\]

where \(x\) is the input, \(\mathbb{E}[x]\) is the expected value or the mean across the batch dimension, \(\text{var}(x)\) is the variance across the batch dimension, \(\epsilon\) is a small constant and \(\gamma\) and \(\beta\) are trainable parameters.

Parameters:
  • input (torch.Tensor of shape (batch_size, channels, ...)) – The input to the layer. Must be a torch.Tensor of the specified shape given by layer.input_shape.

  • training (bool, optional) – The boolean flag deciding if the model is in training mode. Defaults to False.

Returns:

The output tensor after the normalisation with the same shape as the input.

Return type:

torch.Tensor
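
A minimal sketch of the normalisation together with a patience-weighted running-statistics update (the exact update rule and the value of epsilon are assumptions for the example):

    import torch

    patience, eps = 0.9, 1e-5
    x = torch.randn(32, 16)                        # (batch_size, channels)
    gamma, beta = torch.ones(16), torch.zeros(16)  # trainable scale and shift
    running_mean, running_var = torch.zeros(16), torch.ones(16)

    # training: normalise with batch statistics, update the running ones
    mean, var = x.mean(dim=0), x.var(dim=0, unbiased=False)
    y = gamma * (x - mean) / torch.sqrt(var + eps) + beta
    running_mean = patience * running_mean + (1 - patience) * mean
    running_var = patience * running_var + (1 - patience) * var

    # inference: the running statistics are used instead of the batch ones
    y_eval = gamma * (x - running_mean) / torch.sqrt(running_var + eps) + beta
    print(y.shape, y_eval.shape)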

GroupNormalisation

class DLL.DeepLearning.Layers.Regularisation.GroupNorm(num_groups=32, **kwargs)[source]

Bases: BaseRegularisation

The group normalisation layer for neural networks. Computes the group norm of a batch along axis=1

Parameters:

num_groups (int, optional) – The number of groups used in the normalisation. Must be a positive integer. Defaults to 32. The number of channels must be evenly divisible by num_groups. If set to 1, the layer is identical to layer normalisation, and if set to the number of channels, it is identical to instance normalisation.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layer. Also calculates the gradients of the loss function with respect to the model parameters.

Parameters:

dCdy (torch.Tensor of the same shape as returned from the forward method) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (batch_size, channels, …)

forward(input, **kwargs)[source]

Normalises the input to have zero mean and unit variance across self.num_groups groups along the channel dimension with the following equation:

\[y = \gamma\frac{x - \mathbb{E}[x]}{\sqrt{\text{var}(x) + \epsilon}} + \beta,\]

where \(x\) is the input, \(\mathbb{E}[x]\) is the expected value or the mean across each group, \(\text{var}(x)\) is the variance across each group, \(\epsilon\) is a small constant and \(\gamma\) and \(\beta\) are trainable parameters.

Parameters:
  • input (torch.Tensor of shape (batch_size, channels, ...)) – The input to the layer. Must be a torch.Tensor of the specified shape given by layer.input_shape.

  • training (bool, optional) – The boolean flag deciding if the model is in training mode. Defaults to False.

Returns:

The output tensor after the normalisation with the same shape as the input.

Return type:

torch.Tensor
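
A minimal sketch of group normalisation over num_groups channel groups (shapes and the value of epsilon are assumptions for the example):

    import torch

    batch_size, channels, height, width, num_groups, eps = 8, 32, 4, 4, 8, 1e-5
    x = torch.randn(batch_size, channels, height, width)
    gamma = torch.ones(1, channels, 1, 1)   # trainable scale
    beta = torch.zeros(1, channels, 1, 1)   # trainable shift

    # group the channels, then normalise each (sample, group) slice independently
    g = x.reshape(batch_size, num_groups, channels // num_groups, height, width)
    mean = g.mean(dim=(2, 3, 4), keepdim=True)
    var = g.var(dim=(2, 3, 4), unbiased=False, keepdim=True)
    g = (g - mean) / torch.sqrt(var + eps)
    y = gamma * g.reshape(batch_size, channels, height, width) + beta
    print(y.shape)  # torch.Size([8, 32, 4, 4])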

InstanceNormalisation

class DLL.DeepLearning.Layers.Regularisation.InstanceNorm(**kwargs)[source]

Bases: GroupNorm

The instance normalisation layer for neural networks. Computes the group norm of a batch along axis=1 with the same number of groups as channels.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layer. Also calculates the gradients of the loss function with respect to the model parameters.

Parameters:

dCdy (torch.Tensor of the same shape as returned from the forward method) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (batch_size, channels, …)

forward(input, **kwargs)[source]

Normalises the input to have zero mean and unit variance across the channel dimension with the following equation:

\[y = \gamma\frac{x - \mathbb{E}[x]}{\sqrt{\text{var}(x) + \epsilon}} + \beta,\]

where \(x\) is the input, \(\mathbb{E}[x]\) is the expected value or the mean across the channel dimension, \(\text{var}(x)\) is the variance across the channel dimension, \(\epsilon\) is a small constant and \(\gamma\) and \(\beta\) are trainable parameters.

Parameters:

input (torch.Tensor of shape (batch_size, channels, ...)) – The input to the layer. Must be a torch.Tensor of the specified shape given by layer.input_shape.

Returns:

The output tensor after the normalisation with the same shape as the input.

Return type:

torch.Tensor

LayerNormalisation

class DLL.DeepLearning.Layers.Regularisation.LayerNorm(**kwargs)[source]

Bases: GroupNorm

The layer normalisation layer for neural networks. Computes the group norm of a batch along axis=1 with a single group.

backward(dCdy, **kwargs)[source]

Calculates the gradient of the loss function with respect to the input of the layer. Also calculates the gradients of the loss function with respect to the model parameters.

Parameters:

dCdy (torch.Tensor of the same shape as returned from the forward method) – The gradient given by the next layer.

Returns:

The new gradient after backpropagation through the layer.

Return type:

torch.Tensor of shape (batch_size, channels, …)

forward(input, **kwargs)[source]

Normalises the input to have zero mean and unit variance across the channel dimension with the following equation:

\[y = \gamma\frac{x - \mathbb{E}[x]}{\sqrt{\text{var}(x) + \epsilon}} + \beta,\]

where \(x\) is the input, \(\mathbb{E}[x]\) is the expected value or the mean across the channel dimension, \(\text{var}(x)\) is the variance across the channel dimension, \(\epsilon\) is a small constant and \(\gamma\) and \(\beta\) are trainable parameters.

Parameters:

input (torch.Tensor of shape (batch_size, channels, ...)) – The input to the layer. Must be a torch.Tensor of the specified shape given by layer.input_shape.

Returns:

The output tensor after the normalisation with the same shape as the input.

Return type:

torch.Tensor