Preprocessing

class DLL.Data.Preprocessing.CategoricalEncoder[source]

Bases: object

The categorical encoder. Maps class labels to the integer indices [0, …, n_classes - 1] and back.

decode(data)[source]

Decodes the data to the original classes. CategoricalEncoder.fit() must be called before decoding.

Parameters:

data (torch.Tensor of shape (n_samples,)) – the predicted labels of samples.

Returns:

The decoded predictions, transformed to the original classes.

Return type:

torch.Tensor of shape (n_samples,)

encode(data)[source]

Encodes the data to values [0, …, n_classes - 1]. CategoricalEncoder.fit() must be called before encoding.

Parameters:

data (torch.Tensor of shape (n_samples,)) – the true labels of samples.

Returns:

An encoded tensor.

Return type:

torch.Tensor of shape (n_samples,)

fit(data)[source]

Finds the classes in the data.

Parameters:

data (torch.Tensor of shape (n_samples,)) – the true labels of samples.

fit_encode(data)[source]

First fits the encoder and then encodes the data.

Parameters:

data (torch.Tensor of shape (n_samples,)) – the true labels of samples.

Returns:

An encoded tensor.

Return type:

torch.Tensor of shape (n_samples,)
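
As an illustration of the fit, encode and decode steps described above, the following is a minimal sketch in plain PyTorch. It is not the library's implementation; the helper names fit_classes, encode_labels and decode_labels are hypothetical, and it assumes that the classes are stored in sorted order.

```python
import torch


def fit_classes(labels: torch.Tensor) -> torch.Tensor:
    # torch.unique returns the distinct values in sorted order by default.
    return torch.unique(labels)


def encode_labels(labels: torch.Tensor, classes: torch.Tensor) -> torch.Tensor:
    # Because classes is sorted, searchsorted yields the index of each label.
    # Labels that were not seen during fitting would be mapped incorrectly.
    return torch.searchsorted(classes, labels)


def decode_labels(indices: torch.Tensor, classes: torch.Tensor) -> torch.Tensor:
    # Index back into the stored classes to recover the original labels.
    return classes[indices]


labels = torch.tensor([7, 3, 7, 10, 3])
classes = fit_classes(labels)              # the distinct labels 3, 7, 10
encoded = encode_labels(labels, classes)   # values in [0, n_classes - 1]
decoded = decode_labels(encoded, classes)  # same values as labels
```

Decoding is simply the inverse lookup, which is why fitting must happen before either encoding or decoding.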

class DLL.Data.Preprocessing.MinMaxScaler[source]

Bases: object

The min-max scaler. Scales the data to the range [0, 1] using the minimum and the maximum found during fitting.

fit(data)[source]

Finds the minimum and the maximum of the data.

Parameters:

data (torch.Tensor) – the input samples.

fit_transform(data)[source]

First fits the scaler and then transforms the data.

Parameters:

data (torch.Tensor) – the input samples.

Returns:

the transformed data.

Return type:

torch.Tensor

inverse_transform(data)[source]

Scales the data back to its original space.

Parameters:

data (torch.Tensor) – the input samples.

Returns:

the transformed data.

Return type:

torch.Tensor

transform(data)[source]

Normalises the data between 0 and 1.

Parameters:

data (torch.Tensor) – the input samples.

Returns:

the transformed data.

Return type:

torch.Tensor
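
The arithmetic behind min-max scaling can be sketched in plain PyTorch as below. This is an illustration, not the library's code: the function names are hypothetical, and computing the minimum and maximum per feature (along the sample dimension) is an assumption that the documentation above does not confirm.

```python
import torch


def minmax_fit(data: torch.Tensor):
    # Assumption: statistics are computed per feature, along dim 0.
    return data.min(dim=0).values, data.max(dim=0).values


def minmax_transform(data, data_min, data_max):
    # Maps data_min to 0 and data_max to 1. A constant feature
    # (data_max == data_min) would divide by zero here.
    return (data - data_min) / (data_max - data_min)


def minmax_inverse(scaled, data_min, data_max):
    # Undoes the scaling by reversing the two steps above.
    return scaled * (data_max - data_min) + data_min


data = torch.tensor([[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]])
data_min, data_max = minmax_fit(data)
scaled = minmax_transform(data, data_min, data_max)
restored = minmax_inverse(scaled, data_min, data_max)
```

In practice the statistics would be fitted on the training data only and then reused to transform validation and test data.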

class DLL.Data.Preprocessing.OneHotEncoder[source]

Bases: object

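The one-hot encoder. Encodes the classes of each feature as one-hot vectors and concatenates them along the last dimension.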

decode(data)[source]

Decodes one-hot encoded data to the original classes. OneHotEncoder.fit() must be called before decoding.

Parameters:

data (torch.Tensor of shape (n_samples, n_classes_1 + … + n_classes_n_features)) – the predictions of samples.

Returns:

The decoded predictions, transformed to the original classes.

Return type:

torch.Tensor of shape (n_samples,) or (n_samples, n_features)

encode(data)[source]

One-hot encodes the data. OneHotEncoder.fit() must be called before encoding.

Parameters:

data (torch.Tensor of shape (n_samples,) or (n_samples, n_features)) – the true labels of samples.

Returns:

A one-hot encoded tensor.

Return type:

torch.Tensor of shape (n_samples, n_classes_1 + … + n_classes_n_features)

fit(data)[source]

Finds the classes in the data.

Parameters:

data (torch.Tensor of shape (n_samples,) or (n_samples, n_features)) – the true labels of samples.

fit_encode(data)[source]

First fits the encoder and then one-hot encodes the data.

Parameters:

data (torch.Tensor of shape (n_samples,) or (n_samples, n_features)) – the true labels of samples.

Returns:

A one-hot encoded tensor.

Return type:

torch.Tensor of shape (n_samples, n_classes_1 + … + n_classes_n_features)
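
To make the shape (n_samples, n_classes_1 + … + n_classes_n_features) concrete, here is a minimal sketch in plain PyTorch that one-hot encodes each feature separately and concatenates the blocks. It is not the library's implementation; one_hot_encode, one_hot_decode and classes_per_feature are hypothetical names, and decoding by argmax within each block is an assumption.

```python
import torch
import torch.nn.functional as F


def one_hot_encode(data, classes_per_feature):
    blocks = []
    for j, classes in enumerate(classes_per_feature):
        # Position of each label within the sorted classes of feature j.
        idx = torch.searchsorted(classes, data[:, j].contiguous())
        blocks.append(F.one_hot(idx, num_classes=len(classes)))
    # One block of n_classes_j columns per feature, side by side.
    return torch.cat(blocks, dim=1)


def one_hot_decode(encoded, classes_per_feature):
    columns, start = [], 0
    for classes in classes_per_feature:
        block = encoded[:, start:start + len(classes)]
        # The largest entry in each block selects the class.
        columns.append(classes[block.argmax(dim=1)])
        start += len(classes)
    return torch.stack(columns, dim=1)


data = torch.tensor([[0, 5], [2, 5], [1, 9]])
# "Fitting": the sorted distinct values of each feature column.
classes_per_feature = [torch.unique(data[:, j]) for j in range(data.shape[1])]
encoded = one_hot_encode(data, classes_per_feature)   # shape (3, 3 + 2)
decoded = one_hot_decode(encoded, classes_per_feature)
```

Using argmax for decoding means the same sketch would also accept model outputs such as per-class probabilities, not only exact one-hot vectors.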

class DLL.Data.Preprocessing.PolynomialFeatures(degree=2, include_bias=True)[source]

Bases: object

Polynomial features. Generates products of the input features up to the given degree.

Parameters:
  • degree (int, optional) – The degree of the polynomial. Must be a positive integer. Defaults to 2.

  • include_bias (bool) – If true, a column of ones is included. Must be a boolean. Defaults to True.

transform(data)[source]

Creates a matrix containing every polynomial combination of the given features up to the specified degree.

Parameters:

data (torch.Tensor of shape (n_samples, n_features)) – the input samples.

Returns:

A tensor of the new features.

Return type:

torch.Tensor of shape (n_samples, sum([nCr(n_features + deg - 1, deg) for deg in range(1, degree + 1)]) + 1)
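
The column count in the return type can be checked with a small sketch in plain PyTorch. This is an illustration under assumptions, not the library's code: the function name polynomial_features_sketch is hypothetical, and the column order (bias first, then increasing degree) is a choice made here that the library may not share.

```python
import math
from itertools import combinations_with_replacement

import torch


def polynomial_features_sketch(data, degree=2, include_bias=True):
    n_samples, n_features = data.shape
    columns = []
    if include_bias:
        columns.append(torch.ones(n_samples, dtype=data.dtype))
    for deg in range(1, degree + 1):
        # Each multiset of deg feature indices gives one monomial,
        # e.g. (0, 0) -> x0 * x0 and (0, 1) -> x0 * x1.
        for combo in combinations_with_replacement(range(n_features), deg):
            columns.append(data[:, list(combo)].prod(dim=1))
    return torch.stack(columns, dim=1)


data = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
features = polynomial_features_sketch(data, degree=2)

# Same count as the formula in the return type, with n_features = 2, degree = 2.
n_features, degree = 2, 2
expected_columns = sum(
    math.comb(n_features + deg - 1, deg) for deg in range(1, degree + 1)
) + 1
```

For two features and degree 2 the columns are 1, x0, x1, x0², x0·x1 and x1², so the formula gives 2 + 3 + 1 = 6 columns.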

class DLL.Data.Preprocessing.StandardScaler[source]

Bases: object

The standard scaler. Standardises the data to zero mean and unit variance using the statistics found during fitting.

fit(data)[source]

Finds the mean and the variance of the data.

Parameters:

data (torch.Tensor) – the input samples.

fit_transform(data)[source]

First fits the scaler and then transforms the data.

Parameters:

data (torch.Tensor) – the input samples.

Returns:

the transformed data.

Return type:

torch.Tensor

inverse_transform(data)[source]

Scales the data back to its original space.

Parameters:

data (torch.Tensor) – the input samples.

Returns:

the transformed data.

Return type:

torch.Tensor

transform(data)[source]

Transforms the data to have zero mean and unit variance.

Parameters:

data (torch.Tensor) – the input samples.

Returns:

the transformed data.

Return type:

torch.Tensor
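
Standardisation can be sketched in plain PyTorch as follows. This is not the library's implementation: the function names are hypothetical, and both the per-feature statistics and the use of the population variance (unbiased=False) are assumptions that the documentation above does not specify.

```python
import torch


def standard_fit(data: torch.Tensor):
    # Assumption: per-feature mean and population variance along dim 0.
    return data.mean(dim=0), data.var(dim=0, unbiased=False)


def standard_transform(data, mean, var):
    # Subtract the mean and divide by the standard deviation.
    # A constant feature (var == 0) would divide by zero here.
    return (data - mean) / torch.sqrt(var)


def standard_inverse(scaled, mean, var):
    # Reverse the two steps to return to the original space.
    return scaled * torch.sqrt(var) + mean


data = torch.tensor([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
mean, var = standard_fit(data)
scaled = standard_transform(data, mean, var)
restored = standard_inverse(scaled, mean, var)
```

After the transform each column of scaled has mean 0 and population variance 1, and the inverse recovers the input up to floating-point error.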

DLL.Data.Preprocessing.data_split(X, Y, train_split=0.8, validation_split=0.2)[source]

Splits the data into train, validation and test sets.

Parameters:
  • X (torch.Tensor of shape (n_samples, ...)) – The input values.

  • Y (torch.Tensor of shape (n_samples, ...)) – The target values.

  • train_split (float, optional) – The fraction of the whole data used as train data. Must be a real number in the range (0, 1]. Defaults to 0.8.

  • validation_split (float, optional) – The fraction of the whole data used as validation data. Must be a real number in the range [0, 1). Defaults to 0.2.

Returns:

The original data, shuffled and split into train, validation and test sets according to train_split and validation_split.

Return type:

x_train, y_train, x_val, y_val, x_test, y_test (tuple[torch.Tensor])

Note

The sum of train_split and validation_split must be less than or equal to 1. The remaining samples are returned as the test data.
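
The splitting logic described in the note can be sketched in plain PyTorch as below. This is an illustration, not the library's code: the name data_split_sketch is hypothetical, and truncating the set sizes with int() is an assumption about rounding that the documentation does not state.

```python
import torch


def data_split_sketch(X, Y, train_split=0.8, validation_split=0.2):
    assert train_split + validation_split <= 1
    n_samples = len(X)
    # One shared permutation keeps each input paired with its target.
    permutation = torch.randperm(n_samples)
    n_train = int(train_split * n_samples)
    n_val = int(validation_split * n_samples)
    train_idx = permutation[:n_train]
    val_idx = permutation[n_train:n_train + n_val]
    # Whatever is left over becomes the test set.
    test_idx = permutation[n_train + n_val:]
    return (X[train_idx], Y[train_idx],
            X[val_idx], Y[val_idx],
            X[test_idx], Y[test_idx])


X = torch.arange(8, dtype=torch.float32).reshape(8, 1)
Y = torch.arange(8)
x_train, y_train, x_val, y_val, x_test, y_test = data_split_sketch(
    X, Y, train_split=0.5, validation_split=0.25
)
```

With 8 samples, train_split=0.5 and validation_split=0.25, the three sets contain 4, 2 and 2 samples. With the default arguments the two fractions sum to 1, so the test set would be empty.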