Preprocessing
- class DLL.Data.Preprocessing.CategoricalEncoder[source]
Bases:
object
The categorical encoder.
- decode(data)[source]
Decodes the data to the original classes. CategoricalEncoder.fit() must be called before decoding.
- Parameters:
data (torch.Tensor of shape (n_samples,)) – the predicted labels of samples.
- Returns:
The decoded predictions, transformed to the original classes.
- Return type:
torch.Tensor of shape (n_samples,)
- encode(data)[source]
Encodes the data to values [0, …, n_classes - 1]. CategoricalEncoder.fit() must be called before encoding.
- Parameters:
data (torch.Tensor of shape (n_samples,)) – the true labels of samples.
- Returns:
An encoded tensor.
- Return type:
torch.Tensor of shape (n_samples,)
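The fit, encode and decode round trip described above can be sketched in plain Python. This is an illustration of the idea only, not DLL's implementation: the helper names (fit_classes, encode, decode) are hypothetical, lists stand in for tensors, and assigning codes in sorted class order is an assumption.

```python
# Illustrative sketch only; does not call or reproduce DLL.
# Assumption: classes are numbered in sorted order.

def fit_classes(labels):
    """Collect the distinct labels; each label's position is its code."""
    return sorted(set(labels))

def encode(labels, classes):
    """Map every label to its index in [0, ..., n_classes - 1]."""
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels]

def decode(codes, classes):
    """Map every code back to the original label."""
    return [classes[code] for code in codes]

labels = [7, 3, 7, 10, 3]
classes = fit_classes(labels)       # [3, 7, 10]
codes = encode(labels, classes)     # [1, 0, 1, 2, 0]
restored = decode(codes, classes)   # [7, 3, 7, 10, 3]
```

Decoding the encoded labels returns the original labels, which is why the classes must be found before either step.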
- class DLL.Data.Preprocessing.MinMaxScaler[source]
Bases:
object
The min-max scaler.
- fit(data)[source]
Finds the minimum and the maximum of the data.
- Parameters:
data (torch.Tensor) – the input samples.
- fit_transform(data)[source]
First fits the scaler and then transforms the data.
- Parameters:
data (torch.Tensor) – the input samples.
- Returns:
The transformed data.
- Return type:
torch.Tensor
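As a rough illustration of what fitting and transforming involve, the sketch below applies min-max scaling to a plain Python list. It assumes the conventional target range [0, 1], which the text above does not state, and the helper names are hypothetical rather than part of DLL.

```python
# Illustrative sketch only; does not call or reproduce DLL.
# Assumption: values are mapped linearly onto [0, 1].

def fit_min_max(values):
    """Find the minimum and the maximum of the data."""
    return min(values), max(values)

def transform(values, lo, hi):
    """Rescale each value; assumes hi > lo (a constant column needs special care)."""
    return [(v - lo) / (hi - lo) for v in values]

values = [2.0, 4.0, 6.0, 10.0]
lo, hi = fit_min_max(values)
scaled = transform(values, lo, hi)  # [0.0, 0.25, 0.5, 1.0]
```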
- class DLL.Data.Preprocessing.OneHotEncoder[source]
Bases:
object
The one-hot encoder.
- decode(data)[source]
Decodes the one-hot encoded data to the original classes. OneHotEncoder.fit() must be called before decoding.
- Parameters:
data (torch.Tensor of shape (n_samples, n_classes_1 + ... + n_classes_n_features)) – the predictions of samples.
- Returns:
The decoded predictions, transformed to the original classes.
- Return type:
torch.Tensor of shape (n_samples,) or (n_samples, n_features)
- encode(data)[source]
One-hot encodes the data. OneHotEncoder.fit() must be called before encoding.
- Parameters:
data (torch.Tensor of shape (n_samples,) or (n_samples, n_features)) – the true labels of samples.
- Returns:
A one-hot encoded tensor.
- Return type:
torch.Tensor of shape (n_samples, n_classes_1 + … + n_classes_n_features)
- fit(data)[source]
Finds the classes in the data.
- Parameters:
data (torch.Tensor of shape (n_samples,) or (n_samples, n_features)) – the true labels of samples.
- fit_encode(data)[source]
First fits the encoder and then one-hot encodes the data.
- Parameters:
data (torch.Tensor of shape (n_samples,) or (n_samples, n_features)) – the true labels of samples.
- Returns:
A one-hot encoded tensor.
- Return type:
torch.Tensor of shape (n_samples, n_classes_1 + … + n_classes_n_features)
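For a single label column, the one-hot round trip can be sketched as follows. This is a plain-Python illustration, not DLL's implementation: the helper names are hypothetical, sorted class order is an assumption, and decoding by taking the largest entry of each row is an assumption about how predictions are mapped back.

```python
# Illustrative sketch only; does not call or reproduce DLL.
# Assumptions: one label column, classes in sorted order, argmax decoding.

def fit_classes(labels):
    """Find the distinct classes in the data."""
    return sorted(set(labels))

def one_hot(labels, classes):
    """Build one row per sample with a single 1 in the class's column."""
    index = {label: i for i, label in enumerate(classes)}
    rows = []
    for label in labels:
        row = [0] * len(classes)
        row[index[label]] = 1
        rows.append(row)
    return rows

def decode(rows, classes):
    """Return the class whose column holds the largest value in each row."""
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in rows]

labels = [2, 0, 5, 2]
classes = fit_classes(labels)            # [0, 2, 5]
encoded = one_hot(labels, classes)
restored = decode(encoded, classes)      # [2, 0, 5, 2]
soft = decode([[0.1, 0.7, 0.2]], classes)  # [2]
```

With several label columns, the per-column blocks would sit side by side, which gives the width n_classes_1 + … + n_classes_n_features quoted above.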
- class DLL.Data.Preprocessing.PolynomialFeatures(degree=2, include_bias=True)[source]
Bases:
object
Polynomial features.
- Parameters:
degree (int, optional) – The degree of the polynomial. Must be a positive integer. Defaults to 2.
include_bias (bool) – If true, a column of ones is included. Must be a boolean. Defaults to True.
- transform(data)[source]
Creates a matrix containing every possible product of the given features up to the specified degree.
- Parameters:
data (torch.Tensor of shape (n_samples, n_features)) – the input samples.
- Returns:
A tensor of the new features.
- Return type:
torch.Tensor of shape (n_samples, sum([nCr(n_features + deg - 1, deg) for deg in range(1, degree + 1)]) + 1)
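The column count in the return type can be checked with the standard library. The sketch below evaluates the formula and, separately, enumerates the feature-index combinations it counts. It assumes the trailing + 1 corresponds to the bias column, and the function names are hypothetical rather than part of DLL.

```python
from itertools import combinations_with_replacement
from math import comb

# Illustrative sketch only; does not call or reproduce DLL.
# Assumption: the "+ 1" in the documented shape is the bias column of ones.

def n_output_features(n_features, degree=2, include_bias=True):
    """Evaluate the documented column-count formula."""
    count = sum(comb(n_features + deg - 1, deg) for deg in range(1, degree + 1))
    return count + 1 if include_bias else count

def monomial_indices(n_features, degree=2):
    """List the feature-index tuples whose products form the new columns."""
    return [
        combo
        for deg in range(1, degree + 1)
        for combo in combinations_with_replacement(range(n_features), deg)
    ]

n_columns = n_output_features(2, degree=2)  # 6: 1, x1, x2, x1^2, x1*x2, x2^2
terms = monomial_indices(2, degree=2)       # [(0,), (1,), (0, 0), (0, 1), (1, 1)]
```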
- class DLL.Data.Preprocessing.StandardScaler[source]
Bases:
object
The standard scaler.
- fit(data)[source]
Finds the mean and the variance of the data.
- Parameters:
data (torch.Tensor) – the input samples.
- fit_transform(data)[source]
First fits the scaler and then transforms the data.
- Parameters:
data (torch.Tensor) – the input samples.
- Returns:
The transformed data.
- Return type:
torch.Tensor
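Standardization can be sketched on a plain Python list as subtracting the mean and dividing by the standard deviation. Whether the library uses the population or the sample variance is not stated above, so the population variance used here is an assumption, and the helper names are hypothetical.

```python
from math import sqrt

# Illustrative sketch only; does not call or reproduce DLL.
# Assumption: population variance (divide by n, not n - 1).

def fit_mean_std(values):
    """Find the mean and the standard deviation of the data."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, sqrt(variance)

def transform(values, mean, std):
    """Center and rescale each value; assumes std > 0."""
    return [(v - mean) / std for v in values]

values = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, std = fit_mean_std(values)
scaled = transform(values, mean, std)  # zero mean, unit population variance
```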
- DLL.Data.Preprocessing.data_split(X, Y, train_split=0.8, validation_split=0.2)[source]
Splits the data into train, validation and test sets.
- Parameters:
X (torch.Tensor of shape (n_samples, ...)) – The input values.
Y (torch.Tensor of shape (n_samples, ...)) – The target values.
train_split (float, optional) – The fraction of the whole data used as train data. Must be a real number in range (0, 1]. Defaults to 0.8.
validation_split (float, optional) – The fraction of the whole data used as validation data. Must be a real number in range [0, 1). Defaults to 0.2.
- Returns:
The original data shuffled and split according to train and validation splits.
- Return type:
x_train, y_train, x_val, y_val, x_test, y_test (tuple[torch.Tensor])
Note
The sum of train_split and validation_split must be less than or equal to 1. The remaining samples are returned as the test data.
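The relationship between the two fractions and the three resulting sets can be sketched with index lists. This is a plain-Python illustration, not DLL's implementation: the function name, the seed parameter and rounding the set sizes down are all assumptions.

```python
import random

# Illustrative sketch only; does not call or reproduce DLL.
# Assumptions: sizes are rounded down, and a seed makes the shuffle repeatable.

def split_indices(n_samples, train_split=0.8, validation_split=0.2, seed=0):
    """Shuffle sample indices and cut them into train, validation and test parts."""
    if not (0 < train_split <= 1) or not (0 <= validation_split < 1):
        raise ValueError("train_split must be in (0, 1] and validation_split in [0, 1)")
    if train_split + validation_split > 1 + 1e-9:
        raise ValueError("train_split + validation_split must not exceed 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(train_split * n_samples)
    n_val = int(validation_split * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains becomes the test set
    return train, val, test

train, val, test = split_indices(10, train_split=0.6, validation_split=0.2)
sizes = (len(train), len(val), len(test))  # (6, 2, 2)
```

With the default fractions of 0.8 and 0.2 nothing remains, so the test part of this sketch is empty.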