Linear models

LinearRegression

class DLL.MachineLearning.SupervisedLearning.LinearModels.LinearRegression

Bases: object

Implements the basic linear regression model.

n_features

The number of features. Available after fitting.

Type:

int

beta

The weights of the linear regression model. Available after fitting.

Type:

torch.Tensor of shape (n_features + 1,)

residuals

The residuals of the fitted model. For a good fit, the residuals should be normally distributed with zero mean and constant variance. Available after fitting.

Type:

torch.Tensor of shape (n_samples,)

fit(X, y, include_bias=True, method='ols', sample_weight=None)

Fits the LinearRegression model to the input data by minimizing the squared error.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • y (torch.Tensor of shape (n_samples,)) – The target values corresponding to each sample.

  • include_bias (bool, optional) – Whether a bias term is included in the model. Defaults to True.

  • method (str, optional) – Whether the model is fitted with ordinary least squares or total least squares. Must be one of “ols” or “tls”. Defaults to “ols”.

  • sample_weight (torch.Tensor of shape (n_samples,) or None, optional) – A weight given to each sample in the regression. If None, this parameter is ignored. Also ignored if method == “tls”. Defaults to None.

Returns:

None

Raises:
  • TypeError – If the input matrix or the target matrix is not a PyTorch tensor.

  • ValueError – If the input matrix or the target matrix is not the correct shape.
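
The difference between the two method options can be illustrated for a single feature, where both have closed forms. The following is a minimal, dependency-free sketch and not the library's implementation. The helper fit_line and the toy data are hypothetical, and the sketch assumes that “ols” minimizes squared vertical distances while “tls” minimizes squared orthogonal distances, which is the usual meaning of the two terms.

```python
import math


def fit_line(x, y, method="ols"):
    """Fit y ~ intercept + slope * x for a single feature (illustrative helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if method == "ols":
        # Minimizes the sum of squared vertical distances to the line.
        slope = s_xy / s_xx
    elif method == "tls":
        # Minimizes the sum of squared orthogonal distances (requires s_xy != 0).
        slope = (s_yy - s_xx + math.sqrt((s_yy - s_xx) ** 2 + 4 * s_xy ** 2)) / (2 * s_xy)
    else:
        raise ValueError('method must be "ols" or "tls"')
    intercept = mean_y - slope * mean_x
    return intercept, slope


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
ols_intercept, ols_slope = fit_line(x, y, "ols")
tls_intercept, tls_slope = fit_line(x, y, "tls")

# Residuals of the OLS fit, analogous to the residuals attribute described above.
residuals = [yi - (ols_intercept + ols_slope * xi) for xi, yi in zip(x, y)]
```

For this data the two slopes are close but not identical, and the OLS residuals sum to zero up to rounding because a bias term is included.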

predict(X)

Applies the fitted LinearRegression model to the input data, predicting the target values.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be regressed.

Returns:

The predicted values corresponding to each sample.

Return type:

target values (torch.Tensor of shape (n_samples,))

Raises:
  • NotFittedError – If the LinearRegression model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

LogisticRegression

class DLL.MachineLearning.SupervisedLearning.LinearModels.LogisticRegression(learning_rate=0.001)

Bases: object

Implements a logistic regression model for binary and multi-class classification.

Parameters:

learning_rate (float, optional) – The step size towards the negative gradient. Must be a positive real number. Defaults to 0.001.

n_features

The number of features. Available after fitting.

Type:

int

weights

The weights of the logistic regression model. Available after fitting.

Type:

torch.Tensor of shape (n_features,)

bias

The constant of the model. Available after fitting.

Type:

torch.Tensor of shape (1,)

fit(X, y, sample_weight=None, val_data=None, epochs=100, optimiser=None, callback_frequency=1, metrics=['loss'], batch_size=None, shuffle_every_epoch=True, shuffle_data=True, verbose=False)

Fits the LogisticRegression model to the input data by minimizing the cross entropy loss (logistic loss).

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • y (torch.Tensor of shape (n_samples,)) – The labels corresponding to each sample. Every element must be in [0, …, n_classes - 1].

  • sample_weight (torch.Tensor of shape (n_samples,) or None, optional) – A weight given to each sample. If None, this parameter is ignored. Defaults to None.

  • val_data (tuple[X_val, y_val] | None, optional) – Optional validation samples. If None, no validation data is used. Defaults to None.

  • epochs (int, optional) – The number of training iterations. Must be a positive integer. Defaults to 100.

  • optimiser (Optimisers | None, optional) – The optimiser used for training the model. If None, the Adam optimiser is used. Defaults to None.

  • callback_frequency (int, optional) – The number of iterations between printing info from training. Must be a positive integer. Defaults to 1, which means that info is printed every iteration if verbose=True.

  • metrics (list[str], optional) – The metrics that will be tracked during training. Defaults to [“loss”].

  • batch_size (int | None, optional) – The batch size used in training. Must be a positive integer. If None, every sample is used for every gradient calculation. Defaults to None.

  • shuffle_every_epoch (bool, optional) – If True, shuffles the order of the samples every epoch. Defaults to True.

  • shuffle_data (bool, optional) – If True, shuffles the data before training. Defaults to True.

  • verbose (bool, optional) – If True, prints info of the chosen metrics during training. Defaults to False.

Returns:

A dictionary tracking the evolution of selected metrics at intervals defined by callback_frequency.

Return type:

history (dict[str, torch.Tensor], each tensor is floor(epochs / callback_frequency) long.)

Raises:
  • TypeError – If the input matrix or the target matrix is not a PyTorch tensor or if other parameters are of wrong type.

  • ValueError – If the input matrix or the target matrix is not the correct shape or if other parameters have incorrect values.
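
As an illustration of what minimizing the cross entropy loss involves, the sketch below trains a binary logistic regression with plain full-batch gradient descent. It is a simplified, dependency-free example under stated assumptions, not the library's code: fit accepts an optimiser (Adam when None), whereas this sketch uses a fixed step size, and the names fit_logistic and sigmoid, the learning rate and the toy data are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, learning_rate=0.5, epochs=1000):
    """Binary logistic regression with full-batch gradient descent (illustrative)."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(X, y):
            p = sigmoid(sum(w * v for w, v in zip(weights, row)) + bias)
            error = p - label  # derivative of the cross entropy w.r.t. the logit
            for j, v in enumerate(row):
                grad_w[j] += error * v / n_samples
            grad_b += error / n_samples
        # Step towards the negative gradient.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
weights, bias = fit_logistic(X, y)
probabilities = [sigmoid(weights[0] * row[0] + bias) for row in X]
labels = [int(p >= 0.5) for p in probabilities]
```

On this linearly separable toy set the learned decision boundary falls between the samples at 1.0 and 2.0, so the thresholded labels match y.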

predict(X)

Applies the fitted LogisticRegression model to the input data, predicting the labels.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.

Returns:

The predicted labels corresponding to each sample.

Return type:

target values (torch.Tensor of shape (n_samples,))

Raises:
  • NotFittedError – If the LogisticRegression model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

predict_proba(X)

Applies the fitted LogisticRegression model to the input data, predicting the probabilities of each class.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.

Returns:

The predicted probabilities corresponding to each sample.

Return type:

target values (torch.Tensor of shape (n_samples,))

Raises:
  • NotFittedError – If the LogisticRegression model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.
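
The relationship between class probabilities and predicted labels can be sketched as follows. This is a generic illustration, assuming the common convention of a sigmoid for the binary case and a softmax for the multi-class case; the documentation above does not state how the library computes its probabilities, and the function names and logit values are hypothetical.

```python
import math


def sigmoid(logit):
    """Probability of the positive class in the binary case."""
    return 1.0 / (1.0 + math.exp(-logit))


def softmax(logits):
    """Probabilities over all classes in the multi-class case."""
    largest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


binary_probability = sigmoid(0.8)
class_probabilities = softmax([2.0, 0.5, -1.0])

# A label prediction is then the index of the most probable class.
predicted_label = class_probabilities.index(max(class_probabilities))
```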

LASSORegression

class DLL.MachineLearning.SupervisedLearning.LinearModels.LASSORegression(alpha=1.0)

Bases: object

Implements a linear regression model with L1 regularization.

Parameters:

alpha (int | float, optional) – The regularization parameter. A larger alpha penalizes the L1 norm of the weights more strongly. Must be a positive real number. Defaults to 1.0.

n_features

The number of features. Available after fitting.

Type:

int

weights

The weights of the linear regression model. Available after fitting.

Type:

torch.Tensor of shape (n_features + 1,)

residuals

The residuals of the fitted model. For a good fit, the residuals should be normally distributed with zero mean and constant variance. Available after fitting.

Type:

torch.Tensor of shape (n_samples,)

fit(X, y, sample_weight=None, val_data=None, epochs=100, callback_frequency=1, metrics=['loss'], verbose=False)

Fits the LASSORegression model to the input data by minimizing the mean squared error loss function using cyclic coordinate-wise descent from this paper.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • y (torch.Tensor of shape (n_samples,)) – The target values corresponding to each sample.

  • sample_weight (torch.Tensor of shape (n_samples,) or None, optional) – A weight given to each sample. If None, this parameter is ignored. Defaults to None.

  • val_data (tuple[X_val, y_val] | None, optional) – Optional validation samples. If None, no validation data is used. Defaults to None.

  • epochs (int, optional) – The number of training iterations. Must be a positive integer. Defaults to 100.

  • callback_frequency (int, optional) – The number of iterations between printing info from training. Must be a positive integer. Defaults to 1, which means that info is printed every iteration if verbose=True.

  • metrics (list[str], optional) – The metrics that will be tracked during training. Defaults to [“loss”].

  • verbose (bool, optional) – If True, prints info of the chosen metrics during training. Defaults to False.

Returns:

A dictionary tracking the evolution of selected metrics at intervals defined by callback_frequency.

Return type:

history (dict[str, torch.Tensor], each tensor is floor(epochs / callback_frequency) long.)

Raises:
  • TypeError – If the input matrix or the target matrix is not a PyTorch tensor or if other parameters are of wrong type.

  • ValueError – If the input matrix or the target matrix is not the correct shape or if other parameters have incorrect values.
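
Cyclic coordinate-wise descent updates one weight at a time while holding the others fixed, and the L1 penalty turns each update into a soft-thresholding step. The sketch below is a minimal, dependency-free illustration under stated assumptions, not the library's implementation: it assumes the objective (1 / (2 n_samples)) * ||y - Xw||^2 + alpha * ||w||_1, omits the bias and sample weights, and uses hypothetical helper names and toy data.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha=1.0, epochs=100):
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    for _ in range(epochs):
        for j in range(n_features):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j (must be non-zero)
            for row, target in zip(X, y):
                # Prediction that leaves feature j out.
                partial = sum(w * v for k, (w, v) in enumerate(zip(weights, row)) if k != j)
                rho += row[j] * (target - partial)
                norm += row[j] ** 2
            weights[j] = soft_threshold(rho / n_samples, alpha) / (norm / n_samples)
    return weights


X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
y = [2.0, -2.0, 0.1, -0.1]
weights = lasso_coordinate_descent(X, y, alpha=0.1)
```

With these orthogonal toy features the unregularized solution would be [2.0, 0.1]. The sketch returns approximately [1.8, 0.0]: the first weight is shrunk and the second is set exactly to zero, which is the sparsity effect of the L1 penalty.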

predict(X)

Applies the fitted LASSORegression model to the input data, predicting the target values.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be regressed.

Returns:

The predicted values corresponding to each sample.

Return type:

target values (torch.Tensor of shape (n_samples,))

Raises:
  • NotFittedError – If the LASSORegression model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

RidgeRegression

class DLL.MachineLearning.SupervisedLearning.LinearModels.RidgeRegression(alpha=1.0)

Bases: object

Implements a linear regression model with L2 regularization.

Parameters:

alpha (int | float, optional) – The regularization parameter. A larger alpha penalizes the L2 norm of the weights more strongly. Must be a positive real number. Defaults to 1.0.

n_features

The number of features. Available after fitting.

Type:

int

beta

The weights of the linear regression model. Available after fitting.

Type:

torch.Tensor of shape (n_features + 1,)

residuals

The residuals of the fitted model. For a good fit, the residuals should be normally distributed with zero mean and constant variance. Available after fitting.

Type:

torch.Tensor of shape (n_samples,)

fit(X, y, sample_weight=None)

Fits the RidgeRegression model to the input data by minimizing the squared error.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • y (torch.Tensor of shape (n_samples,)) – The target values corresponding to each sample.

  • sample_weight (torch.Tensor of shape (n_samples,) or None, optional) – A weight given to each sample in the regression. If None, this parameter is ignored. Defaults to None.

Returns:

None

Raises:
  • TypeError – If the input matrix or the target matrix is not a PyTorch tensor.

  • ValueError – If the input matrix or the target matrix is not the correct shape.
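
The effect of alpha is easiest to see for a single feature, where the ridge solution has a simple closed form. The sketch below is an illustration only, not the library's implementation. It assumes that the bias is left unpenalized, which the documentation above does not specify, and the helper ridge_line and the data are hypothetical. In the general case the penalized weights solve (XᵀX + alpha I) w = Xᵀy.

```python
def ridge_line(x, y, alpha=1.0):
    """Single-feature ridge regression with an unpenalized intercept (illustrative)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # The penalty adds alpha to the denominator, shrinking the slope towards zero.
    slope = s_xy / (s_xx + alpha)
    intercept = mean_y - slope * mean_x
    return intercept, slope


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
slopes = {alpha: ridge_line(x, y, alpha)[1] for alpha in (0.0, 1.0, 10.0)}
```

For this data the slope is 2.0 without regularization and shrinks to 1.0 at alpha = 10, since the centered sum of squares of x is also 10.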

predict(X)

Applies the fitted RidgeRegression model to the input data, predicting the target values.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be regressed.

Returns:

The predicted values corresponding to each sample.

Return type:

target values (torch.Tensor of shape (n_samples,))

Raises:
  • NotFittedError – If the RidgeRegression model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

ElasticNetRegression

class DLL.MachineLearning.SupervisedLearning.LinearModels.ElasticNetRegression(alpha=1.0, l1_ratio=0.5)

Bases: object

Implements a linear regression model with L1 and L2 regularization.

Parameters:
  • alpha (int | float, optional) – The regularization parameter. A larger alpha penalizes the L1 and L2 norms of the weights more strongly. Must be a non-negative real number. Defaults to 1.0.

  • l1_ratio (int | float, optional) – The proportion of L1 regularization relative to L2 regularization. Must be in the range [0, 1]. Defaults to 0.5.

  • loss (Losses, optional) – A loss function used for training the model. Defaults to the mean squared error.

n_features

The number of features. Available after fitting.

Type:

int

weights

The weights of the linear regression model. The first element is the bias of the model. Available after fitting.

Type:

torch.Tensor of shape (n_features + 1,)

residuals

The residuals of the fitted model. For a good fit, the residuals should be normally distributed with zero mean and constant variance. Available after fitting.

Type:

torch.Tensor of shape (n_samples,)

fit(X, y, sample_weight=None, val_data=None, epochs=100, callback_frequency=1, metrics=['loss'], verbose=False)

Fits the ElasticNetRegression model to the input data by minimizing the mean squared error loss function using cyclic coordinate-wise descent from this paper.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • y (torch.Tensor of shape (n_samples,)) – The target values corresponding to each sample.

  • sample_weight (torch.Tensor of shape (n_samples,) or None, optional) – A weight given to each sample. If None, this parameter is ignored. Defaults to None.

  • val_data (tuple[X_val, y_val] | None, optional) – Optional validation samples. If None, no validation data is used. Defaults to None.

  • epochs (int, optional) – The number of training iterations. Must be a positive integer. Defaults to 100.

  • callback_frequency (int, optional) – The number of iterations between printing info from training. Must be a positive integer. Defaults to 1, which means that info is printed every iteration if verbose=True.

  • metrics (list[str], optional) – The metrics that will be tracked during training. Defaults to [“loss”].

  • verbose (bool, optional) – If True, prints info of the chosen metrics during training. Defaults to False.

Returns:

A dictionary tracking the evolution of selected metrics at intervals defined by callback_frequency.

Return type:

history (dict[str, torch.Tensor], each tensor is floor(epochs / callback_frequency) long.)

Raises:
  • TypeError – If the input matrix or the target matrix is not a PyTorch tensor or if other parameters are of wrong type.

  • ValueError – If the input matrix or the target matrix is not the correct shape or if other parameters have incorrect values.
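
With both penalties present, each coordinate update combines a soft-thresholding step from the L1 term with extra shrinkage from the L2 term. The sketch below shows a single such update. It is an illustration under stated assumptions, not the library's code: it assumes the penalty alpha * (l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2) added to a mean squared error scaled by 1 / 2, and the function names and input values are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_update(rho, norm, alpha=1.0, l1_ratio=0.5):
    """One coordinate update.

    rho is the mean product of the feature with the partial residual and
    norm is the mean squared value of the feature.
    """
    l1_strength = alpha * l1_ratio
    l2_strength = alpha * (1.0 - l1_ratio)
    return soft_threshold(rho, l1_strength) / (norm + l2_strength)


mixed = elastic_net_update(rho=1.0, norm=0.5, alpha=0.2, l1_ratio=0.5)
lasso_like = elastic_net_update(rho=1.0, norm=0.5, alpha=0.1, l1_ratio=1.0)
ridge_like = elastic_net_update(rho=1.0, norm=0.5, alpha=0.2, l1_ratio=0.0)
```

Setting l1_ratio to 1 recovers the pure L1 update and setting it to 0 leaves only the L2 shrinkage, which matches the role of l1_ratio described in the class parameters.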

predict(X)

Applies the fitted ElasticNetRegression model to the input data, predicting the target values.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be regressed.

Returns:

The predicted values corresponding to each sample.

Return type:

target values (torch.Tensor of shape (n_samples,))

Raises:
  • NotFittedError – If the ElasticNetRegression model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

Random sample consensus

class DLL.MachineLearning.SupervisedLearning.LinearModels.RANSACRegression(estimator=<DLL.MachineLearning.SupervisedLearning.LinearModels._LinearRegression.LinearRegression object>)

Bases: object

Implements the random sample consensus (RANSAC) regression model.

Parameters:

estimator (A regression model with fit and predict methods) – A base model which is fit to random samples of the data. Defaults to LinearRegression.

best_estimator

The best model. Available after fitting.

Type:

estimator

fit(X, y, min_samples=None, residual_threshold=None, max_trials=100, stop_inliers_prob=1, **kwargs)

Draws random subsamples of the data points and fits the base estimator to each subsample.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • y (torch.Tensor of shape (n_samples,)) – The target values corresponding to each sample.

  • min_samples (int | float | None, optional) – The number of samples used to fit the base estimators. If a float, ceil(n_samples * min_samples) samples are used; if None, n_features + 1 samples are used. Defaults to None.

  • residual_threshold (int | float | None, optional) – Samples whose absolute error exceeds this threshold are considered outliers. If None, the median absolute deviation of y is used. Defaults to None.

  • max_trials (int, optional) – The number of tries to sample the data. Must be a positive integer. Defaults to 100.

  • stop_inliers_prob (int | float, optional) – If the proportion of inliers on an iteration exceeds this value, the random sampling is stopped early. Defaults to 1, in which case the sampling is never stopped early, because n_inliers / number_of_samples_in_subsample is at most 1.

  • kwargs – Other parameters are passed to estimator.fit()
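
The procedure controlled by these parameters can be sketched for a single feature as follows. This is a simplified, dependency-free illustration and not the library's implementation: it keeps the trial with the most inliers and refits on them, which is an assumption about how the best model is chosen, and it leaves out stop_inliers_prob. The helpers fit_line and ransac_line, the seed argument and the toy data are hypothetical.

```python
import random
import statistics


def fit_line(points):
    """Least-squares line through (x, y) points; returns (intercept, slope)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


def ransac_line(points, min_samples=2, residual_threshold=None, max_trials=100, seed=0):
    rng = random.Random(seed)
    if residual_threshold is None:
        # Median absolute deviation of the targets.
        ys = [y for _, y in points]
        median_y = statistics.median(ys)
        residual_threshold = statistics.median(abs(y - median_y) for y in ys)
    best_inliers = []
    for _ in range(max_trials):
        subset = rng.sample(points, min_samples)
        if len({x for x, _ in subset}) < 2:
            continue  # degenerate subset: no unique line
        intercept, slope = fit_line(subset)
        inliers = [(x, y) for x, y in points
                   if abs(y - (intercept + slope * x)) <= residual_threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if not best_inliers:
        raise ValueError("no valid model found")
    # Refit on all inliers of the best trial (an assumption of this sketch).
    return fit_line(best_inliers), best_inliers


# Ten points on y = 2x + 1 plus two gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)] + [(2.5, 30.0), (6.5, -20.0)]
(intercept, slope), inliers = ransac_line(points)
```

Because the two outliers lie far outside the threshold of any line through two clean points, the final fit recovers the underlying line from the ten remaining samples.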

predict(X, **kwargs)

Predicts the values of the samples using the best estimator determined in the fit method.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • kwargs – Other parameters are passed to estimator.predict()

Time series

SARIMA

class DLL.MachineLearning.SupervisedLearning.LinearModels.TimeSeries.SARIMA(series, order, seasonal_order)

Bases: object

The seasonal autoregressive integrated moving average (SARIMA) model for time series analysis.

Parameters:
  • series (torch.Tensor of shape (n_samples,)) – The time series for fitting. Must be one dimensional.

  • order (tuple of ints) – The orders of the non-seasonal parts. Follows the format (p, d, q).

  • seasonal_order (tuple of ints) – The orders of the seasonal parts. Follows the format (P, D, Q, S). If a seasonal component is not needed, the seasonal order should be set to (0, 0, 0, 1).
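
In the standard (p, d, q)(P, D, Q, S) notation, d and D are the numbers of ordinary and seasonal differencing passes and S is the season length. The sketch below illustrates what differencing does to a series with a trend and a period-4 pattern. It is a generic, dependency-free example; how the class applies differencing internally is not stated above, and the helper difference and the series are hypothetical.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# A rising trend with a repeating pattern of period 4.
series = [10, 12, 14, 13, 20, 22, 24, 23, 30, 32, 34, 33]

seasonally_differenced = difference(series, lag=4)      # D = 1 with S = 4
fully_differenced = difference(seasonally_differenced)  # d = 1
```

Seasonal differencing removes the repeating pattern and leaves a constant, and the ordinary difference then removes that constant, so the remaining series is flat.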

fit()

Fits the ARMA model to the given time series. Currently, the function fits two linear regression models separately for the AR and MA components.

Note

This approach is suboptimal for the MA component, which should be fitted using Kalman filters for correctness.
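
Fitting the AR component with linear regression amounts to regressing each value on its own lagged values. The sketch below does this for an AR(1) model. It is a minimal, dependency-free illustration of the idea rather than the library's code; the helper fit_ar1 and the series are hypothetical, and the MA component is not covered.

```python
def fit_ar1(series):
    """Regress x[t] on x[t - 1]; returns (constant, coefficient)."""
    lagged, current = series[:-1], series[1:]
    n = len(lagged)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    s_xx = sum((x - mean_lag) ** 2 for x in lagged)
    s_xy = sum((x - mean_lag) * (c - mean_cur) for x, c in zip(lagged, current))
    phi = s_xy / s_xx
    return mean_cur - phi * mean_lag, phi


# Generated by x[t] = 1 + 0.5 * x[t - 1], starting from 0 and without noise.
series = [0.0, 1.0, 1.5, 1.75, 1.875, 1.9375]
constant, phi = fit_ar1(series)

# One-step-ahead forecast from the last observed value.
forecast = constant + phi * series[-1]
```

Since the series contains no noise, the regression recovers the generating constant and coefficient, and the forecast continues the same recursion.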

predict(steps=1, fit_between_steps=False)

Predicts the next values of the given time series.

Parameters:
  • steps (int, optional) – The number of next values to predict. Must be a positive integer. Defaults to 1.

  • fit_between_steps (bool, optional) – Determines if the model should be refitted between each prediction. Defaults to False.

Returns:

The predicted values as a one-dimensional torch Tensor.

Return type:

torch.Tensor