Trees

DecisionTree

class DLL.MachineLearning.SupervisedLearning.Trees.DecisionTree(max_depth=10, min_samples_split=2, criterion='gini', ccp_alpha=0.0)[source]

Bases: object

DecisionTree implements a classification algorithm that splits the data along the features yielding the largest information gain, i.e. the largest decrease in impurity under the chosen criterion.

Parameters:
  • max_depth (int, optional) – The maximum depth of the tree. Defaults to 10. Must be a positive integer.

  • min_samples_split (int, optional) – The minimum number of samples a node must contain for it to be split. Defaults to 2. Must be a positive integer.

  • criterion (str, optional) – The information criterion used to select optimal splits. Must be one of “entropy” or “gini”. Defaults to “gini”.

  • ccp_alpha (non-negative float, optional) – Determines how easily subtrees are pruned in cost-complexity pruning. The larger the value, the more subtrees are pruned. Defaults to 0.0.
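
As a rough illustration of the two criterion options, the sketch below computes the Gini impurity and the entropy of a list of labels, and the impurity decrease of one candidate split. It is plain Python and not the library's implementation; the helper names gini, entropy and impurity_decrease are hypothetical and not part of DLL.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def impurity_decrease(parent, left, right, impurity=gini):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted


labels = [0, 0, 1, 1]
print(gini(labels))     # 0.5
print(entropy(labels))  # 1.0
print(impurity_decrease(labels, [0, 0], [1, 1]))  # 0.5
```

A split that separates the classes perfectly removes all impurity, so its decrease equals the impurity of the parent node.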

n_classes

The number of classes. A positive integer available after calling DecisionTree.fit().

Type:

int

classes

The classes in an arbitrary order. Available after calling DecisionTree.fit().

Type:

torch.Tensor of shape (n_classes,)

fit(X, y)[source]

Fits the DecisionTree model to the input data by generating a tree, which splits the data appropriately.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • y (torch.Tensor of shape (n_samples,)) – The labels corresponding to each sample.

Returns:

None

Raises:
  • TypeError – If the input matrix or the label matrix is not a PyTorch tensor.

  • ValueError – If the input matrix or the label matrix is not the correct shape.

predict(X)[source]

Applies the fitted DecisionTree model to the input data, predicting the correct classes.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.

Returns:

The predicted labels corresponding to each sample.

Return type:

labels (torch.Tensor of shape (n_samples,))

Raises:
  • NotFittedError – If the DecisionTree model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

predict_proba(X)[source]

Applies the fitted DecisionTree model to the input data, predicting the probabilities of each class.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.

Returns:

The predicted probabilities corresponding to each sample.

Return type:

probabilities (torch.Tensor of shape (n_samples, n_classes))

Raises:
  • NotFittedError – If the DecisionTree model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

RegressionTree

class DLL.MachineLearning.SupervisedLearning.Trees.RegressionTree(max_depth=25, min_samples_split=2, ccp_alpha=0.0)[source]

Bases: object

RegressionTree implements a regression algorithm splitting the data along features minimizing the variance.
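
A minimal sketch of variance-minimizing split selection on a single feature, assuming candidate thresholds are midpoints between neighbouring sorted values (the source does not state how the library chooses thresholds). The names variance and best_threshold are hypothetical.

```python
def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_threshold(x, y):
    """Return (threshold, weighted child variance) of the best split on one feature."""
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        score = (len(left) * variance(left) + len(right) * variance(right)) / len(pairs)
        if score < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
            best = (threshold, score)
    return best


x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
print(best_threshold(x, y))  # (6.5, 0.0)
```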

Parameters:
  • max_depth (int, optional) – The maximum depth of the tree. Defaults to 25. Must be a positive integer.

  • min_samples_split (int, optional) – The minimum number of samples a node must contain for it to be split. Defaults to 2. Must be a positive integer.

  • ccp_alpha (non-negative float, optional) – Determines how easily subtrees are pruned in cost-complexity pruning. The larger the value, the more subtrees are pruned. Defaults to 0.0.
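
To make the role of ccp_alpha more concrete, here is a sketch of the textbook minimal cost-complexity rule: a subtree is collapsed into a single leaf when the error reduction it buys per extra leaf does not exceed ccp_alpha. Whether the library uses exactly this rule is an assumption; effective_alpha and should_prune are hypothetical helpers.

```python
def effective_alpha(node_error, subtree_error, n_leaves):
    """Error increase per removed leaf if the subtree is collapsed into one leaf."""
    return (node_error - subtree_error) / (n_leaves - 1)


def should_prune(node_error, subtree_error, n_leaves, ccp_alpha):
    """Collapse the subtree when its error reduction per leaf does not exceed ccp_alpha."""
    return effective_alpha(node_error, subtree_error, n_leaves) <= ccp_alpha


# A subtree with 3 leaves lowers the error from 0.30 to 0.24.
print(should_prune(0.30, 0.24, 3, ccp_alpha=0.0))   # False
print(should_prune(0.30, 0.24, 3, ccp_alpha=0.05))  # True
```

Under this rule a larger ccp_alpha collapses more subtrees, which matches the parameter description.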

fit(X, y)[source]

Fits the RegressionTree model to the input data by generating a tree, which splits the data appropriately.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • y (torch.Tensor of shape (n_samples,)) – The target values corresponding to each sample.

Returns:

None

Raises:
  • TypeError – If the input matrix or the target matrix is not a PyTorch tensor.

  • ValueError – If the input matrix or the target matrix is not the correct shape.

predict(X)[source]

Applies the fitted RegressionTree model to the input data, predicting the correct values.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be regressed.

Returns:

The predicted values corresponding to each sample.

Return type:

target values (torch.Tensor of shape (n_samples,))

Raises:
  • NotFittedError – If the RegressionTree model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

RandomForestClassifier

class DLL.MachineLearning.SupervisedLearning.Trees.RandomForestClassifier(n_trees=10, max_depth=10, min_samples_split=2)[source]

Bases: object

RandomForestClassifier implements a classification algorithm fitting many DecisionTrees to bootstrapped data.

Parameters:
  • n_trees (int, optional) – The number of trees used for predicting. Defaults to 10. Must be a positive integer.

  • max_depth (int, optional) – The maximum depth of the tree. Defaults to 10. Must be a positive integer.

  • min_samples_split (int, optional) – The minimum number of samples a node must contain for it to be split. Defaults to 2. Must be a positive integer.
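
The idea of fitting each tree to bootstrapped data can be sketched as follows: every tree receives n_samples row indices drawn uniformly with replacement, so some rows repeat and others are left out. This is an illustration only; bootstrap_indices is a hypothetical helper, and details such as feature subsampling are not covered by the source.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices uniformly with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_samples, n_trees = 8, 3
for tree in range(n_trees):
    indices = bootstrap_indices(n_samples, rng)
    # Rows never drawn for this tree are its "out-of-bag" rows.
    out_of_bag = sorted(set(range(n_samples)) - set(indices))
    print(tree, indices, out_of_bag)
```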

fit(X, y)[source]

Fits the RandomForestClassifier model to the input data by generating trees, which split the data appropriately.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • y (torch.Tensor of shape (n_samples,)) – The labels corresponding to each sample.

Returns:

None

Raises:
  • TypeError – If the input matrix or the label matrix is not a PyTorch tensor.

  • ValueError – If the input matrix or the label matrix is not the correct shape.

predict(X)[source]

Applies the fitted RandomForestClassifier model to the input data, predicting the correct classes.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.

Returns:

The predicted labels corresponding to each sample.

Return type:

labels (torch.Tensor of shape (n_samples,))

Raises:
  • NotFittedError – If the RandomForestClassifier model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

predict_proba(X)[source]

Applies the fitted RandomForestClassifier model to the input data, predicting the probabilities of each class. The result is calculated as the average of the individual trees' predicted probabilities.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.

Returns:

The predicted probabilities corresponding to each sample.

Return type:

probabilities (torch.Tensor of shape (n_samples, n_classes))

Raises:
  • NotFittedError – If the RandomForestClassifier model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.
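
The averaging described for predict_proba can be sketched with nested lists standing in for tensors. Taking the most probable class index as the predicted label is an assumption about predict, not something the source states; average_probabilities and predict_labels are hypothetical names.

```python
def average_probabilities(per_tree_probabilities):
    """Average a list of (n_samples x n_classes) nested lists over the trees."""
    n_trees = len(per_tree_probabilities)
    n_samples = len(per_tree_probabilities[0])
    n_classes = len(per_tree_probabilities[0][0])
    return [
        [
            sum(tree[i][k] for tree in per_tree_probabilities) / n_trees
            for k in range(n_classes)
        ]
        for i in range(n_samples)
    ]


def predict_labels(probabilities):
    """Pick the index of the most probable class for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


tree_outputs = [
    [[1.0, 0.0], [0.0, 1.0]],  # tree 1: probabilities for two samples
    [[0.5, 0.5], [0.0, 1.0]],  # tree 2
]
averaged = average_probabilities(tree_outputs)
print(averaged)                  # [[0.75, 0.25], [0.0, 1.0]]
print(predict_labels(averaged))  # [0, 1]
```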

RandomForestRegressor

class DLL.MachineLearning.SupervisedLearning.Trees.RandomForestRegressor(n_trees=10, max_depth=25, min_samples_split=2)[source]

Bases: object

RandomForestRegressor implements a regression algorithm fitting many RegressionTrees to bootstrapped data.

Parameters:
  • n_trees (int, optional) – The number of trees used for predicting. Defaults to 10. Must be a positive integer.

  • max_depth (int, optional) – The maximum depth of the tree. Defaults to 25. Must be a positive integer.

  • min_samples_split (int, optional) – The minimum number of samples a node must contain for it to be split. Defaults to 2. Must be a positive integer.

fit(X, y)[source]

Fits the RandomForestRegressor model to the input data by generating trees, which split the data appropriately.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • y (torch.Tensor of shape (n_samples,)) – The target values corresponding to each sample.

Returns:

None

Raises:
  • TypeError – If the input matrix or the target matrix is not a PyTorch tensor.

  • ValueError – If the input matrix or the target matrix is not the correct shape.

predict(X)[source]

Applies the fitted RandomForestRegressor model to the input data, predicting the correct values.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be regressed.

Returns:

The predicted target values corresponding to each sample.

Return type:

values (torch.Tensor of shape (n_samples,))

Raises:
  • NotFittedError – If the RandomForestRegressor model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

GradientBoostingClassifier

class DLL.MachineLearning.SupervisedLearning.Trees.GradientBoostingClassifier(n_trees=10, learning_rate=0.5, max_depth=25, min_samples_split=2, loss='log_loss')[source]

Bases: object

GradientBoostingClassifier implements a classification algorithm fitting many consecutive RegressionTrees to residuals of the model.

Parameters:
  • n_trees (int, optional) – The number of trees used for predicting. Defaults to 10. Must be a positive integer.

  • learning_rate (float, optional) – The number by which each additional tree's residuals are multiplied. Must be a real number in range (0, 1). Defaults to 0.5.

  • max_depth (int, optional) – The maximum depth of the tree. Defaults to 25. Must be a positive integer.

  • min_samples_split (int, optional) – The minimum number of samples a node must contain for it to be split. Defaults to 2. Must be a positive integer.

  • loss (string, optional) – The loss function used in calculations of the residuals. Must be one of “log_loss” or “exponential”. Defaults to “log_loss”. “exponential” can only be used for binary classification.
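
For the binary “log_loss” case, the residuals that each new tree is fitted to can be sketched as the negative gradient of the log loss with respect to the raw scores (log-odds), which is y - p. The sketch assumes initial scores of 0 and uses the residuals themselves as a stand-in for a fitted RegressionTree; neither detail is taken from the library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss_residuals(y, raw_scores):
    """Negative gradient of the binary log loss w.r.t. the raw scores: y - p."""
    return [label - sigmoid(score) for label, score in zip(y, raw_scores)]


y = [0, 1, 1]
raw_scores = [0.0, 0.0, 0.0]  # initial log-odds, chosen as 0 for simplicity
learning_rate = 0.5

residuals = log_loss_residuals(y, raw_scores)
print(residuals)  # [-0.5, 0.5, 0.5]

# Stand-in for a fitted RegressionTree that reproduces the residuals exactly.
raw_scores = [s + learning_rate * r for s, r in zip(raw_scores, residuals)]
print([round(sigmoid(s), 3) for s in raw_scores])  # [0.438, 0.562, 0.562]
```

Each step moves the predicted probabilities toward the labels, and learning_rate controls how far a single tree is allowed to move them.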

n_features

The number of features. Available after fitting.

Type:

int

n_classes

The number of classes. 2 for binary classification. Available after fitting.

Type:

int

fit(X, y, metrics=['loss'])[source]

Fits the GradientBoostingClassifier model to the input data by fitting trees to the errors made by previous trees.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • y (torch.Tensor of shape (n_samples,)) – The labels corresponding to each sample. Every element must be in [0, …, n_classes - 1].

  • metrics (list[str] | tuple[str], optional) – The metrics that are calculated after fitting each tree and returned. Defaults to ['loss']. Only available for binary classification.

Returns:

The calculated metrics if the problem is binary classification, otherwise None.

Raises:
  • TypeError – If the input matrix or the label vector is not a PyTorch tensor or if the problem is binary and metrics is not a list or a tuple.

  • ValueError – If the input matrix or the label vector is not the correct shape or the label vector contains wrong values.

predict(X)[source]

Applies the fitted GradientBoostingClassifier model to the input data, predicting the correct classes.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.

Returns:

The predicted labels corresponding to each sample.

Return type:

labels (torch.Tensor of shape (n_samples,))

Raises:
  • NotFittedError – If the GradientBoostingClassifier model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

predict_proba(X)[source]

Applies the fitted GradientBoostingClassifier model to the input data, predicting the probabilities of each class.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.

Returns:

The predicted probabilities corresponding to each sample.

Return type:

probabilities (torch.Tensor of shape (n_samples, n_classes) or for binary classification (n_samples,))

Raises:
  • NotFittedError – If the GradientBoostingClassifier model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

GradientBoostingRegressor

class DLL.MachineLearning.SupervisedLearning.Trees.GradientBoostingRegressor(n_trees=50, learning_rate=0.5, max_depth=3, min_samples_split=2, loss='squared', huber_delta=1)[source]

Bases: object

GradientBoostingRegressor implements a regression algorithm fitting many consecutive RegressionTrees to residuals of the model.

Parameters:
  • n_trees (int, optional) – The number of trees used for predicting. Defaults to 50. Must be a positive integer.

  • learning_rate (float, optional) – The number by which each additional tree's residuals are multiplied. Must be a real number in range (0, 1). Defaults to 0.5.

  • max_depth (int, optional) – The maximum depth of the tree. Defaults to 3. Must be a positive integer.

  • min_samples_split (int, optional) – The minimum number of samples a node must contain for it to be split. Defaults to 2. Must be a positive integer.

  • loss (string, optional) – The loss function used in calculations of the gradients. Must be one of “squared”, “absolute” or “huber”. Defaults to “squared”.

  • huber_delta (float | int, optional) – The delta parameter for the possibly used huber loss. If loss is not “huber”, this parameter is ignored.
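
The three loss options differ in the pseudo-residual (negative gradient) each new tree is fitted to. The sketch below uses the common textbook definitions, with the squared loss scaled by 0.5; the exact scaling used by the library is an assumption, and negative_gradient is a hypothetical helper.

```python
def negative_gradient(y, prediction, loss="squared", huber_delta=1.0):
    """Pseudo-residual for one sample under the named loss."""
    error = y - prediction
    sign = (error > 0) - (error < 0)
    if loss == "squared":   # L = 0.5 * error^2
        return error
    if loss == "absolute":  # L = |error|
        return sign
    if loss == "huber":     # quadratic near zero, linear beyond huber_delta
        return error if abs(error) <= huber_delta else huber_delta * sign
    raise ValueError(f"unknown loss: {loss}")


print(negative_gradient(3.0, 0.5, "squared"))   # 2.5
print(negative_gradient(3.0, 0.5, "absolute"))  # 1
print(negative_gradient(3.0, 0.5, "huber"))     # 1.0
print(negative_gradient(1.0, 0.5, "huber"))     # 0.5
```

The Huber residual equals the squared-loss residual for small errors and is clipped to ±huber_delta for large ones, which limits the influence of outliers.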

n_features

The number of features. Available after fitting.

Type:

int

fit(X, y, metrics=['loss'])[source]

Fits the GradientBoostingRegressor model to the input data by fitting trees to the errors made by previous trees.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • y (torch.Tensor of shape (n_samples,)) – The target values corresponding to each sample.

  • metrics (list[str] | tuple[str], optional) – The metrics that are calculated after fitting each tree and returned. Defaults to ['loss'].

Returns:

The calculated metrics.

Return type:

metrics (dict[str, torch.Tensor])

Raises:
  • TypeError – If the input matrix or the target vector is not a PyTorch tensor.

  • ValueError – If the input matrix or the target vector is not the correct shape.

predict(X)[source]

Applies the fitted GradientBoostingRegressor model to the input data, predicting the target values.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be regressed.

Returns:

The predicted target values corresponding to each sample.

Return type:

targets (torch.Tensor of shape (n_samples,))

Raises:
  • NotFittedError – If the GradientBoostingRegressor model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

AdaBoostClassifier

class DLL.MachineLearning.SupervisedLearning.Trees.AdaBoostClassifier(n_trees=10, max_depth=25, min_samples_split=2, criterion='gini')[source]

Bases: object

AdaBoostClassifier implements a classification algorithm fitting many consecutive DecisionTrees to previously misclassified samples.

Parameters:
  • n_trees (int, optional) – The number of trees used for predicting. Defaults to 10. Must be a positive integer.

  • max_depth (int, optional) – The maximum depth of the tree. Defaults to 25. Must be a positive integer.

  • min_samples_split (int, optional) – The minimum number of samples a node must contain for it to be split. Defaults to 2. Must be a positive integer.

  • criterion (str, optional) – The information criterion used to select optimal splits. Must be one of “entropy” or “gini”. Defaults to “gini”.

n_features

The number of features. Available after fitting.

Type:

int

n_classes

The number of classes. 2 for binary classification. Available after fitting.

Type:

int

confidences

The confidence on each tree.

Type:

torch.Tensor of shape (n_trees,)
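
The source does not state how the per-tree confidences are computed. One common choice is the SAMME formulation, sketched below under that assumption: a tree's confidence grows as its weighted error shrinks, and a tree that is no better than random guessing gets zero confidence. tree_confidence and is_better_than_random are hypothetical helpers.

```python
import math


def tree_confidence(weighted_error, n_classes=2):
    """SAMME-style confidence: ln((1 - err) / err) + ln(n_classes - 1)."""
    return math.log((1.0 - weighted_error) / weighted_error) + math.log(n_classes - 1)


def is_better_than_random(weighted_error, n_classes=2):
    """A weak learner helps only if it beats uniform guessing."""
    return weighted_error < 1.0 - 1.0 / n_classes


print(round(tree_confidence(0.1), 3))           # 2.197
print(tree_confidence(0.5))                     # 0.0
print(is_better_than_random(0.6, n_classes=3))  # True
```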

fit(X, y, verbose=True)[source]

Fits the AdaBoostClassifier model to the input data by fitting trees to the errors made by previous trees.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • y (torch.Tensor of shape (n_samples,)) – The labels corresponding to each sample. Every element must be in [0, …, n_classes - 1].

  • verbose (bool, optional) – Determines if warnings are given if the training ends due to a weak learner being worse than random guessing. Defaults to True.

Returns:

The average errors after each tree.

Raises:
  • TypeError – If the input matrix or the label vector is not a PyTorch tensor.

  • ValueError – If the input matrix or the label vector is not the correct shape or the label vector contains wrong values.

predict(X)[source]

Applies the fitted AdaBoostClassifier model to the input data, predicting the correct classes.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.

Returns:

The predicted labels corresponding to each sample.

Return type:

labels (torch.Tensor of shape (n_samples,))

Raises:
  • NotFittedError – If the AdaBoostClassifier model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

predict_proba(X)[source]

Applies the fitted AdaBoostClassifier model to the input data, predicting the probabilities of each class.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data for which to predict probabilities.

Returns:

The predicted probabilities for each class.

Return type:

torch.Tensor of shape (n_samples, n_classes)

Raises:
  • NotFittedError – If the AdaBoostClassifier model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

AdaBoostRegressor

class DLL.MachineLearning.SupervisedLearning.Trees.AdaBoostRegressor(n_trees=10, max_depth=25, min_samples_split=2, loss='square')[source]

Bases: object

AdaBoostRegressor implements a regression algorithm fitting many consecutive RegressionTrees to previously incorrectly predicted samples.

Parameters:
  • n_trees (int, optional) – The number of trees used for predicting. Defaults to 10. Must be a positive integer.

  • max_depth (int, optional) – The maximum depth of the tree. Defaults to 25. Must be a positive integer.

  • min_samples_split (int, optional) – The minimum number of samples a node must contain for it to be split. Defaults to 2. Must be a positive integer.

  • loss (str, optional) – The loss function used. Must be in [“linear”, “square”, “exponential”]. Defaults to “square”.
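
The three loss names match those of the AdaBoost.R2 algorithm, where per-sample errors are scaled by the largest absolute error so that every loss lies in [0, 1]. The sketch below follows that formulation; that the library implements it exactly this way is an assumption, and normalised_losses and weighted_loss are hypothetical helpers.

```python
import math


def normalised_losses(y, predictions, loss="square"):
    """Per-sample losses in [0, 1], scaled by the largest absolute error."""
    errors = [abs(t - p) for t, p in zip(y, predictions)]
    largest = max(errors)
    if largest == 0:
        return [0.0 for _ in errors]
    scaled = [e / largest for e in errors]
    if loss == "linear":
        return scaled
    if loss == "square":
        return [s ** 2 for s in scaled]
    if loss == "exponential":
        return [1.0 - math.exp(-s) for s in scaled]
    raise ValueError(f"unknown loss: {loss}")


def weighted_loss(losses, weights):
    """Average loss under the current sample weights (weights sum to 1)."""
    return sum(value * weight for value, weight in zip(losses, weights))


y = [1.0, 2.0, 3.0, 4.0]
predictions = [1.0, 2.5, 3.0, 6.0]
weights = [0.25, 0.25, 0.25, 0.25]
losses = normalised_losses(y, predictions, loss="square")
print(losses)                          # [0.0, 0.0625, 0.0, 1.0]
print(weighted_loss(losses, weights))  # 0.265625
```

In AdaBoost.R2, boosting stops once this weighted loss reaches 0.5, which is consistent with the warning described for the verbose parameter of fit.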

n_features

The number of features. Available after fitting.

Type:

int

fit(X, y, verbose=True)[source]

Fits the AdaBoostRegressor model to the input data by fitting trees to the errors made by previous trees.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • y (torch.Tensor of shape (n_samples,)) – The target values corresponding to each sample.

  • verbose (bool, optional) – Determines if warnings are given if the training ends due to a weak learner having over 0.5 weighted loss. Defaults to True.

Returns:

The average errors after each tree.

Raises:
  • TypeError – If the input matrix or the target vector is not a PyTorch tensor.

  • ValueError – If the input matrix or the target vector is not the correct shape.

predict(X, method='average')[source]

Applies the fitted AdaBoostRegressor model to the input data, predicting the correct values.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data to be regressed.

  • method (str, optional) – The method for computing the prediction. Must be one of “average” or “weighted_median”. Defaults to “average”.

Returns:

The predicted values corresponding to each sample.

Return type:

values (torch.Tensor of shape (n_samples,))

Raises:
  • NotFittedError – If the AdaBoostRegressor model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.
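
The difference between the two method options of predict can be sketched for a single sample. The weighted median returns the smallest tree prediction whose cumulative weight reaches half of the total. Whether “average” is weighted by the tree confidences is an assumption here, and the weights below are made-up illustration values.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    half = 0.5 * sum(weights)
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value


def weighted_average(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


tree_predictions = [2.0, 10.0, 3.0]  # one prediction per tree for a single sample
confidences = [1.0, 0.5, 1.5]        # hypothetical per-tree weights
print(round(weighted_average(tree_predictions, confidences), 3))  # 3.833
print(weighted_median(tree_predictions, confidences))             # 3.0
```

The median is less sensitive to a single tree with an extreme prediction, such as the 10.0 above.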

XGBoostingClassifier

class DLL.MachineLearning.SupervisedLearning.Trees.XGBoostingClassifier(n_trees=10, learning_rate=0.5, max_depth=25, min_samples_split=2, reg_lambda=1, gamma=0, loss='log_loss')[source]

Bases: object

XGBoostingClassifier implements a classification algorithm fitting many consecutive trees to gradients and hessians of the predictions.

Parameters:
  • n_trees (int, optional) – The number of trees used for predicting. Defaults to 10. Must be a positive integer.

  • learning_rate (float, optional) – The number by which each additional tree's residuals are multiplied. Must be a real number in range (0, 1). Defaults to 0.5.

  • max_depth (int, optional) – The maximum depth of the tree. Defaults to 25. Must be a positive integer.

  • min_samples_split (int, optional) – The minimum number of samples a node must contain for it to be split. Defaults to 2. Must be a positive integer.

  • reg_lambda (float | int, optional) – The regularisation parameter used in fitting the trees. The larger the parameter, the smaller the trees. Must be a positive real number. Defaults to 1.

  • gamma (float | int, optional) – The minimum gain to make a split. Must be a non-negative real number. Defaults to 0.

  • loss (string, optional) – The loss function used in calculations of the residuals. Must be one of “log_loss” or “exponential”. Defaults to “log_loss”. “exponential” can only be used for binary classification.
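
The interplay of reg_lambda and gamma can be sketched with the split gain used in the standard XGBoost formulation, computed from the sums of gradients (G) and hessians (H) on each side of a candidate split. That the library uses exactly this expression is an assumption; leaf_score and split_gain are hypothetical helpers.

```python
def leaf_score(gradient_sum, hessian_sum, reg_lambda):
    """Quality of a leaf: G^2 / (H + lambda)."""
    return gradient_sum ** 2 / (hessian_sum + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Half the score improvement of the children over the parent, minus gamma."""
    parent = leaf_score(g_left + g_right, h_left + h_right, reg_lambda)
    children = leaf_score(g_left, h_left, reg_lambda) + leaf_score(g_right, h_right, reg_lambda)
    return 0.5 * (children - parent) - gamma


print(split_gain(-4.0, 3.0, 4.0, 3.0))             # 4.0
print(split_gain(-4.0, 3.0, 4.0, 3.0, gamma=5.0))  # -1.0
```

A split is only worthwhile when the gain is positive, so a larger gamma or a larger reg_lambda both lead to fewer splits and smaller trees.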

n_features

The number of features. Available after fitting.

Type:

int

n_classes

The number of classes. 2 for binary classification. Available after fitting.

Type:

int

fit(X, y, metrics=['loss'])[source]

Fits the XGBoostingClassifier model to the input data by fitting trees to the errors made by previous trees.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • y (torch.Tensor of shape (n_samples,)) – The labels corresponding to each sample. Every element must be in [0, …, n_classes - 1].

  • metrics (list[str] | tuple[str], optional) – The metrics that are calculated after fitting each tree and returned. Defaults to ['loss']. Only available for binary classification.

Returns:

The calculated metrics if the problem is binary classification, otherwise None.

Raises:
  • TypeError – If the input matrix or the label vector is not a PyTorch tensor or if the problem is binary and metrics is not a list or a tuple.

  • ValueError – If the input matrix or the label vector is not the correct shape or the label vector contains wrong values.

predict(X)[source]

Applies the fitted XGBoostingClassifier model to the input data, predicting the correct classes.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.

Returns:

The predicted labels corresponding to each sample.

Return type:

labels (torch.Tensor of shape (n_samples,))

Raises:
  • NotFittedError – If the XGBoostingClassifier model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

predict_proba(X)[source]

Applies the fitted XGBoostingClassifier model to the input data, predicting the probabilities of each class.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.

Returns:

The predicted probabilities corresponding to each sample.

Return type:

probabilities (torch.Tensor of shape (n_samples, n_classes) or for binary classification (n_samples,))

Raises:
  • NotFittedError – If the XGBoostingClassifier model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

XGBoostingRegressor

class DLL.MachineLearning.SupervisedLearning.Trees.XGBoostingRegressor(n_trees=50, learning_rate=0.5, max_depth=3, min_samples_split=2, reg_lambda=1, gamma=0, loss='squared', huber_delta=1)[source]

Bases: object

XGBoostingRegressor implements a regression algorithm fitting many consecutive trees to gradients and hessians of the loss function.

Parameters:
  • n_trees (int, optional) – The number of trees used for predicting. Defaults to 50. Must be a positive integer.

  • learning_rate (float, optional) – The number by which each additional tree's residuals are multiplied. Must be a real number in range (0, 1). Defaults to 0.5.

  • max_depth (int, optional) – The maximum depth of the tree. Defaults to 3. Must be a positive integer.

  • min_samples_split (int, optional) – The minimum number of samples a node must contain for it to be split. Defaults to 2. Must be a positive integer.

  • reg_lambda (float | int, optional) – The regularisation parameter used in fitting the trees. The larger the parameter, the smaller the trees. Must be a positive real number. Defaults to 1.

  • gamma (float | int, optional) – The minimum gain to make a split. Must be a non-negative real number. Defaults to 0.

  • loss (string, optional) – The loss function used in calculations of the gradients and hessians. Must be one of “squared”, “absolute” or “huber”. Defaults to “squared”.

  • huber_delta (float | int, optional) – The delta parameter for the possibly used huber loss. If loss is not “huber”, this parameter is ignored.
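
For the “squared” loss, the gradients and hessians and the resulting regularised leaf value can be sketched as follows. The formula -G / (H + reg_lambda) is the standard second-order leaf value from the XGBoost formulation; the 0.5 scaling of the loss and the helper names are assumptions made for this illustration.

```python
def squared_loss_grad_hess(y, prediction):
    """First and second derivative of 0.5 * (prediction - y)^2 w.r.t. prediction."""
    return prediction - y, 1.0


def leaf_value(gradients, hessians, reg_lambda=1.0):
    """Optimal leaf output under a second-order approximation of the loss."""
    return -sum(gradients) / (sum(hessians) + reg_lambda)


y = [3.0, 5.0, 4.0]
predictions = [0.0, 0.0, 0.0]
pairs = [squared_loss_grad_hess(t, p) for t, p in zip(y, predictions)]
gradients = [g for g, _ in pairs]
hessians = [h for _, h in pairs]
print(leaf_value(gradients, hessians, reg_lambda=1.0))  # 3.0
print(leaf_value(gradients, hessians, reg_lambda=0.0))  # 4.0
```

Without regularisation the leaf predicts the mean residual (4.0); a positive reg_lambda shrinks the leaf value toward zero.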

n_features

The number of features. Available after fitting.

Type:

int

fit(X, y, metrics=['loss'])[source]

Fits the XGBoostingRegressor model to the input data by fitting trees to the errors made by previous trees.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • y (torch.Tensor of shape (n_samples,)) – The target values corresponding to each sample.

  • metrics (list[str] | tuple[str], optional) – The metrics that are calculated after fitting each tree and returned. Defaults to ['loss'].

Returns:

The calculated metrics.

Return type:

metrics (dict[str, torch.Tensor])

Raises:
  • TypeError – If the input matrix or the target vector is not a PyTorch tensor.

  • ValueError – If the input matrix or the target vector is not the correct shape.

predict(X)[source]

Applies the fitted XGBoostingRegressor model to the input data, predicting the target values.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be regressed.

Returns:

The predicted target values corresponding to each sample.

Return type:

targets (torch.Tensor of shape (n_samples,))

Raises:
  • NotFittedError – If the XGBoostingRegressor model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

LGBMClassifier

class DLL.MachineLearning.SupervisedLearning.Trees.LGBMClassifier(n_trees=10, learning_rate=0.5, max_depth=25, min_samples_split=2, n_bins=30, reg_lambda=1, gamma=0, large_error_proportion=0.3, small_error_proportion=0.2, loss='log_loss', max_conflict_rate=0.0, use_efb=True)[source]

Bases: object

LGBMClassifier implements a classification algorithm fitting many consecutive trees to gradients and hessians of the predictions.

Parameters:
  • n_trees (int, optional) – The number of trees used for predicting. Defaults to 10. Must be a positive integer.

  • learning_rate (float, optional) – The number by which each additional tree's residuals are multiplied. Must be a real number in range (0, 1). Defaults to 0.5.

  • max_depth (int, optional) – The maximum depth of the tree. Defaults to 25. Must be a positive integer.

  • min_samples_split (int, optional) – The minimum number of samples a node must contain for it to be split. Defaults to 2. Must be a positive integer.

  • n_bins (int, optional) – The number of bins used to find the optimal split of data. Must be greater than 1. Defaults to 30.

  • reg_lambda (float | int, optional) – The regularisation parameter used in fitting the trees. The larger the parameter, the smaller the trees. Must be a positive real number. Defaults to 1.

  • gamma (float | int, optional) – The minimum gain to make a split. Must be a non-negative real number. Defaults to 0.

  • large_error_proportion (float, optional) – The proportion of the whole data with the largest error, which is always used to train the next weak learner. Defaults to 0.3.

  • small_error_proportion (float, optional) – The proportion of data randomly selected from the remaining (1 - large_error_proportion) percent of data to train the next weak learner. Defaults to 0.2.

  • loss (string, optional) – The loss function used in calculations of the residuals. Must be one of “log_loss” or “exponential”. Defaults to “log_loss”. “exponential” can only be used for binary classification.

  • max_conflict_rate (float, optional) – The proportion of samples that are allowed to be nonzero without features being bundled. Is ignored if use_efb=False. Defaults to 0.0.

  • use_efb (bool, optional) – Determines if the exclusive feature bundling algorithm is used. Defaults to True.
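
The sampling controlled by large_error_proportion and small_error_proportion resembles gradient-based one-side sampling (GOSS). The sketch below assumes both proportions are fractions of the full data set and that the randomly sampled rows are up-weighted by (1 - large_error_proportion) / small_error_proportion, as in the usual GOSS description; the library may differ on both points, and goss_sample is a hypothetical helper.

```python
import random


def goss_sample(gradients, large_error_proportion=0.3, small_error_proportion=0.2, rng=None):
    """Return (indices, weights) of the samples used for the next tree."""
    rng = rng or random.Random()
    n = len(gradients)
    # Rank rows by absolute gradient, largest error first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_large = round(large_error_proportion * n)
    n_small = round(small_error_proportion * n)
    large = order[:n_large]                       # always kept
    small = rng.sample(order[n_large:], n_small)  # random subset of the rest
    # Up-weight the sampled small-gradient rows to compensate for the rows left out.
    amplification = (1.0 - large_error_proportion) / small_error_proportion
    weights = [1.0] * n_large + [amplification] * n_small
    return large + small, weights


gradients = [0.1, -2.0, 0.3, 1.5, -0.2, 0.05, 0.9, -0.4, 0.0, 0.6]
indices, weights = goss_sample(gradients, rng=random.Random(0))
print(indices[:3])   # [1, 3, 6]
print(len(indices))  # 5
```

With the default proportions only half of the rows are used for each tree, while the rows with the largest errors are always included.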

n_features

The number of features. Available after fitting.

Type:

int

n_classes

The number of classes. 2 for binary classification. Available after fitting.

Type:

int

fit(X, y, metrics=['loss'])[source]

Fits the LGBMClassifier model to the input data by fitting trees to the errors made by previous trees.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • y (torch.Tensor of shape (n_samples,)) – The labels corresponding to each sample. Every element must be in [0, …, n_classes - 1].

  • metrics (list[str] | tuple[str], optional) – The metrics that are calculated after fitting each tree and returned. Defaults to ['loss']. Only available for binary classification.

Returns:

The calculated metrics if the problem is binary classification, otherwise None.

Raises:
  • TypeError – If the input matrix or the label vector is not a PyTorch tensor or if the problem is binary and metrics is not a list or a tuple.

  • ValueError – If the input matrix or the label vector is not the correct shape or the label vector contains wrong values.

predict(X)[source]

Applies the fitted LGBMClassifier model to the input data, predicting the correct classes.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.

Returns:

The predicted labels corresponding to each sample.

Return type:

labels (torch.Tensor of shape (n_samples,))

Raises:
  • NotFittedError – If the LGBMClassifier model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

predict_proba(X)[source]

Applies the fitted LGBMClassifier model to the input data, predicting the probabilities of each class.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.

Returns:

The predicted probabilities corresponding to each sample.

Return type:

probabilities (torch.Tensor of shape (n_samples, n_classes) or for binary classification (n_samples,))

Raises:
  • NotFittedError – If the LGBMClassifier model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.

LGBMRegressor

class DLL.MachineLearning.SupervisedLearning.Trees.LGBMRegressor(n_trees=50, learning_rate=0.5, max_depth=3, min_samples_split=2, n_bins=30, reg_lambda=1, gamma=0, large_error_proportion=0.3, small_error_proportion=0.2, loss='squared', huber_delta=1, max_conflict_rate=0.0, use_efb=True)[source]

Bases: object

LGBMRegressor implements a regression algorithm fitting many consecutive trees to gradients and hessians of the loss function. The algorithm is based on this paper.

Parameters:
  • n_trees (int, optional) – The number of trees used for predicting. Defaults to 50. Must be a positive integer.

  • learning_rate (float, optional) – The number by which each additional tree's residuals are multiplied. Must be a real number in range (0, 1). Defaults to 0.5.

  • max_depth (int, optional) – The maximum depth of the tree. Defaults to 3. Must be a positive integer.

  • min_samples_split (int, optional) – The minimum number of samples a node must contain for it to be split. Defaults to 2. Must be a positive integer.

  • n_bins (int, optional) – The number of bins used to find the optimal split of data. Must be greater than 1. Defaults to 30.

  • reg_lambda (float | int, optional) – The regularisation parameter used in fitting the trees. The larger the parameter, the smaller the trees. Must be a positive real number. Defaults to 1.

  • gamma (float | int, optional) – The minimum gain to make a split. Must be a non-negative real number. Defaults to 0.

  • large_error_proportion (float, optional) – The proportion of the whole data with the largest error, which is always used to train the next weak learner. Defaults to 0.3.

  • small_error_proportion (float, optional) – The proportion of data randomly selected from the remaining (1 - large_error_proportion) percent of data to train the next weak learner. Defaults to 0.2.

  • loss (string, optional) – The loss function used in calculations of the gradients and hessians. Must be one of “squared”, “absolute” or “huber”. Defaults to “squared”.

  • huber_delta (float | int, optional) – The delta parameter for the possibly used huber loss. If loss is not “huber”, this parameter is ignored.

  • max_conflict_rate (float, optional) – The proportion of samples that are allowed to be nonzero without features being bundled. Is ignored if use_efb=False. Defaults to 0.0.

  • use_efb (bool, optional) – Determines if the exclusive feature bundling algorithm is used. Defaults to True.
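
The n_bins parameter points to histogram-based split finding: feature values are grouped into bins and only the bin edges are considered as thresholds. The sketch below assumes equal-width bins, which the source does not confirm; bin_edges and bin_index are hypothetical helpers.

```python
def bin_edges(values, n_bins):
    """Equal-width interior bin edges between the minimum and maximum value."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    return [low + width * i for i in range(1, n_bins)]


def bin_index(value, edges):
    """Index of the bin a value falls into (number of edges at or below it)."""
    return sum(value >= edge for edge in edges)


feature = [0.0, 1.0, 2.5, 4.0, 7.5, 10.0]
edges = bin_edges(feature, n_bins=4)
print(edges)                                   # [2.5, 5.0, 7.5]
print([bin_index(v, edges) for v in feature])  # [0, 0, 1, 1, 3, 3]
```

With n_bins bins there are at most n_bins - 1 candidate thresholds per feature, regardless of how many samples there are.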

n_features

The number of features. Available after fitting.

Type:

int

fit(X, y, metrics=['loss'])[source]

Fits the LGBMRegressor model to the input data by fitting trees to the errors made by previous trees.

Parameters:
  • X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.

  • y (torch.Tensor of shape (n_samples,)) – The target values corresponding to each sample.

  • metrics (list[str] | tuple[str], optional) – The metrics that are calculated after fitting each tree and returned. Defaults to ['loss'].

Returns:

The calculated metrics.

Return type:

metrics (dict[str, torch.Tensor])

Raises:
  • TypeError – If the input matrix or the target vector is not a PyTorch tensor.

  • ValueError – If the input matrix or the target vector is not the correct shape.

predict(X)[source]

Applies the fitted LGBMRegressor model to the input data, predicting the target values.

Parameters:

X (torch.Tensor of shape (n_samples, n_features)) – The input data to be regressed.

Returns:

The predicted target values corresponding to each sample.

Return type:

targets (torch.Tensor of shape (n_samples,))

Raises:
  • NotFittedError – If the LGBMRegressor model has not been fitted before predicting.

  • TypeError – If the input matrix is not a PyTorch tensor.

  • ValueError – If the input matrix is not the correct shape.