Trees
DecisionTree
- class DLL.MachineLearning.SupervisedLearning.Trees.DecisionTree(max_depth=10, min_samples_split=2, criterion='gini', ccp_alpha=0.0)
Bases:
object
DecisionTree implements a classification algorithm that splits the data along the features yielding the largest information gain (decrease in impurity).
- Parameters:
max_depth (int, optional) – The maximum depth of the tree. Defaults to 10. Must be a positive integer.
min_samples_split (int, optional) – The minimum number of samples a node must contain to be split. Defaults to 2. Must be a positive integer.
criterion (str, optional) – The information criterion used to select optimal splits. Must be one of “entropy” or “gini”. Defaults to “gini”.
ccp_alpha (non-negative float, optional) – Determines how easily subtrees are pruned in cost-complexity pruning. The larger the value, the more subtrees are pruned. Defaults to 0.0.
- n_classes
The number of classes. A positive integer available after calling DecisionTree.fit().
- Type:
int
- classes
The classes in an arbitrary order. Available after calling DecisionTree.fit().
- Type:
torch.Tensor of shape (n_classes,)
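The two criteria can be compared with a short pure-Python sketch. It assumes the textbook definitions of Gini impurity and entropy and scores a split by the size-weighted decrease in impurity (information gain). The helpers gini, entropy and information_gain are hypothetical and not part of DLL, whose tensor-based implementation may differ.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy -sum_k p_k log2(p_k), in bits."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


def information_gain(parent, left, right, impurity=gini):
    """Impurity of the parent minus the size-weighted impurity of its two children."""
    n = len(parent)
    children = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - children


parent = [0, 0, 1, 1]
print(gini(parent), entropy(parent))                               # 0.5 1.0
print(information_gain(parent, [0, 0], [1, 1]))                    # 0.5 (perfect split)
print(information_gain(parent, [0, 1], [0, 1], impurity=entropy))  # 0.0 (useless split)
```

A split is preferred when its gain is larger. Both criteria rank the perfect split above the useless one.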
- fit(X, y)
Fits the DecisionTree model to the input data by generating a tree, which splits the data appropriately.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.
y (torch.Tensor of shape (n_samples,)) – The labels corresponding to each sample.
- Returns:
None
- Raises:
TypeError – If the input matrix or the label vector is not a PyTorch tensor.
ValueError – If the input matrix or the label vector is not the correct shape.
- predict(X)
Applies the fitted DecisionTree model to the input data, predicting the class of each sample.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.
- Returns:
The predicted labels corresponding to each sample.
- Return type:
labels (torch.Tensor of shape (n_samples,))
- Raises:
NotFittedError – If the DecisionTree model has not been fitted before predicting.
TypeError – If the input matrix is not a PyTorch tensor.
ValueError – If the input matrix is not the correct shape.
- predict_proba(X)
Applies the fitted DecisionTree model to the input data, predicting the probabilities of each class.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.
- Returns:
The predicted probabilities corresponding to each sample.
- Return type:
probabilities (torch.Tensor of shape (n_samples, n_classes))
- Raises:
NotFittedError – If the DecisionTree model has not been fitted before predicting.
TypeError – If the input matrix is not a PyTorch tensor.
ValueError – If the input matrix is not the correct shape.
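The following minimal sketch shows the weakest-link rule behind ccp_alpha, assuming the classic CART formulation. A subtree is collapsed into a leaf when its effective alpha (the error increase per removed leaf) does not exceed ccp_alpha. The error measure (here a misclassification rate) and the helper names are illustrative assumptions, not DLL's code.

```python
def effective_alpha(node_error, leaf_errors):
    """Weakest-link alpha of an internal node t: (R(t) - R(T_t)) / (|leaves(T_t)| - 1).

    node_error is the error R(t) if t were collapsed into a leaf; leaf_errors are the
    errors of the leaves of the subtree T_t rooted at t (all as fractions of the data).
    """
    subtree_error = sum(leaf_errors)
    return (node_error - subtree_error) / (len(leaf_errors) - 1)


def should_prune(node_error, leaf_errors, ccp_alpha):
    """Collapse the subtree into a leaf if its effective alpha does not exceed ccp_alpha."""
    return effective_alpha(node_error, leaf_errors) <= ccp_alpha


# Collapsing this node would misclassify 10% of the data; its 3-leaf subtree misclassifies 6%.
alpha = effective_alpha(0.10, [0.02, 0.02, 0.02])
print(round(alpha, 4))                                         # 0.02
print(should_prune(0.10, [0.02, 0.02, 0.02], ccp_alpha=0.0))   # False
print(should_prune(0.10, [0.02, 0.02, 0.02], ccp_alpha=0.05))  # True
```

With the default ccp_alpha=0.0, only subtrees that do not reduce the error at all would be pruned. Larger values remove progressively more subtrees.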
RegressionTree
- class DLL.MachineLearning.SupervisedLearning.Trees.RegressionTree(max_depth=25, min_samples_split=2, ccp_alpha=0.0)
Bases:
object
RegressionTree implements a regression algorithm that splits the data along the features minimizing the variance of the target values in the resulting nodes.
- Parameters:
max_depth (int, optional) – The maximum depth of the tree. Defaults to 25. Must be a positive integer.
min_samples_split (int, optional) – The minimum number of samples a node must contain to be split. Defaults to 2. Must be a positive integer.
ccp_alpha (non-negative float, optional) – Determines how easily subtrees are pruned in cost-complexity pruning. The larger the value, the more subtrees are pruned. Defaults to 0.0.
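Minimizing the size-weighted variance of the two children is equivalent to minimizing their total sum of squared errors. The sketch below searches one feature for the best threshold under that criterion. Midpoint thresholds and the helpers sse and best_split are hypothetical choices for illustration, not DLL's implementation.

```python
def sse(values):
    """Sum of squared deviations from the mean (len(values) times the variance)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_samples_split=2):
    """Return (threshold, total_sse) of the best split on one feature, or None if no split is allowed."""
    if len(y) < min_samples_split:
        return None
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
print(best_split(x, y))  # (6.5, 4.0)
```

The chosen threshold separates the two clusters of target values. A tree would apply this search to every feature and recurse into both children.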
- fit(X, y)
Fits the RegressionTree model to the input data by generating a tree, which splits the data appropriately.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.
y (torch.Tensor of shape (n_samples,)) – The target values corresponding to each sample.
- Returns:
None
- Raises:
TypeError – If the input matrix or the target vector is not a PyTorch tensor.
ValueError – If the input matrix or the target vector is not the correct shape.
- predict(X)
Applies the fitted RegressionTree model to the input data, predicting a value for each sample.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data to be regressed.
- Returns:
The predicted values corresponding to each sample.
- Return type:
target values (torch.Tensor of shape (n_samples,))
- Raises:
NotFittedError – If the RegressionTree model has not been fitted before predicting.
TypeError – If the input matrix is not a PyTorch tensor.
ValueError – If the input matrix is not the correct shape.
RandomForestClassifier
- class DLL.MachineLearning.SupervisedLearning.Trees.RandomForestClassifier(n_trees=10, max_depth=10, min_samples_split=2)
Bases:
object
RandomForestClassifier implements a classification algorithm fitting many DecisionTrees to bootstrapped data.
- Parameters:
n_trees (int, optional) – The number of trees used for predicting. Defaults to 10. Must be a positive integer.
max_depth (int, optional) – The maximum depth of each tree. Defaults to 10. Must be a positive integer.
min_samples_split (int, optional) – The minimum number of samples a node must contain to be split. Defaults to 2. Must be a positive integer.
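Bootstrapping means each tree is trained on rows drawn uniformly with replacement from the training set, with the same size as the original. The sketch below uses Python's random module, and the helper bootstrap_sample is hypothetical. On average about 1 - (1 - 1/n)^n, roughly 63.2%, of the distinct rows end up in each sample.

```python
import random


def bootstrap_sample(X, y, rng):
    """Resample rows of (X, y) uniformly with replacement, keeping the dataset size."""
    n = len(y)
    idx = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx], idx


rng = random.Random(42)
X = [[float(i)] for i in range(1000)]
y = [i % 2 for i in range(1000)]
Xb, yb, idx = bootstrap_sample(X, y, rng)
unique_fraction = len(set(idx)) / len(idx)
# Expected to be close to 1 - (1 - 1/1000)**1000, i.e. about 0.632.
print(len(Xb), round(unique_fraction, 2))
```

Each tree in the forest would receive its own independently drawn sample, which decorrelates the trees.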
- fit(X, y)
Fits the RandomForestClassifier model to the input data by generating trees, which split the data appropriately.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.
y (torch.Tensor of shape (n_samples,)) – The labels corresponding to each sample.
- Returns:
None
- Raises:
TypeError – If the input matrix or the label vector is not a PyTorch tensor.
ValueError – If the input matrix or the label vector is not the correct shape.
- predict(X)
Applies the fitted RandomForestClassifier model to the input data, predicting the class of each sample.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.
- Returns:
The predicted labels corresponding to each sample.
- Return type:
labels (torch.Tensor of shape (n_samples,))
- Raises:
NotFittedError – If the RandomForestClassifier model has not been fitted before predicting.
TypeError – If the input matrix is not a PyTorch tensor.
ValueError – If the input matrix is not the correct shape.
- predict_proba(X)
Applies the fitted RandomForestClassifier model to the input data, predicting the probabilities of each class. The probabilities are computed as the average of the probabilities predicted by the individual trees.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.
- Returns:
The predicted probabilities corresponding to each sample.
- Return type:
probabilities (torch.Tensor of shape (n_samples, n_classes))
- Raises:
NotFittedError – If the RandomForestClassifier model has not been fitted before predicting.
TypeError – If the input matrix is not a PyTorch tensor.
ValueError – If the input matrix is not the correct shape.
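The averaging described for predict_proba can be sketched in plain Python. Taking the argmax of the averaged probabilities is shown as one plausible way to turn them into labels. This documentation does not state how predict combines the trees, so treat that step as an assumption. Both helpers are hypothetical.

```python
def average_probabilities(per_tree_probas):
    """Average per-tree probability rows: result[i][k] = mean over trees of probas[t][i][k]."""
    n_trees = len(per_tree_probas)
    n_samples = len(per_tree_probas[0])
    n_classes = len(per_tree_probas[0][0])
    return [
        [sum(p[i][k] for p in per_tree_probas) / n_trees for k in range(n_classes)]
        for i in range(n_samples)
    ]


def argmax_rows(probas):
    """Index of the largest probability in each row (ties go to the lowest index)."""
    return [max(range(len(row)), key=row.__getitem__) for row in probas]


tree_a = [[1.0, 0.0], [0.25, 0.75]]
tree_b = [[0.5, 0.5], [0.5, 0.5]]
avg = average_probabilities([tree_a, tree_b])
print(avg)               # [[0.75, 0.25], [0.375, 0.625]]
print(argmax_rows(avg))  # [0, 1]
```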
RandomForestRegressor
- class DLL.MachineLearning.SupervisedLearning.Trees.RandomForestRegressor(n_trees=10, max_depth=25, min_samples_split=2)
Bases:
object
RandomForestRegressor implements a regression algorithm fitting many RegressionTrees to bootstrapped data.
- Parameters:
n_trees (int, optional) – The number of trees used for predicting. Defaults to 10. Must be a positive integer.
max_depth (int, optional) – The maximum depth of each tree. Defaults to 25. Must be a positive integer.
min_samples_split (int, optional) – The minimum number of samples a node must contain to be split. Defaults to 2. Must be a positive integer.
- fit(X, y)
Fits the RandomForestRegressor model to the input data by generating trees, which split the data appropriately.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.
y (torch.Tensor of shape (n_samples,)) – The target values corresponding to each sample.
- Returns:
None
- Raises:
TypeError – If the input matrix or the target vector is not a PyTorch tensor.
ValueError – If the input matrix or the target vector is not the correct shape.
- predict(X)
Applies the fitted RandomForestRegressor model to the input data, predicting a value for each sample.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data to be regressed.
- Returns:
The predicted target values corresponding to each sample.
- Return type:
target values (torch.Tensor of shape (n_samples,))
- Raises:
NotFittedError – If the RandomForestRegressor model has not been fitted before predicting.
TypeError – If the input matrix is not a PyTorch tensor.
ValueError – If the input matrix is not the correct shape.
GradientBoostingClassifier
- class DLL.MachineLearning.SupervisedLearning.Trees.GradientBoostingClassifier(n_trees=10, learning_rate=0.5, max_depth=25, min_samples_split=2, loss='log_loss')
Bases:
object
GradientBoostingClassifier implements a classification algorithm fitting many consecutive RegressionTrees to the residuals of the model.
- Parameters:
n_trees (int, optional) – The number of trees used for predicting. Defaults to 10. Must be a positive integer.
learning_rate (float, optional) – The factor by which the output of each additional tree is multiplied. Must be a real number in range (0, 1). Defaults to 0.5.
max_depth (int, optional) – The maximum depth of each tree. Defaults to 25. Must be a positive integer.
min_samples_split (int, optional) – The minimum number of samples a node must contain to be split. Defaults to 2. Must be a positive integer.
loss (string, optional) – The loss function used in calculations of the residuals. Must be one of “log_loss” or “exponential”. Defaults to “log_loss”. “exponential” can only be used for binary classification.
- n_features
The number of features. Available after fitting.
- Type:
int
- n_classes
The number of classes. 2 for binary classification. Available after fitting.
- Type:
int
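For binary classification, gradient boosting fits each new regression tree to the negative gradient of the loss with respect to the current raw score (log-odds). It then adds the tree's output scaled by learning_rate. The sketch below assumes labels in {0, 1}, the standard log-loss and exponential-loss gradients, and an idealized tree that reproduces the residuals exactly. The helper names are hypothetical, and DLL's multi-class handling is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss_residuals(y, raw):
    """Negative gradient of binary log-loss w.r.t. the raw score: y - sigmoid(raw)."""
    return [yi - sigmoid(fi) for yi, fi in zip(y, raw)]


def exponential_residuals(y, raw):
    """Negative gradient of exp(-s * raw) with s = 2y - 1 in {-1, +1}: s * exp(-s * raw)."""
    residuals = []
    for yi, fi in zip(y, raw):
        s = 2 * yi - 1
        residuals.append(s * math.exp(-s * fi))
    return residuals


def boosting_step(raw, tree_output, learning_rate=0.5):
    """Scale the new tree's output by the learning rate and add it to the raw scores."""
    return [fi + learning_rate * hi for fi, hi in zip(raw, tree_output)]


y = [0, 1, 1]
raw = [0.0, 0.0, 0.0]                   # initial raw scores, so p = 0.5 everywhere
r = log_loss_residuals(y, raw)
print(r)                                # [-0.5, 0.5, 0.5]
print(exponential_residuals(y, raw))    # [-1.0, 1.0, 1.0]
# Pretend a tree reproduced the residuals exactly; the probabilities move towards the labels.
raw = boosting_step(raw, r)
print([round(sigmoid(f), 3) for f in raw])  # [0.438, 0.562, 0.562]
```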
- fit(X, y, metrics=['loss'])
Fits the GradientBoostingClassifier model to the input data by fitting trees to the errors made by previous trees.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.
y (torch.Tensor of shape (n_samples,)) – The labels corresponding to each sample. Every element must be in [0, …, n_classes - 1].
metrics (list[str] | tuple[str], optional) – The names of the metrics to calculate after fitting each tree. Defaults to ['loss']. Only available for binary classification.
- Returns:
The calculated metrics (dict[str, torch.Tensor]) for binary classification, otherwise None.
- Raises:
TypeError – If the input matrix or the label vector is not a PyTorch tensor or if the problem is binary and metrics is not a list or a tuple.
ValueError – If the input matrix or the label vector is not the correct shape or the label vector contains wrong values.
- predict(X)
Applies the fitted GradientBoostingClassifier model to the input data, predicting the class of each sample.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.
- Returns:
The predicted labels corresponding to each sample.
- Return type:
labels (torch.Tensor of shape (n_samples,))
- Raises:
NotFittedError – If the GradientBoostingClassifier model has not been fitted before predicting.
TypeError – If the input matrix is not a PyTorch tensor.
ValueError – If the input matrix is not the correct shape.
- predict_proba(X)
Applies the fitted GradientBoostingClassifier model to the input data, predicting the probabilities of each class.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.
- Returns:
The predicted probabilities corresponding to each sample.
- Return type:
probabilities (torch.Tensor of shape (n_samples, n_classes), or of shape (n_samples,) for binary classification)
- Raises:
NotFittedError – If the GradientBoostingClassifier model has not been fitted before predicting.
TypeError – If the input matrix is not a PyTorch tensor.
ValueError – If the input matrix is not the correct shape.
GradientBoostingRegressor
- class DLL.MachineLearning.SupervisedLearning.Trees.GradientBoostingRegressor(n_trees=50, learning_rate=0.5, max_depth=3, min_samples_split=2, loss='squared', huber_delta=1)
Bases:
object
GradientBoostingRegressor implements a regression algorithm fitting many consecutive RegressionTrees to the residuals of the model.
- Parameters:
n_trees (int, optional) – The number of trees used for predicting. Defaults to 50. Must be a positive integer.
learning_rate (float, optional) – The factor by which the output of each additional tree is multiplied. Must be a real number in range (0, 1). Defaults to 0.5.
max_depth (int, optional) – The maximum depth of each tree. Defaults to 3. Must be a positive integer.
min_samples_split (int, optional) – The minimum number of samples a node must contain to be split. Defaults to 2. Must be a positive integer.
loss (string, optional) – The loss function used in calculations of the gradients. Must be one of “squared”, “absolute” or “huber”. Defaults to “squared”.
huber_delta (float | int, optional) – The delta parameter of the Huber loss. Ignored unless loss is “huber”. Defaults to 1.
- n_features
The number of features. Available after fitting.
- Type:
int
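The three regression losses differ only in the pseudo-residuals that the next tree is fitted to. The sketch below assumes the usual conventions: squared loss with a factor of 1/2, absolute loss, and a Huber loss that is quadratic within huber_delta and linear beyond it. The function negative_gradient is hypothetical.

```python
def negative_gradient(y, pred, loss="squared", huber_delta=1.0):
    """Pseudo-residuals -dL/dpred for each sample under the chosen loss."""
    out = []
    for yi, pi in zip(y, pred):
        r = yi - pi
        sign = (r > 0) - (r < 0)  # -1, 0 or 1
        if loss == "squared":       # L = 0.5 * r^2
            out.append(r)
        elif loss == "absolute":    # L = |r|
            out.append(sign)
        elif loss == "huber":       # quadratic for |r| <= delta, linear beyond
            out.append(r if abs(r) <= huber_delta else huber_delta * sign)
        else:
            raise ValueError(f"unknown loss: {loss}")
    return out


y = [1.0, 2.0, 10.0]
pred = [1.5, 2.0, 3.0]
print(negative_gradient(y, pred, "squared"))     # [-0.5, 0.0, 7.0]
print(negative_gradient(y, pred, "absolute"))    # [-1, 0, 1]
print(negative_gradient(y, pred, "huber", 1.0))  # [-0.5, 0.0, 1.0]
```

The large error on the last sample dominates the squared-loss residuals. The absolute and Huber losses cap its influence, which makes them less sensitive to outliers.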
- fit(X, y, metrics=['loss'])
Fits the GradientBoostingRegressor model to the input data by fitting trees to the errors made by previous trees.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.
y (torch.Tensor of shape (n_samples,)) – The target values corresponding to each sample.
metrics (list[str] | tuple[str], optional) – The names of the metrics to calculate after fitting each tree. Defaults to ['loss'].
- Returns:
The calculated metrics.
- Return type:
metrics (dict[str, torch.Tensor])
- Raises:
TypeError – If the input matrix or the target vector is not a PyTorch tensor.
ValueError – If the input matrix or the target vector is not the correct shape.
- predict(X)
Applies the fitted GradientBoostingRegressor model to the input data, predicting the target values.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data to be regressed.
- Returns:
The predicted target values corresponding to each sample.
- Return type:
targets (torch.Tensor of shape (n_samples,))
- Raises:
NotFittedError – If the GradientBoostingRegressor model has not been fitted before predicting.
TypeError – If the input matrix is not a PyTorch tensor.
ValueError – If the input matrix is not the correct shape.
AdaBoostClassifier
- class DLL.MachineLearning.SupervisedLearning.Trees.AdaBoostClassifier(n_trees=10, max_depth=25, min_samples_split=2, criterion='gini')
Bases:
object
AdaBoostClassifier implements a classification algorithm fitting many consecutive DecisionTrees to previously misclassified samples.
- Parameters:
n_trees (int, optional) – The number of trees used for predicting. Defaults to 10. Must be a positive integer.
max_depth (int, optional) – The maximum depth of each tree. Defaults to 25. Must be a positive integer.
min_samples_split (int, optional) – The minimum number of samples a node must contain to be split. Defaults to 2. Must be a positive integer.
criterion (str, optional) – The information criterion used to select optimal splits. Must be one of “entropy” or “gini”. Defaults to “gini”.
- n_features
The number of features. Available after fitting.
- Type:
int
- n_classes
The number of classes. 2 for binary classification. Available after fitting.
- Type:
int
- confidences
The confidence assigned to each tree.
- Type:
torch.Tensor of shape (n_trees,)
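The confidences can be illustrated with one round of SAMME, the common multi-class generalization of AdaBoost. This is an assumption, and DLL's exact update may differ. The sketch computes:

- the weighted error;
- the tree's confidence log((1 - err) / err) + log(K - 1);
- the reweighted samples.

It stops when the learner is no better than random guessing (err >= 1 - 1/K), which is the situation the verbose warning of fit refers to. The helper samme_round is hypothetical.

```python
import math


def samme_round(weights, y_true, y_pred, n_classes):
    """One SAMME round: (confidence, new_weights), or None if no better than chance."""
    total = sum(weights)
    miss = [yt != yp for yt, yp in zip(y_true, y_pred)]
    error = sum(w for w, m in zip(weights, miss) if m) / total
    if error >= 1.0 - 1.0 / n_classes:
        return None  # worse than (or equal to) random guessing: boosting stops
    error = max(error, 1e-10)  # avoid division by zero for a perfect learner
    confidence = math.log((1.0 - error) / error) + math.log(n_classes - 1)
    # Misclassified samples are up-weighted, then all weights are renormalized.
    new = [w * math.exp(confidence) if m else w for w, m in zip(weights, miss)]
    norm = sum(new)
    return confidence, [w / norm for w in new]


weights = [0.25] * 4
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]  # one mistake out of four
confidence, weights = samme_round(weights, y_true, y_pred, n_classes=2)
print(round(confidence, 4))            # 1.0986 (= log 3)
print([round(w, 3) for w in weights])  # [0.167, 0.167, 0.5, 0.167]
```

After the update, the misclassified sample carries half of the total weight, so the next tree concentrates on it.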
- fit(X, y, verbose=True)
Fits the AdaBoostClassifier model to the input data by fitting trees to the errors made by previous trees.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.
y (torch.Tensor of shape (n_samples,)) – The labels corresponding to each sample. Every element must be in [0, …, n_classes - 1].
verbose (bool, optional) – Whether to warn if training stops early because a weak learner performs worse than random guessing. Defaults to True.
- Returns:
The average errors after each tree.
- Raises:
TypeError – If the input matrix or the label vector is not a PyTorch tensor.
ValueError – If the input matrix or the label vector is not the correct shape or the label vector contains wrong values.
- predict(X)
Applies the fitted AdaBoostClassifier model to the input data, predicting the class of each sample.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.
- Returns:
The predicted labels corresponding to each sample.
- Return type:
labels (torch.Tensor of shape (n_samples,))
- Raises:
NotFittedError – If the AdaBoostClassifier model has not been fitted before predicting.
TypeError – If the input matrix is not a PyTorch tensor.
ValueError – If the input matrix is not the correct shape.
- predict_proba(X)
Applies the fitted AdaBoostClassifier model to the input data, predicting the probabilities of each class.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data for which to predict probabilities.
- Returns:
The predicted probabilities for each class.
- Return type:
torch.Tensor of shape (n_samples, n_classes)
- Raises:
NotFittedError – If the AdaBoostClassifier model has not been fitted before predicting.
TypeError – If the input matrix is not a PyTorch tensor.
ValueError – If the input matrix is not the correct shape.
AdaBoostRegressor
- class DLL.MachineLearning.SupervisedLearning.Trees.AdaBoostRegressor(n_trees=10, max_depth=25, min_samples_split=2, loss='square')
Bases:
object
AdaBoostRegressor implements a regression algorithm fitting many consecutive RegressionTrees to previously incorrectly predicted samples.
- Parameters:
n_trees (int, optional) – The number of trees used for predicting. Defaults to 10. Must be a positive integer.
max_depth (int, optional) – The maximum depth of each tree. Defaults to 25. Must be a positive integer.
min_samples_split (int, optional) – The minimum number of samples a node must contain to be split. Defaults to 2. Must be a positive integer.
loss (str, optional) – The loss function used. Must be in [“linear”, “square”, “exponential”]. Defaults to “square”.
- n_features
The number of features. Available after fitting.
- Type:
int
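The three loss options match those of AdaBoost.R2, so a plausible reading, assuming that algorithm, is sketched below. Absolute errors are scaled by the largest error and mapped to [0, 1] by the chosen loss. The resulting losses give the tree's confidence and the new sample weights. Training stops when the weighted average loss reaches 0.5, consistent with the verbose description of fit. The helpers are hypothetical.

```python
import math


def r2_sample_losses(y_true, y_pred, loss="square"):
    """Per-sample losses in [0, 1], normalized by the largest absolute error."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    largest = max(errors)
    if largest == 0:
        return [0.0] * len(errors)
    scaled = [e / largest for e in errors]
    if loss == "linear":
        return scaled
    if loss == "square":
        return [s ** 2 for s in scaled]
    if loss == "exponential":
        return [1.0 - math.exp(-s) for s in scaled]
    raise ValueError(f"unknown loss: {loss}")


def r2_round(weights, losses):
    """Return (confidence, new_weights), or None when the weighted loss is at least 0.5."""
    total = sum(weights)
    avg_loss = sum(w * loss_i for w, loss_i in zip(weights, losses)) / total
    if avg_loss >= 0.5:
        return None  # weak learner too poor: boosting stops
    avg_loss = max(avg_loss, 1e-10)  # a perfect learner would otherwise give beta = 0
    beta = avg_loss / (1.0 - avg_loss)
    # Well-predicted samples (small loss) are down-weighted by beta ** (1 - loss).
    new = [w * beta ** (1.0 - loss_i) for w, loss_i in zip(weights, losses)]
    norm = sum(new)
    return math.log(1.0 / beta), [w / norm for w in new]


losses = r2_sample_losses([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0], "square")
print(losses)  # [0.0, 0.0, 0.0, 1.0]
confidence, weights = r2_round([0.25] * 4, losses)
print(round(confidence, 4), [round(w, 3) for w in weights])  # 1.0986 [0.167, 0.167, 0.167, 0.5]
```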
- fit(X, y, verbose=True)
Fits the AdaBoostRegressor model to the input data by fitting trees to the errors made by previous trees.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.
y (torch.Tensor of shape (n_samples,)) – The target values corresponding to each sample.
verbose (bool, optional) – Whether to warn if training stops early because a weak learner has a weighted loss above 0.5. Defaults to True.
- Returns:
The average errors after each tree.
- Raises:
TypeError – If the input matrix or the target vector is not a PyTorch tensor.
ValueError – If the input matrix or the target vector is not the correct shape or the target vector contains invalid values.
- predict(X, method='average')
Applies the fitted AdaBoostRegressor model to the input data, predicting a value for each sample.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data to be regressed.
method (str, optional) – The method for computing the prediction. Must be one of “average” or “weighted_median”. Defaults to “average”.
- Returns:
The predicted values corresponding to each sample.
- Return type:
values (torch.Tensor of shape (n_samples,))
- Raises:
NotFittedError – If the AdaBoostRegressor model has not been fitted before predicting.
TypeError – If the input matrix is not a PyTorch tensor.
ValueError – If the input matrix is not the correct shape.
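The difference between the two prediction methods can be seen on the predictions of three trees for a single sample. The sketch assumes that both methods weight the trees by their confidences. It also assumes that the weighted median is the smallest prediction whose cumulative weight reaches half of the total, as in AdaBoost.R2. DLL's “average” may instead be unweighted, and both helpers are hypothetical.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    order = sorted(range(len(values)), key=values.__getitem__)
    half = sum(weights) / 2.0
    cumulative = 0.0
    for i in order:
        cumulative += weights[i]
        if cumulative >= half:
            return values[i]
    return values[order[-1]]


def weighted_average(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Predictions of three trees for one sample, and the trees' confidences.
preds = [2.0, 3.0, 100.0]
conf = [1.0, 2.0, 1.0]
print(weighted_median(preds, conf))   # 3.0
print(weighted_average(preds, conf))  # 27.0
```

The single outlying tree pulls the average far away, while the weighted median stays with the majority of the confidence mass.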
XGBoostingClassifier
- class DLL.MachineLearning.SupervisedLearning.Trees.XGBoostingClassifier(n_trees=10, learning_rate=0.5, max_depth=25, min_samples_split=2, reg_lambda=1, gamma=0, loss='log_loss')
Bases:
object
XGBoostingClassifier implements a classification algorithm fitting many consecutive trees to gradients and hessians of the predictions.
- Parameters:
n_trees (int, optional) – The number of trees used for predicting. Defaults to 10. Must be a positive integer.
learning_rate (float, optional) – The factor by which the output of each additional tree is multiplied. Must be a real number in range (0, 1). Defaults to 0.5.
max_depth (int, optional) – The maximum depth of each tree. Defaults to 25. Must be a positive integer.
min_samples_split (int, optional) – The minimum number of samples a node must contain to be split. Defaults to 2. Must be a positive integer.
reg_lambda (float | int, optional) – The regularisation parameter used in fitting the trees. The larger the parameter, the smaller the trees. Must be a positive real number. Defaults to 1.
gamma (float | int, optional) – The minimum gain to make a split. Must be a non-negative real number. Defaults to 0.
loss (string, optional) – The loss function used in calculations of the residuals. Must be one of “log_loss” or “exponential”. Defaults to “log_loss”. “exponential” can only be used for binary classification.
- n_features
The number of features. Available after fitting.
- Type:
int
- n_classes
The number of classes. 2 for binary classification. Available after fitting.
- Type:
int
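The role of reg_lambda and gamma can be seen in the second-order split gain used by XGBoost-style trees. The sketch makes these assumptions:

- binary log-loss with labels in {0, 1};
- gradients g = p - y and hessians h = p(1 - p);
- leaf values -G / (H + reg_lambda);
- the gain formula in split_gain.

The helper names are hypothetical, and DLL's internals may differ.

```python
import math


def log_loss_grad_hess(y, raw):
    """Gradient and hessian of binary log-loss w.r.t. the raw score."""
    p = [1.0 / (1.0 + math.exp(-f)) for f in raw]
    grads = [pi - yi for pi, yi in zip(p, y)]
    hess = [pi * (1.0 - pi) for pi in p]
    return grads, hess


def leaf_weight(G, H, reg_lambda):
    """Optimal leaf value -G / (H + reg_lambda) from the second-order approximation."""
    return -G / (H + reg_lambda)


def split_gain(GL, HL, GR, HR, reg_lambda=1.0, gamma=0.0):
    """Reduction of the regularized objective when a node is split into left/right children."""
    def score(G, H):
        return G * G / (H + reg_lambda)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


y = [0, 0, 1, 1]
g, h = log_loss_grad_hess(y, [0.0] * 4)  # p = 0.5: g = [0.5, 0.5, -0.5, -0.5], h = 0.25 each
GL, HL = sum(g[:2]), sum(h[:2])
GR, HR = sum(g[2:]), sum(h[2:])
print(split_gain(GL, HL, GR, HR))             # about 0.6667
print(split_gain(GL, HL, GR, HR, gamma=1.0))  # negative, so the split would be rejected
print(leaf_weight(GL, HL, 1.0), leaf_weight(GR, HR, 1.0))  # about -0.6667 and 0.6667
```

A larger reg_lambda lowers every score and therefore the gain, so fewer splits pass the gamma threshold. This matches the description of both parameters above.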
- fit(X, y, metrics=['loss'])
Fits the XGBoostingClassifier model to the input data by fitting trees to the errors made by previous trees.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.
y (torch.Tensor of shape (n_samples,)) – The labels corresponding to each sample. Every element must be in [0, …, n_classes - 1].
metrics (list[str] | tuple[str], optional) – The names of the metrics to calculate after fitting each tree. Defaults to ['loss']. Only available for binary classification.
- Returns:
The calculated metrics (dict[str, torch.Tensor]) for binary classification, otherwise None.
- Raises:
TypeError – If the input matrix or the label vector is not a PyTorch tensor or if the problem is binary and metrics is not a list or a tuple.
ValueError – If the input matrix or the label vector is not the correct shape or the label vector contains wrong values.
- predict(X)
Applies the fitted XGBoostingClassifier model to the input data, predicting the class of each sample.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.
- Returns:
The predicted labels corresponding to each sample.
- Return type:
labels (torch.Tensor of shape (n_samples,))
- Raises:
NotFittedError – If the XGBoostingClassifier model has not been fitted before predicting.
TypeError – If the input matrix is not a PyTorch tensor.
ValueError – If the input matrix is not the correct shape.
- predict_proba(X)
Applies the fitted XGBoostingClassifier model to the input data, predicting the probabilities of each class.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.
- Returns:
The predicted probabilities corresponding to each sample.
- Return type:
probabilities (torch.Tensor of shape (n_samples, n_classes), or of shape (n_samples,) for binary classification)
- Raises:
NotFittedError – If the XGBoostingClassifier model has not been fitted before predicting.
TypeError – If the input matrix is not a PyTorch tensor.
ValueError – If the input matrix is not the correct shape.
XGBoostingRegressor
- class DLL.MachineLearning.SupervisedLearning.Trees.XGBoostingRegressor(n_trees=50, learning_rate=0.5, max_depth=3, min_samples_split=2, reg_lambda=1, gamma=0, loss='squared', huber_delta=1)
Bases:
object
XGBoostingRegressor implements a regression algorithm fitting many consecutive trees to gradients and hessians of the loss function.
- Parameters:
n_trees (int, optional) – The number of trees used for predicting. Defaults to 50. Must be a positive integer.
learning_rate (float, optional) – The factor by which the output of each additional tree is multiplied. Must be a real number in range (0, 1). Defaults to 0.5.
max_depth (int, optional) – The maximum depth of each tree. Defaults to 3. Must be a positive integer.
min_samples_split (int, optional) – The minimum number of samples a node must contain to be split. Defaults to 2. Must be a positive integer.
reg_lambda (float | int, optional) – The regularisation parameter used in fitting the trees. The larger the parameter, the smaller the trees. Must be a positive real number. Defaults to 1.
gamma (float | int, optional) – The minimum gain to make a split. Must be a non-negative real number. Defaults to 0.
loss (string, optional) – The loss function used in calculations of the gradients and hessians. Must be one of “squared”, “absolute” or “huber”. Defaults to “squared”.
huber_delta (float | int, optional) – The delta parameter of the Huber loss. Ignored unless loss is “huber”. Defaults to 1.
- n_features
The number of features. Available after fitting.
- Type:
int
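For the squared loss, the hessian is 1 for every sample. A leaf's value is therefore the sum of its residuals divided by its sample count plus reg_lambda. The short sketch below assumes the XGBoost-style leaf formula -sum(g) / (sum(h) + reg_lambda) and uses hypothetical helper names. It shows how larger reg_lambda values shrink the leaf towards 0.

```python
def squared_grad_hess(y, pred):
    """For L = 0.5 * (pred - y)^2 the gradient is pred - y and the hessian is 1."""
    grads = [p - t for t, p in zip(y, pred)]
    return grads, [1.0] * len(y)


def leaf_value(grads, hess, reg_lambda=1.0):
    """Newton-step leaf value -sum(g) / (sum(h) + reg_lambda)."""
    return -sum(grads) / (sum(hess) + reg_lambda)


y = [3.0, 5.0, 4.0]
pred = [0.0, 0.0, 0.0]
g, h = squared_grad_hess(y, pred)
# Without regularization the leaf would be the mean residual 4.0; reg_lambda shrinks it.
for reg_lambda in (1.0, 3.0, 10.0):
    print(reg_lambda, leaf_value(g, h, reg_lambda))  # 3.0, then 2.0, then about 0.923
```

With learning_rate=0.5, a leaf value of 3.0 would move the predictions of its samples by 1.5.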
- fit(X, y, metrics=['loss'])
Fits the XGBoostingRegressor model to the input data by fitting trees to the errors made by previous trees.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.
y (torch.Tensor of shape (n_samples,)) – The target values corresponding to each sample.
metrics (list[str] | tuple[str], optional) – The names of the metrics to calculate after fitting each tree. Defaults to ['loss'].
- Returns:
The calculated metrics.
- Return type:
metrics (dict[str, torch.Tensor])
- Raises:
TypeError – If the input matrix or the target vector is not a PyTorch tensor.
ValueError – If the input matrix or the target vector is not the correct shape.
- predict(X)
Applies the fitted XGBoostingRegressor model to the input data, predicting the target values.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data to be regressed.
- Returns:
The predicted target values corresponding to each sample.
- Return type:
targets (torch.Tensor of shape (n_samples,))
- Raises:
NotFittedError – If the XGBoostingRegressor model has not been fitted before predicting.
TypeError – If the input matrix is not a PyTorch tensor.
ValueError – If the input matrix is not the correct shape.
LGBMClassifier
- class DLL.MachineLearning.SupervisedLearning.Trees.LGBMClassifier(n_trees=10, learning_rate=0.5, max_depth=25, min_samples_split=2, n_bins=30, reg_lambda=1, gamma=0, large_error_proportion=0.3, small_error_proportion=0.2, loss='log_loss', max_conflict_rate=0.0, use_efb=True)
Bases:
object
LGBMClassifier implements a classification algorithm fitting many consecutive trees to gradients and hessians of the predictions.
- Parameters:
n_trees (int, optional) – The number of trees used for predicting. Defaults to 10. Must be a positive integer.
learning_rate (float, optional) – The factor by which the output of each additional tree is multiplied. Must be a real number in range (0, 1). Defaults to 0.5.
max_depth (int, optional) – The maximum depth of each tree. Defaults to 25. Must be a positive integer.
min_samples_split (int, optional) – The minimum number of samples a node must contain to be split. Defaults to 2. Must be a positive integer.
n_bins (int, optional) – The number of bins used to find the optimal split of data. Must be greater than 1. Defaults to 30.
reg_lambda (float | int, optional) – The regularisation parameter used in fitting the trees. The larger the parameter, the smaller the trees. Must be a positive real number. Defaults to 1.
gamma (float | int, optional) – The minimum gain to make a split. Must be a non-negative real number. Defaults to 0.
large_error_proportion (float, optional) – The proportion of the data with the largest errors, which is always used to train the next weak learner. Defaults to 0.3.
small_error_proportion (float, optional) – The proportion of data randomly selected from the remaining 1 - large_error_proportion of the data to train the next weak learner. Defaults to 0.2.
loss (string, optional) – The loss function used in calculations of the residuals. Must be one of “log_loss” or “exponential”. Defaults to “log_loss”. “exponential” can only be used for binary classification.
max_conflict_rate (float, optional) – The maximum proportion of samples in which features may both be nonzero while still being bundled together. Ignored if use_efb=False. Defaults to 0.0.
use_efb (bool, optional) – Determines if the exclusive feature bundling algorithm is used. Defaults to True.
- n_features
The number of features. Available after fitting.
- Type:
int
- n_classes
The number of classes. 2 for binary classification. Available after fitting.
- Type:
int
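large_error_proportion and small_error_proportion correspond to gradient-based one-side sampling (GOSS). The sketch assumes the formulation from the LightGBM paper:

- keep the a * n rows with the largest absolute gradients;
- draw b * n of the remaining rows at random;
- up-weight the drawn rows by (1 - a) / b so that gradient sums stay roughly unbiased.

This documentation does not state whether DLL measures b against the whole dataset or only the remainder. The helper goss_sample is hypothetical.

```python
import random


def goss_sample(gradients, a=0.3, b=0.2, rng=None):
    """Gradient-based one-side sampling: returns (indices, weights) for the next learner."""
    rng = rng or random.Random()
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = round(a * n)
    n_rand = round(b * n)
    top = order[:n_top]                            # always kept
    sampled = rng.sample(order[n_top:], n_rand)    # random subset of the small-gradient rows
    amplify = (1.0 - a) / b                        # compensates for under-sampling them
    return top + sampled, [1.0] * n_top + [amplify] * n_rand


grads = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02, 0.03, -0.6, 0.0, 0.1]
idx, w = goss_sample(grads, a=0.3, b=0.2, rng=random.Random(0))
print(idx[:3])  # [0, 4, 7]: the three rows with the largest |gradient|
print(len(idx), w)
```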
- fit(X, y, metrics=['loss'])
Fits the LGBMClassifier model to the input data by fitting trees to the errors made by previous trees.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.
y (torch.Tensor of shape (n_samples,)) – The labels corresponding to each sample. Every element must be in [0, …, n_classes - 1].
metrics (list[str] | tuple[str], optional) – The names of the metrics to calculate after fitting each tree. Defaults to ['loss']. Only available for binary classification.
- Returns:
The calculated metrics (dict[str, torch.Tensor]) for binary classification, otherwise None.
- Raises:
TypeError – If the input matrix or the label vector is not a PyTorch tensor or if the problem is binary and metrics is not a list or a tuple.
ValueError – If the input matrix or the label vector is not the correct shape or the label vector contains wrong values.
- predict(X)
Applies the fitted LGBMClassifier model to the input data, predicting the class of each sample.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.
- Returns:
The predicted labels corresponding to each sample.
- Return type:
labels (torch.Tensor of shape (n_samples,))
- Raises:
NotFittedError – If the LGBMClassifier model has not been fitted before predicting.
TypeError – If the input matrix is not a PyTorch tensor.
ValueError – If the input matrix is not the correct shape.
- predict_proba(X)
Applies the fitted LGBMClassifier model to the input data, predicting the probabilities of each class.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data to be classified.
- Returns:
The predicted probabilities corresponding to each sample.
- Return type:
probabilities (torch.Tensor of shape (n_samples, n_classes), or of shape (n_samples,) for binary classification)
- Raises:
NotFittedError – If the LGBMClassifier model has not been fitted before predicting.
TypeError – If the input matrix is not a PyTorch tensor.
ValueError – If the input matrix is not the correct shape.
LGBMRegressor
- class DLL.MachineLearning.SupervisedLearning.Trees.LGBMRegressor(n_trees=50, learning_rate=0.5, max_depth=3, min_samples_split=2, n_bins=30, reg_lambda=1, gamma=0, large_error_proportion=0.3, small_error_proportion=0.2, loss='squared', huber_delta=1, max_conflict_rate=0.0, use_efb=True)
Bases:
object
LGBMRegressor implements a regression algorithm fitting many consecutive trees to gradients and hessians of the loss function. The algorithm is based on this paper.
- Parameters:
n_trees (int, optional) – The number of trees used for predicting. Defaults to 50. Must be a positive integer.
learning_rate (float, optional) – The factor by which the output of each additional tree is multiplied. Must be a real number in range (0, 1). Defaults to 0.5.
max_depth (int, optional) – The maximum depth of each tree. Defaults to 3. Must be a positive integer.
min_samples_split (int, optional) – The minimum number of samples a node must contain to be split. Defaults to 2. Must be a positive integer.
n_bins (int, optional) – The number of bins used to find the optimal split of data. Must be greater than 1. Defaults to 30.
reg_lambda (float | int, optional) – The regularisation parameter used in fitting the trees. The larger the parameter, the smaller the trees. Must be a positive real number. Defaults to 1.
gamma (float | int, optional) – The minimum gain to make a split. Must be a non-negative real number. Defaults to 0.
large_error_proportion (float, optional) – The proportion of the data with the largest errors, which is always used to train the next weak learner. Defaults to 0.3.
small_error_proportion (float, optional) – The proportion of data randomly selected from the remaining 1 - large_error_proportion of the data to train the next weak learner. Defaults to 0.2.
loss (string, optional) – The loss function used in calculations of the gradients and hessians. Must be one of “squared”, “absolute” or “huber”. Defaults to “squared”.
huber_delta (float | int, optional) – The delta parameter of the Huber loss. Ignored unless loss is “huber”. Defaults to 1.
max_conflict_rate (float, optional) – The maximum proportion of samples in which features may both be nonzero while still being bundled together. Ignored if use_efb=False. Defaults to 0.0.
use_efb (bool, optional) – Determines if the exclusive feature bundling algorithm is used. Defaults to True.
- n_features
The number of features. Available after fitting.
- Type:
int
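Exclusive feature bundling (EFB) merges sparse features that are rarely nonzero in the same sample, so fewer histograms are needed. The sketch below is a simplified greedy version. It assumes a feature joins the first bundle whose accumulated conflicts (samples where both are nonzero) stay within max_conflict_rate * n_samples. It orders features by nonzero count rather than by conflict-graph degree and omits the value offsets used to merge bundled columns. The helper greedy_bundles is hypothetical.

```python
def nonzero_rows(column):
    return {i for i, v in enumerate(column) if v != 0}


def greedy_bundles(columns, max_conflict_rate=0.0):
    """Group feature columns into bundles whose accumulated conflicts stay within the budget."""
    n_samples = len(columns[0])
    budget = max_conflict_rate * n_samples
    order = sorted(range(len(columns)), key=lambda j: len(nonzero_rows(columns[j])), reverse=True)
    bundles = []  # each entry: [feature indices, rows nonzero in the bundle, conflicts used]
    for j in order:
        rows = nonzero_rows(columns[j])
        for bundle in bundles:
            conflicts = len(rows & bundle[1])
            if bundle[2] + conflicts <= budget:
                bundle[0].append(j)
                bundle[1] |= rows
                bundle[2] += conflicts
                break
        else:
            bundles.append([[j], set(rows), 0])  # no compatible bundle: start a new one
    return [sorted(b[0]) for b in bundles]


# Four sparse features over six samples; features 0 and 1 never overlap, nor do 2 and 3.
columns = [
    [1, 0, 2, 0, 0, 0],
    [0, 3, 0, 1, 0, 0],
    [0, 0, 1, 0, 4, 0],
    [1, 0, 0, 0, 0, 5],
]
print(greedy_bundles(columns, 0.0))  # [[0, 1], [2, 3]]
print(greedy_bundles(columns, 0.2))  # [[0, 1, 2], [3]]
```

Allowing a small conflict rate lets more features share a bundle, trading a little accuracy for fewer columns to histogram.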
- fit(X, y, metrics=['loss'])
Fits the LGBMRegressor model to the input data by fitting trees to the errors made by previous trees.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data, where each row is a sample and each column is a feature.
y (torch.Tensor of shape (n_samples,)) – The target values corresponding to each sample.
metrics (list[str] | tuple[str], optional) – The names of the metrics to calculate after fitting each tree. Defaults to ['loss'].
- Returns:
The calculated metrics.
- Return type:
metrics (dict[str, torch.Tensor])
- Raises:
TypeError – If the input matrix or the target vector is not a PyTorch tensor.
ValueError – If the input matrix or the target vector is not the correct shape.
- predict(X)
Applies the fitted LGBMRegressor model to the input data, predicting the target values.
- Parameters:
X (torch.Tensor of shape (n_samples, n_features)) – The input data to be regressed.
- Returns:
The predicted target values corresponding to each sample.
- Return type:
targets (torch.Tensor of shape (n_samples,))
- Raises:
NotFittedError – If the LGBMRegressor model has not been fitted before predicting.
TypeError – If the input matrix is not a PyTorch tensor.
ValueError – If the input matrix is not the correct shape.