- builtins.object
  - GradientBoostedClassifier
class GradientBoostedClassifier(builtins.object)
GradientBoostedClassifier(X=None, y=None, n_estimators: int = 100, learning_rate: float = 0.1, max_depth: int = 3, min_samples_split: int = 2, random_seed: int = None)
A Gradient Boosted Decision Tree Classifier.
This model builds an ensemble of regression trees sequentially. Each tree
is trained to predict the pseudo-residuals (the negative gradients of the
loss function) with respect to the current ensemble's predictions.
Attributes:
X (np.ndarray): Training input features of shape (n_samples, n_features).
y (np.ndarray): Training target class labels of shape (n_samples,).
n_estimators (int): The number of boosting stages (trees) to perform.
learning_rate (float): Shrinkage factor applied to each tree's contribution; smaller values help reduce overfitting.
max_depth (int): Maximum depth of the individual regression tree estimators.
min_samples_split (int): Minimum number of samples required to split an internal node in a tree.
random_seed (int or None): Controls the randomness for reproducibility (currently affects feature selection within trees if applicable).
trees_ (list): The fitted regression tree instances for each boosting stage (and for each class in multi-class problems).
classes_ (np.ndarray): The unique class labels found in the target variable `y`.
n_classes_ (int): The number of unique classes.
init_estimator_ (float or np.ndarray): The initial prediction model (predicts log-odds).
loss_ (str): The loss function used ('log_loss' for binary, 'multinomial' for multi-class).
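The boosting step described above can be illustrated for the binary log-loss case. This is a minimal NumPy sketch, not this class's implementation: the toy labels, the `sigmoid` helper, and the `tree_output` stand-in (used in place of a fitted regression tree's predictions) are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map raw log-odds scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary labels (assumed data, for illustration only).
y = np.array([0, 0, 1, 1, 1])

# Initial model: the constant log-odds of the positive class.
p0 = y.mean()
init_log_odds = np.log(p0 / (1.0 - p0))

# Every sample starts at the initial prediction.
raw_scores = np.full(y.shape, init_log_odds)

# Pseudo-residuals for log loss: the negative gradient with respect to
# the raw score, which simplifies to (y - p).
residuals = y - sigmoid(raw_scores)

# A regression tree would be fitted to the residuals. Here the residuals
# themselves act as a hypothetical stand-in for tree.predict(X).
learning_rate = 0.1
tree_output = residuals
raw_scores = raw_scores + learning_rate * tree_output
```

Each update nudges the raw scores toward the true labels, and `learning_rate` limits how far a single tree can move them.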
Methods defined here:
- __init__(self, X=None, y=None, n_estimators: int = 100, learning_rate: float = 0.1, max_depth: int = 3, min_samples_split: int = 2, random_seed: int = None)
- Initializes the Gradient Boosted Classifier.
Args:
X (array-like): Training input features of shape (n_samples, n_features).
y (array-like): Training target class labels of shape (n_samples,).
n_estimators (int): Number of boosting stages (trees).
learning_rate (float): Shrinkage factor applied to each tree's contribution; smaller values help reduce overfitting.
max_depth (int): Maximum depth of each individual regression tree estimator.
min_samples_split (int): Minimum samples required to split a node in a tree.
random_seed (int, optional): Seed for reproducibility. Defaults to None.
- calculate_metrics(self, y_true, y_pred, y_prob=None)
- Calculate common classification metrics.
Args:
y_true (array-like): True class labels.
y_pred (array-like): Predicted class labels.
y_prob (array-like, optional): Predicted probabilities for Log Loss calculation.
Returns:
dict: A dictionary containing calculated metrics (Accuracy, Precision, Recall, F1 Score, Log Loss if applicable).
- decision_function(self, X)
- Compute the raw decision scores (log-odds) for samples in X.
Args:
X (array-like): Input features of shape (n_samples, n_features).
Returns:
np.ndarray: The raw decision scores. Shape (n_samples,) for binary
or (n_samples, n_classes) for multi-class.
- fit(self, X=None, y=None, sample_weight=None, verbose=0)
- Fits the gradient boosted classifier to the training data.
Args:
X (array-like): Training input features of shape (n_samples, n_features).
y (array-like): Training target class labels of shape (n_samples,).
sample_weight (array-like, optional): Sample weights for the training data.
verbose (int): Controls the verbosity of the fitting process.
0 for no output, 1 for basic output.
Returns:
self: The fitted GradientBoostedClassifier instance.
- get_params(self)
- Get the parameters of the GradientBoostedClassifier.
- get_stats(self, y_true, X=None, y_pred=None, verbose=False)
- Calculate and optionally print evaluation metrics. Requires either X or y_pred.
Args:
y_true (array-like): True target values.
X (array-like, optional): Input features to generate predictions if y_pred is not provided.
y_pred (array-like, optional): Pre-computed predicted class labels.
verbose (bool): Whether to print the metrics.
Returns:
dict: A dictionary containing calculated metrics.
- predict(self, X)
- Predicts class labels for input features X.
Args:
X (array-like): Input features of shape (n_samples, n_features).
Returns:
np.ndarray: Predicted class labels of shape (n_samples,).
- predict_proba(self, X)
- Predict class probabilities for samples in X.
Args:
X (array-like): Input features of shape (n_samples, n_features).
Returns:
np.ndarray: Predicted class probabilities. Shape (n_samples, n_classes).
For binary, columns are [P(class 0), P(class 1)].
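The relationship between `decision_function`, `predict_proba`, and `predict` can be sketched as follows. The docstrings state that the raw scores are log-odds and that the binary probability columns are [P(class 0), P(class 1)]. The sigmoid and softmax mappings below are the conventional choices and are assumed here, not confirmed by the source. The helper names `scores_to_proba` and `proba_to_labels` are hypothetical.

```python
import numpy as np

def scores_to_proba(scores):
    """Convert raw decision scores to class probabilities (assumed mapping)."""
    scores = np.asarray(scores, dtype=float)
    if scores.ndim == 1:
        # Binary case: sigmoid of the log-odds gives P(class 1).
        p1 = 1.0 / (1.0 + np.exp(-scores))
        return np.column_stack([1.0 - p1, p1])
    # Multi-class case: softmax over the class axis, shifted by the row
    # maximum for numerical stability.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def proba_to_labels(proba, classes):
    """Pick the most probable class label for each sample."""
    return np.asarray(classes)[np.argmax(proba, axis=1)]

binary_proba = scores_to_proba([0.0, 2.0, -2.0])
labels = proba_to_labels(binary_proba, classes=["neg", "pos"])

multi_proba = scores_to_proba([[1.0, 2.0, 3.0]])
```

In this sketch a tie (score 0.0, probability 0.5 for both classes) resolves to the first class because `np.argmax` returns the first maximal index. How the class itself handles ties is not stated in the source.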
Data descriptors defined here:
- __dict__
- dictionary for instance variables
- __weakref__
- list of weak references to the object
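The metrics that `calculate_metrics` and `get_stats` are documented to return (Accuracy, Precision, Recall, F1 Score, and Log Loss when probabilities are supplied) can be sketched for the binary case as below. This is an illustrative sketch under assumptions: the function name `binary_metrics`, the exact dictionary keys, the treatment of `y_prob` as the positive-class probability, and the zero-division handling are not taken from the source.

```python
import numpy as np

def binary_metrics(y_true, y_pred, y_prob=None, eps=1e-15):
    """Compute common binary classification metrics (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Confusion-matrix counts for the positive class (label 1).
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))

    # Fall back to 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0

    metrics = {
        "Accuracy": float(np.mean(y_true == y_pred)),
        "Precision": float(precision),
        "Recall": float(recall),
        "F1 Score": float(f1),
    }
    if y_prob is not None:
        # Clip probabilities so the logarithm stays finite.
        p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
        loss = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
        metrics["Log Loss"] = float(loss)
    return metrics

metrics = binary_metrics([1, 0, 1, 1], [1, 0, 0, 1], y_prob=[0.9, 0.2, 0.4, 0.8])
```

Log Loss is only added when `y_prob` is given, mirroring the "if applicable" wording in the `calculate_metrics` docstring.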