Python: module sega_learn.trees.gradientBoostedClassifier

# sega_learn/trees/gradientBoostedClassifier.py

Classes

GradientBoostedClassifier

class GradientBoostedClassifier(builtins.object)

GradientBoostedClassifier(X=None, y=None, n_estimators: int = 100, learning_rate: float = 0.1, max_depth: int = 3, min_samples_split: int = 2, random_seed: int = None)

A Gradient Boosted Decision Tree Classifier.

This model builds an ensemble of regression trees sequentially. Each tree
is trained to predict the pseudo-residuals (gradients of the loss function)
of the previous model's predictions.
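The sequential procedure described above can be sketched in plain Python. This is a minimal illustration and not the sega_learn implementation: it assumes binary labels in {0, 1}, a single numeric feature, and depth-1 regression stumps whose leaf values are plain means of the residuals. The names fit_stump, fit_boosted and predict_label are hypothetical.

```python
import math

def sigmoid(z):
    # Logistic function; maps a raw score (log-odds) to a probability.
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, targets):
    # Least-squares regression stump on one feature.
    # Assumes xs holds at least two distinct values.
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = (sum((t - left_value) ** 2 for t in left)
               + sum((t - right_value) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]

def fit_boosted(xs, ys, n_estimators=20, learning_rate=0.5):
    # Initial score: log-odds of the positive class (assumes both classes occur).
    p = sum(ys) / len(ys)
    init = math.log(p / (1.0 - p))
    scores = [init] * len(xs)
    stumps = []
    for _ in range(n_estimators):
        # Pseudo-residuals of binary log loss: y - sigmoid(current score).
        residuals = [y - sigmoid(f) for y, f in zip(ys, scores)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        # Shrink each stump's contribution by the learning rate.
        scores = [f + learning_rate * (left_value if x <= threshold else right_value)
                  for x, f in zip(xs, scores)]
    return init, learning_rate, stumps

def predict_label(x, model):
    init, learning_rate, stumps = model
    score = init + sum(learning_rate * (lv if x <= thr else rv)
                       for thr, lv, rv in stumps)
    return 1 if score > 0 else 0

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]
model = fit_boosted(xs, ys)
predictions = [predict_label(x, model) for x in xs]
```

Production implementations typically replace the plain residual mean in each leaf with a Newton-style update; whether this class does so is not stated here.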

Attributes:
    X (np.ndarray): Training input features of shape (n_samples, n_features).
    y (np.ndarray): Training target class labels of shape (n_samples,).
    n_estimators (int): The number of boosting stages (trees) to perform.
    learning_rate (float): Shrinkage factor applied to each tree's contribution; smaller values help prevent overfitting.
    max_depth (int): Maximum depth of the individual regression tree estimators.
    min_samples_split (int): Minimum number of samples required to split an internal node in a tree.
    random_seed (int or None): Controls the randomness for reproducibility (currently affects feature selection within trees if applicable).
    trees_ (list): List storing the fitted regression tree instances for each boosting stage (and for each class in multiclass).
    classes_ (np.ndarray): The unique class labels found in the target variable `y`.
    n_classes_ (int): The number of unique classes.
    init_estimator_ (float or np.ndarray): The initial prediction model (predicts log-odds).
    loss_ (str): The loss function used ('log_loss' for binary, 'multinomial' for multi-class).
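For the multi-class case, where one tree per class is stored at each stage, the pseudo-residuals of a multinomial loss are commonly computed as the one-hot label minus the softmax of the raw scores. The sketch below shows that computation for a single sample. It is an assumption about the standard technique, not a statement of how this class computes them, and the helper names are hypothetical.

```python
import math

def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multiclass_residuals(label_index, scores):
    # One-hot target minus predicted probability, one value per class.
    probs = softmax(scores)
    return [(1.0 if k == label_index else 0.0) - p
            for k, p in enumerate(probs)]

# One sample whose true class is index 2, with all raw scores at zero.
residuals = multiclass_residuals(label_index=2, scores=[0.0, 0.0, 0.0])
```

Each class's tree at a given stage would then be fitted to its own column of these residuals.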

Methods defined here:

__init__(self, X=None, y=None, n_estimators: int = 100, learning_rate: float = 0.1, max_depth: int = 3, min_samples_split: int = 2, random_seed: int = None): Initializes the Gradient Boosted Classifier.

Args:
    X (array-like): Training input features of shape (n_samples, n_features).
    y (array-like): Training target class labels of shape (n_samples,).
    n_estimators (int): Number of boosting stages (trees).
    learning_rate (float): Shrinkage factor applied to each tree's contribution; smaller values help prevent overfitting.
    max_depth (int): Maximum depth of each individual regression tree estimator.
    min_samples_split (int): Minimum samples required to split a node in a tree.
    random_seed (int, optional): Seed for reproducibility. Defaults to None.

calculate_metrics(self, y_true, y_pred, y_prob=None): Calculate common classification metrics.

Args:
    y_true (array-like): True class labels.
    y_pred (array-like): Predicted class labels.
    y_prob (array-like, optional): Predicted probabilities for Log Loss calculation.

Returns:
    dict: A dictionary containing calculated metrics (Accuracy, Precision, Recall, F1 Score, Log Loss if applicable).
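As a rough illustration of what such a metrics dictionary might contain, the sketch below computes the listed quantities for the binary case only. It assumes y_prob holds the probability of the positive class and that class 1 is the positive label; the function name binary_metrics and both assumptions are illustrative and may differ from the library's behaviour.

```python
import math

def binary_metrics(y_true, y_pred, y_prob=None, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    metrics = {
        "Accuracy": correct / len(y_true),
        "Precision": precision,
        "Recall": recall,
        "F1 Score": f1,
    }
    if y_prob is not None:
        eps = 1e-15  # clip probabilities so log() stays finite
        total = 0.0
        for t, p in zip(y_true, y_prob):
            p = min(max(p, eps), 1.0 - eps)
            total += -math.log(p) if t == positive else -math.log(1.0 - p)
        metrics["Log Loss"] = total / len(y_true)
    return metrics

metrics = binary_metrics([1, 0, 1, 1], [1, 0, 0, 1], [0.9, 0.2, 0.4, 0.8])
```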

decision_function(self, X): Compute the raw decision scores (log-odds) for samples in X.

Args:
    X (array-like): Input features of shape (n_samples, n_features).

Returns:
    np.ndarray: The raw decision scores. Shape (n_samples,) for binary
                or (n_samples, n_classes) for multi-class.
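In a boosted ensemble, the raw score is usually the initial estimate plus the learning-rate-scaled sum of the tree outputs. The sketch below shows that additive form for one binary sample, with trees stood in by plain callables. The function raw_score and its arguments are hypothetical and only illustrate the idea.

```python
def raw_score(x, init_score, trees, learning_rate):
    # Start from the initial log-odds and add each tree's shrunken output.
    score = init_score
    for tree in trees:
        score += learning_rate * tree(x)
    return score

# Two stand-in "trees": any callable mapping a feature vector to a float.
trees = [
    lambda x: 1.0 if x[0] > 0 else -1.0,
    lambda x: 0.5,
]
score = raw_score([2.0], init_score=0.2, trees=trees, learning_rate=0.1)
```

A positive score corresponds to a predicted probability above 0.5 for the positive class once the sigmoid is applied.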

fit(self, X=None, y=None, sample_weight=None, verbose=0): Fits the gradient boosted classifier to the training data.

Args:
    X (array-like): Training input features of shape (n_samples, n_features).
    y (array-like): Training target class labels of shape (n_samples,).
    sample_weight (array-like, optional): Sample weights for the training data.
    verbose (int): Controls the verbosity of the fitting process.
                   0 for no output, 1 for basic output.

Returns:
    self: The fitted GradientBoostedClassifier instance.

get_params(self): Get the parameters of the GradientBoostedClassifier.

get_stats(self, y_true, X=None, y_pred=None, verbose=False): Calculate and optionally print evaluation metrics. Requires either X or y_pred.

Args:
    y_true (array-like): True target values.
    X (array-like, optional): Input features to generate predictions if y_pred is not provided.
    y_pred (array-like, optional): Pre-computed predicted class labels.
    verbose (bool): Whether to print the metrics.

Returns:
    dict: A dictionary containing calculated metrics.

predict(self, X): Predicts class labels for input features X.

Args:
    X (array-like): Input features of shape (n_samples, n_features).

Returns:
    np.ndarray: Predicted class labels of shape (n_samples,).
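Label prediction is typically the class with the highest predicted probability, mapped back through the stored class labels. A minimal sketch, assuming probabilities are given as rows with one column per entry of classes (the helper predict_labels is hypothetical):

```python
def predict_labels(probabilities, classes):
    labels = []
    for row in probabilities:
        # Index of the largest probability in this row.
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(classes[best])
    return labels

classes = ["cat", "dog", "fox"]
probabilities = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
labels = predict_labels(probabilities, classes)
```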

predict_proba(self, X): Predict class probabilities for samples in X.

Args:
    X (array-like): Input features of shape (n_samples, n_features).

Returns:
    np.ndarray: Predicted class probabilities. Shape (n_samples, n_classes).
                For binary, columns are [P(class 0), P(class 1)].
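For the binary case, the two-column layout can be obtained by passing the raw score through the logistic sigmoid and pairing it with its complement. The following is a sketch under that assumption, written with the standard library rather than scipy's expit; binary_proba is a hypothetical name.

```python
import math

def sigmoid(z):
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def binary_proba(raw_scores):
    rows = []
    for z in raw_scores:
        p1 = sigmoid(z)
        rows.append([1.0 - p1, p1])  # columns: [P(class 0), P(class 1)]
    return rows

proba = binary_proba([0.0, 3.0, -3.0])
```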

Data
    sigmoid = <ufunc 'expit'>