Python: package sega

sega_learn.trees

Package Contents

adaBoostClassifier
adaBoostRegressor
gradientBoostedClassifier
gradientBoostedRegressor
isolationForest
randomForestClassifier
randomForestRegressor
treeClassifier
treeRegressor

Classes

builtins.object

sega_learn.trees.adaBoostClassifier.AdaBoostClassifier
sega_learn.trees.adaBoostRegressor.AdaBoostRegressor
sega_learn.trees.gradientBoostedClassifier.GradientBoostedClassifier
sega_learn.trees.gradientBoostedRegressor.GradientBoostedRegressor
sega_learn.trees.isolationForest.IsolationForest
sega_learn.trees.isolationForest.IsolationTree
sega_learn.trees.isolationForest.IsolationUtils
sega_learn.trees.randomForestClassifier.RandomForestClassifier
sega_learn.trees.randomForestRegressor.RandomForestRegressor
sega_learn.trees.treeClassifier.ClassifierTree
sega_learn.trees.treeClassifier.ClassifierTreeUtility
sega_learn.trees.treeRegressor.RegressorTree
sega_learn.trees.treeRegressor.RegressorTreeUtility

class AdaBoostClassifier(builtins.object)

AdaBoostClassifier(base_estimator=None, n_estimators=50, learning_rate=1.0, random_state=None, max_depth=3, min_samples_split=2)

AdaBoost classifier.

Builds an additive model by sequentially fitting weak classifiers (default: decision stumps)
on modified versions of the data. Each subsequent classifier focuses more on samples
that were misclassified by the previous ensemble.

Uses the SAMME algorithm which supports multi-class classification.

Attributes:
    base_estimator_ (object): The base estimator template used for fitting.
    n_estimators (int): The maximum number of estimators at which boosting is terminated.
    learning_rate (float): Weight applied to each classifier's contribution.
    estimators_ (list): The collection of fitted base estimators.
    estimator_weights_ (np.ndarray): Weights for each estimator.
    estimator_errors_ (np.ndarray): Classification error for each estimator.
    classes_ (np.ndarray): The class labels.
    n_classes_ (int): The number of classes.
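
The SAMME weighting referred to above can be sketched as follows. This is a minimal illustration of the textbook SAMME update, not the code inside `AdaBoostClassifier`; the helper names `samme_estimator_weight` and `samme_reweight` are hypothetical and exist only for this example.

```python
import math


def samme_estimator_weight(error, n_classes, learning_rate=1.0):
    """Weight (alpha) of one weak classifier under the textbook SAMME rule.

    Assumes n_classes >= 2. The error is clipped away from 0 and 1 so the
    logarithm stays finite.
    """
    eps = 1e-10
    error = min(max(error, eps), 1.0 - eps)
    return learning_rate * (
        math.log((1.0 - error) / error) + math.log(n_classes - 1.0)
    )


def samme_reweight(weights, misclassified, alpha):
    """Scale up the weights of misclassified samples, then renormalize."""
    scaled = [w * math.exp(alpha) if m else w for w, m in zip(weights, misclassified)]
    total = sum(scaled)
    return [w / total for w in scaled]
```

With two classes the `log(n_classes - 1)` term vanishes and the rule reduces to the classic binary AdaBoost weight.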

Methods defined here:

__init__(self, base_estimator=None, n_estimators=50, learning_rate=1.0, random_state=None, max_depth=3, min_samples_split=2): Initialize the AdaBoostClassifier.

Args:
    base_estimator (object, optional): The base estimator from which the boosted ensemble is built.
                                      Support for sample weighting is required. If None, a
                                      `ClassifierTree` configured with `max_depth` and `min_samples_split` is used.
    n_estimators (int, optional): The maximum number of estimators at which boosting is terminated.
                                  In case of perfect fit, the learning procedure is stopped early. Defaults to 50.
    learning_rate (float, optional): Weight applied to each classifier's contribution. Defaults to 1.0.
    random_state (int, optional): Controls the random seed given to the base estimator at each boosting iteration.
                                  Defaults to None.
    max_depth (int, optional): The maximum depth of the base estimator. Defaults to 3.
    min_samples_split (int, optional): The minimum number of samples required to split an internal node
                                       when using the default `ClassifierTree` base estimator. Defaults to 2.

decision_function(self, X): Compute the decision function of X.

fit(self, X, y): Build a boosted classifier from the training set (X, y).

get_stats(self, y_true, X=None, y_pred=None, verbose=False): Calculate and optionally print evaluation metrics. Requires either X or y_pred.

predict(self, X): Predict classes for X.

predict_proba(self, X): Predict class probabilities for X.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class AdaBoostRegressor(builtins.object)

AdaBoostRegressor(base_estimator=None, n_estimators=50, learning_rate=1.0, loss='linear', random_state=None, max_depth=3, min_samples_split=2)

AdaBoost regressor.

Builds an additive model by sequentially fitting weak regressors (default: decision trees)
on modified versions of the data. The weights of instances are adjusted at each iteration
so that subsequent regressors focus more on instances with larger errors.

Uses the AdaBoost.R2 algorithm.

Attributes:
    base_estimator_ (object): The base estimator template used for fitting.
    n_estimators (int): The maximum number of estimators at which boosting is terminated.
    learning_rate (float): Contribution of each regressor to the final prediction.
    loss (str): The loss function to use when updating the weights ('linear', 'square', 'exponential').
    estimators_ (list): The collection of fitted base estimators.
    estimator_weights_ (np.ndarray): Weights for each estimator (alpha values, specifically log(1/beta)).
    estimator_errors_ (np.ndarray): Loss value for each estimator on the weighted training data.
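
The three loss options and the `log(1/beta)` estimator weight can be illustrated with a short sketch of the textbook AdaBoost.R2 formulas. It is an assumption that `AdaBoostRegressor` follows these formulas exactly; the function names `r2_sample_losses` and `r2_estimator_weight` are hypothetical.

```python
import math


def r2_sample_losses(abs_errors, loss="linear"):
    """Per-sample loss in [0, 1], scaled by the largest absolute error."""
    max_err = max(abs_errors)
    if max_err == 0:
        return [0.0 for _ in abs_errors]
    scaled = [e / max_err for e in abs_errors]
    if loss == "linear":
        return scaled
    if loss == "square":
        return [s * s for s in scaled]
    if loss == "exponential":
        return [1.0 - math.exp(-s) for s in scaled]
    raise ValueError("loss must be 'linear', 'square', or 'exponential'")


def r2_estimator_weight(sample_weights, losses, learning_rate=1.0):
    """Return learning_rate * log(1 / beta), where beta = L / (1 - L).

    L is the weighted average loss and must lie strictly between 0 and 1.
    """
    total = sum(sample_weights)
    avg_loss = sum(w * l for w, l in zip(sample_weights, losses)) / total
    beta = avg_loss / (1.0 - avg_loss)
    return learning_rate * math.log(1.0 / beta)
```

In the original AdaBoost.R2 formulation, boosting stops once the average loss reaches 0.5, because the estimator weight would then be zero or negative.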

Methods defined here:

__init__(self, base_estimator=None, n_estimators=50, learning_rate=1.0, loss='linear', random_state=None, max_depth=3, min_samples_split=2): Initialize the AdaBoostRegressor.

Args:
    base_estimator (object, optional): The base estimator from which the boosted ensemble is built.
                                      Support for sample weighting is required. If None, then
                                      the base estimator is DecisionTreeRegressor(max_depth=3).
    n_estimators (int, optional): The maximum number of estimators. Defaults to 50.
    learning_rate (float, optional): Shrinks the contribution of each regressor by learning_rate. Defaults to 1.0.
    loss (str, optional): The loss function to use when updating sample weights ('linear', 'square', 'exponential').
                          Defaults to 'linear'.
    random_state (int, optional): Controls the random seed. Defaults to None.
    max_depth (int, optional): Maximum depth of the base estimator. Defaults to 3.
    min_samples_split (int, optional): Minimum number of samples required to split an internal node. Defaults to 2.

fit(self, X, y): Build a boosted regressor from the training set (X, y).

get_stats(self, y_true, X=None, y_pred=None, verbose=False): Calculate and optionally print evaluation metrics. Requires either X or y_pred.

predict(self, X): Predict regression target for X.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class ClassifierTree(builtins.object)

ClassifierTree(max_depth=5, min_samples_split=2)

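A class representing a decision tree for classification.

Args:
    max_depth: (int) - The maximum depth of the decision tree.
    min_samples_split: (int) - The minimum number of samples required to split an internal node.

Methods:
    learn(X, y, par_node=None, depth=0, sample_weight=None): Builds the decision tree based on the given training data.
    classify(tree, record): Classifies a record using the decision tree.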

Methods defined here:

__init__(self, max_depth=5, min_samples_split=2): Initializes the ClassifierTree with a maximum depth and a minimum number of samples per split.

fit(self, X, y, sample_weight=None): Fits the decision tree to the training data.

Args:
    X: (array-like) - The input features.
    y: (array-like) - The target labels.
    sample_weight: (array-like) - The sample weights (default: None).

learn(self, X, y, par_node=None, depth=0, sample_weight=None): Builds the decision tree based on the given training data.

Args:
    X: (array-like) - The input features.
    y: (array-like) - The target labels.
    par_node: (dict) - The parent node of the current subtree (default: None).
    depth: (int) - The current depth of the subtree (default: 0).
    sample_weight: (array-like) - The sample weights (default: None).

Returns:
    dict: The learned decision tree.

predict(self, X): Predicts the labels for a given set of records using the decision tree.

Args:
    X: (array-like) - The input features.

Returns:
    list: A list of predicted labels for each record.

predict_proba(self, X): Predicts the probabilities for a given set of records using the decision tree.

Args:
    X: (array-like) - The input features.

Returns:
    list: A list of dictionaries where each dictionary represents the probability distribution
          over the classes for a record.

Static methods defined here:

classify(tree, record): Classifies a given record using the decision tree.

Args:
    tree: (dict) - The decision tree.
    record: (dict) - A dictionary representing the record to be classified.

Returns:
    The label assigned to the record based on the decision tree.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class ClassifierTreeUtility(builtins.object)

ClassifierTreeUtility(min_samples_split=2)

Utility class for computing entropy, partitioning classes, and calculating information gain.

Methods defined here:

__init__(self, min_samples_split=2): Initialize the utility class.

best_split(self, X, y, sample_weight=None): Finds the best attribute and value to split the data based on information gain.

Args:
    X: (array-like) - The input features.
    y: (array-like) - The target variable.
    sample_weight: (array-like) - The sample weights (default: None).

Returns:
    dict: A dictionary containing the best split attribute, split value, left and right subsets of X and y,
          and the information gain achieved by the split.

entropy(self, class_y, sample_weight=None): Computes the entropy for a given class.

Args:
    class_y: (array-like) - The class labels.
    sample_weight: (array-like) - The sample weights (default: None).

Returns:
    float: The entropy value.
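
A weighted entropy of this kind can be sketched as below. This is an illustration under assumptions, not the method's actual body: it uses base-2 logarithms and treats a missing `sample_weight` as uniform weights, neither of which the documentation above confirms. The name `weighted_entropy` is hypothetical.

```python
import math
from collections import defaultdict


def weighted_entropy(class_y, sample_weight=None):
    """Shannon entropy (in bits) of the labels, with optional sample weights."""
    if len(class_y) == 0:
        return 0.0
    if sample_weight is None:
        sample_weight = [1.0] * len(class_y)

    # Accumulate the total weight observed for each class label.
    weight_per_class = defaultdict(float)
    for label, w in zip(class_y, sample_weight):
        weight_per_class[label] += w

    total = sum(weight_per_class.values())
    if total == 0:
        return 0.0

    result = 0.0
    for w in weight_per_class.values():
        if w > 0:
            p = w / total
            result -= p * math.log2(p)
    return result
```

A pure node has entropy 0, and a balanced two-class node has entropy 1 bit.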

information_gain(self, previous_y, current_y, sample_weight_prev=None, sample_weight_current=None): Calculates the information gain between the previous and current values of y.

Args:
    previous_y: (array-like) - The previous values of y.
    current_y: (array-like) - The current values of y.
    sample_weight_prev: (array-like) - The sample weights for the previous y values (default: None).
    sample_weight_current: (array-like) - The sample weights for the current y values (default: None).

Returns:
    float: The information gain between the previous and current values of y.
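
As a rough sketch, information gain is the parent entropy minus the size-weighted entropy of the child partitions. The example below assumes `current_y` is a list of child label lists (for instance, left and right) and ignores sample weights; both are simplifying assumptions rather than documented behavior.

```python
import math
from collections import Counter


def entropy(labels):
    """Unweighted Shannon entropy in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(previous_y, current_y):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = len(previous_y)
    if n == 0:
        return 0.0
    children = sum(len(part) / n * entropy(part) for part in current_y)
    return entropy(previous_y) - children
```

A split that separates the classes perfectly recovers the full parent entropy, while a split that leaves each child as mixed as the parent yields zero gain.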

partition_classes(self, X, y, split_attribute, split_val, sample_weight=None): Partitions the dataset into two subsets based on a given split attribute and value.

Args:
    X: (array-like) - The input features.
    y: (array-like) - The target labels.
    split_attribute: (int) - The index of the attribute to split on.
    split_val: (float) - The value to split the attribute on.
    sample_weight: (array-like) - The sample weights (default: None).

Returns:
    X_left:  (array-like) - The subset of input features where the split attribute is less than or equal to the split value.
    X_right: (array-like) - The subset of input features where the split attribute is greater than the split value.
    y_left:  (array-like) - The subset of target labels corresponding to X_left.
    y_right: (array-like) - The subset of target labels corresponding to X_right.
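
The partition rule described above (values less than or equal to `split_val` go left, the rest go right) can be sketched in plain Python. This standalone function, named `partition_rows` here to mark it as hypothetical, assumes numeric attributes and omits sample weights.

```python
def partition_rows(X, y, split_attribute, split_val):
    """Split rows of X (and matching labels) on one numeric attribute."""
    X_left, X_right, y_left, y_right = [], [], [], []
    for row, label in zip(X, y):
        if row[split_attribute] <= split_val:
            X_left.append(row)
            y_left.append(label)
        else:
            X_right.append(row)
            y_right.append(label)
    return X_left, X_right, y_left, y_right
```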

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class GradientBoostedClassifier(builtins.object)

GradientBoostedClassifier(X=None, y=None, n_estimators: int = 100, learning_rate: float = 0.1, max_depth: int = 3, min_samples_split: int = 2, random_seed: int = None)

A Gradient Boosted Decision Tree Classifier.

This model builds an ensemble of regression trees sequentially. Each tree
is trained to predict the pseudo-residuals (gradients of the loss function)
of the previous model's predictions.

Attributes:
    X (np.ndarray): Training input features of shape (n_samples, n_features).
    y (np.ndarray): Training target class labels of shape (n_samples,).
    n_estimators (int): The number of boosting stages (trees) to perform.
    learning_rate (float): Step size shrinkage to prevent overfitting.
    max_depth (int): Maximum depth of the individual regression tree estimators.
    min_samples_split (int): Minimum number of samples required to split an internal node in a tree.
    random_seed (int or None): Controls the randomness for reproducibility (currently affects feature selection within trees if applicable).
    trees_ (list): List storing the fitted regression tree instances for each boosting stage (and for each class in multiclass).
    classes_ (np.ndarray): The unique class labels found in the target variable `y`.
    n_classes_ (int): The number of unique classes.
    init_estimator_ (float or np.ndarray): The initial prediction model (predicts log-odds).
    loss_ (str): The loss function used ('log_loss' for binary, 'multinomial' for multi-class).
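
For the binary case, the log-odds initialization and the pseudo-residuals can be sketched as follows. This assumes labels encoded as 0/1 and the standard log-loss gradient; it illustrates the idea and is not taken from the class's source. All function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def initial_log_odds(y):
    """Constant starting score: the log-odds of the positive class.

    Assumes y contains both 0s and 1s, so the ratio is finite.
    """
    p = sum(y) / len(y)
    return math.log(p / (1.0 - p))


def pseudo_residuals(y, raw_scores):
    """Negative gradient of log loss: label minus predicted probability."""
    return [yi - sigmoid(f) for yi, f in zip(y, raw_scores)]
```

Each regression tree in the ensemble would then be fitted to these residuals, and its scaled output added to the raw scores.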

Methods defined here:

__init__(self, X=None, y=None, n_estimators: int = 100, learning_rate: float = 0.1, max_depth: int = 3, min_samples_split: int = 2, random_seed: int = None): Initializes the Gradient Boosted Classifier.

Args:
    X (array-like): Training input features of shape (n_samples, n_features).
    y (array-like): Training target class labels of shape (n_samples,).
    n_estimators (int): Number of boosting stages (trees).
    learning_rate (float): Step size shrinkage to prevent overfitting.
    max_depth (int): Maximum depth of each individual regression tree estimator.
    min_samples_split (int): Minimum samples required to split a node in a tree.
    random_seed (int, optional): Seed for reproducibility. Defaults to None.

calculate_metrics(self, y_true, y_pred, y_prob=None): Calculate common classification metrics.

Args:
    y_true (array-like): True class labels.
    y_pred (array-like): Predicted class labels.
    y_prob (array-like, optional): Predicted probabilities for Log Loss calculation.

Returns:
    dict: A dictionary containing calculated metrics (Accuracy, Precision, Recall, F1 Score, Log Loss if applicable).

decision_function(self, X): Compute the raw decision scores (log-odds) for samples in X.

Args:
    X (array-like): Input features of shape (n_samples, n_features).

Returns:
    np.ndarray: The raw decision scores. Shape (n_samples,) for binary
                or (n_samples, n_classes) for multi-class.

fit(self, X=None, y=None, sample_weight=None, verbose=0): Fits the gradient boosted classifier to the training data.

Args:
    X (array-like): Training input features of shape (n_samples, n_features).
    y (array-like): Training target class labels of shape (n_samples,).
    sample_weight (array-like, optional): Sample weights for the training data.
    verbose (int): Controls the verbosity of the fitting process.
                   0 for no output, 1 for basic output.

Returns:
    self: The fitted GradientBoostedClassifier instance.

get_params(self): Get the parameters of the GradientBoostedClassifier.

get_stats(self, y_true, X=None, y_pred=None, verbose=False): Calculate and optionally print evaluation metrics. Requires either X or y_pred.

Args:
    y_true (array-like): True target values.
    X (array-like, optional): Input features to generate predictions if y_pred is not provided.
    y_pred (array-like, optional): Pre-computed predicted class labels.
    verbose (bool): Whether to print the metrics.

Returns:
    dict: A dictionary containing calculated metrics.

predict(self, X): Predicts class labels for input features X.

Args:
    X (array-like): Input features of shape (n_samples, n_features).

Returns:
    np.ndarray: Predicted class labels of shape (n_samples,).

predict_proba(self, X): Predict class probabilities for samples in X.

Args:
    X (array-like): Input features of shape (n_samples, n_features).

Returns:
    np.ndarray: Predicted class probabilities. Shape (n_samples, n_classes).
                For binary, columns are [P(class 0), P(class 1)].

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class GradientBoostedRegressor(builtins.object)

GradientBoostedRegressor(X=None, y=None, num_trees: int = 100, max_depth: int = 3, learning_rate: float = 0.1, min_samples_split: int = 2, random_seed: int = None)

A class to represent a Gradient Boosted Decision Tree Regressor.

Attributes:
    random_seed (int): The random seed for the random number generator.
    num_trees (int): The number of decision trees in the ensemble.
    max_depth (int): The maximum depth of each decision tree.
    learning_rate (float): The learning rate for the gradient boosted model.
    min_samples_split (int): The minimum number of samples required to split a node.

Methods:
    fit(X=None, y=None, sample_weight=None, verbose=0): Fits the gradient boosted decision tree regressor to the training data.
    predict(X): Predicts the target values for the input features.
    calculate_metrics(y_true, y_pred): Calculates the evaluation metrics.
    get_stats(y_true, y_pred, verbose=False): Returns the evaluation metrics.

Methods defined here:

__init__(self, X=None, y=None, num_trees: int = 100, max_depth: int = 3, learning_rate: float = 0.1, min_samples_split: int = 2, random_seed: int = None): Initializes the Gradient Boosted Decision Tree Regressor.

Args:
    X: (np.ndarray), optional - Input feature data (default is None).
    y: (np.ndarray), optional - Target data (default is None).
    num_trees (int): Number of boosting stages (trees).
    max_depth (int): Maximum depth of each individual tree regressor.
    learning_rate (float): Step size shrinkage to prevent overfitting.
    min_samples_split (int): Minimum samples required to split a node.
    random_seed (int): Seed for reproducibility (currently affects feature selection within trees).

calculate_metrics(self, y_true, y_pred): Calculate common regression metrics.

Args:
    y_true (array-like): True target values.
    y_pred (array-like): Predicted target values.

Returns:
    dict: A dictionary containing calculated metrics (MSE, R^2, MAE, RMSE, MAPE).
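
The five metrics named above have standard definitions, sketched here in plain Python. The dictionary keys and the choice to report MAPE as a percentage are assumptions for illustration; the actual method may use different keys or scaling. The sketch also assumes no true value is zero, since MAPE divides by `y_true`.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, R^2, MAE, RMSE, and MAPE from two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    # R^2 compares the squared error against that of predicting the mean.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot > 0 else 0.0

    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"MSE": mse, "R^2": r2, "MAE": mae, "RMSE": math.sqrt(mse), "MAPE": mape}
```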

fit(self, X=None, y=None, sample_weight=None, verbose=0): Fits the gradient boosted decision tree regressor to the training data.

This method trains the ensemble of decision trees by iteratively fitting each tree to the residuals
of the previous iteration. The residuals are updated after each iteration by subtracting the predictions
made by the current tree from the target values.

Args:
    X (array-like): Training input features of shape (n_samples, n_features).
    y (array-like): Training target values of shape (n_samples,).
    sample_weight (array-like): Sample weights for each instance (not used in this implementation).
    verbose (int): Verbosity level. 0 for no output, 1 for progress messages (e.g., residuals), >1 for detailed output.

Returns:
    self: The fitted GradientBoostedRegressor instance.
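
The residual-fitting loop can be illustrated with a toy version that boosts one-split stumps on a single feature. This is a simplified sketch, not the regressor's implementation: it assumes 1-D inputs with at least two distinct values, squared-error loss, and a mean-value starting prediction. The names `fit_stump`, `boost`, and `boosted_predict` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the threshold on 1-D xs that minimizes squared error of the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:  # every candidate leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, num_trees=100, learning_rate=0.1):
    """Fit stumps sequentially, each to the residuals of the running prediction."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(num_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((t, left_mean, right_mean))
        preds = [
            p + learning_rate * (left_mean if x <= t else right_mean)
            for x, p in zip(xs, preds)
        ]
    return base, stumps


def boosted_predict(base, stumps, x, learning_rate=0.1):
    """Sum the base value and the shrunken contribution of every stump."""
    return base + learning_rate * sum(lm if x <= t else rm for t, lm, rm in stumps)
```

Because each stump only corrects a fraction (`learning_rate`) of the remaining residual, the training error shrinks geometrically rather than in one step.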

get_params(self): Get the parameters of the GradientBoostedRegressor.

get_stats(self, y_true, y_pred, verbose=False): Calculate and optionally print evaluation metrics.

Args:
    y_true (array-like): True target values.
    y_pred (array-like): Predicted target values.
    verbose (bool): Whether to print the metrics.

Returns:
    dict: A dictionary containing calculated metrics (MSE, R^2, MAE, RMSE, MAPE).

predict(self, X): Predicts target values for input features X using the fitted GBR model.

Args:
    X (array-like): Input features of shape (n_samples, n_features).

Returns:
    np.ndarray: Predicted target values of shape (n_samples,).

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class IsolationForest(builtins.object)

IsolationForest(n_trees=100, max_samples=None, max_depth=10, n_jobs=1, force_true_length=False)

IsolationForest is an implementation of the Isolation Forest algorithm for anomaly detection.

Attributes:
    n_trees (int): The number of isolation trees to build. Default is 100.
    max_samples (int or None): The maximum number of samples to draw for each tree. If None, defaults to the minimum of 256 or the number of samples in the dataset.
    max_depth (int): The maximum depth of each isolation tree. Default is 10.
    n_jobs (int): The number of parallel jobs to run. Set to -1 to use all available cores. Default is 1.
    force_true_length (bool): Whether to force the true path length calculation. Default is False.
    trees (list): A list to store the trained isolation trees.
    classes_ (numpy.ndarray): An array representing the classes (0 for normal, 1 for anomaly).

Methods:
    __init__(n_trees=100, max_samples=None, max_depth=10, n_jobs=1, force_true_length=False):
        Initializes the IsolationForest with the specified parameters.
    fit(X, y=None):
        Fits the isolation forest to the data.
        Args:
            X (array-like): The input features.
            y (array-like): The target labels (not used in this implementation).
    _fit_tree(X):
        Fits a single isolation tree to a subset of the data.
        Args:
            X (array-like): The input features.
        Returns:
            IsolationTree: A trained isolation tree.
    anomaly_score(X):
        Computes the anomaly scores for given samples.
        Args:
            X (array-like): The input samples.
        Returns:
            numpy.ndarray: An array of anomaly scores.
    predict(X, threshold=0.5):
        Predicts whether samples are anomalies.
        Args:
            X (array-like): The input samples.
            threshold (float): The threshold for classifying anomalies (default: 0.5).
        Returns:
            numpy.ndarray: An array of predictions (1 if the sample is an anomaly, 0 otherwise).
    __sklearn_is_fitted__():
        Checks if the model has been fitted.
        Returns:
            bool: True if the model is fitted, False otherwise.

Methods defined here:

__init__(self, n_trees=100, max_samples=None, max_depth=10, n_jobs=1, force_true_length=False): Initializes the IsolationForest with the specified parameters.

Args:
    n_trees: (int), optional - The number of isolation trees to build (default: 100).
    max_samples: (int or None), optional - The maximum number of samples to draw for each tree.
        If None, defaults to the minimum of 256 or the number of samples in the dataset (default: None).
    max_depth: (int), optional - The maximum depth of each isolation tree (default: 10).
    n_jobs: (int), optional - The number of parallel jobs to run.
        Set to -1 to use all available cores (default: 1).
    force_true_length: (bool), optional - Whether to force the true path length calculation (default: False).

Attributes:
    n_trees: (int) - The number of isolation trees.
    max_samples: (int or None) - The maximum number of samples for each tree.
    max_depth: (int) - The maximum depth of the trees.
    force_true_length: (bool) - Indicates whether to use the true path length for scoring.
    trees: (list) - A list to store the trained isolation trees.
    n_jobs: (int) - The number of parallel jobs to run.
    classes_: (np.ndarray) - An array representing the classes (0 for normal, 1 for anomaly).

__sklearn_is_fitted__(self): Checks if the model has been fitted.

anomaly_score(self, X): Computes the anomaly scores for given samples.

Args:
    X: (array-like) - The input samples.

Returns:
    array: An array of anomaly scores.

fit(self, X, y=None): Fits the isolation forest to the data.

Args:
    X: (array-like) - The input features.
    y: (array-like) - The target labels (not used in this implementation).

predict(self, X, threshold=0.5): Predicts whether samples are anomalies.

Args:
    X: (array-like) - The input samples.
    threshold: (float) - The threshold for classifying anomalies (default: 0.5).

Returns:
    array: An array of predictions (1 if the sample is an anomaly, 0 otherwise).
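
In the standard Isolation Forest formulation, the anomaly score is `2 ** (-E[h(x)] / c(n))`, where `E[h(x)]` is the mean path length over the trees and `c(n)` is a normalizing constant. The sketch below assumes this formula and a strict `>` comparison against the threshold; neither is confirmed by the documentation above, and the function names are hypothetical.

```python
def anomaly_score(mean_path_length, c_n):
    """Standard Isolation Forest score; shorter paths give scores closer to 1."""
    return 2.0 ** (-mean_path_length / c_n)


def predict_anomalies(scores, threshold=0.5):
    """Label a sample 1 (anomaly) when its score exceeds the threshold, else 0."""
    return [1 if s > threshold else 0 for s in scores]
```

A sample whose mean path length equals `c_n` scores exactly 0.5, which is why 0.5 is a natural default threshold.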

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class IsolationTree(builtins.object)

IsolationTree(max_depth=10, force_true_length=False)

IsolationTree is a class that implements an isolation tree, which is a fundamental building block of the Isolation Forest algorithm.

The Isolation Forest is an unsupervised learning method used for anomaly detection.

Attributes:
    max_depth (int): The maximum depth of the tree. Default is 10.
    tree (dict): The learned isolation tree structure.
    force_true_length (bool): If True, the true path length is used for scoring
        instead of the average path length.

Methods:
    __init__(max_depth=10, force_true_length=False):
        Initializes the IsolationTree with the specified maximum depth and
        scoring method.
    fit(X, depth=0):
        Fits the isolation tree to the input data by recursively partitioning
        the data based on randomly selected features and split values.
    path_length(X, tree=None, depth=0):
        Computes the path length for a given sample by traversing the tree
        structure. The path length is used to determine how isolated a sample is.

Methods defined here:

__init__(self, max_depth=10, force_true_length=False): Initializes the IsolationTree with the specified parameters.

Args:
    max_depth: (int), optional - Maximum depth of the tree (default is 10).
    force_true_length: (bool), optional - If True, use the true path length for scoring (default is False).

Attributes:
    max_depth: (int) - Maximum depth of the tree.
    tree: (object or None) - The tree structure used in the Isolation Forest (default is None).
    force_true_length: (bool) - Indicates whether to use the true path length for scoring.

fit(self, X, depth=0): Fits the isolation tree to the data.

Args:
    X: (array-like) - The input features.
    depth: (int) - The current depth of the tree (default: 0).

Returns:
    dict: The learned isolation tree.

path_length(self, X, tree=None, depth=0): Computes the path length for a given sample.

Args:
    X: (array-like) - The input sample.
    tree: (dict) - The current node of the tree (default: None).
    depth: (int) - The current depth of the tree (default: 0).

Returns:
    int: The path length.
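
The recursive partitioning and path-length traversal can be sketched with nested dictionaries. The dictionary keys (`size`, `feature`, `split`, `left`, `right`) are invented for this example and are not the keys used by `IsolationTree`. The sketch also returns the raw depth, without the average-path-length adjustment that is often added at leaves holding several samples.

```python
import random


def build_isolation_tree(X, depth=0, max_depth=10, rng=None):
    """Recursively partition rows of X on a random feature and split value."""
    rng = rng or random.Random(0)
    if depth >= max_depth or len(X) <= 1:
        return {"size": len(X)}  # leaf: depth limit reached or nothing to split
    feature = rng.randrange(len(X[0]))
    values = [row[feature] for row in X]
    low, high = min(values), max(values)
    if low == high:
        return {"size": len(X)}  # all values equal: cannot split on this feature
    split = rng.uniform(low, high)
    left = [row for row in X if row[feature] < split]
    right = [row for row in X if row[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_isolation_tree(left, depth + 1, max_depth, rng),
        "right": build_isolation_tree(right, depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Count the edges traversed from the root to the leaf that x falls into."""
    if "size" in node:
        return depth
    child = node["left"] if x[node["feature"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)
```

Points far from the bulk of the data tend to be separated after fewer random splits, so their path lengths tend to be shorter.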

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class IsolationUtils(builtins.object)

Utility functions for the Isolation Forest algorithm.

Static methods defined here:

compute_avg_path_length(size): Computes the average path length of unsuccessful searches in a binary search tree.

Args:
    size: (int) - The size of the tree.

Returns:
    average_path_length: (float) - The average path length.
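
The usual closed form for this quantity is `c(n) = 2 * H(n - 1) - 2 * (n - 1) / n`, with the harmonic number approximated as `H(i) ≈ ln(i) + 0.5772156649` (Euler's constant). The sketch below follows that standard definition, including the conventional special cases for `n <= 2`; whether `compute_avg_path_length` handles the edge cases the same way is an assumption.

```python
import math

EULER_GAMMA = 0.5772156649


def average_path_length(size):
    """Average path length of an unsuccessful BST search over `size` points."""
    if size <= 1:
        return 0.0
    if size == 2:
        return 1.0
    harmonic = math.log(size - 1) + EULER_GAMMA  # approximation of H(size - 1)
    return 2.0 * harmonic - 2.0 * (size - 1) / size
```

For the common subsample size of 256, this evaluates to roughly 10.24.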

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class RandomForestClassifier(builtins.object)

RandomForestClassifier(forest_size=100, max_depth=10, min_samples_split=2, n_jobs=-1, random_seed=None, X=None, y=None)

RandomForestClassifier is a custom implementation of a Random Forest classifier.

Attributes:
    n_estimators (int): The number of trees in the forest.
    max_depth (int): The maximum depth of each tree.
    n_jobs (int): The number of jobs to run in parallel. Defaults to -1 (use all available processors).
    random_state (int or None): The seed for random number generation. Defaults to None.
    trees (list): A list of trained decision trees.
    bootstraps (list): A list of bootstrapped indices for out-of-bag (OOB) scoring.
    X (numpy.ndarray or None): The feature matrix used for training.
    y (numpy.ndarray or None): The target labels used for training.
    accuracy (float): The accuracy of the model after fitting.
    precision (float): The precision of the model after fitting.
    recall (float): The recall of the model after fitting.
    f1_score (float): The F1 score of the model after fitting.
    log_loss (float or None): The log loss of the model after fitting (only for binary classification).

Methods:
    __init__(forest_size=100, max_depth=10, min_samples_split=2, n_jobs=-1, random_seed=None, X=None, y=None):
        Initializes the RandomForestClassifier object with the specified parameters.
    fit(X=None, y=None, sample_weight=None, verbose=False):
        Fits the random forest model to the provided data using parallel processing.
    calculate_metrics(y_true, y_pred):
        Calculates evaluation metrics (accuracy, precision, recall, F1 score, and log loss) for classification.
    predict(X):
        Predicts class labels for the provided data using the trained random forest.
    get_stats(verbose=False):
        Returns the evaluation metrics (accuracy, precision, recall, F1 score, and log loss) as a dictionary.
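
The two ideas behind a random forest classifier, bootstrap sampling (with the left-out indices available for out-of-bag scoring) and majority voting across trees, can be sketched in plain Python. This is a generic illustration under assumptions, not the class's code; `bootstrap_indices` and `majority_vote` are hypothetical names.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement; also return the unused (OOB) ones."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag


def majority_vote(per_tree_predictions):
    """Combine per-tree label lists into one label per sample by majority vote."""
    n_samples = len(per_tree_predictions[0])
    return [
        Counter(tree[i] for tree in per_tree_predictions).most_common(1)[0][0]
        for i in range(n_samples)
    ]
```

Each tree would be trained on the rows selected by `in_bag`, and the rows in `out_of_bag` could serve as a held-out set for that tree.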

Methods defined here:

__init__(self, forest_size=100, max_depth=10, min_samples_split=2, n_jobs=-1, random_seed=None, X=None, y=None): Initializes the RandomForest object.

calculate_metrics(self, y_true, y_pred): Calculate evaluation metrics for classification.

fit(self, X=None, y=None, sample_weight=None, verbose=False): Fit the random forest with parallel processing.

get_params(self): Get the parameters of the RandomForestClassifier.

get_stats(self, verbose=False): Return the evaluation metrics.

predict(self, X): Predict class labels for the provided data.

predict_proba(self, X): Predict class probabilities for the provided data.

Args:
    X (array-like): The input features.

Returns:
    np.ndarray: A 2D array where each row represents the probability distribution
                over the classes for a record.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class RandomForestRegressor(builtins.object)

RandomForestRegressor(forest_size=100, max_depth=10, min_samples_split=2, n_jobs=-1, random_seed=None, X=None, y=None)

A class representing a Random Forest model for regression.

Attributes:
    n_estimators (int): The number of trees in the forest.
    max_depth (int): The maximum depth of each tree.
    min_samples_split (int): The minimum number of samples required to split an internal node.
    n_jobs (int): The number of jobs to run in parallel for fitting.
    random_state (int): Seed for random number generation for reproducibility.
    trees (list): List holding the fitted RegressorTree instances.
    X (numpy.ndarray or None): The feature matrix used for training.
    y (numpy.ndarray or None): The target values used for training.

Methods:
    fit(X=None, y=None, sample_weight=None, verbose=False): Fits the random forest to the data.
    calculate_metrics(y_true, y_pred): Calculates the evaluation metrics.
    predict(X): Predicts the target values for the input features.
    get_stats(y_true, y_pred, verbose=False): Returns the evaluation metrics.

Methods defined here:

__init__(self, forest_size=100, max_depth=10, min_samples_split=2, n_jobs=-1, random_seed=None, X=None, y=None): Initialize the Random Forest Regressor.

calculate_metrics(self, y_true, y_pred): Calculate common regression metrics.

Args:
    y_true (array-like): True target values.
    y_pred (array-like): Predicted target values.

Returns:
    dict: A dictionary containing calculated metrics (MSE, R^2, MAE, RMSE, MAPE).

fit(self, X=None, y=None, sample_weight=None, verbose=False): Fit the random forest to the training data X and y.

Args:
    X (array-like): Training input features of shape (n_samples, n_features).
    y (array-like): Training target values of shape (n_samples,).
    sample_weight (array-like): Sample weights for each instance in X.
    verbose (bool): Whether to print progress messages.

Returns:
    self: The fitted RandomForestRegressor instance.

get_params(self): Get the parameters of the Random Forest Regressor.

Returns:
    dict: A dictionary containing the parameters of the model.

get_stats(self, y_true, y_pred, verbose=False): Calculate and optionally print evaluation metrics.

Args:
    y_true (array-like): True target values.
    y_pred (array-like): Predicted target values.
    verbose (bool): Whether to print the metrics.

Returns:
    dict: A dictionary containing calculated metrics (MSE, R^2, MAE, RMSE, MAPE).

predict(self, X): Predict target values for input features X using the trained random forest.

Args:
    X (array-like): Input features of shape (n_samples, n_features).

Returns:
    np.ndarray: Predicted target values of shape (n_samples,).

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class RegressorTree(builtins.object)

RegressorTree(max_depth=5, min_samples_split=2)

A class representing a decision tree for regression.

Args:
    max_depth: (int) - The maximum depth of the decision tree.
    min_samples_split: (int) - The minimum number of samples required to split a node.

Attributes:
    n_features: (int) - The number of features in the dataset.
    X: (array-like) - The input features.
    y: (array-like) - The target values.

Methods:
    fit(X, y, sample_weight=None, verbose=False): Fits the decision tree to the training data.
    predict(X): Predicts the target values for the input features.
    _traverse_tree(x, node): Traverses the decision tree for a single sample x.
    _learn_recursive(indices, depth): Recursive helper function for learning.

Methods defined here:

__init__(self, max_depth=5, min_samples_split=2): Initialize the decision tree.

Args:
    max_depth (int): The maximum depth of the decision tree.
    min_samples_split (int): The minimum number of samples required to split a node.

fit(self, X, y, sample_weight=None, verbose=False): Fit the decision tree to the training data.

Args:
    X: (array-like) - The input features.
    y: (array-like) - The target values.
    sample_weight: (array-like) - The sample weights (default: None).
    verbose: (bool) - If True, print detailed logs during fitting.

Returns:
    dict: The learned decision tree.

predict(self, X): Predict the target value for a record or batch of records using the decision tree.

Args:
    X: (array-like) - The input features.

Returns:
    np.ndarray: The predicted target values.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class RegressorTreeUtility(builtins.object)

RegressorTreeUtility(X, y, min_samples_split, n_features)

Utility class containing helper functions for building the Regressor Tree.

Handles variance calculation, leaf value calculation, and finding the best split.

Methods defined here:

__init__(self, X, y, min_samples_split, n_features): Initialize the utility class with references to data and parameters.

Args:
    X (np.ndarray): Reference to the feature data.
    y (np.ndarray): Reference to the target data.
    min_samples_split (int): Minimum number of samples required to split a node.
    n_features (int): Total number of features in X.

best_split(self, indices, sample_weight=None): Finds the best split for the data subset defined by indices.

calculate_leaf_value(self, indices, sample_weight=None): Calculate the weighted mean value for a leaf node.

calculate_variance(self, indices, sample_weight=None): Calculate weighted variance for the subset defined by indices.
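
The weighted leaf value and weighted variance over an index subset can be sketched as follows. These standalone functions take `y` explicitly instead of reading it from the utility object, treat a missing `sample_weight` as uniform weights, and assume a non-empty subset with positive total weight; all of that is assumed for illustration, and the function names are hypothetical.

```python
def weighted_mean(y, indices, sample_weight=None):
    """Weighted mean of y over the given indices (the value a leaf would predict)."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    total = sum(sample_weight[i] for i in indices)
    return sum(sample_weight[i] * y[i] for i in indices) / total


def weighted_variance(y, indices, sample_weight=None):
    """Weighted variance of y over the given indices (the split impurity)."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    mean = weighted_mean(y, indices, sample_weight)
    total = sum(sample_weight[i] for i in indices)
    return sum(sample_weight[i] * (y[i] - mean) ** 2 for i in indices) / total
```

A split search would typically pick the feature and threshold that most reduce this variance across the two resulting subsets.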

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

Data
__all__ = ['ClassifierTreeUtility', 'ClassifierTree', 'RegressorTreeUtility', 'RegressorTree', 'RandomForestClassifier', 'RandomForestRegressor', 'GradientBoostedClassifier', 'GradientBoostedRegressor', 'IsolationForest', 'IsolationTree', 'IsolationUtils', 'AdaBoostClassifier', 'AdaBoostRegressor']