Python: module sega_learn.trees.randomForestClassifier

This module contains the implementation of a Random Forest Classifier.

The module includes the following classes:
- RandomForest: A class representing a Random Forest model.
- RandomForestWithInfoGain: A class representing a Random Forest model that returns information gain for visualization.
- runRandomForest: A class that runs the Random Forest algorithm.

Modules

multiprocessing

numpy

Classes

builtins.object

RandomForestClassifier

class RandomForestClassifier(builtins.object)

RandomForestClassifier(forest_size=100, max_depth=10, min_samples_split=2, n_jobs=-1, random_seed=None, X=None, y=None)

RandomForestClassifier is a custom implementation of a Random Forest classifier.
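A minimal usage sketch based only on the signatures listed in this reference. The synthetic data, the helper name `demo`, and the chosen hyperparameter values are illustrative assumptions; the exact return types of `predict`, `predict_proba`, and `get_stats` should be checked against the installed version.

```python
import numpy as np


def demo(n_samples=200, n_features=4, seed=0):
    """Hypothetical helper: fit the classifier on synthetic data and return its outputs."""
    # Imported inside the function so this sketch can be loaded without sega_learn installed.
    from sega_learn.trees.randomForestClassifier import RandomForestClassifier

    # Synthetic binary problem (an assumption for illustration only).
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    # Keyword names follow the constructor signature documented above.
    model = RandomForestClassifier(forest_size=20, max_depth=5, random_seed=seed)
    model.fit(X, y)

    labels = model.predict(X)        # predicted class labels
    proba = model.predict_proba(X)   # one row of class probabilities per record
    stats = model.get_stats()        # evaluation metrics as a dictionary
    return labels, proba, stats
```

Calling `demo()` requires an environment where `sega_learn` is installed.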

Attributes:
    n_estimators (int): The number of trees in the forest.
    max_depth (int): The maximum depth of each tree.
    n_jobs (int): The number of jobs to run in parallel. Defaults to -1 (use all available processors).
    random_state (int or None): The seed for random number generation. Defaults to None.
    trees (list): A list of trained decision trees.
    bootstraps (list): A list of bootstrapped indices for out-of-bag (OOB) scoring.
    X (numpy.ndarray or None): The feature matrix used for training.
    y (numpy.ndarray or None): The target labels used for training.
    accuracy (float): The accuracy of the model after fitting.
    precision (float): The precision of the model after fitting.
    recall (float): The recall of the model after fitting.
    f1_score (float): The F1 score of the model after fitting.
    log_loss (float or None): The log loss of the model after fitting (only for binary classification).
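
The `bootstraps` attribute stores bootstrapped indices for out-of-bag (OOB) scoring. As a rough illustration of the general idea (not necessarily how this module builds them), the sketch below draws row indices with replacement and treats the rows that were never drawn as out-of-bag. The function name `bootstrap_indices` is hypothetical.

```python
import numpy as np


def bootstrap_indices(n_samples, rng):
    """Return (in_bag, oob) row indices for one tree. Hypothetical helper."""
    # Sample n_samples row indices with replacement: the tree's training rows.
    in_bag = rng.integers(0, n_samples, size=n_samples)
    # Rows that were never drawn are out-of-bag and can be used for scoring.
    oob = np.setdiff1d(np.arange(n_samples), in_bag)
    return in_bag, oob


rng = np.random.default_rng(0)
in_bag, oob = bootstrap_indices(10, rng)
print("in-bag rows:", np.sort(in_bag))
print("out-of-bag rows:", oob)
```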

Methods:
    __init__(forest_size=100, max_depth=10, min_samples_split=2, n_jobs=-1, random_seed=None, X=None, y=None):
        Initializes the RandomForestClassifier object with the specified parameters.
    fit(X=None, y=None, sample_weight=None, verbose=False):
        Fits the random forest model to the provided data using parallel processing.
    calculate_metrics(y_true, y_pred):
        Calculates evaluation metrics (accuracy, precision, recall, F1 score, and log loss) for classification.
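    predict(X):
        Predicts class labels for the provided data using the trained random forest.
    predict_proba(X):
        Predicts class probabilities for the provided data.
    get_params():
        Returns the parameters of the RandomForestClassifier.
    get_stats(verbose=False):
        Returns the evaluation metrics (accuracy, precision, recall, F1 score, and log loss) as a dictionary.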
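
To make the metrics named above concrete, here is a sketch of how accuracy, precision, recall, F1 score, and log loss are conventionally computed for a binary problem. It is not taken from this module: the function name `binary_metrics`, the optional `y_prob` argument, and the zero-division handling are assumptions made for illustration.

```python
import numpy as np


def binary_metrics(y_true, y_pred, y_prob=None, eps=1e-15):
    """Standard binary classification metrics. Hypothetical helper."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Confusion-matrix counts with class 1 as the positive class.
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))

    accuracy = float(np.mean(y_true == y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    stats = {"accuracy": accuracy, "precision": precision,
             "recall": recall, "f1_score": f1, "log_loss": None}

    # Log loss needs predicted probabilities of the positive class.
    if y_prob is not None:
        p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
        stats["log_loss"] = float(-np.mean(y_true * np.log(p)
                                           + (1 - y_true) * np.log(1 - p)))
    return stats


stats = binary_metrics([1, 0, 1, 1], [1, 0, 0, 1])
print(stats)
```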

Methods defined here:

__init__(self, forest_size=100, max_depth=10, min_samples_split=2, n_jobs=-1, random_seed=None, X=None, y=None): Initializes the RandomForestClassifier object.

calculate_metrics(self, y_true, y_pred): Calculate evaluation metrics for classification.

fit(self, X=None, y=None, sample_weight=None, verbose=False): Fit the random forest with parallel processing.

get_params(self): Get the parameters of the RandomForestClassifier.

get_stats(self, verbose=False): Return the evaluation metrics.

predict(self, X): Predict class labels for the provided data.

predict_proba(self, X): Predict class probabilities for the provided data.

Args:
    X (array-like): The input features.

Returns:
    np.ndarray: A 2D array where each row represents the probability distribution
                over the classes for a record.
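
One common way a forest produces such a probability matrix is to report, for each record, the fraction of trees that voted for each class. The sketch below shows that approach under the assumption that per-tree label predictions are already available; whether this module aggregates votes or averages per-tree probabilities is not stated here, and `vote_fractions` is a hypothetical name.

```python
import numpy as np


def vote_fractions(tree_preds, classes):
    """Turn per-tree label predictions into per-class probabilities. Hypothetical helper."""
    tree_preds = np.asarray(tree_preds)   # shape (n_trees, n_samples)
    classes = np.asarray(classes)         # shape (n_classes,)
    # Compare every prediction with every class, then average over trees.
    # Result has shape (n_samples, n_classes); each row sums to 1.
    return (tree_preds[:, :, None] == classes[None, None, :]).mean(axis=0)


# Three trees, three records, two classes.
tree_preds = [[0, 1, 1],
              [0, 1, 0],
              [1, 1, 0]]
classes = [0, 1]

proba = vote_fractions(tree_preds, classes)
# The predicted label is the class with the largest vote share.
labels = np.asarray(classes)[proba.argmax(axis=1)]
print(proba)
print(labels)
```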

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object