Python: module sega_learn.trees.treeClassifier

sega_learn.trees.treeClassifier

Classes

ClassifierTree
ClassifierTreeUtility

class ClassifierTree(builtins.object)

ClassifierTree(max_depth=5, min_samples_split=2)

A class representing a decision tree.

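Args:
    max_depth: (int) - The maximum depth of the decision tree (default: 5).
    min_samples_split: (int) - The minimum number of samples a node must contain to be split (default: 2).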

Methods:
    learn(X, y, par_node=None, depth=0, sample_weight=None): Builds the decision tree from the given training data.
    classify(tree, record): Classifies a record using the decision tree.

Methods defined here:

__init__(self, max_depth=5, min_samples_split=2): Initializes the ClassifierTree with a maximum depth and a minimum number of samples per split.

fit(self, X, y, sample_weight=None): Fits the decision tree to the training data.

Args:
    X: (array-like) - The input features.
    y: (array-like) - The target labels.
    sample_weight: (array-like) - The sample weights (default: None).

learn(self, X, y, par_node=None, depth=0, sample_weight=None): Builds the decision tree based on the given training data.

Args:
    X: (array-like) - The input features.
    y: (array-like) - The target labels.
    par_node: (dict) - The parent node of the current subtree (default: None).
    depth: (int) - The current depth of the subtree (default: 0).
    sample_weight: (array-like) - The sample weights (default: None).

Returns:
    dict: The learned decision tree.
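The recursive construction described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the helper `find_split`, the node keys (`label`, `split_attribute`, `split_val`, `left`, `right`), the majority-label leaves, and the omission of `par_node` and `sample_weight` are all hypothetical simplifications.

```python
import math
from collections import Counter


def entropy(labels):
    """Unweighted Shannon entropy (in bits) of a non-empty list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def find_split(X, y):
    """Return (gain, attribute, value) for the best threshold, or None."""
    best = None
    for attribute in range(len(X[0])):
        for value in sorted({row[attribute] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[attribute] <= value]
            right = [lab for row, lab in zip(X, y) if row[attribute] > value]
            if not left or not right:
                continue  # Skip thresholds that leave one side empty.
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            gain = entropy(y) - weighted
            if best is None or gain > best[0]:
                best = (gain, attribute, value)
    return best


def learn(X, y, depth=0, max_depth=5, min_samples_split=2):
    """Recursively build a nested-dict tree from non-empty X and y."""
    majority = Counter(y).most_common(1)[0][0]
    split = None
    # Only search for a split if no stopping rule applies.
    if depth < max_depth and len(y) >= min_samples_split and len(set(y)) > 1:
        split = find_split(X, y)
    if split is None or split[0] <= 0:
        return {"label": majority}  # Leaf node (hypothetical layout).
    _, attribute, value = split
    left_idx = [i for i, row in enumerate(X) if row[attribute] <= value]
    right_idx = [i for i, row in enumerate(X) if row[attribute] > value]
    return {
        "split_attribute": attribute,
        "split_val": value,
        "left": learn([X[i] for i in left_idx], [y[i] for i in left_idx],
                      depth + 1, max_depth, min_samples_split),
        "right": learn([X[i] for i in right_idx], [y[i] for i in right_idx],
                       depth + 1, max_depth, min_samples_split),
    }
```

Calling `learn([[1], [2], [3], [4]], [0, 0, 1, 1])` in this sketch yields a single split on attribute 0 with one pure leaf on each side.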

predict(self, X): Predicts the labels for a given set of records using the decision tree.

Args:
    X: (array-like) - The input features.

Returns:
    list: A list of predicted labels for each record.

predict_proba(self, X): Predicts the probabilities for a given set of records using the decision tree.

Args:
    X: (array-like) - The input features.

Returns:
    list: A list of dictionaries where each dictionary represents the probability distribution
          over the classes for a record.
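The source does not say how the per-record distributions are derived. One common choice, shown here purely as an assumption, is to normalize the class counts stored at the leaf a record reaches. The function name `leaf_distribution` and the count-dictionary input are hypothetical.

```python
def leaf_distribution(class_counts):
    """Turn per-class counts at a leaf into a probability dictionary."""
    total = sum(class_counts.values())
    if total == 0:
        return {}  # No training samples reached this leaf.
    return {label: count / total for label, count in class_counts.items()}
```

Under this assumption, a leaf that saw three samples of one class and one of another maps to probabilities 0.75 and 0.25.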

Static methods defined here:

classify(tree, record): Classifies a given record using the decision tree.

Args:
    tree: (dict) - The decision tree.
    record: (dict) - A dictionary representing the record to be classified.

Returns:
    The label assigned to the record based on the decision tree.
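Classifying a record amounts to walking from the root to a leaf. The sketch below assumes a nested-dict tree whose internal nodes carry `split_attribute`, `split_val`, `left`, and `right`, and whose leaves carry `label`. These key names are hypothetical and may not match the library's internal layout.

```python
def classify(tree, record):
    """Follow split decisions from the root until a leaf, then return its label."""
    node = tree
    while "label" not in node:
        # Values less than or equal to the threshold go left, others go right.
        if record[node["split_attribute"]] <= node["split_val"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]


# A one-split example tree using the assumed layout.
example_tree = {
    "split_attribute": 0,
    "split_val": 2,
    "left": {"label": "a"},
    "right": {"label": "b"},
}
```

Because the record is only indexed by `split_attribute`, a list or a dictionary keyed by attribute index both work in this sketch.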

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class ClassifierTreeUtility(builtins.object)

ClassifierTreeUtility(min_samples_split=2)

Utility class for computing entropy, partitioning classes, and calculating information gain.

Methods defined here:

__init__(self, min_samples_split=2): Initialize the utility class.

best_split(self, X, y, sample_weight=None): Finds the best attribute and value to split the data based on information gain.

Args:
    X: (array-like) - The input features.
    y: (array-like) - The target variable.
    sample_weight: (array-like) - The sample weights (default: None).

Returns:
    dict: A dictionary containing the best split attribute, split value, left and right subsets of X and y,
          and the information gain achieved by the split.
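A minimal sketch of an information-gain split search follows. It assumes an exhaustive scan over every attribute and every observed value, ignores `sample_weight` and `min_samples_split`, and returns only the attribute, value, and gain rather than the full set of subsets described above. The dictionary keys are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Unweighted Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(X, y):
    """Try every (attribute, observed value) threshold and keep the highest gain."""
    best = {"split_attribute": None, "split_val": None, "info_gain": 0.0}
    n = len(y)
    if n == 0:
        return best
    parent_entropy = entropy(y)
    for attribute in range(len(X[0])):
        for split_val in sorted({row[attribute] for row in X}):
            y_left = [lab for row, lab in zip(X, y) if row[attribute] <= split_val]
            y_right = [lab for row, lab in zip(X, y) if row[attribute] > split_val]
            if not y_left or not y_right:
                continue  # A split with an empty side separates nothing.
            children = (len(y_left) * entropy(y_left)
                        + len(y_right) * entropy(y_right)) / n
            gain = parent_entropy - children
            if gain > best["info_gain"]:
                best = {"split_attribute": attribute,
                        "split_val": split_val,
                        "info_gain": gain}
    return best
```

Ties are resolved in favor of the first candidate found, since only a strictly larger gain replaces the current best.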

entropy(self, class_y, sample_weight=None): Computes the entropy of a set of class labels.

Args:
    class_y: (array-like) - The class labels.
    sample_weight: (array-like) - The sample weights (default: None).

Returns:
    float: The entropy value.
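As a rough illustration of the quantity involved, the sketch below computes weighted Shannon entropy in plain Python. It assumes base-2 logarithms and that a missing `sample_weight` means every sample has weight 1. The function name `weighted_entropy` is hypothetical, and this is not the library's implementation.

```python
import math
from collections import defaultdict


def weighted_entropy(class_y, sample_weight=None):
    """Shannon entropy (in bits) of the labels in class_y, with optional weights."""
    if sample_weight is None:
        # Assumption: unweighted samples each count as 1.
        sample_weight = [1.0] * len(class_y)

    # Sum the weight carried by each distinct label.
    totals = defaultdict(float)
    for label, weight in zip(class_y, sample_weight):
        totals[label] += weight

    total_weight = sum(totals.values())
    if total_weight == 0:
        return 0.0  # An empty or zero-weight set carries no uncertainty.

    result = 0.0
    for weight in totals.values():
        if weight > 0:
            p = weight / total_weight
            result -= p * math.log2(p)
    return result
```

A balanced two-class set gives 1 bit, a pure set gives 0, and weights shift the class proportions before the logarithm is taken.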

information_gain(self, previous_y, current_y, sample_weight_prev=None, sample_weight_current=None): Calculates the information gain of a split, i.e. the reduction in entropy from the labels before the split (previous_y) to the labels after it (current_y).

Args:
    previous_y: (array-like) - The previous values of y.
    current_y: (array-like) - The current values of y.
    sample_weight_prev: (array-like) - The sample weights for the previous y values (default: None).
    sample_weight_current: (array-like) - The sample weights for the current y values (default: None).

Returns:
    float: The information gain between the previous and current values of y.
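Information gain is the parent's entropy minus the size-weighted average entropy of the child subsets. The sketch below assumes that `current_y` is a list of child label lists such as `[y_left, y_right]`, which the source does not confirm, and it leaves out both sample-weight arguments for brevity.

```python
import math
from collections import Counter


def entropy(labels):
    """Unweighted Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(previous_y, current_y):
    """Parent entropy minus the size-weighted entropy of the child subsets.

    Assumption: current_y is a list of child label lists, e.g. [y_left, y_right].
    """
    n = len(previous_y)
    if n == 0:
        return 0.0
    children = sum(len(child) / n * entropy(child) for child in current_y)
    return entropy(previous_y) - children
```

A split that separates the classes perfectly recovers the full parent entropy, while a split that leaves both children as mixed as the parent gains nothing.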

partition_classes(self, X, y, split_attribute, split_val, sample_weight=None): Partitions the dataset into two subsets based on a given split attribute and value.

Args:
    X: (array-like) - The input features.
    y: (array-like) - The target labels.
    split_attribute: (int) - The index of the attribute to split on.
    split_val: (float) - The value to split the attribute on.
    sample_weight: (array-like) - The sample weights (default: None).

Returns:
    X_left:  (array-like) - The subset of input features where the split attribute is less than or equal to the split value.
    X_right: (array-like) - The subset of input features where the split attribute is greater than the split value.
    y_left:  (array-like) - The subset of target labels corresponding to X_left.
    y_right: (array-like) - The subset of target labels corresponding to X_right.
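The partitioning rule above (less than or equal goes left, greater goes right) can be sketched for plain lists of rows. This version assumes numeric attributes and omits `sample_weight`. It illustrates the rule and is not the library's code.

```python
def partition_classes(X, y, split_attribute, split_val):
    """Split rows of X and their labels in y on one numeric column."""
    X_left, X_right, y_left, y_right = [], [], [], []
    for row, label in zip(X, y):
        if row[split_attribute] <= split_val:
            X_left.append(row)
            y_left.append(label)
        else:
            X_right.append(row)
            y_right.append(label)
    return X_left, X_right, y_left, y_right
```

Rows keep their original order within each side, so `X_left[i]` always corresponds to `y_left[i]`.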

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object