- builtins.object
- ClassifierTree
- ClassifierTreeUtility
class ClassifierTree(builtins.object)
ClassifierTree(max_depth=5, min_samples_split=2)
A class representing a decision tree.
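Args:
max_depth: (int) - The maximum depth of the decision tree (default: 5).
min_samples_split: (int) - The minimum number of samples a node must contain to be split (default: 2).
Methods:
learn(X, y, par_node=None, depth=0, sample_weight=None): Builds the decision tree from the given training data.
classify(tree, record): Classifies a record using the decision tree.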
Methods defined here:
- __init__(self, max_depth=5, min_samples_split=2)
- Initializes the ClassifierTree with a maximum depth and a minimum number of samples per split.
- fit(self, X, y, sample_weight=None)
- Fits the decision tree to the training data.
Args:
X: (array-like) - The input features.
y: (array-like) - The target labels.
sample_weight: (array-like) - The sample weights (default: None).
- learn(self, X, y, par_node=None, depth=0, sample_weight=None)
- Builds the decision tree based on the given training data.
Args:
X: (array-like) - The input features.
y: (array-like) - The target labels.
par_node: (dict) - The parent node of the current subtree (default: None).
depth: (int) - The current depth of the subtree (default: 0).
sample_weight: (array-like) - The sample weights (default: None).
Returns:
dict: The learned decision tree.
- predict(self, X)
- Predicts the labels for a given set of records using the decision tree.
Args:
X: (array-like) - The input features.
Returns:
list: A list of predicted labels for each record.
- predict_proba(self, X)
- Predicts the class probabilities for a given set of records using the decision tree.
Args:
X: (array-like) - The input features.
Returns:
list: A list of dictionaries where each dictionary represents the probability distribution
over the classes for a record.
Static methods defined here:
- classify(tree, record)
- Classifies a given record using the decision tree.
Args:
tree: (dict) - The decision tree.
record: (dict) - A dictionary representing the record to be classified.
Returns:
The label assigned to the record based on the decision tree.
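The traversal that classify performs can be illustrated with a minimal sketch. The node layout is not documented here, so the keys "attribute", "value", "left", "right", and "label" below are hypothetical, as is the name classify_sketch; the actual tree dictionary may be structured differently.

```python
# Hypothetical node layout: internal nodes hold "attribute", "value",
# "left", and "right"; leaves hold "label". The real ClassifierTree
# may use different keys.
def classify_sketch(tree, record):
    """Walk a nested-dict tree from the root until a leaf is reached."""
    node = tree
    while "label" not in node:
        # Go left when the attribute is <= the split value, mirroring
        # the rule documented for partition_classes.
        if record[node["attribute"]] <= node["value"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]


example_tree = {
    "attribute": "age",
    "value": 30,
    "left": {"label": "young"},
    "right": {"label": "adult"},
}

print(classify_sketch(example_tree, {"age": 25}))  # young
```

An iterative walk is used instead of recursion so that deep trees cannot hit Python's recursion limit.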
Data descriptors defined here:
- __dict__
- dictionary for instance variables
- __weakref__
- list of weak references to the object
class ClassifierTreeUtility(builtins.object)
ClassifierTreeUtility(min_samples_split=2)
Utility class for computing entropy, partitioning classes, and calculating information gain.
Methods defined here:
- __init__(self, min_samples_split=2)
- Initializes the utility class.
- best_split(self, X, y, sample_weight=None)
- Finds the best attribute and value to split the data based on information gain.
Args:
X: (array-like) - The input features.
y: (array-like) - The target variable.
sample_weight: (array-like) - The sample weights (default: None).
Returns:
dict: A dictionary containing the best split attribute, split value, left and right subsets of X and y,
and the information gain achieved by the split.
- entropy(self, class_y, sample_weight=None)
- Computes the entropy of a given set of class labels.
Args:
class_y: (array-like) - The class labels.
sample_weight: (array-like) - The sample weights (default: None).
Returns:
float: The entropy value.
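As a rough illustration of what a weighted entropy computation might look like, the sketch below sums the sample weights per label and applies Shannon's formula. The function name entropy_sketch and the choice of base-2 logarithms are assumptions; the documentation above does not state the logarithm base or how weights are normalized.

```python
import math
from collections import defaultdict


def entropy_sketch(class_y, sample_weight=None):
    """Weighted Shannon entropy in bits (log base 2 is an assumption)."""
    if sample_weight is None:
        # With no weights given, every sample counts equally.
        sample_weight = [1.0] * len(class_y)

    # Accumulate the total weight observed for each label.
    weight_per_label = defaultdict(float)
    for label, weight in zip(class_y, sample_weight):
        weight_per_label[label] += weight

    total = sum(weight_per_label.values())
    if total == 0:
        return 0.0  # empty input carries no uncertainty

    result = 0.0
    for weight in weight_per_label.values():
        if weight > 0:
            p = weight / total
            result -= p * math.log2(p)
    return result


print(entropy_sketch([0, 0, 1, 1]))  # 1.0
```

With weights, entropy_sketch([0, 1], [3, 1]) treats label 0 as three times as likely as label 1 and yields approximately 0.811.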
- information_gain(self, previous_y, current_y, sample_weight_prev=None, sample_weight_current=None)
- Calculates the information gain between the previous and current values of y.
Args:
previous_y: (array-like) - The previous values of y.
current_y: (array-like) - The current values of y.
sample_weight_prev: (array-like) - The sample weights for the previous y values (default: None).
sample_weight_current: (array-like) - The sample weights for the current y values (default: None).
Returns:
float: The information gain between the previous and current values of y.
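One common way to compute information gain is to subtract the size-weighted entropy of the child partitions from the entropy of the parent labels. The sketch below assumes that current_y is a list of partitions such as [y_left, y_right], which the documentation above does not confirm, and it omits sample weights for brevity. The names information_gain_sketch and label_entropy are hypothetical.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Unweighted Shannon entropy in bits of a list of labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain_sketch(previous_y, current_y):
    """Parent entropy minus the size-weighted entropy of the partitions.

    Assumes current_y is a list of label lists, e.g. [y_left, y_right].
    """
    n = len(previous_y)
    if n == 0:
        return 0.0
    children = sum(len(part) / n * label_entropy(part) for part in current_y)
    return label_entropy(previous_y) - children


# A split that separates the classes perfectly recovers the full parent entropy.
print(information_gain_sketch([0, 0, 1, 1], [[0, 0], [1, 1]]))  # 1.0
# A split that leaves both sides equally mixed gains nothing.
print(information_gain_sketch([0, 0, 1, 1], [[0, 1], [0, 1]]))  # 0.0
```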
- partition_classes(self, X, y, split_attribute, split_val, sample_weight=None)
- Partitions the dataset into two subsets based on a given split attribute and value.
Args:
X: (array-like) - The input features.
y: (array-like) - The target labels.
split_attribute: (int) - The index of the attribute to split on.
split_val: (float) - The value to split the attribute on.
sample_weight: (array-like) - The sample weights (default: None).
Returns:
X_left: (array-like) - The subset of input features where the split attribute is less than or equal to the split value.
X_right: (array-like) - The subset of input features where the split attribute is greater than the split value.
y_left: (array-like) - The subset of target labels corresponding to X_left.
y_right: (array-like) - The subset of target labels corresponding to X_right.
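The partitioning rule described above (left when the attribute is less than or equal to the split value, right otherwise) can be sketched as follows. The function name partition_classes_sketch is hypothetical, plain Python lists stand in for array-like inputs, and sample_weight is left out because the documented return values do not say how weights are partitioned.

```python
def partition_classes_sketch(X, y, split_attribute, split_val):
    """Split rows of X (and matching labels in y) on a numeric attribute."""
    X_left, X_right, y_left, y_right = [], [], [], []
    for row, label in zip(X, y):
        if row[split_attribute] <= split_val:
            # Values equal to the split value go to the left subset.
            X_left.append(row)
            y_left.append(label)
        else:
            X_right.append(row)
            y_right.append(label)
    return X_left, X_right, y_left, y_right


X = [[1, 5], [3, 2], [4, 8]]
y = [0, 1, 1]
X_left, X_right, y_left, y_right = partition_classes_sketch(X, y, 0, 3)
print(X_left, y_left)    # [[1, 5], [3, 2]] [0, 1]
print(X_right, y_right)  # [[4, 8]] [1]
```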
Data descriptors defined here:
- __dict__
- dictionary for instance variables
- __weakref__
- list of weak references to the object