- builtins.object
  - Augmenter
  - RandomOverSampler
  - RandomUnderSampler
  - SMOTE
class Augmenter(builtins.object)

Augmenter(techniques, verbose=False)
General class for data augmentation techniques.
This class applies multiple augmentation techniques to a dataset in sequence.

Methods defined here:
- __init__(self, techniques, verbose=False)
- Initializes the Augmenter with a list of techniques and a verbosity option.
- augment(self, X, y)
- Applies multiple augmentation techniques in sequence.
Args:
X: (np.ndarray) - Feature matrix.
y: (np.ndarray) - Target vector.
Returns:
tuple: (np.ndarray, np.ndarray) - Augmented feature matrix and target vector.
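Applying techniques "in sequence" means each technique receives the output of the previous one. The sketch below assumes each technique exposes `fit_resample(X, y)` returning an `(X, y)` tuple, as the sampler classes in this module do. `SequentialAugmenter`, the toy `DropFirstRow` technique, and the verbose message format are hypothetical names for illustration, not this module's implementation.

```python
import numpy as np


class SequentialAugmenter:
    """Hypothetical stand-in for Augmenter: chains fit_resample calls."""

    def __init__(self, techniques, verbose=False):
        self.techniques = techniques
        self.verbose = verbose

    def augment(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        for technique in self.techniques:
            # Assumption: every technique returns an (X, y) tuple,
            # which becomes the input to the next technique.
            X, y = technique.fit_resample(X, y)
            if self.verbose:
                print(f"{type(technique).__name__}: {len(y)} samples")
        return X, y


class DropFirstRow:
    """Toy technique used only to demonstrate chaining."""

    def fit_resample(self, X, y):
        return X[1:], y[1:]


X = np.arange(10).reshape(5, 2)
y = np.array([0, 0, 0, 1, 1])
X_aug, y_aug = SequentialAugmenter([DropFirstRow(), DropFirstRow()]).augment(X, y)
print(X_aug.shape)  # (3, 2)
```

Because order matters, a list such as `[undersampler, smote]` would generally produce a different dataset than `[smote, undersampler]`.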
Data descriptors defined here:
- __dict__
- dictionary for instance variables
- __weakref__
- list of weak references to the object

class RandomOverSampler(builtins.object)

RandomOverSampler(random_state=None)
Randomly over-sample the minority class by duplicating examples.
This technique helps to balance the class distribution by randomly duplicating samples from the minority class.
It is a simple baseline for addressing class imbalance, though repeating the same rows adds no new information and can make a model overfit to the duplicated samples.
Algorithm Steps:
- Step 1: Identify the minority class and its samples.
- Step 2: Calculate the number of samples needed to balance the class distribution.
- Step 3: Randomly select samples from the minority class with replacement.
- Step 4: Duplicate the selected samples to create a balanced dataset.
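The four steps above can be sketched in NumPy as follows. This is an illustration under assumptions, not the class's actual code: the function name `random_over_sample` is hypothetical, and it grows only the single smallest class to the size of the largest, which matches the binary case described here.

```python
import numpy as np


def random_over_sample(X, y, random_state=None):
    """Sketch: duplicate random minority-class rows until it matches the majority count."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(random_state)

    # Step 1: identify the minority class and its samples.
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    minority_idx = np.flatnonzero(y == minority)

    # Step 2: number of extra samples needed to match the largest class.
    n_needed = counts.max() - counts.min()

    # Step 3: sample minority indices with replacement.
    picked = rng.choice(minority_idx, size=n_needed, replace=True)

    # Step 4: append the duplicated rows to the original data.
    return np.concatenate([X, X[picked]]), np.concatenate([y, y[picked]])


X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
y = np.array([0, 0, 0, 0, 1])
X_res, y_res = random_over_sample(X, y, random_state=42)
print(np.bincount(y_res))  # [4 4]
```

Passing a fixed `random_state` to the generator makes the selected duplicates reproducible across runs.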
Methods defined here:
- __init__(self, random_state=None)
- Initializes the RandomOverSampler with an optional random state.
- fit_resample(self, X, y)
- Resamples the dataset to balance the class distribution by duplicating minority class samples.
Args:
X: (array-like) - Feature matrix.
y: (array-like) - Target vector.
Returns:
tuple: (np.ndarray, np.ndarray) - Resampled feature matrix and target vector.
Data descriptors defined here:
- __dict__
- dictionary for instance variables
- __weakref__
- list of weak references to the object

class RandomUnderSampler(builtins.object)

RandomUnderSampler(random_state=None)
Randomly under-sample the majority class by removing examples.
This technique helps to balance the class distribution by randomly removing samples from the majority class.
It is a simple baseline for addressing class imbalance, though discarding majority-class samples can throw away information that would have been useful for training.
Algorithm Steps:
- Step 1: Identify the majority class and its samples.
- Step 2: Calculate the number of samples to remove to balance the class distribution.
- Step 3: Randomly select samples from the majority class without replacement.
- Step 4: Remove the selected samples to create a balanced dataset.
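A minimal NumPy sketch of these steps, under the same assumptions as a binary problem: only the single largest class is reduced, down to the size of the smallest. The function name `random_under_sample` is hypothetical and this is not the class's actual code.

```python
import numpy as np


def random_under_sample(X, y, random_state=None):
    """Sketch: drop random majority-class rows until it matches the minority count."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(random_state)

    # Step 1: identify the majority class and its samples.
    classes, counts = np.unique(y, return_counts=True)
    majority = classes[np.argmax(counts)]
    majority_idx = np.flatnonzero(y == majority)

    # Step 2: number of majority rows to remove.
    n_remove = counts.max() - counts.min()

    # Step 3: choose the rows to remove, without replacement.
    removed = rng.choice(majority_idx, size=n_remove, replace=False)

    # Step 4: keep every row that was not selected for removal.
    keep = np.ones(len(y), dtype=bool)
    keep[removed] = False
    return X[keep], y[keep]


X = np.arange(12, dtype=float).reshape(6, 2)
y = np.array([0, 0, 0, 0, 1, 1])
X_res, y_res = random_under_sample(X, y, random_state=0)
print(np.bincount(y_res))  # [2 2]
```

Sampling without replacement matters here: it guarantees that exactly `n_remove` distinct rows are dropped.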
Methods defined here:
- __init__(self, random_state=None)
- Initializes the RandomUnderSampler with an optional random state.
- fit_resample(self, X, y)
- Resamples the dataset to balance the class distribution by removing majority class samples.
Args:
X: (array-like) - Feature matrix.
y: (array-like) - Target vector.
Returns:
tuple: (np.ndarray, np.ndarray) - Resampled feature matrix and target vector.
Data descriptors defined here:
- __dict__
- dictionary for instance variables
- __weakref__
- list of weak references to the object

class SMOTE(builtins.object)

SMOTE(random_state=None, k_neighbors=5)
Synthetic Minority Over-sampling Technique (SMOTE) for balancing class distribution.
SMOTE generates synthetic samples for the minority class by interpolating between existing samples.
This helps to create a more balanced dataset, which can improve the performance of machine learning models.
Algorithm Steps:
- Step 1: Identify the minority class and its samples.
- Step 2: For each sample in the minority class, find its k nearest neighbors (using Euclidean distance).
- Step 3: Randomly select one or more of these neighbors.
- Step 4: Create synthetic samples by interpolating between the original sample and the selected neighbors.
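The interpolation in Step 4 is usually written as x_new = x_i + λ · (x_nn − x_i), with λ drawn uniformly from [0, 1), so each synthetic point lies on the segment between a sample and one of its neighbors. The NumPy sketch below follows that standard rule under stated assumptions: a binary problem where the minority class is grown to the size of the largest class, at least two minority samples, and brute-force distance computation. Rather than looping over every minority sample, it draws base samples at random and picks one neighbor per synthetic point, which is a common variant. The function name `smote` is hypothetical; this is not the class's actual code.

```python
import numpy as np


def smote(X, y, k_neighbors=5, random_state=None):
    """Sketch: synthesize minority rows until the minority matches the majority count."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(random_state)

    # Step 1: minority class, its samples, and how many new points are needed.
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min = X[y == minority]
    n_new = counts.max() - counts.min()
    k = min(k_neighbors, len(X_min) - 1)  # cannot use more neighbors than exist

    # Step 2: k nearest neighbors by Euclidean distance (brute force).
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]

    # Step 3: pick a random base sample and one of its k neighbors.
    base = rng.integers(0, len(X_min), size=n_new)
    nn = neighbors[base, rng.integers(0, k, size=n_new)]

    # Step 4: interpolate at a random point between base and neighbor.
    lam = rng.random((n_new, 1))  # lambda in [0, 1)
    X_new = X_min[base] + lam * (X_min[nn] - X_min[base])

    y_new = np.full(n_new, minority, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0],
              [0.0, 1.0], [1.0, 1.0]])
y = np.array([0, 0, 0, 0, 0, 1, 1])
X_res, y_res = smote(X, y, k_neighbors=5, random_state=0)
print(np.bincount(y_res))  # [5 5]
```

With only two minority samples, `k` is capped at 1, so every synthetic point lands on the segment between (0, 1) and (1, 1). Unlike random over-sampling, the new rows are generally not exact copies of existing ones.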
Methods defined here:
- __init__(self, random_state=None, k_neighbors=5)
- Initializes SMOTE with an optional random state and the number of nearest neighbors to consider (k_neighbors).
- fit_resample(self, X, y, force_equal=False)
- Resamples the dataset to balance the class distribution by generating synthetic samples.
Args:
X: (array-like) - Feature matrix.
y: (array-like) - Target vector.
force_equal: (bool), optional - If True, resample until classes are equal (default is False).
Returns:
tuple: (np.ndarray, np.ndarray) - Resampled feature matrix and target vector.
Data descriptors defined here:
- __dict__
- dictionary for instance variables
- __weakref__
- list of weak references to the object