Python: module sega_learn.utils.dataAugmentation

Classes

Augmenter
RandomOverSampler
RandomUnderSampler
SMOTE

class Augmenter(builtins.object)

Augmenter(techniques, verbose=False)

General class for data augmentation techniques.

This class applies multiple augmentation techniques to a dataset in sequence.

Methods defined here:

__init__(self, techniques, verbose=False): Initializes the Augmenter with a list of techniques and a verbosity option.

augment(self, X, y): Applies multiple augmentation techniques in sequence.

Args:
    X: (np.ndarray) - Feature matrix.
    y: (np.ndarray) - Target vector.

Returns:
    tuple: (np.ndarray, np.ndarray) - Augmented feature matrix and target vector.
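
To illustrate what applying techniques "in sequence" could look like, here is a minimal sketch. It is not the library's implementation. It assumes that each technique exposes a fit_resample(X, y) method, as the sampler classes documented in this module do. The function apply_in_sequence and the two toy techniques DropFirstRow and DuplicateLastRow are hypothetical names introduced only for this example.

```python
import numpy as np


class DropFirstRow:
    """Hypothetical toy technique: removes the first sample."""

    def fit_resample(self, X, y):
        return X[1:], y[1:]


class DuplicateLastRow:
    """Hypothetical toy technique: appends a copy of the last sample."""

    def fit_resample(self, X, y):
        return np.vstack([X, X[-1:]]), np.append(y, y[-1])


def apply_in_sequence(techniques, X, y, verbose=False):
    """Feed the output of each technique into the next one (assumed behavior)."""
    X = np.asarray(X)
    y = np.asarray(y)
    for technique in techniques:
        X, y = technique.fit_resample(X, y)
        if verbose:
            print(f"{type(technique).__name__}: {X.shape[0]} samples")
    return X, y


X = np.arange(6).reshape(3, 2)
y = np.array([0, 1, 1])
X_aug, y_aug = apply_in_sequence([DropFirstRow(), DuplicateLastRow()], X, y)
```

Under this assumption the order of the list matters, because each technique only sees the data produced by the one before it.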

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class RandomOverSampler(builtins.object)

RandomOverSampler(random_state=None)

Randomly over-sample the minority class by duplicating examples.

This technique balances the class distribution by randomly duplicating samples from the minority class.
It is a simple way to address class imbalance, although the duplicates add no new information and can encourage overfitting to the minority class.

Algorithm Steps:
    - Step 1: Identify the minority class and its samples.
    - Step 2: Calculate the number of samples needed to balance the class distribution.
    - Step 3: Randomly select samples from the minority class with replacement.
    - Step 4: Duplicate the selected samples to create a balanced dataset.
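
The steps above can be sketched with NumPy as follows. This is an illustrative sketch, not the library's source: the function name random_oversample is hypothetical, and the sketch assumes that "balanced" means every class is brought up to the size of the largest class (in the two-class case this is exactly the minority/majority procedure described above).

```python
import numpy as np


def random_oversample(X, y, random_state=None):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(random_state)

    # Step 1: find the classes and their sizes.
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()

    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        # Step 2: number of extra samples this class needs.
        n_needed = target - count
        if n_needed == 0:
            continue
        # Step 3: draw indices from this class with replacement.
        cls_idx = np.flatnonzero(y == cls)
        picked = rng.choice(cls_idx, size=n_needed, replace=True)
        # Step 4: append the duplicated rows.
        X_parts.append(X[picked])
        y_parts.append(y[picked])

    return np.concatenate(X_parts), np.concatenate(y_parts)
```

Sampling with replacement is what allows a class with two samples to be grown to any size; the cost is that the same rows may appear many times in the result.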

Methods defined here:

__init__(self, random_state=None): Initializes the RandomOverSampler with an optional random state.

fit_resample(self, X, y): Resamples the dataset to balance the class distribution by duplicating minority class samples.

Args:
    X: (array-like) - Feature matrix.
    y: (array-like) - Target vector.

Returns:
    tuple: (np.ndarray, np.ndarray) - Resampled feature matrix and target vector.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class RandomUnderSampler(builtins.object)

RandomUnderSampler(random_state=None)

Randomly under-sample the majority class by removing examples.

This technique balances the class distribution by randomly removing samples from the majority class.
It is a simple way to address class imbalance, although it discards potentially useful majority-class data.

Algorithm Steps:
    - Step 1: Identify the majority class and its samples.
    - Step 2: Calculate the number of samples to remove to balance the class distribution.
    - Step 3: Randomly select samples from the majority class without replacement.
    - Step 4: Remove the selected samples to create a balanced dataset.
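
The steps above can be sketched with NumPy as follows. This is an illustrative sketch, not the library's source: the function name random_undersample is hypothetical, and the sketch assumes that "balanced" means every class is cut down to the size of the smallest class.

```python
import numpy as np


def random_undersample(X, y, random_state=None):
    """Remove randomly chosen rows of larger classes until all classes match the smallest."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(random_state)

    # Step 1: find the classes and their sizes.
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()

    keep_mask = np.ones(len(y), dtype=bool)
    for cls, count in zip(classes, counts):
        # Step 2: number of samples this class must lose.
        n_remove = count - target
        if n_remove == 0:
            continue
        # Step 3: choose the rows to drop, without replacement.
        cls_idx = np.flatnonzero(y == cls)
        drop = rng.choice(cls_idx, size=n_remove, replace=False)
        # Step 4: mark them as removed.
        keep_mask[drop] = False

    return X[keep_mask], y[keep_mask]
```

Sampling without replacement guarantees that exactly n_remove distinct rows are dropped. Rows that survive keep their original relative order because a boolean mask is used.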

Methods defined here:

__init__(self, random_state=None): Initializes the RandomUnderSampler with an optional random state.

fit_resample(self, X, y): Resamples the dataset to balance the class distribution by removing majority class samples.

Args:
    X: (array-like) - Feature matrix.
    y: (array-like) - Target vector.

Returns:
    tuple: (np.ndarray, np.ndarray) - Resampled feature matrix and target vector.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class SMOTE(builtins.object)

SMOTE(random_state=None, k_neighbors=5)

Synthetic Minority Over-sampling Technique (SMOTE) for balancing class distribution.

SMOTE generates synthetic samples for the minority class by interpolating between existing samples.
This creates a more balanced dataset, which can improve a model's performance on the minority class.

Algorithm Steps:
    - Step 1: Identify the minority class and its samples.
    - Step 2: For each sample in the minority class, find its k nearest neighbors (using Euclidean distance).
    - Step 3: Randomly select one or more of these neighbors.
    - Step 4: Create synthetic samples by interpolating between the original sample and the selected neighbors.
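
The steps above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the library's source: the function name smote_sketch is hypothetical, only the single smallest class is oversampled, enough synthetic samples are generated to match the largest class, and neighbors are found by brute-force pairwise distances. Each synthetic sample is x_new = x + gap * (x_neighbor - x), with gap drawn uniformly from [0, 1).

```python
import numpy as np


def smote_sketch(X, y, k_neighbors=5, random_state=None):
    """Add synthetic minority samples by interpolating between nearest neighbors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(random_state)

    # Step 1: identify the minority class and its samples.
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min = X[y == minority]
    n_new = counts.max() - counts.min()
    if n_new == 0 or len(X_min) < 2:
        return X, y  # nothing to do, or no neighbor to interpolate with

    # Step 2: k nearest minority neighbors of each minority sample (Euclidean).
    k = min(k_neighbors, len(X_min) - 1)
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]

    # Step 3: pick a base sample and one of its neighbors for each new point.
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = neighbors[base, rng.integers(0, k, size=n_new)]

    # Step 4: interpolate along the segment between base and neighbor.
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (X_min[nbr] - X_min[base])

    X_res = np.concatenate([X, X_new])
    y_res = np.concatenate([y, np.full(n_new, minority, dtype=y.dtype)])
    return X_res, y_res
```

Because every synthetic point lies on a line segment between two existing minority samples, the new points stay inside the region already spanned by the minority class rather than being exact duplicates.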

Methods defined here:

__init__(self, random_state=None, k_neighbors=5): Initializes SMOTE with an optional random state and the number of nearest neighbors (k_neighbors).

fit_resample(self, X, y, force_equal=False): Resamples the dataset to balance the class distribution by generating synthetic samples.

Args:
    X: (array-like) - Feature matrix.
    y: (array-like) - Target vector.
    force_equal: (bool), optional - If True, resample until classes are equal (default is False).

Returns:
    tuple: (np.ndarray, np.ndarray) - Resampled feature matrix and target vector.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object