Python: module sega_learn.utils.dataPreprocessing

sega_learn.utils.dataPreprocessing

Modules

numpy

Classes

builtins.object

Scaler(method='standard')

A class for scaling data by standardization, min-max scaling, or normalization.

Methods defined here:

__init__(self, method='standard'): Initializes the scaler with the specified method.

Args:
    method: (str) - The scaling method to use (default is 'standard'). Options are 'standard', 'minmax', or 'normalize'.
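As a rough guide to what each option computes, the sketch below writes the three transforms as plain NumPy functions. It assumes 'standard' and 'minmax' operate per column (feature) and that 'normalize' rescales each row (sample) to unit L2 norm. These are common conventions, not confirmed details of this module, and the function names are hypothetical.

```python
import numpy as np


def standardize(X):
    # 'standard' (assumed per column): zero mean, unit variance
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # constant columns: centered, not scaled
    return (X - mean) / std


def min_max_scale(X):
    # 'minmax' (assumed per column): map each column linearly onto [0, 1]
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range = np.where(col_range == 0, 1.0, col_range)  # avoid 0/0
    return (X - col_min) / col_range


def l2_normalize_rows(X):
    # 'normalize' (assumption): divide each row by its Euclidean (L2) norm
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms = np.where(norms == 0, 1.0, norms)  # leave all-zero rows unchanged
    return X / norms


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(standardize(X))
print(min_max_scale(X))
print(l2_normalize_rows(X))
```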

fit(self, X): Fits the scaler to the data.

Args:
    X: (numpy.ndarray) - The data to fit the scaler to.

fit_transform(self, X): Fits the scaler to the data and then transforms it.

Args:
    X: (numpy.ndarray) - The data to fit and transform.

Returns:
    X_transformed: (numpy.ndarray) - The transformed data.
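A minimal sketch of the fit / transform / fit_transform contract, assuming the scaler learns per-column statistics in fit and reuses them in transform. MinimalScaler is a hypothetical stand-in covering only 'standard' and 'minmax'; it is not this module's implementation, and returning self from fit is a convenience of the sketch.

```python
import numpy as np


class MinimalScaler:
    """Hypothetical stand-in for Scaler; supports 'standard' and 'minmax' only."""

    def __init__(self, method="standard"):
        if method not in ("standard", "minmax"):
            raise ValueError(f"unsupported method: {method!r}")
        self.method = method

    def fit(self, X):
        # Learn per-column statistics from the data (assumed column-wise)
        X = np.asarray(X, dtype=float)
        if self.method == "standard":
            self.offset_ = X.mean(axis=0)
            spread = X.std(axis=0)
        else:  # "minmax"
            self.offset_ = X.min(axis=0)
            spread = X.max(axis=0) - self.offset_
        # Constant columns have zero spread; divide by 1 instead of 0
        self.scale_ = np.where(spread == 0, 1.0, spread)
        return self

    def transform(self, X):
        # Reuse the statistics stored by fit(); nothing is re-estimated here
        return (np.asarray(X, dtype=float) - self.offset_) / self.scale_

    def fit_transform(self, X):
        # Same result as calling fit(X) and then transform(X)
        return self.fit(X).transform(X)


X_train = np.array([[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]])

one_step = MinimalScaler(method="minmax").fit_transform(X_train)

scaler = MinimalScaler(method="minmax")
scaler.fit(X_train)
two_step = scaler.transform(X_train)

print(np.allclose(one_step, two_step))
```

Fitting and transforming in one call gives the same result as the two separate calls; the separate form matters when the same statistics must later be applied to other data.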

inverse_transform(self, X): Inverse transforms the data using the fitted scaler.

Args:
    X: (numpy.ndarray) - The data to inverse transform.

Returns:
    X_inverse: (numpy.ndarray) - The inverse transformed data.
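Inverting a fitted scaler amounts to solving the forward formula for the original values: for standardization, x = z * std + mean; for min-max scaling, x = x' * (max - min) + min. The sketch below checks both round trips with plain NumPy, assuming per-column statistics like those a fitted scaler would store. If 'normalize' divides each row by that row's own norm, undoing it would need the norms of the specific rows being inverted; how this module handles that case is not covered here.

```python
import numpy as np

# Example data and the per-column statistics a fitted scaler would store
X = np.array([[2.0, 100.0], [4.0, 300.0], [9.0, 200.0]])
mean, std = X.mean(axis=0), X.std(axis=0)
col_min, col_max = X.min(axis=0), X.max(axis=0)

# Forward transforms
Z = (X - mean) / std                        # standardization
X01 = (X - col_min) / (col_max - col_min)   # min-max scaling

# Inverse transforms: solve each forward formula for X
X_from_Z = Z * std + mean
X_from_X01 = X01 * (col_max - col_min) + col_min

print(np.allclose(X_from_Z, X), np.allclose(X_from_X01, X))
```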

transform(self, X): Transforms the data using the fitted scaler.

Args:
    X: (numpy.ndarray) - The data to transform.

Returns:
    X_transformed: (numpy.ndarray) - The transformed data.
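Because transform applies statistics learned during fit, values in new data are not clipped to the training range: under min-max scaling, a value above the training maximum maps above 1 and one below the minimum maps below 0. The sketch below shows this with plain NumPy, assuming per-column min-max statistics; the variable names are illustrative only.

```python
import numpy as np

# Statistics learned from the training data only (what fit would store)
X_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
col_min = X_train.min(axis=0)
col_range = X_train.max(axis=0) - col_min

# New data scaled with the training statistics (what transform would do)
X_new = np.array([[15.0, 25.0], [-5.0, 10.0]])
X_new_scaled = (X_new - col_min) / col_range
print(X_new_scaled)
```

This is also why a scaler is usually fitted on training data only, with transform alone applied to validation or test data.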


Functions

normalize(X, norm='l2'): Normalizes the input data using the specified norm.

Args:
    X: (numpy.ndarray) - The input data to be normalized.
    norm: (str), optional - The type of norm to use for normalization (default is 'l2').
        Options:
            - 'l2': L2 normalization (Euclidean norm).
            - 'l1': L1 normalization (Manhattan norm).
            - 'max': Max normalization (divides by the maximum absolute value).
            - 'minmax': Min-max normalization (scales to [0, 1]).

Returns:
    X: (numpy.ndarray) - The normalized data.
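The four options differ only in what each sample is divided by (and, for 'minmax', a shift first). The sketch below applies every option row by row, one sample at a time; that axis choice is an assumption about this module, and 'minmax' in particular might instead work per column. normalize_sketch is a hypothetical name, not part of the module.

```python
import numpy as np


def normalize_sketch(X, norm="l2"):
    """Hypothetical row-wise version of normalize(); not the module's code."""
    X = np.asarray(X, dtype=float)
    if norm == "l2":
        denom = np.linalg.norm(X, axis=1, keepdims=True)  # sqrt of sum of squares
    elif norm == "l1":
        denom = np.abs(X).sum(axis=1, keepdims=True)      # sum of absolute values
    elif norm == "max":
        denom = np.abs(X).max(axis=1, keepdims=True)      # largest absolute value
    elif norm == "minmax":
        X = X - X.min(axis=1, keepdims=True)              # shift each row to start at 0
        denom = X.max(axis=1, keepdims=True)              # row range after the shift
    else:
        raise ValueError(f"unknown norm: {norm!r}")
    # Avoid dividing by zero for all-zero rows (and constant rows under 'minmax')
    denom = np.where(denom == 0, 1.0, denom)
    return X / denom


X = np.array([[3.0, 4.0], [1.0, -1.0]])
for option in ("l2", "l1", "max", "minmax"):
    print(option, normalize_sketch(X, option))
```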

one_hot_encode(X, cols=None): One-hot encodes non-numerical columns in a DataFrame or numpy array.

Drops the original columns after encoding.

Args:
    X: (pandas.DataFrame or numpy.ndarray) - The data to be encoded.
    cols: (list), optional - The list of column indices to be encoded (default is None).
        If None, all non-numerical columns will be encoded.

Returns:
    X: (pandas.DataFrame or numpy.ndarray) - The data with one-hot encoded columns.
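A NumPy-only sketch of the encoding described above: each selected column is replaced by one 0/1 column per distinct value. one_hot_encode_sketch is hypothetical; treating a column as non-numerical when it contains any string, sorting categories alphabetically, and appending the new columns at the end are all assumptions, not confirmed behavior of this module. For a pandas DataFrame, pandas.get_dummies performs a comparable expansion.

```python
import numpy as np


def one_hot_encode_sketch(X, cols=None):
    """Hypothetical NumPy-only illustration of one-hot encoding."""
    X = np.asarray(X, dtype=object)
    if cols is None:
        # Assumption: a column is non-numerical if any entry is a string
        cols = [j for j in range(X.shape[1])
                if any(isinstance(v, str) for v in X[:, j])]
    keep = [j for j in range(X.shape[1]) if j not in cols]
    blocks = [X[:, keep]]  # original encoded columns are dropped
    for j in cols:
        categories = sorted(set(X[:, j]))  # one new column per distinct value
        onehot = np.array([[int(v == c) for c in categories] for v in X[:, j]],
                          dtype=object)
        blocks.append(onehot)
    # One-hot columns are appended after the remaining original columns
    return np.hstack(blocks)


X = np.array([[1.5, "red"], [2.0, "blue"], [0.5, "red"]], dtype=object)
encoded = one_hot_encode_sketch(X)
print(encoded)
```

The example builds the array with dtype=object so numeric columns stay numeric; a plain string array would make every column look non-numerical to the string check.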