Python: module sega_learn.utils.dataPrep

Modules

numpy

Classes

builtins.object

A class for preparing data for machine learning models.

Methods defined here:

df_to_ndarray(df, y_col=0): Converts a DataFrame into two NumPy arrays: the feature columns and the label column.

Args:
    df: (pandas.DataFrame) - The DataFrame to be converted.
    y_col: (int), optional - The index of the label column (default is 0).

Returns:
    X: (numpy.ndarray) - The feature columns as a NumPy array.
    y: (numpy.ndarray) - The label column as a NumPy array.
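A minimal sketch of the split described above, assuming the label column is selected by position and every other column is treated as a feature. The name df_to_ndarray_sketch is a hypothetical stand-in, not the library's implementation, and the toy DataFrame is made up for illustration.

```python
import pandas as pd


def df_to_ndarray_sketch(df, y_col=0):
    """Hypothetical stand-in: split a DataFrame into features X and labels y."""
    # Select the label column by position.
    y = df.iloc[:, y_col].to_numpy()
    # Everything except the label column is treated as a feature (assumption).
    X = df.drop(columns=df.columns[y_col]).to_numpy()
    return X, y


# Toy data: the label is in column 0, followed by two feature columns.
df = pd.DataFrame({"label": [0, 1, 0], "a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
X, y = df_to_ndarray_sketch(df, y_col=0)
print(X.shape, y.shape)  # (3, 2) (3,)
```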

find_categorical_columns(data): Finds the indices of non-numerical columns in a DataFrame or NumPy array.

Args:
    data: (pandas.DataFrame or numpy.ndarray) - The data to be checked.

Returns:
    categorical_cols: (list) - The list of indices of non-numerical columns.
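The documentation does not say how "non-numerical" is decided. One plausible reading, sketched below for the NumPy case only, is that a column is non-numerical when its values cannot be converted to float. The name find_categorical_columns_sketch and that criterion are both assumptions.

```python
import numpy as np


def find_categorical_columns_sketch(data):
    """Hypothetical stand-in: indices of columns that cannot be cast to float."""
    data = np.asarray(data, dtype=object)
    categorical_cols = []
    for i in range(data.shape[1]):
        try:
            # A column counts as numerical if every value converts to float.
            data[:, i].astype(float)
        except (ValueError, TypeError):
            categorical_cols.append(i)
    return categorical_cols


# Toy data: column 1 holds strings, columns 0 and 2 hold numbers.
data = np.array([[1.0, "red", 3], [2.0, "blue", 4]], dtype=object)
categorical_cols = find_categorical_columns_sketch(data)
print(categorical_cols)  # [1]
```

Under this criterion a column of numeric strings such as "3" would count as numerical; the real method may treat that case differently.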

k_split(X, y, k=5): Splits the data into k folds for cross-validation.

Args:
    X: (numpy.ndarray) - The feature columns.
    y: (numpy.ndarray) - The label column.
    k: (int), optional - The number of folds (default is 5).

Returns:
    X_folds: (list) - A list of k folds of feature columns.
    y_folds: (list) - A list of k folds of label columns.
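To illustrate how the returned folds are typically consumed, here is a sketch that splits the rows into k contiguous folds with numpy.array_split and then builds a train/validation pair for each fold. The name k_split_sketch is hypothetical. Whether the real method shuffles the rows, or how it distributes a remainder when the row count is not divisible by k, is not stated in this documentation.

```python
import numpy as np


def k_split_sketch(X, y, k=5):
    """Hypothetical stand-in: split X and y into k contiguous folds."""
    # array_split tolerates row counts that are not divisible by k.
    X_folds = np.array_split(X, k)
    y_folds = np.array_split(y, k)
    return X_folds, y_folds


# Toy data: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_folds, y_folds = k_split_sketch(X, y, k=5)

# Use each fold once for validation and the remaining folds for training.
for i in range(len(X_folds)):
    X_val, y_val = X_folds[i], y_folds[i]
    X_train = np.concatenate(X_folds[:i] + X_folds[i + 1:])
    y_train = np.concatenate(y_folds[:i] + y_folds[i + 1:])
```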

one_hot_encode(data, cols): One-hot encodes non-numerical columns in a DataFrame or NumPy array.

Drops the original columns after encoding.

Args:
    data: (pandas.DataFrame or numpy.ndarray) - The data to be encoded.
    cols: (list) - The list of column indices to be encoded.

Returns:
    data: (pandas.DataFrame or numpy.ndarray) - The data with one-hot encoded columns.
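The encoding step can be sketched for the NumPy case as follows. This is an illustration under assumptions, not the library's code: the name one_hot_encode_sketch is hypothetical, categories are ordered alphabetically, and the indicator columns are appended after the remaining columns. The real method may order its output differently.

```python
import numpy as np


def one_hot_encode_sketch(data, cols):
    """Hypothetical stand-in: one-hot encode the given column indices."""
    data = np.asarray(data, dtype=object)
    encoded_blocks = []
    for col in cols:
        # inverse maps each row to the index of its category.
        categories, inverse = np.unique(data[:, col].astype(str), return_inverse=True)
        # Picking rows of an identity matrix yields the indicator columns.
        encoded_blocks.append(np.eye(len(categories), dtype=int)[inverse])
    # Drop the original columns, then append the indicator columns.
    remaining = np.delete(data, cols, axis=1)
    return np.hstack([remaining] + encoded_blocks)


# Toy data: column 1 is categorical with the values "blue" and "red".
data = np.array([[1, "red"], [2, "blue"], [3, "red"]], dtype=object)
encoded = one_hot_encode_sketch(data, cols=[1])
print(encoded.tolist())  # [[1, 0, 1], [2, 1, 0], [3, 0, 1]]
```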

prepare_data(csv_file, label_col_index, cols_to_encode=None, write_to_csv=True): Prepares the data by loading a CSV file, one-hot encoding non-numerical columns, and optionally writing the prepared data to a new CSV file.

Args:
    csv_file: (str) - The path of the CSV file to load.
    label_col_index: (int) - The index of the label column.
    cols_to_encode: (list), optional - The list of column indices to one-hot encode (default is None).
    write_to_csv: (bool), optional - Whether to write the prepared data to a new CSV file (default is True).

Returns:
    df: (pandas.DataFrame) - The prepared DataFrame.
    prepared_csv_file: (str) - The path of the prepared CSV file. If write_to_csv is False, returns "N/A".
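The load, encode, and write pipeline described above might look roughly like the following pandas sketch. Several details are assumptions made for illustration: the name prepare_data_sketch, the fallback of encoding every non-numeric column other than the label when cols_to_encode is None, and the "_prepared" suffix on the output file name. None of these is confirmed by this documentation.

```python
import os
import tempfile

import pandas as pd


def prepare_data_sketch(csv_file, label_col_index, cols_to_encode=None, write_to_csv=True):
    """Hypothetical stand-in for the load -> encode -> write pipeline."""
    df = pd.read_csv(csv_file)
    if cols_to_encode is None:
        # Assumption: encode every non-numeric column except the label.
        cols_to_encode = [
            i for i, dtype in enumerate(df.dtypes)
            if i != label_col_index and not pd.api.types.is_numeric_dtype(dtype)
        ]
    names = [df.columns[i] for i in cols_to_encode]
    if names:
        # get_dummies drops the original columns and appends indicator columns.
        df = pd.get_dummies(df, columns=names, dtype=int)
    prepared_csv_file = "N/A"
    if write_to_csv:
        root, ext = os.path.splitext(csv_file)
        prepared_csv_file = f"{root}_prepared{ext}"  # hypothetical naming scheme
        df.to_csv(prepared_csv_file, index=False)
    return df, prepared_csv_file


# Round trip on a toy CSV written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.csv")
    pd.DataFrame({"label": [0, 1], "color": ["red", "blue"]}).to_csv(path, index=False)
    df, out_path = prepare_data_sketch(path, label_col_index=0)
    columns = list(df.columns)
    wrote_file = os.path.exists(out_path)
```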

write_data(df, csv_file, print_path=False): Writes the DataFrame to a CSV file.

Args:
    df: (pandas.DataFrame) - The DataFrame to be written.
    csv_file: (str) - The path of the CSV file to write to.
    print_path: (bool), optional - If True, prints the file path (default is False).

Data descriptors defined here: