Methods defined here:
- df_to_ndarray(df, y_col=0)
- Splits a DataFrame into feature and label NumPy arrays.
Args:
df: (pandas.DataFrame) - The DataFrame to be converted.
y_col: (int), optional - The index of the label column (default is 0).
Returns:
X: (numpy.ndarray) - The feature columns as a NumPy array.
y: (numpy.ndarray) - The label column as a NumPy array.
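Example:
A minimal sketch of the behavior described above, not the module's actual implementation. The name `df_to_ndarray_sketch` and the toy DataFrame are assumptions made for illustration; the sketch assumes that the label column is removed from X.

```python
import pandas as pd


def df_to_ndarray_sketch(df, y_col=0):
    """Hypothetical stand-in: split a DataFrame into features X and labels y."""
    # Select the label column by position.
    y = df.iloc[:, y_col].to_numpy()
    # Assumption: every remaining column is treated as a feature.
    X = df.drop(columns=df.columns[y_col]).to_numpy()
    return X, y


df = pd.DataFrame({"label": [0, 1, 0], "a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
X, y = df_to_ndarray_sketch(df, y_col=0)
print(X.shape, y.shape)  # (3, 2) (3,)
```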
- find_categorical_columns(data)
- Finds the indices of non-numerical columns in a DataFrame or NumPy array.
Args:
data: (pandas.DataFrame or numpy.ndarray) - The data to be checked.
Returns:
categorical_cols: (list) - The list of indices of non-numerical columns.
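Example:
One way such a check could work, sketched with pandas dtype inspection. This is an illustration under stated assumptions, not the module's code: the name `find_categorical_columns_sketch` is hypothetical, and the sketch assumes that "non-numerical" means any column whose inferred dtype is not numeric.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def find_categorical_columns_sketch(data):
    """Hypothetical stand-in: positional indices of non-numeric columns."""
    # Wrapping in a DataFrame lets the same code handle 2-D NumPy arrays.
    # infer_objects() recovers numeric dtypes from object-typed array columns.
    df = pd.DataFrame(data).infer_objects()
    return [i for i in range(df.shape[1]) if not is_numeric_dtype(df.iloc[:, i])]


frame = pd.DataFrame({"age": [30, 41], "city": ["Oslo", "Lima"], "score": [0.5, 0.7]})
array = np.array([[1, "x"], [2, "y"]], dtype=object)

frame_cols = find_categorical_columns_sketch(frame)
array_cols = find_categorical_columns_sketch(array)
```

In both toy inputs only the column at index 1 holds strings, so both calls return `[1]`.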
- k_split(X, y, k=5)
- Splits the data into k folds for cross-validation.
Args:
X: (numpy.ndarray) - The feature columns.
y: (numpy.ndarray) - The label column.
k: (int), optional - The number of folds (default is 5).
Returns:
X_folds: (list) - A list of k folds of feature columns.
y_folds: (list) - A list of k folds of the label column.
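Example:
A minimal sketch of a k-fold split and of how the returned folds might be consumed. The name `k_split_sketch` is hypothetical, and the sketch assumes contiguous, unshuffled folds built with `numpy.array_split`; the documented method may shuffle or partition differently.

```python
import numpy as np


def k_split_sketch(X, y, k=5):
    """Hypothetical stand-in: split X and y into k aligned folds."""
    # array_split tolerates row counts that are not divisible by k;
    # the leading folds receive one extra row each.
    X_folds = np.array_split(X, k)
    y_folds = np.array_split(y, k)
    return X_folds, y_folds


X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_folds, y_folds = k_split_sketch(X, y, k=3)

# Cross-validation loop: hold out fold i, train on the remaining folds.
for i in range(3):
    X_val, y_val = X_folds[i], y_folds[i]
    X_train = np.concatenate([f for j, f in enumerate(X_folds) if j != i])
    y_train = np.concatenate([f for j, f in enumerate(y_folds) if j != i])
```

With 10 rows and k=3, the folds hold 4, 3, and 3 rows.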
- one_hot_encode(data, cols)
- One-hot encodes non-numerical columns in a DataFrame or NumPy array.
The original columns are dropped after encoding.
Args:
data: (pandas.DataFrame or numpy.ndarray) - The data to be encoded.
cols: (list) - The list of column indices to be encoded.
Returns:
data: (pandas.DataFrame or numpy.ndarray) - The data with one-hot encoded columns.
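Example:
A sketch of index-based one-hot encoding using `pandas.get_dummies`, which also drops the encoded source columns. The name `one_hot_encode_sketch` is hypothetical, only the DataFrame case is shown, and the resulting column names and ordering follow pandas conventions rather than anything the documented method guarantees.

```python
import pandas as pd


def one_hot_encode_sketch(df, cols):
    """Hypothetical stand-in: one-hot encode the columns at the given indices."""
    # Translate positional indices into column labels.
    names = [df.columns[i] for i in cols]
    # get_dummies replaces each named column with one indicator column per
    # category; dtype=int gives 0/1 values instead of booleans.
    return pd.get_dummies(df, columns=names, dtype=int)


df = pd.DataFrame({"age": [30, 41, 25], "city": ["Oslo", "Lima", "Oslo"]})
encoded = one_hot_encode_sketch(df, cols=[1])
print(list(encoded.columns))  # ['age', 'city_Lima', 'city_Oslo']
```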
- prepare_data(csv_file, label_col_index, cols_to_encode=None, write_to_csv=True)
- Prepares the data by loading a CSV file, one-hot encoding non-numerical columns, and optionally writing the prepared data to a new CSV file.
Args:
csv_file: (str) - The path of the CSV file to load.
label_col_index: (int) - The index of the label column.
cols_to_encode: (list), optional - The list of column indices to one-hot encode (default is None).
write_to_csv: (bool), optional - Whether to write the prepared data to a new CSV file (default is True).
Returns:
df: (pandas.DataFrame) - The prepared DataFrame.
prepared_csv_file: (str) - The path of the prepared CSV file, or "N/A" if write_to_csv is False.
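Example:
A sketch of the load, encode, and write steps described above. Several details are assumptions rather than documented behavior: the name `prepare_data_sketch`, auto-detecting non-numeric columns (other than the label) when `cols_to_encode` is None, and the `_prepared` file-name suffix. Only the "N/A" return value comes from the description above.

```python
import os
import tempfile

import pandas as pd
from pandas.api.types import is_numeric_dtype


def prepare_data_sketch(csv_file, label_col_index, cols_to_encode=None, write_to_csv=True):
    """Hypothetical stand-in: load a CSV, one-hot encode, optionally write."""
    df = pd.read_csv(csv_file)
    label_pos = label_col_index % df.shape[1]  # normalize negative indices
    if cols_to_encode is None:
        # Assumption: encode every non-numeric column except the label.
        cols_to_encode = [
            i for i in range(df.shape[1])
            if i != label_pos and not is_numeric_dtype(df.iloc[:, i])
        ]
    names = [df.columns[i] for i in cols_to_encode]
    if names:
        df = pd.get_dummies(df, columns=names, dtype=int)
    if not write_to_csv:
        return df, "N/A"
    root, ext = os.path.splitext(csv_file)
    prepared_csv_file = f"{root}_prepared{ext}"  # assumed naming scheme
    df.to_csv(prepared_csv_file, index=False)
    return df, prepared_csv_file


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.csv")
    pd.DataFrame({"label": [0, 1], "city": ["Oslo", "Lima"]}).to_csv(path, index=False)
    prepared, out_path = prepare_data_sketch(path, label_col_index=0)
    wrote_file = os.path.exists(out_path)
    _, no_path = prepare_data_sketch(path, label_col_index=0, write_to_csv=False)
```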
- write_data(df, csv_file, print_path=False)
- Writes the DataFrame to a CSV file.
Args:
df: (pandas.DataFrame) - The DataFrame to be written.
csv_file: (str) - The path of the CSV file to write to.
print_path: (bool), optional - If True, prints the file path (default is False).
Data descriptors defined here:
- __dict__
- dictionary for instance variables
- __weakref__
- list of weak references to the object