Python: package sega_learn.neural

This class contains various activation functions and their corresponding derivatives for use in neural networks.

Methods:
    relu: Rectified Linear Unit activation function. Returns the input directly if it's positive, otherwise returns 0.
    leaky_relu: Leaky ReLU activation function. A variant of ReLU that allows a small gradient when the input is negative.
    tanh: Hyperbolic tangent activation function. Maps input to range [-1, 1]. Commonly used for normalized input.
    sigmoid: Sigmoid activation function. Maps input to range [0, 1]. Commonly used for binary classification.
    softmax: Softmax activation function. Maps input into a probability distribution over multiple classes.

Static methods defined here:

leaky_relu(z, alpha=0.01): Leaky ReLU activation function: f(z) = z if z > 0, else alpha * z.

Allows a small, non-zero gradient when the input is negative to address the dying ReLU problem.

leaky_relu_derivative(z, alpha=0.01): Derivative of the Leaky ReLU function: f'(z) = 1 if z > 0, else alpha.

Returns 1 for positive input, and alpha otherwise (including z = 0).

relu(z): ReLU (Rectified Linear Unit) activation function: f(z) = max(0, z).

Returns the input directly if it's positive, otherwise returns 0.

relu_derivative(z): Derivative of the ReLU function: f'(z) = 1 if z > 0, else 0.

Returns 1 for positive input, and 0 otherwise (including z = 0).

sigmoid(z): Sigmoid activation function: f(z) = 1 / (1 + exp(-z)).

Maps input to the range [0, 1], commonly used for binary classification.

sigmoid_derivative(z): Derivative of the sigmoid function: f'(z) = sigmoid(z) * (1 - sigmoid(z)).

Used for backpropagation through the sigmoid activation.

softmax(z): Softmax activation function: f(z)_i = exp(z_i) / sum(exp(z_j)) for all j.

Maps input into a probability distribution over multiple classes. Used for multiclass classification.

tanh(z): Hyperbolic tangent (tanh) activation function: f(z) = (exp(z) - exp(-z)) / (exp(z) + exp(-z)).

Maps input to the range [-1, 1], typically used for normalized input.

tanh_derivative(z): Derivative of the tanh function: f'(z) = 1 - tanh(z)^2.

Used for backpropagation through the tanh activation.
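The formulas listed for these static methods translate almost directly into NumPy. The following is a minimal sketch, not the library's implementation: the free functions are written for illustration only, and subtracting the maximum inside `softmax` is an assumed (though common) numerical-stability step that the source does not mention.

```python
import numpy as np


def relu(z):
    # f(z) = max(0, z), applied element-wise
    return np.maximum(0, z)


def relu_derivative(z):
    # 1 where z > 0, else 0
    return (z > 0).astype(float)


def leaky_relu(z, alpha=0.01):
    # f(z) = z if z > 0, else alpha * z
    return np.where(z > 0, z, alpha * z)


def leaky_relu_derivative(z, alpha=0.01):
    # 1 where z > 0, else alpha
    return np.where(z > 0, 1.0, alpha)


def sigmoid(z):
    # f(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_derivative(z):
    # f'(z) = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)


def tanh_derivative(z):
    # f'(z) = 1 - tanh(z)^2
    return 1.0 - np.tanh(z) ** 2


def softmax(z):
    # Assumption: shift by the row maximum so exp() cannot overflow;
    # the result is mathematically unchanged.
    shifted = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)


z = np.array([-2.0, 0.0, 3.0])
print(relu(z), leaky_relu(z), softmax(z))
```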

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class AdadeltaOptimizer(builtins.object)

AdadeltaOptimizer(learning_rate=1.0, rho=0.95, epsilon=1e-06, reg_lambda=0.0)

Adadelta optimizer class for training neural networks.

Formula:
    E[g^2]_t = rho * E[g^2]_{t-1} + (1 - rho) * g^2
    Delta_x = - (sqrt(E[delta_x^2]_{t-1} + epsilon) / sqrt(E[g^2]_t + epsilon)) * g
    E[delta_x^2]_t = rho * E[delta_x^2]_{t-1} + (1 - rho) * Delta_x^2
Derived from: https://arxiv.org/abs/1212.5701

Args:
    learning_rate (float, optional): The learning rate for the optimizer. Defaults to 1.0.
    rho (float, optional): The decay rate. Defaults to 0.95.
    epsilon (float, optional): A small value to prevent division by zero. Defaults to 1e-6.
    reg_lambda (float, optional): The regularization parameter. Defaults to 0.0.
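
The three recurrences in the formula can be sketched as a single update step. This is an illustration under stated assumptions, not the class's actual code: `adadelta_step` is a hypothetical helper, and it leaves out `learning_rate` and `reg_lambda` because the source does not say how the class applies them.

```python
import numpy as np


def adadelta_step(w, g, E_g2, E_dx2, rho=0.95, epsilon=1e-6):
    """One Adadelta update for parameter w with gradient g (hypothetical helper)."""
    # Running average of squared gradients
    E_g2 = rho * E_g2 + (1 - rho) * g**2
    # Update scaled by the ratio of RMS(previous updates) to RMS(gradients)
    delta = -(np.sqrt(E_dx2 + epsilon) / np.sqrt(E_g2 + epsilon)) * g
    # Running average of squared updates
    E_dx2 = rho * E_dx2 + (1 - rho) * delta**2
    return w + delta, E_g2, E_dx2


# Minimize f(w) = w^2 for a few steps, starting from w = 1
w = np.array([1.0])
E_g2 = np.zeros_like(w)
E_dx2 = np.zeros_like(w)
for _ in range(3):
    g = 2 * w  # gradient of w^2
    w, E_g2, E_dx2 = adadelta_step(w, g, E_g2, E_dx2)
print(w)
```

Both running averages start at zero, so the first updates are small; epsilon keeps the numerator from being exactly zero on the first step.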

Methods defined here:

__init__(self, learning_rate=1.0, rho=0.95, epsilon=1e-06, reg_lambda=0.0): Initializes the optimizer with the specified hyperparameters.

Args:
    learning_rate (float, optional): The learning rate for the optimizer. Defaults to 1.0.
    rho (float, optional): The decay rate for the running averages. Defaults to 0.95.
    epsilon (float, optional): A small value to prevent division by zero. Defaults to 1e-6.
    reg_lambda (float, optional): The regularization parameter for weight decay. Defaults to 0.0.

initialize(self, layers): Initializes the running averages for each layer's weights.

Args:
    layers (list): List of layers in the neural network.

Returns:
    None

update(self, layer, dW, db, index): Updates the weights and biases of a layer using the Adadelta optimization algorithm.

Args:
    layer (Layer): The layer to update.
    dW (ndarray): The gradient of the weights.
    db (ndarray): The gradient of the biases.
    index (int): The index of the layer.

Returns:
    None

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class AdamOptimizer(builtins.object)

AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, reg_lambda=0.01)

Adam optimizer class for training neural networks.

Formula: w = w - alpha * m_hat / (sqrt(v_hat) + epsilon) - lambda * w
Derived from: https://arxiv.org/abs/1412.6980

Args:
    learning_rate (float, optional): The learning rate for the optimizer. Defaults to 0.001.
    beta1 (float, optional): The exponential decay rate for the first moment estimates. Defaults to 0.9.
    beta2 (float, optional): The exponential decay rate for the second moment estimates. Defaults to 0.999.
    epsilon (float, optional): A small value to prevent division by zero. Defaults to 1e-8.
    reg_lambda (float, optional): The regularization parameter. Defaults to 0.01.
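
The documented formula gives only the final weight update. A minimal sketch of a full step follows, with the moment estimates and bias correction taken from the Adam paper linked above. `adam_step` is a hypothetical helper, and the class may organize its state (`m`, `v`, `t`) differently.

```python
import numpy as np


def adam_step(w, g, m, v, t, learning_rate=0.001, beta1=0.9,
              beta2=0.999, epsilon=1e-8, reg_lambda=0.01):
    """One Adam update at time step t (t starts at 1). Hypothetical helper."""
    # Exponential moving averages of the gradient and its square
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    # Bias-corrected estimates (m and v start at zero)
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    # Documented update: w = w - alpha * m_hat / (sqrt(v_hat) + epsilon) - lambda * w
    w = w - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon) - reg_lambda * w
    return w, m, v


w = np.array([1.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
w, m, v = adam_step(w, np.array([0.5]), m, v, t=1)
print(w)
```

Note that in the documented formula the `lambda * w` term is not scaled by the learning rate, so with the default `reg_lambda=0.01` it can dominate the adaptive part of the step.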

Methods defined here:

__init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, reg_lambda=0.01): Initializes the optimizer with the given hyperparameters.

Args:
    learning_rate (float, optional): The learning rate (alpha) for the optimizer. Defaults to 0.001.
    beta1 (float, optional): Exponential decay rate for the first moment estimates. Defaults to 0.9.
    beta2 (float, optional): Exponential decay rate for the second moment estimates. Defaults to 0.999.
    epsilon (float, optional): A small value to prevent division by zero. Defaults to 1e-8.
    reg_lambda (float, optional): Regularization parameter; higher values indicate stronger regularization. Defaults to 0.01.

Attributes:
    learning_rate (float): The learning rate for the optimizer.
    beta1 (float): Exponential decay rate for the first moment estimates.
    beta2 (float): Exponential decay rate for the second moment estimates.
    epsilon (float): A small value to prevent division by zero.
    reg_lambda (float): Regularization parameter for controlling overfitting.
    m (list): List to store first moment estimates for each parameter.
    v (list): List to store second moment estimates for each parameter.
    t (int): Time step counter for the optimizer.

initialize(self, layers): Initializes the first and second moment estimates for each layer's weights.

Args:
    layers (list): List of layers in the neural network.

Returns:
    None

update(self, layer, dW, db, index): Updates the weights and biases of a layer using the Adam optimization algorithm.

Args:
    layer (Layer): The layer to update.
    dW (ndarray): The gradient of the weights.
    db (ndarray): The gradient of the biases.
    index (int): The index of the layer.

Returns:
    None

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class BCEWithLogitsLoss(builtins.object)

Custom binary cross entropy loss with logits implementation using numpy.

Formula: -mean(y * log(p) + (1 - y) * log(1 - p)), where p = sigmoid(logits)

Methods:
    __call__(self, logits, targets): Calculate the binary cross entropy loss.

Methods defined here:

__call__(self, logits, targets): Calculate the binary cross entropy loss.

Args:
    logits (np.ndarray): The logits (predicted values) of shape (num_samples,).
    targets (np.ndarray): The target labels of shape (num_samples,).

Returns:
    float: The binary cross entropy loss.
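
Evaluating the sigmoid and then taking logarithms can overflow or yield log(0) for large-magnitude logits. The sketch below uses an algebraically equivalent rearrangement that avoids this. It is an assumption for illustration: `bce_with_logits` is a hypothetical function, and the class may compute the loss differently (for example, by clipping probabilities).

```python
import numpy as np


def bce_with_logits(logits, targets):
    """Binary cross entropy computed directly from logits (hypothetical helper).

    Uses the identity
        -[y*log(s(x)) + (1-y)*log(1-s(x))] = max(x, 0) - x*y + log(1 + exp(-|x|))
    where s is the sigmoid, so exp() never receives a large positive argument.
    """
    loss = (np.maximum(logits, 0) - logits * targets
            + np.log1p(np.exp(-np.abs(logits))))
    return float(np.mean(loss))


logits = np.array([2.0, -1.0, 0.5])
targets = np.array([1.0, 0.0, 1.0])
print(bce_with_logits(logits, targets))
```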

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class BaseBackendNeuralNetwork(sega_learn.neural_networks.neuralNetworkBase.NeuralNetworkBase)

BaseBackendNeuralNetwork(layers, dropout_rate=0.2, reg_lambda=0.01, activations=None, loss_function=None, regressor=False)

A class representing a backend implementation of a neural network with support for forward propagation, backward propagation, training, evaluation, and hyperparameter tuning.

This class extends the `NeuralNetworkBase` class and provides additional functionality
for managing layers, applying dropout, calculating loss, and optimizing weights and biases.

Attributes:
    layers (list): List of layer objects in the neural network.
    layer_outputs (list): Outputs of each layer during forward propagation.
    weights (list): Weights of each layer.
    biases (list): Biases of each layer.
    train_loss (list): Training loss values over epochs.
    train_accuracy (list): Training accuracy values over epochs.
    val_loss (list): Validation loss values over epochs.
    val_accuracy (list): Validation accuracy values over epochs.
    train_precision (list): Training precision values over epochs.
    train_recall (list): Training recall values over epochs.
    train_f1 (list): Training F1-score values over epochs.
    val_precision (list): Validation precision values over epochs.
    val_recall (list): Validation recall values over epochs.
    val_f1 (list): Validation F1-score values over epochs.
    learning_rates (list): Learning rates over epochs.

Methods:
    __init__(layers, dropout_rate=0.2, reg_lambda=0.01, activations=None, loss_function=None, regressor=False):
        Initializes the neural network with the specified layers, dropout rate,
        regularization parameter, activation functions, and optional loss function.
    initialize_new_layers():
        Initializes the layers of the neural network with random weights and biases.
    forward(X, training=True):
        Performs forward propagation through the neural network.
    backward(y):
        Performs backward propagation to calculate gradients for weight and bias updates.
    fit(X_train, y_train, X_val=None, y_val=None, optimizer=None, epochs=100, batch_size=32, ...):
        Fits the neural network to the training data.
    train(X_train, y_train, X_val=None, y_val=None, optimizer=None, epochs=100, batch_size=32, ...):
        Trains the neural network model with optional validation and early stopping.
    evaluate(X, y):
        Evaluates the model on the given data and returns accuracy and predictions.
    predict(X):
        Predicts the output for the given input data.
    calculate_loss(X, y):
        Calculates the loss with L2 regularization for the given input and target labels.
    _create_optimizer(optimizer_type, learning_rate, JIT=False):
        Helper method to create optimizer instances based on the specified type.
    tune_hyperparameters(X_train, y_train, X_val, y_val, param_grid, ...):
        Performs hyperparameter tuning using grid search.
    train_with_animation_capture(X_train, y_train, X_val=None, y_val=None, ...):
        Trains the neural network while capturing training metrics in real-time animation.

Method resolution order: BaseBackendNeuralNetwork; sega_learn.neural_networks.neuralNetworkBase.NeuralNetworkBase; builtins.object

Methods defined here:

__init__(self, layers, dropout_rate=0.2, reg_lambda=0.01, activations=None, loss_function=None, regressor=False): Initializes the base backend neural network.

Args:
    layers: (list) - List of layer sizes or Layer objects.
    dropout_rate: (float) - Dropout rate for regularization.
    reg_lambda: (float) - L2 regularization parameter.
    activations: (list) - List of activation functions for each layer.
    loss_function: (callable) optional - Custom loss function to use (default is None, which uses the default calculate_loss implementation).
    regressor: (bool) - Whether the model is a regressor (default is False).

backward(self, y): Performs backward propagation to calculate the gradients.

Args:
    y: (ndarray) - Target labels of shape (m, output_size).

calculate_loss(self, X, y): Calculates the loss with L2 regularization.

Args:
    X: (np.ndarray) - Input feature data.
    y: (np.ndarray) - Target labels.

Returns:
    loss: (float) - The calculated loss value with L2 regularization.

evaluate(self, X, y): Evaluates the model's performance on the given data.

Args:
    X: (np.ndarray) - Input feature data for evaluation.
    y: (np.ndarray) - True target labels corresponding to the input data.

Returns:
    metric: (float) - The evaluation metric (accuracy for classification, MSE for regression).
    predicted: (np.ndarray) - The predicted labels or values for the input data.

fit(self, X_train, y_train, X_val=None, y_val=None, optimizer=None, epochs=100, batch_size=32, early_stopping_threshold=10, lr_scheduler=None, p=False, use_tqdm=False, n_jobs=1, track_metrics=False, track_adv_metrics=False, save_animation=False, save_path='training_animation.mp4', fps=1, dpi=100, frame_every=1): Fits the neural network to the training data.

forward(self, X, training=True): Performs forward propagation through the neural network.

Args:
    X: (ndarray) - Input data of shape (batch_size, input_size).
    training: (bool) - Whether the network is in training mode (applies dropout).

Returns:
    ndarray: Output predictions of shape (batch_size, output_size).

initialize_new_layers(self): Initializes the layers of the neural network.

Each layer is created with the specified number of neurons and activation function.

predict(self, X): Generates predictions for the given input data.

Args:
    X: (np.ndarray) - Input feature data for which predictions are to be made.

Returns:
    outputs: (np.ndarray) - Predicted outputs. If the model is binary, returns the raw outputs.
                Otherwise, returns the class indices with the highest probability.

train(self, X_train, y_train, X_val=None, y_val=None, optimizer=None, epochs=100, batch_size=32, early_stopping_threshold=10, lr_scheduler=None, p=True, use_tqdm=True, n_jobs=1, track_metrics=False, track_adv_metrics=False, save_animation=False, save_path='training_animation.mp4', fps=1, dpi=100, frame_every=1): Trains the neural network model.

Args:
    X_train: (ndarray) - Training data features.
    y_train: (ndarray) - Training data labels.
    X_val: (ndarray) - Validation data features, optional.
    y_val: (ndarray) - Validation data labels, optional.
    optimizer: (Optimizer) - Optimizer for updating parameters (default: Adam, lr=0.0001).
    epochs: (int) - Number of training epochs (default: 100).
    batch_size: (int) - Batch size for mini-batch gradient descent (default: 32).
    early_stopping_threshold: (int) - Patience for early stopping (default: 10).
    lr_scheduler: (Scheduler) - Learning rate scheduler (default: None).
    p: (bool) - Whether to print training progress (default: True).
    use_tqdm: (bool) - Whether to use tqdm for progress bar (default: True).
    n_jobs: (int) - Number of jobs for parallel processing (default: 1).
    track_metrics: (bool) - Whether to track training metrics (default: False).
    track_adv_metrics: (bool) - Whether to track advanced metrics (default: False).
    save_animation: (bool) - Whether to save the animation of metrics (default: False).
    save_path: (str) - Path to save the animation file. File extension must be .mp4 or .gif (default: 'training_animation.mp4').
    fps: (int) - Frames per second for the saved animation (default: 1).
    dpi: (int) - DPI for the saved animation (default: 100).
    frame_every: (int) - Capture frame every N epochs (to reduce file size) (default: 1).
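
The `early_stopping_threshold` argument is described as a patience value. One common reading of patience is sketched here: stop once the validation loss has failed to improve for that many consecutive epochs. The function name and the exact stopping criterion are assumptions; the source does not show how `train` implements it.

```python
def epochs_until_stop(val_losses, patience=10):
    """Return (last_epoch_run, best_loss) under a simple patience rule.

    Hypothetical helper: `val_losses` stands in for the per-epoch validation
    loss that a real training loop would compute.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember it and reset the counter
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_loss
    return len(val_losses) - 1, best_loss


losses = [1.0, 0.8, 0.9, 0.85, 0.95, 0.7]
print(epochs_until_stop(losses, patience=3))
```

With a patience of 3, training in this example stops at epoch index 4, before the later improvement to 0.7 is ever seen; a larger patience trades longer training for a lower chance of stopping early on a noisy validation curve.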

train_with_animation_capture(self, X_train, y_train, X_val=None, y_val=None, optimizer=None, epochs=100, batch_size=32, early_stopping_threshold=10, lr_scheduler=None, save_path='training_animation.mp4', fps=1, dpi=100, frame_every=1): Trains the neural network model while capturing training metrics in real-time animation.

Args:
    X_train: (np.ndarray) - Training feature data.
    y_train: (np.ndarray) - Training target data.
    X_val: (np.ndarray), optional - Validation feature data (default is None).
    y_val: (np.ndarray), optional - Validation target data (default is None).
    optimizer: (Optimizer), optional - Optimizer for updating parameters (default is None).
    epochs: (int), optional - Number of training epochs (default is 100).
    batch_size: (int), optional - Batch size for mini-batch gradient descent (default is 32).
    early_stopping_threshold: (int), optional - Patience for early stopping (default is 10).
    lr_scheduler: (Scheduler), optional - Learning rate scheduler (default is None).
    save_path: (str), optional - Path to save the animation file (default is 'training_animation.mp4').
    fps: (int), optional - Frames per second for the saved animation (default is 1).
    dpi: (int), optional - DPI for the saved animation (default is 100).
    frame_every: (int), optional - Capture frame every N epochs (default is 1).

Returns:
    None

tune_hyperparameters(self, X_train, y_train, X_val, y_val, param_grid, layer_configs=None, optimizer_types=None, lr_range=(0.0001, 0.01, 5), epochs=30, batch_size=32): Performs hyperparameter tuning using grid search.

Args:
    X_train: (np.ndarray) - Training feature data.
    y_train: (np.ndarray) - Training target data.
    X_val: (np.ndarray) - Validation feature data.
    y_val: (np.ndarray) - Validation target data.
    param_grid: (dict) - Dictionary of parameters to try.
    layer_configs: (list), optional - List of layer configurations (default is None).
    optimizer_types: (list), optional - List of optimizer types (default is None).
    lr_range: (tuple) - Tuple of (min_lr, max_lr, num_steps) for learning rates.
    epochs: (int) - Maximum epochs for each trial.
    batch_size: (int) - Batch size for training.

Returns:
    best_params: (dict) - Best hyperparameters found.
    best_accuracy: (float) - Best validation accuracy.

Methods inherited from sega_learn.neural_networks.neuralNetworkBase.NeuralNetworkBase:

apply_dropout(self, X): Applies dropout to the activation X.

Args:
    X: (ndarray) - Activation values.

Returns:
    ndarray: Activation values after applying dropout.
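
The source does not state which dropout variant `apply_dropout` uses. The sketch below assumes inverted dropout, where the surviving activations are rescaled during training so that no correction is needed at inference time. The standalone function and its `rng` parameter are hypothetical.

```python
import numpy as np


def apply_dropout(X, dropout_rate=0.2, rng=None):
    """Inverted dropout (assumed variant): zero units, rescale the rest."""
    rng = np.random.default_rng() if rng is None else rng
    keep_prob = 1.0 - dropout_rate
    # Each unit is kept independently with probability keep_prob
    mask = rng.random(X.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation unchanged
    return X * mask / keep_prob


activations = np.ones((1000, 10))
dropped = apply_dropout(activations, dropout_rate=0.2, rng=np.random.default_rng(0))
print(dropped.mean())
```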

calculate_precision_recall_f1(self, X, y): Calculates precision, recall, and F1 score.

Args:
    X: (ndarray) - Input data
    y: (ndarray) - Target labels

Returns:
    precision: (float) - Precision score
    recall: (float) - Recall score
    f1: (float) - F1 score
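
For reference, the three metrics can be computed from true-positive, false-positive, and false-negative counts. This sketch covers only the binary case with label 1 as the positive class and takes predicted labels directly; how the method derives predictions from `X` and how it averages over multiple classes are not shown in the source.

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall, and F1 (hypothetical standalone helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against division by zero when a class is never predicted or absent
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


print(precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))
```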

compute_l2_reg(self, weights): Computes the L2 regularization term.

Args:
    weights: (list) - List of weight matrices.

Returns:
    float: L2 regularization term.

create_scheduler(self, scheduler_type, optimizer, **kwargs): Creates a learning rate scheduler.

initialize_layers(self): Initializes the weights and biases of the layers.

plot_metrics(self, save_dir=None): Plots the training and validation metrics.

Data descriptors inherited from sega_learn.neural_networks.neuralNetworkBase.NeuralNetworkBase:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class ConvLayer(builtins.object)

ConvLayer(in_channels, out_channels, kernel_size, stride=1, padding=0, activation='relu')

A convolutional layer implementation for neural networks.

This layer performs 2D convolution operations, commonly used in convolutional neural networks (CNNs).
The implementation uses the im2col technique for efficient computation, transforming the convolution operation into matrix multiplication.
An optional activation function is applied element-wise to the output.

Args:
    in_channels (int): Number of input channels (depth of input volume).
    out_channels (int): Number of output channels (number of filters).
    kernel_size (int): Size of the convolutional kernel (square kernel assumed).
    stride (int, optional): Stride of the convolution. Default: 1.
    padding (int, optional): Zero-padding added to both sides of the input. Default: 0.
    activation (str, optional): Activation function to use. Options are "relu", "sigmoid", "tanh", or None. Default: "relu".

Attributes:
    in_channels (int): Number of input channels.
    out_channels (int): Number of output channels.
    kernel_size (int): Size of the square convolutional kernel.
    stride (int): Stride of the convolution.
    padding (int): Zero-padding added to both sides of the input.
    weights (numpy.ndarray): Learnable weights of shape (out_channels, in_channels, kernel_size, kernel_size).
    biases (numpy.ndarray): Learnable biases of shape (out_channels, 1).
    activation (str): Type of activation function.
    weight_gradients (numpy.ndarray): Gradients with respect to weights.
    bias_gradients (numpy.ndarray): Gradients with respect to biases.
    input_cache (numpy.ndarray): Cached input for use in backward pass.
    X_cols (numpy.ndarray): Cached column-transformed input.
    X_padded (numpy.ndarray): Cached padded input.
    h_out (int): Height of output feature maps.
    w_out (int): Width of output feature maps.
    input_size (int): Size of input (same as in_channels).
    output_size (int): Size of output (same as out_channels).

Methods:
    zero_grad(): Reset gradients to zero.
    _im2col(x, h_out, w_out): Convert image regions to columns for efficient convolution.
    forward(X): Perform forward pass of the convolutional layer.
    _col2im(dcol, x_shape): Convert column back to image format for the backward pass.
    backward(d_out, reg_lambda=0): Perform backward pass of the convolutional layer.
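
The im2col idea mentioned in the class description can be sketched as follows: every receptive field is flattened into a column, so the convolution becomes one matrix multiplication with the flattened filters. This is an illustrative version with explicit loops; the function names and signatures are assumptions and do not match the private `_im2col(x, h_out, w_out)` method, whose implementation is not shown in the source.

```python
import numpy as np


def conv_output_size(size, kernel_size, stride=1, padding=0):
    # Standard formula: floor((size + 2*padding - kernel_size) / stride) + 1
    return (size + 2 * padding - kernel_size) // stride + 1


def im2col(x, kernel_size, stride=1, padding=0):
    """Turn (N, C, H, W) input into columns of shape (N, C*k*k, h_out*w_out)."""
    n, c, h, w = x.shape
    h_out = conv_output_size(h, kernel_size, stride, padding)
    w_out = conv_output_size(w, kernel_size, stride, padding)
    # Zero-pad only the two spatial dimensions
    x_padded = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    cols = np.zeros((n, c * kernel_size * kernel_size, h_out * w_out))
    for i in range(h_out):
        for j in range(w_out):
            top, left = i * stride, j * stride
            patch = x_padded[:, :, top:top + kernel_size, left:left + kernel_size]
            cols[:, :, i * w_out + j] = patch.reshape(n, -1)
    return cols, h_out, w_out


def conv2d(x, weights, biases, stride=1, padding=0):
    """Convolution as a matrix product; weights is (out_channels, C, k, k)."""
    out_channels, _, k, _ = weights.shape
    cols, h_out, w_out = im2col(x, k, stride, padding)
    w_rows = weights.reshape(out_channels, -1)   # (out_channels, C*k*k)
    out = w_rows @ cols + biases                 # (N, out_channels, h_out*w_out)
    return out.reshape(x.shape[0], out_channels, h_out, w_out)


x = np.ones((1, 1, 3, 3))
weights = np.ones((1, 1, 2, 2))
biases = np.zeros((1, 1))
print(conv2d(x, weights, biases))
```

The bias shape `(out_channels, 1)` matches the documented `biases` attribute and broadcasts across the batch and spatial positions.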

Methods defined here:

__init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, activation='relu'): Initializes a convolutional layer object for neural networks.

This layer performs 2D convolution operations, commonly used in convolutional neural networks (CNNs).

Args:
    in_channels: (int) - Number of input channels (depth of input volume).
    out_channels: (int) - Number of output channels (number of filters).
    kernel_size: (int) - Size of the convolutional kernel (square kernel assumed).
    stride: (int), optional - Stride of the convolution (default is 1).
    padding: (int), optional - Zero-padding added to both sides of the input (default is 0).
    activation: (str), optional - Activation function to use (default is "relu").

Attributes:
    in_channels: (int) - Number of input channels.
    out_channels: (int) - Number of output channels.
    kernel_size: (int) - Size of the square convolutional kernel.
    stride: (int) - Stride of the convolution.
    padding: (int) - Zero-padding added to both sides of the input.
    weights: (np.ndarray) - Learnable weights of shape (out_channels, in_channels, kernel_size, kernel_size).
    biases: (np.ndarray) - Learnable biases of shape (out_channels, 1).
    activation: (str) - Type of activation function.
    weight_gradients: (np.ndarray or None) - Gradients with respect to weights, initialized to None.
    bias_gradients: (np.ndarray or None) - Gradients with respect to biases, initialized to None.
    input_cache: (np.ndarray or None) - Cached input for use in backward pass.
    input_size: (int) - Size of input (same as in_channels).
    output_size: (int) - Size of output (same as out_channels).

activate(self, Z): Apply activation function.

backward(self, d_out, reg_lambda=0): Optimized backward pass using im2col technique.

Args:
    d_out: (np.ndarray) - Gradient of the loss with respect to the layer output,
                      shape (batch_size, out_channels, h_out, w_out)
    reg_lambda: (float, optional) - Regularization parameter.

Returns:
    dX: Gradient with respect to the input X.

forward(self, X): Perform forward pass of the convolutional layer.

Args:
    X: numpy array with shape (batch_size, in_channels, height, width)

Returns:
    Output feature maps after convolution and activation.

zero_grad(self): Reset the gradients of the weights and biases to zero.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class CrossEntropyLoss(builtins.object)

Custom cross entropy loss implementation using numpy for multi-class classification.

Formula: -sum(y * log(p)) / m, where p = softmax(logits), y is the one-hot encoding of the targets, and m is the number of samples

Methods:
    __call__(self, logits, targets): Calculate the cross entropy loss.

Methods defined here:

__call__(self, logits, targets): Calculate the cross entropy loss.

Args:
    logits (np.ndarray): The logits (predicted values) of shape (num_samples, num_classes).
    targets (np.ndarray): The target labels of shape (num_samples,).

Returns:
    float: The cross entropy loss.
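
With integer class labels of shape (num_samples,), multi-class cross entropy reduces to the mean negative log-probability that softmax assigns to the true class. The sketch below works in log space for numerical stability; `cross_entropy` is a hypothetical function and is not necessarily how the class computes the loss.

```python
import numpy as np


def cross_entropy(logits, targets):
    """Mean softmax cross entropy for integer class labels (hypothetical helper)."""
    # Log-softmax: shift by the row maximum so exp() cannot overflow
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    m = logits.shape[0]
    # Select the log-probability of the true class in each row
    return float(-log_probs[np.arange(m), targets].mean())


logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
targets = np.array([0, 1])
print(cross_entropy(logits, targets))
```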

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class CuPyActivation(builtins.object)

Activation functions for neural networks using CuPy.

Static methods defined here:

leaky_relu(z, alpha=0.01): Leaky ReLU activation function: f(z) = z if z > 0, else alpha * z.

Allows a small, non-zero gradient when the input is negative to address the dying ReLU problem.

leaky_relu_derivative(z, alpha=0.01): Derivative of the Leaky ReLU function: f'(z) = 1 if z > 0, else alpha.

Returns 1 for positive input, and alpha otherwise (including z = 0).

relu(z): ReLU (Rectified Linear Unit) activation function: f(z) = max(0, z).

Returns the input directly if it's positive, otherwise returns 0.

relu_derivative(z): Derivative of the ReLU function: f'(z) = 1 if z > 0, else 0.

Returns 1 for positive input, and 0 otherwise (including z = 0).

sigmoid(z): Sigmoid activation function: f(z) = 1 / (1 + exp(-z)).

Maps input to the range [0, 1], commonly used for binary classification.

sigmoid_derivative(z): Derivative of the sigmoid function: f'(z) = sigmoid(z) * (1 - sigmoid(z)).

Used for backpropagation through the sigmoid activation.

softmax(z): Softmax activation function: f(z)_i = exp(z_i) / sum(exp(z_j)) for all j.

Maps input into a probability distribution over multiple classes. Used for multiclass classification.

tanh(z): Hyperbolic tangent (tanh) activation function: f(z) = (exp(z) - exp(-z)) / (exp(z) + exp(-z)).

Maps input to the range [-1, 1], typically used for normalized input.

tanh_derivative(z): Derivative of the tanh function: f'(z) = 1 - tanh(z)^2.

Used for backpropagation through the tanh activation.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class CuPyAdadeltaOptimizer(builtins.object)

CuPyAdadeltaOptimizer(learning_rate=1.0, rho=0.95, epsilon=1e-06, reg_lambda=0.0)

Adadelta optimizer class for training neural networks.

Formula:
    E[g^2]_t = rho * E[g^2]_{t-1} + (1 - rho) * g^2
    Delta_x = - (sqrt(E[delta_x^2]_{t-1} + epsilon) / sqrt(E[g^2]_t + epsilon)) * g
    E[delta_x^2]_t = rho * E[delta_x^2]_{t-1} + (1 - rho) * Delta_x^2
Derived from: https://arxiv.org/abs/1212.5701

Args:
    learning_rate (float, optional): The learning rate for the optimizer. Defaults to 1.0.
    rho (float, optional): The decay rate. Defaults to 0.95.
    epsilon (float, optional): A small value to prevent division by zero. Defaults to 1e-6.
    reg_lambda (float, optional): The regularization parameter. Defaults to 0.0.

Methods defined here:

__init__(self, learning_rate=1.0, rho=0.95, epsilon=1e-06, reg_lambda=0.0): Initializes the optimizer with the specified hyperparameters.

Args:
    learning_rate: (float), optional - The learning rate for the optimizer (default is 1.0).
    rho: (float), optional - The decay rate for the moving average of squared gradients (default is 0.95).
    epsilon: (float), optional - A small constant to prevent division by zero (default is 1e-6).
    reg_lambda: (float), optional - The regularization parameter for weight decay (default is 0.0).

Attributes:
    E_g2: (None or np.ndarray) - The moving average of squared gradients, initialized as None.
    E_delta_x2: (None or np.ndarray) - The moving average of squared parameter updates, initialized as None.

initialize(self, layers): Initializes the optimizer's internal state for the given layers.

Args:
    layers: (list) - A list of layers, each containing weights and biases.

update_layers(self, layers, dWs, dbs): Updates the weights and biases of the layers using Adadelta optimization.

Args:
    layers: (list) - A list of layers to update.
    dWs: (list) - Gradients of the weights for each layer.
    dbs: (list) - Gradients of the biases for each layer.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class CuPyAdamOptimizer(builtins.object)

CuPyAdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, reg_lambda=0.01)

Adam optimizer class for training neural networks.

Formula: w = w - alpha * m_hat / (sqrt(v_hat) + epsilon) - lambda * w
Derived from: https://arxiv.org/abs/1412.6980

Args:
    learning_rate (float, optional): The learning rate for the optimizer. Defaults to 0.001.
    beta1 (float, optional): The exponential decay rate for the first moment estimates. Defaults to 0.9.
    beta2 (float, optional): The exponential decay rate for the second moment estimates. Defaults to 0.999.
    epsilon (float, optional): A small value to prevent division by zero. Defaults to 1e-8.
    reg_lambda (float, optional): The regularization parameter. Defaults to 0.01.

Methods defined here:

__init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, reg_lambda=0.01): Initializes the optimizer with the specified hyperparameters.

Args:
    learning_rate: (float), optional - The step size for updating weights (default is 0.001).
    beta1: (float), optional - Exponential decay rate for the first moment estimates (default is 0.9).
    beta2: (float), optional - Exponential decay rate for the second moment estimates (default is 0.999).
    epsilon: (float), optional - A small constant to prevent division by zero (default is 1e-8).
    reg_lambda: (float), optional - Regularization parameter for weight decay (default is 0.01).

initialize(self, layers): Initializes the optimizer's internal state for the given layers.

Args:
    layers: (list) - A list of layers, each containing weights and biases.

update_layers(self, layers, dWs, dbs): Updates the weights and biases of the layers using Adam optimization.

Args:
    layers: (list) - A list of layers to update.
    dWs: (list) - Gradients of the weights for each layer.
    dbs: (list) - Gradients of the biases for each layer.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class CuPyBCEWithLogitsLoss(builtins.object)

Optimized binary cross entropy loss with logits implementation using cupy.

Formula: -mean(y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x)))

Methods:
    __call__(self, logits, targets): Calculate the binary cross entropy loss.

Methods defined here:

__call__(self, logits, targets): Calculate the binary cross entropy loss.

Args:
    logits (cp.ndarray): The logits (predicted values) of shape (num_samples,).
    targets (cp.ndarray): The target labels of shape (num_samples,).

Returns:
    float: The binary cross entropy loss.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class CuPyBackendNeuralNetwork(sega_learn.neural_networks.neuralNetworkBase.NeuralNetworkBase)

CuPyBackendNeuralNetwork(layers, dropout_rate=0.2, reg_lambda=0.01, activations=None)

CuPyBackendNeuralNetwork is a neural network implementation that uses CuPy for GPU-accelerated computations.

It inherits from NeuralNetworkBase and provides functionality for forward and backward propagation,
training, evaluation, and optimization using CuPy arrays and operations.

Attributes:
    layers (list): List of layers in the neural network.
    compiled (bool): Indicates whether the network is compiled.
    trainable_layers (list): List of layers with trainable parameters.
    layer_outputs (list): Cache for forward pass outputs.
    is_binary (bool): Indicates if the network is for binary classification.
    weights (list): List of weights for trainable layers.
    biases (list): List of biases for trainable layers.
    dWs_cache (list): Cache for weight gradients.
    dbs_cache (list): Cache for bias gradients.
    stream_pool_size (int): Number of CUDA streams for asynchronous processing.
    stream_pool (list): Pool of CUDA streams for asynchronous operations.

Methods:
    __init__(layers, dropout_rate=0.2, reg_lambda=0.01, activations=None):
        Initializes the CuPyBackendNeuralNetwork with specified layers, dropout rate, regularization, and activations.
    initialize_new_layers():
        Initializes the layers of the neural network with specified sizes and activation functions.
    apply_dropout(X):
        Applies dropout regularization to the input data.
    forward(X, training=True):
        Performs forward propagation through the neural network.
    backward(y):
        Performs backward propagation to calculate gradients for weights and biases.
    _process_batches_async(X_shuffled, y_shuffled, batch_size, weights, biases, activations, dropout_rate, is_binary, reg_lambda, dWs_acc, dbs_acc):
        Processes batches asynchronously using CUDA streams for forward and backward propagation.
    is_not_instance_of_classes(obj, classes):
        Checks if an object is not an instance of any class in a given list of classes.
    train(X_train, y_train, X_val=None, y_val=None, optimizer=None, epochs=100, batch_size=32, early_stopping_threshold=10, lr_scheduler=None, p=True, use_tqdm=True, n_jobs=1, track_metrics=False, track_adv_metrics=False, save_animation=False, save_path="training_animation.mp4", fps=1, dpi=100, frame_every=1):
        Trains the neural network model with specified parameters and options.
    evaluate(X, y):
        Evaluates the model performance on the given input data and labels.
    _evaluate_cupy(y_hat, y_true, is_binary):
        Evaluates model performance using CuPy arrays for predictions and true labels.
    predict(X):
        Predicts the output for the given input data.
    calculate_loss(X, y):
        Calculates the loss with L2 regularization for the given input data and labels.
    _create_optimizer(optimizer_type, learning_rate, JIT=False):
        Helper method to create optimizer instances based on the specified type and learning rate.

Method resolution order: CuPyBackendNeuralNetwork; sega_learn.neural_networks.neuralNetworkBase.NeuralNetworkBase; builtins.object

Methods defined here:

__init__(self, layers, dropout_rate=0.2, reg_lambda=0.01, activations=None): Initializes the CuPy backend neural network.

Args:
    layers: (list) - List of layer sizes or Layer objects.
    dropout_rate: (float) - Dropout rate for regularization (default is 0.2).
    reg_lambda: (float) - L2 regularization parameter (default is 0.01).
    activations: (list), optional - List of activation functions for each layer (default is None).

Returns:
    None

apply_dropout(self, X): Applies dropout regularization to the input data.
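
As an illustration of dropout regularization, here is a minimal sketch of "inverted" dropout in NumPy. Whether apply_dropout rescales the surviving activations this way is an assumption, and the function name and the explicit `rng` argument are hypothetical.

```python
import numpy as np


def inverted_dropout(X, dropout_rate, rng):
    """Zero each element with probability dropout_rate and rescale the rest."""
    keep_prob = 1.0 - dropout_rate
    # Boolean mask: True where the activation is kept.
    mask = rng.random(X.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation unchanged.
    return X * mask / keep_prob
```

The rescaling means no extra correction is needed at inference time, when dropout is simply skipped.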

backward(self, y): Performs backward propagation to calculate the gradients.

Args:
    y (ndarray): Target labels of shape (m, output_size).

calculate_loss(self, X, y): Calculates the loss with L2 regularization.

Args:
    X (ndarray): Input data.
    y (ndarray): Target labels.

Returns:
    float: The calculated loss value.

evaluate(self, X, y): Evaluates the model performance on the given data.

Args:
    X: (np.ndarray or cp.ndarray) - Input feature data.
    y: (np.ndarray or cp.ndarray) - Target labels.

Returns:
    accuracy: (float) - The accuracy of the model.
    predicted: (np.ndarray) - Predicted labels as a NumPy array.

forward(self, X, training=True): Performs forward propagation through the neural network.

Args:
    X (ndarray): Input data of shape (batch_size, input_size).
    training (bool): Whether the network is in training mode (applies dropout).

Returns:
    ndarray: Output predictions of shape (batch_size, output_size).

initialize_new_layers(self): Initializes the layers of the neural network.

Each layer is created with the specified number of neurons and activation function.

predict(self, X): Predicts the output for the given input data.

Args:
    X (ndarray): Input data.

Returns:
    ndarray: Predicted outputs.

train(self, X_train, y_train, X_val=None, y_val=None, optimizer=None, epochs=100, batch_size=32, early_stopping_threshold=10, lr_scheduler=None, p=True, use_tqdm=True, n_jobs=1, track_metrics=False, track_adv_metrics=False, save_animation=False, save_path='training_animation.mp4', fps=1, dpi=100, frame_every=1): Trains the neural network model.

Args:
    X_train: (ndarray) - Training data features.
    y_train: (ndarray) - Training data labels.
    X_val: (ndarray) - Validation data features, optional.
    y_val: (ndarray) - Validation data labels, optional.
    optimizer: (Optimizer) - Optimizer for updating parameters (default: JITAdam, lr=0.0001).
    epochs: (int) - Number of training epochs (default: 100).
    batch_size: (int) - Batch size for mini-batch gradient descent (default: 32).
    early_stopping_threshold: (int) - Patience for early stopping (default: 10).
    lr_scheduler: (Scheduler) - Learning rate scheduler (default: None).
    p: (bool) - Whether to print training progress (default: True).
    use_tqdm: (bool) - Whether to use tqdm for progress bar (default: True).
    n_jobs: (int) - Number of jobs for parallel processing (default: 1).
    track_metrics: (bool) - Whether to track training metrics (default: False).
    track_adv_metrics: (bool) - Whether to track advanced metrics (default: False).
    save_animation: (bool) - Whether to save the animation of metrics (default: False).
    save_path: (str) - Path to save the animation file. File extension must be .mp4 or .gif (default: 'training_animation.mp4').
    fps: (int) - Frames per second for the saved animation (default: 1).
    dpi: (int) - DPI for the saved animation (default: 100).
    frame_every: (int) - Capture frame every N epochs (to reduce file size) (default: 1).
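
The early_stopping_threshold argument is described as a patience value. The sketch below shows one common way such a patience counter works; it assumes validation loss is the monitored quantity, which this reference does not state, and the function name is hypothetical.

```python
def epochs_until_stop(val_losses, patience):
    """Return the 1-based epoch at which patience-based early stopping triggers.

    If the loss keeps improving often enough, all epochs are run.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses)
```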

Static methods defined here:

is_not_instance_of_classes(obj, classes): Checks if an object is not an instance of any class in a list of classes.

Args:
    obj: The object to check.
    classes: A list of classes.

Returns:
    bool: True if the object is not an instance of any class in the list of classes, False otherwise.
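
The described behavior can be expressed with the built-in isinstance, which accepts a tuple of classes. This is a sketch of equivalent logic under a hypothetical name, not the library's source.

```python
def not_instance_of_any(obj, classes):
    """True when obj is an instance of none of the given classes."""
    # isinstance requires a tuple (not a list) for multiple classes.
    return not isinstance(obj, tuple(classes))
```

For example, `not_instance_of_any(3, [str, list])` is True, while `not_instance_of_any("a", [str, list])` is False.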

Methods inherited from sega_learn.neural_networks.neuralNetworkBase.NeuralNetworkBase:

calculate_precision_recall_f1(self, X, y): Calculates precision, recall, and F1 score.

Args:
    X: (ndarray) - Input data
    y: (ndarray) - Target labels

Returns:
    precision: (float) - Precision score
    recall: (float) - Recall score
    f1: (float) - F1 score

compute_l2_reg(self, weights): Computes the L2 regularization term.

Args:
    weights: (list) - List of weight matrices.

Returns:
    float: L2 regularization term.
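
A minimal sketch of an L2 term over a list of weight matrices follows. It returns the plain sum of squared weights; any scaling the library applies (for example by reg_lambda or a factor of 1/2) is not stated in this reference, so none is assumed. The function name is hypothetical.

```python
import numpy as np


def l2_penalty(weights):
    """Sum of squared entries over all weight matrices."""
    return float(sum(np.sum(np.square(np.asarray(W))) for W in weights))
```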

create_scheduler(self, scheduler_type, optimizer, **kwargs): Creates a learning rate scheduler.

initialize_layers(self): Initializes the weights and biases of the layers.

plot_metrics(self, save_dir=None): Plots the training and validation metrics.

Data descriptors inherited from sega_learn.neural_networks.neuralNetworkBase.NeuralNetworkBase:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class CuPyCrossEntropyLoss(builtins.object)

Optimized cross entropy loss implementation using cupy for multi-class classification.

Formula: -sum(y * log(p)) / m, where p = softmax(logits) and m is the number of samples.

Methods:
__call__(self, logits, targets): Calculate the cross entropy loss.

Methods defined here:

__call__(self, logits, targets): Calculate the cross entropy loss.

Args:
    logits (cp.ndarray): The logits (predicted values) of shape (num_samples, num_classes).
    targets (cp.ndarray): The target labels of shape (num_samples, num_classes) or (num_samples,).

Returns:
    float: The cross entropy loss.
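
The sketch below illustrates the formula for both documented target shapes: one-hot targets of shape (num_samples, num_classes) and class indices of shape (num_samples,). It uses NumPy as a stand-in for CuPy and a log-softmax for numerical stability; the function name is hypothetical and the library's actual implementation may differ.

```python
import numpy as np


def softmax_cross_entropy(logits, targets):
    """Mean cross entropy from raw logits (sketch)."""
    z = np.asarray(logits, dtype=np.float64)
    t = np.asarray(targets)
    # Log-softmax with the row maximum subtracted to avoid overflow in exp().
    z = z - z.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    if t.ndim == 1:
        # Class indices: pick the log-probability of the true class per row.
        picked = log_p[np.arange(z.shape[0]), t.astype(int)]
    else:
        # One-hot targets: -sum(y * log(p)) per row.
        picked = (t * log_p).sum(axis=1)
    return float(-picked.mean())
```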

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class CuPyDenseLayer(builtins.object)

CuPyDenseLayer(input_size, output_size, activation='relu')

Initializes a Layer object.

Args:
    input_size (int): The size of the input to the layer.
    output_size (int): The size of the output from the layer.
    activation (str): The activation function to be used in the layer.

Methods defined here:

__init__(self, input_size, output_size, activation='relu'): Initializes the layer with weights, biases, and activation function.

Args:
    input_size: (int) - The number of input features to the layer.
    output_size: (int) - The number of output features from the layer.
    activation: (str), optional - The activation function to use (default is "relu"), for example "relu" or "leaky_relu".

Attributes:
    weights: (cp.ndarray) - The weight matrix initialized using He initialization for "relu" or "leaky_relu".
    biases: (cp.ndarray) - The bias vector initialized to zeros.
    weight_gradients: (cp.ndarray) - Gradients of the weights, initialized to zeros.
    bias_gradients: (cp.ndarray) - Gradients of the biases, initialized to zeros.
    input_size: (int) - The number of input features to the layer.
    output_size: (int) - The number of output features from the layer.
    activation: (str) - The activation function used in the layer.
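
For reference, He initialization draws weights from a zero-mean normal distribution with standard deviation sqrt(2 / input_size). The sketch below uses NumPy rather than CuPy; the (input_size, output_size) weight layout, the (1, output_size) bias shape, and the function name are assumptions made for illustration.

```python
import numpy as np


def he_init(input_size, output_size, rng):
    """Return He-initialized weights and zero biases (sketch)."""
    scale = np.sqrt(2.0 / input_size)
    # Weight layout (input_size, output_size) is an assumption.
    weights = rng.standard_normal((input_size, output_size)) * scale
    biases = np.zeros((1, output_size))
    return weights, biases
```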

activate(self, Z): Apply activation function.

activation_derivative(self, Z): Apply activation derivative.

zero_grad(self): Reset the gradients of the weights and biases to zero.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class CuPySGDOptimizer(builtins.object)

CuPySGDOptimizer(learning_rate=0.001, momentum=0.0, reg_lambda=0.0)

Stochastic Gradient Descent (SGD) optimizer class for training neural networks.

Formula: v = momentum * v - learning_rate * dW, w = w + v - learning_rate * reg_lambda * w

Args:
    learning_rate (float, optional): The learning rate for the optimizer. Defaults to 0.001.
    momentum (float, optional): The momentum factor. Defaults to 0.0.
    reg_lambda (float, optional): The regularization parameter. Defaults to 0.0.

Methods defined here:

__init__(self, learning_rate=0.001, momentum=0.0, reg_lambda=0.0): Initializes the optimizer with specified hyperparameters.

Args:
    learning_rate: (float), optional - The step size for updating weights (default is 0.001).
    momentum: (float), optional - The momentum factor for accelerating gradient descent (default is 0.0).
    reg_lambda: (float), optional - The regularization strength to prevent overfitting (default is 0.0).

Attributes:
    velocity: (None or np.ndarray) - The velocity term used for momentum-based updates (initialized as None).

initialize(self, layers): Initializes the optimizer's velocity for the given layers.

Args:
    layers: (list) - A list of layers, each containing weights and biases.

update_layers(self, layers, dWs, dbs): Updates the weights and biases of the layers using SGD optimization.

Args:
    layers: (list) - A list of layers to update.
    dWs: (list) - Gradients of the weights for each layer.
    dbs: (list) - Gradients of the biases for each layer.
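
The update formula given for this optimizer can be written out for a single parameter as follows. This is a sketch of the documented formula only; the function name is hypothetical, and how the library stores the velocity per layer is not shown here.

```python
def sgd_momentum_step(w, v, dW, learning_rate=0.001, momentum=0.0, reg_lambda=0.0):
    """One SGD step: returns the updated parameter and velocity.

    Works on floats or on NumPy/CuPy arrays, since only arithmetic is used.
    """
    v = momentum * v - learning_rate * dW
    w = w + v - learning_rate * reg_lambda * w
    return w, v
```

With momentum = 0 and reg_lambda = 0 this reduces to plain gradient descent, w = w - learning_rate * dW.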

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class DenseLayer(builtins.object)

DenseLayer(input_size, output_size, activation='relu')

Initializes a fully connected layer object, where each neuron is connected to all neurons in the previous layer.

Each layer consists of weights, biases, and an activation function.

Args:
    input_size (int): The size of the input to the layer.
    output_size (int): The size of the output from the layer.
    activation (str): The activation function to be used in the layer.

Attributes:
    weights (np.ndarray): Weights of the layer.
    biases (np.ndarray): Biases of the layer.
    activation (str): Activation function name.
    weight_gradients (np.ndarray): Gradients of the weights.
    bias_gradients (np.ndarray): Gradients of the biases.
    input_cache (np.ndarray): Cached input for backpropagation.
    output_cache (np.ndarray): Cached output for backpropagation.

Methods:
    zero_grad(): Resets the gradients of the weights and biases to zero.
    forward(X): Performs the forward pass of the layer.
    backward(dA, reg_lambda): Performs the backward pass of the layer.
    activate(Z): Applies the activation function.
    activation_derivative(Z): Applies the derivative of the activation function.

Methods defined here:

__init__(self, input_size, output_size, activation='relu'): Initializes the layer with weights, biases, and activation function.

Args:
    input_size: (int) - The number of input features to the layer.
    output_size: (int) - The number of output features from the layer.
    activation: (str), optional - The activation function to use (default is "relu").

Attributes:
    weights: (np.ndarray) - The weight matrix initialized using He initialization for ReLU or Leaky ReLU, or standard initialization otherwise.
    biases: (np.ndarray) - The bias vector initialized to zeros.
    input_size: (int) - The number of input features to the layer.
    output_size: (int) - The number of output features from the layer.
    activation: (str) - The activation function to use.
    weight_gradients: (np.ndarray or None) - Gradients of the weights, initialized to None.
    bias_gradients: (np.ndarray or None) - Gradients of the biases, initialized to None.

activate(self, Z): Apply activation function.

activation_derivative(self, Z): Apply activation derivative.

backward(self, dA, reg_lambda): Backward pass of the layer.

forward(self, X): Forward pass of the layer.

zero_grad(self): Reset the gradients of the weights and biases to zero.
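
To make the forward and backward passes concrete, here is a minimal NumPy sketch of a fully connected layer with a ReLU activation. The function names, the (input_size, output_size) weight layout, and the unaveraged gradient sums are assumptions; the library may normalize by batch size or cache values differently.

```python
import numpy as np


def dense_forward(X, W, b):
    """Return the activation A and pre-activation Z for a ReLU dense layer."""
    Z = X @ W + b
    A = np.maximum(Z, 0.0)
    return A, Z


def dense_backward(dA, X, W, Z, reg_lambda=0.0):
    """Return gradients with respect to the input, weights, and biases."""
    dZ = dA * (Z > 0)                    # ReLU derivative: 1 where Z > 0, else 0
    dW = X.T @ dZ + reg_lambda * W       # L2 term added to the weight gradient
    db = dZ.sum(axis=0, keepdims=True)
    dX = dZ @ W.T                        # gradient passed to the previous layer
    return dX, dW, db
```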

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class FlattenLayer(builtins.object)

A layer that flattens multi-dimensional input into a 2D array (batch_size, flattened_size).

Useful for transitioning from convolutional layers to dense layers.

Attributes:
    input_shape: (tuple) - Shape of the input data (excluding batch size).
    input_cache: (np.ndarray) - Cached input for backpropagation.
    input_size: (int) - Flattened size of the input (product of the dimensions in input_shape).
    output_size: (int) - Size of the flattened output vector (equal to input_size).

Methods defined here:

__init__(self): Initializes the layer with default attributes.

Attributes:
    input_shape: (tuple or None) - Shape of the input data, set dynamically during the forward pass.
    input_cache: (any or None) - Cache storing the input data for use during backpropagation.
    input_size: (int or None) - Flattened size of the input, calculated as channels * height * width.
    output_size: (int or None) - Flattened size of the output, same as input_size; set dynamically during the forward pass.

backward(self, dA, reg_lambda=0): Reshapes the gradient back to the original input shape.

Args:
    dA (np.ndarray): Gradient of the loss with respect to the layer's output,
                    shape (batch_size, flattened_size)
    reg_lambda (float): Regularization parameter (unused in FlattenLayer).

Returns:
    np.ndarray: Gradient with respect to the input, reshaped to original input shape.

forward(self, X): Flattens the input tensor.

Args:
    X: (np.ndarray) - Input data of shape (batch_size, channels, height, width)
                   or any multi-dimensional shape after batch dimension.

Returns:
    np.ndarray: Flattened output of shape (batch_size, flattened_size)
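
The forward and backward behavior described above amounts to a pair of reshapes. The class below is a minimal sketch under a hypothetical name, not the library's FlattenLayer.

```python
import numpy as np


class FlattenSketch:
    """Flattens everything after the batch dimension and restores it on backward."""

    def __init__(self):
        self.input_shape = None  # set during the forward pass

    def forward(self, X):
        self.input_shape = X.shape[1:]        # remember shape without batch size
        return X.reshape(X.shape[0], -1)      # (batch_size, flattened_size)

    def backward(self, dA, reg_lambda=0):
        # reg_lambda is unused: the layer has no trainable parameters.
        return dA.reshape(dA.shape[0], *self.input_shape)
```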

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class HuberLoss(builtins.object)

Custom Huber loss implementation using numpy.

Formula: mean(L), where for each element L = 0.5 * (y_true - y_pred)**2 if abs(y_true - y_pred) <= delta, else delta * (abs(y_true - y_pred) - delta / 2)

Methods:
__call__(self, y_true, y_pred, delta=1.0): Calculate the Huber loss.

Methods defined here:

__call__(self, y_true, y_pred, delta=1.0): Calculate the Huber loss.

Args:
    y_true (np.ndarray): The true labels of shape (num_samples,).
    y_pred (np.ndarray): The predicted values of shape (num_samples,).
    delta (float): The threshold for the Huber loss.

Returns:
    float: The Huber loss.
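
Because the quadratic or linear branch is chosen per element, a vectorized version typically uses np.where. The following is a minimal sketch with a hypothetical function name; it is not the library's source.

```python
import numpy as np


def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss with an element-wise choice of branch."""
    err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    quadratic = 0.5 * err**2                 # used where |error| <= delta
    linear = delta * (err - 0.5 * delta)     # used where |error| > delta
    return float(np.mean(np.where(err <= delta, quadratic, linear)))
```

For errors of 0.5 and 3.0 with delta = 1.0, the per-element losses are 0.125 and 2.5, giving a mean of 1.3125.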

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class JITAdadeltaOptimizer(JITAdadeltaOptimizer)

JITAdadeltaOptimizer(*args, **kwargs)

Adadelta optimizer class for training neural networks.

Formula:
    E[g^2]_t = rho * E[g^2]_{t-1} + (1 - rho) * g^2
    Delta_x_t = - (sqrt(E[Delta_x^2]_{t-1} + epsilon) / sqrt(E[g^2]_t + epsilon)) * g
    E[Delta_x^2]_t = rho * E[Delta_x^2]_{t-1} + (1 - rho) * Delta_x_t^2

Derived from: https://arxiv.org/abs/1212.5701

Args:
    learning_rate (float, optional): The learning rate for the optimizer. Defaults to 1.0.
    rho (float, optional): The decay rate. Defaults to 0.95.
    epsilon (float, optional): A small value to prevent division by zero. Defaults to 1e-6.
    reg_lambda (float, optional): The regularization parameter. Defaults to 0.0.

Method resolution order: JITAdadeltaOptimizer; JITAdadeltaOptimizer; builtins.object

Data and other attributes defined here:

class_type = jitclass.JITAdadeltaOptimizer#2aed164c710<learni...float64, 3d, C),E_delta_x2:array(float64, 3d, C)>

Methods inherited from JITAdadeltaOptimizer:

__init__(self, learning_rate=1.0, rho=0.95, epsilon=1e-06, reg_lambda=0.0): Initializes the optimizer with specified hyperparameters.

Args:
    learning_rate: (float), optional - The learning rate for the optimizer (default is 1.0).
    rho: (float), optional - The decay rate for the running averages (default is 0.95).
    epsilon: (float), optional - A small value to prevent division by zero (default is 1e-6).
    reg_lambda: (float), optional - The regularization parameter (default is 0.0).

Attributes:
    E_g2: (np.ndarray) - Running average of squared gradients.
    E_delta_x2: (np.ndarray) - Running average of squared parameter updates.

initialize(self, layers): Initializes the running averages for each layer's weights.

Args:
    layers: (list) - List of layers in the neural network.

Returns:
    None

update(self, layer, dW, db, index): Updates the weights and biases of a layer using the Adadelta optimization algorithm.

Args:
    layer: (Layer) - The layer to update.
    dW: (np.ndarray) - The gradient of the weights.
    db: (np.ndarray) - The gradient of the biases.
    index: (int) - The index of the layer.

Returns:
    None

update_layers(self, layers, dWs, dbs): Updates all layers' weights and biases using the Adadelta optimization algorithm.

Args:
    layers: (list) - List of layers in the neural network.
    dWs: (list of np.ndarray) - Gradients of the weights for each layer.
    dbs: (list of np.ndarray) - Gradients of the biases for each layer.

Returns:
    None
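
The three Adadelta formulas listed for this class can be traced for a single parameter as below. This sketch follows the formulas as written and therefore omits learning_rate and reg_lambda, whose exact placement in the library's update is not shown in this reference; the function name is hypothetical.

```python
def adadelta_step(x, g, E_g2, E_dx2, rho=0.95, epsilon=1e-6):
    """One Adadelta step: returns updated parameter and both running averages."""
    # Running average of squared gradients.
    E_g2 = rho * E_g2 + (1 - rho) * g**2
    # Update scaled by the ratio of RMS(previous updates) to RMS(gradients).
    dx = -((E_dx2 + epsilon) ** 0.5 / (E_g2 + epsilon) ** 0.5) * g
    # Running average of squared updates.
    E_dx2 = rho * E_dx2 + (1 - rho) * dx**2
    return x + dx, E_g2, E_dx2
```

Using `** 0.5` instead of math.sqrt lets the same function operate on floats and on NumPy arrays.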

Data descriptors inherited from JITAdadeltaOptimizer:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class JITAdamOptimizer(JITAdamOptimizer)

JITAdamOptimizer(*args, **kwargs)

Adam optimizer class for training neural networks.

Formula: w = w - alpha * m_hat / (sqrt(v_hat) + epsilon) - lambda * w

Derived from: https://arxiv.org/abs/1412.6980

Args:
    learning_rate (float, optional): The learning rate for the optimizer. Defaults to 0.001.
    beta1 (float, optional): The exponential decay rate for the first moment estimates. Defaults to 0.9.
    beta2 (float, optional): The exponential decay rate for the second moment estimates. Defaults to 0.999.
    epsilon (float, optional): A small value to prevent division by zero. Defaults to 1e-8.
    reg_lambda (float, optional): The regularization parameter. Defaults to 0.01.

Method resolution order: JITAdamOptimizer; JITAdamOptimizer; builtins.object

Data and other attributes defined here:

class_type = jitclass.JITAdamOptimizer#2aed163c2d0<learning_r...t64, 2d, A),db:array(float64, 2d, A),index:int32>

Methods inherited from JITAdamOptimizer:

__init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, reg_lambda=0.01): Initializes the optimizer with the specified hyperparameters.

Args:
    learning_rate: (float), optional - The learning rate for the optimizer (default is 0.001).
    beta1: (float), optional - Exponential decay rate for the first moment estimates (default is 0.9).
    beta2: (float), optional - Exponential decay rate for the second moment estimates (default is 0.999).
    epsilon: (float), optional - A small value to prevent division by zero (default is 1e-8).
    reg_lambda: (float), optional - Regularization parameter; larger values imply stronger regularization (default is 0.01).

initialize(self, layers): Initializes the first and second moment estimates for each layer's weights.

Args:
    layers: (list) - List of layers in the neural network.

Returns:
    None

update(self, layer, dW, db, index): Updates the weights and biases of a layer using the Adam optimization algorithm.

Args:
    layer: (Layer) - The layer to update.
    dW: (np.ndarray) - The gradient of the weights.
    db: (np.ndarray) - The gradient of the biases.
    index: (int) - The index of the layer.

Returns:
    None

update_layers(self, layers, dWs, dbs): Updates all layers' weights and biases using the Adam optimization algorithm.

Args:
    layers: (list) - List of layers in the neural network.
    dWs: (list of np.ndarray) - Gradients of the weights for each layer.
    dbs: (list of np.ndarray) - Gradients of the biases for each layer.

Returns:
    None
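
The Adam formula given for this class relies on bias-corrected moment estimates m_hat and v_hat. The sketch below spells out one step for a single parameter, following the referenced paper for the moment updates and the formula above for the final line. The function name and the explicit step counter `t` are hypothetical, and the regularization term is applied exactly as the formula shows, which may differ in detail from the library.

```python
def adam_step(w, dW, m, v, t, learning_rate=0.001, beta1=0.9, beta2=0.999,
              epsilon=1e-8, reg_lambda=0.01):
    """One Adam step at 1-based iteration t: returns updated w, m, v."""
    m = beta1 * m + (1 - beta1) * dW          # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * dW**2       # second moment (mean of squares)
    m_hat = m / (1 - beta1**t)                # bias correction for zero init
    v_hat = v / (1 - beta2**t)
    w = w - learning_rate * m_hat / (v_hat**0.5 + epsilon) - reg_lambda * w
    return w, m, v
```

On the first step the bias correction makes m_hat equal to the gradient itself, so the update magnitude is close to learning_rate regardless of the gradient's scale.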

Data descriptors inherited from JITAdamOptimizer:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class JITBCEWithLogitsLoss(builtins.object)

Custom binary cross entropy loss with logits implementation using numba.

Formula: -mean(y * log(p) + (1 - y) * log(1 - p)), where p = sigmoid(logits)

Methods:
calculate_loss(self, logits, targets): Calculate the binary cross entropy loss.

Methods defined here:

__init__(self): Initializes the class with default values for logits and targets.

Attributes:
    logits (numpy.ndarray): A 2D array initialized to zeros with shape (1, 1),
                            representing the predicted values.
    targets (numpy.ndarray): A 2D array initialized to zeros with shape (1, 1),
                             representing the true target values.

calculate_loss(self, logits, targets): Calculate the binary cross entropy loss.

Args:
    logits (np.ndarray): The logits (predicted values) of shape (num_samples,).
    targets (np.ndarray): The target labels of shape (num_samples,).

Returns:
    float: The binary cross entropy loss.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class JITConvLayer(JITConvLayer)

JITConvLayer(*args, **kwargs)

A convolutional layer implementation for neural networks using Numba JIT compilation.

Method resolution order: JITConvLayer; JITConvLayer; builtins.object

Data and other attributes defined here:

class_type = jitclass.JITConvLayer#2aecfc74710<in_channels:in...2,w_out:int32,input_size:int32,output_size:int32>

Methods inherited from JITConvLayer:

__init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, activation='relu'): Initializes the convolutional layer with weights, biases, and activation function.

Args:
    in_channels: (int) - Number of input channels.
    out_channels: (int) - Number of output channels.
    kernel_size: (int) - Size of the convolutional kernel (assumes square kernels).
    stride: (int), optional - Stride of the convolution (default is 1).
    padding: (int), optional - Padding added to the input (default is 0).
    activation: (str), optional - Activation function to use (default is "relu").

Attributes:
    weights: (np.ndarray) - Convolutional weight matrix initialized using He initialization.
    biases: (np.ndarray) - Bias vector initialized to zeros.
    activation: (str) - Activation function for the layer.
    weight_gradients: (np.ndarray) - Gradients of the weights, initialized to zeros.
    bias_gradients: (np.ndarray) - Gradients of the biases, initialized to zeros.
    input_cache: (np.ndarray) - Cached input values for backpropagation, initialized to zeros.
    X_cols: (np.ndarray) - Cached column-transformed input for backpropagation, initialized to zeros.
    X_padded: (np.ndarray) - Cached padded input for backpropagation, initialized to zeros.
    h_out: (int) - Height of the output feature map, initialized to 0.
    w_out: (int) - Width of the output feature map, initialized to 0.
    input_size: (int) - Number of input channels.
    output_size: (int) - Number of output channels.

activate(self, Z): Apply activation function.

activation_derivative(self, Z): Apply activation derivative.

backward(self, d_out, reg_lambda=0): Backward pass for convolutional layer.

Args:
    d_out (np.ndarray): Gradient of the loss with respect to the layer output
    reg_lambda (float, optional): Regularization parameter

Returns:
    dX: Gradient with respect to the input X

forward(self, X): Forward pass for convolutional layer.

Args:
    X: numpy array with shape (batch_size, in_channels, height, width)

Returns:
    Output feature maps after convolution and activation.
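
The h_out and w_out attributes depend on the input size, kernel size, stride, and padding. The sketch below uses the standard convolution output-size formula with square kernels; that JITConvLayer follows exactly this formula is an assumption, and the function name is hypothetical.

```python
def conv_output_shape(input_shape, out_channels, kernel_size, stride=1, padding=0):
    """Shape of the feature maps produced from (batch, channels, height, width)."""
    batch, _, height, width = input_shape
    # Standard formula: floor((size + 2 * padding - kernel) / stride) + 1
    h_out = (height + 2 * padding - kernel_size) // stride + 1
    w_out = (width + 2 * padding - kernel_size) // stride + 1
    return (batch, out_channels, h_out, w_out)
```

For example, a 3x3 kernel with stride 1 and padding 1 preserves the spatial size, so a (8, 3, 28, 28) input with 16 output channels yields (8, 16, 28, 28).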

zero_grad(self): Reset the gradients of the weights and biases to zero.

Data descriptors inherited from JITConvLayer:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class JITCrossEntropyLoss(builtins.object)

Custom cross entropy loss implementation using numba for multi-class classification.

Formula: -sum(y * log(p) + (1 - y) * log(1 - p)) / m

Methods:
calculate_loss(self, logits, targets): Calculate the cross entropy loss.

Methods defined here:

__init__(self): Initializes the instance variables for the class.

Attributes:
    logits: (np.ndarray) - A 2D array initialized to zeros with shape (1, 1),
               representing the predicted values or outputs of the model.
    targets: (np.ndarray) - A 2D array initialized to zeros with shape (1, 1),
                representing the ground truth or target values.

calculate_loss(self, logits, targets): Calculate the cross entropy loss.

Args:
    logits (np.ndarray): The logits (predicted values) of shape (num_samples, num_classes).
    targets (np.ndarray): The target labels of shape (num_samples,).

Returns:
    float: The cross entropy loss.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class JITDenseLayer(JITDenseLayer)

JITDenseLayer(*args, **kwargs)

Initializes a fully connected layer object, where each neuron is connected to all neurons in the previous layer.

Each layer consists of weights, biases, and an activation function.

Args:
    input_size (int): The size of the input to the layer.
    output_size (int): The size of the output from the layer.
    activation (str): The activation function to be used in the layer.

Attributes:
    weights (np.ndarray): Weights of the layer.
    biases (np.ndarray): Biases of the layer.
    activation (str): Activation function name.
    weight_gradients (np.ndarray): Gradients of the weights.
    bias_gradients (np.ndarray): Gradients of the biases.
    input_cache (np.ndarray): Cached input for backpropagation.
    output_cache (np.ndarray): Cached output for backpropagation.

Methods:
    zero_grad(): Resets the gradients of the weights and biases to zero.
    forward(X): Performs the forward pass of the layer.
    backward(dA, reg_lambda): Performs the backward pass of the layer.
    activate(Z): Applies the activation function.
    activation_derivative(Z): Applies the derivative of the activation function.

Method resolution order: JITDenseLayer; JITDenseLayer; builtins.object

Data and other attributes defined here:

class_type = jitclass.JITDenseLayer#2aecfc53c10<weights:array...oat64, 2d, C),input_size:int32,output_size:int32>

Methods inherited from JITDenseLayer:

__init__(self, input_size, output_size, activation='relu'): Initializes the layer with weights, biases, and activation function.

Args:
    input_size: (int) - The number of input features to the layer.
    output_size: (int) - The number of output features from the layer.
    activation: (str), optional - The activation function to use (default is "relu").

Attributes:
    weights: (np.ndarray) - The weight matrix initialized using He initialization for ReLU or Leaky ReLU,
                or Xavier initialization for other activations.
    biases: (np.ndarray) - The bias vector initialized to zeros.
    activation: (str) - The activation function for the layer.
    weight_gradients: (np.ndarray) - Gradients of the weights, initialized to zeros.
    bias_gradients: (np.ndarray) - Gradients of the biases, initialized to zeros.
    input_cache: (np.ndarray) - Cached input values for backpropagation, initialized to zeros.
    output_cache: (np.ndarray) - Cached output values for backpropagation, initialized to zeros.
    input_size: (int) - The number of input features to the layer.
    output_size: (int) - The number of output features from the layer.

activate(self, Z): Apply activation function.

activation_derivative(self, Z): Apply activation derivative.

backward(self, dA, reg_lambda): Perform the backward pass of the layer.

forward(self, X): Perform the forward pass of the layer.

zero_grad(self): Reset the gradients of the weights and biases to zero.

Data descriptors inherited from JITDenseLayer:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class JITFlattenLayer(JITFlattenLayer)

JITFlattenLayer(*args, **kwargs)

A layer that flattens multi-dimensional input into a 2D array (batch_size, flattened_size).

Useful for transitioning from convolutional layers to dense layers.

Method resolution order: JITFlattenLayer; JITFlattenLayer; builtins.object

Data and other attributes defined here:

class_type = jitclass.JITFlattenLayer#2aecfc69b50<input_shape...put_cache:array(float64, 4d, A),input_size:int32>

Methods inherited from JITFlattenLayer:

__init__(self): Initializes the layer with placeholder values for input and output dimensions.

Attributes:
    input_shape: (tuple) - Shape of the input data, initialized to (0, 0, 0).
               This will be set during the forward pass.
    output_size: (int) - Size of the output, initialized to 0.
             This will be set during the forward pass.
    input_size: (int) - Size of the input, initialized to 0.
            This will be set during the forward pass.
    input_cache: (any) - Cache for input data, to be set during the forward pass.

backward(self, dA, reg_lambda=0): Reshapes the gradient back to the original input shape.

Args:
    dA (np.ndarray): Gradient of the loss with respect to the layer's output,
                   shape (batch_size, flattened_size)
    reg_lambda (float): Regularization parameter (unused in FlattenLayer).

Returns:
    np.ndarray: Gradient with respect to the input, reshaped to original input shape.

forward(self, X): Flattens the input tensor.

Args:
    X (np.ndarray): Input data of shape (batch_size, channels, height, width)

Returns:
    np.ndarray: Flattened output of shape (batch_size, flattened_size)

Data descriptors inherited from JITFlattenLayer:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class JITHuberLoss(builtins.object)

JITHuberLoss(delta=1.0)

Custom Huber loss implementation using numba.

Attributes:
    delta (float): The threshold parameter for Huber loss. Default is 1.0.

Methods defined here:

__init__(self, delta=1.0): Initializes the JITHuberLoss instance.

Args:
    delta (float): The threshold at which the loss function transitions from quadratic to linear. Default is 1.0.

calculate_loss(self, y_pred, y_true): Calculate the Huber loss using the stored delta.

Args:
    y_pred (np.ndarray): Predicted values.
    y_true (np.ndarray): True target values.

Returns:
    float: The calculated Huber loss.
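
For reference, the standard Huber loss is quadratic for errors up to delta and linear beyond it. The sketch below uses plain NumPy and assumes the result is averaged over samples; the reduction actually used by JITHuberLoss is not stated in this reference, and `huber_loss` is a hypothetical helper name.

```python
import numpy as np

def huber_loss(y_pred, y_true, delta=1.0):
    """Standard Huber loss, averaged over samples (the mean is an assumption)."""
    error = y_true - y_pred
    abs_error = np.abs(error)
    quadratic = 0.5 * error ** 2                # used where |error| <= delta
    linear = delta * (abs_error - 0.5 * delta)  # used where |error| > delta
    return float(np.mean(np.where(abs_error <= delta, quadratic, linear)))

y_true = np.array([0.0, 0.0, 0.0])
y_pred = np.array([0.5, 1.0, 3.0])
loss = huber_loss(y_pred, y_true, delta=1.0)
# Per-sample terms: 0.125, 0.5, 2.5
print(loss)
```

The argument order mirrors calculate_loss(y_pred, y_true) above, which differs from the (y_true, y_pred) order of the NumPy loss classes.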

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class JITMeanAbsoluteErrorLoss(builtins.object)

Custom mean absolute error loss implementation using numba.

Methods defined here:

calculate_loss(self, y_pred, y_true): Calculate the mean absolute error loss.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class JITMeanSquaredErrorLoss(builtins.object)

Custom mean squared error loss implementation using numba.

Methods defined here:

calculate_loss(self, y_pred, y_true): Calculate the mean squared error loss.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class JITRNNLayer(builtins.object)

JITRNNLayer(input_size, hidden_size, activation='tanh')

A recurrent layer implementation for neural networks using Numba JIT compilation.

Methods defined here:

__init__(self, input_size, hidden_size, activation='tanh'): Placeholder; not yet implemented.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class JITSGDOptimizer(JITSGDOptimizer)

JITSGDOptimizer(*args, **kwargs)

Stochastic Gradient Descent (SGD) optimizer class for training neural networks.

Formula: w = w - learning_rate * dW, b = b - learning_rate * db

Args:
    learning_rate (float, optional): The learning rate for the optimizer. Defaults to 0.001.
    momentum (float, optional): The momentum factor. Defaults to 0.0.
    reg_lambda (float, optional): The regularization parameter. Defaults to 0.0.

Method resolution order:: JITSGDOptimizer; JITSGDOptimizer; builtins.object

Data and other attributes defined here:

class_type = jitclass.JITSGDOptimizer#2aed163e6d0<learning_ra...eg_lambda:float64,velocity:array(float64, 3d, C)>

Methods inherited from JITSGDOptimizer:

__init__(self, learning_rate=0.001, momentum=0.0, reg_lambda=0.0): Initializes the optimizer with specified hyperparameters.

Args:
    learning_rate: (float), optional - The learning rate for the optimizer (default is 0.001).
    momentum: (float), optional - The momentum factor for the optimizer (default is 0.0).
    reg_lambda: (float), optional - The regularization parameter (default is 0.0).

Attributes:
    learning_rate: (float) - The learning rate for the optimizer.
    momentum: (float) - The momentum factor for the optimizer.
    reg_lambda: (float) - The regularization parameter.
    velocity: (np.ndarray) - The velocity used for momentum updates, initialized to zeros.

initialize(self, layers): Initializes the velocity for each layer's weights.

Args:
    layers: (list) - List of layers in the neural network.

Returns:
    None

update(self, layer, dW, db, index): Updates the weights and biases of a layer using the SGD optimization algorithm.

Args:
    layer: (Layer) - The layer to update.
    dW: (np.ndarray) - The gradient of the weights.
    db: (np.ndarray) - The gradient of the biases.
    index: (int) - The index of the layer.

Returns:
    None

update_layers(self, layers, dWs, dbs): Updates all layers' weights and biases using the SGD optimization algorithm.

Args:
    layers: (list) - List of layers in the neural network.
    dWs: (list of np.ndarray) - Gradients of the weights for each layer.
    dbs: (list of np.ndarray) - Gradients of the biases for each layer.

Returns:
    None
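
The update rule can be sketched as follows. With momentum set to 0 it reduces to the documented formula w = w - learning_rate * dW. The momentum form shown here (a velocity term accumulated per layer) is a common convention and an assumption, not confirmed by this reference; regularization is omitted, and `sgd_update` is a hypothetical helper name.

```python
import numpy as np

def sgd_update(W, b, dW, db, velocity, learning_rate=0.001, momentum=0.0):
    """One SGD step with optional momentum (assumed form; regularization omitted)."""
    velocity = momentum * velocity - learning_rate * dW  # running update direction
    W = W + velocity                                     # equals W - lr * dW when momentum == 0
    b = b - learning_rate * db
    return W, b, velocity

W = np.array([[1.0, 2.0]])
b = np.array([0.5])
dW = np.array([[10.0, -10.0]])
db = np.array([1.0])
velocity = np.zeros_like(W)  # zero-initialized, as described for the velocity attribute

W, b, velocity = sgd_update(W, b, dW, db, velocity, learning_rate=0.1)
print(W)  # [[0. 3.]]
print(b)  # [0.4]
```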

Data descriptors inherited from JITSGDOptimizer:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class MeanAbsoluteErrorLoss(builtins.object)

Custom mean absolute error loss implementation using numpy.

Formula: mean(abs(y_true - y_pred))

Methods:
    __call__(self, y_true, y_pred): Calculate the mean absolute error loss.

Methods defined here:

__call__(self, y_true, y_pred): Calculate the mean absolute error loss.

Args:
    y_true (np.ndarray): The true labels of shape (num_samples,).
    y_pred (np.ndarray): The predicted values of shape (num_samples,).

Returns:
    float: The mean absolute error loss.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class MeanSquaredErrorLoss(builtins.object)

Custom mean squared error loss implementation using numpy.

Formula: mean((y_true - y_pred) ** 2)

Methods:
    __call__(self, y_true, y_pred): Calculate the mean squared error loss.

Methods defined here:

__call__(self, y_true, y_pred): Calculate the mean squared error loss.

Args:
    y_true (np.ndarray): The true labels of shape (num_samples,).
    y_pred (np.ndarray): The predicted values of shape (num_samples,).

Returns:
    float: The mean squared error loss.
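
The two formulas given for MeanAbsoluteErrorLoss and MeanSquaredErrorLoss can be checked by hand with a short NumPy sketch. The function names below are hypothetical stand-ins for the classes' `__call__` methods.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """mean(abs(y_true - y_pred))"""
    return float(np.mean(np.abs(y_true - y_pred)))

def mean_squared_error(y_true, y_pred):
    """mean((y_true - y_pred) ** 2)"""
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.5, 2.0, 6.0])

mae = mean_absolute_error(y_true, y_pred)  # (0 + 0.5 + 1 + 2) / 4
mse = mean_squared_error(y_true, y_pred)   # (0 + 0.25 + 1 + 4) / 4
print(mae, mse)  # 0.875 1.3125
```

Squaring makes MSE far more sensitive to the single large error (2.0) than MAE is.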

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class NeuralNetworkBase(builtins.object)

NeuralNetworkBase(layers, dropout_rate=0.0, reg_lambda=0.0, activations=None, loss_function=None, regressor=False)

NeuralNetworkBase is an abstract base class for building neural networks.

It provides a framework for initializing layers, performing forward and backward propagation,
training, evaluating, and predicting with a neural network. Subclasses should implement
the abstract methods to define specific behavior.

Attributes:
    layer_sizes (list): Sizes of the layers in the network.
    dropout_rate (float): Dropout rate for regularization.
    reg_lambda (float): Regularization strength for L2 regularization.
    activations (list): Activation functions for each layer.
    layers (list): List of layer objects or configurations.
    weights (list): List of weight matrices for the layers.
    biases (list): List of bias vectors for the layers.
    layer_outputs (ndarray): Outputs of each layer during forward propagation.
    is_binary (bool): Whether the network is for binary classification.

Methods:
    __init__(layers, dropout_rate=0.0, reg_lambda=0.0, activations=None, loss_function=None, regressor=False):
        Initializes the neural network with the given layers and parameters.
    initialize_layers():
        Abstract method to initialize the weights and biases of the layers.
    forward(X, training=True):
        Abstract method to perform forward propagation through the network.
    backward(y):
        Abstract method to perform backward propagation through the network.
    train(X_train, y_train, X_val=None, y_val=None, optimizer=None, epochs=100,
            batch_size=32, early_stopping_threshold=10, lr_scheduler=None, p=True,
            use_tqdm=True, n_jobs=1, track_metrics=False, track_adv_metrics=False):
        Abstract method to train the neural network using the provided training data.
    evaluate(X, y):
        Abstract method to evaluate the neural network on the provided data.
    predict(X):
        Abstract method to make predictions using the trained neural network.
    calculate_loss(X, y):
        Abstract method to calculate the loss of the neural network.
    apply_dropout(X):
        Applies dropout to the activation values for regularization.
    compute_l2_reg(weights):
        Computes the L2 regularization term for the given weights.
    calculate_precision_recall_f1(X, y):
        Calculates precision, recall, and F1 score for the predictions.
    create_scheduler(scheduler_type, optimizer, **kwargs):
        Creates a learning rate scheduler based on the specified type.
    plot_metrics(save_dir=None):
        Plots the training and validation metrics, including loss, accuracy,
        learning rate, and optionally precision, recall, and F1 score.

Methods defined here:

__init__(self, layers, dropout_rate=0.0, reg_lambda=0.0, activations=None, loss_function=None, regressor=False): Initializes the neural network with the specified layers, dropout rate, regularization, and activations.

Args:
    layers: (list) - A list of integers representing the sizes of each layer or a list of Layer objects.
    dropout_rate: (float), optional - The dropout rate for regularization (default is 0.0).
    reg_lambda: (float), optional - The regularization strength (default is 0.0).
    activations: (list of str), optional - A list of activation functions for each layer (default is None, which sets "relu" for hidden layers and "softmax" for the output layer).
    loss_function: (callable), optional - Custom loss function to use (default is None, which uses the default calculate_loss implementation).
    regressor: (bool), optional - If True, the network is treated as a regressor (default is False).

Raises:
    ValueError: If `layers` is not a list of integers or a list of Layer objects.

apply_dropout(self, X): Applies dropout to the activation X.

Args:
    X: (ndarray) - Activation values.

Returns:
    ndarray: Activation values after applying dropout.
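
A common way to implement this step is inverted dropout, sketched below. Whether apply_dropout rescales the surviving activations is not stated in this reference, so the division by the keep probability is an assumption, as is the explicit `rng` parameter.

```python
import numpy as np

def apply_dropout(X, dropout_rate, rng):
    """Inverted dropout: zero a random subset and rescale the rest (assumed behaviour)."""
    if dropout_rate <= 0.0:
        return X
    keep_prob = 1.0 - dropout_rate
    mask = rng.random(X.shape) < keep_prob  # True where the activation is kept
    return X * mask / keep_prob             # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
A = np.ones((4, 5))
dropped = apply_dropout(A, 0.2, rng)
print(dropped)  # entries are either 0.0 or 1.25
```

Rescaling at training time means no correction is needed at inference, where dropout is simply skipped.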

backward(self, y): Performs backward propagation through the network.

calculate_loss(self, X, y): Calculates the loss of the neural network.

calculate_precision_recall_f1(self, X, y): Calculates precision, recall, and F1 score.

Args:
    X: (ndarray) - Input data
    y: (ndarray) - Target labels

Returns:
    precision: (float) - Precision score
    recall: (float) - Recall score
    f1: (float) - F1 score
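
For the binary case, the three scores follow directly from the counts of true positives, false positives, and false negatives. The sketch below works on label lists rather than on model inputs; how the library handles multiclass averaging is not covered in this reference, and `precision_recall_f1` is a hypothetical helper name.

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall, and F1 with label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
# tp = 2, fp = 1, fn = 1, so all three scores equal 2/3
print(precision, recall, f1)
```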

compute_l2_reg(self, weights): Computes the L2 regularization term.

Args:
    weights: (list) - List of weight matrices.

Returns:
    float: L2 regularization term.
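
An L2 term is typically the sum of squared weights across all layers, later scaled by reg_lambda when added to the loss. The sketch below assumes that convention; whether the library applies a 0.5 factor or scales inside this method is not stated here.

```python
import numpy as np

def compute_l2_reg(weights):
    """Sum of squared entries over all weight matrices (assumed convention)."""
    return float(sum(np.sum(W ** 2) for W in weights))

weights = [np.array([[1.0, -2.0]]), np.array([[3.0]])]
l2_term = compute_l2_reg(weights)  # 1 + 4 + 9
reg_lambda = 0.01                  # illustrative value
penalty = reg_lambda * l2_term     # amount added to the data loss
print(l2_term, penalty)
```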

create_scheduler(self, scheduler_type, optimizer, **kwargs): Creates a learning rate scheduler.

evaluate(self, X, y): Evaluates the neural network on the provided data.

forward(self, X, training=True): Performs forward propagation through the network.

initialize_layers(self): Initializes the weights and biases of the layers.

plot_metrics(self, save_dir=None): Plots the training and validation metrics.

predict(self, X): Makes predictions using the trained neural network.

train(self, X_train, y_train, X_val=None, y_val=None, optimizer=None, epochs=100, batch_size=32, early_stopping_threshold=10, lr_scheduler=None, p=True, use_tqdm=True, n_jobs=1, track_metrics=False, track_adv_metrics=False): Trains the neural network using the provided training data.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class NumbaBackendNeuralNetwork(sega_learn.neural_networks.neuralNetworkBase.NeuralNetworkBase)

NumbaBackendNeuralNetwork(layers, dropout_rate=0.2, reg_lambda=0.01, activations=None, loss_function=None, regressor=False, compile_numba=True, progress_bar=True)

A neural network implementation using Numba for Just-In-Time (JIT) compilation to optimize performance.

This class supports forward and backward propagation, training, evaluation, and hyperparameter tuning
with various optimizers and activation functions.

Attributes:
    compiled (bool): Indicates whether Numba functions are compiled.
    trainable_layers (list): Layers with trainable parameters (weights and biases).
    progress_bar (bool): Whether to display a progress bar during training.

Methods:
    __init__(layers, dropout_rate, reg_lambda, activations, loss_function, regressor, compile_numba, progress_bar):
        Initializes the neural network with the specified parameters.
    store_init_layers():
        Stores the initial layers and their parameters for restoration after initialization.
    restore_layers():
        Restores the layers and their parameters after initialization.
    initialize_new_layers():
        Initializes the layers of the neural network with specified sizes and activation functions.
    forward(X, training):
        Performs forward propagation through the neural network.
    backward(y):
        Performs backward propagation to calculate gradients.
    is_not_instance_of_classes(obj, classes):
        Checks if an object is not an instance of any class in a list of classes.
    train(X_train, y_train, X_val, y_val, optimizer, epochs, batch_size, early_stopping_threshold,
          lr_scheduler, p, use_tqdm, n_jobs, track_metrics, track_adv_metrics, save_animation,
          save_path, fps, dpi, frame_every):
        Trains the neural network model with the specified parameters.
    evaluate(X, y):
        Evaluates the neural network on the given data and returns accuracy and predictions.
    predict(X):
        Predicts the output for the given input data.
    calculate_loss(X, y):
        Calculates the loss with L2 regularization.
    _create_optimizer(optimizer_type, learning_rate, JIT):
        Helper method to create optimizer instances.
    tune_hyperparameters(X_train, y_train, X_val, y_val, param_grid, layer_configs, optimizer_types,
                         lr_range, epochs, batch_size):
        Performs hyperparameter tuning using grid search.
    compile_numba_functions(progress_bar):
        Compiles all Numba JIT functions to improve performance.

Method resolution order:: NumbaBackendNeuralNetwork; sega_learn.neural_networks.neuralNetworkBase.NeuralNetworkBase; builtins.object

Methods defined here:

__init__(self, layers, dropout_rate=0.2, reg_lambda=0.01, activations=None, loss_function=None, regressor=False, compile_numba=True, progress_bar=True): Initializes the Numba backend neural network.

Args:
    layers: (list) - List of layer sizes or Layer objects.
    dropout_rate: (float) - Dropout rate for regularization.
    reg_lambda: (float) - L2 regularization parameter.
    activations: (list) - List of activation functions for each layer.
    loss_function: (callable), optional - Custom loss function (default: selects based on task).
    regressor: (bool) - Whether the model is a regressor (default is False).
    compile_numba: (bool) - Whether to compile Numba functions.
    progress_bar: (bool) - Whether to display a progress bar.

backward(self, y): Performs backward propagation to calculate the gradients.

Args:
    y (ndarray): Target labels of shape (m, output_size).

calculate_loss(self, X, y): Calculates the loss with L2 regularization.

Args:
    X (ndarray): Input data.
    y (ndarray): Target labels.

Returns:
    float: The calculated loss value.

compile_numba_functions(self, progress_bar=True): Compiles all Numba JIT functions to improve performance.

Args:
    progress_bar (bool): Whether to display a progress bar.

evaluate(self, X, y): Evaluates the neural network on the given data.

Args:
    X (ndarray): Input data.
    y (ndarray): Target labels.

Returns:
    tuple: Accuracy and predicted labels.

forward(self, X, training=True): Performs forward propagation through the neural network.

Args:
    X (ndarray): Input data of shape (batch_size, input_size).
    training (bool): Whether the network is in training mode (applies dropout).

Returns:
    ndarray: Output predictions of shape (batch_size, output_size).
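
As an illustration of the shapes involved, the sketch below runs a dense forward pass with the default activations named in NeuralNetworkBase ("relu" for hidden layers, "softmax" for the output layer). It is plain NumPy without dropout or Numba, and the weight layout (input_size, output_size) is an assumption.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    shifted = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def forward(X, weights, biases):
    """ReLU on hidden layers, softmax on the output layer."""
    A = X
    for W, b in zip(weights[:-1], biases[:-1]):
        A = relu(A @ W + b)
    return softmax(A @ weights[-1] + biases[-1])

rng = np.random.default_rng(0)
layer_sizes = [4, 8, 3]  # input_size=4, one hidden layer, output_size=3
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros((1, n)) for n in layer_sizes[1:]]

probs = forward(rng.normal(size=(5, 4)), weights, biases)
print(probs.shape)  # (5, 3)
```

Each output row is a probability distribution over the three classes.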

initialize_new_layers(self): Initializes the layers of the neural network.

Each layer is created with the specified number of neurons and activation function.

predict(self, X): Predicts the output for the given input data.

Args:
    X (ndarray): Input data.

Returns:
    ndarray: Predicted outputs.

restore_layers(self): Restores the layers after initialization.

store_init_layers(self): Stores the layers to restore after initialization.

train(self, X_train, y_train, X_val=None, y_val=None, optimizer=None, epochs=100, batch_size=32, early_stopping_threshold=10, lr_scheduler=None, p=True, use_tqdm=True, n_jobs=1, track_metrics=False, track_adv_metrics=False, save_animation=False, save_path='training_animation.mp4', fps=1, dpi=100, frame_every=1): Trains the neural network model.

Args:
    X_train: (ndarray) - Training data features.
    y_train: (ndarray) - Training data labels.
    X_val: (ndarray) - Validation data features, optional.
    y_val: (ndarray) - Validation data labels, optional.
    optimizer: (Optimizer) - Optimizer for updating parameters (default: None, which falls back to JITAdam with lr=0.0001).
    epochs: (int) - Number of training epochs (default: 100).
    batch_size: (int) - Batch size for mini-batch gradient descent (default: 32).
    early_stopping_threshold: (int) - Patience for early stopping (default: 10).
    lr_scheduler: (Scheduler) - Learning rate scheduler (default: None).
    p: (bool) - Whether to print training progress (default: True).
    use_tqdm: (bool) - Whether to use tqdm for progress bar (default: True).
    n_jobs: (int) - Number of jobs for parallel processing (default: 1).
    track_metrics: (bool) - Whether to track training metrics (default: False).
    track_adv_metrics: (bool) - Whether to track advanced metrics (default: False).
    save_animation: (bool) - Whether to save the animation of metrics (default: False).
    save_path: (str) - Path to save the animation file. File extension must be .mp4 or .gif (default: 'training_animation.mp4').
    fps: (int) - Frames per second for the saved animation (default: 1).
    dpi: (int) - DPI for the saved animation (default: 100).
    frame_every: (int) - Capture frame every N epochs (to reduce file size) (default: 1).
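
The early_stopping_threshold argument is described as a patience value. One way such a check can work is sketched below: training stops once the validation loss has failed to improve for that many consecutive epochs. The exact comparison used by train() is not documented here, so this logic and the helper name `epochs_until_stop` are assumptions.

```python
def epochs_until_stop(val_losses, patience):
    """Return the 1-based epoch at which patience-based early stopping would trigger."""
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses)  # ran to completion without triggering

losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
stopped_at = epochs_until_stop(losses, patience=3)
print(stopped_at)  # 6: epochs 4, 5, and 6 all fail to beat 0.7
```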

tune_hyperparameters(self, X_train, y_train, X_val, y_val, param_grid, layer_configs=None, optimizer_types=None, lr_range=(0.0001, 0.01, 5), epochs=30, batch_size=32): Performs hyperparameter tuning using grid search.

Args:
    X_train: (np.ndarray) - Training feature data.
    y_train: (np.ndarray) - Training target data.
    X_val: (np.ndarray) - Validation feature data.
    y_val: (np.ndarray) - Validation target data.
    param_grid: (dict) - Dictionary of parameters to try.
    layer_configs: (list), optional - List of layer configurations (default is None).
    optimizer_types: (list), optional - List of optimizer types (default is None).
    lr_range: (tuple), optional - (min_lr, max_lr, num_steps) for learning rates (default is (0.0001, 0.01, 5)).
    epochs: (int), optional - Max epochs for each trial (default is 30).
    batch_size: (int), optional - Batch size for training (default is 32).

Returns:
    best_params: (dict) - Best hyperparameters found.
    best_accuracy: (float) - Best validation accuracy.
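
Grid search enumerates every combination of the supplied values. The sketch below shows only that enumeration, using the standard library. It assumes lr_range expands to evenly spaced learning rates; whether the library spaces them linearly or logarithmically is not stated, and the keys in `param_grid` are illustrative.

```python
from itertools import product

def expand_lr_range(lr_range):
    """Expand (min_lr, max_lr, num_steps) into evenly spaced values (assumed linear)."""
    min_lr, max_lr, num_steps = lr_range
    if num_steps == 1:
        return [min_lr]
    step = (max_lr - min_lr) / (num_steps - 1)
    return [min_lr + i * step for i in range(num_steps)]

def iter_grid(param_grid, lr_range):
    """Yield one dict per combination of grid values and learning rates."""
    keys = sorted(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        for lr in expand_lr_range(lr_range):
            yield dict(zip(keys, values), learning_rate=lr)

# Hypothetical grid; the keys accepted by tune_hyperparameters are not listed here.
param_grid = {"dropout_rate": [0.1, 0.2], "reg_lambda": [0.0, 0.01]}
trials = list(iter_grid(param_grid, (0.0001, 0.01, 5)))
print(len(trials))  # 2 * 2 * 5 = 20
```

The trial count grows multiplicatively, which is why the default epochs per trial (30) is lower than the default for train() (100).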

Static methods defined here:

is_not_instance_of_classes(obj, classes): Checks if an object is not an instance of any class in a list of classes.

Args:
    obj: The object to check.
    classes: A list of classes.

Returns:
    bool: True if the object is not an instance of any class in the list of classes, False otherwise.
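
The described behaviour matches a negated isinstance check against a tuple of classes. A minimal sketch under that assumption:

```python
def is_not_instance_of_classes(obj, classes):
    """True when obj is not an instance of any class in classes."""
    return not isinstance(obj, tuple(classes))

print(is_not_instance_of_classes(3, [str, list]))  # True
print(is_not_instance_of_classes(3, [int, str]))   # False
```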

Methods inherited from sega_learn.neural_networks.neuralNetworkBase.NeuralNetworkBase:

apply_dropout(self, X): Applies dropout to the activation X.

Args:
    X: (ndarray) - Activation values.

Returns:
    ndarray: Activation values after applying dropout.

calculate_precision_recall_f1(self, X, y): Calculates precision, recall, and F1 score.

Args:
    X: (ndarray) - Input data
    y: (ndarray) - Target labels

Returns:
    precision: (float) - Precision score
    recall: (float) - Recall score
    f1: (float) - F1 score

compute_l2_reg(self, weights): Computes the L2 regularization term.

Args:
    weights: (list) - List of weight matrices.

Returns:
    float: L2 regularization term.

create_scheduler(self, scheduler_type, optimizer, **kwargs): Creates a learning rate scheduler.

initialize_layers(self): Initializes the weights and biases of the layers.

plot_metrics(self, save_dir=None): Plots the training and validation metrics.

Data descriptors inherited from sega_learn.neural_networks.neuralNetworkBase.NeuralNetworkBase:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class RNNLayer(builtins.object)

RNNLayer(input_size, hidden_size, activation='tanh')

Placeholder class; the recurrent layer is not yet implemented.

Methods defined here:

__init__(self, input_size, hidden_size, activation='tanh'): Placeholder; not yet implemented.

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class SGDOptimizer(builtins.object)

SGDOptimizer(learning_rate=0.001, momentum=0.0, reg_lambda=0.0)

Stochastic Gradient Descent (SGD) optimizer class for training neural networks.

Formula: w = w - learning_rate * dW, b = b - learning_rate * db

Args:
    learning_rate (float, optional): The learning rate for the optimizer. Defaults to 0.001.
    momentum (float, optional): The momentum factor. Defaults to 0.0.
    reg_lambda (float, optional): The regularization parameter. Defaults to 0.0.

Methods defined here:

__init__(self, learning_rate=0.001, momentum=0.0, reg_lambda=0.0): Initializes the optimizer with specified parameters.

Args:
    learning_rate (float, optional): The step size for updating weights. Defaults to 0.001.
    momentum (float, optional): The momentum factor to accelerate gradient descent. Defaults to 0.0.
    reg_lambda (float, optional): The regularization parameter to prevent overfitting. Defaults to 0.0.

initialize(self, layers): Initializes the velocity for each layer's weights.

Args:
    layers (list): List of layers in the neural network.

Returns:
    None

update(self, layer, dW, db, index): Updates the weights and biases of a layer using the SGD optimization algorithm.

Args:
    layer (Layer): The layer to update.
    dW (ndarray): The gradient of the weights.
    db (ndarray): The gradient of the biases.
    index (int): The index of the layer.

Returns:
    None

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class lr_scheduler_exp(builtins.object)

lr_scheduler_exp(optimizer, lr_decay=0.1, lr_decay_epoch=10)

Learning rate scheduler class for training neural networks.

Reduces the learning rate exponentially by lr_decay every lr_decay_epoch epochs.

Methods defined here:

__init__(self, optimizer, lr_decay=0.1, lr_decay_epoch=10): Initializes the scheduler with the given optimizer and learning rate decay parameters.

Args:
    optimizer (Optimizer): The optimizer whose learning rate will be scheduled.
    lr_decay (float, optional): The factor by which the learning rate will be multiplied at each decay step. Default is 0.1.
    lr_decay_epoch (int, optional): The number of epochs after which the learning rate will be decayed. Default is 10.

__repr__(self): Returns a string representation of the scheduler.

reduce(self): Reduces the learning rate exponentially.

step(self, epoch): Adjusts the learning rate based on the current epoch. Decays the learning rate by lr_decay every lr_decay_epoch epochs.

Args:
    epoch (int): The current epoch number.

Returns:
    None

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class lr_scheduler_plateau(builtins.object)

lr_scheduler_plateau(lr_scheduler, patience=5, threshold=0.01)

A custom learning rate scheduler that adjusts the learning rate based on the plateau of the loss function.

Args:
    lr_scheduler (object): The learning rate scheduler object.
    patience (int): The number of epochs to wait for improvement before reducing the learning rate. Default is 5.
    threshold (float): The minimum improvement threshold required to update the best loss. Default is 0.01.

Methods:
    step(epoch, loss): Updates the learning rate based on the loss value.

Methods defined here:

__init__(self, lr_scheduler, patience=5, threshold=0.01): Initializes the scheduler with the given learning rate scheduler, patience, and threshold.

Args:
    lr_scheduler (object): The learning rate scheduler to be wrapped.
    patience (int, optional): Number of epochs to wait for improvement before taking action. Defaults to 5.
    threshold (float, optional): Minimum change in the monitored value to qualify as an improvement. Defaults to 0.01.

__repr__(self): Returns a string representation of the scheduler.

step(self, epoch, loss): Updates the learning rate based on the loss value.

Args:
    epoch (int): The current epoch number.
    loss (float): The current loss value.
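
The plateau logic can be sketched as a wrapper that counts epochs without sufficient improvement and then asks the wrapped scheduler to reduce the learning rate. This is an illustration under assumptions: that the wrapper calls the wrapped scheduler's reduce() method, and that an improvement means the loss drops by more than threshold. `PlateauSketch` and `CountingScheduler` are hypothetical names.

```python
class CountingScheduler:
    """Stand-in for a wrapped scheduler; only records how often reduce() is called."""
    def __init__(self):
        self.reductions = 0

    def reduce(self):
        self.reductions += 1

class PlateauSketch:
    def __init__(self, lr_scheduler, patience=5, threshold=0.01):
        self.lr_scheduler = lr_scheduler
        self.patience = patience
        self.threshold = threshold
        self.best_loss = float("inf")
        self.wait = 0  # epochs since the last sufficient improvement

    def step(self, epoch, loss):
        # epoch is accepted to mirror the documented signature; unused in this sketch
        if loss < self.best_loss - self.threshold:
            self.best_loss = loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr_scheduler.reduce()
                self.wait = 0

inner = CountingScheduler()
plateau = PlateauSketch(inner, patience=3, threshold=0.01)
for epoch, loss in enumerate([1.0, 0.5, 0.5, 0.5, 0.495]):
    plateau.step(epoch, loss)
print(inner.reductions)  # 1: the last three losses never beat 0.5 by more than 0.01
```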

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class lr_scheduler_step(builtins.object)

lr_scheduler_step(optimizer, lr_decay=0.1, lr_decay_epoch=10)

Learning rate scheduler class for training neural networks.

Reduces the learning rate by a factor of lr_decay every lr_decay_epoch epochs.

Args:
    optimizer (Optimizer): The optimizer to adjust the learning rate for.
    lr_decay (float, optional): The factor to reduce the learning rate by. Defaults to 0.1.
    lr_decay_epoch (int, optional): The number of epochs to wait before decaying the learning rate. Defaults to 10.

Methods defined here:

__init__(self, optimizer, lr_decay=0.1, lr_decay_epoch=10): Initializes the scheduler with the given optimizer and learning rate decay parameters.

Args:
    optimizer (Optimizer): The optimizer whose learning rate will be scheduled.
    lr_decay (float, optional): The factor by which the learning rate will be multiplied at each decay step. Default is 0.1.
    lr_decay_epoch (int, optional): The number of epochs after which the learning rate will be decayed. Default is 10.

__repr__(self): Returns a string representation of the scheduler.

reduce(self): Reduces the learning rate by the decay factor.

step(self, epoch): Adjusts the learning rate based on the current epoch. Decays the learning rate by lr_decay every lr_decay_epoch epochs.

Args:
    epoch (int): The current epoch number.

Returns:
    None
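
Step decay multiplies the learning rate by lr_decay once every lr_decay_epoch epochs. The sketch below assumes the check is `epoch % lr_decay_epoch == 0` for positive epochs and that the optimizer exposes a `learning_rate` attribute; `DummyOptimizer` and `StepSchedulerSketch` are hypothetical names, not the library classes.

```python
class DummyOptimizer:
    """Minimal stand-in exposing only a learning_rate attribute."""
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

class StepSchedulerSketch:
    def __init__(self, optimizer, lr_decay=0.1, lr_decay_epoch=10):
        self.optimizer = optimizer
        self.lr_decay = lr_decay
        self.lr_decay_epoch = lr_decay_epoch

    def step(self, epoch):
        # Decay on every lr_decay_epoch-th epoch (assumed trigger condition)
        if epoch > 0 and epoch % self.lr_decay_epoch == 0:
            self.reduce()

    def reduce(self):
        self.optimizer.learning_rate *= self.lr_decay

opt = DummyOptimizer(learning_rate=0.1)
sched = StepSchedulerSketch(opt, lr_decay=0.5, lr_decay_epoch=10)
for epoch in range(1, 31):
    sched.step(epoch)
print(opt.learning_rate)  # halved at epochs 10, 20, and 30
```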

Data descriptors defined here:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

Data
		__all__ = ['AdamOptimizer', 'SGDOptimizer', 'AdadeltaOptimizer', 'lr_scheduler_exp', 'lr_scheduler_plateau', 'lr_scheduler_step', 'CrossEntropyLoss', 'BCEWithLogitsLoss', 'MeanSquaredErrorLoss', 'MeanAbsoluteErrorLoss', 'HuberLoss', 'DenseLayer', 'FlattenLayer', 'ConvLayer', 'RNNLayer', 'Activation', 'NeuralNetworkBase', 'BaseBackendNeuralNetwork', 'JITAdamOptimizer', 'JITSGDOptimizer', ...]