Python: module sega_learn.neural_networks.optimizers_jit

sega_learn.neural_networks.optimizers_jit

Modules

numpy

Classes

JITAdadeltaOptimizer(builtins.object)
    JITAdadeltaOptimizer

JITAdamOptimizer(builtins.object)
    JITAdamOptimizer

JITSGDOptimizer(builtins.object)
    JITSGDOptimizer

class JITAdadeltaOptimizer(JITAdadeltaOptimizer)

JITAdadeltaOptimizer(*args, **kwargs)

Adadelta optimizer class for training neural networks.

Formula:
    E[g^2]_t = rho * E[g^2]_{t-1} + (1 - rho) * g^2
    Delta_x = - (sqrt(E[Delta_x^2]_{t-1} + epsilon) / sqrt(E[g^2]_t + epsilon)) * g
    E[Delta_x^2]_t = rho * E[Delta_x^2]_{t-1} + (1 - rho) * Delta_x^2
Derived from: https://arxiv.org/abs/1212.5701
Args:
    learning_rate (float, optional): The learning rate for the optimizer. Defaults to 1.0.
    rho (float, optional): The decay rate. Defaults to 0.95.
    epsilon (float, optional): A small value to prevent division by zero. Defaults to 1e-6.
    reg_lambda (float, optional): The regularization parameter. Defaults to 0.0.
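
The three formula lines above can be sketched in plain NumPy as follows. This is only an illustration of the update rule, not the library's JIT-compiled code: `adadelta_step` and its argument names are hypothetical, and `learning_rate` and `reg_lambda` are left out because the formula does not say where they enter.

```python
import numpy as np

def adadelta_step(w, g, E_g2, E_dx2, rho=0.95, epsilon=1e-6):
    """One Adadelta step for a parameter array w with gradient g (hypothetical helper)."""
    # Running average of squared gradients.
    E_g2 = rho * E_g2 + (1 - rho) * g**2
    # Update scaled by the ratio of the two running RMS values.
    delta = -(np.sqrt(E_dx2 + epsilon) / np.sqrt(E_g2 + epsilon)) * g
    # Running average of squared updates.
    E_dx2 = rho * E_dx2 + (1 - rho) * delta**2
    return w + delta, E_g2, E_dx2

w = np.array([1.0, -2.0])
g = np.array([0.5, -0.5])
E_g2 = np.zeros_like(w)
E_dx2 = np.zeros_like(w)
w_new, E_g2, E_dx2 = adadelta_step(w, g, E_g2, E_dx2)
```

Because both running averages start at zero, the first step is small (its size is governed by `epsilon`) and grows as `E_dx2` accumulates.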

Method resolution order: JITAdadeltaOptimizer; JITAdadeltaOptimizer; builtins.object

Data and other attributes defined here:

class_type = jitclass.JITAdadeltaOptimizer#1ed8a7c8610<learni...float64, 3d, C),E_delta_x2:array(float64, 3d, C)>

Methods inherited from JITAdadeltaOptimizer:

__init__(self, learning_rate=1.0, rho=0.95, epsilon=1e-06, reg_lambda=0.0): Initializes the optimizer with specified hyperparameters.

Args:
    learning_rate: (float), optional - The learning rate for the optimizer (default is 1.0).
    rho: (float), optional - The decay rate for the running averages (default is 0.95).
    epsilon: (float), optional - A small value to prevent division by zero (default is 1e-6).
    reg_lambda: (float), optional - The regularization parameter (default is 0.0).

Attributes:
    E_g2: (np.ndarray) - Running average of squared gradients.
    E_delta_x2: (np.ndarray) - Running average of squared parameter updates.

initialize(self, layers): Initializes the running averages for each layer's weights.

Args:
    layers: (list) - List of layers in the neural network.

Returns:
    None

update(self, layer, dW, db, index): Updates the weights and biases of a layer using the Adadelta optimization algorithm.

Args:
    layer: (Layer) - The layer to update.
    dW: (np.ndarray) - The gradient of the weights.
    db: (np.ndarray) - The gradient of the biases.
    index: (int) - The index of the layer.

Returns:
    None

update_layers(self, layers, dWs, dbs): Updates all layers' weights and biases using the Adadelta optimization algorithm.

Args:
    layers: (list) - List of layers in the neural network.
    dWs: (list of np.ndarray) - Gradients of the weights for each layer.
    dbs: (list of np.ndarray) - Gradients of the biases for each layer.

Returns:
    None

Data descriptors inherited from JITAdadeltaOptimizer:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class JITAdamOptimizer(JITAdamOptimizer)

JITAdamOptimizer(*args, **kwargs)

Adam optimizer class for training neural networks.

Formula: w = w - alpha * m_hat / (sqrt(v_hat) + epsilon) - lambda * w
Derived from: https://arxiv.org/abs/1412.6980
Args:
    learning_rate (float, optional): The learning rate for the optimizer. Defaults to 0.001.
    beta1 (float, optional): The exponential decay rate for the first moment estimates. Defaults to 0.9.
    beta2 (float, optional): The exponential decay rate for the second moment estimates. Defaults to 0.999.
    epsilon (float, optional): A small value to prevent division by zero. Defaults to 1e-8.
    reg_lambda (float, optional): The regularization parameter. Defaults to 0.01.
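
A minimal NumPy sketch of the formula above. It assumes `m_hat` and `v_hat` are the bias-corrected moment estimates defined in the referenced Adam paper. The function `adam_step` is a hypothetical helper for illustration, not the library's implementation.

```python
import numpy as np

def adam_step(w, g, m, v, t, learning_rate=0.001, beta1=0.9,
              beta2=0.999, epsilon=1e-8, reg_lambda=0.01):
    """One Adam step for parameter array w with gradient g at time step t >= 1."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    # Bias correction (assumed to follow the Adam paper).
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    # Update as written in the formula above, including the lambda * w term.
    w = w - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon) - reg_lambda * w
    return w, m, v

w = np.array([1.0, -2.0])
g = np.array([0.5, -0.5])
m = np.zeros_like(w)
v = np.zeros_like(w)
w_new, m, v = adam_step(w, g, m, v, t=1)
```

At `t=1` the bias-corrected ratio `m_hat / sqrt(v_hat)` has magnitude close to 1, so the gradient part of the step is roughly `learning_rate` in size regardless of the gradient scale.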

Method resolution order: JITAdamOptimizer; JITAdamOptimizer; builtins.object

Data and other attributes defined here:

class_type = jitclass.JITAdamOptimizer#1ed897c3e90<learning_r...t64, 2d, A),db:array(float64, 2d, A),index:int32>

Methods inherited from JITAdamOptimizer:

__init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, reg_lambda=0.01): Initializes the optimizer with the specified hyperparameters.

Args:
    learning_rate: (float), optional - The learning rate for the optimizer (default is 0.001).
    beta1: (float), optional - Exponential decay rate for the first moment estimates (default is 0.9).
    beta2: (float), optional - Exponential decay rate for the second moment estimates (default is 0.999).
    epsilon: (float), optional - A small value to prevent division by zero (default is 1e-8).
    reg_lambda: (float), optional - Regularization parameter; larger values imply stronger regularization (default is 0.01).

initialize(self, layers): Initializes the first and second moment estimates for each layer's weights.

Args:
    layers: (list) - List of layers in the neural network.

Returns:
    None

update(self, layer, dW, db, index): Updates the weights and biases of a layer using the Adam optimization algorithm.

Args:
    layer: (Layer) - The layer to update.
    dW: (np.ndarray) - The gradient of the weights.
    db: (np.ndarray) - The gradient of the biases.
    index: (int) - The index of the layer.

Returns:
    None

update_layers(self, layers, dWs, dbs): Updates all layers' weights and biases using the Adam optimization algorithm.

Args:
    layers: (list) - List of layers in the neural network.
    dWs: (list of np.ndarray) - Gradients of the weights for each layer.
    dbs: (list of np.ndarray) - Gradients of the biases for each layer.

Returns:
    None

Data descriptors inherited from JITAdamOptimizer:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

class JITSGDOptimizer(JITSGDOptimizer)

JITSGDOptimizer(*args, **kwargs)

Stochastic Gradient Descent (SGD) optimizer class for training neural networks.

Formula: w = w - learning_rate * dW, b = b - learning_rate * db
Args:
    learning_rate (float, optional): The learning rate for the optimizer. Defaults to 0.001.
    momentum (float, optional): The momentum factor. Defaults to 0.0.
    reg_lambda (float, optional): The regularization parameter. Defaults to 0.0.
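
The formula above covers only the plain update. The sketch below shows that update, plus one common way a momentum factor can be applied (classical momentum). The momentum form is an assumption, since this page does not state how `momentum` or `reg_lambda` enter the update. Both helper names are hypothetical.

```python
import numpy as np

def sgd_step(w, b, dW, db, learning_rate=0.001):
    """Plain SGD, exactly as in the formula above."""
    return w - learning_rate * dW, b - learning_rate * db

def sgd_momentum_step(w, dW, velocity, learning_rate=0.001, momentum=0.0):
    """Classical momentum (assumed form); reduces to plain SGD when momentum == 0."""
    velocity = momentum * velocity - learning_rate * dW
    return w + velocity, velocity

w = np.array([1.0, -2.0])
b = np.array([0.0])
dW = np.array([0.5, -0.5])
db = np.array([1.0])

w_plain, b_plain = sgd_step(w, b, dW, db, learning_rate=0.1)

# Two momentum steps with the same gradient: the second step is larger.
w_m1, vel = sgd_momentum_step(w, dW, np.zeros_like(w), learning_rate=0.1, momentum=0.9)
w_m2, vel = sgd_momentum_step(w_m1, dW, vel, learning_rate=0.1, momentum=0.9)
```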

Method resolution order: JITSGDOptimizer; JITSGDOptimizer; builtins.object

Data and other attributes defined here:

class_type = jitclass.JITSGDOptimizer#1ed8a7be510<learning_ra...eg_lambda:float64,velocity:array(float64, 3d, C)>

Methods inherited from JITSGDOptimizer:

__init__(self, learning_rate=0.001, momentum=0.0, reg_lambda=0.0): Initializes the optimizer with specified hyperparameters.

Args:
    learning_rate: (float), optional - The learning rate for the optimizer (default is 0.001).
    momentum: (float), optional - The momentum factor for the optimizer (default is 0.0).
    reg_lambda: (float), optional - The regularization parameter (default is 0.0).

Attributes:
    learning_rate: (float) - The learning rate for the optimizer.
    momentum: (float) - The momentum factor for the optimizer.
    reg_lambda: (float) - The regularization parameter.
    velocity: (np.ndarray) - The velocity used for momentum updates, initialized to zeros.

initialize(self, layers): Initializes the velocity for each layer's weights.

Args:
    layers: (list) - List of layers in the neural network.

Returns:
    None

update(self, layer, dW, db, index): Updates the weights and biases of a layer using the SGD optimization algorithm.

Args:
    layer: (Layer) - The layer to update.
    dW: (np.ndarray) - The gradient of the weights.
    db: (np.ndarray) - The gradient of the biases.
    index: (int) - The index of the layer.

Returns:
    None

update_layers(self, layers, dWs, dbs): Updates all layers' weights and biases using the SGD optimization algorithm.

Args:
    layers: (list) - List of layers in the neural network.
    dWs: (list of np.ndarray) - Gradients of the weights for each layer.
    dbs: (list of np.ndarray) - Gradients of the biases for each layer.

Returns:
    None

Data descriptors inherited from JITSGDOptimizer:

__dict__: dictionary for instance variables

__weakref__: list of weak references to the object

Functions

adadelta_update_layers(E_g2, E_delta_x2, layers, dWs, dbs, learning_rate, rho, epsilon, reg_lambda): Performs parallelized Adadelta updates for all layers.

Args:
    E_g2: (np.ndarray) - Running average of squared gradients.
    E_delta_x2: (np.ndarray) - Running average of squared parameter updates.
    layers: (list) - List of layers in the neural network.
    dWs: (list of np.ndarray) - Gradients of the weights for each layer.
    dbs: (list of np.ndarray) - Gradients of the biases for each layer.
    learning_rate: (float) - Learning rate for the optimizer.
    rho: (float) - Decay rate.
    epsilon: (float) - Small value to prevent division by zero.
    reg_lambda: (float) - Regularization parameter.

Returns:
    None

adam_update_layers(m, v, t, layers, dWs, dbs, learning_rate, beta1, beta2, epsilon, reg_lambda): Performs parallelized Adam updates for all layers.

Args:
    m: (np.ndarray) - First moment estimates.
    v: (np.ndarray) - Second moment estimates.
    t: (int) - Current time step.
    layers: (list) - List of layers in the neural network.
    dWs: (list of np.ndarray) - Gradients of the weights for each layer.
    dbs: (list of np.ndarray) - Gradients of the biases for each layer.
    learning_rate: (float) - Learning rate for the optimizer.
    beta1: (float) - Exponential decay rate for the first moment estimates.
    beta2: (float) - Exponential decay rate for the second moment estimates.
    epsilon: (float) - Small value to prevent division by zero.
    reg_lambda: (float) - Regularization parameter.

Returns:
    None

sgd_update_layers(velocity, layers, dWs, dbs, learning_rate, momentum, reg_lambda): Performs parallelized SGD updates for all layers.

Args:
    velocity: (np.ndarray) - Velocity for momentum.
    layers: (list) - List of layers in the neural network.
    dWs: (list of np.ndarray) - Gradients of the weights for each layer.
    dbs: (list of np.ndarray) - Gradients of the biases for each layer.
    learning_rate: (float) - Learning rate for the optimizer.
    momentum: (float) - Momentum factor.
    reg_lambda: (float) - Regularization parameter.

Returns:
    None
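
The specs in the Data section declare the optimizer state (`velocity`, `E_g2`, `m`, and so on) as 3-D float64 arrays. One plausible layout, assumed here and not confirmed by this page, is one padded 2-D slab per layer, indexed as `state[layer_index, :rows, :cols]`. The sketch below shows how a multi-layer SGD update could work with that layout in plain NumPy. `DenseLayer`, its `weights` and `biases` attributes, `init_velocity`, and `sgd_update_layers_sketch` are all hypothetical stand-ins, and the sequential loop takes the place of whatever parallel loop the library uses.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class DenseLayer:
    """Hypothetical stand-in for a network layer."""
    weights: np.ndarray
    biases: np.ndarray

def init_velocity(layers):
    """Allocate one zero-padded 2-D slab per layer (assumed layout)."""
    max_rows = max(layer.weights.shape[0] for layer in layers)
    max_cols = max(layer.weights.shape[1] for layer in layers)
    return np.zeros((len(layers), max_rows, max_cols))

def sgd_update_layers_sketch(velocity, layers, dWs, dbs,
                             learning_rate, momentum, reg_lambda):
    """Update every layer in place; momentum and L2 terms are assumed forms."""
    for i, layer in enumerate(layers):
        rows, cols = layer.weights.shape
        v = velocity[i, :rows, :cols]  # view into the shared state array
        v[:] = momentum * v - learning_rate * dWs[i]
        layer.weights += v - learning_rate * reg_lambda * layer.weights
        layer.biases -= learning_rate * dbs[i]

layers = [
    DenseLayer(np.ones((2, 3)), np.zeros((1, 3))),
    DenseLayer(np.ones((3, 1)), np.zeros((1, 1))),
]
dWs = [np.full((2, 3), 0.5), np.full((3, 1), 0.5)]
dbs = [np.full((1, 3), 1.0), np.full((1, 1), 1.0)]

velocity = init_velocity(layers)
sgd_update_layers_sketch(velocity, layers, dWs, dbs,
                         learning_rate=0.1, momentum=0.0, reg_lambda=0.0)
```

Padding every layer to the largest weight shape wastes some memory but keeps all state in a single contiguous array, which fits the fixed `Array(float64, 3, 'C')` type in the specs.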

Data
    CACHE = False
    float64 = float64
    int32 = int32
    spec_adadelta = [('learning_rate', float64), ('rho', float64), ('epsilon', float64), ('reg_lambda', float64), ('E_g2', Array(float64, 3, 'C', False, aligned=True)), ('E_delta_x2', Array(float64, 3, 'C', False, aligned=True))]
    spec_adam = [('learning_rate', float64), ('beta1', float64), ('beta2', float64), ('epsilon', float64), ('reg_lambda', float64), ('m', Array(float64, 3, 'C', False, aligned=True)), ('v', Array(float64, 3, 'C', False, aligned=True)), ('t', int32), ('dW', Array(float64, 2, 'A', False, aligned=True)), ('db', Array(float64, 2, 'A', False, aligned=True)), ('index', int32)]
    spec_sgd = [('learning_rate', float64), ('momentum', float64), ('reg_lambda', float64), ('velocity', Array(float64, 3, 'C', False, aligned=True))]