- builtins.object
    - AdadeltaOptimizer
    - AdamOptimizer
    - SGDOptimizer
class AdadeltaOptimizer(builtins.object)
AdadeltaOptimizer(learning_rate=1.0, rho=0.95, epsilon=1e-06, reg_lambda=0.0)
Adadelta optimizer class for training neural networks.
Formula (see the update sketch after this class entry):
    E[g^2]_t = rho * E[g^2]_{t-1} + (1 - rho) * g^2
    Delta_x = - (sqrt(E[delta_x^2]_{t-1} + epsilon) / sqrt(E[g^2]_t + epsilon)) * g
    E[delta_x^2]_t = rho * E[delta_x^2]_{t-1} + (1 - rho) * Delta_x^2
Derived from: https://arxiv.org/abs/1212.5701
Args:
learning_rate (float, optional): The learning rate for the optimizer. Defaults to 1.0.
rho (float, optional): The decay rate. Defaults to 0.95.
epsilon (float, optional): A small value to prevent division by zero. Defaults to 1e-6.
reg_lambda (float, optional): The regularization parameter. Defaults to 0.0.
Methods defined here:
- __init__(self, learning_rate=1.0, rho=0.95, epsilon=1e-06, reg_lambda=0.0)
- Initializes the optimizer with the specified hyperparameters.
Args:
learning_rate (float, optional): The learning rate for the optimizer. Defaults to 1.0.
rho (float, optional): The decay rate for the running averages. Defaults to 0.95.
epsilon (float, optional): A small value to prevent division by zero. Defaults to 1e-6.
reg_lambda (float, optional): The regularization parameter for weight decay. Defaults to 0.0.
- initialize(self, layers)
- Initializes the running averages for each layer's weights.
Args:
layers (list): List of layers in the neural network.
Returns:
None
- update(self, layer, dW, db, index)
- Updates the weights and biases of a layer using the Adadelta optimization algorithm.
Args:
layer (Layer): The layer to update.
dW (ndarray): The gradient of the weights.
db (ndarray): The gradient of the biases.
index (int): The index of the layer.
Returns:
None
Data descriptors defined here:
- __dict__
- dictionary for instance variables
- __weakref__
- list of weak references to the object
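For reference, here is a minimal sketch of the Adadelta update rule quoted in the class docstring, assuming NumPy arrays. The standalone function and buffer names (E_g2, E_dx2) are illustrative, not the class's internal attributes, and how reg_lambda is applied inside the class is not documented, so the L2 term below is an assumption.

    import numpy as np

    def adadelta_step(w, grad, E_g2, E_dx2, learning_rate=1.0, rho=0.95,
                      epsilon=1e-6, reg_lambda=0.0):
        # Illustrative sketch only; buffer names and the L2 term are assumptions.
        grad = grad + reg_lambda * w                      # assumed L2 regularization
        E_g2 = rho * E_g2 + (1 - rho) * grad ** 2         # E[g^2]_t
        delta = -np.sqrt(E_dx2 + epsilon) / np.sqrt(E_g2 + epsilon) * grad  # Delta_x
        E_dx2 = rho * E_dx2 + (1 - rho) * delta ** 2      # E[delta_x^2]_t
        return w + learning_rate * delta, E_g2, E_dx2     # learning_rate defaults to 1.0

Because the running averages already carry the scale of past updates, Adadelta is typically usable with the default learning_rate of 1.0.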
class AdamOptimizer(builtins.object)
AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, reg_lambda=0.01)
Adam optimizer class for training neural networks.
Formula: w = w - alpha * m_hat / (sqrt(v_hat) + epsilon) - lambda * w,
    where m_hat = m / (1 - beta1^t) and v_hat = v / (1 - beta2^t) are the
    bias-corrected first and second moment estimates (see the update sketch
    after this class entry).
Derived from: https://arxiv.org/abs/1412.6980
Args:
learning_rate (float, optional): The learning rate for the optimizer. Defaults to 0.001.
beta1 (float, optional): The exponential decay rate for the first moment estimates. Defaults to 0.9.
beta2 (float, optional): The exponential decay rate for the second moment estimates. Defaults to 0.999.
epsilon (float, optional): A small value to prevent division by zero. Defaults to 1e-8.
reg_lambda (float, optional): The regularization parameter. Defaults to 0.01.
Methods defined here:
- __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, reg_lambda=0.01)
- Initializes the optimizer with the given hyperparameters.
Args:
learning_rate (float, optional): The learning rate (alpha) for the optimizer. Defaults to 0.001.
beta1 (float, optional): Exponential decay rate for the first moment estimates. Defaults to 0.9.
beta2 (float, optional): Exponential decay rate for the second moment estimates. Defaults to 0.999.
epsilon (float, optional): A small value to prevent division by zero. Defaults to 1e-8.
reg_lambda (float, optional): Regularization parameter; higher values indicate stronger regularization. Defaults to 0.01.
Attributes:
learning_rate (float): The learning rate for the optimizer.
beta1 (float): Exponential decay rate for the first moment estimates.
beta2 (float): Exponential decay rate for the second moment estimates.
epsilon (float): A small value to prevent division by zero.
reg_lambda (float): Regularization parameter for controlling overfitting.
m (list): List to store first moment estimates for each parameter.
v (list): List to store second moment estimates for each parameter.
t (int): Time step counter for the optimizer.
- initialize(self, layers)
- Initializes the first and second moment estimates for each layer's weights.
Args:
layers (list): List of layers in the neural network.
Returns:
None
- update(self, layer, dW, db, index)
- Updates the weights and biases of a layer using the Adam optimization algorithm.
Args:
layer (Layer): The layer to update.
dW (ndarray): The gradient of the weights.
db (ndarray): The gradient of the biases.
index (int): The index of the layer.
Returns:
None
Data descriptors defined here:
- __dict__
- dictionary for instance variables
- __weakref__
- list of weak references to the object
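The Adam formula above can be read as the following minimal sketch, assuming NumPy arrays. The standalone function and variable names are illustrative, and the weight-decay term is placed exactly as the docstring's "- lambda * w" form suggests. In a training loop the class itself is presumably used by calling initialize(layers) once and then update(layer, dW, db, index) for each layer at every step.

    import numpy as np

    def adam_step(w, grad, m, v, t, learning_rate=0.001, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, reg_lambda=0.01):
        # Illustrative sketch only; t is the 1-based time step counter.
        m = beta1 * m + (1 - beta1) * grad                # first moment estimate
        v = beta2 * v + (1 - beta2) * grad ** 2           # second moment estimate
        m_hat = m / (1 - beta1 ** t)                      # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)                      # bias-corrected second moment
        w = w - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon) - reg_lambda * w
        return w, m, v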
class SGDOptimizer(builtins.object)
SGDOptimizer(learning_rate=0.001, momentum=0.0, reg_lambda=0.0)
Stochastic Gradient Descent (SGD) optimizer class for training neural networks.
Formula: w = w - learning_rate * dW, b = b - learning_rate * db
    (plain SGD; see the momentum-aware sketch after this class entry).
Args:
learning_rate (float, optional): The learning rate for the optimizer. Defaults to 0.001.
momentum (float, optional): The momentum factor. Defaults to 0.0.
reg_lambda (float, optional): The regularization parameter. Defaults to 0.0.
Methods defined here:
- __init__(self, learning_rate=0.001, momentum=0.0, reg_lambda=0.0)
- Initializes the optimizer with the specified parameters.
Args:
learning_rate (float, optional): The step size for updating weights. Defaults to 0.001.
momentum (float, optional): The momentum factor to accelerate gradient descent. Defaults to 0.0.
reg_lambda (float, optional): The regularization parameter to prevent overfitting. Defaults to 0.0.
- initialize(self, layers)
- Initializes the velocity for each layer's weights.
Args:
layers (list): List of layers in the neural network.
Returns:
None
- update(self, layer, dW, db, index)
- Updates the weights and biases of a layer using the SGD optimization algorithm.
Args:
layer (Layer): The layer to update.
dW (ndarray): The gradient of the weights.
db (ndarray): The gradient of the biases.
index (int): The index of the layer.
Returns:
None
Data descriptors defined here:
- __dict__
- dictionary for instance variables
- __weakref__
- list of weak references to the object
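Below is a minimal sketch of the SGD update with optional momentum and L2 regularization, assuming NumPy arrays. The docstring only states the plain update w = w - learning_rate * dW, so the velocity formulation and the L2 term shown here are assumptions; the function and buffer names are illustrative.

    import numpy as np

    def sgd_step(w, b, dW, db, velocity, learning_rate=0.001, momentum=0.0,
                 reg_lambda=0.0):
        # Illustrative sketch; the momentum form and L2 term are assumptions.
        dW = dW + reg_lambda * w                          # assumed L2 regularization
        if momentum > 0.0:
            velocity = momentum * velocity - learning_rate * dW
            w = w + velocity                              # heavy-ball momentum step
        else:
            w = w - learning_rate * dW                    # plain SGD, as in the formula
        b = b - learning_rate * db
        return w, b, velocity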