LARS
LARS (Layer-wise Adaptive Rate Scaling) is an optimizer designed to accelerate training with large batch sizes. LARS uses a separate learning rate for each layer instead of for each parameter. Each layer's learning rate is calculated from a trust ratio between the weight norm and the gradient norm in that layer, which helps keep the update size stable.
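As a rough illustration of the trust ratio described above, the following pure-Python sketch follows the formulation commonly given for LARS rather than the bitsandbytes kernel, which may differ in detail. The names `l2_norm`, `lars_local_lr`, `trust_coefficient`, and `eps` are assumptions introduced for this example only.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def lars_local_lr(weights, grads, base_lr, trust_coefficient=0.001,
                  weight_decay=0.0, eps=1e-9):
    """Return a per-layer learning rate scaled by a trust ratio.

    Illustrative only: trust_coefficient and eps are hypothetical
    parameters, not arguments of bitsandbytes.optim.LARS.
    """
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    # Fall back to the base learning rate when either norm is zero,
    # since the ratio is undefined or uninformative in that case.
    if w_norm == 0.0 or g_norm == 0.0:
        return base_lr
    trust_ratio = trust_coefficient * w_norm / (g_norm + weight_decay * w_norm + eps)
    return base_lr * trust_ratio


# Two layers with the same weights but different gradient magnitudes
# end up with different effective learning rates.
small_grad_lr = lars_local_lr([3.0, 4.0], [0.6, 0.8], base_lr=0.1)
large_grad_lr = lars_local_lr([3.0, 4.0], [6.0, 8.0], base_lr=0.1)
```

A layer whose gradient norm is large relative to its weight norm receives a smaller step, which is the stabilizing effect the trust ratio is meant to provide.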
LARS
class bitsandbytes.optim.LARS
__init__
- params (torch.tensor) -- The input parameters to optimize.
- lr (float) -- The learning rate.
- momentum (float, defaults to 0) -- The momentum value speeds up the optimizer by taking bigger steps.
- dampening (float, defaults to 0) -- The dampening value reduces the momentum of the optimizer.
- weight_decay (float, defaults to 1e-2) -- The weight decay value for the optimizer.
- nesterov (bool, defaults to False) -- Whether to use Nesterov momentum.
- optim_bits (int, defaults to 32) -- The number of bits of the optimizer state.
- args (object, defaults to None) -- An object with additional arguments.
- min_8bit_size (int, defaults to 4096) -- The minimum number of elements of the parameter tensors for 8-bit optimization.
- percentile_clipping (int, defaults to 100) -- Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
- max_unorm (float, defaults to 0.02) -- The maximum gradient norm.
Base LARS optimizer.
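The `percentile_clipping` description above can be sketched in plain Python. This is a simplified illustration of the idea (a rolling history of gradient norms and a percentile threshold), not the library's GPU implementation; the class name `PercentileClipper` and its `scale_for` method are hypothetical.

```python
import math
from collections import deque


class PercentileClipper:
    """Track recent gradient norms and scale down outliers.

    Hypothetical helper for illustration; not part of bitsandbytes.
    """

    def __init__(self, percentile=100, history=100):
        if not 1 <= percentile <= 100:
            raise ValueError("percentile must be between 1 and 100")
        self.percentile = percentile
        # Only the most recent `history` norms are kept.
        self.norms = deque(maxlen=history)

    def scale_for(self, grad_norm):
        """Record grad_norm and return the factor to multiply the gradient by."""
        self.norms.append(grad_norm)
        ordered = sorted(self.norms)
        # Position of the requested percentile within the recorded history.
        index = max(0, math.ceil(len(ordered) * self.percentile / 100) - 1)
        threshold = ordered[index]
        if grad_norm > threshold:
            return threshold / grad_norm
        return 1.0


clipper = PercentileClipper(percentile=50)
for norm in [1.0, 1.0, 1.0, 1.0]:
    clipper.scale_for(norm)
# A sudden spike is scaled back toward the median of recent norms.
spike_scale = clipper.scale_for(10.0)
```

In this sketch, a percentile of 100 makes the threshold the largest norm in the history, so no gradient is ever scaled down, which matches the documented default behaving as "no clipping".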
LARS8bit
class bitsandbytes.optim.LARS8bit
__init__
- params (torch.tensor) -- The input parameters to optimize.
- lr (float) -- The learning rate.
- momentum (float, defaults to 0) -- The momentum value speeds up the optimizer by taking bigger steps.
- dampening (float, defaults to 0) -- The dampening value reduces the momentum of the optimizer.
- weight_decay (float, defaults to 1e-2) -- The weight decay value for the optimizer.
- nesterov (bool, defaults to False) -- Whether to use Nesterov momentum.
- args (object, defaults to None) -- An object with additional arguments.
- min_8bit_size (int, defaults to 4096) -- The minimum number of elements of the parameter tensors for 8-bit optimization.
- percentile_clipping (int, defaults to 100) -- Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
- max_unorm (float, defaults to 0.02) -- The maximum gradient norm.
8-bit LARS optimizer.
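To make the role of `min_8bit_size` concrete, here is a minimal sketch of the decision it describes, assuming that tensors with fewer elements than the threshold keep 32-bit optimizer state. The function `state_bits` is a hypothetical helper, not a bitsandbytes API.

```python
def state_bits(numel, optim_bits=8, min_8bit_size=4096):
    """Return the assumed optimizer-state precision for a tensor.

    Illustrative assumption: small tensors fall back to 32-bit state
    because quantizing them saves little memory.
    """
    if optim_bits == 8 and numel < min_8bit_size:
        return 32
    return optim_bits


# A 64x16 bias-like tensor (1024 elements) versus a 1024x1024 weight matrix.
small_tensor_bits = state_bits(64 * 16)
large_tensor_bits = state_bits(1024 * 1024)
```

Under this assumption, raising `min_8bit_size` keeps more of the smaller tensors in full precision at the cost of some memory.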
LARS32bit
class bitsandbytes.optim.LARS32bit
__init__
- params (torch.tensor) -- The input parameters to optimize.
- lr (float) -- The learning rate.
- momentum (float, defaults to 0) -- The momentum value speeds up the optimizer by taking bigger steps.
- dampening (float, defaults to 0) -- The dampening value reduces the momentum of the optimizer.
- weight_decay (float, defaults to 1e-2) -- The weight decay value for the optimizer.
- nesterov (bool, defaults to False) -- Whether to use Nesterov momentum.
- args (object, defaults to None) -- An object with additional arguments.
- min_8bit_size (int, defaults to 4096) -- The minimum number of elements of the parameter tensors for 8-bit optimization.
- percentile_clipping (int, defaults to 100) -- Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
- max_unorm (float, defaults to 0.02) -- The maximum gradient norm.
32-bit LARS optimizer.