# AdEMAMix

[AdEMAMix](https://hf.co/papers/2409.03137) is a variant of the `Adam` optimizer that mixes a fast and a slow exponential moving average (EMA) of past gradients.
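To make the idea concrete, here is a minimal scalar sketch of one AdEMAMix step in plain Python. It follows the update rule as described in the paper and is not the bitsandbytes implementation. It assumes that `betas` is ordered as (fast EMA, second moment, slow EMA), that only the fast EMA and the second moment are bias-corrected, and that `alpha` and the slow decay rate are held constant (no `t_alpha` or `t_beta3` warmup). The function name `ademamix_step` and the `state` dictionary are hypothetical.

```python
import math


def ademamix_step(theta, grad, state, lr=1e-3, betas=(0.9, 0.999, 0.9999),
                  alpha=5.0, eps=1e-8, weight_decay=0.01):
    """One AdEMAMix update for a single scalar parameter (illustrative only)."""
    beta1, beta2, beta3 = betas  # assumed order: fast EMA, second moment, slow EMA
    state["step"] += 1
    t = state["step"]

    # Fast and slow EMAs of the gradient, plus the EMA of the squared gradient.
    state["m1"] = beta1 * state["m1"] + (1 - beta1) * grad
    state["m2"] = beta3 * state["m2"] + (1 - beta3) * grad
    state["nu"] = beta2 * state["nu"] + (1 - beta2) * grad * grad

    # Bias correction is applied to the fast EMA and the second moment only.
    m1_hat = state["m1"] / (1 - beta1 ** t)
    nu_hat = state["nu"] / (1 - beta2 ** t)

    # The slow EMA is weighted by alpha and added to the fast EMA.
    update = (m1_hat + alpha * state["m2"]) / (math.sqrt(nu_hat) + eps)

    # Decoupled weight decay, as in AdamW.
    return theta - lr * (update + weight_decay * theta)


# Minimize f(theta) = theta**2 for a few steps.
state = {"step": 0, "m1": 0.0, "m2": 0.0, "nu": 0.0}
theta = 1.0
for _ in range(100):
    grad = 2 * theta  # derivative of theta**2
    theta = ademamix_step(theta, grad, state)
```

With `alpha=0` the slow EMA drops out and the step reduces to an AdamW-style update, which is one way to see AdEMAMix as an extension of `Adam`.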

bitsandbytes also supports paged optimizers, which take advantage of CUDA's unified memory to transfer memory from the GPU to the CPU when GPU memory is exhausted.
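The classes documented on this page differ only in state precision (8-bit or 32-bit) and in whether paging is enabled. The sketch below shows one way to select among them by name. The helpers `ademamix_class_name` and `make_ademamix` are hypothetical and not part of bitsandbytes; only the class names and constructor arguments come from the signatures listed on this page. bitsandbytes is imported lazily, so the name helper runs even where the library is not installed.

```python
import importlib


def ademamix_class_name(bits: int = 32, paged: bool = False) -> str:
    """Return the name of the explicit-precision AdEMAMix class (hypothetical helper)."""
    if bits not in (8, 32):
        raise ValueError("bits must be 8 or 32")
    prefix = "Paged" if paged else ""
    return f"{prefix}AdEMAMix{bits}bit"


def make_ademamix(params, bits: int = 32, paged: bool = False, **kwargs):
    """Instantiate the matching optimizer from bitsandbytes.optim (hypothetical helper)."""
    # Imported here so that this module loads without bitsandbytes installed.
    optim = importlib.import_module("bitsandbytes.optim")
    cls = getattr(optim, ademamix_class_name(bits, paged))
    return cls(params, **kwargs)


# Example call, assuming `model` is a torch.nn.Module on a supported device:
# optimizer = make_ademamix(
#     model.parameters(),
#     bits=8,
#     paged=True,
#     lr=1e-3,
#     betas=(0.9, 0.999, 0.9999),
#     alpha=5.0,
#     weight_decay=0.01,
# )
```

According to the signatures below, the non-paged classes also accept an `is_paged` argument, and `AdEMAMix` and `PagedAdEMAMix` accept `optim_bits`, so the same choices can be made through keyword arguments instead of class names.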

## AdEMAMix[[api-class]][[bitsandbytes.optim.AdEMAMix]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class bitsandbytes.optim.AdEMAMix</name><anchor>bitsandbytes.optim.AdEMAMix</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/optim/ademamix.py#L107</source><parameters>[{"name": "params", "val": ": Iterable"}, {"name": "lr", "val": ": float = 0.001"}, {"name": "betas", "val": ": tuple = (0.9, 0.999, 0.9999)"}, {"name": "alpha", "val": ": float = 5.0"}, {"name": "t_alpha", "val": ": typing.Optional[int] = None"}, {"name": "t_beta3", "val": ": typing.Optional[int] = None"}, {"name": "eps", "val": ": float = 1e-08"}, {"name": "weight_decay", "val": ": float = 0.01"}, {"name": "optim_bits", "val": ": typing.Literal[8, 32] = 32"}, {"name": "min_8bit_size", "val": ": int = 4096"}, {"name": "is_paged", "val": ": bool = False"}]</parameters></docstring>



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>__init__</name><anchor>bitsandbytes.optim.AdEMAMix.__init__</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/optim/ademamix.py#L108</source><parameters>[{"name": "params", "val": ": Iterable"}, {"name": "lr", "val": ": float = 0.001"}, {"name": "betas", "val": ": tuple = (0.9, 0.999, 0.9999)"}, {"name": "alpha", "val": ": float = 5.0"}, {"name": "t_alpha", "val": ": typing.Optional[int] = None"}, {"name": "t_beta3", "val": ": typing.Optional[int] = None"}, {"name": "eps", "val": ": float = 1e-08"}, {"name": "weight_decay", "val": ": float = 0.01"}, {"name": "optim_bits", "val": ": typing.Literal[8, 32] = 32"}, {"name": "min_8bit_size", "val": ": int = 4096"}, {"name": "is_paged", "val": ": bool = False"}]</parameters></docstring>


</div></div>

## AdEMAMix8bit[[bitsandbytes.optim.AdEMAMix8bit]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class bitsandbytes.optim.AdEMAMix8bit</name><anchor>bitsandbytes.optim.AdEMAMix8bit</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/optim/ademamix.py#L274</source><parameters>[{"name": "params", "val": ": Iterable"}, {"name": "lr", "val": ": float = 0.001"}, {"name": "betas", "val": ": tuple = (0.9, 0.999, 0.9999)"}, {"name": "alpha", "val": ": float = 5.0"}, {"name": "t_alpha", "val": ": typing.Optional[int] = None"}, {"name": "t_beta3", "val": ": typing.Optional[int] = None"}, {"name": "eps", "val": ": float = 1e-08"}, {"name": "weight_decay", "val": ": float = 0.01"}, {"name": "min_8bit_size", "val": ": int = 4096"}, {"name": "is_paged", "val": ": bool = False"}]</parameters></docstring>



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>__init__</name><anchor>bitsandbytes.optim.AdEMAMix8bit.__init__</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/optim/ademamix.py#L275</source><parameters>[{"name": "params", "val": ": Iterable"}, {"name": "lr", "val": ": float = 0.001"}, {"name": "betas", "val": ": tuple = (0.9, 0.999, 0.9999)"}, {"name": "alpha", "val": ": float = 5.0"}, {"name": "t_alpha", "val": ": typing.Optional[int] = None"}, {"name": "t_beta3", "val": ": typing.Optional[int] = None"}, {"name": "eps", "val": ": float = 1e-08"}, {"name": "weight_decay", "val": ": float = 0.01"}, {"name": "min_8bit_size", "val": ": int = 4096"}, {"name": "is_paged", "val": ": bool = False"}]</parameters></docstring>


</div></div>

## AdEMAMix32bit[[bitsandbytes.optim.AdEMAMix32bit]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class bitsandbytes.optim.AdEMAMix32bit</name><anchor>bitsandbytes.optim.AdEMAMix32bit</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/optim/ademamix.py#L359</source><parameters>[{"name": "params", "val": ": Iterable"}, {"name": "lr", "val": ": float = 0.001"}, {"name": "betas", "val": ": tuple = (0.9, 0.999, 0.9999)"}, {"name": "alpha", "val": ": float = 5.0"}, {"name": "t_alpha", "val": ": typing.Optional[int] = None"}, {"name": "t_beta3", "val": ": typing.Optional[int] = None"}, {"name": "eps", "val": ": float = 1e-08"}, {"name": "weight_decay", "val": ": float = 0.01"}, {"name": "min_8bit_size", "val": ": int = 4096"}, {"name": "is_paged", "val": ": bool = False"}]</parameters></docstring>



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>__init__</name><anchor>bitsandbytes.optim.AdEMAMix32bit.__init__</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/optim/ademamix.py#L360</source><parameters>[{"name": "params", "val": ": Iterable"}, {"name": "lr", "val": ": float = 0.001"}, {"name": "betas", "val": ": tuple = (0.9, 0.999, 0.9999)"}, {"name": "alpha", "val": ": float = 5.0"}, {"name": "t_alpha", "val": ": typing.Optional[int] = None"}, {"name": "t_beta3", "val": ": typing.Optional[int] = None"}, {"name": "eps", "val": ": float = 1e-08"}, {"name": "weight_decay", "val": ": float = 0.01"}, {"name": "min_8bit_size", "val": ": int = 4096"}, {"name": "is_paged", "val": ": bool = False"}]</parameters></docstring>


</div></div>

## PagedAdEMAMix[[bitsandbytes.optim.PagedAdEMAMix]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class bitsandbytes.optim.PagedAdEMAMix</name><anchor>bitsandbytes.optim.PagedAdEMAMix</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/optim/ademamix.py#L330</source><parameters>[{"name": "params", "val": ": Iterable"}, {"name": "lr", "val": ": float = 0.001"}, {"name": "betas", "val": ": tuple = (0.9, 0.999, 0.9999)"}, {"name": "alpha", "val": ": float = 5.0"}, {"name": "t_alpha", "val": ": typing.Optional[int] = None"}, {"name": "t_beta3", "val": ": typing.Optional[int] = None"}, {"name": "eps", "val": ": float = 1e-08"}, {"name": "weight_decay", "val": ": float = 0.01"}, {"name": "optim_bits", "val": ": typing.Literal[8, 32] = 32"}, {"name": "min_8bit_size", "val": ": int = 4096"}]</parameters></docstring>



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>__init__</name><anchor>bitsandbytes.optim.PagedAdEMAMix.__init__</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/optim/ademamix.py#L331</source><parameters>[{"name": "params", "val": ": Iterable"}, {"name": "lr", "val": ": float = 0.001"}, {"name": "betas", "val": ": tuple = (0.9, 0.999, 0.9999)"}, {"name": "alpha", "val": ": float = 5.0"}, {"name": "t_alpha", "val": ": typing.Optional[int] = None"}, {"name": "t_beta3", "val": ": typing.Optional[int] = None"}, {"name": "eps", "val": ": float = 1e-08"}, {"name": "weight_decay", "val": ": float = 0.01"}, {"name": "optim_bits", "val": ": typing.Literal[8, 32] = 32"}, {"name": "min_8bit_size", "val": ": int = 4096"}]</parameters></docstring>


</div></div>

## PagedAdEMAMix8bit[[bitsandbytes.optim.PagedAdEMAMix8bit]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class bitsandbytes.optim.PagedAdEMAMix8bit</name><anchor>bitsandbytes.optim.PagedAdEMAMix8bit</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/optim/ademamix.py#L303</source><parameters>[{"name": "params", "val": ": Iterable"}, {"name": "lr", "val": ": float = 0.001"}, {"name": "betas", "val": ": tuple = (0.9, 0.999, 0.9999)"}, {"name": "alpha", "val": ": float = 5.0"}, {"name": "t_alpha", "val": ": typing.Optional[int] = None"}, {"name": "t_beta3", "val": ": typing.Optional[int] = None"}, {"name": "eps", "val": ": float = 1e-08"}, {"name": "weight_decay", "val": ": float = 0.01"}, {"name": "min_8bit_size", "val": ": int = 4096"}]</parameters></docstring>



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>__init__</name><anchor>bitsandbytes.optim.PagedAdEMAMix8bit.__init__</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/optim/ademamix.py#L304</source><parameters>[{"name": "params", "val": ": Iterable"}, {"name": "lr", "val": ": float = 0.001"}, {"name": "betas", "val": ": tuple = (0.9, 0.999, 0.9999)"}, {"name": "alpha", "val": ": float = 5.0"}, {"name": "t_alpha", "val": ": typing.Optional[int] = None"}, {"name": "t_beta3", "val": ": typing.Optional[int] = None"}, {"name": "eps", "val": ": float = 1e-08"}, {"name": "weight_decay", "val": ": float = 0.01"}, {"name": "min_8bit_size", "val": ": int = 4096"}]</parameters></docstring>


</div></div>

## PagedAdEMAMix32bit[[bitsandbytes.optim.PagedAdEMAMix32bit]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class bitsandbytes.optim.PagedAdEMAMix32bit</name><anchor>bitsandbytes.optim.PagedAdEMAMix32bit</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/optim/ademamix.py#L392</source><parameters>[{"name": "params", "val": ": Iterable"}, {"name": "lr", "val": ": float = 0.001"}, {"name": "betas", "val": ": tuple = (0.9, 0.999, 0.9999)"}, {"name": "alpha", "val": ": float = 5.0"}, {"name": "t_alpha", "val": ": typing.Optional[int] = None"}, {"name": "t_beta3", "val": ": typing.Optional[int] = None"}, {"name": "eps", "val": ": float = 1e-08"}, {"name": "weight_decay", "val": ": float = 0.01"}, {"name": "min_8bit_size", "val": ": int = 4096"}]</parameters></docstring>



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>__init__</name><anchor>bitsandbytes.optim.PagedAdEMAMix32bit.__init__</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/optim/ademamix.py#L393</source><parameters>[{"name": "params", "val": ": Iterable"}, {"name": "lr", "val": ": float = 0.001"}, {"name": "betas", "val": ": tuple = (0.9, 0.999, 0.9999)"}, {"name": "alpha", "val": ": float = 5.0"}, {"name": "t_alpha", "val": ": typing.Optional[int] = None"}, {"name": "t_beta3", "val": ": typing.Optional[int] = None"}, {"name": "eps", "val": ": float = 1e-08"}, {"name": "weight_decay", "val": ": float = 0.01"}, {"name": "min_8bit_size", "val": ": int = 4096"}]</parameters></docstring>


</div></div>
