---
license: apache-2.0
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- learned-optimizer
---

# Description
`VeLO` is a learned optimizer meta-trained on thousands of diverse machine learning tasks. It corresponds to the VeLO (Versatile Learned Optimizer) from [VeLO: Training Versatile Learned Optimizers by Scaling Up](https://arxiv.org/abs/2211.09760).

### Learned optimizer meta-training and architectural details
| **Field** | **Value** |
|-----------|-----------|
| **Meta-training distribution** | Thousands of ML tasks including MLPs, CNNs, ResNets, VAEs, classification, regression |
| **Number of meta-training TPU-months** | ~4000 |
| **Target inner problem length** | 150,000 steps (max) |
| **Gradient estimator** | Evolution Strategies |
| **Architecture** | LSTM-based hypernetwork |

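An "LSTM-based hypernetwork" means one network produces the weights of another: roughly, in VeLO a per-tensor LSTM emits the weights of a small MLP, and that MLP is then applied to every parameter of the tensor to compute its update. The pure-Python sketch below illustrates only this data flow, under loudly simplifying assumptions: the per-tensor LSTM is replaced by a stateless summary of the gradient, the per-parameter features are just the gradient and a momentum term, the hypernetwork is a random (not meta-trained) linear map, and every name (`make_hypernet`, `learned_update`, and so on) is hypothetical rather than part of PyLO.

```python
import math
import random

FEAT = 2    # per-parameter features: (gradient, momentum), an illustrative choice
HIDDEN = 4  # hidden width of the per-parameter MLP
CTX = 3     # size of the per-tensor context vector

# Number of weights the hypernetwork must emit for a FEAT -> HIDDEN -> 1 MLP.
N_MLP_WEIGHTS = FEAT * HIDDEN + HIDDEN + HIDDEN + 1


def make_hypernet(seed=0):
    """Random (untrained) linear map from a CTX-sized context to N_MLP_WEIGHTS outputs."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(CTX)] for _ in range(N_MLP_WEIGHTS)]


def tensor_context(grads, progress):
    """Stateless per-tensor summary; stands in for the per-tensor LSTM state."""
    n = len(grads)
    mean = sum(grads) / n
    rms = math.sqrt(sum(g * g for g in grads) / n)
    return [mean, rms, progress]


def emit_mlp_weights(hypernet, ctx):
    """Hypernetwork step: map the context to the weights of the per-parameter MLP."""
    flat = [sum(w * c for w, c in zip(row, ctx)) for row in hypernet]
    w1 = [flat[i * FEAT:(i + 1) * FEAT] for i in range(HIDDEN)]
    off = FEAT * HIDDEN
    b1 = flat[off:off + HIDDEN]
    w2 = flat[off + HIDDEN:off + 2 * HIDDEN]
    b2 = flat[off + 2 * HIDDEN]
    return w1, b1, w2, b2


def per_param_mlp(weights, feats):
    """Apply the emitted MLP to one parameter's features; returns a scalar step."""
    w1, b1, w2, b2 = weights
    hidden = [math.tanh(sum(w * f for w, f in zip(row, feats)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def learned_update(hypernet, params, grads, momentum, progress, beta=0.9, scale=1e-3):
    """Update one tensor (flattened to a list); returns (new_params, new_momentum)."""
    momentum = [beta * m + (1 - beta) * g for m, g in zip(momentum, grads)]
    # One set of MLP weights per tensor, shared by all of that tensor's parameters.
    weights = emit_mlp_weights(hypernet, tensor_context(grads, progress))
    new_params = [p - scale * per_param_mlp(weights, (g, m))
                  for p, g, m in zip(params, grads, momentum)]
    return new_params, momentum
```

For example, `learned_update(make_hypernet(), [0.5, -1.0], [0.1, -0.2], [0.0, 0.0], progress=0.0)` returns updated parameters and momentum for a two-element tensor. Because the hypernetwork here is random, those updates are arbitrary; the weights that produce useful updates are what meta-training learns.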
# Usage
---
## (1) Install PyLO
The following commands install PyLO from source:
```bash
git clone https://github.com/Belilovsky-Lab/pylo
cd pylo
pip install .
python setup.py install --cuda
```

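A quick way to confirm the install is to check whether Python can locate the `pylo` package. This minimal sketch uses only the standard library; the helper name `pylo_installed` is hypothetical, and the check only confirms that a `pylo` package is on the import path, not that the CUDA build succeeded.

```python
import importlib.util


def pylo_installed() -> bool:
    """Return True if a `pylo` package can be found on the import path."""
    return importlib.util.find_spec("pylo") is not None


if __name__ == "__main__":
    print("PyLO found" if pylo_installed() else "PyLO not found; re-run the install steps")
```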
## (2) Use VeLO as a drop-in replacement for PyTorch optimizers

```python
from pylo.optim import VeLO

# `model` is any torch.nn.Module; num_steps is the total number of training steps
optimizer = VeLO(model.parameters(), lr=1.0, num_steps=150_000)
```

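Since `num_steps` should be the number of `optimizer.step()` calls the run will actually make, it is safer to compute it than to hard-code a value such as `150_000`. A minimal sketch, assuming one optimizer step per batch (no gradient accumulation); `total_optimizer_steps` is a hypothetical helper, not part of PyLO:

```python
import math


def total_optimizer_steps(num_examples, batch_size, num_epochs, drop_last=False):
    """Count the optimizer.step() calls in a run, i.e. the value to pass as num_steps."""
    if drop_last:
        # The final partial batch of each epoch is discarded.
        batches_per_epoch = num_examples // batch_size
    else:
        # The final partial batch still triggers a step.
        batches_per_epoch = math.ceil(num_examples / batch_size)
    return num_epochs * batches_per_epoch
```

For MNIST's 60,000 training images at batch size 64, `total_optimizer_steps(60_000, 64, 1)` gives 938, which matches `len(train_loader)` for a PyTorch `DataLoader` with the default `drop_last=False`.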
## (3) A simple example

The following example is for illustration purposes only and does not implement the correct parameterization. For a correct implementation, see the [PyLO examples](https://github.com/Belilovsky-Lab/pylo/tree/main/examples).

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

from pylo.optim import VeLO

# Device (defined before the model is moved onto it)
device = torch.device("cuda")

# Data
transform = transforms.ToTensor()
train_loader = DataLoader(
    datasets.MNIST(root="./data", train=True, download=True, transform=transform),
    batch_size=64,
    shuffle=True,
)

# Model
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 10),
        )

    def forward(self, x):
        return self.net(x)

model = MLP().to(device)

#########################
# Setup Learned Optimizer
#########################
num_epochs = 1  # Just 1 epoch for simplicity
# VeLO needs the total number of training steps this run will take
num_steps = num_epochs * len(train_loader)
optimizer = VeLO(model.parameters(), lr=1.0, num_steps=num_steps)

criterion = nn.CrossEntropyLoss()

# Training loop
for epoch in range(num_epochs):
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step(loss)  # VeLO's step() receives the current loss

print("Done!")
```

**Note**: VeLO requires the total number of training steps (`num_steps`) as input to initialize its internal states and compute training-progress features.

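To make "training-progress features" concrete, here is a hedged sketch of what such features can look like. It assumes two kinds of features: the fraction of training completed (`step / num_steps`) and tanh embeddings of the raw step count at several timescales. The function name `progress_features` and this exact feature set are hypothetical illustrations, not the PyLO or VeLO implementation.

```python
import math


def progress_features(step, num_steps, timescales=(1, 10, 100, 1000, 10000)):
    """Illustrative training-progress features (hypothetical, not VeLO's exact inputs).

    Returns [fraction of training completed] followed by tanh embeddings of the
    raw step count at several timescales.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    # Fraction completed: only meaningful if num_steps matches the real run length.
    fraction = min(step / num_steps, 1.0)
    # Each embedding starts near -0.76 at step 0, crosses 0 at step == t, and tends to 1.
    step_embedding = [math.tanh(step / t - 1.0) for t in timescales]
    return [fraction] + step_embedding
```

Under this sketch, passing `num_steps=150_000` to a run that actually lasts 938 steps (one MNIST epoch at batch size 64) means the fraction-completed feature never exceeds about 0.006, so an optimizer relying on it would see training as barely begun.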
# Official Resources

- **Paper**: [VeLO: Training Versatile Learned Optimizers by Scaling Up](https://arxiv.org/abs/2211.09760)
- **Adapted from Repository**: [Google Learned Optimization](https://github.com/google/learned_optimization/tree/main/learned_optimization/research/general_lopt)
- **Available in PyLO**: [PyLO](https://github.com/Belilovsky-Lab/pylo)

# Important Notes

- **Step Requirement**: VeLO requires the total number of training steps (`num_steps`) for proper initialization
- **Computational Cost**: Meta-trained using ~4000 TPU-months, a significant computational investment
- **Generalization**: Designed to generalize across diverse optimization tasks rather than being tuned for a single one
- **No Hyperparameter Tuning**: Designed to adapt to each problem automatically, without manual tuning

# Cite
If you find this optimizer useful in your research, please consider citing the original work:
```bibtex
@article{metz2022velo,
  title={{VeLO}: Training Versatile Learned Optimizers by Scaling Up},
  author={Luke Metz and James Harrison and C. Daniel Freeman and Amil Merchant and Lucas Beyer and James Bradbury and Naman Agrawal and Ben Poole and Igor Mordatch and Adam Roberts and Jascha Sohl-Dickstein},
  journal={arXiv preprint arXiv:2211.09760},
  year={2022},
  url={https://arxiv.org/abs/2211.09760}
}
```