amoudgl committed on
Commit 95e590f · verified · 1 Parent(s): ede5a21

Update README

Files changed (1)
  1. README.md +91 -3
README.md CHANGED
@@ -1,8 +1,96 @@
  ---
  license: mit
+ library_name: optax
+ tags:
+ - optimizer
+ - learned-optimizer
+ - meta-learning
+ - jax
  ---

- Official weights for Celo2-base learned update rule proposed in paper:
- [Celo2: Towards Learned Optimization Free Lunch](https://huggingface.co/papers/2602.19142)
+ # Celo2-base: Towards Learned Optimization Free Lunch

- Code repo: https://github.com/amoudgl/celo2
+ <p>
+ <a href="https://arxiv.org/abs/2602.19142"><img alt="Paper" src="https://img.shields.io/badge/arXiv-2602.19142-b31b1b.svg"></a>
+ <a href="https://github.com/amoudgl/celo2"><img alt="Code" src="https://img.shields.io/badge/GitHub-black?logo=github&logoColor=white&labelColor=grey"></a>
+ <a href="https://opensource.org/licenses/MIT"><img alt="License: MIT" src="https://img.shields.io/badge/License-MIT-yellow.svg"></a>
+ </p>
+
+
+ Official pretrained weights for the **Celo2-base** learned update rule. This variant applies the learned update rule to all parameters without any optimization harness. For better performance, see [celo2](https://huggingface.co/amoudgl/celo2), which uses Newton-Schulz orthogonalization and AdamW for biases/embeddings.
+
+
+ ## Quickstart
+
+ Install the package and download the checkpoint:
+ ```bash
+ pip install git+https://github.com/amoudgl/celo2.git
+ hf download amoudgl/celo2-base --local-dir ./celo2-base
+ ```
+
+ Use the `load_checkpoint` function to load the pretrained params from the checkpoint path:
+ ```python
+ from celo2_optax import load_checkpoint
+ pretrained_params = load_checkpoint('./celo2-base/theta.state')
+ ```
+
+ Standard optax usage with `scale_by_celo2`, which takes the pretrained params as input:
+ ```python
+ import optax
+ from celo2_optax import scale_by_celo2
+
+ optimizer = optax.chain(
+     scale_by_celo2(pretrained_params, orthogonalize=False),
+     optax.add_decayed_weights(weight_decay),
+     optax.scale_by_learning_rate(lr_schedule),
+ )
+ ```
+
+ ## Loading and inspecting MLP update rule weights
+
+ ```python
+ from celo2_optax import load_checkpoint
+ import jax
+
+ pretrained_params = load_checkpoint('./celo2-base/theta.state')  # dictionary containing weights
+ print(jax.tree.map(lambda x: x.shape, pretrained_params))
+ ```
+
+ The checkpoint contains a small MLP stored under the `ff_mod_stack` key, with weight matrices (`w0__*`, `w1`, `w2`) and biases (`b0`, `b1`, `b2`). Each `w0__*` key holds the first-layer weights for one input feature, such as momentum, gradient, or parameter value.
+
+ ## Meta-training config
+
+ | Key | Value |
+ | ----------------------- | ------------------------------------------------------------ |
+ | **Optimizer architecture** | MLP, 2 hidden layers, 8 units each |
+ | **Meta-training tasks** | 4 image classification tasks (MNIST, FMNIST, CIFAR-10, SVHN) |
+ | **Task architecture** | MLP (64-32-10) |
+ | **Meta-trainer** | Persistent Evolution Strategies (PES) |
+ | **Outer iterations** | 100K |
+ | **Truncation length** | 50 steps |
+ | **Min unroll length** | 100 steps |
+ | **Max unroll length** | 2000 steps |
+
+ For more details, see the [config.json](./config.json) included in this repo.
+
+ ## Files
+
+ | File | Description |
+ | ------------- | -------------------------------- |
+ | `theta.state` | Pretrained MLP optimizer weights |
+ | `config.json` | Meta-training configuration |
+
+
+ ## Citation
+
+ ```bibtex
+ @misc{moudgil2026celo2,
+   title={Celo2: Towards Learned Optimization Free Lunch},
+   author={Abhinav Moudgil and Boris Knyazev and Eugene Belilovsky},
+   year={2026},
+   eprint={2602.19142},
+   archivePrefix={arXiv},
+   primaryClass={cs.LG},
+   url={https://arxiv.org/abs/2602.19142},
+ }
+ ```
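The updated README says the full `celo2` variant uses Newton-Schulz orthogonalization, while Celo2-base runs with `orthogonalize=False`. The sketch below illustrates what an orthogonalization step does. It implements the classic cubic Newton-Schulz iteration in NumPy, which drives a matrix toward its orthogonal polar factor `U @ V.T`. It is a generic textbook iteration, not the celo2 implementation. The function name, the step count, and the cubic coefficients are assumptions; Muon-style optimizers typically use a tuned quintic polynomial with only about 5 steps.

```python
import numpy as np

def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 50) -> np.ndarray:
    """Approximate the orthogonal polar factor U @ V.T of g (where g = U S V.T).

    Hypothetical helper: a generic cubic Newton-Schulz iteration, not celo2's code.
    """
    transpose = g.shape[0] > g.shape[1]
    x = g.T if transpose else g  # use the wide orientation so x @ x.T stays small
    x = x / (np.linalg.norm(x) + 1e-12)  # Frobenius scaling puts every singular value in (0, 1]
    for _ in range(steps):
        # Each singular value s maps to 1.5*s - 0.5*s**3, which converges to 1
        # for 0 < s < sqrt(3); singular vectors are left unchanged.
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x.T if transpose else x

rng = np.random.default_rng(0)
g = rng.normal(size=(4, 6))  # stand-in for a 4x6 weight-update matrix
q = newton_schulz_orthogonalize(g)
print(np.round(q @ q.T, 6))  # rows of q are (numerically) orthonormal
```

Orthogonalizing an update in this way keeps its singular vectors but flattens all singular values to 1. That is the usual motivation for applying it to matrix-shaped parameters.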
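The README describes the update rule as a small MLP (2 hidden layers of 8 units). Its first layer is split into per-feature `w0__*` matrices. The NumPy sketch below shows why that split is convenient: summing per-feature projections is equivalent to a single matmul on the concatenated features. The following are all hypothetical and are not read from `theta.state`:

- the feature names and widths;
- the ReLU activation;
- the single scalar output per parameter element.

Check the real shapes with the README's `jax.tree.map` snippet before relying on any of them.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 8  # "MLP, 2 hidden layers, 8 units each" from the config table

# Hypothetical per-feature input widths; the real w0__* keys and shapes may differ.
feature_dims = {"grad": 1, "momentum": 3, "param": 1}

params = {f"w0__{name}": rng.normal(size=(d, hidden)) for name, d in feature_dims.items()}
params.update(
    b0=np.zeros(hidden),
    w1=rng.normal(size=(hidden, hidden)),
    b1=np.zeros(hidden),
    w2=rng.normal(size=(hidden, 1)),  # assumed: one output per parameter element
    b2=np.zeros(1),
)

def first_layer(features, params):
    # Sum of per-feature projections: equivalent to concat(features) @ vstack(w0__*).
    return sum(features[n] @ params[f"w0__{n}"] for n in features) + params["b0"]

def update_rule(features, params):
    relu = lambda z: np.maximum(z, 0.0)  # activation choice is an assumption
    h = relu(first_layer(features, params))
    h = relu(h @ params["w1"] + params["b1"])
    return (h @ params["w2"] + params["b2"])[..., 0]

n = 5  # parameter elements, each processed independently by the same small MLP
features = {name: rng.normal(size=(n, d)) for name, d in feature_dims.items()}
step = update_rule(features, params)
print({k: v.shape for k, v in params.items()})  # same idea as the README's jax.tree.map call
```

Storing the first layer per feature keeps each input's contribution separately inspectable. The block-matrix identity guarantees the same forward pass as one stacked weight matrix.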
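The config table lists PES as the meta-trainer, with a truncation length of 50 and unroll lengths between 100 and 2000. PES splits each inner training run into short truncations and produces a meta-gradient estimate after each one, rather than waiting for the full unroll. The helper below is plain arithmetic on those config values. The README does not say how Celo2 samples unroll lengths within that range, so the sketch does not model it.

```python
import math

TRUNCATION_LENGTH = 50  # "Truncation length" from the config table
MIN_UNROLL, MAX_UNROLL = 100, 2000  # "Min/Max unroll length" from the config table

def num_truncations(unroll_length: int, truncation_length: int = TRUNCATION_LENGTH) -> int:
    """Number of truncated segments needed to cover one inner run of unroll_length steps."""
    return math.ceil(unroll_length / truncation_length)

shortest = num_truncations(MIN_UNROLL)
longest = num_truncations(MAX_UNROLL)
print(f"{shortest} to {longest} truncations per inner run")  # prints "2 to 40 truncations per inner run"
```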