---
license: other
base_model: facebook/MobileLLM-R1-140M-base
tags:
- hypercloning
- model-growth
- initialization
- mobilellm
---

# MobileLLM-R1-140M-base: 2× HyperCloned

This checkpoint is initialized from [facebook/MobileLLM-R1-140M-base](https://huggingface.co/facebook/MobileLLM-R1-140M-base) using **HyperCloning** ([Samragh et al., 2024](https://arxiv.org/abs/2409.12903)).

## Architecture

| Parameter | Source | HyperCloned |
|---|---|---|
| hidden_size | 576 | 1152 |
| num_attention_heads | 9 | 18 |
| num_key_value_heads | 3 | 6 |
| head_dim | 64 | 64 |
| intermediate_size | 8192 | 16384 |
| num_layers | 15 | 15 |
| parameters | 140,248,512 | 454,790,016 |
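
The relationship between the two columns can be sketched in a few lines of Python. This is only an illustration of the table above, not the conversion script used for this checkpoint: the helper `hyperclone_config` and the plain-dict config are hypothetical names introduced here, and the field values are copied from the table.

```python
# Hypothetical helper: scale the width-related fields by n and leave
# depth (num_layers) and head_dim unchanged, as in the table above.
WIDTH_FIELDS = {
    "hidden_size",
    "num_attention_heads",
    "num_key_value_heads",
    "intermediate_size",
}

def hyperclone_config(src, n=2):
    """Return a copy of `src` with every width-related field multiplied by n."""
    return {k: v * n if k in WIDTH_FIELDS else v for k, v in src.items()}

# Values taken from the "Source" column.
source = {
    "hidden_size": 576,
    "num_attention_heads": 9,
    "num_key_value_heads": 3,
    "head_dim": 64,
    "intermediate_size": 8192,
    "num_layers": 15,
}

cloned = hyperclone_config(source, n=2)

# Sanity checks: heads * head_dim still equals hidden_size, and the
# query-to-KV head ratio is the same before and after expansion.
for cfg in (source, cloned):
    assert cfg["num_attention_heads"] * cfg["head_dim"] == cfg["hidden_size"]
    assert cfg["num_attention_heads"] // cfg["num_key_value_heads"] == 3
```

Doubling the head counts rather than `head_dim` keeps each head the same width as in the source model; only the number of heads changes.
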
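
## Method

Each weight matrix `W` is tiled `n` times along both dimensions and scaled by `1/n`, i.e. `W.repeat(n, n) / n`. For the 2× expansion used here (`n = 2`), this is the paper's `W/2` block construction.
The number of attention heads and key-value heads doubles with the hidden size (9 → 18 and 3 → 6), so `head_dim` stays at 64.
Output logits match the source model at initialization.
This is an **initialization checkpoint**: because it reproduces the source model's outputs, further training is needed to make use of the added capacity.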
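
Why the tiled-and-scaled weight preserves outputs can be checked on a toy linear layer. The sketch below is a pure-Python stand-in for `W.repeat(n, n) / n` applied to a 2-D weight. It assumes the widened layer receives duplicated activations `[x; x]`, and it does not cover biases, normalization layers, or embeddings. The names `hyperclone` and `matvec` are illustrative and are not taken from the conversion code for this checkpoint.

```python
import math

def hyperclone(W, n=2):
    """Tile W n times along rows and columns, then divide every entry by n."""
    widened = [[w / n for w in row] * n for row in W]         # repeat columns
    return [list(row) for _ in range(n) for row in widened]   # repeat rows

def matvec(W, x):
    """Plain matrix-vector product for list-of-lists matrices."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Toy weight with shape (out=2, in=3) and a toy input vector.
W = [[0.5, -1.0, 2.0],
     [1.5, 0.25, -0.75]]
x = [1.0, 2.0, 3.0]

n = 2
W_big = hyperclone(W, n)   # shape (4, 6)
x_big = x * n              # duplicated activations [x; x]

y = matvec(W, x)
y_big = matvec(W_big, x_big)

# Each copy of the widened output reproduces the original output:
# the n duplicated input blocks each contribute (W / n) @ x, summing to W @ x.
assert all(math.isclose(a, b) for a, b in zip(y_big, y * n))
```

Because every widened activation is a copy of a source activation, the same argument applies layer by layer, which is consistent with the matching logits stated above.
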

## Citation

```bibtex
@article{samragh2024hypercloning,
  title={Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization},
  author={Samragh, Mohammad and others},
  journal={arXiv preprint arXiv:2409.12903},
  year={2024}
}
```