---
license: other
base_model: facebook/MobileLLM-R1-140M-base
tags:
- hypercloning
- model-growth
- initialization
- mobilellm
---

# MobileLLM-R1-140M-base (2× HyperCloned)

Initialized from [facebook/MobileLLM-R1-140M-base](https://huggingface.co/facebook/MobileLLM-R1-140M-base) using **HyperCloning** ([Samragh et al., 2024](https://arxiv.org/abs/2409.12903)).

## Architecture

| | Source | HyperCloned |
|---|---|---|
| hidden_size | 576 | 1152 |
| num_attention_heads | 9 | 18 |
| num_key_value_heads | 3 | 6 |
| head_dim | 64 | 64 |
| intermediate_size | 8192 | 16384 |
| num_layers | 15 | 15 |
| parameters | 140,248,512 | 454,790,016 |

## Method

Each weight matrix `W` is expanded via `W.repeat(n, n) / n`, where `n` is the growth factor (for `n = 2` this is the paper's `W/2` scaling). The numbers of attention heads and key-value heads double along with the embedding dimension, while `head_dim` stays at 64. At initialization, the output logits match those of the source model.

This is an **initialization checkpoint**; further training is needed.

```bibtex
@article{samragh2024hypercloning,
  title={Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization},
  author={Samragh et al.},
  journal={arXiv:2409.12903},
  year={2024}
}
```
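
## Sanity check (illustrative)

The expansion rule from the Method section can be checked on a toy layer. The following is a minimal sketch, not the script used to produce this checkpoint. It uses NumPy, where `np.tile(W, (n, n))` plays the role of PyTorch's `W.repeat(n, n)`. The helper name `hyperclone_linear`, the toy shapes, and the treatment of the output projection (tiled along its input dimension only, so the number of logits is unchanged) are assumptions made for illustration.

```python
import numpy as np


def hyperclone_linear(W: np.ndarray, n: int = 2) -> np.ndarray:
    """Tile W into an n-by-n grid of copies and divide by n (hypothetical helper)."""
    return np.tile(W, (n, n)) / n


rng = np.random.default_rng(0)
n = 2

W = rng.standard_normal((6, 4))       # toy hidden layer: out_features=6, in_features=4
W_head = rng.standard_normal((5, 6))  # toy output projection: 5 "logits"
x = rng.standard_normal(4)            # toy hidden state of the source model

# Source model: hidden layer followed by the output projection.
h = W @ x
logits = W_head @ h

# Expanded model: the hidden state is the source state repeated n times.
x_big = np.tile(x, n)                      # shape (8,)
W_big = hyperclone_linear(W, n)            # shape (12, 8)
h_big = W_big @ x_big                      # n stacked copies of h

# Assumption: the output projection is tiled only along its input
# dimension, so the logit count stays the same.
W_head_big = np.tile(W_head, (1, n)) / n   # shape (5, 12)
logits_big = W_head_big @ h_big

assert np.allclose(h_big, np.tile(h, n))   # hidden activations are cloned
assert np.allclose(logits_big, logits)     # logits are preserved
```

Dividing by `n` compensates for each output summing over `n` identical copies of the input, which is why the activations and the final logits are unchanged by the expansion.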