---
license: other
base_model: facebook/MobileLLM-R1-140M-base
tags:
- hypercloning
- model-growth
- initialization
- mobilellm
---
# MobileLLM-R1-140M-base — 2× HyperCloned
Initialized from [facebook/MobileLLM-R1-140M-base](https://huggingface.co/facebook/MobileLLM-R1-140M-base)
using **HyperCloning** ([Samragh et al., 2024](https://arxiv.org/abs/2409.12903)).
## Architecture
| | Source | HyperCloned |
|---|---|---|
| hidden_size | 576 | 1152 |
| num_attention_heads | 9 | 18 |
| num_key_value_heads | 3 | 6 |
| head_dim | 64 | 64 |
| intermediate_size | 8192 | 16384 |
| num_layers | 15 | 15 |
| parameters | 140,248,512 | 454,790,016 |
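
The table's claim that head_dim is unchanged can be checked by hand. The sketch below copies the table values into two plain dictionaries (hypothetical stand-ins, not the actual config objects) and assumes head_dim equals hidden_size divided by num_attention_heads, which holds for the numbers listed here.

```python
# Values copied from the architecture table; the dict layout is illustrative only.
source = {"hidden_size": 576, "num_attention_heads": 9, "num_key_value_heads": 3}
cloned = {"hidden_size": 1152, "num_attention_heads": 18, "num_key_value_heads": 6}


def head_dim(cfg: dict) -> int:
    """Per-head width, assuming hidden_size is split evenly across heads."""
    return cfg["hidden_size"] // cfg["num_attention_heads"]


def gqa_group_size(cfg: dict) -> int:
    """Number of query heads that share one key/value head."""
    return cfg["num_attention_heads"] // cfg["num_key_value_heads"]


# Doubling hidden_size and both head counts leaves these two quantities unchanged.
assert head_dim(source) == head_dim(cloned) == 64
assert gqa_group_size(source) == gqa_group_size(cloned) == 3
```
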
## Method
Each weight matrix W is expanded to `W.repeat(n, n) / n`: an n × n grid of copies of W, scaled by 1/n (for the 2× expansion here, the paper's W/2).
The numbers of attention heads and key/value heads double along with the hidden size, so head_dim stays at 64.
Output logits match those of the source model at initialization.
This is an **initialization checkpoint**; further training is needed.
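
To illustrate why this expansion preserves the output, here is a minimal NumPy sketch on a toy linear layer. It is not the script used to build this checkpoint: `hyperclone_linear` is a hypothetical helper, `np.tile` stands in for the `W.repeat(n, n)` call above, and the example assumes the layer's input has already been cloned n times, as happens once the preceding layers are expanded.

```python
import numpy as np


def hyperclone_linear(W: np.ndarray, n: int = 2) -> np.ndarray:
    """Tile W into an n-by-n grid of copies and scale by 1/n."""
    return np.tile(W, (n, n)) / n


rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))  # toy weight: 3 inputs -> 4 outputs
x = rng.standard_normal(3)       # toy input activation

n = 2
W_big = hyperclone_linear(W, n)  # shape (8, 6)
x_big = np.tile(x, n)            # cloned activation, shape (6,)

y = W @ x
y_big = W_big @ x_big

# The expanded layer returns n stacked copies of the original output,
# so the next expanded layer again sees a cloned activation.
assert W_big.shape == (8, 6)
assert np.allclose(y_big, np.tile(y, n))
```

Each output block sums n copies of `W @ x`, so the 1/n scale cancels the duplication.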
## Citation
```bibtex
@article{samragh2024hypercloning,
  title={Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization},
  author={Samragh, Mohammad and others},
  journal={arXiv preprint arXiv:2409.12903},
  year={2024}
}
```