---
license: other
base_model: facebook/MobileLLM-R1-140M-base
tags:
  - hypercloning
  - model-growth
  - initialization
  - mobilellm
---

# MobileLLM-R1-140M-base — 2× HyperCloned

Initialized from [facebook/MobileLLM-R1-140M-base](https://huggingface.co/facebook/MobileLLM-R1-140M-base)
using **HyperCloning** ([Samragh et al., 2024](https://arxiv.org/abs/2409.12903)).

## Architecture

| | Source | HyperCloned |
|---|---|---|
| hidden_size | 576 | 1152 |
| num_attention_heads | 9 | 18 |
| num_key_value_heads | 3 | 6 |
| head_dim | 64 | 64 |
| intermediate_size | 8192 | 16384 |
| num_layers | 15 | 15 |
| parameters | 140,248,512 | 454,790,016 |
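
The width-related rows of the table follow one rule: every size except head_dim and num_layers is multiplied by the expansion factor. The sketch below reproduces that arithmetic with a plain dictionary. The helper name `hyperclone_config` and the dictionary layout are assumptions made for illustration, not the script used to build this checkpoint.

```python
def hyperclone_config(cfg, n=2):
    """Return a copy of cfg with the width-related sizes multiplied by n."""
    scaled = dict(cfg)
    for key in ("hidden_size", "num_attention_heads",
                "num_key_value_heads", "intermediate_size"):
        scaled[key] = cfg[key] * n
    # head_dim and num_layers are deliberately left unchanged.
    return scaled


# Source values copied from the table above.
source = {
    "hidden_size": 576,
    "num_attention_heads": 9,
    "num_key_value_heads": 3,
    "head_dim": 64,
    "intermediate_size": 8192,
    "num_layers": 15,
}
cloned = hyperclone_config(source, n=2)

# The extra width comes from more heads, not larger heads.
assert cloned["hidden_size"] == cloned["num_attention_heads"] * cloned["head_dim"]
# The number of query heads per key/value head is unchanged.
assert (cloned["num_attention_heads"] // cloned["num_key_value_heads"]
        == source["num_attention_heads"] // source["num_key_value_heads"])
```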

## Method

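Each weight matrix W is tiled n times along both dimensions and divided by n: `W.repeat(n, n) / n`, which is the paper's W/2 scaling for n = 2.
The number of attention heads and key/value heads doubles with the hidden size, so head_dim stays at 64.
As a result, the output logits match those of the source model at initialization.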
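
To see why the expansion preserves outputs, consider one linear layer whose input and output are both widened. The following is a minimal pure-Python sketch with a toy 2×3 matrix. The helpers `matvec` and `hyperclone` are hypothetical stand-ins for a matrix-vector product and for `W.repeat(n, n) / n`, and the assumption is that the widened layer receives the source activations duplicated n times.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def hyperclone(W, n=2):
    """Tile W n times along both axes and divide every entry by n."""
    return [[w / n for w in row * n] for _ in range(n) for row in W]


# Toy weight with shape (out_features=2, in_features=3).
W = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.0]]
x = [0.5, 1.0, -2.0]
n = 2

y = matvec(W, x)           # output of the source layer
W_big = hyperclone(W, n)   # shape (4, 6)
x_big = x * n              # source activations duplicated n times
y_big = matvec(W_big, x_big)

# The widened layer returns the source output, duplicated n times.
assert all(math.isclose(a, b) for a, b in zip(y_big, y * n))
```

Each row of the widened matrix sums n copies of the original dot product, each scaled by 1/n, so the result equals the original output. Because that output is itself duplicated, the same argument applies to the next layer.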

This is an **initialization checkpoint**; further training is needed.

```bibtex
@article{samragh2024hypercloning,
  title={Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization},
  author={Samragh et al.},
  journal={arXiv:2409.12903},
  year={2024}
}
```