HyperCloned from facebook/MobileLLM-R1-140M-base (emb×2, ffn×2)

license: other
base_model: facebook/MobileLLM-R1-140M-base
tags:
  - hypercloning
  - model-growth
  - initialization
  - mobilellm

MobileLLM-R1-140M-base — 2× HyperCloned

Initialized from facebook/MobileLLM-R1-140M-base using HyperCloning (Samragh et al., 2024).

Architecture

Field	Source	HyperCloned
hidden_size	576	1152
num_attention_heads	9	18
num_key_value_heads	3	6
head_dim	64	64
intermediate_size	8192	16384
num_layers	15	15
parameters	140,248,512	454,790,016
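
The head geometry in the table can be checked with a few lines of arithmetic. The sketch below hard-codes the table values (the dictionary and function names are illustrative, not part of any released code) and confirms that heads times head_dim equals hidden_size for both models, and that the ratio of query heads to key/value heads is unchanged.

```python
# Values copied from the architecture table above.
source = {"hidden_size": 576, "num_attention_heads": 9,
          "num_key_value_heads": 3, "head_dim": 64}
cloned = {"hidden_size": 1152, "num_attention_heads": 18,
          "num_key_value_heads": 6, "head_dim": 64}


def queries_per_kv_head(cfg):
    """Check heads * head_dim == hidden_size, then return the GQA group size."""
    assert cfg["num_attention_heads"] * cfg["head_dim"] == cfg["hidden_size"]
    return cfg["num_attention_heads"] // cfg["num_key_value_heads"]


gqa_source = queries_per_kv_head(source)
gqa_cloned = queries_per_kv_head(cloned)
growth = cloned["hidden_size"] // source["hidden_size"]

print(gqa_source, gqa_cloned, growth)
```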

Method

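Each weight matrix W is tiled n times along both dimensions and scaled by 1/n, i.e. W.repeat(n, n) / n; for n = 2 this is the paper's W/2 scaling. The number of attention heads doubles with the embedding dimension, so head_dim stays at 64. Output logits match the source model at initialization.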
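
To see why this expansion preserves the function, here is a minimal NumPy sketch on toy matrices. It is not the code used to build this checkpoint: `np.tile` stands in for `W.repeat(n, n)`, the helper names `hyperclone` and `hyperclone_head` are hypothetical, and the output head is assumed to expand only along its input dimension because the vocabulary size does not grow.

```python
import numpy as np


def hyperclone(W, n=2):
    """Tile W n times along both axes and scale by 1/n."""
    return np.tile(W, (n, n)) / n


def hyperclone_head(W, n=2):
    """Assumed treatment of the output head: tile along the input axis only."""
    return np.tile(W, (1, n)) / n


rng = np.random.default_rng(0)
n, d, vocab = 2, 4, 5                      # toy sizes, not the real model's
W = rng.standard_normal((d, d))            # a hidden linear layer
W_head = rng.standard_normal((vocab, d))   # the output (logit) layer
x = rng.standard_normal(d)

# Source model: one hidden layer followed by the output head.
h = W @ x
logits = W_head @ h

# Expanded model: the hidden state is n stacked copies of the source state.
x_big = np.tile(x, n)
h_big = hyperclone(W, n) @ x_big
logits_big = hyperclone_head(W_head, n) @ h_big

# Hidden activations are duplicated; logits are unchanged.
assert np.allclose(h_big, np.tile(h, n))
assert np.allclose(logits_big, logits)
```

Each block row of the tiled matrix sums n copies of (W / n) x, which is why the 1/n factor exactly cancels the duplication of the input.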

This is an initialization checkpoint — further training is needed.

@article{samragh2024hypercloning,
  title={Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization},
  author={Samragh, Mohammad and others},
  journal={arXiv preprint arXiv:2409.12903},
  year={2024}
}