license: other
base_model: facebook/MobileLLM-R1-140M-base
tags:
  - hypercloning
  - model-growth
  - initialization
  - mobilellm

MobileLLM-R1-140M-base — 2× HyperCloned

Initialized from facebook/MobileLLM-R1-140M-base using HyperCloning (Samragh et al., 2024).

Architecture

| Parameter | Source | HyperCloned |
|---|---|---|
| hidden_size | 576 | 1152 |
| num_attention_heads | 9 | 18 |
| num_key_value_heads | 3 | 6 |
| head_dim | 64 | 64 |
| intermediate_size | 8192 | 16384 |
| num_layers | 15 | 15 |
| parameters | 140,248,512 | 454,790,016 |
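
The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check on the listed numbers; the dictionary names `source` and `cloned` and the helper `heads_match_hidden` are illustrative and not part of any released code.

```python
# Values copied from the architecture table above.
source = {
    "hidden_size": 576,
    "num_attention_heads": 9,
    "num_key_value_heads": 3,
    "head_dim": 64,
    "parameters": 140_248_512,
}
cloned = {
    "hidden_size": 1152,
    "num_attention_heads": 18,
    "num_key_value_heads": 6,
    "head_dim": 64,
    "parameters": 454_790_016,
}


def heads_match_hidden(cfg: dict) -> bool:
    """True when num_attention_heads * head_dim equals hidden_size."""
    return cfg["num_attention_heads"] * cfg["head_dim"] == cfg["hidden_size"]


# Both configurations should satisfy heads * head_dim == hidden_size.
heads_ok = heads_match_hidden(source) and heads_match_hidden(cloned)

# Query heads per key/value head (the grouped-query ratio) before and after.
gqa_source = source["num_attention_heads"] // source["num_key_value_heads"]
gqa_cloned = cloned["num_attention_heads"] // cloned["num_key_value_heads"]

# Overall growth in parameter count.
param_ratio = cloned["parameters"] / source["parameters"]

print(heads_ok, gqa_source, gqa_cloned, round(param_ratio, 2))
```

Doubling the head count at a fixed head_dim is what keeps both the hidden-size identity and the grouped-query ratio intact.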

Method

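Each weight matrix W is expanded by tiling it n times along both dimensions and dividing by n: W.repeat(n, n) / n (for 2×, every block is the paper's W/2). The number of attention heads doubles along with the embedding dimension, while head_dim is preserved. At initialization, the output logits match those of the source model.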
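
To illustrate why this expansion preserves the function, here is a minimal sketch for a single linear layer. It assumes NumPy, with `np.tile` standing in for the tensor `repeat` call mentioned above; the helper name `hyperclone_weight` and the toy dimensions are hypothetical and are not taken from the actual conversion script.

```python
import numpy as np


def hyperclone_weight(W: np.ndarray, n: int = 2) -> np.ndarray:
    """Tile W n times along both axes, then divide by n."""
    return np.tile(W, (n, n)) / n


rng = np.random.default_rng(0)
d_in, d_out, n = 4, 3, 2  # toy sizes, chosen only for illustration

W = rng.standard_normal((d_out, d_in))  # source weight
x = rng.standard_normal(d_in)           # source hidden state

W_big = hyperclone_weight(W, n)  # shape (n * d_out, n * d_in)
x_big = np.tile(x, n)            # hidden state duplicated n times

y = W @ x
y_big = W_big @ x_big

# Each of the n output blocks should reproduce the source output y:
# the n tiled input blocks each contribute (W / n) @ x, which sums to W @ x.
preserved = np.allclose(y_big, np.tile(y, n))
print(W_big.shape, preserved)
```

The division by n cancels the n-fold sum over the duplicated input blocks, so the wider layer emits n copies of the original activation.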

This is an initialization checkpoint — further training is needed.

@article{samragh2024hypercloning,
  title={Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization},
  author={Samragh, Mohammad and others},
  journal={arXiv preprint arXiv:2409.12903},
  year={2024}
}