HyperCloned from facebook/MobileLLM-R1-140M-base (emb×2, ffn×2)

	---
	license: other
	base_model: facebook/MobileLLM-R1-140M-base
	tags:
	- hypercloning
	- model-growth
	- initialization
	- mobilellm
	---

	# MobileLLM-R1-140M-base — 2× HyperCloned

	Initialized from [facebook/MobileLLM-R1-140M-base](https://huggingface.co/facebook/MobileLLM-R1-140M-base)
	using HyperCloning ([Samragh et al., 2024](https://arxiv.org/abs/2409.12903)).

	## Architecture

	| Property | Source | HyperCloned |
	|---|---|---|
	| hidden_size | 576 | 1152 |
	| num_attention_heads | 9 | 18 |
	| num_key_value_heads | 3 | 6 |
	| head_dim | 64 | 64 |
	| intermediate_size | 8192 | 16384 |
	| num_layers | 15 | 15 |
	| parameters | 140,248,512 | 454,790,016 |
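
The attention rows of the table can be sanity-checked with a few lines of plain Python. This is only an illustrative sketch: the `widen` helper and the dictionary layout are hypothetical and are not taken from the conversion script behind this checkpoint.

```python
# Attention geometry of the source model, copied from the table above.
source = {
    "hidden_size": 576,
    "num_attention_heads": 9,
    "num_key_value_heads": 3,
    "head_dim": 64,
}


def widen(cfg, n=2):
    """Hypothetical helper: scale width and head counts by n, keep head_dim."""
    out = dict(cfg)
    out["hidden_size"] = cfg["hidden_size"] * n
    out["num_attention_heads"] = cfg["num_attention_heads"] * n
    out["num_key_value_heads"] = cfg["num_key_value_heads"] * n
    return out  # head_dim is deliberately left unchanged


cloned = widen(source, n=2)

# hidden_size must still factor as num_attention_heads * head_dim.
for cfg in (source, cloned):
    assert cfg["hidden_size"] == cfg["num_attention_heads"] * cfg["head_dim"]

# The grouped-query ratio (query heads per key/value head) is unchanged.
ratio_source = source["num_attention_heads"] // source["num_key_value_heads"]
ratio_cloned = cloned["num_attention_heads"] // cloned["num_key_value_heads"]
print(cloned, ratio_source, ratio_cloned)
```

Because both head counts scale together, each key/value head still serves the same number of query heads after widening.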

	## Method

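	Each weight matrix W is expanded to `W.repeat(n, n) / n`, where n is the width multiplier. For n = 2 this tiles W into a 2×2 block and halves it (the paper's W/2).
	The numbers of attention heads and key-value heads double along with the embedding dimension, so head_dim stays at 64.
	The output logits match those of the source model at initialization.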
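
To see why tiling and dividing by n preserves a layer's output, here is a minimal dependency-free sketch on a toy matrix. The function names `tile_and_scale` and `matvec` are hypothetical, and the list-based code only mirrors the `W.repeat(n, n) / n` expression. It assumes the widened layer receives the source input repeated n times, and it does not cover layers where only one dimension grows (for example the token embedding or the output head).

```python
def tile_and_scale(W, n=2):
    """List-of-lists analogue of `W.repeat(n, n) / n` for a 2-D weight."""
    # `W * n` repeats the rows, `row * n` repeats the columns.
    return [[v / n for v in row * n] for row in W * n]


def matvec(W, x):
    """Plain matrix-vector product, y = W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


# Toy weight with shape (out_features=3, in_features=2).
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [1.0, -2.0]

n = 2
W_big = tile_and_scale(W, n)  # shape (6, 4)
x_big = x * n                 # the widened layer sees the input repeated n times

y = matvec(W, x)
y_big = matvec(W_big, x_big)

# Each copy of the input contributes y / n, so the n copies sum back to y,
# and the widened output is simply y repeated n times.
print(y, y_big)
```

Since every widened activation is a repeated copy of the source activation, the same argument can be applied layer by layer, which is consistent with the logits matching at initialization.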

	This is an initialization checkpoint only; it needs further training before use.

	```bibtex
	@article{samragh2024hypercloning,
	  title={Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization},
	  author={Samragh, Mohammad and others},
	  journal={arXiv preprint arXiv:2409.12903},
	  year={2024}
	}
	```