---
license: mit
datasets:
- mlfoundations/dclm-baseline-1.0
---
|
|
# Morph-1B |
|
|
|
|
|
Morph-1B is a 1 billion parameter language model trained on the DCLM-Baseline dataset, which was curated as part of the DataComp for Language Models (DCLM) benchmark. |
|
|
|
|
|
This model is designed to demonstrate that wider, shallower architectures can yield inference-efficiency gains while preserving downstream accuracy.
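If the released checkpoint is hosted on the Hugging Face Hub in a `transformers`-compatible format, generation can be run along the following lines. This is a minimal sketch: the repository id is a placeholder (not the confirmed Hub path), and `trust_remote_code=True` is assumed to be needed for the open_lm-based architecture.

```python
# Minimal usage sketch. The repository id is a placeholder; replace it with
# the actual Hub path of this model. trust_remote_code is assumed to be
# required if the checkpoint ships a custom (open_lm-based) model class.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-org/Morph-1B"  # placeholder, not the confirmed Hub id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```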
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
- **Developed by:** Song Bian*, Minghao Yan*, Shivaram Venkataraman |
|
|
|
|
|
### Model Sources |
|
|
|
|
|
- **Repository:** [open-lm-morph](https://github.com/Waterpine/open-lm-morph) |
|
|
- **Paper:** [Scaling Inference-Efficient Language Models](https://arxiv.org/pdf/2501.18107) |
|
|
|
|
|
### Model Architecture
|
|
|
|
|
The model architecture is similar to GPT-2 and LLaMA, and it uses the GPT-NeoX tokenizer.
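As a concrete illustration of the tokenizer choice, the GPT-NeoX tokenizer can be loaded from the Hub. This is a sketch that assumes the `EleutherAI/gpt-neox-20b` tokenizer matches the one used here; the card only states that GPT-NeoX tokenization is used.

```python
# Sketch: load a GPT-NeoX tokenizer and inspect how text is tokenized.
# Assumption: the EleutherAI/gpt-neox-20b tokenizer on the Hub corresponds
# to the tokenizer used for Morph-1B.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
print(tokenizer("Morph-1B is a wide, shallow language model.")["input_ids"])
```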
|
|
|
|
|
### Training Details |
|
|
|
|
|
We use the [DCLM-Baseline](https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0) dataset for training.
|
|
|
|
|
The training procedure and hyperparameters are detailed in our ICML 2025 paper. |
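For a quick look at the training corpus, the dataset can be streamed from the Hub. This is a sketch assuming the standard `datasets` streaming API, a default `train` split, and a `text` field; it is not the exact preprocessing pipeline used for training.

```python
# Sketch: stream a few documents from DCLM-Baseline without downloading the
# full corpus. Split name and the "text" field are assumptions about the
# dataset layout; this only illustrates the data source.
from datasets import load_dataset

ds = load_dataset("mlfoundations/dclm-baseline-1.0", split="train", streaming=True)
for i, example in enumerate(ds):
    print(example["text"][:200])
    if i == 2:
        break
```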
|
|
|
|
|
## Evaluation |
|
|
|
|
|
We evaluate the models on the following datasets: ARC-Easy, ARC-Challenge, BoolQ, COPA, HellaSwag, LAMBADA, PIQA, WinoGrande, MMLU, Jeopardy, and Winograd.
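A minimal sketch of the aggregation, assuming the average accuracy column in the table below is an unweighted mean over these tasks (the task keys are illustrative and the scores must be filled in with measured values, not the reported results):

```python
# Sketch: unweighted macro-average over per-task accuracies, assuming that is
# how the reported average is computed. Fill in measured values; None entries
# are placeholders and are skipped.
task_scores = {
    "arc_easy": None, "arc_challenge": None, "boolq": None, "copa": None,
    "hellaswag": None, "lambada": None, "piqa": None, "winogrande": None,
    "mmlu": None, "jeopardy": None, "winograd": None,
}

values = [v for v in task_scores.values() if v is not None]
average = sum(values) / len(values) if values else float("nan")
print(f"Average over {len(values)} tasks: {average:.2f}")
```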
|
|
|
|
|
### Results |
|
|
|
|
|
| Model | d_model | n_layers | Avg. accuracy | Latency (s) |
|
|
| -------- | ------- | ------- | ------- | ------- | |
|
|
| Open-LM-1B | 2048 | 24 | 0.49 | 3.61 | |
|
|
| OPT-1.3B | 2048 | 24 | 0.50 | 2.55 | |
|
|
| Pythia-1.3B | 2048 | 22 | 0.49 | 3.28 | |
|
|
| Neox-1.3B | 2048 | 24 | 0.49 | 3.99 | |
|
|
| OPT-IML-1.3B | 2048 | 24 | 0.54 | 2.54 | |
|
|
| Morph-1B | 3072 | 12 | 0.52 | 1.96 | |
|
|
|
|
|
#### Summary |
|
|
|
|
|
Morph-1B improves inference latency by 1.8× (e.g., 3.61 s for Open-LM-1B vs. 1.96 s for Morph-1B) while maintaining accuracy on downstream tasks, compared to open-source models of similar size.
|
|
|
|
|
## Citation |
|
|
|
|
|
**BibTeX:** |
|
|
|
|
|
```bibtex
@article{bian2025scaling,
  title={Scaling Inference-Efficient Language Models},
  author={Bian, Song and Yan, Minghao and Venkataraman, Shivaram},
  journal={arXiv preprint arXiv:2501.18107},
  year={2025}
}
```
|
|
|