---
license: mit
datasets:
- mlfoundations/dclm-baseline-1.0
---
|
|
# Morph-1B |
|
|
|
|
|
Morph-1B is a 1 billion parameter language model trained on the DCLM-Baseline dataset, which was curated as part of the DataComp for Language Models (DCLM) benchmark. |
|
|
|
|
|
This model is designed to demonstrate that wider, shallower architectures can yield inference-efficiency gains while preserving downstream accuracy.
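If the released checkpoint is hosted on the Hugging Face Hub in a `transformers`-compatible format, generation can be run along the following lines. This is a minimal sketch: the repository id is a placeholder (not the confirmed Hub path), and `trust_remote_code=True` is assumed to be needed for the open_lm-based architecture.

```python
# Minimal usage sketch. The repository id is a placeholder; replace it with
# the actual Hub path of this model. trust_remote_code is assumed to be
# required if the checkpoint ships a custom (open_lm-based) model class.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-org/Morph-1B"  # placeholder, not the confirmed Hub id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```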
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
- **Developed by:** Song Bian*, Minghao Yan*, Shivaram Venkataraman |
|
|
|
|
|
### Model Sources |
|
|
|
|
|
- **Repository:** [open-lm-morph](https://github.com/Waterpine/open-lm-morph) |
|
|
- **Paper:** [Scaling Inference-Efficient Language Models](https://arxiv.org/pdf/2501.18107) |
|
|
|
|
|
### Model Architecture
|
|
|
|
|
The model architecture is similar to GPT-2 and LLaMA, and it uses the GPT-NeoX tokenizer.
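As a concrete illustration of the tokenizer choice, the GPT-NeoX tokenizer can be loaded from the Hub. This is a sketch that assumes the `EleutherAI/gpt-neox-20b` tokenizer matches the one used here; the card only states that GPT-NeoX tokenization is used.

```python
# Sketch: load a GPT-NeoX tokenizer and inspect how text is tokenized.
# Assumption: the EleutherAI/gpt-neox-20b tokenizer on the Hub corresponds
# to the tokenizer used for Morph-1B.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
print(tokenizer("Morph-1B is a wide, shallow language model.")["input_ids"])
```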
|
|
|
|
|
### Training Details |
|
|
|
|
|
We use the [DCLM-Baseline](https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0) dataset for training.
|
|
|
|
|
The training procedure and hyperparameters are detailed in our ICML 2025 paper. |
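For a quick look at the training corpus, the dataset can be streamed from the Hub. This is a sketch assuming the standard `datasets` streaming API, a default `train` split, and a `text` field; it is not the exact preprocessing pipeline used for training.

```python
# Sketch: stream a few documents from DCLM-Baseline without downloading the
# full corpus. Split name and the "text" field are assumptions about the
# dataset layout; this only illustrates the data source.
from datasets import load_dataset

ds = load_dataset("mlfoundations/dclm-baseline-1.0", split="train", streaming=True)
for i, example in enumerate(ds):
    print(example["text"][:200])
    if i == 2:
        break
```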
|
|
|
|
|
## Evaluation |
|
|
|
|
|
We evaluate the models on the following datasets: ARC-Easy, ARC-Challenge, BoolQ, COPA, HellaSwag, LAMBADA, PIQA, WinoGrande, MMLU, Jeopardy, and Winograd.
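A minimal sketch of the aggregation, assuming the average accuracy column in the table below is an unweighted mean over these tasks (the task keys are illustrative and the scores must be filled in with measured values, not the reported results):

```python
# Sketch: unweighted macro-average over per-task accuracies, assuming that is
# how the reported average is computed. Fill in measured values; None entries
# are placeholders and are skipped.
task_scores = {
    "arc_easy": None, "arc_challenge": None, "boolq": None, "copa": None,
    "hellaswag": None, "lambada": None, "piqa": None, "winogrande": None,
    "mmlu": None, "jeopardy": None, "winograd": None,
}

values = [v for v in task_scores.values() if v is not None]
average = sum(values) / len(values) if values else float("nan")
print(f"Average over {len(values)} tasks: {average:.2f}")
```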
|
|
|
|
|
### Results |
|
|
|
|
|
| Model | d_model | n_layers | Avg. accuracy | Latency (s) |
|
|
| -------- | ------- | ------- | ------- | ------- | |
|
|
| Open-LM-1B | 2048 | 24 | 0.49 | 3.61 | |
|
|
| OPT-1.3B | 2048 | 24 | 0.50 | 2.55 | |
|
|
| Pythia-1.3B | 2048 | 22 | 0.49 | 3.28 | |
|
|
| Neox-1.3B | 2048 | 24 | 0.49 | 3.99 | |
|
|
| OPT-IML-1.3B | 2048 | 24 | 0.54 | 2.54 | |
|
|
| Morph-1B | 3072 | 12 | 0.52 | 1.96 | |
|
|
|
|
|
#### Summary |
|
|
|
|
|
Morph-1B improves inference latency by 1.8× (e.g., 3.61 s for Open-LM-1B vs. 1.96 s for Morph-1B) while maintaining accuracy on downstream tasks, compared to open-source models of similar size.
|
|
|
|
|
## Citation |
|
|
|
|
|
**BibTeX:** |
|
|
|
|
|
```bibtex
@article{bian2025scaling,
  title={Scaling Inference-Efficient Language Models},
  author={Bian, Song and Yan, Minghao and Venkataraman, Shivaram},
  journal={arXiv preprint arXiv:2501.18107},
  year={2025}
}
```
|
|
|