NaiveUser/morph-1b


NaiveUser committed on Jul 6, 2025

Commit 87575b8 (verified) · 1 Parent(s): 30dcb56

Update README.md

Files changed (1)

README.md +61 -3


@@ -1,3 +1,61 @@
----
-license: mit
----

+---
+license: mit
+datasets:
+- mlfoundations/dclm-baseline-1.0
+---
+# Morph-1B
+Morph-1B is a 1 billion parameter language model trained on the DCLM-Baseline dataset, which was curated as part of the DataComp for Language Models (DCLM) benchmark.
+This model is designed to show that wider and shallower models can yield efficiency gains while preserving accuracy.
+## Model Details
+### Model Description
+- **Developed by:** Song Bian\*, Minghao Yan\*, Shivaram Venkataraman
+### Model Sources
+- **Repository:** [open-lm-morph](https://github.com/Waterpine/open-lm-morph)
+- **Paper:** [Scaling Inference-Efficient Language Models](https://arxiv.org/pdf/2501.18107)
+### Model Architecture
+The model architecture is similar to GPT-2 and LLaMA and uses the GPT-NeoX tokenizer.
+### Training Details
+We use the [DCLM-Baseline](https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0) dataset for training.
+The training procedure and hyperparameters are detailed in our ICML 2025 paper.
+## Evaluation
+We evaluate the models on the following datasets: ARC-Easy, ARC-Challenge, BoolQ, COPA, HellaSwag, LAMBADA, PIQA, WinoGrande, MMLU, Jeopardy, and Winograd.
+### Results
+| Models       | d_model | n_layers | Average | Latency (s) |
+| ------------ | ------- | -------- | ------- | ----------- |
+| Open-LM-1B   | 2048    | 24       | 0.49    | 3.61        |
+| OPT-1.3B     | 2048    | 24       | 0.50    | 2.55        |
+| Pythia-1.3B  | 2048    | 22       | 0.49    | 3.28        |
+| Neox-1.3B    | 2048    | 24       | 0.49    | 3.99        |
+| OPT-IML-1.3B | 2048    | 24       | 0.54    | 2.54        |
+| Morph-1B     | 3072    | 12       | 0.52    | 1.96        |
+#### Summary
+Morph-1B improves inference latency by about 1.8× relative to Open-LM-1B (1.96 s vs. 3.61 s) while maintaining average accuracy on downstream tasks comparable to the other open-source models listed.
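The latency comparison can be checked directly from the Results table. The following is a minimal sketch, not code from the Morph repository: the `results` dictionary and the `speedups` helper are hypothetical names introduced here, and the numbers are copied from the table as reported.

```python
# Average accuracy and latency (seconds), copied from the Results table.
# The dictionary layout and function name are illustrative assumptions.
results = {
    "Open-LM-1B": (0.49, 3.61),
    "OPT-1.3B": (0.50, 2.55),
    "Pythia-1.3B": (0.49, 3.28),
    "Neox-1.3B": (0.49, 3.99),
    "OPT-IML-1.3B": (0.54, 2.54),
    "Morph-1B": (0.52, 1.96),
}


def speedups(results, target="Morph-1B"):
    """Return baseline_latency / target_latency for every other model."""
    target_latency = results[target][1]
    return {
        name: latency / target_latency
        for name, (_, latency) in results.items()
        if name != target
    }


if __name__ == "__main__":
    for name, ratio in speedups(results).items():
        print(f"{name}: {ratio:.2f}x")
```

Under these numbers, the ratio against Open-LM-1B is the one that corresponds to the 1.8× figure; the ratios against the other baselines differ, so the choice of baseline matters when quoting a single speedup.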
+## Citation
+**BibTeX:**
+@article{bian2025scaling,
+  title={Scaling Inference-Efficient Language Models},
+  author={Bian, Song and Yan, Minghao and Venkataraman, Shivaram},
+  journal={arXiv preprint arXiv:2501.18107},
+  year={2025}
+}