apple/OpenELM-3B-Instruct

Text Generation


qicao-apple committed on May 2, 2024

Commit d3c76da · 1 Parent(s): 5607895

update OpenELM-3B

Files changed (1)

README.md +3 -3

README.md CHANGED

@@ -8,7 +8,7 @@ license_link: LICENSE
 *Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari*
-We introduce **OpenELM**, a family of **Open**-source **E**fficient **L**anguage **M**odels. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. We pretrained OpenELM models using the [CoreNet](https://github.com/apple/corenet) library. We release both pretrained and instruction tuned models with 270M, 450M, 1.1B and 3B parameters.
+We introduce **OpenELM**, a family of **Open** **E**fficient **L**anguage **M**odels. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. We pretrained OpenELM models using the [CoreNet](https://github.com/apple/corenet) library. We release both pretrained and instruction tuned models with 270M, 450M, 1.1B and 3B parameters.
 Our pre-training dataset contains RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens. Please check license agreements and terms of these datasets before using them.
@@ -106,7 +106,7 @@ pip install tokenizers>=0.15.2 transformers>=4.38.2 sentencepiece>=0.2.0
 ```bash
 # OpenELM-3B-Instruct
-hf_model=OpenELM-3B-Instruct
+hf_model=apple/OpenELM-3B-Instruct
 # this flag is needed because lm-eval-harness set add_bos_token to False by default, but OpenELM uses LLaMA tokenizer which requires add_bos_token to be True
 tokenizer=meta-llama/Llama-2-7b-hf
@@ -168,7 +168,7 @@ If you find our work useful, please cite:
 ```BibTex
 @article{mehtaOpenELMEfficientLanguage2024,
-	title = {{OpenELM}: {An} {Efficient} {Language} {Model} {Family} with {Open}-source {Training} and {Inference} {Framework}},
+	title = {{OpenELM}: {An} {Efficient} {Language} {Model} {Family} with {Open} {Training} and {Inference} {Framework}},
 	shorttitle = {{OpenELM}},
 	url = {https://arxiv.org/abs/2404.14619v1},
 	language = {en},
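
The hunk at line 106 switches `hf_model` from a bare model name to the fully qualified Hub repository id `apple/OpenELM-3B-Instruct`. As a minimal sketch of why that matters, the snippet below assembles the two variables into a single argument string. The `model_args` variable name and the `pretrained=`, `trust_remote_code=`, `add_bos_token=` and `tokenizer=` keys are assumptions based on lm-eval-harness's Hugging Face backend; the diff itself does not show how the README consumes these variables.

```bash
# Values taken from the post-commit side of the hunk at README.md line 106.
hf_model=apple/OpenELM-3B-Instruct
tokenizer=meta-llama/Llama-2-7b-hf

# Assumption: key names follow lm-eval-harness's Hugging Face model backend.
# add_bos_token=True reflects the README comment that the LLaMA tokenizer
# requires a BOS token while the harness defaults it to False.
model_args="pretrained=${hf_model},trust_remote_code=True,add_bos_token=True,tokenizer=${tokenizer}"

# Print the assembled string instead of launching an evaluation run.
echo "${model_args}"
```

With the pre-commit value `OpenELM-3B-Instruct`, the `pretrained=` entry would lack the `apple/` namespace, so it could only resolve to a local directory of that name and not to the Hub repository.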