Commit d3c76da
Parent(s): 5607895
update OpenELM-3B
README.md CHANGED
@@ -8,7 +8,7 @@ license_link: LICENSE
 
 *Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari*
 
-We introduce **OpenELM**, a family of **Open**
+We introduce **OpenELM**, a family of **Open** **E**fficient **L**anguage **M**odels. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. We pretrained OpenELM models using the [CoreNet](https://github.com/apple/corenet) library. We release both pretrained and instruction tuned models with 270M, 450M, 1.1B and 3B parameters.
 
 Our pre-training dataset contains RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens. Please check license agreements and terms of these datasets before using them.
 
@@ -106,7 +106,7 @@ pip install tokenizers>=0.15.2 transformers>=4.38.2 sentencepiece>=0.2.0
 ```bash
 
 # OpenELM-3B-Instruct
-hf_model=OpenELM-3B-Instruct
+hf_model=apple/OpenELM-3B-Instruct
 
 # this flag is needed because lm-eval-harness set add_bos_token to False by default, but OpenELM uses LLaMA tokenizer which requires add_bos_token to be True
 tokenizer=meta-llama/Llama-2-7b-hf
@@ -168,7 +168,7 @@ If you find our work useful, please cite:
 
 ```BibTex
 @article{mehtaOpenELMEfficientLanguage2024,
-title = {{OpenELM}: {An} {Efficient} {Language} {Model} {Family} with {Open}
+title = {{OpenELM}: {An} {Efficient} {Language} {Model} {Family} with {Open} {Training} and {Inference} {Framework}},
 shorttitle = {{OpenELM}},
 url = {https://arxiv.org/abs/2404.14619v1},
 language = {en},
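The evaluation snippet touched by the second hunk sets `hf_model` and `tokenizer` and notes that `add_bos_token` must be `True` for the LLaMA tokenizer. As a rough sketch of how those values might be combined, the helper below joins them into the comma-separated `key=value` string that lm-eval-harness takes through `--model_args`. The function name `build_model_args` is hypothetical, and including `trust_remote_code=True` is an assumption, since the lines shown in this diff do not include the actual `lm_eval` invocation.

```python
def build_model_args(hf_model: str, tokenizer: str, add_bos_token: bool = True) -> str:
    """Join settings into a comma-separated key=value string (hypothetical helper)."""
    pairs = {
        "pretrained": hf_model,
        # Assumption: OpenELM ships custom modeling code, so remote code is enabled.
        "trust_remote_code": True,
        # The README comment says the LLaMA tokenizer needs a BOS token,
        # while lm-eval-harness defaults this flag to False.
        "add_bos_token": add_bos_token,
        "tokenizer": tokenizer,
    }
    return ",".join(f"{key}={value}" for key, value in pairs.items())


# Values taken from the updated side of the diff.
model_args = build_model_args(
    hf_model="apple/OpenELM-3B-Instruct",
    tokenizer="meta-llama/Llama-2-7b-hf",
)
print(model_args)
```

Note that the corrected `hf_model` value carries the `apple/` namespace, which is what lets the Hub resolve the repository. The bare `OpenELM-3B-Instruct` on the removed line would be treated as a local path or an unqualified repository name.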