Text Generation
Transformers
PyTorch
English
llama
math
reasoning
Eval Results
text-generation-inference
Instructions to use EleutherAI/llemma_7b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use EleutherAI/llemma_7b with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="EleutherAI/llemma_7b")# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("EleutherAI/llemma_7b") model = AutoModelForCausalLM.from_pretrained("EleutherAI/llemma_7b") - Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use EleutherAI/llemma_7b with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "EleutherAI/llemma_7b" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "EleutherAI/llemma_7b", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/EleutherAI/llemma_7b
- SGLang
How to use EleutherAI/llemma_7b with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "EleutherAI/llemma_7b" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "EleutherAI/llemma_7b", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "EleutherAI/llemma_7b" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "EleutherAI/llemma_7b", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use EleutherAI/llemma_7b with Docker Model Runner:
docker model run hf.co/EleutherAI/llemma_7b
zhangir-azerbayev commited on
Commit ·
21a6f14
1
Parent(s): 660c780
update readme with links
Browse files
README.md
CHANGED
|
@@ -10,7 +10,7 @@ tags:
|
|
| 10 |
---
|
| 11 |
<img src="llemma.png" width="400">
|
| 12 |
|
| 13 |
-
[ArXiv](
|
| 14 |
|
| 15 |
[Zhangir Azerbayev](https://zhangir-azerbayev.github.io/), [Hailey Schoelkopf](https://github.com/haileyschoelkopf), [Keiran Paster](https://keirp.com), [Marco Dos Santos](https://github.com/dsantosmarco), [Stephen McAleer](https://www.andrew.cmu.edu/user/smcaleer/), [Albert Q. Jiang](https://albertqjiang.github.io/), [Jia Deng](https://www.cs.princeton.edu/~jiadeng/), [Stella Biderman](https://www.stellabiderman.com/), [Sean Welleck](https://wellecks.com/)
|
| 16 |
|
|
@@ -57,12 +57,13 @@ In addition to chain-of-thought reasoning, Llemma has strong capabilities in com
|
|
| 57 |
|
| 58 |
### Citation
|
| 59 |
```
|
| 60 |
-
@
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
|
|
|
|
| 66 |
}
|
| 67 |
```
|
| 68 |
|
|
|
|
| 10 |
---
|
| 11 |
<img src="llemma.png" width="400">
|
| 12 |
|
| 13 |
+
[ArXiv](http://arxiv.org/abs/2310.10631) | [Models](https://huggingface.co/EleutherAI/llemma_34b) | [Data](https://huggingface.co/datasets/EleutherAI/proof-pile-2) | [Code](https://github.com/EleutherAI/math-lm) | [Blog](https://blog.eleuther.ai/llemma/) | [Sample Explorer](https://llemma-demo.github.io/)
|
| 14 |
|
| 15 |
[Zhangir Azerbayev](https://zhangir-azerbayev.github.io/), [Hailey Schoelkopf](https://github.com/haileyschoelkopf), [Keiran Paster](https://keirp.com), [Marco Dos Santos](https://github.com/dsantosmarco), [Stephen McAleer](https://www.andrew.cmu.edu/user/smcaleer/), [Albert Q. Jiang](https://albertqjiang.github.io/), [Jia Deng](https://www.cs.princeton.edu/~jiadeng/), [Stella Biderman](https://www.stellabiderman.com/), [Sean Welleck](https://wellecks.com/)
|
| 16 |
|
|
|
|
| 57 |
|
| 58 |
### Citation
|
| 59 |
```
|
| 60 |
+
@misc{azerbayev2023llemma,
|
| 61 |
+
title={Llemma: An Open Language Model For Mathematics},
|
| 62 |
+
author={Zhangir Azerbayev and Hailey Schoelkopf and Keiran Paster and Marco Dos Santos and Stephen McAleer and Albert Q. Jiang and Jia Deng and Stella Biderman and Sean Welleck},
|
| 63 |
+
year={2023},
|
| 64 |
+
eprint={2310.10631},
|
| 65 |
+
archivePrefix={arXiv},
|
| 66 |
+
primaryClass={cs.CL}
|
| 67 |
}
|
| 68 |
```
|
| 69 |
|