Update README.md
README.md CHANGED
@@ -15,6 +15,7 @@ This repo contains model files for a 2:4 (N:M) sparse [Meta-Llama-3-8B](meta-lla

## Running the model

+It can be run naively in transformers for testing purposes:
```python
# pip install transformers accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
@@ -29,6 +30,23 @@ outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```

+To take advantage of the 2:4 sparsity present, install [nm-vllm](https://github.com/neuralmagic/nm-vllm) for fast inference and low memory-usage:
+```bash
+pip install nm-vllm[sparse] --extra-index-url https://pypi.neuralmagic.com/simple
+```
+
+```python
+from vllm import LLM, SamplingParams
+
+model = LLM("nm-testing/SparseLlama-3-8B-pruned_50.2of4", sparsity="semi_structured_sparse_w16a16")
+
+prompt = "A poem about Machine Learning goes as follows:"
+sampling_params = SamplingParams(max_tokens=100, temperature=0)
+
+outputs = model.generate(prompt, sampling_params=sampling_params)
+print(outputs[0].outputs[0].text)
+```
+
## Evaluation Benchmark Results

Model evaluation results obtained via [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) following the configuration of [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard).
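
For context on the 2:4 (N:M) pattern that this change refers to: the usual definition is that every contiguous group of four weights holds at most two nonzero values. A minimal sketch of a checker for that property follows, in plain Python. The helper name `is_2_4_sparse` and the sample rows are hypothetical illustrations and are not part of nm-vllm, vLLM, or transformers; real kernels operate on packed tensors rather than Python lists.

```python
from typing import Sequence


def is_2_4_sparse(row: Sequence[float]) -> bool:
    """Return True if each contiguous group of 4 values has at most 2 nonzeros.

    Hypothetical helper for illustration only; assumes the groups run along
    a single flat row of weights.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        nonzeros = sum(1 for value in group if value != 0)
        if nonzeros > 2:
            return False
    return True


# A dense row: every group of four has four nonzeros, so it is not 2:4 sparse.
dense_row = [0.5, -1.0, 2.0, 0.1, 0.3, 0.7, -0.2, 0.9]

# The same row with two entries zeroed in each group of four.
pruned_row = [0.0, -1.0, 2.0, 0.0, 0.0, 0.7, 0.0, 0.9]
```

Calling `is_2_4_sparse(pruned_row)` returns `True`, while `is_2_4_sparse(dense_row)` returns `False`. Which two entries to zero in each group is a pruning decision that this sketch does not address.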