TheStageAI/Elastic-DeepSeek-R1-Distill-Llama-8B

psynote123 committed on Apr 25, 2025

Commit

512099f

1 Parent(s): e894e37

Upload README.md with huggingface_hub

Files changed (1)

README.md CHANGED

@@ -57,7 +57,7 @@ model = AutoModelForCausalLM.from_pretrained(
     token=hf_token,
     torch_dtype=torch.bfloat16,
     attn_implementation="sdpa",
-    mode='s'
+    mode='S'
 ).to(device)
 model.generation_config.pad_token_id = tokenizer.eos_token_id
@@ -153,7 +153,7 @@ __100 input/300 output; tok/s:__
 | GPU/Model | S   | M | L | XL | Original | W8A8, int8 |
 |-----------|-----|---|---|----|----------|------------|
 | H100 | 194 | 191 | 161 | 131 | 58 | 198 | - |
-| L40S | -1 | -1 | -1 | -1 | -1 | -1 | - |
+| L40S | 72 | 70 | 56 | 44 | 40 | 74 | - |