d3LLM-model committed on
Commit 098382b · verified · 1 Parent(s): 5c882e1

Update README.md

Files changed (1)
  1. README.md +1 -3
README.md CHANGED
@@ -18,9 +18,7 @@ pipeline_tag: text-generation
 
 ## Key Features
 
-- 🚀 **4.9× faster** than autoregressive models (Qwen-2.5-7B-it) on H100 GPU
-- 🎯 **3.5× faster** on A100 GPU
-- ⚡ **280.97 tokens/s** on H100 (vs 57.32 for AR baseline)
+- 🚀 High throughput: **5.0× faster** than autoregressive models (Qwen-2.5-7B-it) on H100 GPU, **3.5× faster** on A100 GPU. Achieves **288.73 tokens/s** on H100 (vs 57.32 for AR baseline).
 - 📊 High AUP (Accuracy Under Parallelism) scores across benchmarks
 - 🔧 Optimized for coding and math reasoning tasks
 
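The H100 speedup figures in this commit can be sanity-checked against the raw throughputs it reports. The sketch below assumes that "N× faster" means the model's tokens/s divided by the autoregressive baseline's tokens/s on the same GPU; the README does not state this definition, and the A100 figure cannot be checked this way because no raw A100 throughput is given. The helper name `speedup` is hypothetical.

```python
def speedup(tokens_per_s: float, baseline_tokens_per_s: float) -> float:
    """Throughput ratio versus a baseline (assumed meaning of 'N x faster')."""
    if baseline_tokens_per_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return tokens_per_s / baseline_tokens_per_s


# AR baseline (Qwen-2.5-7B-it) throughput on H100, as reported in the README.
AR_BASELINE_H100 = 57.32

old = speedup(280.97, AR_BASELINE_H100)  # H100 figure before this commit
new = speedup(288.73, AR_BASELINE_H100)  # H100 figure after this commit

print(f"before: {old:.1f}x  after: {new:.1f}x")
```

Under that assumption the ratios come out to about 4.90 and 5.04, which round to the 4.9× and 5.0× stated before and after the change.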