WilhelmT committed on
Commit 4a14c74 · verified
Parent: 4ab8c36

Update README.md

Files changed (1)
  1. README.md +10 -9
README.md CHANGED
@@ -50,24 +50,25 @@ FlashHead matches the baseline **Llama-3.2-3B-Instruct** within rounding on stan
 
 | **Precision** | **Tokens/sec** | **Speedup vs BF16** |
 |----------------|----------------|----------------------|
-| BF16 baseline | 130 | 1.0× |
-| **FlashHead (Embedl)** | **163** | **1.25×** |
-| W4A16 baseline | 278 | 2.14× |
-| **FlashHead W4A16 (Embedl)** | **485** | **3.73×** |
+| BF16 baseline | 54 | 1.0× |
+| **FlashHead (Embedl)** | **58** | **1.07×** |
+| W4A16 baseline | 141 | 2.61× |
+| **FlashHead W4A16 (Embedl)** | **177** | **3.28×** |
 
-FlashHead improves end-to-end speed by **1.75×** over state-of-the-art, while maintaining full accuracy parity.
+FlashHead improves end-to-end speed by **1.26×** over state-of-the-art, while maintaining full accuracy parity.
 
 ---
 
 ## Accuracy (Parity with Baseline)
 
-| **Method** | **MMLU-Pro** | **HellaSwag** | **IFEval** | **BoolQ** | **BBH** | **TruthfulQA** | **GSM8K** |
-|-------------|---------------|----------------|--------------|-------------|-------------|----------------|--------------|
-| **Baseline** | 0.18 | 0.59 | 0.45 | 0.69 | 0.38 | 0.36 | 0.46 |
-| **FlashHead** | 0.18 | 0.59 | 0.45 | 0.69 | 0.38 | 0.36 | 0.46 |
+| **Method** | **MMLU-Pro** | **IFEval** | **BBH** | **TruthfulQA** | **GSM8K** |
+|-------------|---------------|-------------|-------------|----------------|--------------|
+| **Baseline** | 0.31 | 0.57 | 0.57 | 0.57 | 0.77 |
+| **FlashHead** | 0.31 | 0.56 | 0.57 | 0.58 | 0.77 |
 
 FlashHead matches baseline performance exactly across all evaluation benchmarks.
 
+
 ---
 
 ## Installation
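The derived figures in the updated tables can be recomputed from the raw numbers. The sketch below copies the new Tokens/sec and accuracy values from this diff and assumes (inferred from the numbers, not stated in the README) that "Speedup vs BF16" is each row's tokens/sec divided by the BF16 baseline's, and that the headline "over state-of-the-art" figure compares FlashHead W4A16 against the W4A16 baseline (177 / 141 ≈ 1.26). The accuracy part simply prints FlashHead minus baseline per benchmark, which makes the 0.01 differences on IFEval and TruthfulQA explicit. The names `THROUGHPUT`, `ACCURACY`, `speedups`, and `accuracy_deltas` are hypothetical helpers for illustration only.

```python
# Sketch: recompute the derived columns of the updated README tables.
# Values are copied from the "+" lines of the diff; helper names are hypothetical.
# Assumption: speedup = row tokens/sec / reference row tokens/sec.

THROUGHPUT = {
    "BF16 baseline": 54,
    "FlashHead (Embedl)": 58,
    "W4A16 baseline": 141,
    "FlashHead W4A16 (Embedl)": 177,
}

# (baseline score, FlashHead score) per benchmark
ACCURACY = {
    "MMLU-Pro": (0.31, 0.31),
    "IFEval": (0.57, 0.56),
    "BBH": (0.57, 0.57),
    "TruthfulQA": (0.57, 0.58),
    "GSM8K": (0.77, 0.77),
}


def speedups(throughput: dict[str, float], reference: str = "BF16 baseline") -> dict[str, float]:
    """Divide each row's tokens/sec by the reference row's, rounded to 2 decimals."""
    base = throughput[reference]
    return {name: round(tps / base, 2) for name, tps in throughput.items()}


def accuracy_deltas(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return FlashHead score minus baseline score per benchmark, rounded to 2 decimals."""
    return {bench: round(flash - base, 2) for bench, (base, flash) in scores.items()}


if __name__ == "__main__":
    for name, s in speedups(THROUGHPUT).items():
        print(f"{name:28s} {s:.2f}x")
    # Assumed basis of the "over state-of-the-art" headline figure
    headline = THROUGHPUT["FlashHead W4A16 (Embedl)"] / THROUGHPUT["W4A16 baseline"]
    print(f"FlashHead W4A16 vs W4A16 baseline: {headline:.2f}x")
    for bench, d in accuracy_deltas(ACCURACY).items():
        print(f"{bench:12s} {d:+.2f}")
```

Under this assumption the recomputed ratios (1.07, 2.61, 3.28, and 1.26 for the W4A16 comparison) agree with the figures added in this commit.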