FlashHead matches the baseline **Llama-3.2-3B** within rounding on standard evals.
| **Precision** | **Tokens/sec** | **Speedup vs BF16** |
|---------------|----------------|---------------------|
| BF16 baseline | 54 | 1.0× |
| **FlashHead (Embedl)** | **58** | **1.07×** |
| W4A16 baseline | 141 | 2.61× |
| **FlashHead W4A16 (Embedl)** | **177** | **3.28×** |

FlashHead improves end-to-end speed by **1.26×** over the state-of-the-art W4A16 baseline, while maintaining full accuracy parity.

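The speedup figures follow directly from the tokens/sec column. A minimal sketch of the arithmetic, using only the numbers from the table above (labels are shorthand, not identifiers from the repo):

```python
# Throughput from the table above (tokens/sec on the same hardware).
throughput = {
    "BF16 baseline": 54,
    "FlashHead": 58,
    "W4A16 baseline": 141,
    "FlashHead W4A16": 177,
}

def speedup(a: str, b: str) -> float:
    """Speedup of configuration `a` relative to configuration `b`."""
    return throughput[a] / throughput[b]

# Speedups vs the BF16 baseline, as reported in the table.
print(round(speedup("FlashHead", "BF16 baseline"), 2))        # 1.07
print(round(speedup("FlashHead W4A16", "BF16 baseline"), 2))  # 3.28
# The headline 1.26x is relative to the W4A16 baseline, not BF16.
print(round(speedup("FlashHead W4A16", "W4A16 baseline"), 2))  # 1.26
```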
---

## Accuracy (Parity with Baseline)

| **Method** | **MMLU-Pro** | **IFEval** | **BBH** | **TruthfulQA** | **GSM8K** |
|------------|--------------|------------|---------|----------------|-----------|
| **Baseline** | 0.31 | 0.57 | 0.57 | 0.57 | 0.77 |
| **FlashHead** | 0.31 | 0.56 | 0.57 | 0.58 | 0.77 |

FlashHead matches baseline performance within rounding across all evaluation benchmarks.