Update README.md
README.md
@@ -60,7 +60,7 @@ python3 -m embedl.models.vllm.demo --model embedl/gemma-3-270m-it-FlashHead
 | BF16 baseline | 397 | 1.0× |
 | **FlashHead (Embedl)** | **526** | **1.32×** |
 | W4A16 baseline | 420 | 1.06× |
-| **FlashHead W4A16 (Embedl)** | **568** | **1.
+| **FlashHead W4A16 (Embedl)** | **568** | **1.43×** |
 
 FlashHead improves end-to-end speed by **1.35×** over state-of-the-art, while maintaining full accuracy parity.
 
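The new 1.43× entry is consistent with reading the last column as each row's throughput divided by the BF16 baseline (397). The 1.35× in the summary sentence matches FlashHead W4A16 over the W4A16 baseline (568 / 420). The column headers and units are not visible in this hunk, so that interpretation is an assumption. The Python sketch below checks the arithmetic under that assumption:

```python
# Throughput numbers copied from the README table rows in this diff.
# Assumption: the speedup column is throughput relative to the BF16 baseline row.
throughput = {
    "BF16 baseline": 397,
    "FlashHead (Embedl)": 526,
    "W4A16 baseline": 420,
    "FlashHead W4A16 (Embedl)": 568,
}

# Speedup of every row relative to the BF16 baseline, rounded like the table.
baseline = throughput["BF16 baseline"]
speedups = {name: round(value / baseline, 2) for name, value in throughput.items()}
print(speedups)

# Gain of FlashHead W4A16 over the W4A16 baseline (the 1.35x in the summary sentence).
w4a16_gain = round(throughput["FlashHead W4A16 (Embedl)"] / throughput["W4A16 baseline"], 2)
print(w4a16_gain)
```

Under this reading, the ratios reproduce 1.0×, 1.32×, 1.06× and 1.43× for the four rows, and 1.35× for the W4A16 comparison.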