Update README.md
README.md
CHANGED
@@ -41,7 +41,7 @@ FlashHead matches the Llama-3.2-3B-Instruct baseline within rounding error on co
 ## Optimizations
 
 - **FlashHead LM Head** - lightweight replacement for the dense LM head, significantly improving throughput.
-- **Quantization (W4A16)** - large reduction in memory footprint and
+- **Quantization (W4A16)** - large reduction in memory footprint and latency.
 - **Custom Runtime Integration** - compatible with **vLLM (0.10.2)** via the `embedl-models` package.
 
 ---