## Optimizations

- **FlashHead LM Head** - lightweight replacement for the dense LM head, significantly improving throughput.
- **Quantization (W4A16)** - large reduction in memory footprint and latency.
- **Custom Runtime Integration** - compatible with **vLLM (0.10.2)** via the `embedl-models` package.

---