qubitron committed on
Commit a69905b · verified · 1 Parent(s): 4432b35

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -27,7 +27,7 @@ This repository provides two post-training quantized variants of `GSAI-ML/LLaDA-
  | File | Quantization | Size | Memory Saved | Speed (A100) |
  |---|---|---|---|---|
  | `llada_int8_quantized.pt` | INT8 per-row | 8.54 GB | **47%** | **9.64 tok/s** |
- | `llada_int4_quantized.pt` | INT4 packed | 5.82 GB | **64%** | 3.39 tok/s |
+ | `llada_int4_quantized.pt` | INT4 packed | 4.79 GB | **70%** | 3.39 tok/s |

  Original model (bfloat16): 16.13 GB
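
The "Memory Saved" column appears to be the size reduction relative to the 16.13 GB bfloat16 original, rounded to a whole percent. The README does not state the formula, so the sketch below is an assumption; `memory_saved_percent` is a hypothetical helper, not part of the repository. It can be used to sanity-check that the updated INT4 row (4.79 GB, 70%) is internally consistent.

```python
def memory_saved_percent(original_gb: float, quantized_gb: float) -> int:
    """Size reduction relative to the original, rounded to a whole percent.

    Assumed formula: (1 - quantized / original) * 100.
    """
    return round((1 - quantized_gb / original_gb) * 100)


ORIGINAL_GB = 16.13  # bfloat16 size quoted in the README

# File sizes as listed in the table after this commit.
sizes_gb = {
    "llada_int8_quantized.pt": 8.54,
    "llada_int4_quantized.pt": 4.79,
}

for name, size_gb in sizes_gb.items():
    saved = memory_saved_percent(ORIGINAL_GB, size_gb)
    print(f"{name}: {saved}% saved")
```

Under the same assumed formula, the previous INT4 figures (5.82 GB, 64%) were also self-consistent, so this commit updates the size and the percentage together.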