qubitron / LLaDA-8B-Quantized

Text Generation

diffusion-language-model


qubitron committed on Apr 10

Commit a69905b · verified · 1 parent: 4432b35

Update README.md

Files changed (1)

README.md +1 -1

@@ -27,7 +27,7 @@ This repository provides two post-training quantized variants of `GSAI-ML/LLaDA-
 | File | Quantization | Size | Memory Saved | Speed (A100) |
 |---|---|---|---|---|
 | `llada_int8_quantized.pt` | INT8 per-row | 8.54 GB | **47%** | **9.64 tok/s** |
-| `llada_int4_quantized.pt` | INT4 packed | 5.82 GB | **64%** | 3.39 tok/s |
+| `llada_int4_quantized.pt` | INT4 packed | 4.79 GB | **70%** | 3.39 tok/s |
 Original model (bfloat16): 16.13 GB
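This commit changes the INT4 row's Size and Memory Saved values together. Assuming the Memory Saved column is computed as 1 - (quantized size / original size) against the 16.13 GB bfloat16 checkpoint (the README does not state the formula), the short sketch below reproduces the INT8 figure and both the old and new INT4 figures from the listed sizes. The helper name `memory_saved` is hypothetical.

```python
# Assumption: "Memory Saved" = 1 - quantized_size / original_size,
# using the GB figures exactly as listed in the README table.
ORIGINAL_GB = 16.13  # bfloat16 checkpoint size from the README


def memory_saved(quantized_gb: float, original_gb: float = ORIGINAL_GB) -> float:
    """Return the fraction of memory saved relative to the original checkpoint."""
    if original_gb <= 0:
        raise ValueError("original_gb must be positive")
    return 1.0 - quantized_gb / original_gb


sizes = {
    "llada_int8_quantized.pt": 8.54,
    "llada_int4_quantized.pt (before this commit)": 5.82,
    "llada_int4_quantized.pt (after this commit)": 4.79,
}

for name, size_gb in sizes.items():
    # The ".0%" format multiplies by 100 and rounds to a whole percent.
    print(f"{name}: {memory_saved(size_gb):.0%} saved")
```

Under that assumption the sizes give 47%, 64%, and 70%, so the old and new INT4 rows are each internally consistent; the commit updates the size and the derived percentage in step.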