xocialize committed on
Commit 56bcf8e · verified · 1 Parent(s): f215076

Update banner: mark superseded by Lance-3B-AWQ-INT4 for VQA; reflect 5c-3h research closure

Files changed (1)
  1. README.md +19 -12
README.md CHANGED
@@ -20,18 +20,25 @@ tags:
  base_model: bytedance-research/Lance
  ---

- > ⚠️ **KNOWN BROKEN — DO NOT USE FOR PRODUCTION.** Re-validation on 2026-05-22 shows
- > this 8-bit checkpoint produces visibly degraded t2i output (ghost subject + rainbow
- > striped artifacts) compared to the bf16 reference. The "production-ready" status
- > below was based on an unvalidated assumption that was not caught during the
- > original publish; we apologize for the regression. **Use [`mlx-community/Lance-3B-bf16`](https://huggingface.co/mlx-community/Lance-3B-bf16)
- > until a proper DWQ-calibrated quantization lands.** Tracking: [`xocialize/lance-mlx`](https://github.com/xocialize/lance-mlx)
- > Phase 5c (deferred).
-
- > 🛠 **Root cause:** standard mlx-lm `quantize_model` with affine 8-bit destroys quality
- > on Lance's MoE-gen tower (consistent with [`Reza2kn/lance-quant`](https://github.com/Reza2kn/lance-quant)'s
- > finding that Lance requires per-tower calibration). The fix needs DWQ (Dynamic
- > Weight Quantization) with calibration data, not just a re-quantize.
+ > ⚠️ **SUPERSEDED — DO NOT USE.** This 8-bit checkpoint produces visibly degraded t2i
+ > output (ghost subject + rainbow striped artifacts vs bf16). Kept on HF for historical
+ > reproducibility of the May 2026 quantization research record only.
+ >
+ > **What to use instead:**
+ > - For full-quality `t2i` / `image_edit` / `x2t_image`: [`mlx-community/Lance-3B-bf16`](https://huggingface.co/mlx-community/Lance-3B-bf16) (~15 GB)
+ > - For compressed `x2t_image` (VQA) on 8-16 GB Macs: [`mlx-community/Lance-3B-AWQ-INT4`](https://huggingface.co/mlx-community/Lance-3B-AWQ-INT4) (5.65 GB repo, 3.31 GB LLM, 6-9× faster decode)
+ > - For image generation on small RAM: **no quantized variant is shippable** — use bf16 on a Mac that fits it. Phase 5c-3h showed the 80% HF detail loss is architectural (forward-pass error compounding through Lance's 2,160 evaluations per image), not a quant-scheme problem.
+
+ > 🎓 **Quantization research closed (2026-05-26).** The May 2026 effort
+ > investigated naive groupwise 4/8-bit, DWQ (4-bit UND-only), and AWQ
+ > (4-bit + 8-bit, full + UND-only) across multiple configurations. AWQ math
+ > is correct per-Linear (Phase 5c-3h empirical confirmation: -28% output MSE
+ > average at 8-bit) but per-step quant improvements don't compound through
+ > Lance's flow-matching architecture. No quant scheme tested would close
+ > the t2i gap; k-quants from llama.cpp would face the same compounding
+ > problem. **Lance-3B-AWQ-INT4 is the final shipping outcome — VQA only.**
+ > Full research record: [`xocialize/lance-mlx`](https://github.com/xocialize/lance-mlx)
+ > under `notes/phase5n_diagnostics/phase5c3_awq_port/`.

  ---