mlx-community/Lance-3B-8bit

@@ -20,18 +20,25 @@ tags:
 base_model: bytedance-research/Lance
 ---
-> ⚠️ **KNOWN BROKEN — DO NOT USE FOR PRODUCTION.** Re-validation on 2026-05-22 shows
-> this 8-bit checkpoint produces visibly degraded t2i output (ghost subject + rainbow
-> striped artifacts) compared to the bf16 reference. The "production-ready" status
-> below was based on an unvalidated assumption that was not caught during the
-> original publish; we apologize for the regression. **Use [`mlx-community/Lance-3B-bf16`](https://huggingface.co/mlx-community/Lance-3B-bf16)
-> until a proper DWQ-calibrated quantization lands.** Tracking: [`xocialize/lance-mlx`](https://github.com/xocialize/lance-mlx)
-> Phase 5c (deferred).
-> 🛠 **Root cause:** standard mlx-lm `quantize_model` with affine 8-bit destroys quality
-> on Lance's MoE-gen tower (consistent with [`Reza2kn/lance-quant`](https://github.com/Reza2kn/lance-quant)'s
-> finding that Lance requires per-tower calibration). The fix needs DWQ (Dynamic
-> Weight Quantization) with calibration data, not just a re-quantize.
 ---

 base_model: bytedance-research/Lance
 ---
+> ⚠️ **SUPERSEDED: DO NOT USE.** This 8-bit checkpoint produces visibly degraded text-to-image (t2i)
+> output (ghost subjects and rainbow-striped artifacts) compared with the bf16 reference. It is kept on
+> Hugging Face only for historical reproducibility of the May 2026 quantization research record.
+>
+> **What to use instead:**
+> - For full-quality `t2i` / `image_edit` / `x2t_image`: [`mlx-community/Lance-3B-bf16`](https://huggingface.co/mlx-community/Lance-3B-bf16) (~15 GB)
+> - For compressed `x2t_image` (VQA) on 8-16 GB Macs: [`mlx-community/Lance-3B-AWQ-INT4`](https://huggingface.co/mlx-community/Lance-3B-AWQ-INT4) (5.65 GB repo, 3.31 GB LLM, 6-9× faster decode)
+> - For image generation on small RAM: **no quantized variant is shippable**; use bf16 on a Mac with enough memory to hold it. Phase 5c-3h showed that the 80% loss of high-frequency (HF) detail is architectural (forward-pass error compounds across Lance's 2,160 evaluations per image), not a problem with the quantization scheme.
+> 🎓 **Quantization research closed (2026-05-26).** The May 2026 effort
+> investigated naive groupwise 4/8-bit, DWQ (4-bit UND-only), and AWQ
+> (4-bit and 8-bit, full and UND-only) across multiple configurations. The AWQ
+> math is correct per Linear layer (Phase 5c-3h confirmed this empirically: output
+> MSE fell by 28% on average at 8-bit), but these per-step gains do not carry
+> through Lance's flow-matching architecture. None of the quant schemes tested
+> would close the t2i gap; k-quants from llama.cpp would face the same compounding
+> problem. **Lance-3B-AWQ-INT4 is the final shipping outcome, for VQA only.**
+> Full research record: [`xocialize/lance-mlx`](https://github.com/xocialize/lance-mlx)
+> under `notes/phase5n_diagnostics/phase5c3_awq_port/`.
 ---
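
The note above cites a per-Linear "output MSE" figure. As a rough illustration of what that kind of metric measures, here is a minimal NumPy sketch that quantizes a weight matrix group-wise with a plain affine (min/max) scheme and compares the layer output against full precision. It is not the AWQ procedure or the measurement script from the research record: the helper names (`quantize_groupwise`, `output_mse`), the group size of 64, and the random weights and inputs are all assumptions made for this example.

```python
import numpy as np


def quantize_groupwise(w, bits=8, group_size=64):
    """Quantize then dequantize a 2-D weight matrix, one affine grid per group.

    Hypothetical helper for illustration; group_size=64 is an assumed default.
    """
    out_dim, in_dim = w.shape
    assert in_dim % group_size == 0, "in_dim must be a multiple of group_size"
    groups = w.reshape(out_dim, in_dim // group_size, group_size)
    lo = groups.min(axis=-1, keepdims=True)
    hi = groups.max(axis=-1, keepdims=True)
    levels = 2 ** bits - 1
    # Guard against a zero range when all weights in a group are equal.
    scale = np.maximum(hi - lo, 1e-12) / levels
    codes = np.round((groups - lo) / scale)  # integer codes in [0, levels]
    return (codes * scale + lo).reshape(out_dim, in_dim)


def output_mse(w, w_hat, x):
    """Mean squared error between the outputs of two linear layers on x."""
    return float(np.mean((x @ w.T - x @ w_hat.T) ** 2))


# Toy stand-ins for one Linear layer and a batch of activations.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 512)).astype(np.float32)
x = rng.standard_normal((32, 512)).astype(np.float32)

mse8 = output_mse(w, quantize_groupwise(w, bits=8), x)
mse4 = output_mse(w, quantize_groupwise(w, bits=4), x)
print(f"8-bit output MSE: {mse8:.3e}")
print(f"4-bit output MSE: {mse4:.3e}")
```

A single-layer number like this says nothing about how the error behaves once it is fed through many successive evaluations, which is the point the note makes about image generation.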