---
license: apache-2.0
language:
- en
- zh
library_name: mlx
pipeline_tag: image-to-image
tags:
- mlx
- apple-silicon
- lance
- bytedance
- multimodal
- text-to-image
- image-editing
- vqa
- qwen2.5-vl
- quantized
- 8-bit
base_model: bytedance-research/Lance
---

> ⚠️ **SUPERSEDED: DO NOT USE.** This 8-bit checkpoint produces visibly degraded t2i
> output (ghost subject + rainbow striped artifacts vs bf16). It is kept on HF only for
> historical reproducibility of the May 2026 quantization research record.
>
> **What to use instead:**
> - For full-quality `t2i` / `image_edit` / `x2t_image`: [`mlx-community/Lance-3B-bf16`](https://huggingface.co/mlx-community/Lance-3B-bf16) (~15 GB)
> - For compressed `x2t_image` (VQA) on 8-16 GB Macs: [`mlx-community/Lance-3B-AWQ-INT4`](https://huggingface.co/mlx-community/Lance-3B-AWQ-INT4) (5.65 GB repo, 3.31 GB LLM, 6-9× faster decode)
> - For image generation on small RAM: **no quantized variant is shippable**; use bf16 on a Mac that fits it. Phase 5c-3h showed that the 80% loss of high-frequency (HF) detail is architectural (forward-pass error compounding through Lance's 2,160 evaluations per image), not a quant-scheme problem.

> **Quantization research closed (2026-05-26).** The May 2026 effort
> investigated naive groupwise 4/8-bit, DWQ (4-bit UND-only), and AWQ
> (4-bit + 8-bit, full + UND-only) across multiple configurations. The AWQ math
> is correct per-Linear (Phase 5c-3h empirical confirmation: 28% lower output MSE
> on average at 8-bit), but per-step quantization improvements don't carry through
> Lance's flow-matching architecture to the final image. None of the tested quant
> schemes closed the t2i gap, and k-quants from llama.cpp would likely face the same
> compounding problem. **Lance-3B-AWQ-INT4 is the final shipping outcome: VQA only.**
> Full research record: [`xocialize/lance-mlx`](https://github.com/xocialize/lance-mlx)
> under `notes/phase5n_diagnostics/phase5c3_awq_port/`.

---

> Part of the **[Lance MLX collection](https://huggingface.co/collections/mlx-community/lance-mlx-6a0f3cd5648a74f8283fc8a4)** on mlx-community.

# Lance-3B-8bit (MLX, image specialist, 8-bit quantized)

8-bit groupwise affine quantization of [`mlx-community/Lance-3B-bf16`](https://huggingface.co/mlx-community/Lance-3B-bf16), the image-specialist Lance checkpoint. It was produced with mlx-lm's `quantize_model` utility and a per-tower skip predicate: `time_embedder`, `llm2vae`, and `vae_in_proj` are kept at bf16 for numerical safety, while the bulk of the LLM weights (attention projections, MLP, embeddings, `lm_head`) are quantized.
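
The per-tower skip predicate can be sketched as a plain function. This is a minimal sketch, not the lance-mlx source: `SKIP_SUBSTRINGS`, `should_quantize`, and `mlx_class_predicate` are hypothetical names chosen to mirror the components this card keeps at bf16. The divisibility check reflects that groupwise quantization splits each row's input dimension into groups of `group_size`, which is why a layer with input dimension 48 cannot be quantized at group size 64. The card says the checkpoint was built with mlx-lm's `quantize_model`; the commented usage instead calls core `mlx.nn.quantize`, which also accepts a `class_predicate(path, module)` callable.

```python
from typing import Any, Sequence

# Hypothetical skip list mirroring the towers this card keeps at bf16.
SKIP_SUBSTRINGS = ("time_embedder", "llm2vae", "vae_in_proj")
GROUP_SIZE = 64


def should_quantize(path: str, weight_shape: Sequence[int], group_size: int = GROUP_SIZE) -> bool:
    """Decide whether the weight at `path` gets 8-bit groupwise quantization."""
    # Numerically sensitive towers stay at bf16.
    if any(name in path for name in SKIP_SUBSTRINGS):
        return False
    # Groups run along the last (input) dimension, so it must split evenly.
    return weight_shape[-1] % group_size == 0


def mlx_class_predicate(path: str, module: Any) -> bool:
    """Adapter with the (path, module) signature that mlx.nn.quantize expects."""
    # Only layers exposing `to_quantized` (e.g. nn.Linear, nn.Embedding) can be
    # quantized; norms and custom parameter holders are left alone.
    if not hasattr(module, "to_quantized"):
        return False
    return should_quantize(path, module.weight.shape)


# On Apple Silicon, with a loaded MLX model (not executed here):
#   import mlx.nn as nn
#   nn.quantize(model, group_size=64, bits=8, class_predicate=mlx_class_predicate)
```

Returning `False` for modules without `to_quantized` matters: `mlx.nn.quantize` raises an error if the predicate approves a module it cannot convert.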
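
## Status

Original assessment (2026-05-21): **production-ready for image tasks on Apple Silicon.** This assessment, including the t2i row below, is superseded by the degraded-output finding in the notice at the top of this card.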

| Capability | Status | Speedup vs bf16 |
|---|---|---|
| t2i (text → image) | ✅ Photorealistic, prompt-aligned | **~2.7× faster** (75 s vs 201 s at 768×768, 30 steps, CFG=4.0) |
| image_edit (instruction-based) | ✅ Identity + style preservation | ~2.5× faster (expected) |
| x2t_image (image VQA) | ✅ Content-correct | Similar or faster |
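
**Memory footprint:** 6.59 GB for the quantized LLM weights (`model.safetensors`), 53% of the 12.37 GB bf16 LLM weights. Including the bf16 ViT (1.34 GB) and VAE (1.41 GB), the repo is about 9.3 GB on disk. Runtime RAM is ~8-10 GB, comfortable on a 16 GB Mac.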
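
The 53% ratio is roughly what groupwise affine quantization predicts. Assuming each group of `group_size` weights stores one 16-bit scale and one 16-bit bias (an assumption about the storage format, not read from this repo), the effective cost is `bits + 32 / group_size` bits per weight. `effective_bits` is a hypothetical helper for this arithmetic:

```python
def effective_bits(bits: int, group_size: int, scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Bits per weight for groupwise affine quantization, including per-group metadata."""
    # Each group of `group_size` weights shares one scale and one bias.
    return bits + (scale_bits + bias_bits) / group_size


bpw = effective_bits(bits=8, group_size=64)  # 8 + 32/64
ratio_vs_bf16 = bpw / 16                     # bf16 stores 16 bits per weight

print(f"{bpw} bits/weight, {ratio_vs_bf16:.1%} of bf16")  # 8.5 bits/weight, 53.1% of bf16

# Predicted quantized size from the 12.37 GB bf16 figure; this ignores the few
# tensors kept at bf16, so the real file should come out slightly larger.
print(f"~{12.37 * ratio_vs_bf16:.2f} GB predicted")
```

The prediction (about 6.57 GB) lands just under the observed 6.59 GB, which is consistent with a handful of small tensors staying at bf16.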

## Quality notes vs bf16

- **Photorealism and content fidelity preserved.** Cats, dragons, portraits, etc. all generate cleanly.
- **Fine text on generated objects shows slight degradation.** For example, "STOP" on a sign may render as "SNICS" or a similar near-miss. The rest of the content is correct (correct color, correct rectangular sign shape, recognizable text-like glyphs).
- For prompts that don't require legible in-image text, output is visually indistinguishable from bf16 to a casual eye.

## Quickstart

```python
from huggingface_hub import snapshot_download

# Download (or reuse from the local HF cache) every file in this repo; returns the local path.
weights = snapshot_download("mlx-community/Lance-3B-8bit")
```

### Text-to-image

```python
from lance_mlx.pipeline.t2i import TextToImagePipeline

# `weights` is the local snapshot path returned by snapshot_download in the Quickstart.
pipe = TextToImagePipeline.from_pretrained(
    lance_weights_dir=weights,
    vae_safetensors=f"{weights}/vae.safetensors",
)

# 768×768 image, 30 sampling steps, classifier-free guidance scale 4.0, fixed seed.
image = pipe.generate(
    "A photorealistic tabby cat in a sunlit window.",
    height=768, width=768, num_steps=30, cfg_scale=4.0, seed=42,
)
image.save("cat.png")
```

### Image editing + VQA

Same API as the bf16 variant: `ImageEditPipeline` and `UnderstandingPipeline` both pick up the `quantization` block in `config.json` automatically via `lance_mlx.model._loader.load_lance_model`.
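
Detecting the quantization block amounts to reading `config.json`. The sketch below is hypothetical: `read_quantization` is not part of `lance_mlx`, and it assumes only the `bits`, `group_size`, and `mode` keys that this card lists for the block. The real `load_lance_model` likely does more, such as converting layers before loading weights.

```python
import json
from pathlib import Path
from typing import Optional


def read_quantization(weights_dir: str) -> Optional[dict]:
    """Return the `quantization` block from config.json, or None for bf16 checkpoints."""
    config = json.loads((Path(weights_dir) / "config.json").read_text())
    quant = config.get("quantization")
    if quant is None:
        return None  # unquantized checkpoint, e.g. the bf16 variant
    # Assumption: fall back to affine when `mode` is absent.
    return {
        "bits": int(quant["bits"]),
        "group_size": int(quant["group_size"]),
        "mode": quant.get("mode", "affine"),
    }
```

With `weights` from the Quickstart, `read_quantization(weights)` should report 8 bits, group size 64, affine mode for this repo. A typical MLX loader would quantize the model skeleton with those settings so that the `weight`/`scales`/`biases` tensors in `model.safetensors` have matching slots.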

## What's quantized vs skipped

| Component | Quantization | Why |
|---|---|---|
| `embed_tokens` (151,936 × 2,048) | 8-bit | Big, tolerant |
| `lm_head` (151,936 × 2,048) | 8-bit | Big, used in AR decode only |
| 32 layers × `q/k/v/o_proj` (UND) | 8-bit | Bulk of LLM compute |
| 32 layers × `q/k/v/o_proj_moe_gen` (GEN) | 8-bit | Bulk of GEN compute |
| 32 layers × `mlp.{up,gate,down}_proj` | 8-bit | Bulk of LLM compute |
| 32 layers × `mlp_moe_gen.{up,gate,down}` | 8-bit | Bulk of GEN compute |
| `time_embedder.proj_in/out` | Skipped (bf16) | Timestep info, numerically sensitive |
| `llm2vae` (flow head, 2048 × 48) | Skipped (bf16) | Tiny + critical to flow prediction |
| `vae_in_proj.vae2llm` (2048 × 48) | Skipped (bf16) | Auto-skipped (input dim 48 is not a multiple of group size 64) |
| `latent_pos_embed.pos_embed` | Skipped (bf16) | Custom param holder, no `to_quantized` |
| All RMSNorms + QK-norms | Skipped (bf16) | F32 / bf16 norm scales preserved |
| Wan2.2 VAE (encoder + decoder) | Skipped (bf16) | Pixel fidelity matters |
| Qwen2.5-VL ViT | Skipped (bf16) | Semantic fidelity matters for x2t |

**Recipe:** 8-bit affine, `group_size` 64. `quantization_report.json` in this repo has full provenance.
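
For intuition about what "8-bit affine, group_size 64" means, here is the textbook min/max version for a single group: the group stores one scale and one bias, and each weight becomes an integer in [0, 255]. This is a simplified illustration, not MLX's kernel (which differs in rounding and edge-case details); `quantize_group` and `dequantize_group` are hypothetical helpers, and the 64 weights are synthetic.

```python
from typing import List, Tuple


def quantize_group(w: List[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Min/max affine quantization of one group, so that w ≈ q * scale + bias."""
    lo, hi = min(w), max(w)
    levels = 2**bits - 1  # 255 for 8-bit
    scale = (hi - lo) / levels if hi > lo else 1.0
    bias = lo
    # Clamp guards against floating-point rounding just past the top level.
    q = [min(levels, max(0, round((x - bias) / scale))) for x in w]
    return q, scale, bias


def dequantize_group(q: List[int], scale: float, bias: float) -> List[float]:
    """Map integer codes back to approximate weights."""
    return [v * scale + bias for v in q]


# One synthetic group of 64 weights (group_size = 64), spanning -0.32 .. 0.31.
group = [((i * 37) % 64 - 32) / 100 for i in range(64)]
q, scale, bias = quantize_group(group)
recon = dequantize_group(q, scale, bias)
max_err = max(abs(a - b) for a, b in zip(group, recon))
print(f"scale={scale:.5f}, max error={max_err:.5f} (bound: scale/2)")
```

Because every group gets its own scale, an outlier only widens the quantization step for its 64 neighbours; smaller groups tighten the bound further at the cost of more scale/bias storage.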

## Why no Video 8-bit yet

The video specialist (`Lance_3B_Video`) does **not** quantize cleanly to 8-bit with this recipe: t2v output collapses to a gray gradient whether the GEN tower is included or skipped, and finer group sizes don't help. The video-specialist fine-tune has different weight distributions that affine 8-bit can't capture.

Reza2kn/lance-quant's findings suggest that **DWQ (distilled weight quantization)** with calibration is the right approach for Lance video at 8-bit and below. That's a Phase 5c project. For now, use [`mlx-community/Lance-3B-Video-bf16`](https://huggingface.co/mlx-community/Lance-3B-Video-bf16) at bf16 for video tasks.

## Files in this repo

| File | Size | Notes |
|---|---|---|
| `model.safetensors` | 6.59 GB | Quantized LLM weights (2,033 tensors; each quantized Linear is stored as weight + scales + biases) |
| `vit.safetensors` | 1.34 GB | bf16 (not quantized) |
| `vae.safetensors` | 1.41 GB | bf16 (not quantized) |
| `config.json` | – | Includes the `quantization` block (`bits=8, group_size=64, mode=affine`) |
| `quantization_report.json` | – | Provenance + footprint stats |
| `tokenizer.json` / `vocab.json` | – | Qwen2.5-VL vocabulary |
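
Each quantized Linear in `model.safetensors` expands into three tensors. Assuming MLX's packing convention (8-bit codes packed four to a `uint32`, one scale and one bias per group of 64 along the input dimension), the shapes for a weight of shape `(out, in)` work out as below. `quantized_shapes` is a hypothetical helper, and the 2,048 × 2,048 example is chosen for round numbers rather than taken from a specific layer.

```python
from typing import Dict, Tuple


def quantized_shapes(out_dims: int, in_dims: int, bits: int = 8, group_size: int = 64) -> Dict[str, Tuple[int, int]]:
    """Shapes of the tensors that replace one (out, in) Linear weight."""
    if in_dims % group_size != 0:
        raise ValueError("input dim must be a multiple of group_size")
    per_word = 32 // bits  # quantized values packed into each uint32
    return {
        "weight": (out_dims, in_dims // per_word),    # packed uint32 codes
        "scales": (out_dims, in_dims // group_size),  # one scale per group
        "biases": (out_dims, in_dims // group_size),  # one bias per group
    }


print(quantized_shapes(2048, 2048))
# {'weight': (2048, 512), 'scales': (2048, 32), 'biases': (2048, 32)}
```

The `ValueError` branch is the same condition that leaves `vae_in_proj.vae2llm` (input dim 48) at bf16.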

## Architecture (same as the bf16 variant)

See [`mlx-community/Lance-3B-bf16`](https://huggingface.co/mlx-community/Lance-3B-bf16) for the full architecture description.

## License

This MLX port + quantization: **Apache 2.0**.

Underlying weights:
- Lance: Apache 2.0 (ByteDance Intelligent Creation Lab).
- Wan2.2 VAE: Apache 2.0 (Alibaba).
- Qwen2.5-VL: Apache 2.0 (Alibaba).

## Citation

```bibtex
@article{fu2026lance,
  title={Lance: Unified Multimodal Modeling by Multi-Task Synergy},
  author={Fu, Fengyi and Huang, Mengqi and Wu, Shaojin and others},
  journal={arXiv preprint arXiv:2605.18678},
  year={2026}
}
```

## Links

- **MLX port code:** [`github.com/xocialize/lance-mlx`](https://github.com/xocialize/lance-mlx)
- **bf16 source:** [`mlx-community/Lance-3B-bf16`](https://huggingface.co/mlx-community/Lance-3B-bf16)
- **Standalone VAE:** [`mlx-community/Wan2.2-VAE-Lance-bf16`](https://huggingface.co/mlx-community/Wan2.2-VAE-Lance-bf16)
- **Video specialist (bf16, alpha 8-bit pending):** [`mlx-community/Lance-3B-Video-bf16`](https://huggingface.co/mlx-community/Lance-3B-Video-bf16)