---
license: mit
library_name: mlx
pipeline_tag: text-to-image
tags: [mlx, text-to-image, diffusion, lens, apple-silicon, quantized]
base_model: microsoft/Lens
---

# Lens-3.8B-8bit (MLX)

Apple **MLX** conversion of the [`microsoft/Lens`](https://huggingface.co/microsoft/Lens) 3.8B text-to-image DiT: **int8 (group_size 64), keeping the small in/out/time projections at bf16. Higher-fidelity quant, ~2x smaller than bf16.** ~4.39 GB.

This repo contains the **DiT only** (MIT). The full pipeline also uses the GPT-OSS-20B text encoder (Apache-2.0) and the FLUX.2 semantic VAE, pulled from source by the loader (see **License**).

Full-precision: [Lens-3.8B-bf16](https://huggingface.co/mlx-community/Lens-3.8B-bf16).

Fidelity: single-pass DiT cosine **0.99998** vs the PyTorch reference. Quantization changes the denoise trajectory, so quantized samples differ in composition from bf16 but are equally sharp.

![sample](sample.png)

## Usage

```python
from lens_mlx.pipeline_mlx import LensPipeline  # github.com/xocialize-code/lens-mlx

# `base` = a microsoft/Lens snapshot providing the tokenizer, GPT-OSS encoder, and FLUX.2 VAE.
pipe = LensPipeline.from_pretrained(base, dit_repo="mlx-community/Lens-3.8B-8bit")
img = pipe("A serene lake below snow-capped mountains, golden hour.",
           height=1024, width=1024, num_inference_steps=20, seed=42)
img.save("out.png")
```

## License

- **DiT weights (this repo):** MIT, from `microsoft/Lens`.
- **GPT-OSS-20B encoder:** Apache-2.0 (reuse `mlx-community/gpt-oss-20b-MXFP4-*`).
- **FLUX.2 VAE:** its own (FLUX.2-dev) terms; **not re-hosted**, fetched from source.

Upstream: [microsoft/Lens](https://huggingface.co/microsoft/Lens) · MLX port: [xocialize-code/lens-mlx](https://github.com/xocialize-code/lens-mlx)
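
## Fidelity metric (sketch)

The fidelity figure quoted in this card is a cosine similarity between the MLX DiT output and the PyTorch reference for a single forward pass. The evaluation script itself is not part of this card, so the following is only a minimal sketch of how such a number could be computed, assuming both outputs have already been flattened to equal-length sequences of floats. The names `cosine_similarity`, `reference`, and `candidate` are illustrative, and the toy values are not model outputs.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equally sized, flattened tensors."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins (assumption): a reference output and a slightly perturbed copy,
# playing the role of the PyTorch and MLX DiT outputs respectively.
reference = [0.5, -1.25, 2.0, 0.75]
candidate = [0.5, -1.25, 2.0, 0.76]

score = cosine_similarity(reference, candidate)
print(f"cosine similarity: {score:.5f}")
```

A value close to 1 means the two outputs point in nearly the same direction. Because cosine similarity ignores overall scale, it is usually worth checking a maximum absolute difference alongside it.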