Lens-3.8B-8bit (MLX)

Apple MLX conversion of the microsoft/Lens 3.8B text-to-image DiT, quantized to int8 (group_size 64) with the small input, output, and time projections kept at bf16. A higher-fidelity quantization, roughly 2x smaller than bf16 (about 4.39 GB).
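To make "int8 (group_size 64)" concrete, here is a minimal pure-Python sketch of group-wise affine quantization: every run of 64 weights gets its own scale and offset, and each weight is stored as an 8-bit code. This is a generic illustration with made-up weights, not the MLX kernel; the function names are hypothetical, and MLX's own packing and dtype handling are not shown.

```python
import random

def quantize_group(values, bits=8):
    """Affine-quantize one group of floats to unsigned `bits`-bit codes."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1            # 255 representable steps for 8 bits
    scale = (hi - lo) / levels or 1.0   # fall back to 1.0 for a constant group
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo             # `lo` acts as the per-group offset

def quantize(weights, group_size=64, bits=8):
    """Split a flat weight list into groups and quantize each independently."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]

def dequantize(groups):
    """Reconstruct approximate floats from (codes, scale, offset) triples."""
    restored = []
    for codes, scale, offset in groups:
        restored.extend(c * scale + offset for c in codes)
    return restored

# Illustrative data only: random values standing in for real weights.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]

groups = quantize(weights)
restored = dequantize(groups)

# Round-to-nearest bounds the error by half a quantization step per group.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
max_step = max(scale for _, scale, _ in groups)
print(f"groups={len(groups)} max_err={max_err:.2e} max_step={max_step:.2e}")
```

Because each group carries its own scale and offset, an outlier weight only coarsens the step size for the 64 values it shares a group with, which is one reason small group sizes tend to preserve fidelity.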

This repo contains the DiT only (MIT). The full pipeline also uses the GPT-OSS-20B text encoder (Apache-2.0) and the FLUX.2 semantic VAE, which the loader fetches from their source repositories (see License). The full-precision version is Lens-3.8B-bf16.

Fidelity: a single DiT forward pass has a cosine similarity of 0.99998 against the PyTorch reference. Quantization changes the denoising trajectory, so quantized samples differ in composition from bf16 samples but are equally sharp.
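The fidelity figure is a cosine similarity between two output tensors. As a rough sketch of how such a number can be computed, the snippet below flattens both outputs into lists and compares them. The helper name and the sample values are placeholders for illustration; they are not real model outputs and this is not the script used to produce the figure above.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length flattened tensors."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Placeholder values standing in for flattened DiT outputs.
reference = [0.12, -0.50, 0.33, 0.91]     # e.g. PyTorch reference output
quantized = [0.121, -0.498, 0.331, 0.909]  # e.g. quantized MLX output

score = cosine_similarity(reference, quantized)
print(f"cosine similarity: {score:.6f}")
```

A value near 1.0 means the two outputs point in almost the same direction for that one forward pass. It does not guarantee identical images after many denoising steps, since small per-step differences accumulate along the trajectory.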

Usage

from lens_mlx.pipeline_mlx import LensPipeline   # github.com/xocialize/lens-mlx

# `base` = a microsoft/Lens snapshot providing the tokenizer, GPT-OSS encoder, and FLUX.2 VAE.
base = "path/to/microsoft-Lens-snapshot"  # placeholder: replace with your own snapshot location
pipe = LensPipeline.from_pretrained(base, dit_repo="mlx-community/Lens-3.8B-8bit")
img = pipe("A serene lake below snow-capped mountains, golden hour.",
           height=1024, width=1024, num_inference_steps=20, seed=42)
img.save("out.png")

License

DiT weights (this repo): MIT, from microsoft/Lens.
GPT-OSS-20B encoder: Apache-2.0 (an existing mlx-community/gpt-oss-20b-MXFP4-* conversion can be reused).
FLUX.2 VAE: covered by its own (FLUX.2-dev) license terms; not re-hosted here, fetched from source.

Upstream: microsoft/Lens · MLX port: xocialize/lens-mlx