	---
	license: mit
	library_name: mlx
	pipeline_tag: text-to-image
	tags: [mlx, text-to-image, diffusion, lens, apple-silicon, quantized]
	base_model: microsoft/Lens
	---

	# Lens-3.8B-8bit (MLX)

	Apple MLX conversion of the [`microsoft/Lens`](https://huggingface.co/microsoft/Lens)
	3.8B text-to-image DiT, quantized to int8 (group_size 64) with the small in/out/time projections kept at bf16. A higher-fidelity quant at about 4.39 GB, roughly 2x smaller than bf16.
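
For readers unfamiliar with group-wise quantization, here is a minimal pure-Python sketch of the general idea behind "int8, group_size 64": each run of 64 weights shares one scale and one bias, and every weight is stored as an 8-bit code. The helper names (`quantize_group`, `dequantize_group`, `quantize`, `bits_per_weight`) are hypothetical and are not the MLX API. The 16-bit scale and bias per group is an assumption about the storage layout, not something this card states.

```python
# Illustrative group-wise affine quantization.
# All names here are hypothetical helpers, not part of MLX or lens-mlx.

def quantize_group(values, bits=8):
    """Map one group of floats to integer codes plus a shared scale and bias."""
    levels = (1 << bits) - 1                 # 255 steps for 8 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0        # fall back to 1.0 for a constant group
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from the codes of one group."""
    return [c * scale + bias for c in codes]


def quantize(weights, group_size=64, bits=8):
    """Quantize a flat list of weights, one (codes, scale, bias) tuple per group."""
    if len(weights) % group_size != 0:
        raise ValueError("weight count must be a multiple of group_size")
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


def bits_per_weight(group_size=64, bits=8, meta_bits=16):
    """Effective storage per weight, assuming one scale and one bias of
    `meta_bits` each per group (an assumption, see the lead-in)."""
    return bits + 2 * meta_bits / group_size
```

Under that assumption, `bits_per_weight()` comes to 8.5 bits, so the quantized layers would be about 16 / 8.5 ≈ 1.9 times smaller than bf16, which is in line with the "roughly 2x" figure. The reconstruction error of each weight is bounded by half of its group's scale, which is why smaller groups generally preserve more fidelity.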

	This repo contains the DiT only (MIT). The full pipeline also uses the GPT-OSS-20B text
	encoder (Apache-2.0) and the FLUX.2 semantic VAE, pulled from source by the loader (see
	License). Full-precision: [Lens-3.8B-bf16](https://huggingface.co/mlx-community/Lens-3.8B-bf16).

	Fidelity: a single DiT forward pass has a cosine similarity of 0.99998 against the PyTorch reference. Quantization changes the
	denoising trajectory, so quantized samples differ in composition from bf16 but are equally sharp.
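
The card does not say exactly how the cosine figure was measured. A common approach, sketched here as an assumption, is to run one forward pass on identical inputs in both implementations, flatten the two outputs, and compare them. The function name `cosine_similarity` and the example vectors below are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up stand-ins for a flattened reference output and a ported output.
reference = [0.12, -0.40, 0.33, 0.91]
ported = [0.12, -0.41, 0.33, 0.90]
similarity = cosine_similarity(reference, ported)
```

A value close to 1.0 means the two outputs point in almost the same direction. It says nothing about how small per-step differences accumulate over a full sampling run, which is consistent with samples differing in composition despite a high single-pass score.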

	![sample](sample.png)

	## Usage

	```python
	from huggingface_hub import snapshot_download
	from lens_mlx.pipeline_mlx import LensPipeline  # github.com/xocialize-code/lens-mlx

	# `base` = a microsoft/Lens snapshot providing the tokenizer, GPT-OSS encoder, and FLUX.2 VAE.
	# Assumption: a local snapshot directory is what the loader expects; adjust if it differs.
	base = snapshot_download("microsoft/Lens")
	pipe = LensPipeline.from_pretrained(base, dit_repo="mlx-community/Lens-3.8B-8bit")
	img = pipe(
	    "A serene lake below snow-capped mountains, golden hour.",
	    height=1024, width=1024, num_inference_steps=20, seed=42,
	)
	img.save("out.png")
	```

	## License
	- DiT weights (this repo): MIT, from `microsoft/Lens`.
	- GPT-OSS-20B encoder: Apache-2.0 (reuse `mlx-community/gpt-oss-20b-MXFP4-*`).
	- FLUX.2 VAE: governed by its own (FLUX.2-dev) terms; not re-hosted here, fetched from source.

	Upstream: [microsoft/Lens](https://huggingface.co/microsoft/Lens) ·
	MLX port: [xocialize-code/lens-mlx](https://github.com/xocialize-code/lens-mlx)