	---
	license: mit
	library_name: mlx
	pipeline_tag: text-to-image
	tags:
	- mlx
	- text-to-image
	- diffusion
	- lens
	- apple-silicon
	base_model: microsoft/Lens
	---

	# Lens-3.8B-bf16 (MLX)

	Apple MLX conversion of the denoising transformer (DiT) from
	[`microsoft/Lens`](https://huggingface.co/microsoft/Lens), a 3.8B-parameter foundation
	text-to-image model, for fast inference on Apple Silicon. Weights are stored in bf16 (unquantized).

	This repo contains the DiT only (MIT-licensed). The full pipeline also uses the
	GPT-OSS-20B text encoder (Apache-2.0) and the FLUX.2 semantic VAE; neither is re-hosted here,
	and the loader fetches both from their own sources (see License below).

	| component | parity vs PyTorch reference |
	|---|---|
	| GPT-OSS text features | per-layer cosine ≈ 0.998 |
	| Lens DiT (this repo) | cosine 0.999999 |
	| FLUX.2 VAE decode | PSNR 57.65 dB |
	| full end-to-end image | PSNR 45.26 dB |
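For readers who want to reproduce numbers like those in the parity table, the two metrics can be computed as in this NumPy sketch. It assumes images are float arrays scaled to [0, 1] (`max_val=1.0`); the card does not state the value range or exactly which tensors were compared, so the function names and defaults here are illustrative, not the lens-mlx test code.

```python
from typing import Sequence

import numpy as np


def cosine_similarity(ref: np.ndarray, out: np.ndarray) -> float:
    """Cosine similarity between two tensors, both flattened to 1-D."""
    a = np.asarray(ref, dtype=np.float64).ravel()
    b = np.asarray(out, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        raise ValueError("cosine similarity is undefined for an all-zero tensor")
    return float(np.dot(a, b) / denom)


def per_layer_cosine(refs: Sequence[np.ndarray], outs: Sequence[np.ndarray]) -> list:
    """One cosine value per layer, e.g. hidden states from each encoder layer."""
    if len(refs) != len(outs):
        raise ValueError("reference and output must have the same number of layers")
    return [cosine_similarity(r, o) for r, o in zip(refs, outs)]


def psnr(ref: np.ndarray, out: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB, assuming values lie in [0, max_val]."""
    ref = np.asarray(ref, dtype=np.float64)
    out = np.asarray(out, dtype=np.float64)
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val**2 / mse))


# A uniform error of 0.01 gives MSE = 1e-4, i.e. roughly 40 dB at max_val = 1.0.
print(psnr(np.zeros(4), np.full(4, 0.01)))
```

Casting to float64 before comparing keeps the metric itself from adding noise when the inputs are bf16.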

	Generates a 1024×1024 image in ~33 s on Apple Silicon (20 steps, ~39 GB peak memory).

	![sample](sample.png)

	## Usage

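	```python
	from huggingface_hub import snapshot_download
	from lens_mlx.pipeline_mlx import LensPipeline  # github.com/xocialize-code/lens-mlx
	
	# `base` = a local microsoft/Lens snapshot providing the tokenizer, GPT-OSS encoder,
	# and FLUX.2 VAE (assumes from_pretrained accepts a local snapshot path).
	base = snapshot_download("microsoft/Lens")
	
	# The DiT weights are loaded from this repo instead of the upstream checkpoint.
	pipe = LensPipeline.from_pretrained(base, dit_repo="mlx-community/Lens-3.8B-bf16")
	img = pipe(
	    "A serene lake below snow-capped mountains, golden hour.",
	    height=1024, width=1024, num_inference_steps=20, seed=42,
	)
	img.save("out.png")
	```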
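To check the ~33 s figure on your own machine, a small timing wrapper like the one below could be used. `benchmark` is a hypothetical helper, not part of lens-mlx. The warm-up run is excluded because the first call may include one-time costs such as weight loading or kernel compilation. MLX evaluates lazily, so the timed callable must force computation; producing the image that `img.save` writes presumably does, but if a callable returns a raw `mx.array`, call `mx.eval` on it inside the callable.

```python
import statistics
import time
from typing import Any, Callable


def benchmark(fn: Callable[[], Any], warmup: int = 1, runs: int = 3) -> dict:
    """Time a zero-argument callable; warm-up calls are not included in the stats."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return {"mean_s": statistics.mean(times), "min_s": min(times), "runs": runs}


# Usage with `pipe` from the Usage example (not executed here):
# stats = benchmark(lambda: pipe("A serene lake below snow-capped mountains, golden hour.",
#                                height=1024, width=1024, num_inference_steps=20, seed=42))
# print(stats)
```

Reporting the minimum alongside the mean helps separate steady-state speed from occasional slow runs.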

	## Conversion

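	Converted from `microsoft/Lens` with `recipes/convert_lens.py` (lens-mlx). The DiT is built
	only from Linear and RMSNorm layers (no convolutions), and MLX stores Linear weights in the
	same `(out, in)` layout as PyTorch, so weights map 1:1 with no transpose. Because MLX is lazy,
	every tensor is explicitly materialized before saving. Layer-by-layer parity tests against
	the PyTorch reference are in the lens-mlx test suite.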
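The no-transpose rule follows from how the two frameworks lay out weights. This NumPy sketch uses a hypothetical helper, `to_mlx_layout`, which is not part of lens-mlx and not the logic of `recipes/convert_lens.py`. Linear weights are `(out, in)` in both PyTorch and MLX and norm scales are 1-D, so they pass through untouched. Convolutions are where a converter would need a transpose, because MLX convolution weights are channels-last. The layer kind is passed explicitly rather than guessed from `ndim`, since a 3-D or 4-D tensor in a transformer is not necessarily a convolution.

```python
import numpy as np


def to_mlx_layout(w: np.ndarray, kind: str) -> np.ndarray:
    """Hypothetical helper: map one PyTorch weight tensor to the layout MLX expects."""
    if kind in ("linear", "norm"):
        # Linear: (out, in) in both frameworks, and both compute x @ W.T + b.
        # RMSNorm scale: (dim,) in both. No change needed.
        return w
    if kind == "conv1d":
        # PyTorch Conv1d: (out, in, k)  ->  MLX Conv1d: (out, k, in)
        return np.transpose(w, (0, 2, 1))
    if kind == "conv2d":
        # PyTorch Conv2d: (out, in, kH, kW)  ->  MLX Conv2d: (out, kH, kW, in)
        return np.transpose(w, (0, 2, 3, 1))
    raise ValueError(f"unknown layer kind: {kind!r}")


# A Linear weight keeps its shape; a Conv2d weight moves channels last.
print(to_mlx_layout(np.zeros((8, 4)), "linear").shape)
print(to_mlx_layout(np.zeros((16, 3, 5, 7)), "conv2d").shape)
```

For a DiT made only of Linear and RMSNorm layers, every call takes the first branch, which is why the conversion can copy tensors 1:1.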

	## License

	- DiT weights (this repo): MIT, inherited from `microsoft/Lens`.
	- GPT-OSS-20B encoder: Apache-2.0 (not included; reuse the mlx-community MXFP4 repo).
	- FLUX.2 VAE: governed by its own (FLUX.2-dev) terms — not re-hosted here; the
	loader fetches it from source. Verify the VAE license for your use case.

	## Citation

	Upstream: [microsoft/Lens](https://huggingface.co/microsoft/Lens) ·
	MLX port: [xocialize-code/lens-mlx](https://github.com/xocialize-code/lens-mlx)