---
license: mit
library_name: mlx
pipeline_tag: text-to-image
tags:
- mlx
- text-to-image
- diffusion
- lens
- apple-silicon
base_model: microsoft/Lens
---
# Lens-3.8B-bf16 (MLX)
This is an Apple **MLX** conversion of the denoising transformer (DiT) from
[`microsoft/Lens`](https://huggingface.co/microsoft/Lens), a 3.8B-parameter foundational
text-to-image model, for fast inference on Apple Silicon. Weights are stored in **bf16**,
unquantized.
This repo contains the **DiT only** (MIT-licensed). The full pipeline also uses the
GPT-OSS-20B text encoder (Apache-2.0) and the FLUX.2 semantic VAE, which the loader pulls
from their own sources rather than re-hosting here (see **License** below).
| component | parity vs PyTorch reference |
|---|---|
| GPT-OSS text features | per-layer cosine ≈ 0.998 |
| **Lens DiT (this repo)** | **cosine 0.999999** |
| FLUX.2 VAE decode | PSNR 57.65 dB |
| full end-to-end image | PSNR 45.26 dB |
Generates a 1024×1024 image in ~33 s on Apple Silicon (20 steps, ~39 GB peak memory).
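
The parity metrics in the table above (cosine similarity and PSNR) can be sketched in a few lines. This is a minimal illustration in plain Python, not the code used by the lens-mlx test suite: the function names are hypothetical, inputs are assumed to be flattened to equal-length sequences of floats, and `peak=1.0` assumes pixel values in `[0, 1]`, which this card does not state.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length flat sequences of floats."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def psnr(reference, candidate, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is the assumed maximum pixel value."""
    if len(reference) != len(candidate):
        raise ValueError("inputs must have the same length")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no error to measure
    return 10.0 * math.log10(peak * peak / mse)


# Toy values standing in for flattened reference (PyTorch) and ported (MLX) outputs.
reference = [0.10, 0.50, 0.90, 0.30]
ported = [0.10, 0.51, 0.89, 0.30]
print(f"cosine = {cosine_similarity(reference, ported):.6f}")
print(f"PSNR   = {psnr(reference, ported):.2f} dB")
```

A cosine close to 1 means the two outputs point in nearly the same direction, and a higher PSNR means a smaller mean squared error relative to the peak value.
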
![sample](sample.png)
## Usage
```python
from huggingface_hub import snapshot_download
from lens_mlx.pipeline_mlx import LensPipeline  # github.com/xocialize-code/lens-mlx

# `base` = a microsoft/Lens snapshot providing the tokenizer, GPT-OSS encoder, and FLUX.2 VAE.
# Assumption: a local snapshot directory works here; point `base` at your own copy if needed.
base = snapshot_download("microsoft/Lens")

pipe = LensPipeline.from_pretrained(base, dit_repo="mlx-community/Lens-3.8B-bf16")
img = pipe("A serene lake below snow-capped mountains, golden hour.",
           height=1024, width=1024, num_inference_steps=20, seed=42)
img.save("out.png")
```
## Conversion
Converted from `microsoft/Lens` with `recipes/convert_lens.py` from the lens-mlx repository.
The DiT consists only of Linear and RMSNorm layers; weights map 1:1 (no transpose), and every
tensor is materialized before saving. Layer-by-layer parity against the PyTorch reference is
checked in the lens-mlx test suite.
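
One way to sanity-check a 1:1 mapping is to compare parameter names and shapes between the source and converted checkpoints. The sketch below is a hypothetical helper, not part of `recipes/convert_lens.py`; it assumes you have already collected a `name -> shape` dictionary for each checkpoint, and the parameter names in the toy example are made up rather than taken from the real Lens state dict.

```python
def find_mapping_mismatches(source_shapes, converted_shapes):
    """Return a list of problems; an empty list means names and shapes match 1:1."""
    problems = []
    for name in sorted(source_shapes.keys() - converted_shapes.keys()):
        problems.append(f"missing in converted: {name}")
    for name in sorted(converted_shapes.keys() - source_shapes.keys()):
        problems.append(f"unexpected in converted: {name}")
    for name in sorted(source_shapes.keys() & converted_shapes.keys()):
        src = tuple(source_shapes[name])
        dst = tuple(converted_shapes[name])
        if src != dst:
            problems.append(f"shape mismatch for {name}: {src} vs {dst}")
    return problems


# Toy example with hypothetical parameter names (not the real Lens key names).
source = {"blocks.0.attn.qkv.weight": (9, 3), "blocks.0.norm.weight": (3,)}
converted_ok = dict(source)
converted_transposed = {**source, "blocks.0.attn.qkv.weight": (3, 9)}

print(find_mapping_mismatches(source, converted_ok))
print(find_mapping_mismatches(source, converted_transposed))
```

A shape check alone cannot detect a transposed square matrix, so it complements value-level parity tests rather than replacing them.
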
## License
- **DiT weights (this repo):** MIT, inherited from `microsoft/Lens`.
- **GPT-OSS-20B encoder:** Apache-2.0 (not included; reuse the mlx-community MXFP4 repo).
- **FLUX.2 VAE:** governed by its own (FLUX.2-dev) terms — **not re-hosted here**; the
loader fetches it from source. Verify the VAE license for your use case.
## Citation
Upstream: [microsoft/Lens](https://huggingface.co/microsoft/Lens) ·
MLX port: [xocialize-code/lens-mlx](https://github.com/xocialize-code/lens-mlx)