---
license: mit
library_name: mlx
pipeline_tag: text-to-image
tags: [mlx, text-to-image, diffusion, lens, apple-silicon, quantized]
base_model: microsoft/Lens
---

# Lens-3.8B-4bit (MLX)

Apple **MLX** conversion of the [`microsoft/Lens`](https://huggingface.co/microsoft/Lens)
3.8B text-to-image DiT, quantized to **int4 (group_size 64)** with the small in/out/time projections kept at bf16 for fidelity. About **2.35 GB**, roughly 3.5x smaller than bf16.
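
The size figures can be sanity-checked with a little arithmetic. The sketch below assumes MLX-style affine quantization, where each group of 64 weights carries one 16-bit scale and one 16-bit bias in addition to the 4-bit values; the function names and the flat 3.8e9 parameter count are illustrative, not taken from the conversion script.

```python
def bits_per_weight(bits: int, group_size: int, overhead_bits: int = 32) -> float:
    """Effective bits per weight for group-wise affine quantization.

    overhead_bits is the per-group metadata: assumed here to be one
    16-bit scale plus one 16-bit bias.
    """
    return bits + overhead_bits / group_size


def size_gb(n_params: float, bpw: float) -> float:
    """Storage in decimal gigabytes for n_params weights at bpw bits each."""
    return n_params * bpw / 8 / 1e9


bpw = bits_per_weight(bits=4, group_size=64)   # 4.5 bits per weight
ratio = 16 / bpw                               # compression relative to bf16
estimate = size_gb(3.8e9, bpw)                 # if every weight were quantized

print(f"{bpw} bits/weight, {ratio:.2f}x smaller, {estimate:.2f} GB")
```

Under these assumptions the ratio lands close to the stated 3.5x. The estimate comes out a little below 2.35 GB, which is presumably accounted for by the projections kept at bf16 and any other unquantized tensors.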

This repo contains the **DiT only** (MIT). The full pipeline also uses the GPT-OSS-20B text
encoder (Apache-2.0) and the FLUX.2 semantic VAE, pulled from source by the loader (see
**License**). Full-precision: [Lens-3.8B-bf16](https://huggingface.co/mlx-community/Lens-3.8B-bf16).

Fidelity: a single DiT forward pass reaches cosine similarity **0.9976** against the PyTorch reference. Quantization changes the
denoising trajectory, so quantized samples differ in composition from bf16 samples but are equally sharp.
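
For readers unfamiliar with the metric, here is a minimal sketch of cosine similarity between two flattened outputs. The card does not say exactly which tensors were compared, so treating them as the flattened DiT outputs for identical inputs is an assumption, and `cosine_similarity` is a hypothetical helper rather than part of `lens-mlx`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for a reference output and a quantized output.
reference = [0.10, -0.42, 0.88, 0.05]
quantized = [0.11, -0.40, 0.90, 0.02]
print(round(cosine_similarity(reference, quantized), 4))
```

A value of 1.0 means the two outputs point in exactly the same direction; values slightly below 1.0 indicate a small directional error that can still accumulate over many denoising steps.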

![sample](sample.png)

## Usage

```python
from lens_mlx.pipeline_mlx import LensPipeline   # github.com/xocialize-code/lens-mlx

# `base` = a microsoft/Lens snapshot providing the tokenizer, GPT-OSS encoder, and FLUX.2 VAE.
base = "path/to/microsoft-Lens-snapshot"  # placeholder: point this at your own snapshot

# Load the pipeline with the 4-bit DiT weights from this repo.
pipe = LensPipeline.from_pretrained(base, dit_repo="mlx-community/Lens-3.8B-4bit")
img = pipe("A serene lake below snow-capped mountains, golden hour.",
           height=1024, width=1024, num_inference_steps=20, seed=42)
img.save("out.png")
```

## License
- **DiT weights (this repo):** MIT, from `microsoft/Lens`.
- **GPT-OSS-20B encoder:** Apache-2.0 (reuse `mlx-community/gpt-oss-20b-MXFP4-*`).
- **FLUX.2 VAE:** its own (FLUX.2-dev) terms — **not re-hosted**; fetched from source.

Upstream: [microsoft/Lens](https://huggingface.co/microsoft/Lens) ·
MLX port: [xocialize-code/lens-mlx](https://github.com/xocialize-code/lens-mlx)