Add Lens-3.8B-8bit DiT weights + model card

Files changed (5)

.gitattributes +1 -0
README.md +32 -6
config.json +38 -0
model.safetensors +3 -0
sample.png +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+sample.png filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -2,14 +2,40 @@
 license: mit
 library_name: mlx
 pipeline_tag: text-to-image
-tags: [mlx, text-to-image, diffusion, lens, apple-silicon]
+tags: [mlx, text-to-image, diffusion, lens, apple-silicon, quantized]
 base_model: microsoft/Lens
 ---
-# Lens-3.8B-8bit (MLX) — coming soon
-`int8`/8bit MLX conversion of the [microsoft/Lens](https://huggingface.co/microsoft/Lens) 3.8B
-text-to-image DiT for Apple Silicon (~4.36 GB). Weights upload imminent.
-Full-precision release: [mlx-community/Lens-3.8B-bf16](https://huggingface.co/mlx-community/Lens-3.8B-bf16).
-Code: [xocialize-code/lens-mlx](https://github.com/xocialize-code/lens-mlx).
+# Lens-3.8B-8bit (MLX)
+Apple **MLX** conversion of the [`microsoft/Lens`](https://huggingface.co/microsoft/Lens)
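+3.8B text-to-image DiT, quantized to **int8 (group_size 64)** with the small input, output, time-embedding and final-norm layers (`img_in`, `txt_in`, `proj_out`, `time_text_embed`, `norm_out`) kept at bf16: a higher-fidelity quant, **~2x smaller than bf16** at ~4.39 GB.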
+This repo contains the **DiT only** (MIT). The full pipeline also uses the GPT-OSS-20B text
+encoder (Apache-2.0) and the FLUX.2 semantic VAE, pulled from source by the loader (see
+**License**). Full-precision: [Lens-3.8B-bf16](https://huggingface.co/mlx-community/Lens-3.8B-bf16).
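+Fidelity: a single DiT forward pass matches the PyTorch reference with cosine similarity **0.99998**.
+Quantization still perturbs every denoising step and those small differences compound along the sampling
+trajectory, so quantized samples differ in composition from bf16 but are equally sharp.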
+![sample](sample.png)
+## Usage
+```python
+from lens_mlx.pipeline_mlx import LensPipeline   # github.com/xocialize-code/lens-mlx
+pipe = LensPipeline.from_pretrained("path/to/Lens", quantize_bits=8)
+img = pipe("A serene lake below snow-capped mountains, golden hour.",
+           height=1024, width=1024, num_inference_steps=20, seed=42)
+img.save("out.png")
+```
+## License
+- **DiT weights (this repo):** MIT, from `microsoft/Lens`.
+- **GPT-OSS-20B encoder:** Apache-2.0 (reuse `mlx-community/gpt-oss-20b-MXFP4-*`).
+- **FLUX.2 VAE:** governed by its own FLUX.2-dev license terms; **not re-hosted** here, the loader fetches it from source.
+Upstream: [microsoft/Lens](https://huggingface.co/microsoft/Lens) ·
+MLX port: [xocialize-code/lens-mlx](https://github.com/xocialize-code/lens-mlx)
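The README's two quantitative claims, int8 with `group_size` 64 and a cosine score against PyTorch, can be illustrated with a small pure-Python sketch. It assumes MLX-style affine quantization as described in the `mlx.core.quantize` documentation (per group of 64 weights: scale = (max - min) / (2**bits - 1), bias = min), with the scale and bias each stored in 16 bits. It is not the lens-mlx converter, and every function name below is hypothetical. Note that the README's 0.99998 is measured on DiT outputs, while this sketch applies the same metric to one weight group after an int8 round trip.

```python
import math
import random

GROUP_SIZE = 64  # matches "group_size" in config.json
BITS = 8         # matches "bits" in config.json


def quantize_group(values, bits=BITS):
    """Affine-quantize one group: each value becomes an integer in [0, 2**bits - 1]."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0  # constant group: any scale works
    q = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return q, scale, lo  # the group minimum serves as the bias


def dequantize_group(q, scale, bias):
    """Reconstruct approximate floats as scale * q + bias."""
    return [qi * scale + bias for qi in q]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def bits_per_weight(bits=BITS, group_size=GROUP_SIZE, meta_bits=16):
    """Storage cost per weight, counting one 16-bit scale and one 16-bit bias per group (assumed)."""
    return bits + 2 * meta_bits / group_size


def estimated_gb(n_params, bpw):
    """Decimal gigabytes needed to store n_params weights at bpw bits each."""
    return n_params * bpw / 8 / 1e9


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0.0, 0.02) for _ in range(GROUP_SIZE)]  # synthetic weight group
    q, scale, bias = quantize_group(weights)
    restored = dequantize_group(q, scale, bias)
    print("max abs error:", max(abs(w - r) for w, r in zip(weights, restored)))
    print("cosine:", cosine_similarity(weights, restored))
    bpw = bits_per_weight()
    print(f"{bpw} bits/weight: ~{estimated_gb(3.8e9, bpw):.2f} GB vs ~{estimated_gb(3.8e9, 16):.2f} GB in bf16")
```

For a nominal 3.8B parameters, 8.5 bits per weight gives about 4.04 GB versus 7.6 GB at bf16, in line with the README's ~2x. The committed file is 4,386,667,927 bytes; this estimate ignores the layers kept at bf16 and any other unquantized tensors, and 3.8B is a rounded figure, so treat it only as a ballpark.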

config.json ADDED

@@ -0,0 +1,38 @@

+{
+  "_class_name": "LensTransformer2DModel",
+  "_diffusers_version": "0.37.1",
+  "attention_head_dim": 64,
+  "axes_dims_rope": [
+    8,
+    28,
+    28
+  ],
+  "enc_hidden_dim": 2880,
+  "gate_mlp": true,
+  "in_channels": 128,
+  "inner_dim": 1536,
+  "multi_layer_encoder_feature": true,
+  "num_attention_heads": 24,
+  "num_layers": 48,
+  "out_channels": 32,
+  "patch_size": 2,
+  "rms_norm": true,
+  "selected_layer_index": [
+    5,
+    11,
+    17,
+    23
+  ],
+  "mlx_format": true,
+  "quantization": {
+    "group_size": 64,
+    "bits": 8,
+    "keep_hi_precision": [
+      "img_in",
+      "txt_in",
+      "proj_out",
+      "time_text_embed",
+      "norm_out"
+    ]
+  }
+}
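Several relationships in this config can be checked mechanically. The hypothetical helper below (not part of lens-mlx) verifies that `num_attention_heads` × `attention_head_dim` equals `inner_dim` (24 × 64 = 1536), that the RoPE axis widths in `axes_dims_rope` sum to the head dimension (8 + 28 + 28 = 64), and, under our assumption that the input is 2×2-patchified latents, that `in_channels` equals `out_channels` × `patch_size`² (32 × 4 = 128). It also sketches one reading of `keep_hi_precision`: module paths equal to, or nested under, a listed name stay at bf16. The module paths used in the demo are illustrative guesses, not taken from the checkpoint.

```python
import json
import os


def check_lens_config(cfg):
    """Return a list of consistency problems; an empty list means every check passed."""
    problems = []
    heads, head_dim = cfg["num_attention_heads"], cfg["attention_head_dim"]
    if heads * head_dim != cfg["inner_dim"]:
        problems.append(f"{heads} heads x {head_dim} != inner_dim {cfg['inner_dim']}")
    if sum(cfg["axes_dims_rope"]) != head_dim:
        problems.append(f"axes_dims_rope sums to {sum(cfg['axes_dims_rope'])}, not {head_dim}")
    # Assumption: in_channels counts a patch_size x patch_size patch of latent channels.
    p = cfg["patch_size"]
    if cfg["out_channels"] * p * p != cfg["in_channels"]:
        problems.append("in_channels != out_channels * patch_size**2")
    return problems


def keeps_hi_precision(module_path, cfg):
    """True if module_path equals or is nested under a name in keep_hi_precision."""
    keep = cfg.get("quantization", {}).get("keep_hi_precision", [])
    return any(module_path == name or module_path.startswith(name + ".") for name in keep)


if __name__ == "__main__" and os.path.exists("config.json"):
    with open("config.json") as f:
        cfg = json.load(f)
    print("problems:", check_lens_config(cfg) or "none")
    # Hypothetical module paths, for illustration only.
    for path in ["img_in", "time_text_embed.linear_1", "transformer_blocks.0.attn.to_q"]:
        print(path, "-> bf16" if keeps_hi_precision(path, cfg) else "-> int8")
```

With MLX, a predicate like `keeps_hi_precision` could be negated and passed as `class_predicate` to `mlx.nn.quantize(model, group_size=64, bits=8, class_predicate=...)`, which calls it with each module's path and the module itself (combined with a check that the module is quantizable). Whether lens-mlx selects layers this way is not stated in this commit.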

model.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:95d9af088a22132a1ab555bffff3861d0613d0537a65897eddbc0f21e685444a
+size 4386667927
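`model.safetensors` is committed as a Git LFS pointer (spec v1) that records the object's SHA-256 and its size in bytes. The sketch below uses only the Python standard library to check a downloaded copy against that pointer; the function names and the local path `model.safetensors` are assumptions for illustration. Hashing in 1 MiB chunks keeps memory use flat for a ~4.4 GB file, and the size comparison runs first because it is free.

```python
import hashlib
import os

# Pointer text copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:95d9af088a22132a1ab555bffff3861d0613d0537a65897eddbc0f21e685444a
size 4386667927"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into version, sha256 digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """True if the file at path has the pointer's byte size and SHA-256 digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap size check before hashing gigabytes
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["sha256"]


if __name__ == "__main__" and os.path.exists("model.safetensors"):
    print("checksum ok:", matches_pointer("model.safetensors", parse_lfs_pointer(POINTER)))
```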

sample.png ADDED

Git LFS Details

SHA256: 72a1dd91472fcfa390c2bfb159ce04a3c1ecc22547f4fae5259418376fed8eeb
Pointer size: 132 Bytes
Size of remote file: 1.4 MB