Upload README.md
README.md
CHANGED
@@ -1,8 +1,14 @@
 <div align="center">

 # Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

-<img src="assets/teaser.

 <p>
 <sub>
@@ -11,17 +17,17 @@
 <strong>Zhiyang Liang</strong>*,
 <strong>Yang Yue</strong>*,
 <strong>Jiawei Zhang</strong>*,
 <strong>Qinhong Yang</strong>,
 <strong>Yanchen Dong</strong>,
 <strong>Yitong Wang</strong>,
 <strong>Yunuo Chen</strong>,
 <strong>Xiuyu Wu</strong>,
-<strong>Fangyun Wei</strong>†,
-<strong>Dong Chen</strong>†,
-<strong>Dongdong Chen</strong>,
 <strong>Ziyu Wan</strong>,
 <strong>Lei Shi</strong>,
 <strong>Ji Li</strong>,
 <strong>Chong Luo</strong>,
 <strong>Yan Lu</strong>,
 <strong>Baining Guo</strong>
@@ -326,13 +332,13 @@ import torch
 from lens import LensPipeline

 pipe = LensPipeline.from_pretrained(
-    "microsoft/Lens
 ).to("cuda")

 image = pipe(
     prompt="A cat holding a sign that says \"hello world\"",
     base_resolution=1440, aspect_ratio="1:1",
-    num_inference_steps=
     generator=torch.Generator("cuda").manual_seed(0),
 ).images[0]
 image.save("lens.png")
@@ -344,10 +350,10 @@ To trade speed for VRAM, replace `.to("cuda")` with `pipe.enable_model_cpu_offlo

 ```bash
 python inference.py \
-  --repo_id "microsoft/Lens
   --prompt "A cinematic mountain lake at sunrise, soft mist, detailed reflections" \
   --base_resolution 1440 --aspect_ratio 1:1 \
-  --steps
   --out ./outputs
 ```

@@ -355,8 +361,8 @@ python inference.py \

 ```bash
 python inference.py \
-  --repo_id "microsoft/Lens
-  --steps
   --prompt "a red fox in snow|a glass greenhouse at night"
 ```

@@ -364,8 +370,8 @@ python inference.py \

 ```bash
 python inference.py \
-  --repo_id "microsoft/Lens
-  --steps
   --prompt "a cat" \
   --disable_mxfp4 --offload
 ```
@@ -401,7 +407,9 @@ python inference.py \

 ## Responsible AI

-The

 ## Privacy

@@ -1,8 +1,14 @@
+---
+license: mit
+language:
+- en
+pipeline_tag: text-to-image
+---
 <div align="center">

 # Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

+<img src="assets/teaser.png" alt="Lens Teaser" width="100%" />

 <p>
 <sub>
@@ -11,17 +17,17 @@
 <strong>Zhiyang Liang</strong>*,
 <strong>Yang Yue</strong>*,
 <strong>Jiawei Zhang</strong>*,
+<strong>Fangyun Wei</strong>†,
+<strong>Dong Chen</strong>†,
 <strong>Qinhong Yang</strong>,
 <strong>Yanchen Dong</strong>,
 <strong>Yitong Wang</strong>,
 <strong>Yunuo Chen</strong>,
 <strong>Xiuyu Wu</strong>,
 <strong>Ziyu Wan</strong>,
 <strong>Lei Shi</strong>,
 <strong>Ji Li</strong>,
+<strong>Dongdong Chen</strong>,
 <strong>Chong Luo</strong>,
 <strong>Yan Lu</strong>,
 <strong>Baining Guo</strong>
@@ -326,13 +332,13 @@ import torch
 from lens import LensPipeline

 pipe = LensPipeline.from_pretrained(
+    "microsoft/Lens", torch_dtype=torch.bfloat16
 ).to("cuda")

 image = pipe(
     prompt="A cat holding a sign that says \"hello world\"",
     base_resolution=1440, aspect_ratio="1:1",
+    num_inference_steps=20, guidance_scale=5.0,
     generator=torch.Generator("cuda").manual_seed(0),
 ).images[0]
 image.save("lens.png")
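
The updated call passes `guidance_scale=5.0`. The diff does not say how Lens applies this value. The sketch below assumes the standard classifier-free guidance rule, where the unconditional prediction is pushed toward the conditional one by the scale factor. The helper `apply_cfg` and the toy vectors are hypothetical and are not part of the `lens` package.

```python
def apply_cfg(uncond, cond, guidance_scale):
    """Standard classifier-free guidance blend (assumed, not confirmed for Lens).

    guided = uncond + guidance_scale * (cond - uncond)
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Toy per-element predictions standing in for real model outputs.
uncond = [0.0, 1.0, -2.0]
cond = [1.0, 1.0, 0.0]

guided = apply_cfg(uncond, cond, 5.0)
print(guided)
```

Under this rule a scale of 1.0 returns the conditional prediction unchanged, and larger values move the result further away from the unconditional prediction.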
@@ -344,10 +350,10 @@ To trade speed for VRAM, replace `.to("cuda")` with `pipe.enable_model_cpu_offlo

 ```bash
 python inference.py \
+  --repo_id "microsoft/Lens" \
   --prompt "A cinematic mountain lake at sunrise, soft mist, detailed reflections" \
   --base_resolution 1440 --aspect_ratio 1:1 \
+  --steps 20 --cfg 5.0 --n 1 --seed 42 \
   --out ./outputs
 ```

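
The flags used in these commands can be mirrored in a small `argparse` parser, which makes it easier to see which values are numeric and which are switches. This is a hypothetical reconstruction: the flag names come from the commands in this diff, but the types, defaults, and the `build_parser` helper are assumptions and may not match the real `inference.py`.

```python
import argparse


def build_parser():
    """Hypothetical mirror of the inference.py flags shown in the diff."""
    p = argparse.ArgumentParser(description="Sketch of the Lens CLI flags")
    p.add_argument("--repo_id", type=str, required=True)
    p.add_argument("--prompt", type=str, required=True)
    # Defaults below are copied from the example command, not from the script.
    p.add_argument("--base_resolution", type=int, default=1440)
    p.add_argument("--aspect_ratio", type=str, default="1:1")
    p.add_argument("--steps", type=int, default=20)
    p.add_argument("--cfg", type=float, default=5.0)
    p.add_argument("--n", type=int, default=1)
    p.add_argument("--seed", type=int, default=42)
    p.add_argument("--out", type=str, default="./outputs")
    p.add_argument("--disable_mxfp4", action="store_true")
    p.add_argument("--offload", action="store_true")
    return p


args = build_parser().parse_args([
    "--repo_id", "microsoft/Lens",
    "--prompt", "a cat",
    "--steps", "20", "--cfg", "5.0", "--n", "1", "--seed", "42",
    "--out", "./outputs",
])
print(args.steps, args.cfg, args.offload)
```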
@@ -355,8 +361,8 @@ python inference.py \

 ```bash
 python inference.py \
+  --repo_id "microsoft/Lens" \
+  --steps 20 --cfg 5.0 \
   --prompt "a red fox in snow|a glass greenhouse at night"
 ```

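
The prompt in this command joins two descriptions with `|`, which suggests that `inference.py` treats the character as a separator between prompts. Assuming that reading is correct, the splitting step could look like the sketch below. `split_prompts` is a hypothetical helper, not a function from the repository.

```python
def split_prompts(raw, sep="|"):
    """Split a separator-joined prompt string into individual prompts.

    Surrounding whitespace is trimmed and empty segments are dropped.
    """
    return [part.strip() for part in raw.split(sep) if part.strip()]


prompts = split_prompts("a red fox in snow|a glass greenhouse at night")
for p in prompts:
    print(p)
```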
@@ -364,8 +370,8 @@ python inference.py \

 ```bash
 python inference.py \
+  --repo_id "microsoft/Lens" \
+  --steps 20 --cfg 5.0 \
   --prompt "a cat" \
   --disable_mxfp4 --offload
 ```
@@ -401,7 +407,9 @@ python inference.py \

 ## Responsible AI

+The model is released for research purposes only and is not intended for product or service deployment. Responsible AI considerations were incorporated throughout the development process, including data selection, model training, and evaluation.
+The training data includes a combination of public, licensed, and internal datasets that were processed to remove clearly identifiable personal information and reduce harmful content where possible. However, as the data is largely sourced from web-scale collections, it may contain biases or uneven representation. As a result, the model may generate outputs that are inaccurate, biased, or inappropriate under certain prompts, including content that could be misleading or raise copyright or IP-related concerns.
+Given these limitations, the model should be used in controlled research settings, with appropriate human oversight. Downstream users are responsible for applying additional safeguards, such as content moderation, validation, and compliance checks, before using the model in broader applications.

 ## Privacy
