microsoft/Lens-Turbo

@@ -1,8 +1,14 @@
 <div align="center">
 # Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models
-<img src="assets/teaser.webp" alt="Lens Teaser" width="100%" />
 <p>
   <sub>
@@ -11,17 +17,17 @@
     <strong>Zhiyang Liang</strong>&ast;,
     <strong>Yang Yue</strong>&ast;,
     <strong>Jiawei Zhang</strong>&ast;,
     <strong>Qinhong Yang</strong>,
     <strong>Yanchen Dong</strong>,
     <strong>Yitong Wang</strong>,
     <strong>Yunuo Chen</strong>,
     <strong>Xiuyu Wu</strong>,
-    <strong>Fangyun Wei</strong>&dagger;,
-    <strong>Dong Chen</strong>&dagger;,
-    <strong>Dongdong Chen</strong>,
     <strong>Ziyu Wan</strong>,
     <strong>Lei Shi</strong>,
     <strong>Ji Li</strong>,
     <strong>Chong Luo</strong>,
     <strong>Yan Lu</strong>,
     <strong>Baining Guo</strong>
@@ -326,13 +332,13 @@ import torch
 from lens import LensPipeline
 pipe = LensPipeline.from_pretrained(
-    "microsoft/Lens-Turbo", torch_dtype=torch.bfloat16
 ).to("cuda")
 image = pipe(
     prompt="A cat holding a sign that says \"hello world\"",
     base_resolution=1440, aspect_ratio="1:1",
-    num_inference_steps=4, guidance_scale=1.0,
     generator=torch.Generator("cuda").manual_seed(0),
 ).images[0]
 image.save("lens.png")
@@ -344,10 +350,10 @@ To trade speed for VRAM, replace `.to("cuda")` with `pipe.enable_model_cpu_offlo
 ```bash
 python inference.py \
-    --repo_id "microsoft/Lens-Turbo" \
     --prompt "A cinematic mountain lake at sunrise, soft mist, detailed reflections" \
     --base_resolution 1440 --aspect_ratio 1:1 \
-    --steps 4 --cfg 1.0 --n 1 --seed 42 \
     --out ./outputs
 ```
@@ -355,8 +361,8 @@ python inference.py \
 ```bash
 python inference.py \
-    --repo_id "microsoft/Lens-Turbo" \
-    --steps 4 --cfg 1.0 \
     --prompt "a red fox in snow|a glass greenhouse at night"
 ```
@@ -364,8 +370,8 @@ python inference.py \
 ```bash
 python inference.py \
-    --repo_id "microsoft/Lens-Turbo" \
-    --steps 4 --cfg 1.0 \
     --prompt "a cat" \
     --disable_mxfp4 --offload
 ```
@@ -401,7 +407,9 @@ python inference.py \
 ## Responsible AI
-The release is intended for research purposes only and does not involve any product or service deployment. Responsible AI considerations were factored into all stages. The datasets used in this paper are public and have been reviewed to ensure there is no personally identifiable information or harmful content. However, as these datasets are sourced from the Internet, potential bias may still be present.
 ## Privacy

+---
+license: mit
+language:
+- en
+pipeline_tag: text-to-image
+---
 <div align="center">
 # Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models
+<img src="assets/teaser.png" alt="Lens Teaser" width="100%" />
 <p>
   <sub>
     <strong>Zhiyang Liang</strong>&ast;,
     <strong>Yang Yue</strong>&ast;,
     <strong>Jiawei Zhang</strong>&ast;,
+    <strong>Fangyun Wei</strong>&dagger;,
+    <strong>Dong Chen</strong>&dagger;,
     <strong>Qinhong Yang</strong>,
     <strong>Yanchen Dong</strong>,
     <strong>Yitong Wang</strong>,
     <strong>Yunuo Chen</strong>,
     <strong>Xiuyu Wu</strong>,
     <strong>Ziyu Wan</strong>,
     <strong>Lei Shi</strong>,
     <strong>Ji Li</strong>,
+    <strong>Dongdong Chen</strong>,
     <strong>Chong Luo</strong>,
     <strong>Yan Lu</strong>,
     <strong>Baining Guo</strong>
 from lens import LensPipeline
 pipe = LensPipeline.from_pretrained(
+    "microsoft/Lens", torch_dtype=torch.bfloat16
 ).to("cuda")
 image = pipe(
     prompt="A cat holding a sign that says \"hello world\"",
     base_resolution=1440, aspect_ratio="1:1",
+    num_inference_steps=20, guidance_scale=5.0,
     generator=torch.Generator("cuda").manual_seed(0),
 ).images[0]
 image.save("lens.png")
 ```bash
 python inference.py \
+    --repo_id "microsoft/Lens" \
     --prompt "A cinematic mountain lake at sunrise, soft mist, detailed reflections" \
     --base_resolution 1440 --aspect_ratio 1:1 \
+    --steps 20 --cfg 5.0 --n 1 --seed 42 \
     --out ./outputs
 ```
 ```bash
 python inference.py \
+    --repo_id "microsoft/Lens" \
+    --steps 20 --cfg 5.0 \
     --prompt "a red fox in snow|a glass greenhouse at night"
 ```
 ```bash
 python inference.py \
+    --repo_id "microsoft/Lens" \
+    --steps 20 --cfg 5.0 \
     --prompt "a cat" \
     --disable_mxfp4 --offload
 ```
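
The only sampling changes in this diff are the checkpoint id and its paired `--steps` / `--cfg` values (4 / 1.0 for `microsoft/Lens-Turbo`, 20 / 5.0 for `microsoft/Lens`). A minimal sketch of keeping those pairs in one place is given below; `PRESETS` and `build_command` are hypothetical helper names and not part of the `lens` package or `inference.py`, and only the flags that appear in the commands above are used.

```python
import shlex

# Sampling settings copied from the two README variants shown in this diff.
# PRESETS and build_command are illustrative names, not part of the repo.
PRESETS = {
    "microsoft/Lens-Turbo": {"steps": 4, "cfg": 1.0},
    "microsoft/Lens": {"steps": 20, "cfg": 5.0},
}


def build_command(repo_id, prompt):
    """Return the inference.py argument list for a known checkpoint."""
    preset = PRESETS[repo_id]  # raises KeyError for an unlisted repo id
    return [
        "python", "inference.py",
        "--repo_id", repo_id,
        "--steps", str(preset["steps"]),
        "--cfg", str(preset["cfg"]),
        "--prompt", prompt,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line for the base checkpoint.
    print(shlex.join(build_command("microsoft/Lens", "a cat")))
```

Looking the values up by repo id avoids running one checkpoint with the other's step count or guidance scale when switching between the two variants.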
 ## Responsible AI
+The model is released for research purposes only and is not intended for product or service deployment. Responsible AI considerations were incorporated throughout the development process, including data selection, model training, and evaluation.
+The training data includes a combination of public, licensed, and internal datasets that were processed to remove clearly identifiable personal information and reduce harmful content where possible. However, as the data is largely sourced from web-scale collections, it may contain biases or uneven representation. As a result, the model may generate outputs that are inaccurate, biased, or inappropriate under certain prompts, including content that could be misleading or raise copyright or IP-related concerns.
+Given these limitations, the model should be used in controlled research settings, with appropriate human oversight. Downstream users are responsible for applying additional safeguards, such as content moderation, validation, and compliance checks, before using the model in broader applications.
 ## Privacy