DongChenMSRA committed on
Commit f3cab7b · verified · 1 Parent(s): 1ac4fb2

Upload README.md

Files changed (1)
  1. README.md +21 -13
README.md CHANGED
@@ -1,8 +1,14 @@
  <div align="center">

  # Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

- <img src="assets/teaser.webp" alt="Lens Teaser" width="100%" />

  <p>
  <sub>
@@ -11,17 +17,17 @@
  <strong>Zhiyang Liang</strong>&ast;,
  <strong>Yang Yue</strong>&ast;,
  <strong>Jiawei Zhang</strong>&ast;,
  <strong>Qinhong Yang</strong>,
  <strong>Yanchen Dong</strong>,
  <strong>Yitong Wang</strong>,
  <strong>Yunuo Chen</strong>,
  <strong>Xiuyu Wu</strong>,
- <strong>Fangyun Wei</strong>&dagger;,
- <strong>Dong Chen</strong>&dagger;,
- <strong>Dongdong Chen</strong>,
  <strong>Ziyu Wan</strong>,
  <strong>Lei Shi</strong>,
  <strong>Ji Li</strong>,
  <strong>Chong Luo</strong>,
  <strong>Yan Lu</strong>,
  <strong>Baining Guo</strong>
@@ -326,13 +332,13 @@ import torch
  from lens import LensPipeline

  pipe = LensPipeline.from_pretrained(
-     "microsoft/Lens-Base", torch_dtype=torch.bfloat16
  ).to("cuda")

  image = pipe(
      prompt="A cat holding a sign that says \"hello world\"",
      base_resolution=1440, aspect_ratio="1:1",
-     num_inference_steps=50, guidance_scale=5.0,
      generator=torch.Generator("cuda").manual_seed(0),
  ).images[0]
  image.save("lens.png")
@@ -344,10 +350,10 @@ To trade speed for VRAM, replace `.to("cuda")` with `pipe.enable_model_cpu_offlo

  ```bash
  python inference.py \
-   --repo_id "microsoft/Lens-Base" \
    --prompt "A cinematic mountain lake at sunrise, soft mist, detailed reflections" \
    --base_resolution 1440 --aspect_ratio 1:1 \
-   --steps 50 --cfg 5.0 --n 1 --seed 42 \
    --out ./outputs
  ```

@@ -355,8 +361,8 @@ python inference.py \

  ```bash
  python inference.py \
-   --repo_id "microsoft/Lens-Base" \
-   --steps 50 --cfg 5.0 \
    --prompt "a red fox in snow|a glass greenhouse at night"
  ```

@@ -364,8 +370,8 @@ python inference.py \

  ```bash
  python inference.py \
-   --repo_id "microsoft/Lens-Base" \
-   --steps 50 --cfg 5.0 \
    --prompt "a cat" \
    --disable_mxfp4 --offload
  ```
@@ -401,7 +407,9 @@ python inference.py \

  ## Responsible AI

- The release is intended for research purposes only and does not involve any product or service deployment. Responsible AI considerations were factored into all stages. The datasets used in this paper are public and have been reviewed to ensure there is no personally identifiable information or harmful content. However, as these datasets are sourced from the Internet, potential bias may still be present.

  ## Privacy

 
+ ---
+ license: mit
+ language:
+ - en
+ pipeline_tag: text-to-image
+ ---
  <div align="center">

  # Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

+ <img src="assets/teaser.png" alt="Lens Teaser" width="100%" />

  <p>
  <sub>
 
  <strong>Zhiyang Liang</strong>&ast;,
  <strong>Yang Yue</strong>&ast;,
  <strong>Jiawei Zhang</strong>&ast;,
+ <strong>Fangyun Wei</strong>&dagger;,
+ <strong>Dong Chen</strong>&dagger;,
  <strong>Qinhong Yang</strong>,
  <strong>Yanchen Dong</strong>,
  <strong>Yitong Wang</strong>,
  <strong>Yunuo Chen</strong>,
  <strong>Xiuyu Wu</strong>,
  <strong>Ziyu Wan</strong>,
  <strong>Lei Shi</strong>,
  <strong>Ji Li</strong>,
+ <strong>Dongdong Chen</strong>,
  <strong>Chong Luo</strong>,
  <strong>Yan Lu</strong>,
  <strong>Baining Guo</strong>
 
  from lens import LensPipeline

  pipe = LensPipeline.from_pretrained(
+     "microsoft/Lens", torch_dtype=torch.bfloat16
  ).to("cuda")

  image = pipe(
      prompt="A cat holding a sign that says \"hello world\"",
      base_resolution=1440, aspect_ratio="1:1",
+     num_inference_steps=20, guidance_scale=5.0,
      generator=torch.Generator("cuda").manual_seed(0),
  ).images[0]
  image.save("lens.png")
 

  ```bash
  python inference.py \
+   --repo_id "microsoft/Lens" \
    --prompt "A cinematic mountain lake at sunrise, soft mist, detailed reflections" \
    --base_resolution 1440 --aspect_ratio 1:1 \
+   --steps 20 --cfg 5.0 --n 1 --seed 42 \
    --out ./outputs
  ```

 

  ```bash
  python inference.py \
+   --repo_id "microsoft/Lens" \
+   --steps 20 --cfg 5.0 \
    --prompt "a red fox in snow|a glass greenhouse at night"
  ```

 

  ```bash
  python inference.py \
+   --repo_id "microsoft/Lens" \
+   --steps 20 --cfg 5.0 \
    --prompt "a cat" \
    --disable_mxfp4 --offload
  ```
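
The `inference.py` invocations in this diff differ only in a handful of flags, so scripted runs can assemble them from one place. The sketch below is not part of the commit: `build_inference_command` is a hypothetical helper, and it assumes that `|` inside `--prompt` separates multiple prompts, which the example above suggests but this diff does not state. It uses only flags that appear in the README and defaults to the values introduced by this change (`microsoft/Lens`, 20 steps, CFG 5.0).

```python
import shlex


def build_inference_command(prompts, repo_id="microsoft/Lens", steps=20, cfg=5.0,
                            disable_mxfp4=False, offload=False):
    """Return an argv list for inference.py (hypothetical helper, not from the repo)."""
    if isinstance(prompts, str):
        prompts = [prompts]
    # Assumption: inference.py treats '|' as the prompt separator, so a literal
    # '|' inside a single prompt would be ambiguous.
    if any("|" in p for p in prompts):
        raise ValueError("prompts must not contain '|'")
    cmd = [
        "python", "inference.py",
        "--repo_id", repo_id,
        "--steps", str(steps), "--cfg", str(cfg),
        "--prompt", "|".join(prompts),
    ]
    # Optional switches, named exactly as in the README example above.
    if disable_mxfp4:
        cmd.append("--disable_mxfp4")
    if offload:
        cmd.append("--offload")
    return cmd


if __name__ == "__main__":
    cmd = build_inference_command(["a red fox in snow", "a glass greenhouse at night"])
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting concerns.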
 

  ## Responsible AI

+ The model is released for research purposes only and is not intended for product or service deployment. Responsible AI considerations were incorporated throughout the development process, including data selection, model training, and evaluation.
+ The training data includes a combination of public, licensed, and internal datasets that were processed to remove clearly identifiable personal information and reduce harmful content where possible. However, as the data is largely sourced from web-scale collections, it may contain biases or uneven representation. As a result, the model may generate outputs that are inaccurate, biased, or inappropriate under certain prompts, including content that could be misleading or raise copyright or IP-related concerns.
+ Given these limitations, the model should be used in controlled research settings, with appropriate human oversight. Downstream users are responsible for applying additional safeguards, such as content moderation, validation, and compliance checks, before using the model in broader applications.

  ## Privacy