Langitzt committed on
Commit 2f73e01 · verified · 1 Parent(s): ece172c

Update README.md

Files changed (1)
  1. README.md +109 -250
README.md CHANGED
@@ -1,421 +1,280 @@
1
  ---
2
  license: mit
3
- language:
4
- - en
5
  pipeline_tag: text-to-image
6
  ---
 
7
  <div align="center">
8
 
9
- # Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models
 
 
 
 
10
 
11
- <img src="assets/teaser.webp" alt="Lens Teaser" width="100%" />
 
12
 
13
- <p>
14
- <b>Contributors (Alphabetical Order):</b><br />
15
- <strong>Baining Guo</strong>,
16
- <strong>Chong Luo</strong>,
17
- <strong>Dong Chen</strong>&dagger;,
18
- <strong>Dongdong Chen</strong>,
19
- <strong>Fangyun Wei</strong>&dagger;,
20
- <strong>Ji Li</strong>,
21
- <strong>Jianmin Bao</strong>,
22
- <strong>Jiawei Zhang</strong>&ast;,
23
- <strong>Jinjing Zhao</strong>&ast;,
24
- <strong>Lei Shi</strong>,
25
- <strong>Qinhong Yang</strong>,
26
- <strong>Sirui Zhang</strong>&ast;,
27
- <strong>Xiuyu Wu</strong>,
28
- <strong>Xuelu Feng</strong>,
29
- <strong>Yan Lu</strong>,
30
- <strong>Yanchen Dong</strong>,
31
- <strong>Yang Yue</strong>&ast;,
32
- <strong>Yitong Wang</strong>,
33
- <strong>Yunuo Chen</strong>,
34
- <strong>Zhiyang Liang</strong>&ast;,
35
- <strong>Ziyu Wan</strong>&dagger;
36
- <br />
37
- Microsoft &nbsp;|&nbsp; &ast;Core Contributors &nbsp;|&nbsp; &dagger;Project Lead
38
- </p>
39
 
40
- <p>
41
- <a href="https://arxiv.org/abs/2605.21573"><img alt="arXiv" src="https://img.shields.io/badge/arXiv-Paper-b31b1b?logo=arxiv&logoColor=white" height="22" /></a>
42
- &nbsp;
43
- <a href="https://huggingface.co/microsoft/Lens"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97-Models-yellow" height="22" /></a>
44
- &nbsp;
45
- <a href="https://github.com/microsoft/Lens"><img alt="GitHub" src="https://img.shields.io/badge/GitHub-Repo-181717?logo=github&logoColor=white" height="22" /></a>
46
- &nbsp;
47
- <a href="LICENSE"><img alt="License: MIT" src="https://img.shields.io/badge/License-MIT-green.svg" height="22" /></a>
48
- </p>
49
 
50
  </div>
51
 
52
  ---
53
 
54
- **Lens** is a **3.8B-parameter** foundational text-to-image model designed for **efficient training** and **fast high-resolution generation**. It combines dense-caption pre-training, mixed-resolution learning, GPT-OSS multi-layer text features, and the FLUX.2 semantic VAE to reach competitive quality with substantially less training compute than larger T2I models.
55
-
56
- This repository provides the minimal inference code for generating images from Lens DiT checkpoints.
57
 
58
- ## Highlights
59
 
60
- - **Efficient Foundation** &mdash; Trained on **Lens-800M**, an 800M image-text corpus with long GPT-4.1 captions, maximizing information density per training batch.
61
- - **Compact & Expressive** &mdash; A 48-block MMDiT denoiser leverages FLUX.2 latents and concatenated multi-layer GPT-OSS features for stronger prompt following and multilingual generalization.
62
- - **Flexible Resolution** &mdash; Mixed-resolution training enables inference across aspect ratios from `1:2` to `2:1` and resolutions up to **1440&times;1440**.
63
- - **Post-trained Variants** &mdash; RL tuning improves visual quality and artifact suppression; the distilled **Lens-Turbo** supports fast **4-step** generation.
 
64
 
65
- ## Gallery
66
-
67
- <!-- LENS_GALLERY_START -->
68
-
69
- <details name="lens-gallery" open>
70
- <summary><b>Page 1 / 6</b> &nbsp; samples 000-005</summary>
71
 
72
  <table>
73
  <tr>
74
  <td width="33%" valign="top">
75
- <img src="assets/gallery/000-1440x1440.png" alt="Lens gallery sample 000" width="100%" />
76
  <br />
77
- <sub><b>Sample 000</b> &middot; 1440x1440<br />A generous portion of classic British fish and chips served on a sheet of white paper, golden crispy beer-battered cod fillet alongside thick-cut chips, a wedge of lemon, mushy peas in a small dish, malt vinegar bottle nearby, wooden pub table, overhead shot</sub>
78
  </td>
79
  <td width="33%" valign="top">
80
- <img src="assets/gallery/001-1440x1440.png" alt="Lens gallery sample 001" width="100%" />
81
  <br />
82
- <sub><b>Sample 001</b> &middot; 1440x1440<br />The iconic Big Ben clock tower and the Houses of Parliament in London at golden hour, the River Thames reflecting warm amber light, Westminster Bridge in the foreground, a classic red double-decker bus crossing, dramatic clouds lit by sunset</sub>
83
  </td>
84
  <td width="33%" valign="top">
85
- <img src="assets/gallery/002-1440x1440.png" alt="Lens gallery sample 002" width="100%" />
86
  <br />
87
- <sub><b>Sample 002</b> &middot; 1440x1440<br />La Tour Eiffel au cr&#233;puscule vue depuis le Trocad&#233;ro, la structure en fer illumin&#233;e de milliers de lumi&#232;res dor&#233;es scintillantes, le ciel passant du bleu profond au violet, les fontaines du Trocad&#233;ro au premier plan avec des reflets dor&#233;s, silhouettes de promeneurs</sub>
88
  </td>
89
  </tr>
90
  <tr>
91
  <td width="33%" valign="top">
92
- <img src="assets/gallery/003-1248x1664.png" alt="Lens gallery sample 003" width="100%" />
93
  <br />
94
- <sub><b>Sample 003</b> &middot; 1248x1664<br />A crystal dragon soaring through an aurora borealis sky, its entire body made of transparent faceted crystal refracting the green and purple aurora light into rainbow spectra, ice particles trailing from its wings, high fantasy digital art</sub>
95
  </td>
96
  <td width="33%" valign="top">
97
- <img src="assets/gallery/004-1664x1248.png" alt="Lens gallery sample 004" width="100%" />
98
  <br />
99
- <sub><b>Sample 004</b> &middot; 1664x1248<br />Aerial view of Yuanyang rice terraces in Yunnan province at sunrise, thousands of cascading water-filled paddies reflecting golden and pink sky colors, morning mist weaving between terrace layers, lush green hillside with scattered palm trees, drone photography</sub>
100
  </td>
101
  <td width="33%" valign="top">
102
- <img src="assets/gallery/005-1664x1248.png" alt="Lens gallery sample 005" width="100%" />
103
  <br />
104
- <sub><b>Sample 005</b> &middot; 1664x1248<br />A green iguana basking on a moss-covered fallen log in a tropical rainforest, every scale and spine rendered in sharp detail, dewdrops clinging to its skin, a blurred waterfall and lush tropical foliage in the background, National Geographic wildlife photography style</sub>
105
  </td>
106
  </tr>
107
  </table>
108
  </details>
109
 
110
- <details name="lens-gallery">
111
- <summary><b>Page 2 / 6</b> &nbsp; samples 006-011</summary>
112
 
113
  <table>
114
  <tr>
115
  <td width="33%" valign="top">
116
- <img src="assets/gallery/006-1248x1664.png" alt="Lens gallery sample 006" width="100%" />
117
  <br />
118
- <sub><b>Sample 006</b> &middot; 1248x1664<br />Oil painting portrait of a Renaissance noblewoman in a deep blue velvet dress with pearl drop earrings, soft chiaroscuro lighting revealing delicate skin, craquelure texture on the painted surface, in the style of Vermeer</sub>
119
  </td>
120
  <td width="33%" valign="top">
121
- <img src="assets/gallery/007-1440x1440.png" alt="Lens gallery sample 007" width="100%" />
122
  <br />
123
- <sub><b>Sample 007</b> &middot; 1440x1440<br />An artisan honey jar with a hand-illustrated vintage botanical label reading &quot;Mountain Wildflower Honey&quot; in brown serif letterpress-style typography with decorative flourishes, detailed ink drawings of wildflowers, clover and honeybees surrounding the text, kraft paper label on clear glass jar</sub>
124
  </td>
125
  <td width="33%" valign="top">
126
- <img src="assets/gallery/008-1440x1440.png" alt="Lens gallery sample 008" width="100%" />
127
  <br />
128
- <sub><b>Sample 008</b> &middot; 1440x1440<br />Watercolor portrait of a thoughtful young man reading a worn leather book in a Parisian cafe, loose wet-on-wet brushstrokes bleeding into warm amber and burnt sienna washes, visible paper grain texture</sub>
129
  </td>
130
  </tr>
131
  <tr>
132
  <td width="33%" valign="top">
133
- <img src="assets/gallery/009-1664x1248.png" alt="Lens gallery sample 009" width="100%" />
134
  <br />
135
- <sub><b>Sample 009</b> &middot; 1664x1248<br />An explorer&#x27;s oak desk with an aged world map spread open, a brass sextant, leather-bound navigation journal with handwritten entries, melting candle in a copper holder, scattered compass and quill pen, warm window light, still life photography</sub>
136
  </td>
137
  <td width="33%" valign="top">
138
- <img src="assets/gallery/010-1664x1248.png" alt="Lens gallery sample 010" width="100%" />
139
  <br />
140
- <sub><b>Sample 010</b> &middot; 1664x1248<br />New York Grand Central Terminal subway station with the classic station name &quot;GRAND CENTRAL&quot; spelled out in elegant white ceramic mosaic tile letters embedded in a dark green tile wall, each letter approximately eight inches tall, ornate tile border frames, the S-curve of train tracks visible</sub>
141
  </td>
142
  <td width="33%" valign="top">
143
- <img src="assets/gallery/011-1664x1248.png" alt="Lens gallery sample 011" width="100%" />
144
  <br />
145
- <sub><b>Sample 011</b> &middot; 1664x1248<br />A ruby-throated hummingbird hovering in front of a bright red heliconia flower, wings frozen in a figure-eight pattern showing iridescent feather detail, individual water droplets suspended around the bird, high-speed macro photography with dark background</sub>
146
  </td>
147
  </tr>
148
  </table>
149
  </details>
150
 
151
- <details name="lens-gallery">
152
- <summary><b>Page 3 / 6</b> &nbsp; samples 012-017</summary>
153
 
154
  <table>
155
  <tr>
156
  <td width="33%" valign="top">
157
- <img src="assets/gallery/012-1664x1248.png" alt="Lens gallery sample 012" width="100%" />
158
  <br />
159
- <sub><b>Sample 012</b> &middot; 1664x1248<br />An old Remington typewriter with a sheet of cream-colored paper rolled into the carriage, the typed words &quot;Chapter One: The Beginning&quot; visible in slightly uneven Courier typeface with characteristic ink density variations, some letters slightly misaligned, warm desk lamp lighting</sub>
160
  </td>
161
  <td width="33%" valign="top">
162
- <img src="assets/gallery/013-1664x1248.png" alt="Lens gallery sample 013" width="100%" />
163
  <br />
164
- <sub><b>Sample 013</b> &middot; 1664x1248<br />The Great Wildebeest Migration crossing the Mara River at golden hour, hundreds of animals plunging into churning water sending spray everywhere, dust clouds rising from the riverbank, dramatic backlit scene, National Geographic documentary style</sub>
165
  </td>
166
  <td width="33%" valign="top">
167
- <img src="assets/gallery/014-1248x1664.png" alt="Lens gallery sample 014" width="100%" />
168
  <br />
169
- <sub><b>Sample 014</b> &middot; 1248x1664<br />A charming flower shop storefront window with hand-painted white script lettering on the glass reading &quot;Fresh Flowers Daily&quot; in flowing connected cursive with decorative swashes, roses and peonies arranged in buckets visible through the lettering, morning sunlight catching the painted letters</sub>
170
  </td>
171
  </tr>
172
  <tr>
173
  <td width="33%" valign="top">
174
- <img src="assets/gallery/015-1248x1664.png" alt="Lens gallery sample 015" width="100%" />
175
  <br />
176
- <sub><b>Sample 015</b> &middot; 1248x1664<br />A steampunk floating sky-city built on massive gear-driven platforms, brass and copper towers connected by chain bridges, steam-powered airships and hot air balloons docking at various levels, sunset clouds below the city, detailed concept art</sub>
177
  </td>
178
  <td width="33%" valign="top">
179
- <img src="assets/gallery/016-1664x1248.png" alt="Lens gallery sample 016" width="100%" />
180
  <br />
181
- <sub><b>Sample 016</b> &middot; 1664x1248<br />Milford Sound in New Zealand at dawn, a perfect mirror reflection of steep fjord walls on glass-still water, waterfalls streaming down thousand-foot cliffs, morning mist hovering above the water surface, panoramic landscape photography</sub>
182
  </td>
183
  <td width="33%" valign="top">
184
- <img src="assets/gallery/017-1248x1664.png" alt="Lens gallery sample 017" width="100%" />
185
  <br />
186
- <sub><b>Sample 017</b> &middot; 1248x1664<br />An Indian Bharatanatyam classical dancer in the aramandi pose, bronze ankle bells and elaborate hand mudra gestures, rich silk costume with gold temple jewelry, captured mid-performance with dramatic stage lighting</sub>
187
  </td>
188
  </tr>
189
  </table>
190
  </details>
191
 
192
- <details name="lens-gallery">
193
- <summary><b>Page 4 / 6</b> &nbsp; samples 018-023</summary>
194
 
195
  <table>
196
  <tr>
197
  <td width="33%" valign="top">
198
- <img src="assets/gallery/018-1248x1664.png" alt="Lens gallery sample 018" width="100%" />
199
  <br />
200
- <sub><b>Sample 018</b> &middot; 1248x1664<br />A narrow alleyway in Marrakech&#x27;s old medina with walls painted in vivid cobalt blue, colorful handwoven rugs and ceramic plates displayed along the walls, ornate wooden doors, warm sunlight from above creating dramatic shadows, Moroccan architecture</sub>
201
  </td>
202
  <td width="33%" valign="top">
203
- <img src="assets/gallery/019-1664x1248.png" alt="Lens gallery sample 019" width="100%" />
204
  <br />
205
- <sub><b>Sample 019</b> &middot; 1664x1248<br />A rustic wooden sign at a fishing village dock reading &quot;Fresh Catch of the Day&quot; in hand-carved letters painted nautical blue, thick hemp rope threaded through the sign as a border, fishing nets and lobster traps stacked in the background, seaside atmosphere</sub>
206
  </td>
207
  <td width="33%" valign="top">
208
- <img src="assets/gallery/020-1664x1248.png" alt="Lens gallery sample 020" width="100%" />
209
  <br />
210
- <sub><b>Sample 020</b> &middot; 1664x1248<br />A sunken shipwreck on the ocean floor completely overgrown with colorful coral formations, schools of tropical fish swimming through the broken hull and portholes, shafts of sunlight streaming down from the surface above, underwater archaeology photography</sub>
211
  </td>
212
  </tr>
213
  <tr>
214
  <td width="33%" valign="top">
215
- <img src="assets/gallery/021-1664x1248.png" alt="Lens gallery sample 021" width="100%" />
216
  <br />
217
- <sub><b>Sample 021</b> &middot; 1664x1248<br />Zhangjiajie pillar mountains rising above a sea of clouds at sunrise, golden light painting the sandstone peaks, the surreal Avatar-like floating mountain landscape stretching to the horizon, aerial drone photography capturing immense vertical scale</sub>
218
  </td>
219
  <td width="33%" valign="top">
220
- <img src="assets/gallery/022-1440x1440.png" alt="Lens gallery sample 022" width="100%" />
221
  <br />
222
- <sub><b>Sample 022</b> &middot; 1440x1440<br />A red-eyed tree frog perched on a bright red bromeliad flower in the Costa Rican cloud forest, its neon green body contrasting with blue-striped flanks and orange feet, water droplets on its smooth skin, extreme macro with ring flash lighting</sub>
223
  </td>
224
  <td width="33%" valign="top">
225
- <img src="assets/gallery/023-1248x1664.png" alt="Lens gallery sample 023" width="100%" />
226
  <br />
227
- <sub><b>Sample 023</b> &middot; 1248x1664<br />Inside a massive limestone cave, ancient stalactites and stalagmites meeting to form columns, an underground river reflecting the formations like a mirror, subtle warm lighting revealing millions of years of mineral deposits, spelunking exploration photography</sub>
228
  </td>
229
  </tr>
230
  </table>
231
  </details>
232
 
233
- <details name="lens-gallery">
234
- <summary><b>Page 5 / 6</b> &nbsp; samples 024-029</summary>
235
 
236
  <table>
237
  <tr>
238
  <td width="33%" valign="top">
239
- <img src="assets/gallery/024-1664x1248.png" alt="Lens gallery sample 024" width="100%" />
240
  <br />
241
- <sub><b>Sample 024</b> &middot; 1664x1248<br />A weathered 1960s gas station with a large roadside sign reading &quot;ROUTE 66 GAS &amp; GO&quot; in retro rounded sans-serif letters with a red and white color scheme, vintage gas pumps with analog dials in the foreground, a classic Chevrolet parked to the side, Americana nostalgia</sub>
242
  </td>
243
  <td width="33%" valign="top">
244
- <img src="assets/gallery/025-1664x1248.png" alt="Lens gallery sample 025" width="100%" />
245
  <br />
246
- <sub><b>Sample 025</b> &middot; 1664x1248<br />Construction site hoarding covered in unauthorized street art with &quot;ART IS EVERYWHERE&quot; spray-painted in large freehand capital letters using multiple overlapping colors of red, yellow and blue, paint drips running down from each letter, chaotic beautiful urban canvas</sub>
247
  </td>
248
  <td width="33%" valign="top">
249
- <img src="assets/gallery/026-1664x1248.png" alt="Lens gallery sample 026" width="100%" />
250
  <br />
251
- <sub><b>Sample 026</b> &middot; 1664x1248<br />Top-down view of a koi pond, dozens of ornamental koi fish in vivid red white orange and gold patterns swimming through crystal-clear emerald water, fallen cherry blossom petals floating on the surface, Japanese garden aerial photography</sub>
252
  </td>
253
  </tr>
254
  <tr>
255
  <td width="33%" valign="top">
256
- <img src="assets/gallery/027-1664x1248.png" alt="Lens gallery sample 027" width="100%" />
257
  <br />
258
- <sub><b>Sample 027</b> &middot; 1664x1248<br />The Potala Palace in Lhasa under a canopy of stars with the Milky Way arching overhead, Tibetan prayer wheels and butter lamps in the foreground casting warm golden light, the massive white and red palace walls glowing in moonlight, night photography</sub>
259
  </td>
260
  <td width="33%" valign="top">
261
- <img src="assets/gallery/028-1248x1664.png" alt="Lens gallery sample 028" width="100%" />
262
  <br />
263
- <sub><b>Sample 028</b> &middot; 1248x1664<br />Yellowstone&#x27;s Grand Prismatic Spring shot from directly above by drone, concentric rings of vivid blue turquoise green yellow and orange created by thermophilic bacteria, steam rising from the surface, abstract natural color palette</sub>
264
  </td>
265
  <td width="33%" valign="top">
266
- <img src="assets/gallery/029-1664x1248.png" alt="Lens gallery sample 029" width="100%" />
267
  <br />
268
- <sub><b>Sample 029</b> &middot; 1664x1248<br />A herd of African elephants walking in a line across the savanna with Mount Kilimanjaro&#x27;s snow-capped peak behind them, golden sunset dust kicked up by their feet creating a hazy atmosphere, telephoto wildlife photography showing massive scale</sub>
269
  </td>
270
  </tr>
271
  </table>
272
  </details>
273
 
274
- <details name="lens-gallery">
275
- <summary><b>Page 6 / 6</b> &nbsp; samples 030-031</summary>
276
 
277
  <table>
278
  <tr>
279
  <td width="33%" valign="top">
280
- <img src="assets/gallery/030-1664x1248.png" alt="Lens gallery sample 030" width="100%" />
281
  <br />
282
- <sub><b>Sample 030</b> &middot; 1664x1248<br />The Hall of Mirrors at the Palace of Versailles, hundreds of candles reflected infinitely in the massive gilded mirrors, crystal chandeliers casting prismatic light across painted ceilings and gold leaf ornamentation, Baroque opulence</sub>
283
  </td>
284
  <td width="33%" valign="top">
285
- <img src="assets/gallery/031-1664x1248.png" alt="Lens gallery sample 031" width="100%" />
286
  <br />
287
- <sub><b>Sample 031</b> &middot; 1664x1248<br />A pirate captain&#x27;s cabin, navigation charts pinned to the wall, a brass telescope and astrolabe on the desk, stacks of gold coins and a jewel-encrusted goblet, rum bottle, warm swinging lantern light casting shadows with the ship&#x27;s motion</sub>
288
  </td>
289
  <td width="33%"></td>
290
  </tr>
291
- <tr>
292
- <td width="33%"></td>
293
- <td width="33%"></td>
294
- <td width="33%"></td>
295
- </tr>
296
  </table>
297
  </details>
298
- <!-- LENS_GALLERY_END -->
299
-
300
- ## Installation
301
 
302
- > **Tested environment:** Python 3.12 &middot; CUDA 12.6 &middot; PyTorch 2.11.0+cu126 &middot; TorchVision 0.26.0+cu126
 
303
 
304
  ```bash
305
- conda create -n lens python=3.12 -y
306
- conda activate lens
307
-
308
  uv pip install torch==2.11.0+cu126 torchvision==0.26.0+cu126 \
309
  --index-url https://download.pytorch.org/whl/cu126
310
  uv pip install -r requirements.txt
311
- ```
312
-
313
- The default GPT-OSS encoder and FLUX.2 VAE are loaded from Hugging Face. Make sure your environment has access to any gated model repositories you use.
314
-
315
- ## Checkpoints
316
-
317
- | Repo | Description | Steps | CFG |
318
- | :--- | :--- | :---: | :---: |
319
- | [`microsoft/Lens`](https://huggingface.co/microsoft/Lens) | **Default.** RL-tuned for visual quality | 20 | 5.0 |
320
- | [`microsoft/Lens-Turbo`](https://huggingface.co/microsoft/Lens-Turbo) | Distilled from the RL model for fast 4-step sampling | 4 | 1.0 |
321
- | [`microsoft/Lens-Base`](https://huggingface.co/microsoft/Lens-Base) | Supervised base model (no RL, no distillation) | 50 | 5.0 |
322
-
323
- Pick a variant by passing its repo id to `--repo_id` (CLI) or `LensPipeline.from_pretrained(...)` (Python).
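
The Steps and CFG columns of the checkpoint table can be kept next to the repo id so that a script picks matching sampler settings automatically. The following is a minimal sketch, not part of the repository: `SamplingDefaults`, `DEFAULTS`, and `sampling_defaults` are hypothetical names, and only the numeric values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingDefaults:
    """Recommended sampler settings for one checkpoint (hypothetical helper type)."""
    steps: int
    cfg: float


# Values copied from the Steps / CFG columns of the checkpoint table.
DEFAULTS = {
    "microsoft/Lens": SamplingDefaults(steps=20, cfg=5.0),
    "microsoft/Lens-Turbo": SamplingDefaults(steps=4, cfg=1.0),
    "microsoft/Lens-Base": SamplingDefaults(steps=50, cfg=5.0),
}


def sampling_defaults(repo_id: str) -> SamplingDefaults:
    """Return the table's recommended settings, or fail loudly for unknown repos."""
    try:
        return DEFAULTS[repo_id]
    except KeyError:
        known = ", ".join(sorted(DEFAULTS))
        raise ValueError(f"unknown repo id {repo_id!r}; expected one of: {known}") from None


if __name__ == "__main__":
    d = sampling_defaults("microsoft/Lens-Turbo")
    # These would be passed as num_inference_steps=d.steps, guidance_scale=d.cfg.
    print(d.steps, d.cfg)
```

Raising on an unknown repo id, rather than falling back to the default checkpoint's values, avoids silently running a distilled model with 20 steps and CFG 5.0.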
324
-
325
- ## Inference
326
-
327
- > **Important:** run from the cloned repo root so `from lens import LensPipeline` resolves to this package &mdash; importing `lens` is what registers `LensGptOssEncoder` / `LensTransformer2DModel` with the `transformers` and `diffusers` namespaces that `model_index.json` references.
328
-
329
- **Python API:**
330
-
331
- ```python
332
- import torch
333
- from lens import LensPipeline
334
-
335
- pipe = LensPipeline.from_pretrained(
336
- "microsoft/Lens", torch_dtype=torch.bfloat16
337
- ).to("cuda")
338
-
339
- image = pipe(
340
- prompt="A cat holding a sign that says \"hello world\"",
341
- base_resolution=1440, aspect_ratio="1:1",
342
- num_inference_steps=20, guidance_scale=5.0,
343
- generator=torch.Generator("cuda").manual_seed(0),
344
- ).images[0]
345
- image.save("lens.png")
346
- ```
347
-
348
- To trade speed for VRAM, replace `.to("cuda")` with `pipe.enable_model_cpu_offload()`.
349
-
350
- **CLI &mdash; basic usage:**
351
-
352
- ```bash
353
- python inference.py \
354
- --repo_id "microsoft/Lens" \
355
- --prompt "A cinematic mountain lake at sunrise, soft mist, detailed reflections" \
356
- --base_resolution 1440 --aspect_ratio 1:1 \
357
- --steps 20 --cfg 5.0 --n 1 --seed 42 \
358
- --out ./outputs
359
- ```
360
-
361
- **Batch generation** &mdash; join multiple prompts with `|`:
362
-
363
- ```bash
364
- python inference.py \
365
- --repo_id "microsoft/Lens" \
366
- --steps 20 --cfg 5.0 \
367
- --prompt "a red fox in snow|a glass greenhouse at night"
368
- ```
369
-
370
- **A100 / V100 (no MXFP4 kernels)** &mdash; dequantize the GPT-OSS encoder to bf16:
371
-
372
- ```bash
373
- python inference.py \
374
- --repo_id "microsoft/Lens" \
375
- --steps 20 --cfg 5.0 \
376
- --prompt "a cat" \
377
- --disable_mxfp4 --offload
378
- ```
379
-
380
- ### Options
381
-
382
- | Flag | Description | Default |
383
- | :--- | :--- | :--- |
384
- | `--repo_id` | HF repo id (or local path) of the assembled Lens pipeline | `microsoft/Lens` |
385
- | `--base_resolution` | `1024` or `1440` | `1440` |
386
- | `--aspect_ratio` | `1:2`, `9:16`, `2:3`, `3:4`, `1:1`, `4:3`, `3:2`, `16:9`, `2:1` | `1:1` |
387
- | `--steps` | Number of denoising steps | `20` |
388
- | `--cfg` | Classifier-free guidance scale | `5.0` |
389
- | `--n` | Number of images per prompt | `1` |
390
- | `--seed` | Random seed (omit for non-deterministic) | &mdash; |
391
- | `--out` | Output directory | `./outputs` |
392
- | `--dtype` | Compute dtype: `bfloat16`, `float16`, `float32` | `bfloat16` |
393
- | `--disable_mxfp4` | Dequantize the GPT-OSS text encoder to `--dtype` (required on A100 / V100; Hopper+ keeps MXFP4 by default for less VRAM) | &mdash; |
394
- | `--offload` | Enable diffusers CPU offload (`text_encoder->transformer->vae`) to reduce peak VRAM | &mdash; |
395
- | `--reasoner` | Refine prompts with the loaded GPT-OSS encoder before generation | &mdash; |
396
- | `--api_url` / `--api_key` / `--api_model` | Use an OpenAI-compatible API for prompt refinement (takes precedence over `--reasoner`) | &mdash; |
397
-
398
- ## Citation
399
-
400
- ```bibtex
401
- @article{zhao2026lens,
402
- title = {Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models},
403
- author = {Guo, Baining and Luo, Chong and Chen, Dong and Chen, Dongdong and Wei, Fangyun and Li, Ji and Bao, Jianmin and Zhang, Jiawei and Zhao, Jinjing and Shi, Lei and Yang, Qinhong and Zhang, Sirui and Wu, Xiuyu and Feng, Xuelu and Lu, Yan and Dong, Yanchen and Yue, Yang and Wang, Yitong and Chen, Yunuo and Liang, Zhiyang and Wan, Ziyu},
404
- journal = {arXiv preprint arXiv:2605.21573},
405
- year = {2026}
406
- }
407
- ```
408
-
409
- ## Responsible AI
410
-
411
- The model is released for research purposes only and is not intended for product or service deployment. Responsible AI considerations were incorporated throughout the development process, including data selection, model training, and evaluation.
412
- The training data includes a combination of public, licensed, and internal datasets that were processed to remove clearly identifiable personal information and reduce harmful content where possible. However, as the data is largely sourced from web-scale collections, it may contain biases or uneven representation. As a result, the model may generate outputs that are inaccurate, biased, or inappropriate under certain prompts, including content that could be misleading or raise copyright or IP-related concerns.
413
- Given these limitations, the model should be used in controlled research settings, with appropriate human oversight. Downstream users are responsible for applying additional safeguards, such as content moderation, validation, and compliance checks, before using the model in broader applications.
414
-
415
- ## Privacy
416
-
417
- This project does not collect any usage data. For more information, see the [Microsoft Privacy Statement](https://go.microsoft.com/fwlink/?LinkId=521839).
418
-
419
- ## License
420
-
421
- This project is released under the [MIT License](LICENSE).
 
1
  ---
2
  license: mit
 
 
3
  pipeline_tag: text-to-image
4
+ tags:
5
+ - sky-vision
6
+ - text-to-image
7
+ - generative-ai
8
+ - skytech
9
+ - high-resolution
10
+ - efficient-generation
11
  ---
12
+
13
  <div align="center">
14
 
15
+ # langit.cb
16
+ ### ✨ Sky Vision ✨
17
+ *In collaboration with Microsoft*
18
+
19
+ <br>
20
 
21
+ **Contributors:**
22
+ **[LANGIT.CB]** · Skytech Development Team · Microsoft Technical Partners
23
 
24
+ <br>
25
 
26
+ <a href="https://huggingface.co/skytech/Sky-Vision"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97-Model%20HuggingFace-yellow" height="24" /></a>
27
+ &nbsp;
28
+ <a href="https://github.com/skytech-id/Sky-Vision"><img alt="GitHub" src="https://img.shields.io/badge/GitHub-Repositori-181717?logo=github&logoColor=white" height="24" /></a>
29
+ &nbsp;
30
+ <a href="LICENSE"><img alt="Lisensi: MIT" src="https://img.shields.io/badge/Lisensi-MIT-green.svg" height="24" /></a>
 
 
 
 
31
 
32
  </div>
33
 
34
  ---
35
 
36
+ **Sky Vision** is a foundational text-to-image model with **3.8 billion parameters**, designed for efficient training and fast high-resolution image generation. It combines dense-caption training, mixed-resolution learning, multi-layer text features from GPT-OSS, and the FLUX.2 VAE. The resulting quality is on par with or better than that of larger models, while requiring far less compute.
 
 
37
 
38
+ This repository contains the basic code for running the Sky Vision model and generating images with it.
39
 
40
+ ## Key Strengths
41
+ - **Efficient Foundation** – Trained on a dataset of 800 million image-text pairs with long, detailed descriptions, so that each training step carries as much information as possible.
42
+ - **Compact yet Capable** – Uses a network of 48 processing blocks that draws on high-quality image representations and rich language features, so it follows instructions well and supports multiple languages even though it was trained on English-language data.
43
+ - **Flexible Resolution** – Generates images at aspect ratios from `1:2` to `2:1` and at resolutions up to **1440×1440 pixels**.
44
+ - **Specialized Model Variants** – The refined version reduces visual artifacts and improves image smoothness, while **Sky Vision-Turbo** generates images in just 4 sampling steps, quickly and without giving up quality.
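
The image sizes shown in the gallery (1440×1440, 1248×1664, 1664×1248) are consistent with keeping the pixel count close to the square of the base resolution and rounding each side to a multiple of 16. The sketch below encodes that reading. The rule is an assumption inferred from the gallery sizes, not something this README states, and `resolution_for` is a hypothetical helper rather than a function from the repository.

```python
import math


def resolution_for(base_resolution: int, aspect_ratio: str, multiple: int = 16) -> tuple[int, int]:
    """Map a base resolution and a "W:H" aspect ratio to (width, height).

    Assumption: the pixel area stays near base_resolution**2 and each side
    is snapped to the nearest multiple of `multiple`.
    """
    w_part, h_part = (int(part) for part in aspect_ratio.split(":"))
    if w_part <= 0 or h_part <= 0:
        raise ValueError(f"invalid aspect ratio: {aspect_ratio!r}")

    area = base_resolution * base_resolution
    # Solve width * height = area with width / height = w_part / h_part.
    width = math.sqrt(area * w_part / h_part)
    height = math.sqrt(area * h_part / w_part)

    def snap(value: float) -> int:
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    for ratio in ("1:1", "3:4", "4:3"):
        print(ratio, resolution_for(1440, ratio))
```

With a base resolution of 1440, the `3:4` ratio yields 1248×1664 and `4:3` yields 1664×1248, matching the gallery file names. Other ratios may be bucketed differently by the actual pipeline.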
45
 
46
+ ## Sample Images
47
+ <details name="galeri-sky-vision" open>
48
+ <summary><b>Page 1 / 6</b> &nbsp; Samples 000–005</summary>
 
 
 
49
 
50
  <table>
51
  <tr>
52
  <td width="33%" valign="top">
53
+ <img src="assets/gallery/000-1440x1440.png" alt="Contoh gambar 000" width="100%" />
54
  <br />
55
+ <sub><b>Sample 000</b> &middot; 1440×1440<br />A classic British-style fish and chips dish served on white paper, crispy fried fish, a lemon wedge, mushy peas, wooden table, overhead view</sub>
56
  </td>
57
  <td width="33%" valign="top">
58
+ <img src="assets/gallery/001-1440x1440.png" alt="Contoh gambar 001" width="100%" />
59
  <br />
60
+ <sub><b>Sample 001</b> &middot; 1440×1440<br />The Big Ben clock tower and the Houses of Parliament in London in the late afternoon, the River Thames reflecting the light, Westminster Bridge in the foreground, a red double-decker bus crossing, beautiful clouds lit by the sunset</sub>
61
  </td>
62
  <td width="33%" valign="top">
63
+ <img src="assets/gallery/002-1440x1440.png" alt="Contoh gambar 002" width="100%" />
64
  <br />
65
+ <sub><b>Sample 002</b> &middot; 1440×1440<br />The Eiffel Tower at dusk seen from the Trocadéro, the iron structure glowing with thousands of golden lights, the sky shading from deep blue to purple, fountains and silhouettes of visitors clearly visible</sub>
66
  </td>
67
  </tr>
68
  <tr>
69
  <td width="33%" valign="top">
70
+ <img src="assets/gallery/003-1248x1664.png" alt="Contoh gambar 003" width="100%" />
71
  <br />
72
+ <sub><b>Sample 003</b> &middot; 1248×1664<br />A crystal dragon flying beneath a sky full of aurora light, its whole body transparent and refracting the light into rainbows, ice particles falling from its wings, high fantasy artwork</sub>
73
  </td>
74
  <td width="33%" valign="top">
75
+ <img src="assets/gallery/004-1664x1248.png" alt="Contoh gambar 004" width="100%" />
76
  <br />
77
+ <sub><b>Sample 004</b> &middot; 1664×1248<br />Aerial view of the Yuanyang rice terraces in Yunnan Province at sunrise, thousands of water-filled paddies reflecting golden and pink light, thin mist drifting between green hills</sub>
78
  </td>
79
  <td width="33%" valign="top">
80
+ <img src="assets/gallery/005-1664x1248.png" alt="Contoh gambar 005" width="100%" />
81
  <br />
82
+ <sub><b>Sample 005</b> &middot; 1664×1248<br />A green iguana basking on a moss-covered log in a tropical rainforest, every scale clearly visible, water droplets clinging to its skin, a waterfall and lush trees in the background</sub>
83
  </td>
84
  </tr>
85
  </table>
86
  </details>
87
 
88
+ <details name="galeri-sky-vision">
89
+ <summary><b>Page 2 / 6</b> &nbsp; Samples 006–011</summary>
90
 
91
  <table>
92
  <tr>
93
  <td width="33%" valign="top">
94
+ <img src="assets/gallery/006-1248x1664.png" alt="Contoh gambar 006" width="100%" />
95
  <br />
96
+ <sub><b>Sample 006</b> &middot; 1248×1664<br />Portrait painting of a Renaissance noblewoman wearing a deep blue velvet gown and pearl earrings, soft lighting bringing out skin texture and paint detail, in the style of Vermeer</sub>
97
  </td>
98
  <td width="33%" valign="top">
99
+ <img src="assets/gallery/007-1440x1440.png" alt="Contoh gambar 007" width="100%" />
100
  <br />
101
+ <sub><b>Contoh 007</b> &middot; 1440×1440<br />Botol madu buatan tangan dengan etiket bergambar tanaman hias bertuliskan "Madu Bunga Liar Pegunungan", tulisan rapi dan hiasan bunga serta lebah, dibungkus kertas cokelat</sub>
102
  </td>
103
  <td width="33%" valign="top">
104
+ <img src="assets/gallery/008-1440x1440.png" alt="Contoh gambar 008" width="100%" />
105
  <br />
106
+ <sub><b>Contoh 008</b> &middot; 1440×1440<br />Gambar cat air pemuda sedang membaca buku tua di sebuah kafe di Paris, sapuan kuas yang lembut dan warna hangat menciptakan suasana tenang, tekstur kertas terlihat jelas</sub>
107
  </td>
108
  </tr>
109
  <tr>
110
  <td width="33%" valign="top">
111
+ <img src="assets/gallery/009-1664x1248.png" alt="Contoh gambar 009" width="100%" />
112
  <br />
113
+ <sub><b>Contoh 009</b> &middot; 1664×1248<br />Meja kerja penjelajah dengan peta dunia terbuka luas, alat ukur dari kuningan, buku catatan perjalanan, lilin menyala, kompas dan pena bulu, diterangi cahaya matahari yang masuk lewat jendela</sub>
114
  </td>
115
  <td width="33%" valign="top">
116
+ <img src="assets/gallery/010-1664x1248.png" alt="Contoh gambar 010" width="100%" />
117
  <br />
118
+ <sub><b>Contoh 010</b> &middot; 1664×1248<br />Stasiun kereta Grand Central di New York dengan tulisan nama stasiun yang terbuat dari ubin keramik putih pada dinding berwarna hijau tua, rel kereta api terlihat melengkung di latar belakang</sub>
119
  </td>
120
  <td width="33%" valign="top">
121
+ <img src="assets/gallery/011-1664x1248.png" alt="Contoh gambar 011" width="100%" />
122
  <br />
123
+ <sub><b>Contoh 011</b> &middot; 1664×1248<br />Burung kolibri berwarna-warni sedang melayang di depan bunga merah cerah, gerakan sayap yang cepat terhenti dalam gambar, butiran air melayang di sekitarnya, fotografi makro dengan latar belakang gelap</sub>
124
  </td>
125
  </tr>
126
  </table>
127
  </details>
128
 
129
+ <details name="galeri-sky-vision">
+ <summary><b>Page 3 / 6</b> &nbsp; Samples 012–017</summary>

  <table>
  <tr>
  <td width="33%" valign="top">
+ <img src="assets/gallery/012-1664x1248.png" alt="Sample image 012" width="100%" />
  <br />
+ <sub><b>Sample 012</b> &middot; 1664×1248<br />An old typewriter with a sheet of paper rolled in, the text "Bab Satu: Awal Mula" typed clearly but slightly unevenly, warm light from a desk lamp</sub>
  </td>
  <td width="33%" valign="top">
+ <img src="assets/gallery/013-1664x1248.png" alt="Sample image 013" width="100%" />
  <br />
+ <sub><b>Sample 013</b> &middot; 1664×1248<br />The great animal migration crossing the Mara River, hundreds of animals leaping into the churning water, dust and spray clearly visible, a beautiful scene in sunset light</sub>
  </td>
  <td width="33%" valign="top">
+ <img src="assets/gallery/014-1248x1664.png" alt="Sample image 014" width="100%" />
  <br />
+ <sub><b>Sample 014</b> &middot; 1248×1664<br />An attractive flower shop window with the hand-painted lettering "Bunga Segar Setiap Hari", bouquets of roses and peonies clearly visible from outside, lit by morning sunlight</sub>
  </td>
  </tr>
  <tr>
  <td width="33%" valign="top">
+ <img src="assets/gallery/015-1248x1664.png" alt="Sample image 015" width="100%" />
  <br />
+ <sub><b>Sample 015</b> &middot; 1248×1664<br />A steampunk floating city built on giant gears, brass and copper towers linked by bridges, airships moored at various points, a view of clouds below at sunset</sub>
  </td>
  <td width="33%" valign="top">
+ <img src="assets/gallery/016-1664x1248.png" alt="Sample image 016" width="100%" />
  <br />
+ <sub><b>Sample 016</b> &middot; 1664×1248<br />Milford Sound in New Zealand at dawn, calm water reflecting steep cliffs and waterfalls, thin mist floating over the water's surface, a very beautiful natural landscape</sub>
  </td>
  <td width="33%" valign="top">
+ <img src="assets/gallery/017-1248x1664.png" alt="Sample image 017" width="100%" />
  <br />
+ <sub><b>Sample 017</b> &middot; 1248×1664<br />An Indian classical dancer in a graceful pose, wearing silk clothing and gold jewelry, expressive hand and foot gestures lit by dramatic stage lighting</sub>
  </td>
  </tr>
  </table>
  </details>

+ <details name="galeri-sky-vision">
+ <summary><b>Page 4 / 6</b> &nbsp; Samples 018–023</summary>

  <table>
  <tr>
  <td width="33%" valign="top">
+ <img src="assets/gallery/018-1248x1664.png" alt="Sample image 018" width="100%" />
  <br />
+ <sub><b>Sample 018</b> &middot; 1248×1664<br />A narrow alley in an old Moroccan town with bright blue walls, colorful carpets and ceramic plates displayed along the street, a beautiful wooden door and crisp shadows cast by the sun</sub>
  </td>
  <td width="33%" valign="top">
+ <img src="assets/gallery/019-1664x1248.png" alt="Sample image 019" width="100%" />
  <br />
+ <sub><b>Sample 019</b> &middot; 1664×1248<br />A simple wooden sign at a fishing pier reading "Hasil Tangkapan Segar Hari Ini", the lettering carved and painted blue, thick rope as decoration, nets and fish traps visible in the background</sub>
  </td>
  <td width="33%" valign="top">
+ <img src="assets/gallery/020-1664x1248.png" alt="Sample image 020" width="100%" />
  <br />
+ <sub><b>Sample 020</b> &middot; 1664×1248<br />A shipwreck resting on the seabed and completely covered in colorful coral, beautiful fish swimming through the damaged parts of the ship, sunlight penetrating from the surface</sub>
  </td>
  </tr>
  <tr>
  <td width="33%" valign="top">
+ <img src="assets/gallery/021-1664x1248.png" alt="Sample image 021" width="100%" />
  <br />
+ <sub><b>Sample 021</b> &middot; 1664×1248<br />The pillar mountains of Zhangjiajie towering above a sea of clouds at sunrise, golden light illuminating the sandstone peaks, an extraordinarily vast and stunning view</sub>
  </td>
  <td width="33%" valign="top">
+ <img src="assets/gallery/022-1440x1440.png" alt="Sample image 022" width="100%" />
  <br />
+ <sub><b>Sample 022</b> &middot; 1440×1440<br />A red-eyed frog perched on a bright red flower in the mountain forest of Costa Rica, bright green body with blue stripes and orange feet, water droplets clearly visible on its skin</sub>
  </td>
  <td width="33%" valign="top">
+ <img src="assets/gallery/023-1248x1664.png" alt="Sample image 023" width="100%" />
  <br />
+ <sub><b>Sample 023</b> &middot; 1248×1664<br />The interior of a vast limestone cave with rock formations rising and joining to form columns, an underground river reflecting the cave walls, soft lighting bringing out natural beauty formed over millions of years</sub>
  </td>
  </tr>
  </table>
  </details>

+ <details name="galeri-sky-vision">
+ <summary><b>Page 5 / 6</b> &nbsp; Samples 024–029</summary>

  <table>
  <tr>
  <td width="33%" valign="top">
+ <img src="assets/gallery/024-1664x1248.png" alt="Sample image 024" width="100%" />
  <br />
+ <sub><b>Sample 024</b> &middot; 1664×1248<br />An old 1960s gas station with a large sign reading "JALUR 66 BENSIN & LAYANAN", curved red and white lettering, vintage fuel pumps and a classic car visible in front</sub>
  </td>
  <td width="33%" valign="top">
+ <img src="assets/gallery/025-1664x1248.png" alt="Sample image 025" width="100%" />
  <br />
+ <sub><b>Sample 025</b> &middot; 1664×1248<br />A construction-site hoarding covered in street art, the words "SENI ADA DI MANA-MANA" painted in overlapping red, yellow and blue, paint drips visible around the letters</sub>
  </td>
  <td width="33%" valign="top">
+ <img src="assets/gallery/026-1664x1248.png" alt="Sample image 026" width="100%" />
  <br />
+ <sub><b>Sample 026</b> &middot; 1664×1248<br />Top-down view of an ornamental fish pond, dozens of koi in red, white, orange and gold swimming in clear water, cherry blossom petals floating on the surface</sub>
  </td>
  </tr>
  <tr>
  <td width="33%" valign="top">
+ <img src="assets/gallery/027-1664x1248.png" alt="Sample image 027" width="100%" />
  <br />
+ <sub><b>Sample 027</b> &middot; 1664×1248<br />The Potala Palace in Lhasa under a star-filled sky with the Milky Way clearly visible, prayer wheels and oil lamps glowing warmly in the foreground, the majestic palace walls luminous in the moonlight</sub>
  </td>
  <td width="33%" valign="top">
+ <img src="assets/gallery/028-1248x1664.png" alt="Sample image 028" width="100%" />
  <br />
+ <sub><b>Sample 028</b> &middot; 1248×1664<br />Aerial view of Grand Prismatic Spring in Yellowstone National Park, concentric rings of blue, green, yellow and orange formed by bacteria, steam rising from its surface</sub>
  </td>
  <td width="33%" valign="top">
+ <img src="assets/gallery/029-1664x1248.png" alt="Sample image 029" width="100%" />
  <br />
+ <sub><b>Sample 029</b> &middot; 1664×1248<br />A herd of African elephants walking in single file across the grassland with the snow-capped peak of Mount Kilimanjaro in the background, fine dust kicked up creating a hazy atmosphere at sunset</sub>
  </td>
  </tr>
  </table>
  </details>

+ <details name="galeri-sky-vision">
+ <summary><b>Page 6 / 6</b> &nbsp; Samples 030–031</summary>

  <table>
  <tr>
  <td width="33%" valign="top">
+ <img src="assets/gallery/030-1664x1248.png" alt="Sample image 030" width="100%" />
  <br />
+ <sub><b>Sample 030</b> &middot; 1664×1248<br />The Hall of Mirrors at the Palace of Versailles, hundreds of candles whose light is reflected endlessly by large gilt-framed mirrors, crystal chandeliers and a ceiling full of paintings and gold ornament</sub>
  </td>
  <td width="33%" valign="top">
+ <img src="assets/gallery/031-1664x1248.png" alt="Sample image 031" width="100%" />
  <br />
+ <sub><b>Sample 031</b> &middot; 1664×1248<br />The captain's cabin of a pirate ship, navigation charts pinned to the wall, a spyglass and measuring instruments on the table, piles of gold coins and jewel-encrusted goblets, swaying lamplight casting moving shadows</sub>
  </td>
  <td width="33%"></td>
  </tr>
  </table>
  </details>

+ ## Installation
+ > **Tested environment:** Python 3.12 · CUDA 12.6 · PyTorch 2.11.0+cu126 · TorchVision 0.26.0+cu126
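
Once the packages are installed, the versions can be compared against the tested environment listed above. The following is a minimal sketch, not part of the repository: the helper names `installed_versions` and `compare_versions` are hypothetical, and it assumes the installed wheels report their version with the `+cu126` local tag.

```python
from importlib import metadata

# Versions copied from the tested-environment note; adjust if yours differ.
TESTED = {"torch": "2.11.0+cu126", "torchvision": "0.26.0+cu126"}


def installed_versions(names):
    """Look up installed package versions; missing packages map to None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def compare_versions(installed, tested=TESTED):
    """Return {package: (installed, tested)} for every package that differs."""
    return {
        name: (installed.get(name), want)
        for name, want in tested.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    mismatches = compare_versions(installed_versions(TESTED))
    for name, (have, want) in mismatches.items():
        print(f"{name}: installed {have}, tested {want}")
    if not mismatches:
        print("Environment matches the tested versions.")
```

A mismatch does not necessarily mean the setup is broken; it only indicates that the environment differs from the one reported as tested.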

  ```bash
+ conda create -n skyvision python=3.12 -y
+ conda activate skyvision

  uv pip install torch==2.11.0+cu126 torchvision==0.26.0+cu126 \
  --index-url https://download.pytorch.org/whl/cu126
  uv pip install -r requirements.txt