update system prompt

app.py CHANGED
@@ -223,6 +223,7 @@ Duration Guidelines:
 - Video generation: typically 60-180 seconds
 - Audio/music generation: typically 30-90 seconds
 - Model loading + inference: add 10-30s buffer
+- AoT compilation during startup: use @spaces.GPU(duration=1500) for maximum allowed duration
 
 Functions that typically need @spaces.GPU:
 - Image generation (text-to-image, image-to-image)
@@ -231,6 +232,43 @@ Functions that typically need @spaces.GPU:
 - Model inference with transformers, diffusers
 - Any function using .to('cuda') or GPU operations
 
+## Advanced ZeroGPU Optimization (Recommended)
+
+For production Spaces with heavy models, consider ahead-of-time (AoT) compilation for 1.3x-1.8x speedups:
+
+```python
+import spaces
+import torch
+from diffusers import DiffusionPipeline
+
+pipe = DiffusionPipeline.from_pretrained(..., torch_dtype=torch.bfloat16)
+pipe.to('cuda')
+
+@spaces.GPU(duration=1500)  # Max duration for compilation
+def compile_transformer():
+    with spaces.aoti_capture(pipe.transformer) as call:
+        pipe("arbitrary example prompt")
+
+    exported = torch.export.export(
+        pipe.transformer,
+        args=call.args,
+        kwargs=call.kwargs,
+    )
+    return spaces.aoti_compile(exported)
+
+compiled_transformer = compile_transformer()
+spaces.aoti_apply(compiled_transformer, pipe.transformer)
+
+@spaces.GPU
+def generate(prompt):
+    return pipe(prompt).images
+```
+
+Optional enhancements:
+- FP8 quantization with torchao for additional 1.2x speedup (H200 compatible)
+- Dynamic shapes for variable input sizes
+- FlashAttention-3 via kernels library for attention speedups
+
 ## Complete Gradio API Reference
 
 This reference is automatically synced from https://www.gradio.app/llms.txt to ensure accuracy.
@@ -278,6 +316,7 @@ Duration Guidelines:
 - Video generation: typically 60-180 seconds
 - Audio/music generation: typically 30-90 seconds
 - Model loading + inference: add 10-30s buffer
+- AoT compilation during startup: use @spaces.GPU(duration=1500) for maximum allowed duration
 
 Functions that typically need @spaces.GPU:
 - Image generation (text-to-image, image-to-image)
@@ -286,6 +325,43 @@ Functions that typically need @spaces.GPU:
 - Model inference with transformers, diffusers
 - Any function using .to('cuda') or GPU operations
 
+## Advanced ZeroGPU Optimization (Recommended)
+
+For production Spaces with heavy models, consider ahead-of-time (AoT) compilation for 1.3x-1.8x speedups:
+
+```python
+import spaces
+import torch
+from diffusers import DiffusionPipeline
+
+pipe = DiffusionPipeline.from_pretrained(..., torch_dtype=torch.bfloat16)
+pipe.to('cuda')
+
+@spaces.GPU(duration=1500)  # Max duration for compilation
+def compile_transformer():
+    with spaces.aoti_capture(pipe.transformer) as call:
+        pipe("arbitrary example prompt")
+
+    exported = torch.export.export(
+        pipe.transformer,
+        args=call.args,
+        kwargs=call.kwargs,
+    )
+    return spaces.aoti_compile(exported)
+
+compiled_transformer = compile_transformer()
+spaces.aoti_apply(compiled_transformer, pipe.transformer)
+
+@spaces.GPU
+def generate(prompt):
+    return pipe(prompt).images
+```
+
+Optional enhancements:
+- FP8 quantization with torchao for additional 1.2x speedup (H200 compatible)
+- Dynamic shapes for variable input sizes
+- FlashAttention-3 via kernels library for attention speedups
+
 ## Complete Gradio API Reference
 
 This reference is automatically synced from https://www.gradio.app/llms.txt to ensure accuracy.