Commit 12556c0 · Parent(s): c3cec42
fix: free GPU memory between samples to prevent VRAM fragmentation
All samples within a single generate call share one @spaces.GPU
reservation. Without explicit cleanup, each sample's intermediate
tensors accumulate in the CUDA allocator cache, fragmenting VRAM and
causing progressive quality degradation on samples 2, 3, 4+.
torch.cuda.empty_cache() after each sample flushes the allocator so
every sample starts from a clean memory state, making quality
consistent across all generations.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
app.py CHANGED

@@ -535,6 +535,11 @@ def _taro_gpu_infer(video_file, seed_val, cfg_scale, num_steps, mode,
     _TARO_INFERENCE_CACHE.pop(next(iter(_TARO_INFERENCE_CACHE)))
     results.append((wavs, cavp_feats, onset_feats))
 
+    # Free GPU memory between samples so VRAM fragmentation doesn't
+    # degrade diffusion quality on samples 2, 3, 4, etc.
+    if torch.cuda.is_available():
+        torch.cuda.empty_cache()
+
     return results
 
 # Attach a context slot for the CPU wrapper to pass pre-computed data
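The pattern this commit applies can be sketched in isolation. The sketch below is a minimal, hypothetical version of the per-sample loop (`generate_samples` and `infer_fn` are illustrative names, not functions from app.py): after each sample's inference, the cached-but-unused CUDA allocator blocks are released with `torch.cuda.empty_cache()`, so the next sample starts from a clean allocator state. The import is guarded so the same loop also runs on machines without torch or without a GPU.

```python
def generate_samples(samples, infer_fn):
    """Run infer_fn on each sample, freeing cached GPU memory in between.

    Hypothetical sketch of the loop structure used in _taro_gpu_infer;
    infer_fn stands in for one full diffusion inference pass.
    """
    results = []
    for sample in samples:
        results.append(infer_fn(sample))
        # Release cached (unused) allocator blocks back to the driver so
        # fragmentation from this sample can't affect the next one.
        # Guarded so the code path is a no-op on CPU-only machines.
        try:
            import torch
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
        except ImportError:
            pass
    return results
```

Note that `empty_cache()` does not free tensors that are still referenced; it only returns cached blocks the allocator is holding in reserve, which is exactly the memory that fragments across repeated inference passes.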