BoxOfColors Claude Sonnet 4.6 committed on
Commit 12556c0 · 1 Parent(s): c3cec42

fix: free GPU memory between samples to prevent VRAM fragmentation


All samples within a single generate call share one @spaces.GPU
reservation. Without explicit cleanup, each sample's intermediate
tensors accumulate in the CUDA allocator cache, fragmenting VRAM and
causing progressive quality degradation on samples 2, 3, 4+.

torch.cuda.empty_cache() after each sample flushes the allocator so
every sample starts from a clean memory state, making quality
consistent across all generations.
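A minimal sketch of the pattern this commit describes: flush the CUDA allocator cache after each sample so the next one starts clean. The helper names (`free_gpu_cache`, `infer_all`, `run_sample`) are illustrative, not the actual functions in app.py, and the sketch degrades gracefully on CPU-only machines:

```python
import gc


def free_gpu_cache():
    """Release cached, unused CUDA blocks back to the driver.

    Safe on CPU-only machines: it degrades to a plain gc pass.
    Returns True only if a CUDA cache flush actually ran.
    """
    gc.collect()  # drop dead Python refs so their tensors become freeable
    try:
        import torch
    except ImportError:
        return False
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # hand cached-but-unallocated blocks back
        return True
    return False


def infer_all(samples, run_sample):
    """Run per-sample inference, flushing the allocator between samples."""
    results = []
    for sample in samples:
        results.append(run_sample(sample))
        free_gpu_cache()  # next sample starts from a clean allocator state
    return results
```

Note that `torch.cuda.empty_cache()` only returns blocks the caching allocator holds but no tensor currently occupies; tensors still referenced from Python keep their memory, which is why the sketch runs `gc.collect()` before the flush.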

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Files changed (1)
  1. app.py +5 -0
app.py CHANGED
@@ -535,6 +535,11 @@ def _taro_gpu_infer(video_file, seed_val, cfg_scale, num_steps, mode,
         _TARO_INFERENCE_CACHE.pop(next(iter(_TARO_INFERENCE_CACHE)))
         results.append((wavs, cavp_feats, onset_feats))
 
+        # Free GPU memory between samples so VRAM fragmentation doesn't
+        # degrade diffusion quality on samples 2, 3, 4, etc.
+        if torch.cuda.is_available():
+            torch.cuda.empty_cache()
+
     return results
 
 # Attach a context slot for the CPU wrapper to pass pre-computed data