Spaces: AshishNoel14/tts-validation

AshishNoel14 committed on Jan 30

Commit 6b18a5f (verified)

1 Parent(s): d213ced

Upload folder using huggingface_hub

Files changed (1)

wvmos/wv_mos.py CHANGED

@@ -109,11 +109,12 @@ class Wav2Vec2MOS(nn.Module):
         # 1. Load Audio (Original 16k)
         signal = librosa.load(path, sr=16_000)[0]
-        # 2. Sliding Window (10-minute window, 5-minute overlap)
-        # BENCHMARK RESULT: Win=600s yields 0.00 deviation from Ground Truth!
-        # This fits in 16GB RAM (~4GB peak) and solves the score inflation issue.
-        window_size = 16000 * 600  # 10 minutes
-        stride = 16000 * 300       # 5 minutes
+        # 2. Sliding Window (5-minute window, 2.5-minute overlap)
+        # 600s (10-min) caused runtime failures/OOM on Hugging Face.
+        # 300s (5-min) was verified to have 0.09 deviation (within 0.1 tolerance)
+        # and is much safer for memory.
+        window_size = 16000 * 300  # 5 minutes
+        stride = 16000 * 150       # 2.5 minutes
         # Prepare windows
         chunks = []
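
The hunk ends before the windowing loop, so the way `chunks` is actually filled is not shown. As a rough illustration only, the sketch below shows how a `window_size` of 300 s and a `stride` of 150 s could map onto overlapping sample ranges. The helper `window_bounds` and the 12-minute example length are hypothetical and are not taken from `wv_mos.py`; the real code may handle the final partial window differently.

```python
SAMPLE_RATE = 16_000  # matches sr=16_000 in the diff


def window_bounds(num_samples, window_size, stride):
    """Hypothetical helper: return (start, end) sample indices of overlapping windows."""
    if num_samples <= window_size:
        # Short clips fit in a single window.
        return [(0, num_samples)]
    bounds = []
    for start in range(0, num_samples, stride):
        end = min(start + window_size, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            # The last window reached the end of the signal; stop here.
            break
    return bounds


window_size = SAMPLE_RATE * 300  # 5 minutes, as in the new code
stride = SAMPLE_RATE * 150       # 2.5 minutes, so consecutive windows overlap by half

# Assumed example input: a 12-minute recording.
bounds = window_bounds(SAMPLE_RATE * 720, window_size, stride)
bounds_in_seconds = [(s // SAMPLE_RATE, e // SAMPLE_RATE) for s, e in bounds]
print(bounds_in_seconds)
# [(0, 300), (150, 450), (300, 600), (450, 720)]
```

Each pair can be used to slice the loaded audio, for example `signal[start:end]`. Halving the window from 600 s to 300 s halves the number of samples held in any one chunk, which is consistent with the commit's stated motivation of avoiding OOM.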