Vocoder produces click/pop artifact at the end of generated audio segments

#22

by hashchen - opened 25 days ago

edited 25 days ago

Description:

When using s2-pro for TTS via sgl-omni serve, the generated audio segments frequently contain an audible click or pop sound at the very end. This happens regardless of the text content or speaker reference used.

Observed behavior:

More than roughly 90% of generated segments have an audible click/pop at the tail end of the audio
The artifact appears to be random: the same text can produce it on one run and not on another
The artifact is present in the raw WAV output from the server, with no client-side processing applied

How to reproduce:
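import requests

payload = {
    "input": "Hello, how are you doing today?",
    "response_format": "wav",
    # any valid reference
    "references": [{"vq_codes": [...], "text": "..."}],
}
resp = requests.post("http://localhost:8080/v1/audio/speech", json=payload)
resp.raise_for_status()

# Save the raw server response so the tail can be inspected without client-side processing
with open("segment.wav", "wb") as f:
    f.write(resp.content)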

Listen to the end of the resulting WAV — click/pop is audible

Generate 20-30 segments with varied text — the vast majority will have the artifact at the end.

Root cause hypothesis:

The vocoder/codec decoder appears to stop generating abruptly before the waveform has decayed to zero, creating a discontinuity at the end of the audio. This is a classic cause of click/pop artifacts in digital audio.
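
One rough way to test this hypothesis on saved output is to compare the jump from the final sample down to silence against the largest sample-to-sample step anywhere in the file. The sketch below assumes the server returns 16-bit PCM WAV, which I have not confirmed; `read_pcm16`, `tail_jump_ratio`, and the synthetic `sine` helper are illustrative names, not part of sgl-omni. A ratio well above 1 would suggest the waveform was cut off far from zero.

```python
import array
import math
import sys
import wave


def read_pcm16(source):
    """Read a 16-bit PCM WAV (path or file-like) and return (first-channel samples, sample rate)."""
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch assumes 16-bit PCM")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return list(samples[::channels]), rate


def tail_jump_ratio(samples):
    """Size of the final-sample-to-zero jump relative to the largest inter-sample step."""
    if len(samples) < 2:
        return 0.0
    max_step = max(abs(b - a) for a, b in zip(samples, samples[1:]))
    if max_step == 0:
        return float("inf") if samples[-1] else 0.0
    return abs(samples[-1]) / max_step


def sine(n, freq=200.0, rate=24000, amp=10000):
    """Synthetic test tone, used here only to illustrate the measurement."""
    return [int(round(amp * math.sin(2 * math.pi * freq * k / rate))) for k in range(n)]


clean = sine(1201)    # ends on a zero crossing
clipped = sine(1231)  # ends at a waveform peak, like an abruptly stopped decoder
print(tail_jump_ratio(clean), tail_jump_ratio(clipped))
```

For real output, something like `samples, rate = read_pcm16("segment.wav")` followed by `tail_jump_ratio(samples)` over a batch of generated files would show whether the segments with an audible click are also the ones with a large ratio.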

Current workaround:

We trim the last 50-80ms off each generated segment and apply a short fade-out (15-30ms). This removes the artifact in most cases but occasionally clips the tail end of actual speech content — not ideal for short utterances.
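
For reference, the workaround looks roughly like the sketch below. It operates on a list of integer PCM samples; the function name, the default values (picked from the ranges above), the linear fade shape, and the half-length guard for short utterances are all illustrative choices rather than our exact production code.

```python
def trim_and_fade(samples, sample_rate, trim_ms=60, fade_ms=20):
    """Drop the last trim_ms of audio, then apply a linear fade-out over fade_ms."""
    trim = int(sample_rate * trim_ms / 1000)
    fade = int(sample_rate * fade_ms / 1000)

    # Illustrative guard: never trim more than half of a very short segment.
    trim = min(trim, len(samples) // 2)
    kept = list(samples[: len(samples) - trim])

    # The fade cannot be longer than what is left after trimming.
    fade = min(fade, len(kept))
    start = len(kept) - fade
    for i in range(fade):
        # Gain falls linearly and reaches exactly 0.0 on the final sample.
        gain = (fade - 1 - i) / fade
        kept[start + i] = int(kept[start + i] * gain)
    return kept


# Example: one second of constant-amplitude audio at 24 kHz.
processed = trim_and_fade([1000] * 24000, sample_rate=24000)
```

Because the trim is a fixed duration, it removes real speech whenever the utterance ends close to the final sample, which is why this is only a stopgap.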

Environment:

Model: fishaudio/s2-pro
Server: sgl-omni serve --model-path fishaudio/s2-pro --config examples/configs/s2pro_tts.yaml
Output format: WAV

Questions:

Is there a server-side config (e.g. in s2pro_tts.yaml) that controls end-of-sequence behavior or adds padding?
Could the model be made to generate a few extra silent frames at the end to ensure a clean tail-off?
Is this a known issue with the vocoder decoder?
