A causal, fully convolutional speech enhancer trained on the VoiceBank-DEMAND-16k dataset (JacobLinCool/VoiceBank-DEMAND-16k). Source: github.com/LarocheC/sparse-nsnet2. See RESULTS_CONVFSENET.md for the full results, the architecture description, and the magnitude-compression trick that makes int8 deployment essentially loss-free (a 0.020 PESQ gap to FP32 on the test set).
| metric | value |
|---|---|
| params | 1.45 M |
| FP32 PESQ | 2.931 |
| int8 PESQ | 2.911 |
| Δ (FP32→int8) | -0.020 |
| int8 RTF (ORT CPU) | 0.017 |
| int8 size | 1.6 MiB |
PESQ is computed on the full 824-utterance VoiceBank-DEMAND test split.
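To put the table in more concrete terms, the arithmetic below is a small sketch using only the numbers reported above. It assumes the usual definition of real-time factor (processing time divided by audio duration), which the card does not spell out; the function names are illustrative and not part of the source repo.

```python
# Values copied from the metrics table above.
FP32_PESQ = 2.931
INT8_PESQ = 2.911
INT8_RTF = 0.017  # assumed to mean: processing time / audio duration


def pesq_drop(fp32: float, int8: float) -> float:
    """Absolute PESQ lost when moving from FP32 to int8."""
    return fp32 - int8


def compute_ms_per_audio_second(rtf: float) -> float:
    """Milliseconds of compute needed for one second of audio."""
    return rtf * 1000.0


def realtime_speedup(rtf: float) -> float:
    """How many times faster than real time the model runs."""
    return 1.0 / rtf


drop = pesq_drop(FP32_PESQ, INT8_PESQ)
relative_drop_pct = 100.0 * drop / FP32_PESQ
ms_per_second = compute_ms_per_audio_second(INT8_RTF)
speedup = realtime_speedup(INT8_RTF)

print(f"PESQ drop: {drop:.3f} ({relative_drop_pct:.2f}% of the FP32 score)")
print(f"Compute per 1 s of audio: {ms_per_second:.1f} ms")
print(f"Speed relative to real time: {speedup:.1f}x")
```

Under that definition, an RTF of 0.017 corresponds to about 17 ms of CPU time per second of audio, or roughly 59 times faster than real time.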
| file | what it is |
|---|---|
| g_best | PyTorch checkpoint (full state dict: generator, optimizer, etc.) |
| g_best_fp32.onnx | Streaming FP32 ONNX (per-frame inputs + FIFO state buffers) |
| g_best.onnx | Static int8 ONNX (QDQ, per-channel weights, MinMax calibration; compression prologue kept FP32) |
| config.json | Training config (architecture + STFT params) |
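The int8 entry above mentions per-channel weight quantization. As a rough illustration of that idea, the sketch below quantizes each output channel with its own scale, so a channel with small weights is not forced onto the coarse grid of a channel with large ones. It is a simplified, symmetric scheme with made-up weights, written in plain Python; it is not ONNX Runtime's quantizer and not the export code used for g_best.onnx.

```python
from typing import List, Tuple


def quantize_per_channel(
    weights: List[List[float]],
) -> Tuple[List[List[int]], List[float]]:
    """Symmetric int8 quantization with one scale per output channel (row)."""
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # Map the largest magnitude in this channel to 127; guard all-zero rows.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize_per_channel(
    q_rows: List[List[int]], scales: List[float]
) -> List[List[float]]:
    """Recover approximate float weights from int8 values and channel scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


# Two hypothetical channels with very different dynamic ranges.
weights = [[0.02, -0.01, 0.005], [1.5, -3.0, 0.75]]
q, scales = quantize_per_channel(weights)
recon = dequantize_per_channel(q, scales)

# Worst-case reconstruction error per channel, bounded by half a step.
max_err = [
    max(abs(a - b) for a, b in zip(r_row, w_row))
    for r_row, w_row in zip(recon, weights)
]
print(q, scales, max_err)
```

With a single scale shared by both channels, the first channel's weights would all round to 0 or ±1; separate scales keep the rounding error of each channel proportional to its own range.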
PyTorch:
import json
import torch
from huggingface_hub import hf_hub_download
from common.env import AttrDict
from convfsenet.model import build_causal_model  # convfsenet package lives in the source repo
REPO = "claroche1/convfsenet"
with open(hf_hub_download(REPO, "config.json")) as f:
    cfg = json.load(f)
ckpt = torch.load(hf_hub_download(REPO, "g_best"),
                  map_location="cuda", weights_only=False)  # full unpickling: trusted checkpoints only
model = build_causal_model(AttrDict(cfg)).cuda().eval()
model.load_state_dict(ckpt["generator"])
ONNX (FP32 or int8):
import onnxruntime as ort
from huggingface_hub import hf_hub_download
REPO = "claroche1/convfsenet"
sess = ort.InferenceSession(
    hf_hub_download(REPO, "g_best.onnx"),  # or g_best_fp32.onnx
    providers=["CPUExecutionProvider"],
)
# Streaming shape: feed one frame of magnitude STFT (B, n_freq) + the per-block
# FIFO state buffers per call. End-to-end RMS-norm + STFT + frame loop + iSTFT
# pipeline lives in convfsenet/inference_onnx.py in the source repo.
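The per-frame inputs and FIFO state buffers described above follow a common pattern for streaming causal convolutions: each call receives the newest frame plus a buffer of past frames, and returns an output together with the updated buffer. The toy sketch below shows that pattern on scalar frames with a hypothetical 3-tap kernel and checks that frame-by-frame processing matches offline processing. It is not the model's real layer: in the ONNX graph a frame is a vector of n_freq magnitudes and each block carries its own buffer, whose names and shapes should be taken from convfsenet/inference_onnx.py.

```python
from collections import deque
from typing import Deque, List

# Hypothetical causal taps applied to: current frame, t-1, t-2.
KERNEL = [0.5, 0.3, 0.2]


def offline_causal_conv(frames: List[float]) -> List[float]:
    """Reference: process the whole sequence at once with left zero-padding."""
    pad = len(KERNEL) - 1
    padded = [0.0] * pad + frames
    return [
        sum(k * padded[t + pad - i] for i, k in enumerate(KERNEL))
        for t in range(len(frames))
    ]


def streaming_step(frame: float, state: Deque[float]) -> float:
    """Consume one frame plus the FIFO of past frames, then update the FIFO."""
    # `state` holds the K-1 most recent past frames, newest last.
    history = [frame] + list(reversed(state))  # current, t-1, t-2
    out = sum(k * x for k, x in zip(KERNEL, history))
    state.append(frame)  # maxlen makes the deque drop the oldest frame
    return out


frames = [1.0, 2.0, 3.0, 4.0, 5.0]
# Zero-initialised state plays the role of the left padding in the offline case.
state: Deque[float] = deque([0.0] * (len(KERNEL) - 1), maxlen=len(KERNEL) - 1)
streamed = [streaming_step(x, state) for x in frames]
reference = offline_causal_conv(frames)

print(streamed == reference)
```

Because the state is carried between calls, the streaming path never needs more than one new frame of input, which is what keeps per-call latency and memory constant.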
License: MIT. See the source repository for training code and full attribution.