ConvFSENet

A causal, fully-convolutional speech enhancer trained on VoiceBank-DEMAND-16k. Source: github.com/LarocheC/sparse-nsnet2. See RESULTS_CONVFSENET.md for the full results, the architecture description, and the magnitude-compression trick that keeps the PESQ cost of int8 deployment to at most 0.02.

This repo holds two channel-width variants:

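| Variant | Location | Params | FP32 PESQ | int8 PESQ | PESQ drop (FP32 → int8) | int8 RTF | int8 size |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 192/384 (deployed) | repo root | 1.45 M | 2.931 | 2.911 | 0.020 | 0.017 | 1.6 MiB |
| 128/256 (compact) | `128-256/` | 0.67 M | 2.891 | 2.883 | 0.008 | 0.032 | 0.80 MiB |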

The two variants differ only in `n_channels_res` / `n_channels_conv`; the recipe is identical (magnitude-compressed input, 200-epoch PESQ metric-GAN training, cosine LR schedule). The 128/256 variant has ~54% fewer parameters and half the int8 size at a cost of ~0.03 int8 PESQ, and int8 post-training quantization (PTQ) is again nearly lossless (a 0.008 PESQ drop). PESQ is measured on the full 824-utterance VoiceBank-DEMAND test split; RTF (real-time factor) is measured for the int8 streaming session under onnxruntime on CPU with a single thread.
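The relative figures quoted above follow directly from the variant table. The snippet below is only an arithmetic check, with the table values copied in by hand. The `rtf` helper uses the usual definition of real-time factor (processing time divided by audio duration), which this card does not spell out and is assumed here.

```python
# Values copied by hand from the variant table (params in millions, size in MiB).
deployed = {"params": 1.45, "fp32": 2.931, "int8": 2.911, "size": 1.6}
compact = {"params": 0.67, "fp32": 2.891, "int8": 2.883, "size": 0.80}


def relative_reduction(big, small):
    """Fraction by which `small` is smaller than `big`."""
    return 1.0 - small / big


def rtf(processing_seconds, audio_seconds):
    """Real-time factor (assumed definition): below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds


param_cut = relative_reduction(deployed["params"], compact["params"])
size_cut = relative_reduction(deployed["size"], compact["size"])
ptq_drop_deployed = deployed["fp32"] - deployed["int8"]  # cost of int8 PTQ, 192/384
ptq_drop_compact = compact["fp32"] - compact["int8"]     # cost of int8 PTQ, 128/256
int8_gap = deployed["int8"] - compact["int8"]            # cost of the smaller model

print(f"parameter reduction: {param_cut:.0%}")
print(f"int8 size reduction: {size_cut:.0%}")
print(f"PTQ PESQ drop: {ptq_drop_deployed:.3f} (192/384), {ptq_drop_compact:.3f} (128/256)")
print(f"int8 PESQ gap between variants: {int8_gap:.3f}")
```

Under the assumed definition, an RTF of 0.017 means roughly 17 ms of compute per second of audio.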

Files (per variant)

| File | What it is |
| --- | --- |
| `g_best` | PyTorch checkpoint (`{"generator": state_dict}`) |
| `g_best_fp32.onnx` | Streaming FP32 ONNX (per-frame inputs + FIFO state buffers) |
| `g_best.onnx` | Static int8 ONNX (QDQ, per-channel weights, MinMax calibration; compression prologue kept FP32) |
| `config.json` | Training config (architecture + STFT params) |

The root files are the 192/384 model; the same four files under 128-256/ are the compact model.
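To make "per-channel weights, MinMax calibration" concrete, here is a toy illustration in plain Python. It is not the code used to export `g_best.onnx`: the weights are made up, and symmetric quantization (the largest magnitude maps to 127) is an assumption, since the card does not state the exact scheme. It only shows why a separate scale per output channel helps when channel ranges differ.

```python
def minmax_scale(values, qmax=127):
    """Symmetric MinMax scale: map the largest magnitude onto qmax."""
    peak = max(abs(v) for v in values)
    return peak / qmax if peak > 0 else 1.0


def quantize(values, scale, qmax=127):
    """Round to the nearest integer step and clamp to [-qmax, qmax]."""
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integer codes back to floating point."""
    return [q * scale for q in q_values]


def max_error(original, restored):
    """Largest absolute reconstruction error."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Hypothetical weights: two output channels with very different ranges.
weights = [[0.5, -1.0, 0.25], [0.02, -0.01, 0.04]]

# Per-tensor: one scale shared by every channel.
shared = minmax_scale([w for row in weights for w in row])
per_tensor_err = [
    max_error(row, dequantize(quantize(row, shared), shared)) for row in weights
]

# Per-channel: each output channel gets its own scale.
scales = [minmax_scale(row) for row in weights]
per_channel_err = [
    max_error(row, dequantize(quantize(row, s), s)) for row, s in zip(weights, scales)
]

print("per-tensor max error per channel: ", per_tensor_err)
print("per-channel max error per channel:", per_channel_err)
```

The low-amplitude second channel is reconstructed far more accurately with its own scale, while the first channel is unchanged because it already sets the shared range.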

Loading

PyTorch (set SUB = "" for the root 192/384 model, or "128-256/" for the compact one):

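```python
import json

import torch
from huggingface_hub import hf_hub_download

# Project modules: these imports assume a checkout of the source repository is on PYTHONPATH.
from common.env import AttrDict
from convfsenet.model import build_causal_model

REPO, SUB = "claroche1/convfsenet", "128-256/"  # or SUB = "" for the deployed 192/384 model
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is present

with open(hf_hub_download(REPO, SUB + "config.json")) as f:
    cfg = json.load(f)

# weights_only=False unpickles arbitrary objects; only use it with checkpoints you trust.
ckpt = torch.load(hf_hub_download(REPO, SUB + "g_best"),
                  map_location=device, weights_only=False)
model = build_causal_model(AttrDict(cfg)).to(device).eval()
model.load_state_dict(ckpt["generator"])
```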
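A quick sanity check after loading is to compare the parameter count with the table (1.45 M for 192/384, 0.67 M for 128/256). The helper below is a minimal sketch using the standard `parameters()` / `numel()` calls of a PyTorch module. Whether the table counts exactly these tensors (for example, whether buffers are excluded) is an assumption, so a small mismatch would not necessarily indicate a problem.

```python
def count_params_millions(model):
    """Total number of elements across model.parameters(), in millions."""
    return sum(p.numel() for p in model.parameters()) / 1e6


# Usage with the model loaded above:
# print(f"{count_params_millions(model):.2f} M parameters")
```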

ONNX (FP32 or int8):

```python
import onnxruntime as ort
from huggingface_hub import hf_hub_download

REPO, SUB = "claroche1/convfsenet", "128-256/"  # or SUB = "" for the root model
sess = ort.InferenceSession(
    hf_hub_download(REPO, SUB + "g_best.onnx"),  # or SUB + "g_best_fp32.onnx"
    providers=["CPUExecutionProvider"],
)
# Streaming interface: each call takes one frame of magnitude STFT, shape (B, n_freq),
# plus the per-block FIFO state buffers.
# The end-to-end pipeline (RMS normalization, STFT, frame loop, iSTFT) lives in
# convfsenet/inference_onnx.py in the source repo.
```
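Because the streaming graph takes state buffers as explicit inputs, it helps to list the session inputs before writing a frame loop. The sketch below relies only on `InferenceSession.get_inputs()` and `run()`. The helper names are hypothetical, and it assumes that every input is float32 and that the FIFO state starts at zero; neither is stated in this card, so check `convfsenet/inference_onnx.py` for the actual initialization.

```python
def resolve_shape(shape, fill=1):
    """Replace symbolic or unknown dimensions (str / None) with `fill`."""
    return tuple(d if isinstance(d, int) and d > 0 else fill for d in shape)


def describe_inputs(sess):
    """Return (name, type, shape) for every input of an onnxruntime session."""
    return [(inp.name, inp.type, inp.shape) for inp in sess.get_inputs()]


def zero_feeds(sess):
    """Build an all-zero feed dict (assumes float32 inputs and zero initial state)."""
    import numpy as np  # imported lazily so the helpers above stay dependency-free

    return {
        inp.name: np.zeros(resolve_shape(inp.shape), dtype=np.float32)
        for inp in sess.get_inputs()
    }


# Usage with the session created above:
# for name, dtype, shape in describe_inputs(sess):
#     print(name, dtype, shape)
# feeds = zero_feeds(sess)          # then overwrite the frame input with real data
# outputs = sess.run(None, feeds)   # None requests every graph output
```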

License

MIT. See the source repository for training code and full attribution.
