ESPnet

non-profit

AI & ML interests

voice-conversion speech-separation speech-enhancement speech-translation speech-synthesis speech-recognition spoken-language-understanding

Recent Activity

Fhrozen new activity 10 days ago

espnet/pengcheng_aishell_asr_train_asr_whisper_medium_finetune_raw_zh_whisper_multilingual_sp:app.py

RishabA updated a model 17 days ago

espnet/ta_openslr127

Fhrozen

in espnet/pengcheng_aishell_asr_train_asr_whisper_medium_finetune_raw_zh_whisper_multilingual_sp 10 days ago

app.py

#4 opened 15 days ago by nathanjames

app.py

#5 opened 15 days ago by nathanjames

RishabA

updated a model 17 days ago

espnet/ta_openslr127

Automatic Speech Recognition • Updated 17 days ago • 15 • 1

huckiyang

authored 2 papers 17 days ago

Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

Paper • 2604.24954 • Published Apr 27 • 26

The Interspeech 2026 Audio Reasoning Challenge: Evaluating Reasoning Process Quality for Audio Reasoning Models and Agents

Paper • 2602.14224 • Published Feb 15

RishabA

published a model 18 days ago

espnet/ta_openslr127

Automatic Speech Recognition • Updated 17 days ago • 15 • 1

s3nh

posted an update 20 days ago

Post

207

Existing methods — GPTQ, AWQ, llama.cpp's k-quants — minimize empirical loss heuristically. None of them prove they are optimal in any information-theoretic sense. ICRB-Q builds a quantization scheme that is provably optimal via the Cramér-Rao lower bound (CRB): no unbiased estimator of a weight can have lower variance than [F(θ)]⁻¹, where F is the Fisher information matrix.

1 reply

vectominist

authored 2 papers 20 days ago

End-to-end Whispered Speech Recognition with Frequency-weighted Approaches and Pseudo Whisper Pre-training

Paper • 2005.01972 • Published May 5, 2020

USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding

Paper • 2606.06444 • Published 25 days ago • 3

cyhuang-tw

updated a model about 1 month ago

espnet/multi-talker-whisper-small-ami

Automatic Speech Recognition • Updated May 27 • 5 • 1

cyhuang-tw

published a model about 1 month ago

espnet/multi-talker-whisper-small-ami

Automatic Speech Recognition • Updated May 27 • 5 • 1

cjli

updated a model about 2 months ago

espnet/powsm_ctc

Automatic Speech Recognition • Updated May 4 • 27 • 5

pyf98

authored 3 papers 2 months ago

Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC

Paper • 2505.24200 • Published May 30, 2025

ESPnet-SpeechLM: An Open Speech Language Model Toolkit

Paper • 2502.15218 • Published Feb 21, 2025

Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

Paper • 2604.24954 • Published Apr 27 • 26

Aniket-Tathe-08

updated a model 2 months ago

espnet/marathi_lrec2020

Automatic Speech Recognition • Updated Apr 22

consome2

posted an update 2 months ago

Post

3283

Built a small site for tracking speech-to-speech, full-duplex, and audio foundation model work.
It covers models, benchmarks, datasets, and some blog posts to organize the landscape in one place.

Still early, but sharing in case it is useful:
https://www.fullduplex.ai/

If you spot missing entries or mistakes, I would really appreciate corrections.