Thank you for the question, @Amirjab21! This is one of the key advantages of a native streaming model. The audio is not processed in a single pass over the full input; instead, it is consumed incrementally in small chunks as they arrive, with the relevant context preserved in the model's cache. This design lets the model handle arbitrarily long audio streams without an explicit duration limit: context is carried forward through the cache, and computation is performed only on the newly arriving frames, rather than reprocessing the entire audio or chunking it to a fixed maximum length.
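To make the idea concrete, here is a minimal conceptual sketch (not the actual NeMo API; all names, shapes, and constants are illustrative) of cache-aware streaming: a bounded cache carries left context forward, so each step computes only over the new chunk, and per-step cost does not depend on total stream length.

```python
import numpy as np

CHUNK_FRAMES = 16   # frames consumed per streaming step (illustrative)
CACHE_FRAMES = 64   # bounded left context kept in the cache (illustrative)
FEAT_DIM = 80       # feature dimension, e.g. mel bins (illustrative)

def stream_step(chunk: np.ndarray, cache: np.ndarray):
    """Process one new chunk given the cached left context.

    Returns (output_for_this_chunk, updated_cache). Compute cost depends
    only on CHUNK_FRAMES + CACHE_FRAMES, never on total audio length.
    """
    context = np.concatenate([cache, chunk], axis=0)
    output = context.mean(axis=0, keepdims=True)  # stand-in for the real encoder step
    new_cache = context[-CACHE_FRAMES:]           # keep only the bounded context
    return output, new_cache

# Simulate an arbitrarily long stream of feature frames.
cache = np.zeros((CACHE_FRAMES, FEAT_DIM), dtype=np.float32)
for _ in range(1000):  # could run indefinitely
    chunk = np.random.randn(CHUNK_FRAMES, FEAT_DIM).astype(np.float32)
    out, cache = stream_step(chunk, cache)
```

Because the cache is truncated to a fixed size each step, memory and per-step compute stay constant no matter how long the stream runs.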
Hi @kunaldhawan,
Based on my testing, it seems that inference slows down as the audio length increases. I tested a 30-minute audio file, and by the final stage the per-chunk processing time had increased by approximately 10 ms.
Here is my test discussion:
https://huggingface.co/nvidia/nemotron-speech-streaming-en-0.6b/discussions/9
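For reference, this is roughly how I would check whether per-step latency grows with stream position; the sketch below is hypothetical (`process_chunk` is a stand-in for the model's actual streaming call, and the chunk shapes are illustrative), but the early-vs-late comparison is the measurement that matters.

```python
import time
import numpy as np

def process_chunk(chunk: np.ndarray, state):
    """Placeholder for the model's streaming step."""
    time.sleep(0.001)  # stand-in for real model compute
    return state

state = None
latencies = []
for step in range(100):  # e.g. successive chunks of a 30-minute file
    chunk = np.random.randn(16, 80).astype(np.float32)
    t0 = time.perf_counter()
    state = process_chunk(chunk, state)
    latencies.append((time.perf_counter() - t0) * 1000.0)  # ms per step

# A growing gap between early and late chunks suggests per-step cost is
# not constant (e.g. an unbounded cache or accumulating state somewhere).
print(f"first 10 chunks: {np.mean(latencies[:10]):.2f} ms")
print(f"last 10 chunks:  {np.mean(latencies[-10:]):.2f} ms")
```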