Access Whissle STT-meta-1B on Hugging Face
This model is licensed for inference only — no training, fine-tuning, distillation, or reverse engineering permitted. Accept the license to access. Automatic approval.
By clicking "Agree", you accept the Whissle Inference-Only License Agreement. See the LICENSE file for full terms. Key restrictions: INFERENCE ONLY — no training, fine-tuning, distillation, model compression, or reverse engineering permitted. Free for inference use under 100M MAU. "Powered by Whissle" attribution required for redistribution.
Whissle STT-meta-1B
A multilingual speech recognition model with a dual-head tag classifier for real-time speaker metadata extraction. It is built on the Conformer-CTC architecture with 18 encoder layers. It supports 9 languages and detects age, gender, emotion, and intent per utterance.
Model Details
| Property | Value |
|---|---|
| Architecture | Conformer-CTC + dual-head tag classifier |
| Encoder | 512-dim, 18 layers, 4x subsampling |
| Download size | ~488 MB |
| Format | NeMo (.nemo) and ONNX (CPU and GPU compatible) |
| Sample rate | 16 kHz mono |
| Languages | English, Hindi, Spanish, French, German, Italian, Gujarati, Marathi |
| Base model | nvidia/parakeet-ctc-0.6b |
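The table lists 16 kHz mono as the expected input. As a minimal sketch, a WAV file can be checked against that requirement before transcription using only the Python standard library. The helper name `is_16k_mono` is hypothetical and not part of any Whissle or NeMo API. Whether the loading path resamples other inputs automatically is not stated here.

```python
import wave

EXPECTED_RATE = 16_000   # 16 kHz, per the Model Details table
EXPECTED_CHANNELS = 1    # mono


def is_16k_mono(path: str) -> bool:
    """Return True if the WAV file at `path` is 16 kHz mono (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE
            and wav.getnchannels() == EXPECTED_CHANNELS
        )
```

Files that fail the check can be converted with any audio tool of your choice before being passed to the model.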
Tag Classifier Outputs
| Category | Classes | Labels |
|---|---|---|
| Age | 6 | 0-18, 18-30, 30-45, 45-60, 60+, NONE |
| Emotion | 8 | NEUTRAL, HAPPY, SAD, ANGRY, FEAR, SURPRISE, DISGUST, NONE |
| Gender | 4 | MALE, FEMALE, OTHER, NONE |
| Intent | 10 | COMMAND, DESCRIBE, EXCLAIM, EXPLAIN, INFORM, OPINION, QUESTION, REQUEST, STATEMENT, NONE |
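The label sets above can be written down as a small lookup structure. This is a sketch only: the index order within each category is assumed to follow the table order, and `decode_tags` is a hypothetical helper. The actual output format of the tag classifier is not documented in this card, so adapt the mapping to what the model really returns.

```python
# Label sets copied from the Tag Classifier Outputs table.
# ASSUMPTION: class index i corresponds to the i-th label in table order.
TAG_LABELS = {
    "age": ["0-18", "18-30", "30-45", "45-60", "60+", "NONE"],
    "emotion": ["NEUTRAL", "HAPPY", "SAD", "ANGRY", "FEAR",
                "SURPRISE", "DISGUST", "NONE"],
    "gender": ["MALE", "FEMALE", "OTHER", "NONE"],
    "intent": ["COMMAND", "DESCRIBE", "EXCLAIM", "EXPLAIN", "INFORM",
               "OPINION", "QUESTION", "REQUEST", "STATEMENT", "NONE"],
}


def decode_tags(indices: dict) -> dict:
    """Map per-category class indices to label strings (hypothetical helper)."""
    return {category: TAG_LABELS[category][idx]
            for category, idx in indices.items()}


# Example: indices chosen by hand for illustration, not real model output.
print(decode_tags({"age": 2, "emotion": 0, "gender": 1, "intent": 1}))
```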
Quick Start
Use with the Whissle STT Inference Server (ONNX, CPU):
git clone https://github.com/WhissleAI/whissle_stt_inference.git
cd whissle_stt_inference
./setup.sh --model en-meta
Or load directly with NeMo:
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.ASRModel.from_pretrained("WhissleAI/STT-meta-1B")
transcriptions = asr_model.transcribe(["/path/to/your/audio.wav"])
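Depending on the installed NeMo version, `transcribe` may return plain strings or hypothesis objects that carry a `.text` attribute. The sketch below normalizes both cases to strings. `hypothesis_text` and `transcribe_files` are hypothetical helper names, and how the age, gender, emotion, and intent tags appear in the output is not specified here.

```python
def hypothesis_text(item) -> str:
    """Return plain text from one transcribe() result (str or object with .text)."""
    return item if isinstance(item, str) else item.text


def transcribe_files(paths):
    """Transcribe a list of audio paths and return a list of strings."""
    # Imported lazily so hypothesis_text can be used without NeMo installed.
    import nemo.collections.asr as nemo_asr

    asr_model = nemo_asr.models.ASRModel.from_pretrained("WhissleAI/STT-meta-1B")
    return [hypothesis_text(item) for item in asr_model.transcribe(paths)]
```

Calling `transcribe_files(["/path/to/your/audio.wav"])` requires NeMo to be installed and the model license to have been accepted on Hugging Face.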
Also usable with PromptingNemo.
Performance
Tested on CPU (Apple M-series):
| Audio length | Inference time | RTF | Tags |
|---|---|---|---|
| 25.9s | 3.6s | 0.14x | Female, 30-45, Neutral, Describe |
| 1.1s | 0.46s | 0.42x | Female, 18-30, Happy, Question |
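The RTF (real-time factor) column is inference time divided by audio length, so values below 1 mean faster than real time. A short sketch that reproduces the two rows above (the function name `real_time_factor` is illustrative):

```python
def real_time_factor(inference_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; lower is faster."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return inference_seconds / audio_seconds


# (audio length, inference time) pairs from the table above.
for audio_s, infer_s in [(25.9, 3.6), (1.1, 0.46)]:
    print(f"{audio_s}s audio -> RTF {real_time_factor(infer_s, audio_s):.2f}")
```

The shorter clip has a higher RTF, which is consistent with fixed per-call overhead weighing more on short inputs, though the card does not break the timing down.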
License
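Whissle Inference-Only License: inference only; no training, fine-tuning, distillation, model compression, or reverse engineering. Free for inference use under 100M MAU. "Powered by Whissle" attribution is required for redistribution. See the LICENSE file for full terms.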
Citation
@misc{whissle2026sttmeta1b,
title={Whissle STT-meta-1B: Multilingual ASR with Intent, Emotion, and Voice Biometrics},
author={Whissle AI},
year={2026},
url={https://huggingface.co/WhissleAI/STT-meta-1B}
}