Access Whissle STT-meta-1B on Hugging Face

This model is licensed for inference only: no training, fine-tuning, distillation, model compression, or reverse engineering is permitted. Access requires accepting the Whissle Inference-Only License Agreement; approval is automatic. See the LICENSE file for full terms. Inference use is free under 100M MAU (monthly active users), and redistribution requires "Powered by Whissle" attribution.

Whissle STT-meta-1B

Multilingual speech recognition model with a dual-head tag classifier for real-time speaker metadata extraction. Built on a Conformer-CTC architecture with 18 encoder layers. Supports 9 languages and detects age, gender, emotion, and intent for each utterance.

Model Details


Architecture	Conformer-CTC + dual-head tag classifier
Encoder	512-dim, 18 layers, 4x subsampling
Download size	~488 MB
Format	NeMo (.nemo) and ONNX (CPU and GPU compatible)
Sample rate	16 kHz mono
Languages	English, Hindi, Spanish, French, German, Italian, Gujarati, Marathi
Base model	nvidia/parakeet-ctc-0.6b
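
Because the model expects 16 kHz mono input, it can help to check a file before transcription. The following is a minimal sketch using only Python's standard `wave` module. The helper name `is_16k_mono` is hypothetical and not part of any Whissle or NeMo API, and it only handles uncompressed WAV files.

```python
import wave


def is_16k_mono(path: str) -> bool:
    """Return True if the WAV file at `path` is 16 kHz and single-channel."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == 16000 and wav.getnchannels() == 1
```

Files that fail this check would need resampling or downmixing with an audio tool of your choice before being passed to the model.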

Tag Classifier Outputs

Category	Classes	Labels
Age	6	0-18, 18-30, 30-45, 45-60, 60+, NONE
Emotion	8	NEUTRAL, HAPPY, SAD, ANGRY, FEAR, SURPRISE, DISGUST, NONE
Gender	4	MALE, FEMALE, OTHER, NONE
Intent	10	COMMAND, DESCRIBE, EXCLAIM, EXPLAIN, INFORM, OPINION, QUESTION, REQUEST, STATEMENT, NONE
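
As an illustration of how the four label sets could be used downstream, here is a minimal sketch that maps per-category scores to labels with a plain argmax. The labels are copied from the table above, but the index order, the `scores` dictionary layout, and the names `TAG_LABELS` and `decode_tags` are assumptions made for this example. The model's actual output format is not documented here.

```python
# Label sets copied from the table above.
# Assumption: the list order matches the classifier's class indices.
TAG_LABELS = {
    "age": ["0-18", "18-30", "30-45", "45-60", "60+", "NONE"],
    "emotion": ["NEUTRAL", "HAPPY", "SAD", "ANGRY", "FEAR",
                "SURPRISE", "DISGUST", "NONE"],
    "gender": ["MALE", "FEMALE", "OTHER", "NONE"],
    "intent": ["COMMAND", "DESCRIBE", "EXCLAIM", "EXPLAIN", "INFORM",
               "OPINION", "QUESTION", "REQUEST", "STATEMENT", "NONE"],
}


def decode_tags(scores: dict[str, list[float]]) -> dict[str, str]:
    """Pick the highest-scoring label in each category (plain argmax)."""
    tags = {}
    for category, labels in TAG_LABELS.items():
        values = scores[category]
        if len(values) != len(labels):
            raise ValueError(
                f"{category}: expected {len(labels)} scores, got {len(values)}"
            )
        best = max(range(len(values)), key=values.__getitem__)
        tags[category] = labels[best]
    return tags


# Made-up scores for illustration only; not real model output.
example_scores = {
    "age": [0.1, 0.7, 0.1, 0.05, 0.03, 0.02],
    "emotion": [0.2, 0.6, 0.05, 0.05, 0.03, 0.03, 0.02, 0.02],
    "gender": [0.1, 0.8, 0.05, 0.05],
    "intent": [0.02, 0.03, 0.02, 0.03, 0.05, 0.05, 0.7, 0.05, 0.03, 0.02],
}
print(decode_tags(example_scores))
```

The length check guards against a mismatch between the scores and the class counts listed in the table (6, 8, 4, and 10).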

Quick Start

Use with the Whissle STT Inference Server (ONNX, CPU):

git clone https://github.com/WhissleAI/whissle_stt_inference.git
cd whissle_stt_inference
./setup.sh --model en-meta

Or load directly with NeMo:

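import nemo.collections.asr as nemo_asr

# Download the checkpoint from the Hugging Face Hub (accept the license on the model page first).
asr_model = nemo_asr.models.ASRModel.from_pretrained("WhissleAI/STT-meta-1B")
# Transcribe 16 kHz mono audio; one result is returned per input path.
transcriptions = asr_model.transcribe(["/path/to/your/audio.wav"])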

Also usable with PromptingNemo.

Performance

Tested on CPU (Apple M-series):

Audio length	Inference time	RTF	Tags
25.9s	3.6s	0.14x	Female, 30-45, Neutral, Describe
1.1s	0.46s	0.42x	Female, 18-30, Happy, Question
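
The RTF (real-time factor) column is consistent with inference time divided by audio length, so values below 1 mean faster than real time. A small sketch of that arithmetic, using the two rows above; the function name `real_time_factor` is chosen for this example only.

```python
def real_time_factor(inference_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return inference_seconds / audio_seconds


# (audio length, inference time) pairs taken from the table above.
for audio_s, infer_s in [(25.9, 3.6), (1.1, 0.46)]:
    print(f"{audio_s}s audio: RTF = {real_time_factor(infer_s, audio_s):.2f}")
```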

License

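Whissle Inference-Only License: inference only, with no training, fine-tuning, distillation, or reverse engineering. Free under 100M MAU. Redistribution requires "Powered by Whissle" attribution. See the LICENSE file for full terms.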

Citation

@misc{whissle2026sttmeta1b,
  title={Whissle STT-meta-1B: Multilingual ASR with Intent, Emotion, and Voice Biometrics},
  author={Whissle AI},
  year={2026},
  url={https://huggingface.co/WhissleAI/STT-meta-1B}
}
