HEEP Universal
High Entropy Exponential Pruning for State-of-the-Art Multilingual ASR
HEEP Universal is a state-of-the-art automatic speech recognition model that demonstrates how strategic entropy-based data curation outperforms brute-force data scaling. With a composite word error rate (WER) of 3.10% on English benchmarks, it challenges the "more data is better" paradigm by training on carefully selected high-information samples.
Model Overview
HEEP Universal supports transcription across 204 languages, including a wide range of Indic and global languages, with consistent performance across domains such as meetings, earnings calls, broadcast media, and educational content. The model is optimized for high-precision, verbatim transcription, capturing spoken content word-for-word with high fidelity.
Core Insight: Strategic selection of high-entropy samples leads to better ASR models than training on larger but redundant datasets.
HEEP Methodology
HEEP (High Entropy Exponential Pruning) is an entropy-based data curation methodology that prioritizes information density over data quantity. It identifies high-information training samples while progressively filtering redundant data, enabling efficient model training with significantly reduced computational resources.
Core Methodology
Sample Scoring (Equation 7):
S(x) = α₁·H_acoustic(x) + α₂·H_phonetic(x) + α₃·H_linguistic(x) + α₄·H_contextual(x) + α₅·MI(x, D)
Where:
- H_acoustic(x): spectral/MFCC entropy measuring acoustic diversity
- H_phonetic(x): phoneme distribution entropy capturing phonetic complexity
- H_linguistic(x): vocabulary and syntax entropy measuring linguistic richness
- H_contextual(x): domain and discourse entropy
- MI(x, D): mutual information contribution relative to the dataset
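As a rough sketch of Equation 7, the score can be computed as a weighted sum of per-view entropies plus a mutual-information term. The histogram-based `shannon_entropy` estimator and the `dataset_mi` placeholder below are illustrative assumptions, not the paper's exact estimators:

```python
import numpy as np

def shannon_entropy(values, bins=32):
    """Histogram-based Shannon entropy (bits) of an empirical distribution."""
    hist, _ = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

def heep_score(features, weights, dataset_mi=0.0):
    """S(x) = sum_i alpha_i * H_i(x) + alpha_5 * MI(x, D) (Equation 7).

    `features` maps each view name to a 1-D feature array for sample x;
    `weights` holds [alpha_1, ..., alpha_5]; `dataset_mi` stands in for
    the mutual-information term MI(x, D).
    """
    views = ["acoustic", "phonetic", "linguistic", "contextual"]
    score = sum(w * shannon_entropy(features[v]) for v, w in zip(views, weights[:4]))
    return score + weights[4] * dataset_mi
```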
Progressive Filtering (Equation 9):
τ_{k+1} = τ_k × growth_factor
The threshold increases exponentially across training rounds, progressively selecting higher-entropy samples.
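The schedule above and the resulting selection step can be sketched as follows (function names are illustrative):

```python
def progressive_thresholds(tau_0, growth_factor, rounds):
    """Exponentially growing entropy threshold: tau_{k+1} = tau_k * growth_factor (Equation 9)."""
    taus = [tau_0]
    for _ in range(rounds - 1):
        taus.append(taus[-1] * growth_factor)
    return taus

def select_samples(scores, tau):
    """Keep indices of samples whose entropy score meets the current threshold."""
    return [i for i, s in enumerate(scores) if s >= tau]
```

Because the threshold grows each round, the surviving subset shrinks toward the highest-entropy samples.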
Key Benefits
- Training on 10-20% of data while matching or exceeding full-dataset performance
- Efficient multilingual model development with cross-lingual transfer
- Error-aware adaptive sample selection across training rounds
- Significant reduction in computational resources and training time
Performance Benchmarks
OpenASR Leaderboard Results
| Dataset | WER (%) | RTFx |
|---|---|---|
| AMI Test | 4.19 | 70.22 |
| Earnings22 Test | 5.83 | 101.52 |
| GigaSpeech Test | 4.99 | 131.09 |
| LibriSpeech Test Clean | 0.71 | 158.74 |
| LibriSpeech Test Other | 2.17 | 142.40 |
| SPGISpeech Test | 1.10 | 170.85 |
| TedLium Test | 1.43 | 153.34 |
| VoxPopuli Test | 4.34 | 179.28 |
Composite Results
- Overall WER: 3.10%
- Average RTFx: 146.23
RTFx (Real-Time Factor) indicates inference speed relative to audio duration. Higher values mean faster processing.
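Concretely, RTFx is the ratio of audio duration to processing time:

```python
def rtfx(audio_seconds, processing_seconds):
    """Real-Time Factor: audio duration / processing time.
    Values above 1 mean faster-than-real-time transcription."""
    return audio_seconds / processing_seconds
```

At an RTFx of 146, ten minutes of audio is transcribed in roughly four seconds.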
Comparative Performance
Performance comparison against other open-source models on 8 common speech benchmarks:
| Model | AMI | Earnings22 | GigaSpeech | LS Clean | LS Other | SPGISpeech | TedLium | Voxpopuli | Avg WER |
|---|---|---|---|---|---|---|---|---|---|
| nvidia/canary-qwen-2.5b | 10.19 | 10.45 | 9.43 | 1.61 | 3.10 | 1.90 | 2.71 | 5.66 | 5.63 |
| ibm/granite-speech-3.3-8b | 9.12 | 9.53 | 10.33 | 1.42 | 2.99 | 3.86 | 3.50 | 6.00 | 5.74 |
| nvidia/parakeet-tdt-0.6b-v2 | 11.16 | 11.15 | 9.74 | 1.69 | 3.19 | 2.17 | 3.38 | 5.95 | 6.05 |
| microsoft/Phi-4-multimodal-instruct | 11.45 | 10.50 | 9.77 | 1.67 | 3.82 | 3.11 | 2.89 | 5.93 | 6.14 |
| nvidia/canary-1b-flash | 13.11 | 12.77 | 9.85 | 1.48 | 2.87 | 1.95 | 3.12 | 5.63 | 6.35 |
| HEEP Universal (Ours) | 4.19 | 5.83 | 4.99 | 0.71 | 2.17 | 1.10 | 1.43 | 4.34 | 3.10 |
Model Details
- Architecture: Transformer-based encoder-decoder optimized for multilingual transcription
- Languages: 204 languages supported
- Format: Transformers compatible (safetensors)
- Sampling Rate: 16 kHz
- Precision: FP16/FP32 supported
- Optimization: Real-time inference capable with GPU acceleration
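Since the model expects 16 kHz input, audio at other sampling rates should be resampled first. A minimal sketch using SciPy's polyphase resampler; the `to_16k` helper is a generic convenience, not part of the model's API:

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

def to_16k(waveform, orig_sr, target_sr=16000):
    """Resample a 1-D waveform to 16 kHz using polyphase filtering."""
    g = gcd(orig_sr, target_sr)
    return resample_poly(waveform, target_sr // g, orig_sr // g)
```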
Key Features
- Exceptional Accuracy: Achieves 3.10% WER across diverse English test sets
- Real-Time Performance: Average RTFx of 146.23 enables real-time applications
- Verbatim Transcription: Optimized for accurate, word-for-word transcription
- Multi-Domain Excellence: Superior performance across conversational, broadcast, and read speech
- Multilingual Support: 204 languages with cross-lingual transfer learning
- HEEP-Curated Training: Strategic entropy-based data selection for maximum information density
Usage
```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
import torch

# Use GPU and half precision when available
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Load the model weights (safetensors format)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "bc7ec356/heep-universal",
    torch_dtype=torch_dtype,
    use_safetensors=True,
)
model.to(device)

processor = AutoProcessor.from_pretrained("bc7ec356/heep-universal")

# Build an ASR pipeline around the model and processor
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Transcribe a 16 kHz audio file
result = pipe("audio.wav")
print(result["text"])
```
Use Cases
HEEP Universal excels in various speech recognition scenarios:
- Meeting Transcription: High accuracy on conversational speech (AMI: 4.19% WER)
- Financial Communications: Specialized performance on earnings calls (Earnings22: 5.83% WER)
- Broadcast Media: Excellent results on news, podcasts, and media content
- Educational Content: Optimized for lectures and presentations
- Customer Support: Accurate transcription of support calls
- Legal Documentation: Professional-grade accuracy for legal proceedings
- Medical Transcription: High-quality transcription for medical consultations
Performance Optimization Tips
- GPU Acceleration: Use `device="cuda"` for significantly faster inference
- Precision: Set `torch_dtype=torch.float16` for optimal speed on modern GPUs
- Language Specification: Specify the language code when known to improve accuracy and speed
- Beam Size: Use `beam_size=5` for best accuracy; reduce it for faster inference
- Batch Processing: Process multiple files with a single model instance for efficiency
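For the batch-processing tip, files can be grouped and fed to a single pipeline instance rather than reloading the model per file. The `batched` helper below is a generic sketch:

```python
def batched(paths, batch_size):
    """Yield successive fixed-size batches from a list of audio file paths."""
    for i in range(0, len(paths), batch_size):
        yield paths[i:i + batch_size]

# Usage sketch (assumes `pipe` from the Usage section above):
#
# for batch in batched(audio_files, batch_size=8):
#     for result in pipe(batch):
#         print(result["text"])
```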
Acknowledgments
HEEP Universal was developed using the HEEP framework for entropy-based data curation. We thank the open-source community for providing foundational tools that make this work possible.
Citation
If you use this model in your research, please cite:
```bibtex
@article{anonymous2026heep,
  title={HEEP: High Entropy Exponential Pruning for State-of-the-Art ASR Through Strategic Data Curation},
  author={Anonymous},
  journal={Under Review},
  year={2026}
}
```