# Turnlet BERT Multilingual - End-of-Utterance Detection
A lightweight, multilingual DistilBERT model fine-tuned for End-of-Utterance (EOU) detection in conversational AI systems. This model supports **English, Hindi, and Spanish** with high accuracy and fast inference.
## Model Description
- **Architecture**: DistilBERT (6 layers, 768 hidden dimensions)
- **Parameters**: ~135M (multilingual DistilBERT base, consistent with the 517 MB FP32 checkpoint)
- **Languages**: English, Hindi, Spanish
- **Task**: Binary sequence classification (EOU vs Non-EOU)
- **Training**: Knowledge distillation from teacher model
- **Model Size**:
- PyTorch (safetensors): 517 MB
- ONNX (optimized FP32): 517 MB
- ONNX (quantized INT8): 132 MB (74% size reduction)
## Performance Metrics
### Validation Set Performance (Step 60500)
| Language | Accuracy | Samples |
|----------|----------|---------|
| **English** | 97.01% | 16,258 |
| **Hindi** | 96.89% | 12,103 |
| **Spanish** | 94.52% | 7,963 |
| **Overall** | 96.43% | 36,324 |
**Validation Metrics:**
- F1 Score: 0.9635
- Precision: 0.9491
- Recall: 0.9783
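These aggregates can be sanity-checked against the per-language rows; a quick check in plain Python (values copied from the tables above):

```python
# Per-language validation accuracy and sample counts from the table above
langs = {
    "English": (0.9701, 16258),
    "Hindi":   (0.9689, 12103),
    "Spanish": (0.9452, 7963),
}

# Overall accuracy is the sample-weighted mean of per-language accuracies
total = sum(n for _, n in langs.values())
overall = sum(acc * n for acc, n in langs.values()) / total

# F1 is the harmonic mean of precision and recall
precision, recall = 0.9491, 0.9783
f1 = 2 * precision * recall / (precision + recall)

print(f"Overall accuracy: {overall:.4f}")  # matches the reported 96.43% up to rounding
print(f"F1: {f1:.4f}")                     # 0.9635, matching the reported F1
```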
### TURNS-2K Benchmark
- **Accuracy**: 91.10%
- **F1 Score**: 0.9150
- **Precision**: 0.9796
- **Recall**: 0.8584
## Model Variants
This repository includes three model formats:
1. **PyTorch (safetensors)**: `model.safetensors` - Full precision PyTorch model
2. **ONNX Optimized (FP32)**: `bert_model_optimized.onnx` - Optimized for inference, full precision
3. **ONNX Quantized (INT8)**: `bert_model_optimized_dynamic_int8.onnx` - **Recommended** for production
### Why Use the Quantized INT8 Model?
- **74% smaller** (132 MB vs 517 MB)
- **Faster inference** on CPU
- **Minimal accuracy loss** (<0.5%)
- **Lower memory footprint**
- **Better for deployment**
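The card does not document how the INT8 file was produced; one plausible route, assuming ONNX Runtime's dynamic quantizer was used (the file names below match the ones in this repo), looks like this:

```python
from pathlib import Path

def quantize(fp32_path: str, int8_path: str) -> None:
    # Dynamic quantization: weights are stored as int8 offline, while
    # activations are quantized on the fly at inference time (CPU-friendly).
    from onnxruntime.quantization import quantize_dynamic, QuantType
    quantize_dynamic(fp32_path, int8_path, weight_type=QuantType.QInt8)

# Only run when the FP32 model is actually present next to this script
if Path("bert_model_optimized.onnx").exists():
    quantize("bert_model_optimized.onnx", "bert_model_optimized_dynamic_int8.onnx")
```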
## Quick Start
### Interactive Demo (Easiest Way)
```bash
# Clone the model repository
git clone https://huggingface.co/your-username/turnlet-bert-multilingual-eou
cd turnlet-bert-multilingual-eou
# Install dependencies
pip install -r requirements.txt
# Run interactive mode (default - uses fast ONNX INT8)
python inference_example.py
# Or explicitly use interactive mode
python inference_example.py --interactive
# Use PyTorch instead of ONNX
python inference_example.py --interactive --pytorch
# Adjust threshold
python inference_example.py --interactive --threshold 0.9
```
The interactive mode allows you to:
- Type text and get instant EOU predictions
- Test in English, Hindi, or Spanish
- See confidence scores and inference times
- View visual confidence bars
- Type 'examples' to see sample inputs
- Type 'quit' or 'exit' to stop
### One-off Prediction
```bash
# Single prediction with ONNX (fast)
python inference_example.py --text "Thanks for your help!"
# Test suite with multiple examples
python inference_example.py --test-suite
```
### Using PyTorch (in Python)
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("your-username/turnlet-bert-multilingual-eou")
tokenizer = AutoTokenizer.from_pretrained("your-username/turnlet-bert-multilingual-eou")
# Predict
text = "Thanks for your help!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
with torch.no_grad():  # inference only; no gradient tracking needed
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
is_eou = (probs[0][1] > 0.5).item()  # default threshold; 0.86 was found optimal on validation
print(f"EOU Probability: {probs[0][1]:.3f}")
print(f"Is EOU: {is_eou}")
```
### Using ONNX (Quantized INT8) - Recommended for Production
```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/turnlet-bert-multilingual-eou")
# Create ONNX session
session = ort.InferenceSession("bert_model_optimized_dynamic_int8.onnx")
# Tokenize
text = "Thanks for your help!"
inputs = tokenizer(text, padding="max_length", max_length=128, truncation=True, return_tensors="np")
# Prepare ONNX inputs
ort_inputs = {
'input_ids': inputs['input_ids'].astype(np.int64),
'attention_mask': inputs['attention_mask'].astype(np.int64)
}
# Run inference
outputs = session.run(None, ort_inputs)
logits = outputs[0][0]
# Softmax over logits (subtract the max for numerical stability)
logits = logits - np.max(logits)
probs = np.exp(logits) / np.sum(np.exp(logits))
is_eou = probs[1] > 0.5  # default threshold; 0.86 was found optimal on validation
print(f"EOU Probability: {probs[1]:.3f}")
print(f"Is EOU: {is_eou}")
```
## Use Cases
This model is designed for:
- **Voice Assistants**: Detect when the user has finished speaking
- **Chatbots**: Identify complete user intents
- **Call Centers**: Segment customer utterances in real time
- **Multilingual Applications**: Support English, Hindi, and Spanish speakers
- **Real-time Systems**: Fast inference with the quantized model
## Training Details
### Training Data
The model was trained using knowledge distillation on a multilingual dataset:
- **English**: 76,258 samples
- **Hindi**: 75,103 samples
- **Spanish**: 75,963 samples
- **Total**: ~227K samples
### Training Configuration
- **Base Model**: DistilBERT multilingual
- **Method**: Knowledge distillation from Qwen-based teacher model
- **Epochs**: 8
- **Final Step**: 60,500
- **Optimization**: AdamW optimizer
- **Max Sequence Length**: 128 tokens
### Distillation Process
The model was created using sparse Mixture-of-Experts (MoE) based knowledge distillation:
1. Teacher model (Qwen-based) provides soft labels
2. Student model (DistilBERT) learns to mimic teacher predictions
3. Multi-stage training with progressive difficulty
4. Language-specific accuracy monitoring
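Steps 1–2 typically amount to a temperature-scaled soft-label objective mixed with the ordinary hard-label loss; a minimal NumPy sketch (the temperature `T` and mixing weight `alpha` here are illustrative, not the values used in training):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels, T=2.0, alpha=0.5):
    """alpha * hard-label CE + (1 - alpha) * T^2 * soft CE against the teacher."""
    soft_ce = -(softmax(teacher_logits, T) * np.log(softmax(student_logits, T))).sum(-1).mean()
    log_p = np.log(softmax(student_logits))
    hard_ce = -log_p[np.arange(len(hard_labels)), hard_labels].mean()
    return alpha * hard_ce + (1 - alpha) * T**2 * soft_ce

# Toy batch: 2 samples, binary (non-EOU, EOU) logits
student = np.array([[1.0, 2.0], [0.5, -0.5]])
teacher = np.array([[0.5, 3.0], [1.0, -1.0]])
labels = np.array([1, 0])
print(f"loss: {distillation_loss(student, teacher, labels):.4f}")
```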
## Evaluation
The model was evaluated on:
1. **Validation Set**: Balanced multilingual dataset
2. **TURNS-2K**: Standard benchmark for turn-taking detection
3. **Per-Language Metrics**: Individual language performance tracking
### Inference Speed
Approximate inference times (CPU, single sample):
- ONNX Optimized: ~70-120ms
- ONNX Quantized INT8: ~40-50ms
*Note: Actual speeds vary by hardware*
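To reproduce latency numbers on your own hardware, time repeated single-sample calls after a warm-up phase; `run_model` below is a stand-in for the PyTorch or ONNX inference call shown earlier:

```python
import time

def benchmark(run_model, n_warmup=5, n_runs=50):
    """Mean per-call latency in milliseconds."""
    for _ in range(n_warmup):      # warm-up: caches, lazy initialization
        run_model()
    start = time.perf_counter()
    for _ in range(n_runs):
        run_model()
    return (time.perf_counter() - start) / n_runs * 1000.0

# Stand-in workload; replace with e.g. lambda: session.run(None, ort_inputs)
latency_ms = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"~{latency_ms:.2f} ms per call")
```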
## Limitations
- Model performance is slightly lower on Spanish compared to English and Hindi
- Optimal threshold (0.86) may need adjustment for specific use cases
- Maximum sequence length is 128 tokens (longer texts will be truncated)
- Best performance on conversational, task-oriented dialogue
- May require fine-tuning for domain-specific applications
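Because the tuned cutoff (0.86) sits well above the default 0.5, borderline predictions flip between the two; a quick illustration with made-up probabilities:

```python
def is_eou(prob, threshold=0.5):
    """Classify a softmax EOU probability against a decision threshold."""
    return prob >= threshold

for prob in (0.40, 0.70, 0.95):  # hypothetical model outputs
    print(f"p={prob}: default={is_eou(prob)}, tuned={is_eou(prob, threshold=0.86)}")
# A probability of 0.70 counts as EOU at 0.5 but not at 0.86:
# raising the threshold trades recall for precision.
```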
## Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{turnlet-bert-multilingual-eou,
title={Turnlet BERT Multilingual: End-of-Utterance Detection},
author={Your Name},
year={2024},
publisher={Hugging Face},
note={Knowledge-distilled DistilBERT for multilingual EOU detection}
}
```
## License
Please specify your license here (e.g., Apache 2.0, MIT, etc.)
## Model Card Contact
For questions or feedback, please open an issue in the repository.
---
**Model Version**: Step 60500
**Last Updated**: November 2024
**Framework**: PyTorch, ONNX Runtime
**Languages**: English (en), Hindi (hi), Spanish (es)