Initial commit: Turnlet BERT Multilingual EOU model with ONNX variants
Files changed:
- .gitattributes +3 −32
- README.md +247 −0
- UPLOAD_GUIDE.md +211 −0
- bert_model_optimized.onnx +3 −0
- bert_model_optimized_dynamic_int8.onnx +3 −0
- config.json +24 −0
- inference_example.py +265 −0
- metrics.yaml +23 −0
- model.safetensors +3 −0
- model_card.json +71 −0
- requirements.txt +6 −0
- special_tokens_map.json +7 −0
- tokenizer.json +0 −0
- tokenizer_config.json +56 −0
- vocab.txt +0 −0
.gitattributes
CHANGED

```diff
@@ -1,35 +1,6 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
-*.bz2 filter=lfs diff=lfs merge=lfs -text
-*.ckpt filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
-*.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
+
```
README.md
ADDED

# Turnlet BERT Multilingual - End-of-Utterance Detection

A lightweight, multilingual DistilBERT model fine-tuned for End-of-Utterance (EOU) detection in conversational AI systems. This model supports **English, Hindi, and Spanish** with high accuracy and fast inference.

## Model Description

- **Architecture**: DistilBERT (6 layers, 768 hidden dimensions)
- **Parameters**: ~134M (multilingual DistilBERT base; consistent with the 517 MB FP32 weights)
- **Languages**: English, Hindi, Spanish
- **Task**: Binary sequence classification (EOU vs. non-EOU)
- **Training**: Knowledge distillation from a teacher model
- **Model Size**:
  - PyTorch (safetensors): 517 MB
  - ONNX (optimized FP32): 517 MB
  - ONNX (quantized INT8): 132 MB (74% size reduction)

## Performance Metrics

### Validation Set Performance (Step 60500)

| Language | Accuracy | Samples |
|----------|----------|---------|
| **English** | 97.01% | 16,258 |
| **Hindi** | 96.89% | 12,103 |
| **Spanish** | 94.52% | 7,963 |
| **Overall** | 96.43% | 36,324 |

**Validation Metrics:**
- F1 Score: 0.9635
- Precision: 0.9491
- Recall: 0.9783
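The overall row and the F1 score above follow from the per-language numbers; a quick consistency check using only values reported in this card:

```python
# Cross-check the reported aggregates against the per-language numbers.
# All constants below are taken from the tables in this model card.
per_lang = {  # language: (accuracy, sample count)
    "en": (0.9701, 16258),
    "hi": (0.9689, 12103),
    "es": (0.9452, 7963),
}

total = sum(n for _, n in per_lang.values())  # 36,324 samples
weighted_acc = sum(acc * n for acc, n in per_lang.values()) / total
print(f"Weighted accuracy: {weighted_acc:.4f}")  # ~0.9642, i.e. the reported
# 96.43% up to rounding of the per-language accuracies

# F1 is the harmonic mean of precision and recall.
precision, recall = 0.9491, 0.9783
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 from P/R: {f1:.4f}")  # ~0.9635, matching the reported F1
```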
### TURNS-2K Benchmark

- **Accuracy**: 91.10%
- **F1 Score**: 0.9150
- **Precision**: 0.9796
- **Recall**: 0.8584
- **Optimal Threshold**: 0.86

## Model Variants

This repository includes three model formats:

1. **PyTorch (safetensors)**: `model.safetensors` - full-precision PyTorch weights
2. **ONNX Optimized (FP32)**: `bert_model_optimized.onnx` - optimized for inference, full precision
3. **ONNX Quantized (INT8)**: `bert_model_optimized_dynamic_int8.onnx` - **recommended** for production

### Why Use the Quantized INT8 Model?

- ✅ **74% smaller** (132 MB vs. 517 MB)
- ✅ **Faster inference** on CPU
- ✅ **Minimal accuracy loss** (<0.5%)
- ✅ **Lower memory footprint**
- ✅ **Better for deployment**
## Quick Start

### Interactive Demo (Easiest Way)

```bash
# Clone the model repository
git clone https://huggingface.co/your-username/turnlet-bert-multilingual-eou
cd turnlet-bert-multilingual-eou

# Install dependencies
pip install -r requirements.txt

# Run interactive mode (default - uses fast ONNX INT8)
python inference_example.py

# Or explicitly use interactive mode
python inference_example.py --interactive

# Use PyTorch instead of ONNX
python inference_example.py --interactive --pytorch

# Adjust threshold
python inference_example.py --interactive --threshold 0.9
```

The interactive mode allows you to:
- 🎮 Type text and get instant EOU predictions
- 🌐 Test in English, Hindi, or Spanish
- 📊 See confidence scores and inference times
- 📈 View visual confidence bars
- 💡 Type 'examples' to see sample inputs
- 🚪 Type 'quit' or 'exit' to stop

### One-off Prediction

```bash
# Single prediction with ONNX (fast)
python inference_example.py --text "Thanks for your help!"

# Test suite with multiple examples
python inference_example.py --test-suite
```

### Using PyTorch (in Python)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("your-username/turnlet-bert-multilingual-eou")
tokenizer = AutoTokenizer.from_pretrained("your-username/turnlet-bert-multilingual-eou")
model.eval()

# Predict
text = "Thanks for your help!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
is_eou = probs[0][1] > 0.86  # using the optimal threshold

print(f"EOU Probability: {probs[0][1]:.3f}")
print(f"Is EOU: {is_eou}")
```

### Using ONNX (Quantized INT8) - Recommended for Production

```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/turnlet-bert-multilingual-eou")

# Create ONNX session
session = ort.InferenceSession("bert_model_optimized_dynamic_int8.onnx")

# Tokenize
text = "Thanks for your help!"
inputs = tokenizer(text, padding="max_length", max_length=128, truncation=True, return_tensors="np")

# Prepare ONNX inputs
ort_inputs = {
    'input_ids': inputs['input_ids'].astype(np.int64),
    'attention_mask': inputs['attention_mask'].astype(np.int64)
}

# Run inference
outputs = session.run(None, ort_inputs)
logits = outputs[0][0]

# Calculate probability (softmax over the two logits)
probs = np.exp(logits) / np.sum(np.exp(logits))
is_eou = probs[1] > 0.86  # using the optimal threshold

print(f"EOU Probability: {probs[1]:.3f}")
print(f"Is EOU: {is_eou}")
```
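One caveat about the two-line softmax in the ONNX snippet above: `np.exp` can overflow for large logits. Subtracting the maximum logit first is the standard numerically stable form and yields identical probabilities:

```python
import numpy as np

def stable_softmax(logits: np.ndarray) -> np.ndarray:
    """Softmax with the max subtracted first, so np.exp never overflows."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / np.sum(exp)

# Identical result on ordinary logits...
print(stable_softmax(np.array([1.2, 3.4])))
# ...and still finite where the naive form would overflow:
print(stable_softmax(np.array([1000.0, 1010.0])))
```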
## Use Cases

This model is designed for:

- 🗣️ **Voice Assistants**: Detect when the user has finished speaking
- 💬 **Chatbots**: Identify complete user intents
- 📞 **Call Centers**: Segment customer utterances in real time
- 🌐 **Multilingual Applications**: Support English, Hindi, and Spanish speakers
- ⚡ **Real-time Systems**: Fast inference with the quantized model

## Training Details

### Training Data

The model was trained using knowledge distillation on a multilingual dataset:

- **English**: 16,258 samples
- **Hindi**: 12,103 samples
- **Spanish**: 7,963 samples
- **Total**: ~36K samples

### Training Configuration

- **Base Model**: DistilBERT multilingual
- **Method**: Knowledge distillation from a Qwen-based teacher model
- **Epochs**: 8
- **Final Step**: 60,500
- **Optimization**: AdamW optimizer
- **Max Sequence Length**: 128 tokens

### Distillation Process

The model was created using sparse Mixture-of-Experts (MoE) based knowledge distillation:

1. The teacher model (Qwen-based) provides soft labels
2. The student model (DistilBERT) learns to mimic the teacher's predictions
3. Multi-stage training with progressive difficulty
4. Language-specific accuracy monitoring

## Evaluation

The model was evaluated on:

1. **Validation Set**: Balanced multilingual dataset
2. **TURNS-2K**: Standard benchmark for turn-taking detection
3. **Per-Language Metrics**: Individual language performance tracking

### Inference Speed

Approximate inference times (CPU, single sample):

- PyTorch: ~15-20 ms
- ONNX Optimized: ~8-12 ms
- ONNX Quantized INT8: ~5-8 ms

*Note: Actual speeds vary by hardware.*
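Timings like these can be reproduced with a small harness. The sketch below times any zero-argument callable over repeated runs and reports the median; the dummy workload stands in for a call such as `lambda: session.run(None, ort_inputs)` on your own hardware:

```python
import statistics
import time

def time_callable(fn, warmup: int = 3, runs: int = 20) -> float:
    """Return the median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):  # untimed warmup runs let caches settle
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)

# Dummy workload standing in for a session.run(...) call
dummy = lambda: sum(i * i for i in range(10_000))
print(f"median: {time_callable(dummy):.2f} ms")
```

The median (rather than the mean) is used so that one-off scheduler hiccups do not skew the reported figure.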
## Limitations

- Model performance is slightly lower on Spanish compared to English and Hindi
- The optimal threshold (0.86) may need adjustment for specific use cases
- Maximum sequence length is 128 tokens (longer texts will be truncated)
- Best performance on conversational, task-oriented dialogue
- May require fine-tuning for domain-specific applications
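If the default 0.86 does not suit your application, the threshold can be re-tuned on a labeled development set by sweeping candidate values and keeping the one with the best F1. A minimal sketch on synthetic scores (replace `scores` and `labels` with your model's EOU probabilities and gold labels):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dev set: EOU examples score high, non-EOU examples score low.
labels = rng.integers(0, 2, size=1000)
scores = np.clip(labels * 0.8 + rng.normal(0.1, 0.2, size=1000), 0, 1)

best_t, best_f1 = 0.5, 0.0
for t in np.arange(0.05, 1.0, 0.05):
    pred = scores > t
    tp = np.sum(pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    fn = np.sum(~pred & (labels == 1))
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    if f1 > best_f1:
        best_t, best_f1 = float(t), float(f1)

print(f"best threshold: {best_t:.2f}, F1: {best_f1:.3f}")
```

In a latency-sensitive voice pipeline you may prefer to maximize recall-weighted F-beta instead, since cutting a speaker off (a false EOU) is usually costlier than a short extra pause.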
## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{turnlet-bert-multilingual-eou,
  title={Turnlet BERT Multilingual: End-of-Utterance Detection},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  note={Knowledge-distilled DistilBERT for multilingual EOU detection}
}
```

## License

Please specify your license here (e.g., Apache 2.0, MIT).

## Model Card Contact

For questions or feedback, please open an issue in the repository.

---

**Model Version**: Step 60500
**Last Updated**: November 2024
**Framework**: PyTorch, ONNX Runtime
**Languages**: English (en), Hindi (hi), Spanish (es)
UPLOAD_GUIDE.md
ADDED

# Hugging Face Upload Guide

This guide will help you upload the Turnlet BERT Multilingual EOU model to Hugging Face.

## 📦 Package Contents

This folder contains everything needed for a complete Hugging Face model repository:

### Model Files
- **`model.safetensors`** (517 MB) - PyTorch model weights in safetensors format
- **`bert_model_optimized.onnx`** (517 MB) - Optimized ONNX model (FP32)
- **`bert_model_optimized_dynamic_int8.onnx`** (132 MB) - ⭐ Quantized ONNX model (INT8, recommended)

### Tokenizer Files
- **`tokenizer.json`** - Fast tokenizer
- **`tokenizer_config.json`** - Tokenizer configuration
- **`vocab.txt`** - Vocabulary file
- **`special_tokens_map.json`** - Special tokens mapping

### Configuration Files
- **`config.json`** - Model architecture configuration
- **`metrics.yaml`** - Training and validation metrics

### Documentation
- **`README.md`** - Comprehensive model card and documentation
- **`model_card.json`** - Machine-readable model metadata
- **`requirements.txt`** - Python dependencies
- **`.gitattributes`** - Git LFS configuration for large files

### Code Examples
- **`inference_example.py`** - Interactive demo and usage examples
- **`UPLOAD_GUIDE.md`** - This file

## 🚀 Upload Steps

### Option 1: Using the Hugging Face CLI (Recommended)

```bash
# Install the Hugging Face Hub client
pip install huggingface-hub

# Login to Hugging Face
huggingface-cli login

# Navigate to the model folder
cd /home/ubuntu/hf_upload/turnlet-bert-multilingual-eou

# Create the repository (replace YOUR_USERNAME with your HF username)
huggingface-cli repo create turnlet-bert-multilingual-eou --type model

# Initialize git and git-lfs
git init
git lfs install
git lfs track "*.onnx"
git lfs track "*.safetensors"

# Add all files
git add .

# Commit
git commit -m "Initial commit: Turnlet BERT Multilingual EOU model with ONNX variants"

# Add the remote (replace YOUR_USERNAME)
git remote add origin https://huggingface.co/YOUR_USERNAME/turnlet-bert-multilingual-eou

# Push to Hugging Face
git push -u origin main
```

### Option 2: Using the Python API

```python
from huggingface_hub import HfApi, create_repo, login

# Initialize the API
api = HfApi()

# Login (you'll be prompted for a token)
login()

# Create the repository
repo_id = "YOUR_USERNAME/turnlet-bert-multilingual-eou"
create_repo(repo_id, repo_type="model", exist_ok=True)

# Upload the folder
api.upload_folder(
    folder_path="/home/ubuntu/hf_upload/turnlet-bert-multilingual-eou",
    repo_id=repo_id,
    repo_type="model",
)

print(f"✅ Model uploaded to: https://huggingface.co/{repo_id}")
```

### Option 3: Manual Upload via the Web Interface

1. Go to https://huggingface.co/new
2. Create a new model repository: `turnlet-bert-multilingual-eou`
3. Use the web interface to upload files:
   - Upload large files (`.onnx`, `.safetensors`) via Git LFS
   - Upload smaller files directly via the web interface
4. Copy the README.md content to the model card

## ⚠️ Important Notes

### Git LFS Required
The model files are large and require Git LFS (Large File Storage):
- Make sure Git LFS is installed: `git lfs install`
- The `.gitattributes` file is already configured
- Files tracked: `*.onnx`, `*.safetensors`
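A common failure mode is ending up with the tiny LFS *pointer* files instead of the real weights (for example, after cloning without `git-lfs` installed). A pointer file is just three text lines (`version`, `oid`, `size`), so it is easy to detect; a small sketch, with an illustrative filename:

```python
import os

LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path: str) -> bool:
    """True if `path` holds a Git LFS pointer rather than real file contents."""
    with open(path, "rb") as f:
        return f.read(len(LFS_MAGIC)) == LFS_MAGIC

# Demo with a fake pointer file (this is what a non-LFS clone would contain):
with open("demo.onnx", "wb") as f:
    f.write(b"version https://git-lfs.github.com/spec/v1\n"
            b"oid sha256:0000000000000000000000000000000000000000000000000000000000000000\n"
            b"size 541380730\n")

print(is_lfs_pointer("demo.onnx"))  # True: a pointer, not real weights
os.remove("demo.onnx")
```

Running a check like this on `*.onnx` and `*.safetensors` before (and after) pushing catches the "uploaded a 130-byte model" mistake early.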
### File Sizes
- Total repository size: ~1.2 GB
- Largest files: ONNX FP32 (517 MB) and PyTorch (517 MB)
- Recommended for deployment: INT8 ONNX (132 MB)

### Model Naming
Consider these naming conventions:
- `YOUR_USERNAME/turnlet-bert-multilingual-eou`
- `YOUR_ORG/turnlet-eou-detection-multilingual`
- `YOUR_USERNAME/distilbert-eou-en-hi-es`

### Tags to Add
When creating the repository, add these tags:
- `end-of-utterance`
- `eou-detection`
- `multilingual`
- `distilbert`
- `onnx`
- `quantized`
- `conversational-ai`
- `dialogue`
- `turn-taking`
- `text-classification`

## 🧪 Testing After Upload

After uploading, test the model:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Test loading
model = AutoModelForSequenceClassification.from_pretrained("YOUR_USERNAME/turnlet-bert-multilingual-eou")
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/turnlet-bert-multilingual-eou")

# Quick test
text = "Thanks for your help!"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
print(f"✅ Model loaded and working! Logits: {outputs.logits}")
```

## 📝 Post-Upload Checklist

After a successful upload:

- [ ] Verify all files are uploaded
- [ ] Test model loading via transformers
- [ ] Test ONNX model download
- [ ] Update the README with the correct username/repo paths
- [ ] Add license information
- [ ] Add model tags and metadata
- [ ] Test the interactive script
- [ ] Share on social media/communities

## 🔗 Useful Links

- Hugging Face Hub documentation: https://huggingface.co/docs/hub
- Git LFS: https://git-lfs.github.com/
- Model cards guide: https://huggingface.co/docs/hub/model-cards
- ONNX models: https://huggingface.co/docs/hub/onnx

## 💡 Tips

1. **Use descriptive commit messages** when updating the model
2. **Version your models** by creating tags (v1.0, v2.0, etc.)
3. **Monitor downloads** via your Hugging Face dashboard
4. **Respond to community questions** in the community tab
5. **Update metrics** as you improve the model

## 🆘 Troubleshooting

### Git LFS Bandwidth Issues
If you hit LFS bandwidth limits:
- Upload the smaller model variant first
- Upload during off-peak hours
- Consider Hugging Face Pro for more bandwidth

### Authentication Issues
```bash
# Re-login
huggingface-cli login --token YOUR_TOKEN

# Or set the token as an environment variable
export HUGGING_FACE_HUB_TOKEN=YOUR_TOKEN
```

### Large File Upload Timeout
```bash
# Increase the HTTP post buffer and disable low-speed timeouts
git config http.postBuffer 524288000
git config http.lowSpeedLimit 0
git config http.lowSpeedTime 999999
```

## ✅ Ready to Upload!

Your model is fully prepared and ready for upload to Hugging Face! 🎉
bert_model_optimized.onnx
ADDED (Git LFS pointer)

```
version https://git-lfs.github.com/spec/v1
oid sha256:2f1972ac9ff31da8fcf9d5e4e053caa0a6218c5ae1899cbec14e5da6ab043dc6
size 541380730
```

bert_model_optimized_dynamic_int8.onnx
ADDED (Git LFS pointer)

```
version https://git-lfs.github.com/spec/v1
oid sha256:8d5084af77f9164892dc3402d5419c7dbc1dfb559333f7ec141248a5f49e1591
size 137635060
```
config.json
ADDED

```json
{
  "activation": "gelu",
  "architectures": [
    "DistilBertForSequenceClassification"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "dtype": "float32",
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "transformers_version": "4.57.1",
  "vocab_size": 119547
}
```
inference_example.py
ADDED (diff truncated; first 183 of 265 lines shown)

```python
#!/usr/bin/env python3
"""
Simple inference example for Turnlet BERT Multilingual EOU model
Demonstrates both PyTorch and ONNX usage
"""

import argparse
import numpy as np

def test_pytorch(text, threshold=0.86):
    """Test using PyTorch model"""
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    import torch

    print("🔥 Loading PyTorch model...")
    model = AutoModelForSequenceClassification.from_pretrained(".")
    tokenizer = AutoTokenizer.from_pretrained(".")
    model.eval()

    print(f"\n📝 Input: {text}")

    # Tokenize and predict
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)

    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)

    prob_eou = probs[0][1].item()
    is_eou = prob_eou > threshold

    print(f"✅ EOU Probability: {prob_eou:.4f}")
    print(f"🎯 Prediction: {'EOU (End of Utterance)' if is_eou else 'Non-EOU (Incomplete)'}")
    print(f"📊 Threshold: {threshold}")

    return is_eou, prob_eou

def test_onnx(text, model_path="bert_model_optimized_dynamic_int8.onnx", threshold=0.86):
    """Test using ONNX quantized model (faster)"""
    import onnxruntime as ort
    from transformers import AutoTokenizer
    import time

    print("⚡ Loading ONNX Quantized INT8 model...")

    # Load tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained(".")
    session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])

    print(f"\n📝 Input: {text}")

    # Tokenize
    inputs = tokenizer(text, padding="max_length", max_length=128, truncation=True, return_tensors="np")

    # Prepare ONNX inputs
    ort_inputs = {
        'input_ids': inputs['input_ids'].astype(np.int64),
        'attention_mask': inputs['attention_mask'].astype(np.int64)
    }

    # Run inference
    start = time.time()
    outputs = session.run(None, ort_inputs)
    inference_time = (time.time() - start) * 1000

    logits = outputs[0][0]
    probs = np.exp(logits) / np.sum(np.exp(logits))
    prob_eou = probs[1]
    is_eou = prob_eou > threshold

    print(f"✅ EOU Probability: {prob_eou:.4f}")
    print(f"🎯 Prediction: {'EOU (End of Utterance)' if is_eou else 'Non-EOU (Incomplete)'}")
    print(f"📊 Threshold: {threshold}")
    print(f"⚡ Inference Time: {inference_time:.2f}ms")

    return is_eou, prob_eou

def test_multiple_examples(use_onnx=True):
    """Test multiple examples in different languages"""
    examples = [
        ("Thanks for your help!", "en", True),
        ("I need a train to Cambridge.", "en", True),
        ("What time does the", "en", False),
        ("धन्यवाद!", "hi", True),  # Hindi: "Thank you!"
        ("मुझे मदद चाहिए", "hi", False),  # Hindi: "I need help" (incomplete)
        ("¡Gracias por tu ayuda!", "es", True),  # Spanish: "Thanks for your help!"
        ("Necesito un tren a", "es", False),  # Spanish: "I need a train to" (incomplete)
    ]

    print("\n" + "="*70)
    print("🌐 MULTILINGUAL EOU DETECTION TEST")
    print("="*70)

    correct = 0
    total = len(examples)

    for text, lang, expected_eou in examples:
        print(f"\n{'─'*70}")
        print(f"🌍 Language: {lang.upper()}")

        if use_onnx:
            is_eou, prob = test_onnx(text, threshold=0.86)
        else:
            is_eou, prob = test_pytorch(text, threshold=0.86)

        expected_str = "EOU" if expected_eou else "Non-EOU"
        predicted_str = "EOU" if is_eou else "Non-EOU"

        is_correct = is_eou == expected_eou
        correct += is_correct

        status = "✅ CORRECT" if is_correct else "❌ INCORRECT"
        print(f"💡 Expected: {expected_str} | Got: {predicted_str} | {status}")

    print(f"\n{'='*70}")
    print(f"📊 ACCURACY: {correct}/{total} ({correct/total*100:.1f}%)")
    print(f"{'='*70}\n")

def interactive_mode(use_onnx=True, threshold=0.86):
    """Interactive mode - continuously ask for input and predict"""
    import onnxruntime as ort
    from transformers import AutoTokenizer
    import time

    print("\n" + "="*70)
    print("🎮 INTERACTIVE MODE - Multilingual EOU Detection")
    print("="*70)
    print("🌐 Supported languages: English, Hindi, Spanish")
    print("📊 Threshold: {:.2f}".format(threshold))

    if use_onnx:
        print("⚡ Using: ONNX Quantized INT8 model (fast)")
        tokenizer = AutoTokenizer.from_pretrained(".")
        session = ort.InferenceSession("bert_model_optimized_dynamic_int8.onnx",
                                       providers=['CPUExecutionProvider'])
    else:
        print("🔥 Using: PyTorch model")
        from transformers import AutoModelForSequenceClassification
        import torch
        tokenizer = AutoTokenizer.from_pretrained(".")
        model = AutoModelForSequenceClassification.from_pretrained(".")
        model.eval()

    print("\n💡 Type your text and press Enter to get EOU prediction")
    print("💡 Type 'quit' or 'exit' to stop")
    print("💡 Type 'examples' to see sample inputs")
    print("="*70 + "\n")

    sample_count = 0

    while True:
        try:
            # Get user input
            user_input = input("📝 Enter text: ").strip()

            if not user_input:
                continue

            # Check for exit commands
            if user_input.lower() in ['quit', 'exit', 'q']:
                print("\n👋 Goodbye! Tested {} samples.".format(sample_count))
                break

            # Show examples
            if user_input.lower() == 'examples':
                print("\n📚 Example inputs to try:")
                print("   English:")
                print("   - 'Thanks for your help!' (EOU)")
                print("   - 'I need to book a' (Non-EOU)")
                print("   Hindi:")
                print("   - 'धन्यवाद!' (Thank you! - EOU)")
                print("   - 'मुझे मदद चाहिए' (I need help - could be EOU)")
                print("   Spanish:")
                print("   - '¡Muchas gracias!' (Thank you! - EOU)")
                print("   - 'Necesito un tren a' (I need a train to - Non-EOU)")
                print()
                continue

            sample_count += 1
            print()

            # Tokenize
            inputs = tokenizer(user_input, padding="max_length", max_length=128,
```
|
| 184 |
+
truncation=True, return_tensors="np" if use_onnx else "pt")
|
| 185 |
+
|
| 186 |
+
# Predict
|
| 187 |
+
start = time.time()
|
| 188 |
+
|
| 189 |
+
if use_onnx:
|
| 190 |
+
# ONNX inference
|
| 191 |
+
ort_inputs = {
|
| 192 |
+
'input_ids': inputs['input_ids'].astype(np.int64),
|
| 193 |
+
'attention_mask': inputs['attention_mask'].astype(np.int64)
|
| 194 |
+
}
|
| 195 |
+
outputs = session.run(None, ort_inputs)
|
| 196 |
+
logits = outputs[0][0]
|
| 197 |
+
probs = np.exp(logits) / np.sum(np.exp(logits))
|
| 198 |
+
prob_eou = probs[1]
|
| 199 |
+
else:
|
| 200 |
+
# PyTorch inference
|
| 201 |
+
import torch
|
| 202 |
+
with torch.no_grad():
|
| 203 |
+
outputs = model(**inputs)
|
| 204 |
+
probs = torch.softmax(outputs.logits, dim=-1)
|
| 205 |
+
prob_eou = probs[0][1].item()
|
| 206 |
+
|
| 207 |
+
inference_time = (time.time() - start) * 1000
|
| 208 |
+
|
| 209 |
+
# Determine prediction
|
| 210 |
+
is_eou = prob_eou > threshold
|
| 211 |
+
|
| 212 |
+
# Display results with color coding
|
| 213 |
+
print("─" * 70)
|
| 214 |
+
if is_eou:
|
| 215 |
+
print("✅ Prediction: EOU (End of Utterance)")
|
| 216 |
+
print(" └─ The user has likely finished their thought")
|
| 217 |
+
else:
|
| 218 |
+
print("⏳ Prediction: Non-EOU (Incomplete)")
|
| 219 |
+
print(" └─ The user may still be speaking")
|
| 220 |
+
|
| 221 |
+
print(f"📊 Confidence: {prob_eou:.4f} (threshold: {threshold})")
|
| 222 |
+
print(f"⚡ Inference time: {inference_time:.2f}ms")
|
| 223 |
+
|
| 224 |
+
# Confidence bar
|
| 225 |
+
bar_length = 40
|
| 226 |
+
filled = int(bar_length * prob_eou)
|
| 227 |
+
bar = "█" * filled + "░" * (bar_length - filled)
|
| 228 |
+
print(f"📈 [{bar}] {prob_eou*100:.1f}%")
|
| 229 |
+
print("─" * 70 + "\n")
|
| 230 |
+
|
| 231 |
+
except KeyboardInterrupt:
|
| 232 |
+
print("\n\n👋 Interrupted! Tested {} samples. Goodbye!".format(sample_count))
|
| 233 |
+
break
|
| 234 |
+
except Exception as e:
|
| 235 |
+
print(f"❌ Error: {e}\n")
|
| 236 |
+
continue
|
| 237 |
+
|
| 238 |
+
def main():
|
| 239 |
+
parser = argparse.ArgumentParser(description="Test Turnlet BERT Multilingual EOU model")
|
| 240 |
+
parser.add_argument("--text", type=str, help="Text to classify")
|
| 241 |
+
parser.add_argument("--threshold", type=float, default=0.86, help="EOU threshold (default: 0.86)")
|
| 242 |
+
parser.add_argument("--pytorch", action="store_true", help="Use PyTorch instead of ONNX")
|
| 243 |
+
parser.add_argument("--test-suite", action="store_true", help="Run full test suite")
|
| 244 |
+
parser.add_argument("--interactive", "-i", action="store_true", help="Run in interactive mode")
|
| 245 |
+
|
| 246 |
+
args = parser.parse_args()
|
| 247 |
+
|
| 248 |
+
if args.interactive:
|
| 249 |
+
interactive_mode(use_onnx=not args.pytorch, threshold=args.threshold)
|
| 250 |
+
elif args.test_suite:
|
| 251 |
+
test_multiple_examples(use_onnx=not args.pytorch)
|
| 252 |
+
elif args.text:
|
| 253 |
+
if args.pytorch:
|
| 254 |
+
test_pytorch(args.text, args.threshold)
|
| 255 |
+
else:
|
| 256 |
+
test_onnx(args.text, threshold=args.threshold)
|
| 257 |
+
else:
|
| 258 |
+
# Default to interactive mode if no arguments provided
|
| 259 |
+
print("No arguments provided. Starting interactive mode...")
|
| 260 |
+
print("(Use --help to see all options)\n")
|
| 261 |
+
interactive_mode(use_onnx=True, threshold=args.threshold)
|
| 262 |
+
|
| 263 |
+
if __name__ == "__main__":
|
| 264 |
+
main()
|
| 265 |
+
|
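One caveat worth noting about the ONNX branch above: it computes softmax directly as `np.exp(logits) / np.sum(np.exp(logits))`, which can overflow to `inf`/`nan` for large logits. A numerically stable variant subtracts the maximum logit first; this is a sketch, not part of the repository's script:

```python
import numpy as np

def stable_softmax(logits: np.ndarray) -> np.ndarray:
    """Softmax with the max logit subtracted first, so np.exp never overflows."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / np.sum(exp)

# Produces finite probabilities even where the naive formula overflows:
probs = stable_softmax(np.array([1000.0, 1001.0]))
```

Subtracting a constant from every logit leaves the result mathematically unchanged, so where both forms are finite they agree.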
metrics.yaml
ADDED
@@ -0,0 +1,23 @@
epoch: 8
external:
  turns2k:
    accuracy: 0.911
    f1: 0.9149952244508118
    precision: 0.9795501022494888
    recall: 0.8584229390681004
step: 60500
thresholds:
  turns2k: 0.86
thresholds_met:
  turns2k: true
validation:
  accuracy: 0.964266049994494
  en_accuracy: 0.9701070242342231
  en_samples: 16258
  es_accuracy: 0.9452467662941103
  es_samples: 7963
  f1: 0.9634921527816842
  hi_accuracy: 0.968933322316781
  hi_samples: 12103
  precision: 0.9491300011082788
  recall: 0.9782956362805575
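As a quick sanity check on these numbers (not part of the repo), the external `turns2k` F1 can be recomputed from the reported precision and recall, since F1 is their harmonic mean:

```python
# Reported turns2k metrics from metrics.yaml
precision = 0.9795501022494888
recall = 0.8584229390681004

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(f"{f1:.10f}")  # ≈ 0.9149952, consistent with the reported f1
```

The high precision / lower recall split at the 0.86 threshold means the model rarely fires early on an incomplete utterance, at the cost of occasionally missing a true end of utterance.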
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0d4b6dff583e55fa1ac04e5877b826d09a671fa9108d866e3cb30f1ba0b619c9
size 541317368
model_card.json
ADDED
@@ -0,0 +1,71 @@
{
  "model_name": "Turnlet BERT Multilingual EOU",
  "model_type": "DistilBERT",
  "task": "text-classification",
  "languages": ["en", "hi", "es"],
  "tags": [
    "end-of-utterance",
    "eou-detection",
    "multilingual",
    "distilbert",
    "onnx",
    "quantized",
    "conversational-ai",
    "dialogue",
    "turn-taking"
  ],
  "license": "apache-2.0",
  "datasets": ["turns-2k"],
  "metrics": {
    "validation": {
      "overall_accuracy": 0.9643,
      "en_accuracy": 0.9701,
      "hi_accuracy": 0.9689,
      "es_accuracy": 0.9452,
      "f1_score": 0.9635,
      "precision": 0.9491,
      "recall": 0.9783
    },
    "turns2k": {
      "accuracy": 0.9110,
      "f1_score": 0.9150,
      "precision": 0.9796,
      "recall": 0.8584,
      "threshold": 0.86
    }
  },
  "model_variants": {
    "pytorch": {
      "file": "model.safetensors",
      "size_mb": 517,
      "format": "safetensors"
    },
    "onnx_optimized": {
      "file": "bert_model_optimized.onnx",
      "size_mb": 517,
      "format": "onnx",
      "precision": "fp32"
    },
    "onnx_quantized": {
      "file": "bert_model_optimized_dynamic_int8.onnx",
      "size_mb": 132,
      "format": "onnx",
      "precision": "int8",
      "recommended": true
    }
  },
  "training": {
    "method": "knowledge_distillation",
    "teacher_model": "qwen-based",
    "student_model": "distilbert",
    "epochs": 8,
    "final_step": 60500,
    "max_length": 128
  },
  "inference": {
    "recommended_threshold": 0.86,
    "max_sequence_length": 128,
    "batch_size_support": true
  }
}
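A consumer of `model_card.json` can pick the recommended variant programmatically by looking for the `"recommended"` flag. A minimal sketch, with a trimmed copy of the `model_variants` section inlined (in practice this would come from `json.load(open("model_card.json"))`):

```python
import json

# Trimmed copy of the "model_variants" section of model_card.json
card = json.loads("""
{
  "model_variants": {
    "pytorch": {"file": "model.safetensors", "size_mb": 517, "format": "safetensors"},
    "onnx_optimized": {"file": "bert_model_optimized.onnx", "size_mb": 517, "format": "onnx"},
    "onnx_quantized": {"file": "bert_model_optimized_dynamic_int8.onnx", "size_mb": 132,
                       "format": "onnx", "recommended": true}
  }
}
""")

# Prefer the variant flagged "recommended", falling back to the first entry
variants = card["model_variants"]
name, info = next(
    ((n, v) for n, v in variants.items() if v.get("recommended")),
    next(iter(variants.items())),
)
print(name, info["file"])  # onnx_quantized bert_model_optimized_dynamic_int8.onnx
```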
requirements.txt
ADDED
@@ -0,0 +1,6 @@
transformers>=4.30.0
torch>=2.0.0
onnxruntime>=1.15.0
numpy>=1.24.0
safetensors>=0.3.0
special_tokens_map.json
ADDED
@@ -0,0 +1,7 @@
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json
ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json
ADDED
@@ -0,0 +1,56 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": false,
  "cls_token": "[CLS]",
  "do_lower_case": false,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "DistilBertTokenizer",
  "unk_token": "[UNK]"
}
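The `added_tokens_decoder` ids above follow the standard BERT WordPiece vocabulary layout ([PAD] at 0, [UNK]/[CLS]/[SEP]/[MASK] at 100-103). A small sketch, not part of the repo, inverting that map the way a tokenizer looks up ids by token:

```python
# Special-token id layout from tokenizer_config.json
added_tokens_decoder = {0: "[PAD]", 100: "[UNK]", 101: "[CLS]", 102: "[SEP]", 103: "[MASK]"}

# Invert to look up ids by token string
token_to_id = {tok: idx for idx, tok in added_tokens_decoder.items()}

# A max_length=128 encoding is [CLS] ... [SEP] followed by [PAD] (id 0) up to 128 ids
print(token_to_id["[CLS]"], token_to_id["[SEP]"], token_to_id["[PAD]"])  # 101 102 0
```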
vocab.txt
ADDED
The diff for this file is too large to render. See raw diff