# Turnlet BERT Multilingual - End-of-Utterance Detection

A lightweight, multilingual DistilBERT model fine-tuned for End-of-Utterance (EOU) detection in conversational AI systems. The model supports **English, Hindi, and Spanish**, reaching 96.4% overall validation accuracy with fast CPU inference.

## Model Description

- **Architecture**: DistilBERT (6 layers, 768 hidden dimensions)
- **Parameters**: ~134M (multilingual DistilBERT base; consistent with the 517 MB FP32 checkpoint)
- **Languages**: English, Hindi, Spanish
- **Task**: Binary sequence classification (EOU vs Non-EOU)
- **Training**: Knowledge distillation from a teacher model
- **Model Size**: 
  - PyTorch (safetensors): 517 MB
  - ONNX (optimized FP32): 517 MB
  - ONNX (quantized INT8): 132 MB (74% size reduction)

## Performance Metrics

### Validation Set Performance (Step 60500)

| Language | Accuracy | Samples |
|----------|----------|---------|
| **English** | 97.01% | 16,258 |
| **Hindi** | 96.89% | 12,103 |
| **Spanish** | 94.52% | 7,963 |
| **Overall** | 96.43% | 36,324 |

**Validation Metrics:**
- F1 Score: 0.9635
- Precision: 0.9491
- Recall: 0.9783

### TURNS-2K Benchmark

- **Accuracy**: 91.10%
- **F1 Score**: 0.9150
- **Precision**: 0.9796
- **Recall**: 0.8584
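
As a quick sanity check, both reported F1 scores are the harmonic mean of the listed precision and recall:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.9491, 0.9783), 4))  # 0.9635 (validation set)
print(round(f1(0.9796, 0.8584), 4))  # 0.915  (TURNS-2K)
```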

## Model Variants

This repository includes three model formats:

1. **PyTorch (safetensors)**: `model.safetensors` - Full precision PyTorch model
2. **ONNX Optimized (FP32)**: `bert_model_optimized.onnx` - Optimized for inference, full precision
3. **ONNX Quantized (INT8)**: `bert_model_optimized_dynamic_int8.onnx` - **Recommended** for production

### Why Use the Quantized INT8 Model?

- ✅ **74% smaller** (132 MB vs 517 MB)
- ✅ **Faster inference** on CPU
- ✅ **Minimal accuracy loss** (<0.5%)
- ✅ **Lower memory footprint**
- ✅ **Better for deployment**

## Quick Start

### Interactive Demo (Easiest Way)

```bash
# Clone the model repository
git clone https://huggingface.co/your-username/turnlet-bert-multilingual-eou
cd turnlet-bert-multilingual-eou

# Install dependencies
pip install -r requirements.txt

# Run interactive mode (default - uses fast ONNX INT8)
python inference_example.py

# Or explicitly use interactive mode
python inference_example.py --interactive

# Use PyTorch instead of ONNX
python inference_example.py --interactive --pytorch

# Adjust threshold
python inference_example.py --interactive --threshold 0.9
```

The interactive mode allows you to:
- 🎮 Type text and get instant EOU predictions
- 🌍 Test in English, Hindi, or Spanish
- 📊 See confidence scores and inference times
- 📈 View visual confidence bars
- 💡 Type 'examples' to see sample inputs
- 🚪 Type 'quit' or 'exit' to stop

### One-off Prediction

```bash
# Single prediction with ONNX (fast)
python inference_example.py --text "Thanks for your help!"

# Test suite with multiple examples
python inference_example.py --test-suite
```

### Using PyTorch (in Python)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("your-username/turnlet-bert-multilingual-eou")
tokenizer = AutoTokenizer.from_pretrained("your-username/turnlet-bert-multilingual-eou")

# Predict
text = "Thanks for your help!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
is_eou = probs[0][1] > 0.5  # default threshold; the tuned optimum for this model is 0.86

print(f"EOU Probability: {probs[0][1]:.3f}")
print(f"Is EOU: {is_eou}")
```

### Using ONNX (Quantized INT8) - Recommended for Production

```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/turnlet-bert-multilingual-eou")

# Create ONNX session
session = ort.InferenceSession("bert_model_optimized_dynamic_int8.onnx")

# Tokenize
text = "Thanks for your help!"
inputs = tokenizer(text, padding="max_length", max_length=128, truncation=True, return_tensors="np")

# Prepare ONNX inputs
ort_inputs = {
    'input_ids': inputs['input_ids'].astype(np.int64),
    'attention_mask': inputs['attention_mask'].astype(np.int64)
}

# Run inference
outputs = session.run(None, ort_inputs)
logits = outputs[0][0]

# Calculate probability (numerically stable softmax)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
is_eou = probs[1] > 0.5  # default threshold; the tuned optimum for this model is 0.86

print(f"EOU Probability: {probs[1]:.3f}")
print(f"Is EOU: {is_eou}")
```

## Use Cases

This model is designed for:

- 🗣️ **Voice Assistants**: Detect when the user has finished speaking
- 💬 **Chatbots**: Identify complete user intents
- 📞 **Call Centers**: Segment customer utterances in real time
- 🌍 **Multilingual Applications**: Support English, Hindi, and Spanish speakers
- ⚡ **Real-time Systems**: Fast inference with the quantized model
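
In a real-time voice pipeline, the EOU probability is typically combined with a silence timeout as a fallback. A minimal sketch (the `eou_probability` callback, threshold default, and timing values are illustrative, not part of this repository):

```python
def should_end_turn(transcript, silence_ms, eou_probability,
                    threshold=0.86, max_silence_ms=800.0):
    """End the turn if the model is confident, or fall back to a silence timeout.

    `eou_probability` is any callable mapping text -> probability, e.g. a
    wrapper around the ONNX session shown above.
    """
    if eou_probability(transcript) >= threshold:
        return True                        # model says the utterance is complete
    return silence_ms >= max_silence_ms    # safety net: a long pause ends the turn

# Illustration with a stubbed model:
def fake_model(text):
    return 0.95 if text.endswith(("!", "?", ".")) else 0.20

print(should_end_turn("Thanks for your help!", silence_ms=120, eou_probability=fake_model))  # True
print(should_end_turn("So what I wanted to", silence_ms=120, eou_probability=fake_model))    # False
```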

## Training Details

### Training Data

The model was trained using knowledge distillation on a multilingual dataset:

- **English**: 76,258 samples
- **Hindi**: 75,103 samples  
- **Spanish**: 75,963 samples
- **Total**: ~227K samples

### Training Configuration

- **Base Model**: DistilBERT multilingual
- **Method**: Knowledge distillation from Qwen-based teacher model
- **Epochs**: 8
- **Final Step**: 60,500
- **Optimization**: AdamW optimizer
- **Max Sequence Length**: 128 tokens

### Distillation Process

The model was created using sparse Mixture-of-Experts (MoE) based knowledge distillation:
1. Teacher model (Qwen-based) provides soft labels
2. Student model (DistilBERT) learns to mimic teacher predictions
3. Multi-stage training with progressive difficulty
4. Language-specific accuracy monitoring
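
The soft-label objective in step 2 is typically a temperature-scaled KL divergence mixed with the hard-label loss; a minimal NumPy sketch (the temperature and mixing weight below are illustrative, not the values used to train this model):

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_label,
                      T=2.0, alpha=0.5):
    """alpha * KL(teacher || student) at temperature T + (1 - alpha) * hard CE."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = np.sum(p_t * (np.log(p_t) - np.log(p_s))) * T * T  # KL, scaled by T^2
    hard = -np.log(softmax(student_logits)[hard_label])
    return alpha * soft + (1 - alpha) * hard

print(distillation_loss([0.3, 1.9], [0.1, 2.4], hard_label=1))
```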

## Evaluation

The model was evaluated on:

1. **Validation Set**: Balanced multilingual dataset
2. **TURNS-2K**: Standard benchmark for turn-taking detection
3. **Per-Language Metrics**: Individual language performance tracking

### Inference Speed

Approximate inference times (CPU, single sample):
- ONNX Optimized: ~70-120ms
- ONNX Quantized INT8: ~40-50ms

*Note: Actual speeds vary by hardware*
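
To reproduce numbers like these on your own hardware, a simple median-latency helper is enough (a sketch; pass it any predict callable, e.g. one wrapping the ONNX session above):

```python
import time

def median_latency_ms(predict, arg, warmup=5, repeats=50):
    """Median wall-clock latency of predict(arg) in milliseconds."""
    for _ in range(warmup):               # let caches settle before timing
        predict(arg)
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        predict(arg)
        times.append((time.perf_counter() - t0) * 1000.0)
    times.sort()
    return times[len(times) // 2]

# Example with a trivial stand-in for a model call:
print(f"{median_latency_ms(len, 'Thanks for your help!'):.3f} ms")
```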

## Limitations

- Model performance is slightly lower on Spanish compared to English and Hindi
- Optimal threshold (0.86) may need adjustment for specific use cases
- Maximum sequence length is 128 tokens (longer texts will be truncated)
- Best performance on conversational, task-oriented dialogue
- May require fine-tuning for domain-specific applications
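
If you re-tune the threshold for your domain, a simple grid sweep over held-out scores works (a sketch; the toy labels and scores below are illustrative only, real tuning should use your own validation set):

```python
def best_threshold(labels, scores, grid=None):
    """Pick the decision threshold that maximizes F1 on held-out data."""
    grid = grid or [i / 100 for i in range(1, 100)]
    def f1_at(t):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < t)
        return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return max(grid, key=f1_at)

labels = [1, 1, 1, 0, 0, 0]                      # toy EOU ground truth
scores = [0.97, 0.91, 0.88, 0.84, 0.40, 0.10]    # toy model probabilities
print(best_threshold(labels, scores))
```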

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{turnlet-bert-multilingual-eou,
  title={Turnlet BERT Multilingual: End-of-Utterance Detection},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  note={Knowledge-distilled DistilBERT for multilingual EOU detection}
}
```

## License

Please specify your license here (e.g., Apache 2.0, MIT, etc.)

## Model Card Contact

For questions or feedback, please open an issue in the repository.

---

**Model Version**: Step 60500  
**Last Updated**: November 2024  
**Framework**: PyTorch, ONNX Runtime  
**Languages**: English (en), Hindi (hi), Spanish (es)