Estonel committed on
Commit f70597d · verified · 1 Parent(s): 664d3d2

Initial commit: Turnlet BERT Multilingual EOU model with ONNX variants
.gitattributes CHANGED
@@ -1,35 +1,6 @@
- *.7z filter=lfs diff=lfs merge=lfs -text
- *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
  *.bin filter=lfs diff=lfs merge=lfs -text
- *.bz2 filter=lfs diff=lfs merge=lfs -text
- *.ckpt filter=lfs diff=lfs merge=lfs -text
- *.ftz filter=lfs diff=lfs merge=lfs -text
- *.gz filter=lfs diff=lfs merge=lfs -text
  *.h5 filter=lfs diff=lfs merge=lfs -text
- *.joblib filter=lfs diff=lfs merge=lfs -text
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
- *.model filter=lfs diff=lfs merge=lfs -text
  *.msgpack filter=lfs diff=lfs merge=lfs -text
- *.npy filter=lfs diff=lfs merge=lfs -text
- *.npz filter=lfs diff=lfs merge=lfs -text
- *.onnx filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
- *.parquet filter=lfs diff=lfs merge=lfs -text
- *.pb filter=lfs diff=lfs merge=lfs -text
- *.pickle filter=lfs diff=lfs merge=lfs -text
- *.pkl filter=lfs diff=lfs merge=lfs -text
- *.pt filter=lfs diff=lfs merge=lfs -text
- *.pth filter=lfs diff=lfs merge=lfs -text
- *.rar filter=lfs diff=lfs merge=lfs -text
- *.safetensors filter=lfs diff=lfs merge=lfs -text
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
- *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
- *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tgz filter=lfs diff=lfs merge=lfs -text
- *.wasm filter=lfs diff=lfs merge=lfs -text
- *.xz filter=lfs diff=lfs merge=lfs -text
- *.zip filter=lfs diff=lfs merge=lfs -text
- *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,247 @@
+ # Turnlet BERT Multilingual - End-of-Utterance Detection
+
+ A lightweight, multilingual DistilBERT model fine-tuned for End-of-Utterance (EOU) detection in conversational AI systems. This model supports **English, Hindi, and Spanish** with high accuracy and fast inference.
+
+ ## Model Description
+
+ - **Architecture**: DistilBERT (6 layers, 768 hidden dimensions)
+ - **Parameters**: ~135M (multilingual DistilBERT base; the 119,547-token vocabulary accounts for most of them)
+ - **Languages**: English, Hindi, Spanish
+ - **Task**: Binary sequence classification (EOU vs. non-EOU)
+ - **Training**: Knowledge distillation from a teacher model
+ - **Model Size**:
+   - PyTorch (safetensors): 517 MB
+   - ONNX (optimized FP32): 517 MB
+   - ONNX (quantized INT8): 132 MB (74% size reduction)
+
+ ## Performance Metrics
+
+ ### Validation Set Performance (Step 60,500)
+
+ | Language | Accuracy | Samples |
+ |----------|----------|---------|
+ | **English** | 97.01% | 16,258 |
+ | **Hindi** | 96.89% | 12,103 |
+ | **Spanish** | 94.52% | 7,963 |
+ | **Overall** | 96.43% | 36,324 |
+
+ **Validation Metrics:**
+ - F1 Score: 0.9635
+ - Precision: 0.9491
+ - Recall: 0.9783
+
+ ### TURNS-2K Benchmark
+
+ - **Accuracy**: 91.10%
+ - **F1 Score**: 0.9150
+ - **Precision**: 0.9796
+ - **Recall**: 0.8584
+ - **Optimal Threshold**: 0.86
+
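As a quick sanity check on the benchmark numbers (pure arithmetic, independent of the model): F1 is the harmonic mean of precision and recall, so the three reported values must agree with each other.

```python
# Verify that the reported TURNS-2K F1 is consistent with the
# reported precision and recall (F1 = harmonic mean of the two).
precision = 0.9796
recall = 0.8584

f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.4f}")  # rounds to the reported 0.9150
```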
+ ## Model Variants
+
+ This repository includes three model formats:
+
+ 1. **PyTorch (safetensors)**: `model.safetensors` - full-precision PyTorch weights
+ 2. **ONNX Optimized (FP32)**: `bert_model_optimized.onnx` - graph-optimized for inference, full precision
+ 3. **ONNX Quantized (INT8)**: `bert_model_optimized_dynamic_int8.onnx` - **recommended** for production
+
+ ### Why Use the Quantized INT8 Model?
+
+ - ✅ **74% smaller** (132 MB vs. 517 MB)
+ - ✅ **Faster inference** on CPU
+ - ✅ **Minimal accuracy loss** (<0.5%)
+ - ✅ **Lower memory footprint**
+ - ✅ **Better for deployment**
+
+ ## Quick Start
+
+ ### Interactive Demo (Easiest Way)
+
+ ```bash
+ # Clone the model repository
+ git clone https://huggingface.co/your-username/turnlet-bert-multilingual-eou
+ cd turnlet-bert-multilingual-eou
+
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Run interactive mode (the default; uses the fast ONNX INT8 model)
+ python inference_example.py
+
+ # Or explicitly use interactive mode
+ python inference_example.py --interactive
+
+ # Use PyTorch instead of ONNX
+ python inference_example.py --interactive --pytorch
+
+ # Adjust the decision threshold
+ python inference_example.py --interactive --threshold 0.9
+ ```
+
+ The interactive mode allows you to:
+ - 🎮 Type text and get instant EOU predictions
+ - 🌐 Test in English, Hindi, or Spanish
+ - 📊 See confidence scores and inference times
+ - 📈 View visual confidence bars
+ - 💡 Type 'examples' to see sample inputs
+ - 🚪 Type 'quit' or 'exit' to stop
+
+ ### One-off Prediction
+
+ ```bash
+ # Single prediction with ONNX (fast)
+ python inference_example.py --text "Thanks for your help!"
+
+ # Test suite with multiple examples
+ python inference_example.py --test-suite
+ ```
+
+ ### Using PyTorch (in Python)
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ # Load model and tokenizer
+ model = AutoModelForSequenceClassification.from_pretrained("your-username/turnlet-bert-multilingual-eou")
+ tokenizer = AutoTokenizer.from_pretrained("your-username/turnlet-bert-multilingual-eou")
+ model.eval()
+
+ # Predict
+ text = "Thanks for your help!"
+ inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
+ with torch.no_grad():
+     outputs = model(**inputs)
+ probs = torch.softmax(outputs.logits, dim=-1)
+ is_eou = probs[0][1] > 0.86  # using the optimal threshold
+
+ print(f"EOU Probability: {probs[0][1]:.3f}")
+ print(f"Is EOU: {is_eou}")
+ ```
+
+ ### Using ONNX (Quantized INT8) - Recommended for Production
+
+ ```python
+ import onnxruntime as ort
+ import numpy as np
+ from transformers import AutoTokenizer
+
+ # Load tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("your-username/turnlet-bert-multilingual-eou")
+
+ # Create ONNX session
+ session = ort.InferenceSession("bert_model_optimized_dynamic_int8.onnx")
+
+ # Tokenize
+ text = "Thanks for your help!"
+ inputs = tokenizer(text, padding="max_length", max_length=128, truncation=True, return_tensors="np")
+
+ # Prepare ONNX inputs
+ ort_inputs = {
+     'input_ids': inputs['input_ids'].astype(np.int64),
+     'attention_mask': inputs['attention_mask'].astype(np.int64)
+ }
+
+ # Run inference
+ outputs = session.run(None, ort_inputs)
+ logits = outputs[0][0]
+
+ # Calculate probabilities (numerically stable softmax)
+ exp = np.exp(logits - np.max(logits))
+ probs = exp / np.sum(exp)
+ is_eou = probs[1] > 0.86  # using the optimal threshold
+
+ print(f"EOU Probability: {probs[1]:.3f}")
+ print(f"Is EOU: {is_eou}")
+ ```
+
+ ## Use Cases
+
+ This model is designed for:
+
+ - 🗣️ **Voice Assistants**: Detect when the user has finished speaking
+ - 💬 **Chatbots**: Identify complete user intents
+ - 📞 **Call Centers**: Segment customer utterances in real time
+ - 🌐 **Multilingual Applications**: Support English, Hindi, and Spanish speakers
+ - ⚡ **Real-time Systems**: Fast inference with the quantized model
+
+ ## Training Details
+
+ ### Training Data
+
+ The model was trained via knowledge distillation on a multilingual dataset:
+
+ - **English**: 16,258 samples
+ - **Hindi**: 12,103 samples
+ - **Spanish**: 7,963 samples
+ - **Total**: ~36K samples
+
+ ### Training Configuration
+
+ - **Base Model**: DistilBERT multilingual
+ - **Method**: Knowledge distillation from a Qwen-based teacher model
+ - **Epochs**: 8
+ - **Final Step**: 60,500
+ - **Optimization**: AdamW optimizer
+ - **Max Sequence Length**: 128 tokens
+
+ ### Distillation Process
+
+ The model was created using sparse Mixture-of-Experts (MoE) based knowledge distillation:
+ 1. The teacher model (Qwen-based) provides soft labels
+ 2. The student model (DistilBERT) learns to mimic the teacher's predictions
+ 3. Multi-stage training with progressive difficulty
+ 4. Language-specific accuracy monitoring
+
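The training code itself is not part of this repository, but the soft-label step above can be sketched with a standard distillation objective: a temperature-scaled KL term against the teacher's distribution blended with the usual hard-label cross-entropy. This is an illustrative sketch only; the temperature `T` and mixing weight `alpha` are assumed hyperparameters, not values from the actual training run.

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable temperature-scaled softmax."""
    z = np.asarray(z, dtype=np.float64) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    """Blend of soft-label KL (vs. the teacher) and hard-label cross-entropy.

    The KL term is scaled by T**2 so its gradient magnitude stays
    comparable to the cross-entropy term (the usual convention).
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student))) * T * T
    ce = -np.log(softmax(student_logits)[hard_label])
    return alpha * kl + (1 - alpha) * ce

# A student that matches both the teacher and the label incurs a low loss.
print(distillation_loss([0.2, 2.0], [0.1, 2.2], hard_label=1))
```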
+ ## Evaluation
+
+ The model was evaluated on:
+
+ 1. **Validation Set**: Balanced multilingual dataset
+ 2. **TURNS-2K**: Standard benchmark for turn-taking detection
+ 3. **Per-Language Metrics**: Individual language performance tracking
+
+ ### Inference Speed
+
+ Approximate inference times (CPU, single sample):
+ - PyTorch: ~15-20 ms
+ - ONNX Optimized: ~8-12 ms
+ - ONNX Quantized INT8: ~5-8 ms
+
+ *Note: Actual speeds vary by hardware.*
+
+ ## Limitations
+
+ - Performance is slightly lower on Spanish than on English and Hindi
+ - The optimal threshold (0.86) may need adjustment for specific use cases
+ - Maximum sequence length is 128 tokens (longer texts are truncated)
+ - Best performance on conversational, task-oriented dialogue
+ - May require fine-tuning for domain-specific applications
+
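Since the 0.86 operating point was tuned on TURNS-2K, it is worth re-tuning it on a held-out sample of your own domain. A minimal, hypothetical sketch of such a sweep (the probabilities and labels below are made-up illustration data, not model outputs):

```python
# Hypothetical helper: pick an operating threshold by sweeping over
# held-out (EOU probability, label) pairs and maximising F1.
def best_threshold(probs, labels, candidates=None):
    candidates = candidates or [i / 100 for i in range(5, 100, 5)]

    def f1_at(t):
        tp = sum(p > t and y for p, y in zip(probs, labels))
        fp = sum(p > t and not y for p, y in zip(probs, labels))
        fn = sum(p <= t and y for p, y in zip(probs, labels))
        return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

    return max(candidates, key=f1_at)

# Made-up illustration data: three EOU and three non-EOU utterances.
probs = [0.95, 0.91, 0.40, 0.88, 0.12, 0.70]
labels = [1, 1, 0, 1, 0, 0]
print(best_threshold(probs, labels))
```

On real data you would expect the sweep to land near, but not necessarily at, the shipped 0.86.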
+ ## Citation
+
+ If you use this model in your research or applications, please cite:
+
+ ```bibtex
+ @misc{turnlet-bert-multilingual-eou,
+   title={Turnlet BERT Multilingual: End-of-Utterance Detection},
+   author={Your Name},
+   year={2024},
+   publisher={Hugging Face},
+   note={Knowledge-distilled DistilBERT for multilingual EOU detection}
+ }
+ ```
+
+ ## License
+
+ Please specify your license here (e.g., Apache 2.0, MIT, etc.)
+
+ ## Model Card Contact
+
+ For questions or feedback, please open an issue in the repository.
+
+ ---
+
+ **Model Version**: Step 60500
+ **Last Updated**: November 2024
+ **Framework**: PyTorch, ONNX Runtime
+ **Languages**: English (en), Hindi (hi), Spanish (es)
+
UPLOAD_GUIDE.md ADDED
@@ -0,0 +1,211 @@
+ # Hugging Face Upload Guide
+
+ This guide will help you upload the Turnlet BERT Multilingual EOU model to Hugging Face.
+
+ ## 📦 Package Contents
+
+ This folder contains everything needed for a complete Hugging Face model repository:
+
+ ### Model Files
+ - **`model.safetensors`** (517 MB) - PyTorch model weights in safetensors format
+ - **`bert_model_optimized.onnx`** (517 MB) - Optimized ONNX model (FP32)
+ - **`bert_model_optimized_dynamic_int8.onnx`** (132 MB) - ⭐ Quantized ONNX model (INT8, recommended)
+
+ ### Tokenizer Files
+ - **`tokenizer.json`** - Fast tokenizer
+ - **`tokenizer_config.json`** - Tokenizer configuration
+ - **`vocab.txt`** - Vocabulary file
+ - **`special_tokens_map.json`** - Special tokens mapping
+
+ ### Configuration Files
+ - **`config.json`** - Model architecture configuration
+ - **`metrics.yaml`** - Training and validation metrics
+
+ ### Documentation
+ - **`README.md`** - Comprehensive model card and documentation
+ - **`model_card.json`** - Machine-readable model metadata
+ - **`requirements.txt`** - Python dependencies
+ - **`.gitattributes`** - Git LFS configuration for large files
+
+ ### Code Examples
+ - **`inference_example.py`** - Interactive demo and usage examples
+ - **`UPLOAD_GUIDE.md`** - This file
+
+ ## 🚀 Upload Steps
+
+ ### Option 1: Using the Hugging Face CLI (Recommended)
+
+ ```bash
+ # Install the Hugging Face Hub client
+ pip install huggingface-hub
+
+ # Login to Hugging Face
+ huggingface-cli login
+
+ # Navigate to the model folder
+ cd /home/ubuntu/hf_upload/turnlet-bert-multilingual-eou
+
+ # Create the repository under your account
+ huggingface-cli repo create turnlet-bert-multilingual-eou --type model
+
+ # Initialize git and git-lfs
+ git init
+ git lfs install
+ git lfs track "*.onnx"
+ git lfs track "*.safetensors"
+
+ # Add all files
+ git add .
+
+ # Commit
+ git commit -m "Initial commit: Turnlet BERT Multilingual EOU model with ONNX variants"
+
+ # Add the remote (replace YOUR_USERNAME)
+ git remote add origin https://huggingface.co/YOUR_USERNAME/turnlet-bert-multilingual-eou
+
+ # Push to Hugging Face
+ git push -u origin main
+ ```
+
+ ### Option 2: Using the Python API
+
+ ```python
+ from huggingface_hub import HfApi, create_repo, login
+
+ # Login (you'll be prompted for a token)
+ login()
+
+ # Initialize the API client
+ api = HfApi()
+
+ # Create the repository
+ repo_id = "YOUR_USERNAME/turnlet-bert-multilingual-eou"
+ create_repo(repo_id, repo_type="model", exist_ok=True)
+
+ # Upload the folder
+ api.upload_folder(
+     folder_path="/home/ubuntu/hf_upload/turnlet-bert-multilingual-eou",
+     repo_id=repo_id,
+     repo_type="model",
+ )
+
+ print(f"✅ Model uploaded to: https://huggingface.co/{repo_id}")
+ ```
+
+ ### Option 3: Manual Upload via the Web Interface
+
+ 1. Go to https://huggingface.co/new
+ 2. Create a new model repository: `turnlet-bert-multilingual-eou`
+ 3. Use the web interface to upload files:
+    - Upload large files (`.onnx`, `.safetensors`) via Git LFS
+    - Upload smaller files directly via the web interface
+ 4. Copy the README.md content to the model card
+
+ ## ⚠️ Important Notes
+
+ ### Git LFS Required
+ The model files are large and require Git LFS (Large File Storage):
+ - Make sure Git LFS is installed: `git lfs install`
+ - The `.gitattributes` file is already configured
+ - Files tracked: `*.onnx`, `*.safetensors`
+
+ ### File Sizes
+ - Total repository size: ~1.2 GB
+ - Largest files: ONNX FP32 (517 MB) and PyTorch (517 MB)
+ - Recommended for deployment: INT8 ONNX (132 MB)
+
+ ### Model Naming
+ Consider these naming conventions:
+ - `YOUR_USERNAME/turnlet-bert-multilingual-eou`
+ - `YOUR_ORG/turnlet-eou-detection-multilingual`
+ - `YOUR_USERNAME/distilbert-eou-en-hi-es`
+
+ ### Tags to Add
+ When creating the repository, add these tags:
+ - `end-of-utterance`
+ - `eou-detection`
+ - `multilingual`
+ - `distilbert`
+ - `onnx`
+ - `quantized`
+ - `conversational-ai`
+ - `dialogue`
+ - `turn-taking`
+ - `text-classification`
+
+ ## 🧪 Testing After Upload
+
+ After uploading, test the model:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ # Test loading
+ model = AutoModelForSequenceClassification.from_pretrained("YOUR_USERNAME/turnlet-bert-multilingual-eou")
+ tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/turnlet-bert-multilingual-eou")
+
+ # Quick test
+ text = "Thanks for your help!"
+ inputs = tokenizer(text, return_tensors="pt")
+ outputs = model(**inputs)
+ print(f"✅ Model loaded and working! Logits: {outputs.logits}")
+ ```
+
+ ## 📝 Post-Upload Checklist
+
+ After a successful upload:
+
+ - [ ] Verify all files are uploaded
+ - [ ] Test model loading via transformers
+ - [ ] Test ONNX model download
+ - [ ] Update README with the correct username/repo paths
+ - [ ] Add license information
+ - [ ] Add model tags and metadata
+ - [ ] Test the interactive script
+ - [ ] Share on social media/communities
+
+ ## 🔗 Useful Links
+
+ - Hugging Face Hub documentation: https://huggingface.co/docs/hub
+ - Git LFS: https://git-lfs.github.com/
+ - Model Cards guide: https://huggingface.co/docs/hub/model-cards
+ - ONNX models: https://huggingface.co/docs/hub/onnx
+
+ ## 💡 Tips
+
+ 1. **Use descriptive commit messages** when updating the model
+ 2. **Version your models** by creating tags (v1.0, v2.0, etc.)
+ 3. **Monitor downloads** via your Hugging Face dashboard
+ 4. **Respond to community questions** in the Community tab
+ 5. **Update metrics** as you improve the model
+
+ ## 🆘 Troubleshooting
+
+ ### Git LFS Bandwidth Issues
+ If you hit LFS bandwidth limits:
+ - Upload the smaller model variant first
+ - Upload during off-peak hours
+ - Consider Hugging Face Pro for more bandwidth
+
+ ### Authentication Issues
+ ```bash
+ # Re-login
+ huggingface-cli login --token YOUR_TOKEN
+
+ # Or set the token as an environment variable
+ export HUGGING_FACE_HUB_TOKEN=YOUR_TOKEN
+ ```
+
+ ### Large File Upload Timeout
+ ```bash
+ # Increase git's HTTP buffer and disable low-speed timeouts
+ git config http.postBuffer 524288000
+ git config http.lowSpeedLimit 0
+ git config http.lowSpeedTime 999999
+ ```
+
+ ## ✅ Ready to Upload!
+
+ Your model is fully prepared and ready for upload to Hugging Face! 🎉
+
bert_model_optimized.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2f1972ac9ff31da8fcf9d5e4e053caa0a6218c5ae1899cbec14e5da6ab043dc6
+ size 541380730
bert_model_optimized_dynamic_int8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8d5084af77f9164892dc3402d5419c7dbc1dfb559333f7ec141248a5f49e1591
+ size 137635060
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "activation": "gelu",
+   "architectures": [
+     "DistilBertForSequenceClassification"
+   ],
+   "attention_dropout": 0.1,
+   "dim": 768,
+   "dropout": 0.1,
+   "dtype": "float32",
+   "hidden_dim": 3072,
+   "initializer_range": 0.02,
+   "max_position_embeddings": 512,
+   "model_type": "distilbert",
+   "n_heads": 12,
+   "n_layers": 6,
+   "output_past": true,
+   "pad_token_id": 0,
+   "qa_dropout": 0.1,
+   "seq_classif_dropout": 0.2,
+   "sinusoidal_pos_embds": false,
+   "tie_weights_": true,
+   "transformers_version": "4.57.1",
+   "vocab_size": 119547
+ }
inference_example.py ADDED
@@ -0,0 +1,265 @@
+ #!/usr/bin/env python3
+ """
+ Simple inference example for the Turnlet BERT Multilingual EOU model.
+ Demonstrates both PyTorch and ONNX usage.
+ """
+
+ import argparse
+ import time
+
+ import numpy as np
+
+
+ def softmax(logits):
+     """Numerically stable softmax over a 1-D array of logits."""
+     exp = np.exp(logits - np.max(logits))
+     return exp / np.sum(exp)
+
+
+ def test_pytorch(text, threshold=0.86):
+     """Test using the PyTorch model."""
+     from transformers import AutoTokenizer, AutoModelForSequenceClassification
+     import torch
+
+     print("🔥 Loading PyTorch model...")
+     model = AutoModelForSequenceClassification.from_pretrained(".")
+     tokenizer = AutoTokenizer.from_pretrained(".")
+     model.eval()
+
+     print(f"\n📝 Input: {text}")
+
+     # Tokenize and predict
+     inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
+
+     with torch.no_grad():
+         outputs = model(**inputs)
+         probs = torch.softmax(outputs.logits, dim=-1)
+
+     prob_eou = probs[0][1].item()
+     is_eou = prob_eou > threshold
+
+     print(f"✅ EOU Probability: {prob_eou:.4f}")
+     print(f"🎯 Prediction: {'EOU (End of Utterance)' if is_eou else 'Non-EOU (Incomplete)'}")
+     print(f"📊 Threshold: {threshold}")
+
+     return is_eou, prob_eou
+
+
+ def test_onnx(text, model_path="bert_model_optimized_dynamic_int8.onnx", threshold=0.86):
+     """Test using the quantized ONNX model (faster)."""
+     import onnxruntime as ort
+     from transformers import AutoTokenizer
+
+     print("⚡ Loading ONNX Quantized INT8 model...")
+
+     # Load tokenizer and model
+     tokenizer = AutoTokenizer.from_pretrained(".")
+     session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
+
+     print(f"\n📝 Input: {text}")
+
+     # Tokenize
+     inputs = tokenizer(text, padding="max_length", max_length=128, truncation=True, return_tensors="np")
+
+     # Prepare ONNX inputs
+     ort_inputs = {
+         'input_ids': inputs['input_ids'].astype(np.int64),
+         'attention_mask': inputs['attention_mask'].astype(np.int64)
+     }
+
+     # Run inference
+     start = time.time()
+     outputs = session.run(None, ort_inputs)
+     inference_time = (time.time() - start) * 1000
+
+     logits = outputs[0][0]
+     probs = softmax(logits)
+     prob_eou = probs[1]
+     is_eou = prob_eou > threshold
+
+     print(f"✅ EOU Probability: {prob_eou:.4f}")
+     print(f"🎯 Prediction: {'EOU (End of Utterance)' if is_eou else 'Non-EOU (Incomplete)'}")
+     print(f"📊 Threshold: {threshold}")
+     print(f"⚡ Inference Time: {inference_time:.2f}ms")
+
+     return is_eou, prob_eou
+
+
+ def test_multiple_examples(use_onnx=True):
+     """Test multiple examples in different languages."""
+     examples = [
+         ("Thanks for your help!", "en", True),
+         ("I need a train to Cambridge.", "en", True),
+         ("What time does the", "en", False),
+         ("धन्यवाद!", "hi", True),                # Hindi: "Thank you!"
+         ("मुझे मदद चाहिए", "hi", False),          # Hindi: "I need help" (incomplete)
+         ("¡Gracias por tu ayuda!", "es", True),  # Spanish: "Thanks for your help!"
+         ("Necesito un tren a", "es", False),     # Spanish: "I need a train to" (incomplete)
+     ]
+
+     print("\n" + "=" * 70)
+     print("🌐 MULTILINGUAL EOU DETECTION TEST")
+     print("=" * 70)
+
+     correct = 0
+     total = len(examples)
+
+     for text, lang, expected_eou in examples:
+         print(f"\n{'─' * 70}")
+         print(f"🌍 Language: {lang.upper()}")
+
+         if use_onnx:
+             is_eou, prob = test_onnx(text, threshold=0.86)
+         else:
+             is_eou, prob = test_pytorch(text, threshold=0.86)
+
+         expected_str = "EOU" if expected_eou else "Non-EOU"
+         predicted_str = "EOU" if is_eou else "Non-EOU"
+
+         is_correct = is_eou == expected_eou
+         correct += is_correct
+
+         status = "✅ CORRECT" if is_correct else "❌ INCORRECT"
+         print(f"💡 Expected: {expected_str} | Got: {predicted_str} | {status}")
+
+     print(f"\n{'=' * 70}")
+     print(f"📊 ACCURACY: {correct}/{total} ({correct / total * 100:.1f}%)")
+     print(f"{'=' * 70}\n")
+
+
+ def interactive_mode(use_onnx=True, threshold=0.86):
+     """Interactive mode - continuously read input and predict."""
+     from transformers import AutoTokenizer
+
+     print("\n" + "=" * 70)
+     print("🎮 INTERACTIVE MODE - Multilingual EOU Detection")
+     print("=" * 70)
+     print("🌐 Supported languages: English, Hindi, Spanish")
+     print(f"📊 Threshold: {threshold:.2f}")
+
+     if use_onnx:
+         import onnxruntime as ort
+         print("⚡ Using: ONNX Quantized INT8 model (fast)")
+         tokenizer = AutoTokenizer.from_pretrained(".")
+         session = ort.InferenceSession("bert_model_optimized_dynamic_int8.onnx",
+                                        providers=['CPUExecutionProvider'])
+     else:
+         print("🔥 Using: PyTorch model")
+         from transformers import AutoModelForSequenceClassification
+         import torch
+         tokenizer = AutoTokenizer.from_pretrained(".")
+         model = AutoModelForSequenceClassification.from_pretrained(".")
+         model.eval()
+
+     print("\n💡 Type your text and press Enter to get an EOU prediction")
+     print("💡 Type 'quit' or 'exit' to stop")
+     print("💡 Type 'examples' to see sample inputs")
+     print("=" * 70 + "\n")
+
+     sample_count = 0
+
+     while True:
+         try:
+             # Get user input
+             user_input = input("📝 Enter text: ").strip()
+
+             if not user_input:
+                 continue
+
+             # Check for exit commands
+             if user_input.lower() in ['quit', 'exit', 'q']:
+                 print(f"\n👋 Goodbye! Tested {sample_count} samples.")
+                 break
+
+             # Show examples
+             if user_input.lower() == 'examples':
+                 print("\n📚 Example inputs to try:")
+                 print("  English:")
+                 print("    - 'Thanks for your help!' (EOU)")
+                 print("    - 'I need to book a' (Non-EOU)")
+                 print("  Hindi:")
+                 print("    - 'धन्यवाद!' (Thank you! - EOU)")
+                 print("    - 'मुझे मदद चाहिए' (I need help - could be EOU)")
+                 print("  Spanish:")
+                 print("    - '¡Muchas gracias!' (Thank you! - EOU)")
+                 print("    - 'Necesito un tren a' (I need a train to - Non-EOU)")
+                 print()
+                 continue
+
+             sample_count += 1
+             print()
+
+             # Tokenize
+             inputs = tokenizer(user_input, padding="max_length", max_length=128,
+                                truncation=True, return_tensors="np" if use_onnx else "pt")
+
+             # Predict
+             start = time.time()
+
+             if use_onnx:
+                 # ONNX inference
+                 ort_inputs = {
+                     'input_ids': inputs['input_ids'].astype(np.int64),
+                     'attention_mask': inputs['attention_mask'].astype(np.int64)
+                 }
+                 outputs = session.run(None, ort_inputs)
+                 logits = outputs[0][0]
+                 prob_eou = softmax(logits)[1]
+             else:
+                 # PyTorch inference
+                 import torch
+                 with torch.no_grad():
+                     outputs = model(**inputs)
+                     probs = torch.softmax(outputs.logits, dim=-1)
+                 prob_eou = probs[0][1].item()
+
+             inference_time = (time.time() - start) * 1000
+
+             # Determine the prediction
+             is_eou = prob_eou > threshold
+
+             # Display the results
+             print("─" * 70)
+             if is_eou:
+                 print("✅ Prediction: EOU (End of Utterance)")
+                 print("   └─ The user has likely finished their thought")
+             else:
+                 print("⏳ Prediction: Non-EOU (Incomplete)")
+                 print("   └─ The user may still be speaking")
+
+             print(f"📊 Confidence: {prob_eou:.4f} (threshold: {threshold})")
+             print(f"⚡ Inference time: {inference_time:.2f}ms")
+
+             # Confidence bar
+             bar_length = 40
+             filled = int(bar_length * prob_eou)
+             bar = "█" * filled + "░" * (bar_length - filled)
+             print(f"📈 [{bar}] {prob_eou * 100:.1f}%")
+             print("─" * 70 + "\n")
+
+         except KeyboardInterrupt:
+             print(f"\n\n👋 Interrupted! Tested {sample_count} samples. Goodbye!")
+             break
+         except Exception as e:
+             print(f"❌ Error: {e}\n")
+             continue
+
+
+ def main():
+     parser = argparse.ArgumentParser(description="Test the Turnlet BERT Multilingual EOU model")
+     parser.add_argument("--text", type=str, help="Text to classify")
+     parser.add_argument("--threshold", type=float, default=0.86, help="EOU threshold (default: 0.86)")
+     parser.add_argument("--pytorch", action="store_true", help="Use PyTorch instead of ONNX")
+     parser.add_argument("--test-suite", action="store_true", help="Run the full test suite")
+     parser.add_argument("--interactive", "-i", action="store_true", help="Run in interactive mode")
+
+     args = parser.parse_args()
+
+     if args.interactive:
+         interactive_mode(use_onnx=not args.pytorch, threshold=args.threshold)
+     elif args.test_suite:
+         test_multiple_examples(use_onnx=not args.pytorch)
+     elif args.text:
+         if args.pytorch:
+             test_pytorch(args.text, args.threshold)
+         else:
+             test_onnx(args.text, threshold=args.threshold)
+     else:
+         # Default to interactive mode if no arguments are provided
+         print("No arguments provided. Starting interactive mode...")
+         print("(Use --help to see all options)\n")
+         interactive_mode(use_onnx=True, threshold=args.threshold)
+
+
+ if __name__ == "__main__":
+     main()
metrics.yaml ADDED
@@ -0,0 +1,23 @@
+ epoch: 8
+ external:
+   turns2k:
+     accuracy: 0.911
+     f1: 0.9149952244508118
+     precision: 0.9795501022494888
+     recall: 0.8584229390681004
+ step: 60500
+ thresholds:
+   turns2k: 0.86
+ thresholds_met:
+   turns2k: true
+ validation:
+   accuracy: 0.964266049994494
+   en_accuracy: 0.9701070242342231
+   en_samples: 16258
+   es_accuracy: 0.9452467662941103
+   es_samples: 7963
+   f1: 0.9634921527816842
+   hi_accuracy: 0.968933322316781
+   hi_samples: 12103
+   precision: 0.9491300011082788
+   recall: 0.9782956362805575
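A small consistency check on the metrics above: the overall validation accuracy should equal the sample-weighted mean of the three per-language accuracies (values copied from `metrics.yaml`):

```python
# Recompute the overall validation accuracy as the sample-weighted
# mean of the per-language accuracies reported in metrics.yaml.
langs = {
    "en": (0.9701070242342231, 16258),
    "hi": (0.968933322316781, 12103),
    "es": (0.9452467662941103, 7963),
}

total = sum(n for _, n in langs.values())
overall = sum(acc * n for acc, n in langs.values()) / total
print(f"{overall:.6f}")  # ≈ 0.964266, matching the reported value
```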
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0d4b6dff583e55fa1ac04e5877b826d09a671fa9108d866e3cb30f1ba0b619c9
+ size 541317368
model_card.json ADDED
@@ -0,0 +1,71 @@
+ {
+   "model_name": "Turnlet BERT Multilingual EOU",
+   "model_type": "DistilBERT",
+   "task": "text-classification",
+   "languages": ["en", "hi", "es"],
+   "tags": [
+     "end-of-utterance",
+     "eou-detection",
+     "multilingual",
+     "distilbert",
+     "onnx",
+     "quantized",
+     "conversational-ai",
+     "dialogue",
+     "turn-taking"
+   ],
+   "license": "apache-2.0",
+   "datasets": ["turns-2k"],
+   "metrics": {
+     "validation": {
+       "overall_accuracy": 0.9643,
+       "en_accuracy": 0.9701,
+       "hi_accuracy": 0.9689,
+       "es_accuracy": 0.9452,
+       "f1_score": 0.9635,
+       "precision": 0.9491,
+       "recall": 0.9783
+     },
+     "turns2k": {
+       "accuracy": 0.9110,
+       "f1_score": 0.9150,
+       "precision": 0.9796,
+       "recall": 0.8584,
+       "threshold": 0.86
+     }
+   },
+   "model_variants": {
+     "pytorch": {
+       "file": "model.safetensors",
+       "size_mb": 517,
+       "format": "safetensors"
+     },
+     "onnx_optimized": {
+       "file": "bert_model_optimized.onnx",
+       "size_mb": 517,
+       "format": "onnx",
+       "precision": "fp32"
+     },
+     "onnx_quantized": {
+       "file": "bert_model_optimized_dynamic_int8.onnx",
+       "size_mb": 132,
+       "format": "onnx",
+       "precision": "int8",
+       "recommended": true
+     }
+   },
+   "training": {
+     "method": "knowledge_distillation",
+     "teacher_model": "qwen-based",
+     "student_model": "distilbert",
+     "epochs": 8,
+     "final_step": 60500,
+     "max_length": 128
+   },
+   "inference": {
+     "recommended_threshold": 0.86,
+     "max_sequence_length": 128,
+     "batch_size_support": true
+   }
+ }
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ transformers>=4.30.0
+ torch>=2.0.0
+ onnxruntime>=1.15.0
+ numpy>=1.24.0
+ safetensors>=0.3.0
+
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "[CLS]",
+   "do_lower_case": false,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "DistilBertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff