# Turnlet BERT Multilingual - End-of-Utterance Detection

A lightweight, multilingual DistilBERT model fine-tuned for End-of-Utterance (EOU) detection in conversational AI systems. The model supports **English, Hindi, and Spanish** with high accuracy and fast inference.

## Model Description

- **Architecture**: DistilBERT (6 layers, 768 hidden dimensions)
- **Parameters**: ~134M (multilingual DistilBERT base)
- **Languages**: English, Hindi, Spanish
- **Task**: Binary sequence classification (EOU vs. non-EOU)
- **Training**: Knowledge distillation from a teacher model
- **Model Size**:
  - PyTorch (safetensors): 517 MB
  - ONNX (optimized FP32): 517 MB
  - ONNX (quantized INT8): 132 MB (74% size reduction)
## Performance Metrics

### Validation Set Performance (Step 60500)

| Language | Accuracy | Samples |
|----------|----------|---------|
| **English** | 97.01% | 16,258 |
| **Hindi** | 96.89% | 12,103 |
| **Spanish** | 94.52% | 7,963 |
| **Overall** | 96.43% | 36,324 |

**Validation Metrics:**

- F1 Score: 0.9635
- Precision: 0.9491
- Recall: 0.9783

### TURNS-2K Benchmark

- **Accuracy**: 91.10%
- **F1 Score**: 0.9150
- **Precision**: 0.9796
- **Recall**: 0.8584
## Model Variants

This repository includes three model formats:

1. **PyTorch (safetensors)**: `model.safetensors` - Full-precision PyTorch model
2. **ONNX Optimized (FP32)**: `bert_model_optimized.onnx` - Optimized for inference, full precision
3. **ONNX Quantized (INT8)**: `bert_model_optimized_dynamic_int8.onnx` - **Recommended** for production

### Why Use the Quantized INT8 Model?

- ✅ **74% smaller** (132 MB vs. 517 MB)
- ✅ **Faster inference** on CPU
- ✅ **Minimal accuracy loss** (<0.5%)
- ✅ **Lower memory footprint**
- ✅ **Better for deployment**
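An INT8 variant like the one shipped here can be produced from the FP32 ONNX file with onnxruntime's dynamic quantizer. The repository's actual export script is not shown, so treat this as an illustrative sketch; the file names match this repo, and `quantize_to_int8` is a helper defined only for the example:

```python
# Sketch: dynamically quantize the FP32 ONNX model's weights to INT8
# using onnxruntime's standard quantization API.
from pathlib import Path

def quantize_to_int8(fp32_path: str, int8_path: str) -> bool:
    """Write an INT8 dynamically-quantized copy of an ONNX model.
    Returns False (no-op) when the input file is absent."""
    if not Path(fp32_path).exists():
        return False
    # Imported lazily so the helper is importable without onnxruntime installed
    from onnxruntime.quantization import quantize_dynamic, QuantType
    quantize_dynamic(fp32_path, int8_path, weight_type=QuantType.QInt8)
    return True

# Runs only if the FP32 model is present locally:
quantize_to_int8("bert_model_optimized.onnx",
                 "bert_model_optimized_dynamic_int8.onnx")
```

Dynamic quantization converts weights to INT8 ahead of time and quantizes activations on the fly, which is why no calibration dataset is needed.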
## Quick Start

### Interactive Demo (Easiest Way)

```bash
# Clone the model repository
git clone https://huggingface.co/your-username/turnlet-bert-multilingual-eou
cd turnlet-bert-multilingual-eou

# Install dependencies
pip install -r requirements.txt

# Run interactive mode (default - uses fast ONNX INT8)
python inference_example.py

# Or explicitly use interactive mode
python inference_example.py --interactive

# Use PyTorch instead of ONNX
python inference_example.py --interactive --pytorch

# Adjust the decision threshold
python inference_example.py --interactive --threshold 0.9
```
The interactive mode lets you:

- Type text and get instant EOU predictions
- Test in English, Hindi, or Spanish
- See confidence scores and inference times
- View visual confidence bars
- Type 'examples' to see sample inputs
- Type 'quit' or 'exit' to stop
### One-off Prediction

```bash
# Single prediction with ONNX (fast)
python inference_example.py --text "Thanks for your help!"

# Test suite with multiple examples
python inference_example.py --test-suite
```
### Using PyTorch (in Python)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("your-username/turnlet-bert-multilingual-eou")
tokenizer = AutoTokenizer.from_pretrained("your-username/turnlet-bert-multilingual-eou")

# Predict
text = "Thanks for your help!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
is_eou = probs[0][1] > 0.5  # default threshold; the tuned optimum is 0.86 (see Limitations)
print(f"EOU Probability: {probs[0][1]:.3f}")
print(f"Is EOU: {is_eou}")
```
### Using ONNX (Quantized INT8) - Recommended for Production

```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/turnlet-bert-multilingual-eou")

# Create ONNX session
session = ort.InferenceSession("bert_model_optimized_dynamic_int8.onnx")

# Tokenize
text = "Thanks for your help!"
inputs = tokenizer(text, padding="max_length", max_length=128, truncation=True, return_tensors="np")

# Prepare ONNX inputs
ort_inputs = {
    'input_ids': inputs['input_ids'].astype(np.int64),
    'attention_mask': inputs['attention_mask'].astype(np.int64)
}

# Run inference
outputs = session.run(None, ort_inputs)
logits = outputs[0][0]

# Calculate probability (softmax over the two classes)
probs = np.exp(logits) / np.sum(np.exp(logits))
is_eou = probs[1] > 0.5  # default threshold; the tuned optimum is 0.86 (see Limitations)
print(f"EOU Probability: {probs[1]:.3f}")
print(f"Is EOU: {is_eou}")
```
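One caveat: the raw `np.exp` softmax above can overflow to `inf`/`nan` for large logits. A numerically stable drop-in replacement for those two lines (`stable_softmax` is a helper name introduced here, not part of the repo):

```python
import numpy as np

def stable_softmax(logits: np.ndarray) -> np.ndarray:
    """Softmax that subtracts the max logit first, so exp() never overflows."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Logits this large would make naive np.exp overflow:
probs = stable_softmax(np.array([1000.0, 1002.0]))
```

Subtracting the maximum changes nothing mathematically (it cancels in the ratio) but keeps every exponent non-positive.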
## Use Cases

This model is designed for:

- **Voice Assistants**: Detect when the user has finished speaking
- **Chatbots**: Identify complete user intents
- **Call Centers**: Segment customer utterances in real time
- **Multilingual Applications**: Support English, Hindi, and Spanish speakers
- **Real-time Systems**: Fast inference with the quantized model
## Training Details

### Training Data

The model was trained using knowledge distillation on a multilingual dataset:

- **English**: 76,258 samples
- **Hindi**: 75,103 samples
- **Spanish**: 75,963 samples
- **Total**: ~227K samples

### Training Configuration

- **Base Model**: DistilBERT multilingual
- **Method**: Knowledge distillation from a Qwen-based teacher model
- **Epochs**: 8
- **Final Step**: 60,500
- **Optimizer**: AdamW
- **Max Sequence Length**: 128 tokens
### Distillation Process

The model was created using sparse Mixture-of-Experts (MoE) based knowledge distillation:

1. The teacher model (Qwen-based) provides soft labels
2. The student model (DistilBERT) learns to mimic the teacher's predictions
3. Multi-stage training with progressive difficulty
4. Language-specific accuracy monitoring
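The card does not give the exact distillation objective, but the standard soft-label recipe (temperature-scaled soft cross-entropy against the teacher, blended with hard-label cross-entropy) can be sketched as follows; `T` and `alpha` are illustrative hyperparameters, not values from the actual training run:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # stable softmax
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distill_loss(student_logits, teacher_logits, hard_labels, T=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with temperature-scaled soft
    cross-entropy against the teacher's distribution."""
    # Soft targets: teacher distribution at temperature T
    p_teacher = softmax(teacher_logits / T)
    log_p_student = np.log(softmax(student_logits / T))
    # Soft cross-entropy, rescaled by T^2 (standard Hinton-style factor)
    kd = -(p_teacher * log_p_student).sum(axis=-1).mean() * T**2
    # Hard-label cross-entropy at temperature 1
    n = len(hard_labels)
    ce = -np.log(softmax(student_logits)[np.arange(n), hard_labels]).mean()
    return alpha * ce + (1 - alpha) * kd

loss = distill_loss(np.array([[2.0, -1.0], [0.5, 1.5]]),
                    np.array([[1.5, -0.5], [0.0, 2.0]]),
                    np.array([0, 1]))
```

The temperature softens both distributions so the student also learns from the teacher's relative confidence, not just its argmax.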
## Evaluation

The model was evaluated on:

1. **Validation Set**: Balanced multilingual dataset
2. **TURNS-2K**: Standard benchmark for turn-taking detection
3. **Per-Language Metrics**: Individual language performance tracking
### Inference Speed

Approximate inference times (CPU, single sample):

- ONNX Optimized (FP32): ~70-120 ms
- ONNX Quantized INT8: ~40-50 ms

*Note: Actual speeds vary by hardware.*
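To measure latency on your own hardware, a minimal timing harness is enough; it is shown here with a stand-in workload, and in practice you would pass something like `lambda: session.run(None, ort_inputs)` from the ONNX example:

```python
import time

def time_inference(fn, warmup=3, runs=20):
    """Average wall-clock latency of fn() in milliseconds.
    Warmup iterations are excluded so one-time costs don't skew the mean."""
    for _ in range(warmup):
        fn()
    t0 = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - t0) / runs * 1000.0

# Stand-in workload; replace with your actual inference call
ms = time_inference(lambda: sum(i * i for i in range(10_000)))
print(f"~{ms:.2f} ms per call")
```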
## Limitations

- Performance is slightly lower on Spanish than on English and Hindi
- The optimal threshold (0.86) may need adjustment for specific use cases
- Maximum sequence length is 128 tokens (longer texts are truncated)
- Best performance on conversational, task-oriented dialogue
- May require fine-tuning for domain-specific applications
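If neither the default 0.5 nor the reported 0.86 threshold suits your application, you can re-tune it on your own labeled validation data. A minimal F1-maximizing sweep (helper name and grid are illustrative):

```python
import numpy as np

def best_f1_threshold(probs, labels, grid=None):
    """Sweep decision thresholds over EOU probabilities; return (threshold, f1)."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 91)
    best = (0.5, 0.0)
    for t in grid:
        pred = probs >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best[1]:
            best = (float(t), float(f1))
    return best

# Toy example with four validation utterances:
t, f1 = best_f1_threshold(np.array([0.1, 0.2, 0.9, 0.8]),
                          np.array([0, 0, 1, 1]))
```

Depending on your product, you may prefer to maximize precision (avoid cutting users off) or recall (avoid awkward silences) instead of F1; the same sweep works with either metric.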
## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{turnlet-bert-multilingual-eou,
  title={Turnlet BERT Multilingual: End-of-Utterance Detection},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  note={Knowledge-distilled DistilBERT for multilingual EOU detection}
}
```
## License

Please specify your license here (e.g., Apache 2.0, MIT, etc.)

## Model Card Contact

For questions or feedback, please open an issue in the repository.

---

**Model Version**: Step 60500
**Last Updated**: November 2024
**Framework**: PyTorch, ONNX Runtime
**Languages**: English (en), Hindi (hi), Spanish (es)