# Turnlet BERT Multilingual - End-of-Utterance Detection

A lightweight, multilingual DistilBERT model fine-tuned for End-of-Utterance (EOU) detection in conversational AI systems. This model supports **English, Hindi, and Spanish** with high accuracy and fast inference.

## Model Description

- **Architecture**: DistilBERT (6 layers, 768 hidden dimensions)
- **Parameters**: ~67M (DistilBERT base)
- **Languages**: English, Hindi, Spanish
- **Task**: Binary sequence classification (EOU vs. non-EOU)
- **Training**: Knowledge distillation from a teacher model
- **Model Size**:
  - PyTorch (safetensors): 517 MB
  - ONNX (optimized FP32): 517 MB
  - ONNX (quantized INT8): 132 MB (74% size reduction)

## Performance Metrics

### Validation Set Performance (Step 60500)

| Language | Accuracy | Samples |
|----------|----------|---------|
| **English** | 97.01% | 16,258 |
| **Hindi** | 96.89% | 12,103 |
| **Spanish** | 94.52% | 7,963 |
| **Overall** | 96.43% | 36,324 |

**Validation Metrics:**

- F1 Score: 0.9635
- Precision: 0.9491
- Recall: 0.9783

### TURNS-2K Benchmark

- **Accuracy**: 91.10%
- **F1 Score**: 0.9150
- **Precision**: 0.9796
- **Recall**: 0.8584

## Model Variants

This repository includes three model formats:

1. **PyTorch (safetensors)**: `model.safetensors` - Full-precision PyTorch model
2. **ONNX Optimized (FP32)**: `bert_model_optimized.onnx` - Optimized for inference, full precision
3. **ONNX Quantized (INT8)**: `bert_model_optimized_dynamic_int8.onnx` - **Recommended** for production

### Why Use the Quantized INT8 Model?
- ✅ **74% smaller** (132 MB vs. 517 MB)
- ✅ **Faster inference** on CPU
- ✅ **Minimal accuracy loss** (<0.5%)
- ✅ **Lower memory footprint**
- ✅ **Better for deployment**

## Quick Start

### Interactive Demo (Easiest Way)

```bash
# Clone the model repository
git clone https://huggingface.co/your-username/turnlet-bert-multilingual-eou
cd turnlet-bert-multilingual-eou

# Install dependencies
pip install -r requirements.txt

# Run interactive mode (default - uses fast ONNX INT8)
python inference_example.py

# Or explicitly use interactive mode
python inference_example.py --interactive

# Use PyTorch instead of ONNX
python inference_example.py --interactive --pytorch

# Adjust threshold
python inference_example.py --interactive --threshold 0.9
```

The interactive mode allows you to:

- 🎮 Type text and get instant EOU predictions
- 🌐 Test in English, Hindi, or Spanish
- 📊 See confidence scores and inference times
- 📈 View visual confidence bars
- 💡 Type 'examples' to see sample inputs
- 🚪 Type 'quit' or 'exit' to stop

### One-off Prediction

```bash
# Single prediction with ONNX (fast)
python inference_example.py --text "Thanks for your help!"

# Test suite with multiple examples
python inference_example.py --test-suite
```

### Using PyTorch (in Python)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("your-username/turnlet-bert-multilingual-eou")
tokenizer = AutoTokenizer.from_pretrained("your-username/turnlet-bert-multilingual-eou")

# Predict
text = "Thanks for your help!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=-1)
is_eou = probs[0][1] > 0.5  # default threshold; 0.86 was optimal in our evaluation (see Limitations)

print(f"EOU Probability: {probs[0][1]:.3f}")
print(f"Is EOU: {is_eou}")
```

### Using ONNX (Quantized INT8) - Recommended for Production

```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/turnlet-bert-multilingual-eou")

# Create ONNX session
session = ort.InferenceSession("bert_model_optimized_dynamic_int8.onnx")

# Tokenize
text = "Thanks for your help!"
inputs = tokenizer(text, padding="max_length", max_length=128, truncation=True, return_tensors="np")

# Prepare ONNX inputs
ort_inputs = {
    'input_ids': inputs['input_ids'].astype(np.int64),
    'attention_mask': inputs['attention_mask'].astype(np.int64)
}

# Run inference
outputs = session.run(None, ort_inputs)
logits = outputs[0][0]

# Calculate probability (softmax over the two classes)
probs = np.exp(logits) / np.sum(np.exp(logits))
is_eou = probs[1] > 0.5  # default threshold; 0.86 was optimal in our evaluation (see Limitations)

print(f"EOU Probability: {probs[1]:.3f}")
print(f"Is EOU: {is_eou}")
```

## Use Cases

This model is designed for:

- 🗣️ **Voice Assistants**: Detect when the user has finished speaking
- 💬 **Chatbots**: Identify complete user intents
- 📞 **Call Centers**: Segment customer utterances in real time
- 🌐 **Multilingual Applications**: Support English, Hindi, and Spanish speakers
- ⚡ **Real-time Systems**: Fast inference with the quantized model

## Training Details

### Training Data

The model was trained using knowledge distillation on a multilingual dataset:

- **English**: 76,258 samples
- **Hindi**: 75,103 samples
- **Spanish**: 75,963 samples
- **Total**: ~227K samples

### Training Configuration

- **Base Model**: DistilBERT multilingual
- **Method**: Knowledge distillation from a Qwen-based teacher model
- **Epochs**: 8
- **Final Step**: 60,500
- **Optimization**: AdamW optimizer
- **Max Sequence Length**: 128 tokens

### Distillation Process

The model was created using sparse Mixture-of-Experts (MoE) based knowledge distillation:

1. The teacher model (Qwen-based) provides soft labels
2. The student model (DistilBERT) learns to mimic the teacher's predictions
3. Multi-stage training with progressive difficulty
4. Language-specific accuracy monitoring

## Evaluation

The model was evaluated on:

1. **Validation Set**: Balanced multilingual dataset
2. **TURNS-2K**: Standard benchmark for turn-taking detection
3. **Per-Language Metrics**: Individual language performance tracking

### Inference Speed

Approximate inference times (CPU, single sample):

- ONNX Optimized: ~70-120 ms
- ONNX Quantized INT8: ~40-50 ms

*Note: Actual speeds vary by hardware*

## Limitations

- Performance is slightly lower on Spanish than on English and Hindi
- The optimal threshold (0.86) may need adjustment for specific use cases
- Maximum sequence length is 128 tokens (longer texts will be truncated)
- Performs best on conversational, task-oriented dialogue
- May require fine-tuning for domain-specific applications

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{turnlet-bert-multilingual-eou,
  title={Turnlet BERT Multilingual: End-of-Utterance Detection},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  note={Knowledge-distilled DistilBERT for multilingual EOU detection}
}
```

## License

Please specify your license here (e.g., Apache 2.0, MIT, etc.)

## Model Card Contact

For questions or feedback, please open an issue in the repository.

---

**Model Version**: Step 60500
**Last Updated**: November 2024
**Framework**: PyTorch, ONNX Runtime
**Languages**: English (en), Hindi (hi), Spanish (es)
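The inference snippets above hard-code the decision threshold inline, while the Limitations section notes that the tuned value (0.86) may need adjustment per use case. A small helper can keep the softmax and the tunable threshold in one place; this is a sketch, and the function name `eou_decision` is an assumption rather than part of the released scripts:

```python
import numpy as np

def eou_decision(logits, threshold=0.86):
    """Turn raw [non-EOU, EOU] logits into a probability and a decision.

    0.5 is the untuned default; 0.86 is the threshold reported as optimal
    in this model's evaluation. Re-tune on your own traffic.
    """
    logits = np.asarray(logits, dtype=np.float64)
    z = logits - logits.max()            # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum()
    p_eou = float(probs[1])
    return p_eou, p_eou > threshold
```

For example, logits of `[0.0, 2.0]` give an EOU probability of about 0.881, which clears the 0.86 threshold but would fail a stricter 0.9.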
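The latency figures in the Inference Speed section are rough and hardware-dependent. A minimal timing harness along these lines can produce comparable per-call numbers on your own machine, e.g. by passing a closure that runs `session.run(None, ort_inputs)`; `time_inference` is a hypothetical helper, not part of the repository:

```python
import time

def time_inference(run, n_warmup=3, n_runs=20):
    """Average wall-clock latency of `run()` in milliseconds.

    Warm-up calls are excluded so one-time costs (lazy initialization,
    caches) do not skew the average.
    """
    for _ in range(n_warmup):
        run()
    start = time.perf_counter()
    for _ in range(n_runs):
        run()
    return (time.perf_counter() - start) * 1000.0 / n_runs
```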