# Turnlet BERT Multilingual - End-of-Utterance Detection

A lightweight, multilingual DistilBERT model fine-tuned for End-of-Utterance (EOU) detection in conversational AI systems. This model supports English, Hindi, and Spanish with high accuracy and fast inference.
## Model Description

- Architecture: DistilBERT (6 layers, 768 hidden dimensions)
- Parameters: ~134M (multilingual DistilBERT base, consistent with the 517 MB FP32 size)
- Languages: English, Hindi, Spanish
- Task: Binary sequence classification (EOU vs. non-EOU)
- Training: Knowledge distillation from a teacher model
- Model size:
  - PyTorch (safetensors): 517 MB
  - ONNX (optimized FP32): 517 MB
  - ONNX (quantized INT8): 132 MB (74% size reduction)
## Performance Metrics

### Validation Set Performance (Step 60,500)
| Language | Accuracy | Samples |
|---|---|---|
| English | 97.01% | 16,258 |
| Hindi | 96.89% | 12,103 |
| Spanish | 94.52% | 7,963 |
| Overall | 96.43% | 36,324 |
Validation Metrics:
- F1 Score: 0.9635
- Precision: 0.9491
- Recall: 0.9783
### TURNS-2K Benchmark
- Accuracy: 91.10%
- F1 Score: 0.9150
- Precision: 0.9796
- Recall: 0.8584
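The reported F1 scores can be sanity-checked from the precision and recall figures above, since F1 is their harmonic mean. A minimal check (the numbers are taken directly from the two metric lists above):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Validation set: precision 0.9491, recall 0.9783
print(round(f1_score(0.9491, 0.9783), 4))  # 0.9635

# TURNS-2K: precision 0.9796, recall 0.8584
print(round(f1_score(0.9796, 0.8584), 4))  # 0.915
```

Note the TURNS-2K recall is noticeably lower than precision, so its F1 sits below the validation F1 despite higher precision.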
## Model Variants

This repository includes three model formats:

- PyTorch (safetensors): `model.safetensors` - full-precision PyTorch model
- ONNX optimized (FP32): `bert_model_optimized.onnx` - optimized for inference, full precision
- ONNX quantized (INT8): `bert_model_optimized_dynamic_int8.onnx` - recommended for production
## Why Use the Quantized INT8 Model?

- 74% smaller (132 MB vs. 517 MB)
- Faster inference on CPU
- Minimal accuracy loss (<0.5%)
- Lower memory footprint
- Better suited for deployment
## Quick Start

### Interactive Demo (Easiest Way)

```bash
# Clone the model repository
git clone https://huggingface.co/your-username/turnlet-bert-multilingual-eou
cd turnlet-bert-multilingual-eou

# Install dependencies
pip install -r requirements.txt

# Run interactive mode (default - uses the fast ONNX INT8 model)
python inference_example.py

# Or explicitly use interactive mode
python inference_example.py --interactive

# Use PyTorch instead of ONNX
python inference_example.py --interactive --pytorch

# Adjust the decision threshold
python inference_example.py --interactive --threshold 0.9
```
Interactive mode lets you:

- Type text and get instant EOU predictions
- Test in English, Hindi, or Spanish
- See confidence scores and inference times
- View visual confidence bars
- Type 'examples' to see sample inputs
- Type 'quit' or 'exit' to stop
### One-off Prediction

```bash
# Single prediction with ONNX (fast)
python inference_example.py --text "Thanks for your help!"

# Test suite with multiple examples
python inference_example.py --test-suite
```
### Using PyTorch (in Python)

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("your-username/turnlet-bert-multilingual-eou")
tokenizer = AutoTokenizer.from_pretrained("your-username/turnlet-bert-multilingual-eou")
model.eval()

# Predict
text = "Thanks for your help!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
is_eou = probs[0][1] > 0.5  # default cutoff; see Limitations for the tuned threshold (0.86)
print(f"EOU Probability: {probs[0][1]:.3f}")
print(f"Is EOU: {is_eou}")
```
### Using ONNX (Quantized INT8) - Recommended for Production

```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/turnlet-bert-multilingual-eou")

# Create ONNX session
session = ort.InferenceSession("bert_model_optimized_dynamic_int8.onnx")

# Tokenize
text = "Thanks for your help!"
inputs = tokenizer(text, padding="max_length", max_length=128, truncation=True, return_tensors="np")

# Prepare ONNX inputs
ort_inputs = {
    'input_ids': inputs['input_ids'].astype(np.int64),
    'attention_mask': inputs['attention_mask'].astype(np.int64)
}

# Run inference
outputs = session.run(None, ort_inputs)
logits = outputs[0][0]

# Softmax (shifted by the max logit for numerical stability)
exp = np.exp(logits - np.max(logits))
probs = exp / exp.sum()
is_eou = probs[1] > 0.5  # default cutoff; see Limitations for the tuned threshold (0.86)
print(f"EOU Probability: {probs[1]:.3f}")
print(f"Is EOU: {is_eou}")
```
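The snippets above use a 0.5 cutoff, while the Limitations section notes a tuned optimal threshold of 0.86. A small helper (the `is_eou` function name here is illustrative, not part of the repository's API) makes the threshold an explicit, adjustable parameter:

```python
def is_eou(eou_prob: float, threshold: float = 0.86) -> bool:
    """Decide end-of-utterance; 0.86 is the tuned threshold from this model card."""
    return eou_prob > threshold

print(is_eou(0.95))        # True
print(is_eou(0.70))        # False: below the tuned 0.86 cutoff
print(is_eou(0.70, 0.5))   # True with the default 0.5 cutoff
```

Raising the threshold trades recall for precision, which is usually the right direction for voice assistants, where interrupting a user mid-utterance is costlier than waiting slightly longer.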
## Use Cases

This model is designed for:

- Voice assistants: detect when the user has finished speaking
- Chatbots: identify complete user intents
- Call centers: segment customer utterances in real time
- Multilingual applications: support English, Hindi, and Spanish speakers
- Real-time systems: fast inference with the quantized model
## Training Details

### Training Data

The model was trained using knowledge distillation on a multilingual dataset:

- English: 76,258 samples
- Hindi: 75,103 samples
- Spanish: 75,963 samples
- Total: ~227K samples
### Training Configuration

- Base model: multilingual DistilBERT
- Method: knowledge distillation from a Qwen-based teacher model
- Epochs: 8
- Final step: 60,500
- Optimizer: AdamW
- Max sequence length: 128 tokens
### Distillation Process

The model was created using sparse Mixture-of-Experts (MoE) based knowledge distillation:

- The teacher model (Qwen-based) provides soft labels
- The student model (DistilBERT) learns to mimic the teacher's predictions
- Multi-stage training with progressive difficulty
- Language-specific accuracy monitoring
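Soft-label distillation of this kind is typically trained with a temperature-scaled KL divergence between the teacher's and student's output distributions. The exact loss used for this model is not specified, so the following is an illustrative sketch of the standard formulation, not the actual training code:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, temperature)  # teacher soft labels
    q = softmax(student_logits, temperature)
    return float(temperature ** 2 * np.sum(p * (np.log(p) - np.log(q))))

# A student that matches the teacher incurs zero loss
print(distillation_loss([2.0, -1.0], [2.0, -1.0]))  # 0.0
# Disagreement is penalized
print(distillation_loss([-1.0, 2.0], [2.0, -1.0]) > 0)  # True
```

The temperature softens both distributions so the student also learns from the teacher's relative confidence, not just its argmax label.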
## Evaluation

The model was evaluated on:

- Validation set: a balanced multilingual dataset
- TURNS-2K: a standard benchmark for turn-taking detection
- Per-language metrics: individual language performance tracking
## Inference Speed

Approximate inference times (CPU, single sample):

- ONNX optimized (FP32): ~70-120 ms
- ONNX quantized (INT8): ~40-50 ms

Note: actual speeds vary by hardware.
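To reproduce these numbers on your own hardware, you can time the `session.run` call from the ONNX snippet above with a small harness like this one (the `infer` placeholder below stands in for your actual inference call):

```python
import time
import statistics

def median_latency_ms(fn, warmup=5, runs=30):
    """Median wall-clock latency of fn() in milliseconds, after warmup runs."""
    for _ in range(warmup):
        fn()  # warm caches and lazy initialization before measuring
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Placeholder workload; replace with: lambda: session.run(None, ort_inputs)
infer = lambda: sum(i * i for i in range(10_000))
print(f"{median_latency_ms(infer):.2f} ms")
```

The median is reported rather than the mean because single-sample CPU timings are noisy and right-skewed by scheduler hiccups.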
## Limitations

- Performance is slightly lower on Spanish than on English and Hindi
- The optimal threshold (0.86) may need adjustment for specific use cases
- Maximum sequence length is 128 tokens (longer texts are truncated)
- Best performance on conversational, task-oriented dialogue
- May require fine-tuning for domain-specific applications
## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{turnlet-bert-multilingual-eou,
  title={Turnlet BERT Multilingual: End-of-Utterance Detection},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  note={Knowledge-distilled DistilBERT for multilingual EOU detection}
}
```
## License

Please specify your license here (e.g., Apache 2.0, MIT, etc.)
## Model Card Contact

For questions or feedback, please open an issue in the repository.

---

Model version: Step 60,500
Last updated: November 2024
Frameworks: PyTorch, ONNX Runtime
Languages: English (en), Hindi (hi), Spanish (es)