
Turnlet BERT Multilingual - End-of-Utterance Detection

A lightweight, multilingual DistilBERT model fine-tuned for End-of-Utterance (EOU) detection in conversational AI systems. This model supports English, Hindi, and Spanish with high accuracy and fast inference.

Model Description

  • Architecture: DistilBERT (6 layers, 768 hidden dimensions)
  • Parameters: ~135M (multilingual DistilBERT; the 517 MB FP32 checkpoint implies roughly 130M weights, larger than the 67M English-only base)
  • Languages: English, Hindi, Spanish
  • Task: Binary sequence classification (EOU vs Non-EOU)
  • Training: Knowledge distillation from teacher model
  • Model Size:
    • PyTorch (safetensors): 517 MB
    • ONNX (optimized FP32): 517 MB
    • ONNX (quantized INT8): 132 MB (74% size reduction)

Performance Metrics

Validation Set Performance (Step 60500)

Language   Accuracy   Samples
--------   --------   -------
English    97.01%     16,258
Hindi      96.89%     12,103
Spanish    94.52%      7,963
Overall    96.43%     36,324

Validation Metrics:

  • F1 Score: 0.9635
  • Precision: 0.9491
  • Recall: 0.9783

TURNS-2K Benchmark

  • Accuracy: 91.10%
  • F1 Score: 0.9150
  • Precision: 0.9796
  • Recall: 0.8584
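As a quick consistency check, both reported F1 scores follow from their precision/recall pairs via F1 = 2PR / (P + R):

```python
# F1 is the harmonic mean of precision and recall.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(f"validation F1: {f1(0.9491, 0.9783):.4f}")  # 0.9635
print(f"TURNS-2K   F1: {f1(0.9796, 0.8584):.4f}")  # 0.9150
```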

Model Variants

This repository includes three model formats:

  1. PyTorch (safetensors): model.safetensors - Full precision PyTorch model
  2. ONNX Optimized (FP32): bert_model_optimized.onnx - Optimized for inference, full precision
  3. ONNX Quantized (INT8): bert_model_optimized_dynamic_int8.onnx - Recommended for production

Why Use the Quantized INT8 Model?

  • ✅ 74% smaller (132 MB vs 517 MB)
  • ✅ Faster inference on CPU
  • ✅ Minimal accuracy loss (<0.5%)
  • ✅ Lower memory footprint
  • ✅ Better for deployment

Quick Start

Interactive Demo (Easiest Way)

# Clone the model repository
git clone https://huggingface.co/your-username/turnlet-bert-multilingual-eou
cd turnlet-bert-multilingual-eou

# Install dependencies
pip install -r requirements.txt

# Run interactive mode (default - uses fast ONNX INT8)
python inference_example.py

# Or explicitly use interactive mode
python inference_example.py --interactive

# Use PyTorch instead of ONNX
python inference_example.py --interactive --pytorch

# Adjust threshold
python inference_example.py --interactive --threshold 0.9

The interactive mode allows you to:

  • 🎮 Type text and get instant EOU predictions
  • 🌐 Test in English, Hindi, or Spanish
  • 📊 See confidence scores and inference times
  • 📈 View visual confidence bars
  • 💡 Type 'examples' to see sample inputs
  • 🚪 Type 'quit' or 'exit' to stop

One-off Prediction

# Single prediction with ONNX (fast)
python inference_example.py --text "Thanks for your help!"

# Test suite with multiple examples
python inference_example.py --test-suite

Using PyTorch (in Python)

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("your-username/turnlet-bert-multilingual-eou")
tokenizer = AutoTokenizer.from_pretrained("your-username/turnlet-bert-multilingual-eou")

# Predict
text = "Thanks for your help!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
is_eou = probs[0][1] > 0.5  # default threshold; the tuned value is 0.86 (see Limitations)

print(f"EOU Probability: {probs[0][1]:.3f}")
print(f"Is EOU: {is_eou}")

Using ONNX (Quantized INT8) - Recommended for Production

import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/turnlet-bert-multilingual-eou")

# Create ONNX session
session = ort.InferenceSession("bert_model_optimized_dynamic_int8.onnx")

# Tokenize
text = "Thanks for your help!"
inputs = tokenizer(text, padding="max_length", max_length=128, truncation=True, return_tensors="np")

# Prepare ONNX inputs
ort_inputs = {
    'input_ids': inputs['input_ids'].astype(np.int64),
    'attention_mask': inputs['attention_mask'].astype(np.int64)
}

# Run inference
outputs = session.run(None, ort_inputs)
logits = outputs[0][0]

# Calculate probability
probs = np.exp(logits) / np.sum(np.exp(logits))
is_eou = probs[1] > 0.5  # default threshold; the tuned value is 0.86 (see Limitations)

print(f"EOU Probability: {probs[1]:.3f}")
print(f"Is EOU: {is_eou}")
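One caveat on the snippet above: `np.exp` can overflow for large positive logits. A numerically stable softmax subtracts the max logit first and is a drop-in replacement:

```python
import numpy as np

def stable_softmax(logits):
    # Shifting by the max changes nothing mathematically but keeps exp() bounded.
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

probs = stable_softmax(np.array([1000.0, 0.0]))  # naive softmax would overflow here
print(probs)
```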

Use Cases

This model is designed for:

  • ๐Ÿ—ฃ๏ธ Voice Assistants: Detect when user has finished speaking
  • ๐Ÿ’ฌ Chatbots: Identify complete user intents
  • ๐Ÿ“ž Call Centers: Segment customer utterances in real-time
  • ๐ŸŒ Multilingual Applications: Support English, Hindi, and Spanish speakers
  • โšก Real-time Systems: Fast inference with quantized model
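In a streaming voice pipeline, the EOU probability is typically combined with a short silence timeout rather than used alone. A minimal gating sketch (the function name and timing constants here are illustrative assumptions, not part of this repository):

```python
def should_end_turn(eou_prob, silence_ms,
                    eou_threshold=0.86, min_silence_ms=200, max_silence_ms=1500):
    """Hypothetical end-of-turn rule: respond quickly when the model is
    confident the utterance is complete; otherwise fall back to a longer
    silence timeout so hesitant speakers are not cut off."""
    if eou_prob >= eou_threshold and silence_ms >= min_silence_ms:
        return True   # confident EOU after a brief pause
    if silence_ms >= max_silence_ms:
        return True   # long pause: end the turn regardless of the model
    return False

print(should_end_turn(0.95, 250))  # True: confident EOU, short pause
print(should_end_turn(0.40, 600))  # False: likely a mid-utterance pause
```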

Training Details

Training Data

The model was trained using knowledge distillation on a multilingual dataset:

  • English: 76,258 samples
  • Hindi: 75,103 samples
  • Spanish: 75,963 samples
  • Total: ~211K samples

Training Configuration

  • Base Model: DistilBERT multilingual
  • Method: Knowledge distillation from Qwen-based teacher model
  • Epochs: 8
  • Final Step: 60,500
  • Optimization: AdamW optimizer
  • Max Sequence Length: 128 tokens

Distillation Process

The model was created via knowledge distillation from a sparse Mixture-of-Experts (MoE) teacher:

  1. Teacher model (Qwen-based) provides soft labels
  2. Student model (DistilBERT) learns to mimic teacher predictions
  3. Multi-stage training with progressive difficulty
  4. Language-specific accuracy monitoring
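The exact distillation objective is not documented here; a standard formulation (assumed, not confirmed by this repository) mixes hard-label cross-entropy with a temperature-scaled KL divergence against the teacher's soft labels:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    """alpha weights the hard-label CE term; (1 - alpha) weights the KL term.
    T and alpha are illustrative defaults, not the values used in training."""
    ce = -np.log(softmax(student_logits)[hard_label] + 1e-12)
    p_t, p_s = softmax(teacher_logits, T), softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)))
    # The T^2 factor rescales KL gradients to stay comparable with the CE term.
    return alpha * ce + (1 - alpha) * (T ** 2) * kl

loss = distillation_loss(np.array([2.0, 0.5]), np.array([3.0, -1.0]), hard_label=0)
print(f"{loss:.4f}")
```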

Evaluation

The model was evaluated on:

  1. Validation Set: Balanced multilingual dataset
  2. TURNS-2K: Standard benchmark for turn-taking detection
  3. Per-Language Metrics: Individual language performance tracking

Inference Speed

Approximate inference times (CPU, single sample):

  • ONNX Optimized: ~70-120ms
  • ONNX Quantized INT8: ~40-50ms

Note: actual speeds vary with hardware.
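To measure latency on your own hardware, a small harness around `session.run` is enough. A sketch (the no-op lambda below is a stand-in for the real inference call):

```python
import time

def median_latency_ms(fn, warmup=3, iters=20):
    """Median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):
        fn()                 # warm caches / lazy initialization first
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1e3)
    times.sort()
    return times[len(times) // 2]

# In practice: median_latency_ms(lambda: session.run(None, ort_inputs))
print(f"{median_latency_ms(lambda: None):.4f} ms")
```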

Limitations

  • Model performance is slightly lower on Spanish compared to English and Hindi
  • Optimal threshold (0.86) may need adjustment for specific use cases
  • Maximum sequence length is 128 tokens (longer texts will be truncated)
  • Best performance on conversational, task-oriented dialogue
  • May require fine-tuning for domain-specific applications
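The tuned threshold of 0.86 was presumably found by sweeping candidate thresholds over validation probabilities and keeping the best F1; the sweep itself is generic (synthetic data below, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)                                # synthetic ground truth
probs = np.clip(labels * 0.7 + rng.normal(0.15, 0.2, size=1000), 0.0, 1.0)

def f1_at(threshold):
    preds = probs >= threshold
    tp = np.sum(preds & (labels == 1))
    fp = np.sum(preds & (labels == 0))
    fn = np.sum(~preds & (labels == 1))
    if tp == 0:
        return 0.0
    p, r = tp / (tp + fp), tp / (tp + fn)
    return 2 * p * r / (p + r)

thresholds = np.linspace(0.05, 0.95, 19)
best = max(thresholds, key=f1_at)
print(f"best threshold: {best:.2f} (F1 = {f1_at(best):.3f})")
```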

Citation

If you use this model in your research or applications, please cite:

@misc{turnlet-bert-multilingual-eou,
  title={Turnlet BERT Multilingual: End-of-Utterance Detection},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  note={Knowledge-distilled DistilBERT for multilingual EOU detection}
}

License

Please specify your license here (e.g., Apache 2.0, MIT, etc.)

Model Card Contact

For questions or feedback, please open an issue in the repository.


Model Version: Step 60500
Last Updated: November 2024
Framework: PyTorch, ONNX Runtime
Languages: English (en), Hindi (hi), Spanish (es)
