
Turnlet BERT Multilingual - End-of-Utterance Detection

A lightweight, multilingual DistilBERT model fine-tuned for End-of-Utterance (EOU) detection in conversational AI systems. This model supports English, Hindi, and Spanish with high accuracy and fast inference.

Model Description

  • Architecture: DistilBERT (6 layers, 768 hidden dimensions)
  • Parameters: ~135M (multilingual DistilBERT; the 517 MB FP32 checkpoint implies roughly 130M weights, larger than the 67M English-only base)
  • Languages: English, Hindi, Spanish
  • Task: Binary sequence classification (EOU vs Non-EOU)
  • Training: Knowledge distillation from teacher model
  • Model Size:
    • PyTorch (safetensors): 517 MB
    • ONNX (optimized FP32): 517 MB
    • ONNX (quantized INT8): 132 MB (74% size reduction)

Performance Metrics

Validation Set Performance (Step 60500)

Language   Accuracy   Samples
--------   --------   -------
English    97.01%     16,258
Hindi      96.89%     12,103
Spanish    94.52%      7,963
Overall    96.43%     36,324

Validation Metrics:

  • F1 Score: 0.9635
  • Precision: 0.9491
  • Recall: 0.9783

TURNS-2K Benchmark

  • Accuracy: 91.10%
  • F1 Score: 0.9150
  • Precision: 0.9796
  • Recall: 0.8584
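As a quick consistency check, both reported F1 scores follow from their precision/recall pairs via F1 = 2PR / (P + R):

```python
# F1 is the harmonic mean of precision and recall.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(f"validation F1: {f1(0.9491, 0.9783):.4f}")  # 0.9635
print(f"TURNS-2K   F1: {f1(0.9796, 0.8584):.4f}")  # 0.9150
```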

Model Variants

This repository includes three model formats:

  1. PyTorch (safetensors): model.safetensors - Full precision PyTorch model
  2. ONNX Optimized (FP32): bert_model_optimized.onnx - Optimized for inference, full precision
  3. ONNX Quantized (INT8): bert_model_optimized_dynamic_int8.onnx - Recommended for production

Why Use the Quantized INT8 Model?

  • ✅ 74% smaller (132 MB vs 517 MB)
  • ✅ Faster inference on CPU
  • ✅ Minimal accuracy loss (<0.5%)
  • ✅ Lower memory footprint
  • ✅ Better for deployment

Quick Start

Interactive Demo (Easiest Way)

# Clone the model repository
git clone https://huggingface.co/your-username/turnlet-bert-multilingual-eou
cd turnlet-bert-multilingual-eou

# Install dependencies
pip install -r requirements.txt

# Run interactive mode (default - uses fast ONNX INT8)
python inference_example.py

# Or explicitly use interactive mode
python inference_example.py --interactive

# Use PyTorch instead of ONNX
python inference_example.py --interactive --pytorch

# Adjust threshold
python inference_example.py --interactive --threshold 0.9

The interactive mode allows you to:

  • 🎮 Type text and get instant EOU predictions
  • 🌐 Test in English, Hindi, or Spanish
  • 📊 See confidence scores and inference times
  • 📈 View visual confidence bars
  • 💡 Type 'examples' to see sample inputs
  • 🚪 Type 'quit' or 'exit' to stop

One-off Prediction

# Single prediction with ONNX (fast)
python inference_example.py --text "Thanks for your help!"

# Test suite with multiple examples
python inference_example.py --test-suite

Using PyTorch (in Python)

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("your-username/turnlet-bert-multilingual-eou")
tokenizer = AutoTokenizer.from_pretrained("your-username/turnlet-bert-multilingual-eou")

# Predict
text = "Thanks for your help!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
is_eou = probs[0][1] > 0.5  # default threshold; the tuned value is 0.86 (see Limitations)

print(f"EOU Probability: {probs[0][1]:.3f}")
print(f"Is EOU: {is_eou}")

Using ONNX (Quantized INT8) - Recommended for Production

import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/turnlet-bert-multilingual-eou")

# Create ONNX session
session = ort.InferenceSession("bert_model_optimized_dynamic_int8.onnx")

# Tokenize
text = "Thanks for your help!"
inputs = tokenizer(text, padding="max_length", max_length=128, truncation=True, return_tensors="np")

# Prepare ONNX inputs
ort_inputs = {
    'input_ids': inputs['input_ids'].astype(np.int64),
    'attention_mask': inputs['attention_mask'].astype(np.int64)
}

# Run inference
outputs = session.run(None, ort_inputs)
logits = outputs[0][0]

# Calculate probability
probs = np.exp(logits) / np.sum(np.exp(logits))
is_eou = probs[1] > 0.5  # default threshold; the tuned value is 0.86 (see Limitations)

print(f"EOU Probability: {probs[1]:.3f}")
print(f"Is EOU: {is_eou}")
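One caveat on the snippet above: `np.exp` can overflow for large positive logits. A numerically stable softmax subtracts the max logit first and is a drop-in replacement:

```python
import numpy as np

def stable_softmax(logits):
    # Shifting by the max changes nothing mathematically but keeps exp() bounded.
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

probs = stable_softmax(np.array([1000.0, 0.0]))  # naive softmax would overflow here
print(probs)
```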

Use Cases

This model is designed for:

  • ๐Ÿ—ฃ๏ธ Voice Assistants: Detect when user has finished speaking
  • ๐Ÿ’ฌ Chatbots: Identify complete user intents
  • ๐Ÿ“ž Call Centers: Segment customer utterances in real-time
  • ๐ŸŒ Multilingual Applications: Support English, Hindi, and Spanish speakers
  • โšก Real-time Systems: Fast inference with quantized model
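In a streaming voice pipeline, the EOU probability is typically combined with a short silence timeout rather than used alone. A minimal gating sketch (the function name and timing constants here are illustrative assumptions, not part of this repository):

```python
def should_end_turn(eou_prob, silence_ms,
                    eou_threshold=0.86, min_silence_ms=200, max_silence_ms=1500):
    """Hypothetical end-of-turn rule: respond quickly when the model is
    confident the utterance is complete; otherwise fall back to a longer
    silence timeout so hesitant speakers are not cut off."""
    if eou_prob >= eou_threshold and silence_ms >= min_silence_ms:
        return True   # confident EOU after a brief pause
    if silence_ms >= max_silence_ms:
        return True   # long pause: end the turn regardless of the model
    return False

print(should_end_turn(0.95, 250))  # True: confident EOU, short pause
print(should_end_turn(0.40, 600))  # False: likely a mid-utterance pause
```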

Training Details

Training Data

The model was trained using knowledge distillation on a multilingual dataset:

  • English: 76,258 samples
  • Hindi: 75,103 samples
  • Spanish: 75,963 samples
  • Total: ~211K samples

Training Configuration

  • Base Model: DistilBERT multilingual
  • Method: Knowledge distillation from Qwen-based teacher model
  • Epochs: 8
  • Final Step: 60,500
  • Optimization: AdamW optimizer
  • Max Sequence Length: 128 tokens

Distillation Process

The model was created via knowledge distillation from a sparse Mixture-of-Experts (MoE) teacher:

  1. Teacher model (Qwen-based) provides soft labels
  2. Student model (DistilBERT) learns to mimic teacher predictions
  3. Multi-stage training with progressive difficulty
  4. Language-specific accuracy monitoring
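The exact distillation objective is not documented here; a standard formulation (assumed, not confirmed by this repository) mixes hard-label cross-entropy with a temperature-scaled KL divergence against the teacher's soft labels:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    """alpha weights the hard-label CE term; (1 - alpha) weights the KL term.
    T and alpha are illustrative defaults, not the values used in training."""
    ce = -np.log(softmax(student_logits)[hard_label] + 1e-12)
    p_t, p_s = softmax(teacher_logits, T), softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)))
    # The T^2 factor rescales KL gradients to stay comparable with the CE term.
    return alpha * ce + (1 - alpha) * (T ** 2) * kl

loss = distillation_loss(np.array([2.0, 0.5]), np.array([3.0, -1.0]), hard_label=0)
print(f"{loss:.4f}")
```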

Evaluation

The model was evaluated on:

  1. Validation Set: Balanced multilingual dataset
  2. TURNS-2K: Standard benchmark for turn-taking detection
  3. Per-Language Metrics: Individual language performance tracking

Inference Speed

Approximate inference times (CPU, single sample):

  • ONNX Optimized: ~70-120ms
  • ONNX Quantized INT8: ~40-50ms

Note: actual speeds vary with hardware.
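To measure latency on your own hardware, a small harness around `session.run` is enough. A sketch (the no-op lambda below is a stand-in for the real inference call):

```python
import time

def median_latency_ms(fn, warmup=3, iters=20):
    """Median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):
        fn()                 # warm caches / lazy initialization first
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1e3)
    times.sort()
    return times[len(times) // 2]

# In practice: median_latency_ms(lambda: session.run(None, ort_inputs))
print(f"{median_latency_ms(lambda: None):.4f} ms")
```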

Limitations

  • Model performance is slightly lower on Spanish compared to English and Hindi
  • Optimal threshold (0.86) may need adjustment for specific use cases
  • Maximum sequence length is 128 tokens (longer texts will be truncated)
  • Best performance on conversational, task-oriented dialogue
  • May require fine-tuning for domain-specific applications
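The tuned threshold of 0.86 was presumably found by sweeping candidate thresholds over validation probabilities and keeping the best F1; the sweep itself is generic (synthetic data below, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)                                # synthetic ground truth
probs = np.clip(labels * 0.7 + rng.normal(0.15, 0.2, size=1000), 0.0, 1.0)

def f1_at(threshold):
    preds = probs >= threshold
    tp = np.sum(preds & (labels == 1))
    fp = np.sum(preds & (labels == 0))
    fn = np.sum(~preds & (labels == 1))
    if tp == 0:
        return 0.0
    p, r = tp / (tp + fp), tp / (tp + fn)
    return 2 * p * r / (p + r)

thresholds = np.linspace(0.05, 0.95, 19)
best = max(thresholds, key=f1_at)
print(f"best threshold: {best:.2f} (F1 = {f1_at(best):.3f})")
```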

Citation

If you use this model in your research or applications, please cite:

@misc{turnlet-bert-multilingual-eou,
  title={Turnlet BERT Multilingual: End-of-Utterance Detection},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  note={Knowledge-distilled DistilBERT for multilingual EOU detection}
}

License

Please specify your license here (e.g., Apache 2.0, MIT, etc.)

Model Card Contact

For questions or feedback, please open an issue in the repository.


Model Version: Step 60500
Last Updated: November 2024
Framework: PyTorch, ONNX Runtime
Languages: English (en), Hindi (hi), Spanish (es)
