EOT Detector - SmolLM2 135M

A fine-tuned model for End-of-Turn (EOT) detection in conversations, based on SmolLM2-135M.

Model Description

This model predicts whether a user has finished speaking in a conversation (end-of-turn) or is still continuing. It's designed for voice AI applications where accurate turn-taking is critical to avoid interrupting users.

Key Features

Base Model: SmolLM2-135M (135M parameters)
Fine-tuning Method: LoRA (r=4, alpha=8)
Task: Binary classification (complete vs incomplete turn)
Inference Speed: ~10ms on CPU

Training Details

Parameter	Value
Base Model	HuggingFaceTB/SmolLM2-135M
LoRA Rank	4
LoRA Alpha	8
Learning Rate	2e-4
Epochs	3
Training Samples	50
Hardware	T4 GPU
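
As a rough illustration of what the LoRA settings above imply, the sketch below computes the scaling factor and the number of extra trainable parameters for one adapted linear layer. It assumes the standard LoRA formulation (update scaled by alpha / r, with two low-rank matrices of shapes d_in x r and r x d_out). The 576 x 576 layer size is a hypothetical example, and the card does not state which modules were adapted.

```python
def lora_scaling(alpha: float, r: int) -> float:
    """Scaling applied to the low-rank update in standard LoRA."""
    return alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer: A (d_in x r) plus B (r x d_out)."""
    return r * (d_in + d_out)


# Values from the table above: rank 4, alpha 8
scaling = lora_scaling(alpha=8, r=4)

# Hypothetical layer dimensions, for illustration only
extra = lora_params(d_in=576, d_out=576, r=4)

print(f"scaling factor: {scaling}")
print(f"extra parameters for one 576x576 layer: {extra}")
```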

Evaluation Results

Evaluated on Vurtnec/eot-detection-testset (30 samples):

Metric	Value
Accuracy	76.67%
Precision	100%
Recall	53.33%
F1 Score	69.57%

Classification Report

              precision    recall  f1-score   support

  Incomplete       0.68      1.00      0.81        15
    Complete       1.00      0.53      0.70        15

    accuracy                           0.77        30
   macro avg       0.84      0.77      0.75        30
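
The headline metrics can be reproduced from the counts implied by the report. This is a minimal sketch that treats "Complete" as the positive class: with 15 samples per class, a recall of 0.53 for "Complete" corresponds to 8 correct and 7 missed, and a recall of 1.00 for "Incomplete" corresponds to 15 correct and 0 false alarms. The function name is illustrative.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts derived from the classification report, "Complete" as the positive class
metrics = binary_metrics(tp=8, fp=0, fn=7, tn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```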

Analysis

High Precision (100%): On this 30-sample test set, every "complete" prediction was correct
Lower Recall (53%): The model is conservative and missed 7 of the 15 completed turns
This trade-off suits voice AI, where waiting slightly longer is better than interrupting users
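
One way an application might cope with the missed completed turns is to pair the model with a silence timeout, so the assistant still responds if the user has clearly stopped talking. The sketch below is not part of this model. The function name, the `silence_ms` input, and the 1500 ms default are all hypothetical and would need tuning for a real system.

```python
def should_respond(predicted_complete: bool, silence_ms: int, max_wait_ms: int = 1500) -> bool:
    """Respond when the model predicts end-of-turn, or when silence exceeds a fallback timeout."""
    return predicted_complete or silence_ms >= max_wait_ms


# Model says "complete": respond immediately
print(should_respond(predicted_complete=True, silence_ms=200))

# Model says "incomplete" and the pause is short: keep waiting
print(should_respond(predicted_complete=False, silence_ms=400))

# Model says "incomplete" but the user has been silent for a long time: respond anyway
print(should_respond(predicted_complete=False, silence_ms=2000))
```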

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load model
base_model = "HuggingFaceTB/SmolLM2-135M"
adapter_model = "Vurtnec/eot-detector-smollm2"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)
model = PeftModel.from_pretrained(model, adapter_model)

# Format input
def format_conversation(messages):
    text = ""
    for msg in messages:
        text += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
    text += "<|im_start|>label\n"
    return text

# Example
messages = [
    {"role": "user", "content": "Hi, I need help"},
    {"role": "assistant", "content": "Sure, what do you need?"},
    {"role": "user", "content": "Well, um..."}
]

input_text = format_conversation(messages)
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
result = tokenizer.decode(outputs[0])

# The text generated after the label prompt should contain
# <|eot|> (turn complete) or <|continue|> (turn incomplete)
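
The final check on the decoded text could be written as a small helper. This is a sketch that assumes the decoded string still contains the `<|im_start|>label` prompt followed by one of the two marker strings. `parse_label` and the "unknown" fallback are illustrative additions, not part of the released model.

```python
LABEL_PREFIX = "<|im_start|>label\n"
EOT_MARKER = "<|eot|>"
CONTINUE_MARKER = "<|continue|>"


def parse_label(decoded: str) -> str:
    """Return "complete", "incomplete", or "unknown" from decoded model output."""
    # Only inspect the text after the last label prompt, if one is present
    tail = decoded.rsplit(LABEL_PREFIX, 1)[-1]
    eot_pos = tail.find(EOT_MARKER)
    cont_pos = tail.find(CONTINUE_MARKER)
    if eot_pos == -1 and cont_pos == -1:
        return "unknown"
    # If both markers appear, the one generated first wins
    if cont_pos == -1 or (eot_pos != -1 and eot_pos < cont_pos):
        return "complete"
    return "incomplete"


example = "<|im_start|>user\nWell, um...<|im_end|>\n<|im_start|>label\n<|continue|>"
print(parse_label(example))
```

With the usage code above, `parse_label(result)` would replace a manual substring check, and the "unknown" case can be treated as "incomplete" to stay on the conservative side.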

Datasets

Training: Vurtnec/eot-detection-dataset (50 samples)
Testing: Vurtnec/eot-detection-testset (30 samples)

Limitations

Trained on limited English data (50 samples)
May not generalize well to domain-specific conversations
Conservative prediction style (prefers "incomplete" when uncertain)

License

Apache 2.0
