Russian Turn Detection Model (Qwen/Qwen3-0.6B Fine-Tuned)

This model is a specialized End-of-Utterance (EOU) / turn-detection model for Russian spoken dialogue systems. It is fine-tuned from Qwen/Qwen3-0.6B to classify whether the user has finished speaking or is pausing mid-sentence.

It is optimized for real-time voice agents (like those using LiveKit) to minimize interruptions and reduce latency in conversational flows.

🎯 Model Capabilities

  • Task: Classifies text input as either COMPLETE (user finished) or CONTINUE (user is thinking/pausing).
  • Language: Russian (primary), handles mixed English/Russian technical terms.
  • Latency: Extremely fast inference (0.6B-parameter model), suitable for edge or cloud deployment.
  • Nuance: Correctly handles hesitation markers (e.g., "Π½Ρƒ...", "эээ...", "ΠΊΠ°ΠΊ Π±Ρ‹") as CONTINUE.
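The hesitation handling above can be illustrated with a minimal heuristic. This is purely an illustration of the cues involved (trailing ellipsis, filler words) and is not how the model works internally, which learns these patterns from data; the function name and marker list are assumptions.

```python
# Illustrative heuristic only -- the fine-tuned model learns these cues from data.
FILLERS = {"Π½Ρƒ", "эээ", "ΠΌΠΌΠΌ"}  # multi-word fillers like "ΠΊΠ°ΠΊ Π±Ρ‹" would need extra handling

def looks_unfinished(text: str) -> bool:
    """Rough check: trailing ellipsis or a trailing filler word suggests CONTINUE."""
    t = text.strip().lower()
    if t.endswith(("...", "…")):          # speaker trailing off
        return True
    words = t.rstrip(".!?").split()
    return bool(words) and words[-1] in FILLERS

looks_unfinished("Ну я Π΄ΡƒΠΌΠ°ΡŽ Ρ‡Ρ‚ΠΎ ΠΌΠΎΠΆΠ΅Ρ‚ Π±Ρ‹Ρ‚ΡŒ...")   # True: trailing ellipsis
looks_unfinished("ΠŸΡ€ΠΈΠ²Π΅Ρ‚, я Ρ…ΠΎΡ‡Ρƒ Π·Π°ΠΊΠ°Π·Π°Ρ‚ΡŒ ΠΏΠΈΡ†Ρ†Ρƒ")  # False: complete sentence
```

A rule like this misses context-dependent pauses (e.g. a complete-looking clause that is actually mid-thought), which is exactly the gap the fine-tuned classifier covers.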

πŸ’» Usage

Inference with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "RAS1981/qwen3-turn-detector-merged"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def predict_turn(text):
    # System prompt (Russian): "You are a voice assistant.
    # Determine whether the user has finished speaking."
    messages = [
        {"role": "system", "content": "Π’Ρ‹ голосовой ассистСнт. ΠžΠΏΡ€Π΅Π΄Π΅Π»ΡΠΉ, Π·Π°ΠΊΠΎΠ½Ρ‡ΠΈΠ» Π»ΠΈ ΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚Π΅Π»ΡŒ Π³ΠΎΠ²ΠΎΡ€ΠΈΡ‚ΡŒ."},
        {"role": "user", "content": text}
    ]
    
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(device)

    with torch.no_grad():  # inference only, no gradients needed
        outputs = model.generate(
            inputs,
            max_new_tokens=2,  # label is a single short token
            use_cache=True,
            pad_token_id=tokenizer.eos_token_id
        )
    
    decoded = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
    return decoded.strip()

# Test cases (English translations in comments)
print(predict_turn("ΠŸΡ€ΠΈΠ²Π΅Ρ‚, я Ρ…ΠΎΡ‡Ρƒ Π·Π°ΠΊΠ°Π·Π°Ρ‚ΡŒ ΠΏΠΈΡ†Ρ†Ρƒ"))  # "Hi, I want to order a pizza" -> COMPLETE
print(predict_turn("Ну я Π΄ΡƒΠΌΠ°ΡŽ Ρ‡Ρ‚ΠΎ ΠΌΠΎΠΆΠ΅Ρ‚ Π±Ρ‹Ρ‚ΡŒ..."))   # "Well, I think that maybe..." -> CONTINUE

πŸ“Š Training Details

Dataset

  • Source: Custom dataset generated via Gemini 2.5 Flash Lite based on IlyaGusev/ru_turbo_alpaca and ss-corpus-ru.
  • Preprocessing:
    • Converted formal text to spoken Russian (added hesitations, fillers, self-corrections).
    • Normalized text with Unicode NFKC; standardized casing and specific punctuation.
    • Balanced 50/50 split between COMPLETE and CONTINUE labels to prevent bias.
  • Size: ~400 high-quality curated examples (incremental training).

Hyperparameters

  • Framework: Unsloth + TRL (SFTTrainer)
  • Quantization: 4-bit (QLoRA)
  • Learning Rate: 2e-4
  • Epochs: 62 (Early stopping based on loss convergence)
  • Final Loss: ~0.086
  • Optimizer: AdamW 8-bit
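Under the listed hyperparameters, a TRL training configuration might look roughly like the fragment below. This is a hedged config sketch, not the actual training script: the dataset path, batch size, and LoRA settings are assumptions; only the learning rate, epoch count, quantization, and optimizer come from the table above.

```python
# Config fragment (sketch) -- batch size, LoRA rank, and dataset are assumptions.
from trl import SFTConfig

config = SFTConfig(
    learning_rate=2e-4,        # from the table above
    num_train_epochs=62,       # from the table above (with early stopping)
    optim="adamw_8bit",        # AdamW 8-bit
    per_device_train_batch_size=4,  # assumption
    output_dir="outputs",
)
# The model would be loaded 4-bit (QLoRA) via Unsloth before being
# passed to SFTTrainer together with this config and the dataset.
```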

⚠️ Limitations

  • Context: The model sees only the text of the current utterance. Very long audio pauses still require VAD (Voice Activity Detection) support.
  • Domain: Fine-tuned on general conversation and real-estate inquiries; may need adaptation for highly specific medical or legal jargon.

πŸ›  Intended Use

  • LiveKit Agents: Use as a semantic turn detector in the EOU plugin.
  • Customer Support Bots: Prevent the bot from interrupting users while they think.
  • Voice Assistants: Improve natural flow in Russian dialogue.