Russian Turn Detection Model (Qwen/Qwen3-0.6B Fine-Tuned)

This model is a specialized End-of-Utterance (EOU) / turn-detection model for Russian spoken dialogue systems. It is fine-tuned from Qwen/Qwen3-0.6B to classify whether the user has finished speaking or is pausing mid-sentence.

It is optimized for real-time voice agents (like those using LiveKit) to minimize interruptions and reduce latency in conversational flows.

🎯 Model Capabilities

  • Task: Classifies text input as either COMPLETE (user finished) or CONTINUE (user is thinking/pausing).
  • Language: Russian (primary), handles mixed English/Russian technical terms.
  • Latency: Extremely fast inference (0.6B-parameter model), suitable for edge or cloud deployment.
  • Nuance: Correctly handles hesitation markers (e.g., "Π½Ρƒ...", "эээ...", "ΠΊΠ°ΠΊ Π±Ρ‹") as CONTINUE.
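The hesitation handling above can be illustrated with a minimal heuristic. This is purely an illustration of the cues involved (trailing ellipsis, filler words) and is not how the model works internally, which learns these patterns from data; the function name and marker list are assumptions.

```python
# Illustrative heuristic only -- the fine-tuned model learns these cues from data.
FILLERS = {"Π½Ρƒ", "эээ", "ΠΌΠΌΠΌ"}  # multi-word fillers like "ΠΊΠ°ΠΊ Π±Ρ‹" would need extra handling

def looks_unfinished(text: str) -> bool:
    """Rough check: trailing ellipsis or a trailing filler word suggests CONTINUE."""
    t = text.strip().lower()
    if t.endswith(("...", "…")):          # speaker trailing off
        return True
    words = t.rstrip(".!?").split()
    return bool(words) and words[-1] in FILLERS

looks_unfinished("Ну я Π΄ΡƒΠΌΠ°ΡŽ Ρ‡Ρ‚ΠΎ ΠΌΠΎΠΆΠ΅Ρ‚ Π±Ρ‹Ρ‚ΡŒ...")   # True: trailing ellipsis
looks_unfinished("ΠŸΡ€ΠΈΠ²Π΅Ρ‚, я Ρ…ΠΎΡ‡Ρƒ Π·Π°ΠΊΠ°Π·Π°Ρ‚ΡŒ ΠΏΠΈΡ†Ρ†Ρƒ")  # False: complete sentence
```

A rule like this misses context-dependent pauses (e.g. a complete-looking clause that is actually mid-thought), which is exactly the gap the fine-tuned classifier covers.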

πŸ’» Usage

Inference with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "RAS1981/qwen3-turn-detector-merged"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def predict_turn(text):
    # System prompt (Russian): "You are a voice assistant.
    # Determine whether the user has finished speaking."
    messages = [
        {"role": "system", "content": "Π’Ρ‹ голосовой ассистСнт. ΠžΠΏΡ€Π΅Π΄Π΅Π»ΡΠΉ, Π·Π°ΠΊΠΎΠ½Ρ‡ΠΈΠ» Π»ΠΈ ΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚Π΅Π»ΡŒ Π³ΠΎΠ²ΠΎΡ€ΠΈΡ‚ΡŒ."},
        {"role": "user", "content": text}
    ]
    
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(device)

    with torch.no_grad():  # inference only, no gradients needed
        outputs = model.generate(
            inputs,
            max_new_tokens=2,  # label is a single short token
            use_cache=True,
            pad_token_id=tokenizer.eos_token_id
        )
    
    decoded = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
    return decoded.strip()

# Test cases (English translations in comments)
print(predict_turn("ΠŸΡ€ΠΈΠ²Π΅Ρ‚, я Ρ…ΠΎΡ‡Ρƒ Π·Π°ΠΊΠ°Π·Π°Ρ‚ΡŒ ΠΏΠΈΡ†Ρ†Ρƒ"))  # "Hi, I want to order a pizza" -> COMPLETE
print(predict_turn("Ну я Π΄ΡƒΠΌΠ°ΡŽ Ρ‡Ρ‚ΠΎ ΠΌΠΎΠΆΠ΅Ρ‚ Π±Ρ‹Ρ‚ΡŒ..."))   # "Well, I think that maybe..." -> CONTINUE

πŸ“Š Training Details

Dataset

  • Source: Custom dataset generated via Gemini 2.5 Flash Lite based on IlyaGusev/ru_turbo_alpaca and ss-corpus-ru.
  • Preprocessing:
    • Converted formal text to spoken Russian (added hesitations, fillers, self-corrections).
    • Normalized text with Unicode NFKC; standardized casing and specific punctuation.
    • Balanced 50/50 split between COMPLETE and CONTINUE labels to prevent bias.
  • Size: ~400 high-quality curated examples (incremental training).

Hyperparameters

  • Framework: Unsloth + TRL (SFTTrainer)
  • Quantization: 4-bit (QLoRA)
  • Learning Rate: 2e-4
  • Epochs: 62 (Early stopping based on loss convergence)
  • Final Loss: ~0.086
  • Optimizer: AdamW 8-bit
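Under the listed hyperparameters, a TRL training configuration might look roughly like the fragment below. This is a hedged config sketch, not the actual training script: the dataset path, batch size, and LoRA settings are assumptions; only the learning rate, epoch count, quantization, and optimizer come from the table above.

```python
# Config fragment (sketch) -- batch size, LoRA rank, and dataset are assumptions.
from trl import SFTConfig

config = SFTConfig(
    learning_rate=2e-4,        # from the table above
    num_train_epochs=62,       # from the table above (with early stopping)
    optim="adamw_8bit",        # AdamW 8-bit
    per_device_train_batch_size=4,  # assumption
    output_dir="outputs",
)
# The model would be loaded 4-bit (QLoRA) via Unsloth before being
# passed to SFTTrainer together with this config and the dataset.
```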

⚠️ Limitations

  • Context: The model sees only the text of the current utterance. Very long audio pauses still require VAD (Voice Activity Detection) support.
  • Domain: Fine-tuned on general conversation and real-estate inquiries; may need adaptation for highly specific medical or legal jargon.

πŸ›  Intended Use

  • LiveKit Agents: Use as a semantic turn detector in the EOU plugin.
  • Customer Support Bots: Prevent the bot from interrupting users while they think.
  • Voice Assistants: Improve natural flow in Russian dialogue.