SmolDuplex: Full-Duplex Interaction Model (135M)

A sub-200M-parameter full-duplex spoken interaction model that processes audio in 200ms chunks and targets turn-taking latency under 400ms, built on SmolLM2-135M and CosyVoice. See ARCHITECTURE.md for the complete PRD.

Key Specs

  • Trainable params: ~139M
  • Turn-taking latency: <400ms
  • Chunk size: 200ms (13 tokens per chunk)
  • Capabilities: ASR, TTS, simultaneous listen+speak, backchanneling, interruption handling
  • Training cost: ~$44 cloud / ~$2 own GPU
  • Training time: ~38 GPU-hours
  • Inference: Any 8GB+ GPU, real-time
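The chunk and cost figures above imply a few derived numbers that are useful when sizing buffers or budgets. The following is a minimal sketch of that arithmetic, not part of the model's code: the constants are copied from the list, the function names are hypothetical, and because the cost and GPU-hour figures are approximate, the derived rate is approximate too. The list does not say which token stream the 13 tokens refer to, so the sketch treats them as generic tokens.

```python
# Constants copied from the Key Specs list (cost and hours are approximate there).
CHUNK_MS = 200            # chunk duration in milliseconds
TOKENS_PER_CHUNK = 13     # tokens per chunk
CLOUD_COST_USD = 44.0     # approximate cloud training cost
GPU_HOURS = 38.0          # approximate training time in GPU-hours


def tokens_per_second(chunk_ms=CHUNK_MS, tokens_per_chunk=TOKENS_PER_CHUNK):
    """Token rate implied by the chunk size."""
    return tokens_per_chunk * 1000 / chunk_ms


def cost_per_gpu_hour(total_usd=CLOUD_COST_USD, gpu_hours=GPU_HOURS):
    """Hourly cloud rate implied by the total cost and GPU-hours."""
    return total_usd / gpu_hours


def split_into_chunks(tokens, size=TOKENS_PER_CHUNK):
    """Group a flat token sequence into chunk-sized lists (hypothetical helper).

    The final chunk may be shorter than `size`.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


if __name__ == "__main__":
    print(f"tokens per second: {tokens_per_second():.0f}")
    print(f"implied cloud rate: ${cost_per_gpu_hour():.2f} per GPU-hour")
    print([len(c) for c in split_into_chunks(list(range(30)))])
```

At 13 tokens per 200ms chunk, the model handles 65 tokens per second of audio, and the cloud figure works out to roughly $1.16 per GPU-hour.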

Based On

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Usage

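from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repository ID for this model
model_id = "PranavHarshan/SmolDuplex"
# Download (or load from the local cache) the tokenizer and the weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)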

For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.
