SmolDuplex: Full-Duplex Interaction Model (135M)

A sub-200M-parameter full-duplex spoken interaction model built on SmolLM2-135M + CosyVoice. It processes audio in 200ms time-synchronized chunks and targets a turn-taking latency under 400ms. See ARCHITECTURE.md for the complete PRD.

Key Specs

Trainable params: ~139M
Turn-taking latency: <400ms
Chunk size: 200ms (13 tokens per chunk)
Capabilities: ASR, TTS, simultaneous listen+speak, backchanneling, interruption handling
Training cost: ~$44 cloud / ~$2 own GPU
Training time: ~38 GPU-hours
Inference: real-time on any GPU with 8GB+ of memory
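The chunk specs above imply some simple budgets: how fast the token sequence grows, how much conversation fits in a given context window, and how many chunks a latency target spans. The sketch below only does that arithmetic from the stated 200ms / 13-token figures; the context size used in the example is a hypothetical value, not SmolDuplex's actual window.

```python
# Back-of-the-envelope budgets derived from the Key Specs.
# CHUNK_MS and TOKENS_PER_CHUNK come from the card; any context size
# passed to seconds_that_fit() is a caller-supplied assumption.
CHUNK_MS = 200          # chunk size: 200 ms
TOKENS_PER_CHUNK = 13   # tokens per chunk


def tokens_per_second() -> float:
    """Rate at which the sequence grows while the dialogue clock runs."""
    return TOKENS_PER_CHUNK * 1000 / CHUNK_MS


def tokens_for(duration_ms: int) -> int:
    """Tokens used by `duration_ms` of conversation.

    A partial chunk still occupies a full chunk, so round up (integer ceil).
    """
    chunks = (duration_ms + CHUNK_MS - 1) // CHUNK_MS
    return chunks * TOKENS_PER_CHUNK


def seconds_that_fit(context_tokens: int) -> float:
    """Longest conversation (seconds) whose whole chunks fit in the context."""
    chunks = context_tokens // TOKENS_PER_CHUNK
    return chunks * CHUNK_MS / 1000


def chunks_within(latency_ms: int) -> int:
    """Number of whole chunks that elapse within a latency budget."""
    return latency_ms // CHUNK_MS


if __name__ == "__main__":
    print(tokens_per_second())       # 65.0 tokens/s
    print(tokens_for(10_000))        # 10 s of dialogue -> 650 tokens
    print(seconds_that_fit(2048))    # hypothetical 2048-token window -> 31.4 s
    print(chunks_within(400))        # the <400ms target spans 2 chunks
```

Working in integer milliseconds avoids floating-point surprises such as `0.6 / 0.2` evaluating to slightly less than 3.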

Based On

OmniFlatten: full-duplex conversation demonstrated at ~500M parameters
SyncLLM: time-synchronized chunks
SmolLM2-135M: LLM backbone
CosyVoice: speech tokenizer/detokenizer
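OmniFlatten and SyncLLM both feed a single decoder-only LLM with a flattened sequence in which the user's and the model's streams are interleaved chunk by chunk, so that position in the sequence tracks wall-clock time. The sketch below is a simplified illustration of that idea, not SmolDuplex's actual token layout: the card does not say how the 13 tokens per chunk are split between streams, so `user_per_chunk`, `model_per_chunk`, and `pad_id` are hypothetical parameters.

```python
from typing import List, Sequence


def chunk(tokens: Sequence[int], size: int) -> List[List[int]]:
    """Split a token stream into consecutive chunks of `size` tokens."""
    return [list(tokens[i:i + size]) for i in range(0, len(tokens), size)]


def flatten_duplex(
    user: Sequence[int],
    model: Sequence[int],
    user_per_chunk: int,   # hypothetical: user-stream tokens per time step
    model_per_chunk: int,  # hypothetical: model-stream tokens per time step
    pad_id: int,           # hypothetical: filler for silence or missing tokens
) -> List[int]:
    """Interleave time-aligned chunks of two streams into one sequence.

    Time step t contributes user chunk t followed by model chunk t, each padded
    to a fixed width, so every time step occupies the same number of positions.
    """
    user_chunks = chunk(user, user_per_chunk)
    model_chunks = chunk(model, model_per_chunk)
    steps = max(len(user_chunks), len(model_chunks))
    flat: List[int] = []
    for t in range(steps):
        u = user_chunks[t] if t < len(user_chunks) else []
        m = model_chunks[t] if t < len(model_chunks) else []
        flat += u + [pad_id] * (user_per_chunk - len(u))
        flat += m + [pad_id] * (model_per_chunk - len(m))
    return flat


if __name__ == "__main__":
    print(flatten_duplex([1, 2, 3, 4, 5], [10, 11, 12], 2, 2, pad_id=0))
    # [1, 2, 10, 11, 3, 4, 12, 0, 5, 0, 0, 0]
```

Padding every time step to a fixed width is what keeps the two streams synchronized: the model can infer elapsed time from sequence position alone, and both parties can "speak" in the same step.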

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Try ML Intern: https://smolagents-ml-intern.hf.space
Source code: https://github.com/huggingface/ml-intern

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PranavHarshan/SmolDuplex"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.
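The loading snippet above does not show how audio is streamed through the model. The sketch below illustrates only the real-time constraint implied by the 200ms chunk size: live audio is cut into 200ms frames, one processing step runs per frame, and the system keeps up only if every step finishes before the next frame arrives. The 16 kHz sample rate and the `step` callable are hypothetical stand-ins; SmolDuplex's actual per-chunk inference API is not documented here.

```python
import time
from typing import Callable, Iterator, List, Sequence, Tuple

SAMPLE_RATE = 16_000   # assumption: 16 kHz mono input
CHUNK_MS = 200         # chunk size from Key Specs
SAMPLES_PER_CHUNK = SAMPLE_RATE * CHUNK_MS // 1000   # 3200 samples per frame


def frames(audio: Sequence[float]) -> Iterator[List[float]]:
    """Yield consecutive 200 ms frames, zero-padding the final partial frame."""
    for start in range(0, len(audio), SAMPLES_PER_CHUNK):
        frame = list(audio[start:start + SAMPLES_PER_CHUNK])
        frame += [0.0] * (SAMPLES_PER_CHUNK - len(frame))
        yield frame


def run_duplex(
    audio: Sequence[float],
    step: Callable[[List[float]], List[int]],  # hypothetical per-chunk model call
) -> Tuple[List[List[int]], float]:
    """Call `step` once per frame; return its outputs and the slowest step in ms."""
    outputs: List[List[int]] = []
    worst_ms = 0.0
    for frame in frames(audio):
        t0 = time.perf_counter()
        outputs.append(step(frame))
        worst_ms = max(worst_ms, (time.perf_counter() - t0) * 1000)
    return outputs, worst_ms


def is_real_time(worst_ms: float) -> bool:
    """Each step must finish before the next 200 ms frame arrives."""
    return worst_ms < CHUNK_MS


if __name__ == "__main__":
    one_second = [0.0] * SAMPLE_RATE
    outs, worst = run_duplex(one_second, lambda f: [len(f)])  # stub step
    print(len(outs), is_real_time(worst))  # 5 frames; the stub is fast
```

In practice the per-step budget is tighter than 200ms, since audio capture, detokenization, and playback also have to fit within each chunk.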

Papers for PranavHarshan/SmolDuplex

OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation. arXiv:2410.17799, published Oct 23, 2024.

Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents. arXiv:2409.15594, published Sep 23, 2024.