My First Blog

Community Article · Published January 27, 2026

NVIDIA just dropped a game-changer for real-time speech interaction: PersonaPlex-7B-v1, the first open-source full-duplex speech-to-speech dialogue model!

Built on the Moshi architecture with the Helium LLM as its backbone, it replaces the clunky ASR → LLM → TTS pipeline with a single Transformer. The dual-stream design lets the model listen while it speaks, enabling seamless turn-taking (0.908 takeover rate, 0.17 s latency) and smooth handling of user interruptions (0.950 rate, 0.24 s latency) that feels human-like.
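To make the dual-stream idea concrete, here is a toy simulation (not the real PersonaPlex API): a full-duplex model consumes one user-audio frame and emits one agent-audio frame per step, so it can take over at a pause or yield to a barge-in mid-utterance instead of waiting for end-of-turn. The 80 ms frame size (Mimi runs at roughly 12.5 frames/s) and the silence/speech labels are assumptions for the demo.

```python
FRAME_SEC = 0.08  # assumed Mimi frame duration: ~12.5 Hz -> 80 ms per frame


def duplex_steps_to_react(latency_sec: float) -> int:
    """How many frames a given reaction latency corresponds to."""
    return max(1, round(latency_sec / FRAME_SEC))


def run_duplex(user_frames, interrupt_at=None):
    """Interleave listening and speaking: one input frame in, one output frame out per step."""
    log = []
    speaking = False
    for t, frame in enumerate(user_frames):
        # The model "hears" this frame even while it is speaking.
        if frame == "silence" and not speaking:
            speaking = True   # takeover: user paused, agent starts talking
        if frame == "speech" and speaking and interrupt_at is not None and t >= interrupt_at:
            speaking = False  # barge-in: user interrupts, agent yields
        log.append("agent" if speaking else "listen")
    return log


# User talks for 3 frames, pauses for 3, then barges back in at frame 6.
timeline = run_duplex(["speech"] * 3 + ["silence"] * 3 + ["speech"] * 3, interrupt_at=6)
print(timeline)
# The reported 0.17 s takeover latency is about 2 frames at this frame rate:
print(duplex_steps_to_react(0.17))
```

A pipeline system would instead wait for a full end-of-turn before ASR even hands text to the LLM; the per-frame loop above is what makes sub-second reactions possible.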

Key wins for devs:

✅ Zero-shot voice cloning via voice prompts (16 preset voices + custom audio)

✅ Role customization with text prompts (define identity/business rules in 200 tokens)

✅ 3x faster inference with FlashAttention/FlashInfer (24kHz sampling, Mimi codec)

✅ Runs on RTX 3090/4090 and A100/H100 GPUs (≥3×80GB VRAM recommended)

✅ Commercial-friendly (NVIDIA Open Model License + MIT code license)
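The role-customization bullet above implies a budget: the text prompt defining the agent's identity and business rules should fit in about 200 tokens. Here is a minimal sketch of a budget check; the agent name, company, and rules are made-up examples, and the whitespace split is a crude stand-in for the model's real tokenizer, which would count differently.

```python
ROLE_TOKEN_BUDGET = 200  # limit cited in the post


def fits_budget(role_prompt: str, budget: int = ROLE_TOKEN_BUDGET) -> bool:
    """Rough check that a role prompt fits the token budget.

    Uses a whitespace split as an approximation; a real check should use
    the model's own tokenizer.
    """
    approx_tokens = len(role_prompt.split())
    return approx_tokens <= budget


# Hypothetical role prompt for illustration:
role = (
    "You are Ava, a support agent for Acme Telecom. "
    "Only discuss billing and outages; escalate refund requests to a human."
)
print(fits_budget(role))  # a short, focused prompt fits easily
```

Keeping the role prompt this terse is the point: identity and business rules in a couple of sentences, not a full system-prompt essay.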

Tops FullDuplexBench and ServiceDuplexBench, leading on pausing, turn-adherence, and task-completion metrics. Perfect for customer support, game NPCs, and virtual assistants (English-only for now).

👉 Repo: https://huggingface.co/nvidia/personaplex-7b-v1

📄 Paper: https://research.nvidia.com/labs/adlr/files/personaplex/personaplex_preprint.pdf
