PersonaPlex-7B MLX 4-bit

PersonaPlex 7B full-duplex speech-to-speech model converted to MLX safetensors with 4-bit quantization for Apple Silicon.

Converted from nvidia/personaplex-7b-v1 (based on the Kyutai Moshi architecture).

Swift inference: ivan-digital/qwen3-asr-swift

Model Details

| Component | Architecture | Size |
|---|---|---|
| Temporal Transformer | 32-layer, 4096d, 32 heads (7B params) | ~3.5 GB (4-bit) |
| Depformer | 6-layer, 1024d, 16 heads, per-codebook weights | ~50 MB (fp16) |
| Mimi Codec | SEANet encoder/decoder + 8-layer transformer + 16 RVQ codebooks | ~370 MB (fp16) |
| Embeddings | Text + 16 audio embeddings + output heads | ~940 MB (fp16) |
| **Total** | | **~4.9 GB** |
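
The total in the table can be sanity-checked with simple arithmetic over the approximate per-component sizes quoted above:

```python
# Approximate on-disk sizes from the table above, in GB.
sizes_gb = {
    "temporal (4-bit)": 3.5,
    "depformer (fp16)": 0.05,
    "mimi (fp16)": 0.37,
    "embeddings (fp16)": 0.94,
}

total_gb = sum(sizes_gb.values())
print(f"total ~= {total_gb:.2f} GB")  # close to the ~4.9 GB quoted above
```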

Architecture

[User Audio 24kHz] → [Mimi Encoder] → 16 codebook tokens @ 12.5Hz
                                              ↓
              [Temporal Transformer: 32L, dim=4096, 7B params]
                  17 streams: text + 8 user audio + 8 agent audio
                                              ↓
              [Depformer: 6L, dim=1024, per-codebook weights]
                  16 sequential steps → agent audio codebook tokens
                                              ↓
[Agent Audio 24kHz] ← [Mimi Decoder] ← codebook tokens @ 12.5Hz
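
To make the rates in the diagram concrete, here is a back-of-envelope sketch (plain arithmetic from the figures above, no model code):

```python
SAMPLE_RATE = 24_000   # Hz, Mimi input/output audio
FRAME_RATE = 12.5      # Hz, Mimi token frames
NUM_CODEBOOKS = 16     # RVQ codebooks per frame

# Each token frame covers 1920 raw audio samples.
samples_per_frame = int(SAMPLE_RATE / FRAME_RATE)

# 16 codebooks at 12.5 Hz -> 200 audio tokens per second per direction.
audio_tokens_per_sec = int(FRAME_RATE * NUM_CODEBOOKS)

# The temporal transformer advances one step per frame over 17 streams:
# 1 text + 8 user audio + 8 agent audio.
streams = 1 + 8 + 8

print(samples_per_frame, audio_tokens_per_sec, streams)
```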

Voices

18 voice presets available:

| Category | Voices |
|---|---|
| Natural Female | NATF0, NATF1, NATF2, NATF3 |
| Natural Male | NATM0, NATM1, NATM2, NATM3 |
| Variety Female | VARF0, VARF1, VARF2, VARF3, VARF4 |
| Variety Male | VARM0, VARM1, VARM2, VARM3, VARM4 |
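
The preset IDs follow a simple category-plus-index scheme, so the full list can be generated rather than typed out (a small sketch of the naming convention above):

```python
# Preset IDs are <category><gender><index>: NAT = natural, VAR = variety,
# F = female, M = male; natural voices have 4 indices, variety voices 5.
presets = (
    [f"NATF{i}" for i in range(4)]
    + [f"NATM{i}" for i in range(4)]
    + [f"VARF{i}" for i in range(5)]
    + [f"VARM{i}" for i in range(5)]
)
print(len(presets))  # 18 presets
```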

Files

  • temporal.safetensors — Temporal transformer (4-bit quantized, group_size=64)
  • depformer.safetensors — Depformer layers + input projections (fp16)
  • embeddings.safetensors — Text/audio embeddings + output heads (fp16)
  • mimi.safetensors — Mimi neural audio codec (fp16)
  • voices/*.safetensors — Voice preset embeddings
  • tokenizer_spm_32k_3.model — SentencePiece tokenizer
  • config.json — Model configuration
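
A minimal pre-flight check that a downloaded snapshot contains the expected files can be sketched as follows (the directory path and helper name are illustrative, not part of the package):

```python
from pathlib import Path

# Fixed-name files from the list above (voice presets under voices/ vary).
EXPECTED = [
    "temporal.safetensors",
    "depformer.safetensors",
    "embeddings.safetensors",
    "mimi.safetensors",
    "tokenizer_spm_32k_3.model",
    "config.json",
]

def missing_files(snapshot_dir: str) -> list[str]:
    """Return the expected files that are absent from snapshot_dir."""
    root = Path(snapshot_dir)
    return [name for name in EXPECTED if not (root / name).exists()]
```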

Quantization

  • Temporal transformer attention (Q/K/V output projections) and FFN: 4-bit with group_size=64
  • Attention input projection (in_proj): kept fp16 (packed Q+K+V format)
  • Depformer: kept fp16 (~50 MB, not worth quantizing)
  • Mimi codec: kept fp16 (audio quality sensitive)
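
The storage cost of the 4-bit scheme follows from the group size: MLX-style affine quantization stores, for each group of 64 weights, the 4-bit values plus a per-group scale and bias (assumed fp16 here), which adds half a bit of overhead per weight:

```python
BITS = 4
GROUP_SIZE = 64
SCALE_BITS = 16  # fp16 scale per group (assumption)
BIAS_BITS = 16   # fp16 bias per group (assumption)

# 4 bits per value + (16 + 16) bits shared across 64 weights.
effective_bits = BITS + (SCALE_BITS + BIAS_BITS) / GROUP_SIZE  # 4.5 bits/weight

compression = 16 / effective_bits  # vs. fp16 storage
print(effective_bits, round(compression, 2))  # 4.5 3.56
```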

Usage

import PersonaPlex

let model = try await PersonaPlexModel.fromPretrained()
let response = model.respond(
    userAudio: audioSamples,  // [Float] 24kHz mono
    voice: .NATM0,
    maxSteps: 500
)

CLI

swift run personaplex-cli --input question.wav --output response.wav --voice NATM0

See ivan-digital/qwen3-asr-swift for build instructions.

License

CC-BY-NC-4.0 (same as upstream PersonaPlex)

Citation

@article{nguyen2025personaplex,
  title={PersonaPlex: Enhancing Human-Centric AI Through Full-Duplex Multi-Turn Conversations With Persona-Conditioned Voice Responses},
  author={Nguyen, Tu Anh and others},
  journal={arXiv preprint arXiv:2504.07966},
  year={2025}
}