# PersonaPlex-7B MLX 8-bit

PersonaPlex-7B is a full-duplex speech-to-speech model, converted here to MLX safetensors with 8-bit quantization for Apple Silicon.

Converted from nvidia/personaplex-7b-v1 (based on the Kyutai Moshi architecture).

Swift inference: ivan-digital/qwen3-asr-swift

## Model Details

| Component | Architecture | Size |
|---|---|---|
| Temporal Transformer | 32-layer, 4096d, 32 heads (7B params) | ~6.5 GB (8-bit) |
| Depformer | 6-layer, 1024d, 16 heads, per-codebook weights | ~1.3 GB (8-bit) |
| Mimi Codec | SEANet encoder/decoder + 8-layer transformer + 16 RVQ codebooks | ~370 MB (fp16) |
| Embeddings | Text + 16 audio embeddings + output heads | ~940 MB (fp16) |
| **Total** | | **~9.1 GB** |
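The quantized sizes follow directly from parameter count times bits per weight. A minimal sketch of that arithmetic, assuming the ~7B figure from the model name (per-component parameter counts are not published here):

```swift
import Foundation

// Size in GiB for a weight tensor of `params` parameters stored at
// `bitsPerWeight` bits each (ignores quantization scales/biases overhead).
func gibibytes(params: Double, bitsPerWeight: Double) -> Double {
    params * bitsPerWeight / 8.0 / Double(1 << 30)
}

let temporal8bit = gibibytes(params: 7e9, bitsPerWeight: 8)  // ~6.5 GiB, matching the table
let temporal4bit = gibibytes(params: 7e9, bitsPerWeight: 4)  // ~3.3 GiB, the 4-bit variant's transformer
```

The fp16 components (Mimi codec, embeddings) are the same size in both variants, which is why the 4-bit total (~4.9 GB) is more than half the 8-bit total (~9.1 GB).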

## Usage

```swift
let model = try await PersonaPlexModel.fromPretrained(
    modelId: "aufklarer/PersonaPlex-7B-MLX-8bit"
)
let response = model.respond(audio: samples, voice: .NATF0, steps: 100)
```

Or via the CLI:

```shell
audio personaplex input.wav --model aufklarer/PersonaPlex-7B-MLX-8bit -o output.wav
```
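The `respond(audio:voice:steps:)` call above takes raw `Float` samples. A minimal sketch producing a one-second test tone to feed it; 24 kHz is an assumption based on the Mimi codec's native sample rate:

```swift
import Foundation

// One second of a 440 Hz sine at low amplitude, as mono Float samples.
// 24_000 Hz is assumed here (Mimi's native rate); resample real input
// to the model's expected rate before calling respond(...).
let sampleRate = 24_000
let samples: [Float] = (0..<sampleRate).map { n in
    Float(0.1 * sin(2.0 * Double.pi * 440.0 * Double(n) / Double(sampleRate)))
}
```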

## Variants

| Variant | Quantization | Size | Model ID |
|---|---|---|---|
| 4-bit | 4-bit | ~4.9 GB | `aufklarer/PersonaPlex-7B-MLX-4bit` |
| 8-bit | 8-bit | ~9.1 GB | `aufklarer/PersonaPlex-7B-MLX-8bit` |
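A hypothetical helper (not part of the package) for picking a variant by available unified memory, using the on-disk sizes above plus headroom for the KV cache and the OS:

```swift
// Returns the model ID of the variant that fits comfortably in
// `freeMemoryGB` of unified memory. The 12 GB threshold is an
// assumption: ~9.1 GB of weights plus runtime headroom.
func recommendedVariant(freeMemoryGB: Double) -> String {
    freeMemoryGB >= 12
        ? "aufklarer/PersonaPlex-7B-MLX-8bit"
        : "aufklarer/PersonaPlex-7B-MLX-4bit"
}
```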

## Voices

18 voice presets are available: NATF0-3 and NATM0-3 (natural female/male), plus VARF0-4 and VARM0-4.
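The preset names follow a regular scheme, so the full list can be enumerated from it. A sketch (the grouping into prefixes is taken from the line above; the names themselves are not otherwise documented here):

```swift
// Expand the prefix/count scheme into the 18 concrete preset names:
// NATF0...NATF3, NATM0...NATM3, VARF0...VARF4, VARM0...VARM4.
let voicePresets: [String] = [("NATF", 4), ("NATM", 4), ("VARF", 5), ("VARM", 5)]
    .flatMap { prefix, count in (0..<count).map { "\(prefix)\($0)" } }
```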

