PersonaPlex 7B is a full-duplex speech-to-speech model, converted here to MLX safetensors with 4-bit quantization for Apple Silicon.
Converted from nvidia/personaplex-7b-v1 (based on the Kyutai Moshi architecture).
Swift inference: ivan-digital/qwen3-asr-swift
| Component | Architecture | Size |
|---|---|---|
| Temporal Transformer | 32-layer, 4096d, 32 heads (7B params) | ~3.5 GB (4-bit) |
| Depformer | 6-layer, 1024d, 16 heads, per-codebook weights | ~50 MB (fp16) |
| Mimi Codec | SEANet encoder/decoder + 8L transformer + 16 RVQ codebooks | ~370 MB (fp16) |
| Embeddings | Text + 16 audio embeddings + output heads | ~940 MB (fp16) |
| Total | | ~4.9 GB |
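The ~3.5 GB figure for the temporal transformer follows directly from the parameter count. A back-of-the-envelope check (illustrative arithmetic only, not part of the package):

```swift
import Foundation

// Approximate on-disk size of a 4-bit quantized 7B-parameter model.
let params = 7.0e9          // 7B parameters
let bitsPerWeight = 4.0
let gigabytes = params * bitsPerWeight / 8.0 / 1e9
print(String(format: "%.1f GB", gigabytes))  // prints "3.5 GB"
```

Group-wise quantization (group_size=64) stores a small scale/bias per group, so the real file is slightly larger than this lower bound.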
```
[User Audio 24kHz] → [Mimi Encoder] → 16 codebook tokens @ 12.5Hz
        ↓
[Temporal Transformer: 32L, dim=4096, 7B params]
        17 streams: text + 8 user audio + 8 agent audio
        ↓
[Depformer: 6L, dim=1024, per-codebook weights]
        16 sequential steps → agent audio codebook tokens
        ↓
[Agent Audio 24kHz] ← [Mimi Decoder] ← codebook tokens @ 12.5Hz
```
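Since Mimi frames arrive at 12.5 Hz, each temporal-transformer step corresponds to 80 ms of audio. A sketch of the step/duration arithmetic (illustrative only, assuming the 12.5 Hz frame rate and 24 kHz sample rate above):

```swift
// One temporal step = one Mimi frame at 12.5 Hz = 80 ms of audio.
let frameRateHz = 12.5
let sampleRateHz = 24_000.0
let maxSteps = 500
let samplesPerFrame = Int(sampleRateHz / frameRateHz)   // 1920 samples per step
let seconds = Double(maxSteps) / frameRateHz            // 40.0 s
print("maxSteps=\(maxSteps) covers \(seconds) s (\(samplesPerFrame) samples/step)")
```

This is why `maxSteps: 500` in the API example below yields roughly 40 seconds of generated audio.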
18 voice presets available:
| Category | Voices |
|---|---|
| Natural Female | NATF0, NATF1, NATF2, NATF3 |
| Natural Male | NATM0, NATM1, NATM2, NATM3 |
| Variety Female | VARF0, VARF1, VARF2, VARF3, VARF4 |
| Variety Male | VARM0, VARM1, VARM2, VARM3, VARM4 |
- `temporal.safetensors` – Temporal transformer (4-bit quantized, group_size=64)
- `depformer.safetensors` – Depformer layers + input projections (fp16)
- `embeddings.safetensors` – Text/audio embeddings + output heads (fp16)
- `mimi.safetensors` – Mimi neural audio codec (fp16)
- `voices/*.safetensors` – Voice preset embeddings
- `tokenizer_spm_32k_3.model` – SentencePiece tokenizer
- `config.json` – Model configuration

Note: attention input projections (`in_proj`) are kept fp16 (packed Q+K+V format).

```swift
import PersonaPlex

let model = try await PersonaPlexModel.fromPretrained()
let response = model.respond(
    userAudio: audioSamples,  // [Float] 24kHz mono
    voice: .NATM0,
    maxSteps: 500
)
```
```bash
swift run personaplex-cli --input question.wav --output response.wav --voice NATM0
```
See ivan-digital/qwen3-asr-swift for build instructions.
License: CC-BY-NC-4.0 (same as upstream PersonaPlex)
```bibtex
@article{nguyen2025personaplex,
  title={PersonaPlex: Enhancing Human-Centric AI Through Full-Duplex Multi-Turn Conversations With Persona-Conditioned Voice Responses},
  author={Nguyen, Tu Anh and others},
  journal={arXiv preprint arXiv:2504.07966},
  year={2025}
}
```