metadata
license: cc-by-4.0
library_name: mlx
base_model: kyutai/mimi
pipeline_tag: feature-extraction
tags:
  - mlx
  - audio
  - audio-codec
  - neural-codec
  - mimi
  - rvq
  - apple-silicon

mlx-community/mimi-encoder-mlx

The encoder half of Kyutai's Mimi neural audio codec, converted to MLX format for native inference on Apple Silicon. These weights are loaded by the xocialize/mimi-encoder-mlx-swift Swift port. Refer to the original kyutai/mimi model card for full details.

Model

  • Family: Mimi neural audio codec (Kyutai / Moshi — Défossez et al., arXiv:2410.00037)
  • This artifact: the encoder only (SEANet conv encoder → causal transformer → stride-2 downsample → split RVQ)
  • Input: 24000 Hz, mono
  • Output: [16, T] grid of codebook indices, one column per 80 ms frame at 12.5 Hz (1 semantic + 15 acoustic codebooks)
  • Precision: fp32 (145 tensors)
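To make the "split RVQ" stage and the 1 semantic + 15 acoustic row layout more concrete, here is a toy Swift sketch of split residual vector quantization as described in the Moshi paper: a plain VQ for the semantic codebook and a residual VQ for the acoustic codebooks both quantize the same latent frame in parallel, and their indices are stacked into one [rows, T] grid. This is not the MimiCodecEncoder API; all names are hypothetical, the codebooks are tiny hand-made examples, distances are plain Euclidean, and the learned projections the real quantizer applies before lookup are omitted.

```swift
// Toy split-RVQ encoder (hypothetical sketch, not the MimiCodecEncoder API).
// Assumptions: tiny hand-made codebooks, Euclidean nearest neighbour,
// no input/output projections around the quantizers.

typealias Vector = [Double]

/// Squared Euclidean distance between two equal-length vectors.
func squaredDistance(_ a: Vector, _ b: Vector) -> Double {
    var sum = 0.0
    for (x, y) in zip(a, b) {
        sum += (x - y) * (x - y)
    }
    return sum
}

/// Index of the codebook entry closest to `x` (first one wins on ties).
func nearestIndex(_ x: Vector, in codebook: [Vector]) -> Int {
    var best = 0
    var bestDist = Double.infinity
    for (i, code) in codebook.enumerated() {
        let d = squaredDistance(x, code)
        if d < bestDist {
            best = i
            bestDist = d
        }
    }
    return best
}

/// Residual VQ: each stage quantizes what the previous stages left over.
func residualQuantize(_ x: Vector, codebooks: [[Vector]]) -> [Int] {
    var residual = x
    var indices: [Int] = []
    for codebook in codebooks {
        let i = nearestIndex(residual, in: codebook)
        indices.append(i)
        residual = zip(residual, codebook[i]).map { pair in pair.0 - pair.1 }
    }
    return indices
}

/// Split RVQ for one frame: the semantic VQ and the acoustic RVQ both see the
/// same latent (parallel, not in series). Result: [semantic, acoustic...].
func splitRVQ(_ latent: Vector, semantic: [Vector], acoustic: [[Vector]]) -> [Int] {
    [nearestIndex(latent, in: semantic)] + residualQuantize(latent, codebooks: acoustic)
}

/// Encodes T latent frames into a [1 + acoustic.count, T] grid (row 0 = semantic).
func encodeFrames(_ frames: [Vector], semantic: [Vector], acoustic: [[Vector]]) -> [[Int]] {
    let perFrame = frames.map { splitRVQ($0, semantic: semantic, acoustic: acoustic) }
    let rows = 1 + acoustic.count
    return (0..<rows).map { r in perFrame.map { $0[r] } }
}

// Usage: 2-D latents, a 2-entry semantic codebook, two acoustic stages.
let semantic: [Vector] = [[1, 0], [0, 1]]
let acoustic: [[Vector]] = [
    [[1, 1], [-1, -1]],
    [[0.5, 0], [0, 0.5]],
]
let grid = encodeFrames([[1, 0.8], [-1, -1.4]], semantic: semantic, acoustic: acoustic)
print(grid)  // [[0, 0], [0, 1], [0, 0]]
```

The semantic row is chosen independently of the acoustic residual chain, which is why it can disagree with acoustic row 1 for the second frame. In the real model the grid has 16 rows and the codebooks are learned.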

Files

  • encoder.safetensors — the MLX encoder weights (fp32), extracted/converted from kyutai/mimi.
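One way to check that a downloaded encoder.safetensors matches the "fp32 (145 tensors)" line in the Model section is to read the file's JSON header directly, without loading any weights. The sketch below follows the published safetensors layout (an 8-byte little-endian header length, then a UTF-8 JSON object mapping tensor names to dtype, shape and data offsets, plus an optional "__metadata__" entry) and uses only Foundation. `summarizeSafetensors` is a hypothetical helper, not part of MimiCodecEncoder or MLX.

```swift
import Foundation

/// Reads the JSON header of a .safetensors blob and returns the tensor count
/// and how many tensors use each dtype (e.g. "F32").
/// Layout: UInt64 little-endian header length N, then N bytes of JSON, then tensor data.
func summarizeSafetensors(_ data: Data) throws -> (count: Int, dtypes: [String: Int]) {
    guard data.count >= 8 else { throw CocoaError(.fileReadCorruptFile) }

    // Decode the 8-byte little-endian header length.
    var headerLength: UInt64 = 0
    for (i, byte) in data.prefix(8).enumerated() {
        headerLength |= UInt64(byte) << (8 * UInt64(i))
    }
    guard headerLength <= UInt64(data.count - 8) else { throw CocoaError(.fileReadCorruptFile) }

    // Parse the JSON header that follows.
    let start = data.startIndex + 8
    let headerBytes = data.subdata(in: start..<(start + Int(headerLength)))
    guard let header = try JSONSerialization.jsonObject(with: headerBytes) as? [String: Any] else {
        throw CocoaError(.fileReadCorruptFile)
    }

    // Count tensors per dtype, skipping the non-tensor "__metadata__" entry.
    var dtypes: [String: Int] = [:]
    for (name, entry) in header where name != "__metadata__" {
        if let info = entry as? [String: Any], let dtype = info["dtype"] as? String {
            dtypes[dtype, default: 0] += 1
        }
    }
    return (dtypes.values.reduce(0, +), dtypes)
}
```

To use it, pass `try Data(contentsOf: url)` for the downloaded file. If the file matches this card, the count should be 145 with every tensor listed under "F32".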

Usage (Swift / MLX)

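import Foundation
import MimiCodecEncoder

// Local path to encoder.safetensors from this repository (adjust as needed).
let encoderWeightsURL = URL(fileURLWithPath: "encoder.safetensors")

let encoder = MimiEncoder(config: .qwen3TTS12Hz)
try encoder.loadWeights(from: encoderWeightsURL)
// audioArray: 24 kHz mono waveform, in the type MimiEncoder.encode(audio:) expects
let codes = encoder.encode(audio: audioArray)      // [16, T] codebook indices at 12.5 Hz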
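The size of T follows from the numbers in the Model section: with 24000 Hz input and 12.5 Hz codes, each column of the [16, T] grid covers 24000 / 12.5 = 1920 samples (80 ms). The sketch below turns that into a sanity check on the encoder output. `MimiFrames` and `splitCodes` are hypothetical helpers operating on plain Swift arrays, not part of MimiCodecEncoder, and two behaviours are assumptions marked in the comments: that a trailing partial frame is padded into one extra column, and that row 0 holds the semantic codebook.

```swift
/// Frame arithmetic for the encoder described in this card
/// (24000 Hz mono in, 12.5 Hz code frames out). Hypothetical helper.
enum MimiFrames {
    static let sampleRate = 24_000
    static let frameRate = 12.5
    /// 24_000 / 12.5 = 1920 input samples per column of the [16, T] grid.
    static let samplesPerFrame = 1_920

    /// Expected T for `sampleCount` samples.
    /// Assumption: a trailing partial frame is padded and still yields one column.
    static func expectedFrameCount(sampleCount: Int) -> Int {
        (sampleCount + samplesPerFrame - 1) / samplesPerFrame
    }

    /// Audio duration in seconds covered by `frames` columns.
    static func duration(frames: Int) -> Double {
        Double(frames) / frameRate
    }
}

/// Splits a [16, T] grid (as plain Swift rows) into the semantic row and the
/// 15 acoustic rows. Assumption: row 0 holds the semantic codebook.
func splitCodes(_ codes: [[Int]]) -> (semantic: [Int], acoustic: [[Int]]) {
    (codes[0], Array(codes.dropFirst()))
}

// 10 s of 24 kHz audio is 240_000 samples, i.e. 125 code frames.
print(MimiFrames.expectedFrameCount(sampleCount: 240_000))  // 125
print(MimiFrames.duration(frames: 125))                     // 10.0
```

If the port instead drops a trailing partial frame, T would be the floor rather than the ceiling of samples / 1920; the two agree whenever the input length is a multiple of 1920.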

Source

License

CC-BY-4.0 (Kyutai) — permissive, attribution required. This is a derivative (encoder-only, format-converted) of kyutai/mimi; attribution to Kyutai is retained.