---
license: cc-by-4.0
library_name: mlx
base_model: kyutai/mimi
pipeline_tag: feature-extraction
tags:
- mlx
- audio
- audio-codec
- neural-codec
- mimi
- rvq
- apple-silicon
---

# mlx-community/mimi-encoder-mlx

The **encoder** half of Kyutai's [Mimi](https://huggingface.co/kyutai/mimi) neural audio codec, converted to MLX format for native inference on Apple Silicon and consumed by the [`xocialize/mimi-encoder-mlx-swift`](https://github.com/xocialize/mimi-encoder-mlx-swift) Swift port.

Refer to the [original model card](https://huggingface.co/kyutai/mimi) for full details.

## Model

- **Family:** Mimi neural audio codec (Kyutai / Moshi; Défossez et al., [arXiv:2410.00037](https://arxiv.org/abs/2410.00037))
- **This artifact:** the **encoder** only (SEANet conv encoder → causal transformer → stride-2 downsample → split RVQ)
- **Input:** 24000 Hz, mono
- **Output:** `[16, T]` codebook-index grid at 12.5 Hz (1 semantic + 15 acoustic codebooks)
- **Precision:** fp32 (145 tensors)

## Files

- `encoder.safetensors`: the MLX encoder weights (fp32), extracted and converted from `kyutai/mimi`.

## Usage (Swift / MLX)

```swift
import MimiCodecEncoder

let encoder = MimiEncoder(config: .qwen3TTS12Hz)
try encoder.loadWeights(from: encoderWeightsURL)  // encoder.safetensors
let codes = encoder.encode(audio: audioArray)     // [16, T]
```

## Source

- **Original model:** https://huggingface.co/kyutai/mimi
- **Swift consumer:** https://github.com/xocialize/mimi-encoder-mlx-swift

## License

CC-BY-4.0 (Kyutai): permissive, attribution required. This is a derivative (encoder-only, format-converted) of `kyutai/mimi`; attribution to Kyutai is retained.
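
## Estimating the output length `T`

The input and output rates listed under Model (24000 Hz in, 12.5 Hz out) imply that each column of the `[16, T]` grid covers 24000 / 12.5 = 1920 input samples. The sketch below works through that arithmetic so a buffer for the codes can be sized ahead of time. `expectedFrameCount` is a hypothetical helper and not part of `MimiCodecEncoder`. The rounding rule for a trailing partial frame is also an assumption: the sketch rounds up, as if the encoder padded the last frame, so check the result against the real `encode` output before relying on it.

```swift
import Foundation

// Values stated in this model card.
let sampleRate = 24_000      // Hz, mono input
let frameRate = 12.5         // Hz, codebook frames per second
let numCodebooks = 16        // 1 semantic + 15 acoustic

// Input samples covered by one output frame: 24000 / 12.5 = 1920.
let samplesPerFrame = Int(Double(sampleRate) / frameRate)

/// Hypothetical helper (not part of MimiCodecEncoder).
/// Estimates T for a clip of `sampleCount` samples.
/// ASSUMPTION: a trailing partial frame is padded, so the count rounds up.
func expectedFrameCount(sampleCount: Int) -> Int {
    return (sampleCount + samplesPerFrame - 1) / samplesPerFrame
}

// Ten seconds of audio is a whole number of frames, so rounding does not matter.
let tenSecondFrames = expectedFrameCount(sampleCount: 10 * sampleRate)
print("samples per frame: \(samplesPerFrame)")
print("estimated grid for 10 s: [\(numCodebooks), \(tenSecondFrames)]")
```

Under these assumptions a 10-second clip maps to a `[16, 125]` grid. For clips whose length is not a multiple of 1920 samples, the estimate may be off by one frame if the encoder truncates instead of padding.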