---
license: cc-by-4.0
library_name: mlx
base_model: kyutai/mimi
pipeline_tag: feature-extraction
tags:
- mlx
- audio
- audio-codec
- neural-codec
- mimi
- rvq
- apple-silicon
---

# mlx-community/mimi-encoder-mlx

The **encoder** half of Kyutai's [Mimi](https://huggingface.co/kyutai/mimi) neural audio codec,
converted to MLX format for native inference on Apple Silicon and consumed by the
[`xocialize/mimi-encoder-mlx-swift`](https://github.com/xocialize/mimi-encoder-mlx-swift) Swift
port. Refer to the [original model card](https://huggingface.co/kyutai/mimi) for full details.

## Model

- **Family:** Mimi neural audio codec (Kyutai / Moshi — Défossez et al., [arXiv:2410.00037](https://arxiv.org/abs/2410.00037))
- **This artifact:** the **encoder** only (SEANet conv encoder → causal transformer → stride-2 downsample → split RVQ)
- **Input:** 24000 Hz, mono
- **Output:** `[16, T]` grid of codebook indices at 12.5 frames per second, i.e. one frame per 1,920 input samples (1 semantic + 15 acoustic codebooks)
- **Precision:** fp32 (145 tensors)
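
The input and output rates above fix the relationship between clip length and `T`: 24000 / 12.5 = 1920 input samples per code frame. The following is a minimal sketch of that arithmetic in plain Swift. The helper name `estimatedFrameCount` is hypothetical, and rounding a trailing partial frame up is an assumption; the encoder's actual padding behavior may differ, so treat the result as an estimate.

```swift
// Values taken from the model card: 24 kHz mono input, 12.5 Hz code frames.
let sampleRate = 24_000.0
let frameRate = 12.5

// Input samples covered by one code frame: 24000 / 12.5 = 1920.
let samplesPerFrame = Int(sampleRate / frameRate)

/// Estimates the number of code frames T for a clip of `sampleCount` samples.
/// Assumption (not confirmed by the model card): a trailing partial frame
/// is padded up to a full frame, hence the ceiling division.
func estimatedFrameCount(sampleCount: Int) -> Int {
    return (sampleCount + samplesPerFrame - 1) / samplesPerFrame
}

let oneSecond = estimatedFrameCount(sampleCount: 24_000)    // 12.5 frames, rounded up
let tenSeconds = estimatedFrameCount(sampleCount: 240_000)  // exactly 125 frames
print(samplesPerFrame, oneSecond, tenSeconds)
```

Under these assumptions, a ten-second clip yields a grid of roughly `[16, 125]` indices.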

## Files

- `encoder.safetensors` — the MLX encoder weights (fp32), extracted/converted from `kyutai/mimi`.

## Usage (Swift / MLX)

```swift
import MimiCodecEncoder

let encoder = MimiEncoder(config: .qwen3TTS12Hz)
try encoder.loadWeights(from: encoderWeightsURL)   // encoder.safetensors
let codes = encoder.encode(audio: audioArray)      // [16, T]
```
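
Downstream code often needs the semantic codebook separately from the acoustic ones. The sketch below shows one way to split a `[16, T]` grid, under two assumptions that this card does not confirm: the codes have already been copied out of MLX into nested Swift arrays, and the semantic codebook is row 0 with the 15 acoustic codebooks in rows 1 through 15. `MimiCodes` and `splitCodes` are hypothetical names, not part of `MimiCodecEncoder`.

```swift
/// Hypothetical container for a `[16, T]` code grid held as plain Swift arrays.
struct MimiCodes {
    let semantic: [Int32]     // assumed to be row 0
    let acoustic: [[Int32]]   // assumed to be rows 1...15
}

/// Splits a 16-row grid into semantic and acoustic parts.
/// Returns nil if the grid does not have 16 rows of equal length.
func splitCodes(_ grid: [[Int32]]) -> MimiCodes? {
    guard grid.count == 16,
          let frames = grid.first?.count,
          grid.allSatisfy({ $0.count == frames }) else {
        return nil
    }
    return MimiCodes(semantic: grid[0], acoustic: Array(grid[1...]))
}

// Usage with placeholder data: each row is filled with its own row index.
let demoGrid: [[Int32]] = (0..<16).map { row in
    [Int32](repeating: Int32(row), count: 3)
}
let demoSplit = splitCodes(demoGrid)
```

Validating the shape before splitting keeps a malformed grid from being silently misread as valid codes.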

## Source

- **Original model:** https://huggingface.co/kyutai/mimi
- **Swift consumer:** https://github.com/xocialize/mimi-encoder-mlx-swift

## License

CC-BY-4.0 (Kyutai) — permissive, attribution required. This is a derivative (encoder-only,
format-converted) of `kyutai/mimi`; attribution to Kyutai is retained.