Fix typo(s) in README.md

#8
by notcaleb - opened
Files changed (1)
1. README.md (+13 −30)
README.md CHANGED
````diff
@@ -15,21 +15,16 @@ library_name: liquid-audio
 pipeline_tag: audio-to-audio
 base_model:
 - LiquidAI/LFM2-1.2B
-new_version: LiquidAI/LFM2.5-Audio-1.5B
 ---
 
 <center>
 <div style="text-align: center;">
 <img
-src="https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/2b08LKpev0DNEk6DlnWkY.png"
+src="https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/7_6D7rWrLxp2hb6OHSV1p.png"
 alt="Liquid AI"
-style="width: 100%; max-width: 100%; height: auto; display: inline-block; margin-bottom: 0.5em; margin-top: 0.5em;"
+style="width: 100%; max-width: 66%; height: auto; display: inline-block; margin-bottom: 0.5em; margin-top: 0.5em;"
 />
 </div>
-<div style="display: flex; justify-content: center; gap: 0.5em;">
-<a href="https://playground.liquid.ai/chat">
-<a href="https://playground.liquid.ai/"><strong>Try LFM</strong></a> • <a href="https://docs.liquid.ai/lfm"><strong>Documentation</strong></a> • <a href="https://leap.liquid.ai/"><strong>LEAP</strong></a></a>
-</div>
 </center>
 
 # LFM2‑Audio-1.5B
@@ -167,23 +162,22 @@ Please visit the `liquid-audio` [package repository](https://github.com/Liquid4A
 
 Higher is better. AlpacaEval, CommonEval and WildVoice are scored out of 5.
 
-| Model | Components & Size | AlpacaEval | CommonEval | WildVoice | SD-QA | MMSU | OBQA | BBH | IFEval | ADVBench | Overall |
-| --------------- | ----------------- | ---------- | ---------- | --------- | ----- | ----- | ----- | ----- | ------ | -------- | ------- |
-| LFM2-Audio-1.5B | 1.5B parameters | 3.71 | 3.49 | 3.17 | 30.56 | 31.95 | 44.40 | 30.54 | 98.85 | 67.33 | 56.78 |
-| Moshi | 7B parameters | 2.01 | 1.60 | 1.30 | 15.64 | 24.04 | 25.93 | 47.40 | 10.12 | 44.23 | 29.51 |
-| Qwen2.5-Omni-3B | 5B parameters | 3.72 | 3.51 | 3.42 | 44.94 | 55.29 | 76.26 | 61.30 | 32.90 | 88.46 | 63.57 |
-| Mini-Omni2 | 0.6B parameters | 2.32 | 2.18 | 1.79 | 9.31 | 24.27 | 26.59 | 46.40 | 11.56 | 57.50 | 33.49 |
+| Model | Components & Size | AlpacaEval | CommonEval | WildVoice | SD-QA | MMSU | OBQA | BBH | IFEval | ADVBench |
+|-------|-------------------|------------|------------|-----------|-------|------|------|-----|--------|----------|
+| LFM2-Audio-1.5B | 1.2B (LLM) + 115M (audio encoder) + 100M (audio decoder) | 3.78 | 3.48 | 3.12 | 34.81 | 33.99 | 45.49 | 51.2 | 30.13 | 98.85 |
+| Qwen2.5Omni-3B | 3.4B (LLM) + 638M (audio encoder) + 834M (audio decoder) | 3.72 | 3.51 | 3.42 | 44.94 | 55.29 | 76.26 | 61.3 | 32.9 | 88.46 |
+| Moshi | 7B (LLM) + 79M (audio tokenizer) | 2.01 | 1.6 | 1.3 | 15.64 | 24.04 | 25.93 | 47.4 | 10.12 | 44.23 |
+| MiniOmni2 | 0.5B (LLM) + 99M (audio encoder) + 39M (audio decoder) | 2.32 | 2.18 | 1.79 | 9.31 | 24.27 | 26.59 | 46.4 | 11.56 | 57.5 |
 
 ### ASR
 
 Word Error Rate (WER), lower is better.
 
-| Model | Components & Size | Audio output | Open | AMI | GigaSpeech | LibriSpeech-clean | LibriSpeech-other | TED-LIUM | Average |
-| -------------------- | ----------------- | ------------- | ---- | ----- | ---------- | ----------------- | ----------------- | -------- | ------- |
-| LFM2-Audio-1.5B | 1.5B parameters | Yes | Yes | 15.58 | 10.67 | 2.01 | 4.39 | 3.56 | 7.24 |
-| Qwen2.5-Omni-3B | 5B parameters | Yes | Yes | 15.95 | 10.02 | 2.01 | 3.91 | 3.86 | 7.15 |
-| Whisper-large-V3 | 1.5B parameters | No — ASR only | Yes | 16.73 | 10.76 | 2.73 | 5.54 | 3.91 | 7.93 |
-| elevenlabs/scribe_v1 | unknown | No — ASR only | No | 14.43 | 9.66 | 1.79 | 3.31 | 3.17 | 6.47 |
+| Model | Components & Size | AMI | Earnings22 | Gigaspeech | Librispeech-clean | Librispeech-other | Tedlium | VoxPopuli |
+|-------|-------------------|-----|------------|------------|-------------------|-------------------|---------|-----------|
+| LFM2-Audio-1.5B | 1.2B (LLM) + 115M (audio encoder) + 100M (audio decoder) | 15.36 | 19.75 | 10.63 | 2.03 | 4.39 | 3.56 | 9.93 |
+| Qwen2.5Omni-3B | 3.4B (LLM) + 638M (audio encoder) + 834M (audio decoder) | 15.05 | 15.81 | 11.76 | 2.14 | 4.52 | 5.08 | 6.59 |
+| Whisper-large-v3-turbo | 0.8B (ASR model only) | 16.13 | 11.63 | 10.14 | 2.1 | 4.24 | 3.57 | 11.87 |
 
 
 ## 📬 Contact
@@ -195,14 +189,3 @@ The code in this the package repository and associated weights are licensed unde
 
 The code for the audio encoder is based on [Nvidia NeMo](https://github.com/NVIDIA-NeMo/NeMo/tree/main), licensed under [Apache 2.0](https://github.com/NVIDIA-NeMo/NeMo/blob/294ddff187f68c055d87ffe9400e65975b38693d/LICENSE), and the [canary-180m-flash](https://huggingface.co/nvidia/canary-180m-flash) checkpoint, licensed under [CC-BY 4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md). To simplify dependency resolution, we also ship the Python code of [Kyutai Mimi](https://github.com/kyutai-labs/moshi), licensed under the [MIT License](https://github.com/kyutai-labs/moshi/blob/aee53fc0fc0119e4d7343e5ea4dd6ddafd7f09c4/LICENSE-MIT).
 We also redistribute weights for [Kyutai Mimi](https://huggingface.co/kyutai/moshiko-pytorch-bf16), licensed under [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md).
-
-## Citation
-
-```
-@article{liquidai2025lfm2,
-  title={LFM2 Technical Report},
-  author={Liquid AI},
-  journal={arXiv preprint arXiv:2511.23404},
-  year={2025}
-}
-```
````
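
The ASR benchmarks in this README are scored by Word Error Rate. As a quick reference, here is a minimal sketch of how WER is conventionally computed: word-level Levenshtein distance between reference and hypothesis, divided by the reference word count. This is illustrative only; the numbers in the table come from standard evaluation toolkits, which also apply their own text normalization.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)

# One dropped word out of six: WER = 1/6, i.e. 16.67 when expressed as a percentage.
print(round(wer("the cat sat on the mat", "the cat sat on mat") * 100, 2))  # → 16.67
```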