Fix typo(s) in README.md
#8, opened by notcaleb
README.md CHANGED
@@ -15,21 +15,16 @@ library_name: liquid-audio
 pipeline_tag: audio-to-audio
 base_model:
 - LiquidAI/LFM2-1.2B
-new_version: LiquidAI/LFM2.5-Audio-1.5B
 ---
 
 <center>
 <div style="text-align: center;">
 <img
-src="https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/
+src="https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/7_6D7rWrLxp2hb6OHSV1p.png"
 alt="Liquid AI"
-style="width: 100%; max-width:
+style="width: 100%; max-width: 66%; height: auto; display: inline-block; margin-bottom: 0.5em; margin-top: 0.5em;"
 />
 </div>
-<div style="display: flex; justify-content: center; gap: 0.5em;">
-<a href="https://playground.liquid.ai/chat">
-<a href="https://playground.liquid.ai/"><strong>Try LFM</strong></a> • <a href="https://docs.liquid.ai/lfm"><strong>Documentation</strong></a> • <a href="https://leap.liquid.ai/"><strong>LEAP</strong></a></a>
-</div>
 </center>
 
 # LFM2‑Audio-1.5B
@@ -167,23 +162,22 @@ Please visit the `liquid-audio` [package repository](https://github.com/Liquid4A
 
 Higher is better. AlpacaEval, CommonEval and WildVoice are scored out of 5.
 
-| Model
-
-| LFM2-Audio-1.5B | 1.
-|
-|
-|
+| Model | Components & Size | AlpacaEval | CommonEval | WildVoice | SD-QA | MMSU | OBQA | BBH | IFEval | ADVBench |
+|-------|-------------------|------------|------------|-----------|-------|------|------|-----|--------|----------|
+| LFM2-Audio-1.5B | 1.2B (LLM) + 115M (audio encoder) + 100M (audio decoder) | 3.78 | 3.48 | 3.12 | 34.81 | 33.99 | 45.49 | 51.2 | 30.13 | 98.85 |
+| Qwen2.5Omni-3B | 3.4B (LLM) + 638M (audio encoder) + 834M (audio decoder) | 3.72 | 3.51 | 3.42 | 44.94 | 55.29 | 76.26 | 61.3 | 32.9 | 88.46 |
+| Moshi | 7B (LLM) + 79M (audio tokenizer) | 2.01 | 1.6 | 1.3 | 15.64 | 24.04 | 25.93 | 47.4 | 10.12 | 44.23 |
+| MiniOmni2 | 0.5B (LLM) + 99M (audio encoder) + 39M (audio decoder) | 2.32 | 2.18 | 1.79 | 9.31 | 24.27 | 26.59 | 46.4 | 11.56 | 57.5 |
 
 ### ASR
 
 Word Error Rate (WER), lower is better.
 
-| Model
-
-| LFM2-Audio-1.5B
-| Qwen2.
-| Whisper-large-
-| elevenlabs/scribe_v1 | unknown | No — ASR only | No | 14.43 | 9.66 | 1.79 | 3.31 | 3.17 | 6.47 |
+| Model | Components & Size | AMI | Earnings22 | Gigaspeech | Librispeech-clean | Librispeech-other | Tedlium | VoxPopuli |
+|-------|-------------------|-----|------------|------------|-------------------|-------------------|---------|-----------|
+| LFM2-Audio-1.5B | 1.2B (LLM) + 115M (audio encoder) + 100M (audio decoder) | 15.36 | 19.75 | 10.63 | 2.03 | 4.39 | 3.56 | 9.93 |
+| Qwen2.5Omni-3B | 3.4B (LLM) + 638M (audio encoder) + 834M (audio decoder) | 15.05 | 15.81 | 11.76 | 2.14 | 4.52 | 5.08 | 6.59 |
+| Whisper-large-v3-turbo | 0.8B (ASR model only) | 16.13 | 11.63 | 10.14 | 2.1 | 4.24 | 3.57 | 11.87 |
 
 
 ## 📬 Contact
@@ -195,14 +189,3 @@ The code in this the package repository and associated weights are licensed unde
 
 The code for the audio encoder is based on [Nvidia NeMo](https://github.com/NVIDIA-NeMo/NeMo/tree/main), licensed under [Apache 2.0](https://github.com/NVIDIA-NeMo/NeMo/blob/294ddff187f68c055d87ffe9400e65975b38693d/LICENSE), and the [canary-180m-flash](https://huggingface.co/nvidia/canary-180m-flash) checkpoint, licensed under [CC-BY 4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md). To simplify dependency resolution, we also ship the Python code of [Kyutai Mimi](https://github.com/kyutai-labs/moshi), licensed under the [MIT License](https://github.com/kyutai-labs/moshi/blob/aee53fc0fc0119e4d7343e5ea4dd6ddafd7f09c4/LICENSE-MIT).
 We also redistribute weights for [Kyutai Mimi](https://huggingface.co/kyutai/moshiko-pytorch-bf16), licensed under [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md).
-
-## Citation
-
-```
-@article{liquidai2025lfm2,
-title={LFM2 Technical Report},
-author={Liquid AI},
-journal={arXiv preprint arXiv:2511.23404},
-year={2025}
-}
-```