Fix typo(s) in README.md

#8
by notcaleb - opened
Files changed (1)
1. README.md (+13 −30)
README.md CHANGED
````diff
@@ -15,21 +15,16 @@ library_name: liquid-audio
 pipeline_tag: audio-to-audio
 base_model:
 - LiquidAI/LFM2-1.2B
-new_version: LiquidAI/LFM2.5-Audio-1.5B
 ---
 
 <center>
 <div style="text-align: center;">
 <img
-src="https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/2b08LKpev0DNEk6DlnWkY.png"
+src="https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/7_6D7rWrLxp2hb6OHSV1p.png"
 alt="Liquid AI"
-style="width: 100%; max-width: 100%; height: auto; display: inline-block; margin-bottom: 0.5em; margin-top: 0.5em;"
+style="width: 100%; max-width: 66%; height: auto; display: inline-block; margin-bottom: 0.5em; margin-top: 0.5em;"
 />
 </div>
-<div style="display: flex; justify-content: center; gap: 0.5em;">
-<a href="https://playground.liquid.ai/chat">
-<a href="https://playground.liquid.ai/"><strong>Try LFM</strong></a> • <a href="https://docs.liquid.ai/lfm"><strong>Documentation</strong></a> • <a href="https://leap.liquid.ai/"><strong>LEAP</strong></a></a>
-</div>
 </center>
 
 # LFM2‑Audio-1.5B
@@ -167,23 +162,22 @@ Please visit the `liquid-audio` [package repository](https://github.com/Liquid4A
 
 Higher is better. AlpacaEval, CommonEval and WildVoice are scored out of 5.
 
-| Model | Components & Size | AlpacaEval | CommonEval | WildVoice | SD-QA | MMSU | OBQA | BBH | IFEval | ADVBench | Overall |
-| --------------- | ----------------- | ---------- | ---------- | --------- | ----- | ----- | ----- | ----- | ------ | -------- | ------- |
-| LFM2-Audio-1.5B | 1.5B parameters | 3.71 | 3.49 | 3.17 | 30.56 | 31.95 | 44.40 | 30.54 | 98.85 | 67.33 | 56.78 |
-| Moshi | 7B parameters | 2.01 | 1.60 | 1.30 | 15.64 | 24.04 | 25.93 | 47.40 | 10.12 | 44.23 | 29.51 |
-| Qwen2.5-Omni-3B | 5B parameters | 3.72 | 3.51 | 3.42 | 44.94 | 55.29 | 76.26 | 61.30 | 32.90 | 88.46 | 63.57 |
-| Mini-Omni2 | 0.6B parameters | 2.32 | 2.18 | 1.79 | 9.31 | 24.27 | 26.59 | 46.40 | 11.56 | 57.50 | 33.49 |
+| Model | Components & Size | AlpacaEval | CommonEval | WildVoice | SD-QA | MMSU | OBQA | BBH | IFEval | ADVBench |
+|-------|-------------------|------------|------------|-----------|-------|------|------|-----|--------|----------|
+| LFM2-Audio-1.5B | 1.2B (LLM) + 115M (audio encoder) + 100M (audio decoder) | 3.78 | 3.48 | 3.12 | 34.81 | 33.99 | 45.49 | 51.2 | 30.13 | 98.85 |
+| Qwen2.5Omni-3B | 3.4B (LLM) + 638M (audio encoder) + 834M (audio decoder) | 3.72 | 3.51 | 3.42 | 44.94 | 55.29 | 76.26 | 61.3 | 32.9 | 88.46 |
+| Moshi | 7B (LLM) + 79M (audio tokenizer) | 2.01 | 1.6 | 1.3 | 15.64 | 24.04 | 25.93 | 47.4 | 10.12 | 44.23 |
+| MiniOmni2 | 0.5B (LLM) + 99M (audio encoder) + 39M (audio decoder) | 2.32 | 2.18 | 1.79 | 9.31 | 24.27 | 26.59 | 46.4 | 11.56 | 57.5 |
 
 ### ASR
 
 Word Error Rate (WER), lower is better.
 
-| Model | Components & Size | Audio output | Open | AMI | GigaSpeech | LibriSpeech-clean | LibriSpeech-other | TED-LIUM | Average |
-| -------------------- | ----------------- | ------------- | ---- | ----- | ---------- | ----------------- | ----------------- | -------- | ------- |
-| LFM2-Audio-1.5B | 1.5B parameters | Yes | Yes | 15.58 | 10.67 | 2.01 | 4.39 | 3.56 | 7.24 |
-| Qwen2.5-Omni-3B | 5B parameters | Yes | Yes | 15.95 | 10.02 | 2.01 | 3.91 | 3.86 | 7.15 |
-| Whisper-large-V3 | 1.5B parameters | No — ASR only | Yes | 16.73 | 10.76 | 2.73 | 5.54 | 3.91 | 7.93 |
-| elevenlabs/scribe_v1 | unknown | No — ASR only | No | 14.43 | 9.66 | 1.79 | 3.31 | 3.17 | 6.47 |
+| Model | Components & Size | AMI | Earnings22 | Gigaspeech | Librispeech-clean | Librispeech-other | Tedlium | VoxPopuli |
+|-------|-------------------|-----|------------|------------|-------------------|-------------------|---------|-----------|
+| LFM2-Audio-1.5B | 1.2B (LLM) + 115M (audio encoder) + 100M (audio decoder) | 15.36 | 19.75 | 10.63 | 2.03 | 4.39 | 3.56 | 9.93 |
+| Qwen2.5Omni-3B | 3.4B (LLM) + 638M (audio encoder) + 834M (audio decoder) | 15.05 | 15.81 | 11.76 | 2.14 | 4.52 | 5.08 | 6.59 |
+| Whisper-large-v3-turbo | 0.8B (ASR model only) | 16.13 | 11.63 | 10.14 | 2.1 | 4.24 | 3.57 | 11.87 |
 
 
 ## 📬 Contact
@@ -195,14 +189,3 @@ The code in this the package repository and associated weights are licensed unde
 
 The code for the audio encoder is based on [Nvidia NeMo](https://github.com/NVIDIA-NeMo/NeMo/tree/main), licensed under [Apache 2.0](https://github.com/NVIDIA-NeMo/NeMo/blob/294ddff187f68c055d87ffe9400e65975b38693d/LICENSE), and the [canary-180m-flash](https://huggingface.co/nvidia/canary-180m-flash) checkpoint, licensed under [CC-BY 4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md). To simplify dependency resolution, we also ship the Python code of [Kyutai Mimi](https://github.com/kyutai-labs/moshi), licensed under the [MIT License](https://github.com/kyutai-labs/moshi/blob/aee53fc0fc0119e4d7343e5ea4dd6ddafd7f09c4/LICENSE-MIT).
 We also redistribute weights for [Kyutai Mimi](https://huggingface.co/kyutai/moshiko-pytorch-bf16), licensed under [CC-BY-4.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/cc-by-4.0.md).
-
-## Citation
-
-```
-@article{liquidai2025lfm2,
-  title={LFM2 Technical Report},
-  author={Liquid AI},
-  journal={arXiv preprint arXiv:2511.23404},
-  year={2025}
-}
-```
````
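
The ASR benchmarks in this README are scored by Word Error Rate. As a quick reference, here is a minimal sketch of how WER is conventionally computed: word-level Levenshtein distance between reference and hypothesis, divided by the reference word count. This is illustrative only; the numbers in the table come from standard evaluation toolkits, which also apply their own text normalization.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)

# One dropped word out of six: WER = 1/6, i.e. 16.67 when expressed as a percentage.
print(round(wer("the cat sat on the mat", "the cat sat on mat") * 100, 2))  # → 16.67
```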