Update README.md
README.md CHANGED
@@ -1,6 +1,17 @@
# SPEAR Base (speech + general audio)
-
This model was pre-trained on a 97k-hour mixture of English speech and general audio, of which 84k hours are speech and the remaining 13k hours are general audio. It achieves competitive performance (compared with models of similar size) on the [SUPERB](https://arxiv.org/abs/2105.01051) and [HEAR](https://arxiv.org/abs/2203.03022) benchmarks.
@@ -27,7 +38,7 @@ The audio data consists of the following datasets:
-[Paper](https://arxiv.org/abs/2510.
Authors: Xiaoyu Yang, Yifan Yang, Zengrui Jin, Ziyun Cui, Wen Wu, Baoxiang Li, Chao Zhang, Phil Woodland
+---
+license: apache-2.0
+---
+
# SPEAR Base (speech + general audio)
+## UPDATE (2026.Feb)
+
+We have an [**updated version**](https://huggingface.co/marcoyang/spear-base-speech-audio-v2) of this model with enhanced capability on overlapped/noisy speech.
+**We recommend using the updated version of the model**. Please refer to our [paper](https://arxiv.org/abs/2510.25955) for more detail.
+
+---
+
+This is the first version of the [SPEAR](https://arxiv.org/abs/2510.25955v1) Base dual-domain (speech + general audio) model. The model uses a [Zipformer](https://arxiv.org/abs/2310.11230) backbone with 93M parameters, consisting of 12 Zipformer stacks. It generates 512-dimensional representations at approximately 50 Hz.
This model was pre-trained on a 97k-hour mixture of English speech and general audio, of which 84k hours are speech and the remaining 13k hours are general audio. It achieves competitive performance (compared with models of similar size) on the [SUPERB](https://arxiv.org/abs/2105.01051) and [HEAR](https://arxiv.org/abs/2203.03022) benchmarks.
+[Paper](https://arxiv.org/abs/2510.25955v1)
Authors: Xiaoyu Yang, Yifan Yang, Zengrui Jin, Ziyun Cui, Wen Wu, Baoxiang Li, Chao Zhang, Phil Woodland
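
The updated README states that the model emits 512-dimensional representations at approximately 50 Hz. As a rough illustration of what that implies for downstream code, the sketch below estimates the output shape and float32 memory footprint for a clip of a given duration. The helper names, the rounding-down of the frame count, and the float32 assumption are illustrative choices and are not taken from the README; the real frame count depends on the model's front-end and padding, which the README does not specify.

```python
import math

# Values stated in the README; the rate is described there as approximate.
FRAME_RATE_HZ = 50
EMBED_DIM = 512


def expected_output_shape(duration_s: float) -> tuple:
    """Estimate (frames, dim) for one clip; hypothetical helper, not a model API."""
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    # Assumption: partial trailing frames are dropped (rounded down).
    frames = math.floor(duration_s * FRAME_RATE_HZ)
    return frames, EMBED_DIM


def float32_bytes(shape: tuple) -> int:
    """Bytes needed to store the representations as float32 (4 bytes per value)."""
    frames, dim = shape
    return frames * dim * 4


if __name__ == "__main__":
    shape = expected_output_shape(10.0)
    print(shape, float32_bytes(shape))
```

Under these assumptions, a 10-second clip yields about 500 frames, so the stored representations stay near one megabyte per clip.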