update readme
README.md CHANGED

@@ -1,52 +1,27 @@
 ---
-
-- other
-license: Apache License 2.0
-tags: []
-tasks:
-- auto-speech-recognition
-
-#model-type:
-## e.g. gpt, phi, llama, chatglm, baichuan
-#- gpt
-
-#domain:
-## e.g. nlp, cv, audio, multi-modal
-#- nlp
-
-#language:
-## list of language codes: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
-#- cn
-
-#metrics:
-## e.g. CIDEr, BLEU, ROUGE
-#- CIDEr
-
-#tags:
-## any custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
-#- pretrained
-
-#tools:
-## e.g. vllm, fastchat, llamacpp, AdaSeq
-#- vllm
+license: apache-2.0
 ---
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+# RapidSpeech.cpp (https://github.com/RapidAI/RapidSpeech.cpp)
+
+**RapidSpeech.cpp** is a high-performance, **edge-native speech intelligence framework** built on top of **ggml**.
+It aims to provide **pure-C++**, **zero-dependency**, **on-device inference** for large-scale ASR (Automatic Speech Recognition) and TTS (Text-to-Speech) models.
+
+------
+
+## 🌟 Key Differentiators
+
+While the open-source ecosystem already offers powerful cloud-side frameworks such as **vLLM-omni**, as well as mature on-device solutions like **sherpa-onnx**, **RapidSpeech.cpp** introduces a new generation of design choices focused on edge deployment.
+
+### 1. vs. vLLM: Edge-first, not cloud-throughput-first
+
+- **vLLM**
+  - Designed for data centers and cloud environments
+  - Tightly coupled with Python and CUDA
+  - Maximizes GPU throughput via techniques such as PagedAttention
+
+- **RapidSpeech.cpp**
+  - Designed specifically for **edge and on-device inference**
+  - Optimized for **low latency, low memory footprint, and lightweight deployment**
+  - Runs on embedded devices, mobile platforms, laptops, and even NPU-only systems
+  - **No Python runtime required**