Spaces: Luigi/Streaming-Zipformer


Luigi committed on Jun 6, 2025

Commit cd954ca (1 parent: 231cd3a)

update readme

Files changed (1)

README.md +107 -1

@@ -9,4 +9,110 @@ license: mit
 short_description: Streaming zipformer
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# 🎙️ Real-Time Streaming ASR Demo (FastAPI + Sherpa-ONNX)
+This project demonstrates a real-time speech-to-text (ASR) web application using:
+* 🧠 [Sherpa-ONNX](https://github.com/k2-fsa/sherpa-onnx) streaming Zipformer model
+* 🚀 FastAPI backend with WebSocket support
+* 🧑‍💻 Hugging Face Spaces (Docker CPU-only deployment)
+* 🌐 Browser-based microphone input + UI in vanilla HTML/JS
+---
+## 📦 Model
+This app uses the following **bilingual (Chinese-English)** streaming model:
+**🔗 Model Source:**
+[Zipformer Small Bilingual zh-en (2023-02-16)](https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/zipformer-transducer-models.html#sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16-bilingual-chinese-english)
+Model files (ONNX) are located under:
+```
+models/zipformer_bilingual/
+```
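Before starting the server it can help to confirm that the model directory is populated. The following is a minimal sketch that assumes only what this README states: the directory holds some `.onnx` files plus `tokens.txt`. The helper name `check_model_dir` is hypothetical and not part of the app.

```python
from pathlib import Path


def check_model_dir(model_dir: str) -> list[str]:
    """Return the sorted ONNX file names in model_dir, or raise if files are missing.

    Hypothetical helper: the exact ONNX file names are not listed in this
    README, so it only checks for at least one *.onnx file and tokens.txt.
    """
    root = Path(model_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"model directory not found: {root}")
    if not (root / "tokens.txt").is_file():
        raise FileNotFoundError(f"tokens.txt missing in {root}")
    onnx_files = sorted(p.name for p in root.glob("*.onnx"))
    if not onnx_files:
        raise FileNotFoundError(f"no .onnx files in {root}")
    return onnx_files
```

Calling `check_model_dir("models/zipformer_bilingual")` at startup would fail fast with a readable error instead of a later model-loading failure.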
+---
+## 🚀 Features
+* 🎤 Real-time microphone input (captured in browser)
+* 🔁 WebSocket-based streaming inference
+* 💬 Partial + final transcription
+* 🌏 Automatic conversion to **Traditional Chinese** using OpenCC
+* 📊 Real-time volume indicator
+* ☁️ Deployed on Hugging Face Spaces (CPU only)
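The streaming and volume-indicator features above both start from raw audio frames. The sketch below shows one way a server could decode a binary WebSocket message and compute a level for it. It rests on two assumptions not stated in this README: that the browser sends mono little-endian float32 PCM (the layout of a Web Audio `Float32Array`), and that "volume" means RMS amplitude. The function names are illustrative only.

```python
import math
import sys
from array import array


def decode_frame(frame: bytes) -> array:
    """Interpret a binary message as little-endian float32 mono PCM.

    Assumption: the wire format is not documented in this README.
    """
    if len(frame) % 4:
        raise ValueError("frame length must be a multiple of 4 bytes")
    samples = array("f")
    samples.frombytes(frame)
    if sys.byteorder == "big":
        samples.byteswap()  # the frame is little-endian
    return samples


def rms_level(samples) -> float:
    """Root-mean-square amplitude: 0.0 for silence, larger for louder audio."""
    if len(samples) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

The decoded samples would then be resampled and fed to the recognizer, while the RMS value could be sent back to the page to drive the indicator.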
+---
+## 🧪 Local Development
+### 1. Install dependencies
+```bash
+pip install -r requirements.txt
+```
+### 2. Run the app locally
+```bash
+uvicorn app.main:app --reload --host 0.0.0.0 --port 8501
+```
+Then open: [http://localhost:8501](http://localhost:8501)
+---
+## 🐳 Deploy on Hugging Face Spaces
+This repo includes a `Dockerfile` compatible with HF Spaces. It uses:
+* `uvicorn` for serving the FastAPI app
+* `opencc-python-reimplemented` for Simplified → Traditional Chinese
+* `soxr` (python-soxr) or `scipy` for audio resampling (48 kHz → 16 kHz)
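To illustrate what the 48 kHz → 16 kHz step does, here is a dependency-free sketch. It is not the app's resampler: it swaps in a crude box filter (average every 3 samples, since 48000 / 16000 = 3), whereas a dedicated resampling library applies a proper anti-aliasing filter and should be preferred in practice. The function name is hypothetical.

```python
def downsample_48k_to_16k(samples):
    """Average every 3 consecutive samples (48000 / 16000 = 3).

    Crude box filter plus decimation, for illustration only. Trailing
    samples that do not fill a group of 3 are dropped.
    """
    factor = 3
    usable = len(samples) - len(samples) % factor
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]
```

In a streaming setting the dropped remainder would need to be carried over and prepended to the next chunk so that no audio is lost between frames.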
+---
+## 📁 Project Structure
+```
+.
+├── app
+│   ├── main.py               # FastAPI + WebSocket
+│   ├── asr_worker.py         # Sherpa inference + resampling + OpenCC
+│   └── static/index.html     # Client-side mic UI
+├── models/zipformer_bilingual/
+│   └── ... (onnx, tokens.txt)
+├── requirements.txt
+├── Dockerfile
+└── README.md
+```
+---
+## 🔧 Credits
+* [Sherpa-ONNX](https://github.com/k2-fsa/sherpa-onnx)
+* [OpenCC](https://github.com/BYVoid/OpenCC)
+* [FastAPI](https://fastapi.tiangolo.com/)
+* [Hugging Face Spaces](https://huggingface.co/docs/hub/spaces)
+---
+## 🗣 Languages Supported
+* 🇨🇳 Chinese (recognized as Simplified text, converted to Traditional)
+* 🇺🇸 English
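The Simplified → Traditional step can be sketched as follows, using the `OpenCC("s2t")` converter from `opencc-python-reimplemented`. How `asr_worker.py` actually wires this in is not shown in this README, so the function names and the decision to pass the converter as an argument are assumptions made for illustration.

```python
def make_s2t_converter():
    """Build a Simplified -> Traditional converter.

    Requires the opencc-python-reimplemented package named in this README;
    the import is deferred so this module loads without it.
    """
    from opencc import OpenCC
    return OpenCC("s2t")


def to_traditional(text: str, converter) -> str:
    """Convert recognized text with any object exposing .convert(str)."""
    return converter.convert(text) if text else text
```

Creating the converter once and reusing it for every partial and final result avoids reloading the conversion dictionaries on each message.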
+---
+## 🤝 Contributing
+PRs welcome! Feel free to fork this repo and adapt it to other models or languages.
+---
+## 📜 License
+Apache 2.0