Shuwei Hou committed on Jun 11, 2025

Commit 75814d9

1 Parent(s): a3b7803

update_readme

Files changed (1)

README.md +107 -9

@@ -1,12 +1,110 @@
 ---
-title: SATE
-emoji: ⚡
-colorFrom: purple
-colorTo: blue
-sdk: docker
-pinned: false
-license: apache-2.0
-short_description: Speech Annotatin and Transcription Enhancer
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# SATE: Speech Annotation and Transcription Enhancer (MVP)
+This is the **Minimum Viable Product (MVP)** version of **SATE**, a unified pipeline framework that integrates audio segmentation, speaker diarization, transcription, and linguistic annotation into a single application.
+---
+## Overview
+- **Main Entry**: `main_socket.py`
+- **Input**: Entire audio file (`.mp3`, `.wav`, etc.)
+- **Output**: Word-level timestamped transcription with annotations such as pauses, repetitions, filler words, mispronunciations and syllables.
+- **Preprocessing**:
+  - Audio segmentation
+  - Speaker diarization
+  - Transcription using CrisperWhisper
+- **Annotation**:
+  - Pause
+  - Repetition
+  - Filler Words
+  - Syllable Structure
+  - Mispronunciation Sequence (PLM container is needed)
+- **Feature Extraction**
+---
+## Getting Started
+#### Installation
+##### 1. Clone the repo
+```bash
+git clone https://github.com/SwenHou/SATE.git
+```
+##### 2. Install packages
+```bash
+conda env create -f environment_sate_0.11.yml
+```
+##### 3. Start the inference API on your local machine
+Set up your Hugging Face token:
+```bash
+export HF_TOKEN=<your_token_here>
+```
+Start API:
+```bash
+python main_socket.py
+```
+#### Usage
+##### 1. Get Annotations
+```bash
+curl -X POST http://localhost:7860/process \
+  -F "audio_file=@<your local path to audio file>" \
+  -F "device=cuda" \
+  -F "pause_threshold=0.25"
+```
+The annotation file is also available in `SATE/session_data/`.
 ---
+## 🐳 Use Docker
+### 1. Build Docker Image
+In the `Dockerfile`, delete `ENV HF_HOME=/data/.huggingface` and add `ENV HF_TOKEN=<your_token_here>`.
+Then run the following command in the project root directory:
+```bash
+docker build -t sate_0.11 .
+```
+### 2. Run the Docker Container
+```bash
+docker run --gpus all -it --rm \
+  -p 7860:7860 \
+  sate_0.11
+```
+### 3. Usage
+Usage is the same as with the local API, but the annotation file is deleted when the container exits.
+```bash
+curl -X POST http://localhost:7860/process \
+  -F "audio_file=@<your local path to audio file>" \
+  -F "device=cuda" \
+  -F "pause_threshold=0.25"
+```
 ---
+## 🤗 Use API from Hugging Face Spaces
+```bash
+curl -X POST https://Sven33-SATE.hf.space/process \
+  -F "audio_file=@<your local path to audio file>" \
+  -F "device=cuda" \
+  -F "pause_threshold=0.25"
+```
+##### Hugging Face Space URL: `https://huggingface.co/spaces/Sven33/SATE`
+Due to Hugging Face's GPU scheduling latency, the first request after a cold start takes around 5-8 minutes. If the Space receives no requests within five minutes of startup, it goes back into sleep mode.
+For a 10-minute audio sample, inference on a T4 small GPU takes roughly two minutes or less.
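
The `curl` calls added in this README can also be issued from Python. The following is a minimal sketch, not part of the commit. It assumes the third-party `requests` package is installed and that the `/process` endpoint accepts the same three multipart fields shown in the `curl` examples (`audio_file`, `device`, `pause_threshold`). The helper names `build_process_request` and `post_audio` are hypothetical and do not exist in SATE, and the 900-second timeout is an arbitrary choice meant to cover the cold start described above.

```python
from pathlib import Path


def build_process_request(base_url, device="cuda", pause_threshold=0.25):
    """Return the endpoint URL and form fields mirroring the README's curl call."""
    url = base_url.rstrip("/") + "/process"
    # Multipart form values are sent as strings, like curl's -F "key=value".
    data = {"device": device, "pause_threshold": str(pause_threshold)}
    return url, data


def post_audio(audio_path, base_url="http://localhost:7860", **options):
    """Upload one audio file and return the raw response body as text."""
    # Third-party dependency; imported lazily so the helper above stays stdlib-only.
    import requests

    url, data = build_process_request(base_url, **options)
    with Path(audio_path).open("rb") as handle:
        # "audio_file" is the field name used in the curl examples.
        response = requests.post(
            url,
            data=data,
            files={"audio_file": handle},
            timeout=900,  # assumption: generous limit for a cold start plus inference
        )
    response.raise_for_status()
    # The README does not document the response schema, so it is returned unparsed.
    return response.text
```

Calling `post_audio("sample.wav")` targets the local API or Docker container; passing `base_url="https://Sven33-SATE.hf.space"` targets the hosted Space instead.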