---
title: SATE
emoji: ⚡
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
short_description: Speech Annotation and Transcription Enhancer
---

# SATE: Speech Annotation and Transcription Enhancer (MVP)

This is the **Minimum Viable Product (MVP)** version of **SATE**, a unified pipeline framework that integrates audio segmentation, speaker diarization, transcription, and linguistic annotation into a single application.

---

## Overview

- **Main Entry**: `main_socket.py`
- **Input**: Entire audio file (`.mp3`, `.wav`, etc.)
- **Output**: Word-level timestamped transcription with annotations such as pauses, repetitions, filler words, mispronunciations, and syllables.

- **Preprocessing**:
  - Audio segmentation
  - Speaker diarization
  - Transcription using CrisperWhisper

- **Annotation**:
  - Pause
  - Repetition
  - Filler Words
  - Syllable Structure
  - Mispronunciation Sequence (requires the PLM container)

- **Feature Extraction**

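
The pause annotation and the `pause_threshold` request parameter suggest that pauses are derived from the gaps between word-level timestamps. The following is a minimal sketch of that idea, not SATE's actual implementation: the `word`/`start`/`end` field names, the `find_pauses` helper, and the `>=` comparison are all assumptions made for illustration.

```python
from typing import Dict, List


def find_pauses(words: List[Dict], pause_threshold: float = 0.25) -> List[Dict]:
    """Return the silent gaps between consecutive words that last at
    least `pause_threshold` seconds (hypothetical helper)."""
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        gap = nxt["start"] - prev["end"]
        if gap >= pause_threshold:
            pauses.append(
                {"after": prev["word"], "before": nxt["word"], "duration": round(gap, 3)}
            )
    return pauses


# Hypothetical word-level output; real SATE field names may differ.
words = [
    {"word": "I", "start": 0.00, "end": 0.12},
    {"word": "um", "start": 0.50, "end": 0.80},
    {"word": "think", "start": 0.90, "end": 1.30},
]
pauses = find_pauses(words)
print(pauses)  # one pause, between "I" and "um"
```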

---

## Getting Started

#### Installation

##### 1. Clone the repo
```bash
git clone https://github.com/SwenHou/SATE.git
cd SATE
```
##### 2. Install packages
```bash
conda env create -f environment_sate_0.11.yml
```
##### 3. Start the inference API on your local machine
Set your Hugging Face token:
```bash
export HF_TOKEN=<your_token_here>
```
Start the API:
```bash
python main_socket.py
```
#### Usage
##### 1. Get Annotations

```bash
curl -X POST http://localhost:7860/process \
  -F "audio_file=@<your local path to audio file>" \
  -F "device=cuda" \
  -F "pause_threshold=0.25"
```
The annotation file is also available in `SATE/session_data/`.

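
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the form fields of the `curl` call (`audio_file`, `device`, `pause_threshold`). The `build_multipart` and `annotate` helpers and the `timeout_s` default are hypothetical, and because the response format is not documented here, the raw response bytes are returned without parsing.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Tuple


def build_multipart(fields: Dict[str, str], file_field: str,
                    filename: str, file_bytes: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one file as a multipart/form-data body.
    Returns (body, Content-Type header value)."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        chunks.append(part.encode("utf-8"))
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    )
    chunks.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def annotate(audio_path: str, url: str = "http://localhost:7860/process",
             device: str = "cuda", pause_threshold: float = 0.25,
             timeout_s: float = 600.0) -> bytes:
    """POST an audio file to /process and return the raw response body."""
    path = Path(audio_path)
    body, content_type = build_multipart(
        {"device": device, "pause_threshold": str(pause_threshold)},
        "audio_file", path.name, path.read_bytes(),
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read()


# Example (needs a running server and a real file):
# raw = annotate("sample.wav")
```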

---

## 🐳 Use Docker

### 1. Build Docker Image
In the `Dockerfile`, delete `ENV HF_HOME=/data/.huggingface` and add `ENV HF_TOKEN=<your_token_here>`.

Run the following command in the project root directory:

```bash
docker build -t sate_0.11 .
```

### 2. Run the Docker Container
```bash
docker run --gpus all -it --rm \
  -p 7860:7860 \
  sate_0.11
```

### 3. Usage
Usage is the same as with the local API, but the annotation file is deleted when the container exits.

```bash
curl -X POST http://localhost:7860/process \
  -F "audio_file=@<your local path to audio file>" \
  -F "device=cuda" \
  -F "pause_threshold=0.25"
```

---

## 🤗 Use API from Hugging Face Spaces

```bash
curl -X POST https://Sven33-SATE.hf.space/process \
  -F "audio_file=@<your local path to audio file>" \
  -F "device=cuda" \
  -F "pause_threshold=0.25"
```
##### Hugging Face Space URL: `https://huggingface.co/spaces/Sven33/SATE`

Because of Hugging Face's GPU scheduling latency, the first request after a cold start takes around 5-8 minutes. If the Space receives no request within five minutes after startup, it goes back into sleep mode.

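
Because the first request can take 5-8 minutes while the Space wakes up, a client may want to retry instead of failing on the first error. The sketch below is one way to do that under stated assumptions: how a sleeping Space actually responds is not documented here, so treating connection failures and HTTP 502/503/504 as "still waking up" is a guess, and the `call_with_cold_start_retry` helper and its defaults (10-minute budget, 30-second interval) are hypothetical.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def is_retryable(exc: Exception) -> bool:
    """Assumption: gateway-style HTTP errors and connection failures
    mean the service is still starting; anything else is a real error."""
    if isinstance(exc, urllib.error.HTTPError):
        return exc.code in (502, 503, 504)
    return isinstance(exc, (urllib.error.URLError, TimeoutError, ConnectionError))


def call_with_cold_start_retry(
    send: Callable[[], T],
    max_wait_s: float = 600.0,       # a bit above the 5-8 minute startup time
    retry_interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call `send()` until it succeeds, a non-retryable error occurs,
    or the next retry would exceed the time budget."""
    deadline = clock() + max_wait_s
    while True:
        try:
            return send()
        except Exception as exc:
            if not is_retryable(exc) or clock() + retry_interval_s > deadline:
                raise
            sleep(retry_interval_s)
```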
For a 10-minute audio sample, inference on a T4 small GPU takes roughly two minutes or less.
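
As a rough planning aid, that figure corresponds to a real-time factor (processing time divided by audio duration) of about 0.2 or lower. The sketch below extrapolates it to other durations. It assumes processing time scales linearly with audio length, which is not stated above, and it ignores cold-start time; the function names are hypothetical.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration (lower is faster)."""
    return processing_s / audio_s


# From the figure above: about 2 minutes of inference for 10 minutes of audio.
rtf_upper_bound = real_time_factor(processing_s=2 * 60, audio_s=10 * 60)


def estimate_processing_s(audio_s: float, rtf: float = rtf_upper_bound) -> float:
    """Rough estimate, assuming linear scaling with audio length."""
    return audio_s * rtf


estimate_s = estimate_processing_s(45 * 60)  # a 45-minute recording
print(f"RTF about {rtf_upper_bound:.2f}; 45-minute file: roughly {estimate_s / 60:.0f} minutes")
```

Treat the result as an order-of-magnitude estimate only; actual throughput depends on the audio content and the hardware.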