| title: Speechlib API | |
| emoji: π€ | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: docker | |
| app_file: app.py | |
| pinned: false | |
| # Speechlib REST API (ECAPA-TDNN) | |
| REST API for speaker diarization, speaker identification, and speech-to-text (STT) | |
| ## Features | |
| - **Speaker diarization**: separates multiple speakers using pyannote/speaker-diarization-3.1 | |
| - **Speaker identification**: identifies an enrolled speaker using speechbrain ECAPA-TDNN (high accuracy) | |
| - **Speech recognition**: STT using faster-whisper (large-v3-turbo) | |
| ## API Endpoints | |
| ### GET / | |
| Checks the API status. | |
| ### GET /health | |
| Health check. | |
| ### POST /transcribe | |
| Plain STT + speaker diarization (no speaker identification) | |
| **Parameters (multipart/form-data):** | |
| - `audio`: audio file (required) | |
| - `language`: language code (default: ko) | |
| - `hf_token`: HuggingFace token (required) | |
| ### POST /process | |
| Full pipeline: speaker diarization + speaker identification + STT | |
| **Parameters (multipart/form-data):** | |
| - `audio`: audio file to analyze (required) | |
| - `voice_sample`: voice sample file of the speaker to identify (optional) | |
| - `speaker_name`: name of the speaker to identify (default: speaker) | |
| - `language`: language code (default: ko) | |
| - `hf_token`: HuggingFace token (required) | |
| ## Usage Example | |
| ### cURL | |
| ```bash | |
| # Plain STT | |
| curl -X POST "https://YOUR_SPACE.hf.space/transcribe" \ | |
| -F "audio=@audio.wav" \ | |
| -F "language=ko" \ | |
| -F "hf_token=hf_YOUR_TOKEN" | |
| # With speaker identification | |
| curl -X POST "https://YOUR_SPACE.hf.space/process" \ | |
| -F "audio=@conversation.wav" \ | |
| -F "voice_sample=@speaker_sample.wav" \ | |
| -F "speaker_name=홍길동" \ | |
| -F "language=ko" \ | |
| -F "hf_token=hf_YOUR_TOKEN" | |
| ``` | |
| ### Python | |
| ```python | |
| import requests | |
| # Plain STT (speaker diarization only) | |
| with open("audio.wav", "rb") as audio: | |
|     response = requests.post( | |
|         "https://YOUR_SPACE.hf.space/transcribe", | |
|         files={"audio": audio}, | |
|         data={"language": "ko", "hf_token": "hf_YOUR_TOKEN"}, | |
|     ) | |
| print(response.json()) | |
| # With speaker identification | |
| with open("conversation.wav", "rb") as audio, open("speaker_sample.wav", "rb") as sample: | |
|     response = requests.post( | |
|         "https://YOUR_SPACE.hf.space/process", | |
|         files={"audio": audio, "voice_sample": sample}, | |
|         data={ | |
|             "speaker_name": "홍길동", | |
|             "language": "ko", | |
|             "hf_token": "hf_YOUR_TOKEN", | |
|         }, | |
|     ) | |
| print(response.json()) | |
| ``` | |
| ### JavaScript/Node.js | |
| ```javascript | |
| const FormData = require('form-data'); | |
| const fs = require('fs'); | |
| const axios = require('axios'); | |
| // Top-level await is not available in CommonJS, so wrap the call in an async function. | |
| async function main() { | |
|   const form = new FormData(); | |
|   form.append('audio', fs.createReadStream('audio.wav')); | |
|   form.append('language', 'ko'); | |
|   form.append('hf_token', 'hf_YOUR_TOKEN'); | |
|   const response = await axios.post( | |
|     'https://YOUR_SPACE.hf.space/transcribe', | |
|     form, | |
|     { headers: form.getHeaders() } | |
|   ); | |
|   console.log(response.data); | |
| } | |
| main().catch(console.error); | |
| ``` | |
| ## Response Format | |
| ```json | |
| { | |
|   "success": true, | |
|   "segments": [ | |
|     { | |
|       "start": 0.0, | |
|       "end": 2.5, | |
|       "text": "안녕하세요", | |
|       "speaker": "홍길동", | |
|       "similarity": 85.3 | |
|     } | |
|   ], | |
|   "speaker_stats": { | |
|     "홍길동": { | |
|       "count": 10, | |
|       "duration": 45.5 | |
|     } | |
|   }, | |
|   "total_segments": 20 | |
| } | |
| ``` | |
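To illustrate how `speaker_stats` relates to `segments`, here is a minimal Python sketch that recomputes per-speaker totals on the client side. It assumes `count` is the number of segments per speaker and `duration` is the summed segment length in seconds; the service may compute these differently. The `summarize_speakers` helper and the `SPEAKER_01` label are hypothetical names used only for illustration.

```python
from collections import defaultdict


def summarize_speakers(segments):
    """Aggregate segment count and total speaking time (seconds) per speaker."""
    stats = defaultdict(lambda: {"count": 0, "duration": 0.0})
    for seg in segments:
        entry = stats[seg["speaker"]]
        entry["count"] += 1
        entry["duration"] += seg["end"] - seg["start"]
    # Round durations so the summary is easy to read and compare.
    return {
        name: {"count": s["count"], "duration": round(s["duration"], 2)}
        for name, s in stats.items()
    }


# Hypothetical segments in the shape shown in the response above.
sample_segments = [
    {"start": 0.0, "end": 2.5, "text": "안녕하세요", "speaker": "홍길동"},
    {"start": 2.5, "end": 4.0, "text": "...", "speaker": "SPEAKER_01"},
    {"start": 4.0, "end": 6.0, "text": "...", "speaker": "홍길동"},
]

summary = summarize_speakers(sample_segments)
print(summary)
```

With a real response, the same helper could be called as `summarize_speakers(response.json()["segments"])`.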
| ## Notes | |
| - A segment is matched to the enrolled speaker when its ECAPA-TDNN similarity is 25% or higher | |
| - The GPU is used automatically when one is available | |
| - Supported audio formats: wav, mp3, m4a, ogg, flac, aac | |
| - API docs: https://YOUR_SPACE.hf.space/docs | |
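The threshold rule in the notes can be sketched as follows. This is an illustration only: it assumes the similarity score is a cosine similarity between speaker embeddings scaled to a percentage, which this README does not confirm. The function names, the fallback label, and the toy two-dimensional vectors are all hypothetical; real ECAPA-TDNN embeddings have many more dimensions.

```python
import math

SIMILARITY_THRESHOLD = 25.0  # percent, taken from the note above


def cosine_similarity_percent(a, b):
    """Cosine similarity of two equal-length vectors, scaled to a percentage."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 0.0
    return 100.0 * dot / norm


def label_speaker(segment_embedding, enrolled_embedding, speaker_name, fallback_label):
    """Return (label, score): the enrolled name if the score meets the threshold."""
    score = cosine_similarity_percent(segment_embedding, enrolled_embedding)
    if score >= SIMILARITY_THRESHOLD:
        return speaker_name, score
    return fallback_label, score


# Toy embeddings: identical vectors match, orthogonal vectors do not.
print(label_speaker([1.0, 0.0], [1.0, 0.0], "speaker", "SPEAKER_00"))
print(label_speaker([1.0, 0.0], [0.0, 1.0], "speaker", "SPEAKER_00"))
```

A low threshold such as 25% favors matching the enrolled speaker over leaving a segment unlabeled, so the `similarity` value in each segment is worth checking when accuracy matters.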