---
title: Speechlib API
emoji: 🎀
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
---

# Speechlib REST API (ECAPA-TDNN)

A REST API for speaker diarization, speaker identification, and speech-to-text (STT).

## Features

- **Speaker diarization**: separates multiple speakers using pyannote/speaker-diarization-3.1
- **Speaker identification**: matches enrolled speakers with the speechbrain ECAPA-TDNN model (high precision)
- **Speech recognition**: STT using faster-whisper (large-v3-turbo)

## API Endpoints

### GET /

Check API status.

### GET /health

Health check.

### POST /transcribe

STT plus speaker diarization (no speaker identification).

**Parameters (multipart/form-data):**

- `audio`: audio file (required)
- `language`: language code (default: ko)
- `hf_token`: HuggingFace token (required)

### POST /process

Full pipeline: speaker diarization + speaker identification + STT.

**Parameters (multipart/form-data):**

- `audio`: audio file to analyze (required)
- `voice_sample`: reference voice sample of the speaker to identify (optional)
- `speaker_name`: name to assign to the identified speaker (default: speaker)
- `language`: language code (default: ko)
- `hf_token`: HuggingFace token (required)

## Usage Examples

### cURL

```bash
# STT only
curl -X POST "https://YOUR_SPACE.hf.space/transcribe" \
  -F "audio=@audio.wav" \
  -F "language=ko" \
  -F "hf_token=hf_YOUR_TOKEN"

# With speaker identification
curl -X POST "https://YOUR_SPACE.hf.space/process" \
  -F "audio=@conversation.wav" \
  -F "voice_sample=@speaker_sample.wav" \
  -F "speaker_name=홍길동" \
  -F "language=ko" \
  -F "hf_token=hf_YOUR_TOKEN"
```

### Python

```python
import requests

# STT only
response = requests.post(
    "https://YOUR_SPACE.hf.space/transcribe",
    files={"audio": open("audio.wav", "rb")},
    data={"language": "ko", "hf_token": "hf_YOUR_TOKEN"},
)
print(response.json())

# With speaker identification
response = requests.post(
    "https://YOUR_SPACE.hf.space/process",
    files={
        "audio": open("conversation.wav", "rb"),
        "voice_sample": open("speaker_sample.wav", "rb"),
    },
    data={
        "speaker_name": "홍길동",
        "language": "ko",
        "hf_token": "hf_YOUR_TOKEN",
    },
)
print(response.json())
```

### JavaScript/Node.js
```javascript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

// Wrap in an async IIFE: top-level await is not available in CommonJS modules
(async () => {
  const form = new FormData();
  form.append('audio', fs.createReadStream('audio.wav'));
  form.append('language', 'ko');
  form.append('hf_token', 'hf_YOUR_TOKEN');

  const response = await axios.post(
    'https://YOUR_SPACE.hf.space/transcribe',
    form,
    { headers: form.getHeaders() }
  );
  console.log(response.data);
})();
```

## Response Format

```json
{
  "success": true,
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "μ•ˆλ…•ν•˜μ„Έμš”",
      "speaker": "홍길동",
      "similarity": 85.3
    }
  ],
  "speaker_stats": {
    "홍길동": {
      "count": 10,
      "duration": 45.5
    }
  },
  "total_segments": 20
}
```

## Notes

- ECAPA-TDNN matches a speaker when the similarity score is at or above the 25% threshold
- GPU is used automatically when available
- Supported audio formats: wav, mp3, m4a, ogg, flac, aac
- API docs: https://YOUR_SPACE.hf.space/docs
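As a client-side sketch of working with this response shape: the snippet below regroups `segments` by speaker and recomputes per-speaker counts and durations, dropping segments whose `similarity` falls below the 25% matching threshold mentioned in the notes. The field names (`segments`, `speaker`, `similarity`, `start`, `end`) come from the sample response above; the `summarize` helper itself is hypothetical, not part of the API.

```python
# Sketch: post-process a /process response client-side, assuming the
# response schema shown above (segments with start/end/speaker/similarity).

def summarize(result, min_similarity=25.0):
    """Group segments by speaker, keeping only confident matches."""
    stats = {}
    for seg in result.get("segments", []):
        if seg.get("similarity", 0.0) < min_similarity:
            continue  # below the ECAPA-TDNN matching threshold
        entry = stats.setdefault(seg["speaker"], {"count": 0, "duration": 0.0})
        entry["count"] += 1
        entry["duration"] += seg["end"] - seg["start"]
    return stats

example = {
    "segments": [
        {"start": 0.0, "end": 2.5, "text": "μ•ˆλ…•ν•˜μ„Έμš”",
         "speaker": "홍길동", "similarity": 85.3},
        {"start": 2.5, "end": 4.0, "text": "...",
         "speaker": "speaker", "similarity": 12.0},
    ]
}
print(summarize(example))  # only 홍길동 passes the 25% threshold
```

This mirrors the server's `speaker_stats` field, but lets you apply a stricter similarity cutoff than the API's default.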