---
title: Speechlib API
emoji: 🎀
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
---
# Speechlib REST API (ECAPA-TDNN)
ν™”μž 뢄리(Speaker Diarization) + ν™”μž 식별(Speaker Identification) + μŒμ„± 인식(STT) REST API
## Features
- **ν™”μž 뢄리**: pyannote/speaker-diarization-3.1둜 μ—¬λŸ¬ ν™”μž ꡬ뢄
- **ν™”μž 식별**: speechbrain ECAPA-TDNN으둜 λ“±λ‘λœ ν™”μž 식별 (κ³ μ •λ°€)
- **μŒμ„± 인식**: faster-whisper (large-v3-turbo)λ₯Ό μ‚¬μš©ν•œ STT
## API Endpoints
### GET /
API μƒνƒœ 확인
### GET /health
Health check.
### POST /transcribe
Simple STT + speaker diarization (no speaker identification)
**Parameters (multipart/form-data):**
- `audio`: audio file (required)
- `language`: language code (default: ko)
- `hf_token`: HuggingFace token (required)
### POST /process
Full pipeline: speaker diarization + speaker identification + STT
**Parameters (multipart/form-data):**
- `audio`: audio file to analyze (required)
- `voice_sample`: voice sample of the speaker to identify (optional)
- `speaker_name`: name to assign to the identified speaker (default: speaker)
- `language`: language code (default: ko)
- `hf_token`: HuggingFace token (required)
## Usage Example
### cURL
```bash
# Simple STT
curl -X POST "https://YOUR_SPACE.hf.space/transcribe" \
-F "audio=@audio.wav" \
-F "language=ko" \
-F "hf_token=hf_YOUR_TOKEN"
# ν™”μž 식별 포함
curl -X POST "https://YOUR_SPACE.hf.space/process" \
-F "audio=@conversation.wav" \
-F "voice_sample=@speaker_sample.wav" \
-F "speaker_name=홍길동" \
-F "language=ko" \
-F "hf_token=hf_YOUR_TOKEN"
```
### Python
```python
import requests

# Simple STT
with open("audio.wav", "rb") as f:
    response = requests.post(
        "https://YOUR_SPACE.hf.space/transcribe",
        files={"audio": f},
        data={"language": "ko", "hf_token": "hf_YOUR_TOKEN"},
    )
print(response.json())

# With speaker identification
with open("conversation.wav", "rb") as audio, open("speaker_sample.wav", "rb") as sample:
    response = requests.post(
        "https://YOUR_SPACE.hf.space/process",
        files={"audio": audio, "voice_sample": sample},
        data={
            "speaker_name": "홍길동",
            "language": "ko",
            "hf_token": "hf_YOUR_TOKEN",
        },
    )
print(response.json())
```
### JavaScript/Node.js
```javascript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

// `await` needs an async context in a CommonJS script.
async function transcribe() {
  const form = new FormData();
  form.append('audio', fs.createReadStream('audio.wav'));
  form.append('language', 'ko');
  form.append('hf_token', 'hf_YOUR_TOKEN');

  const response = await axios.post(
    'https://YOUR_SPACE.hf.space/transcribe',
    form,
    { headers: form.getHeaders() }
  );
  console.log(response.data);
}

transcribe();
```
## Response Format
```json
{
"success": true,
"segments": [
{
"start": 0.0,
"end": 2.5,
"text": "μ•ˆλ…•ν•˜μ„Έμš”",
"speaker": "홍길동",
"similarity": 85.3
}
],
"speaker_stats": {
"홍길동": {
"count": 10,
"duration": 45.5
}
},
"total_segments": 20
}
```
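The `speaker_stats` block is derivable from `segments`, so clients can also recompute it locally, e.g. to filter by speaker first. A minimal sketch, using only the field names shown in the example response above (the `response` dict here is sample data, not real output):

```python
from collections import defaultdict

# Sample response shaped like the documented format above.
response = {
    "success": True,
    "segments": [
        {"start": 0.0, "end": 2.5, "text": "μ•ˆλ…•ν•˜μ„Έμš”", "speaker": "홍길동", "similarity": 85.3},
        {"start": 2.5, "end": 4.0, "text": "...", "speaker": "SPEAKER_01", "similarity": 12.1},
    ],
}

# Recompute per-speaker segment counts and speaking time from the segments.
stats = defaultdict(lambda: {"count": 0, "duration": 0.0})
for seg in response["segments"]:
    entry = stats[seg["speaker"]]
    entry["count"] += 1
    entry["duration"] += seg["end"] - seg["start"]

print(dict(stats))
```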
## Notes
- ECAPA-TDNN matches a segment to an enrolled speaker when the similarity score is 25% or higher
- The GPU is used automatically when one is available
- Supported audio formats: wav, mp3, m4a, ogg, flac, aac
- API docs: https://YOUR_SPACE.hf.space/docs
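The 25% threshold in the first note can be illustrated with a minimal sketch. The mapping of embedding cosine similarity to a percentage is an assumption about how the service scores matches, not the actual server code, and `match_speaker` is a hypothetical helper:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def match_speaker(segment_emb, enrolled_emb, threshold=25.0):
    """Assumed scoring: cosine similarity scaled to percent, matched at >= threshold."""
    similarity = cosine_similarity(segment_emb, enrolled_emb) * 100.0
    return similarity >= threshold, similarity
```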