---
title: Speechlib API
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
---
# Speechlib REST API (ECAPA-TDNN)
A REST API for speaker diarization, speaker identification, and speech-to-text (STT).
## Features
- **Speaker diarization**: separates multiple speakers with pyannote/speaker-diarization-3.1
- **Speaker identification**: identifies enrolled speakers with speechbrain ECAPA-TDNN (high precision)
- **Speech recognition**: STT with faster-whisper (large-v3-turbo)
## API Endpoints
### GET /
Checks the API status.
### GET /health
Health check.
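A client can probe this endpoint before uploading audio. The following is a minimal sketch using only the Python standard library. `BASE_URL`, `health_url`, and `is_healthy` are hypothetical names introduced for illustration, and treating any HTTP 200 as "healthy" is an assumption, since the response body of `/health` is not documented here.

```python
import urllib.request

# Placeholder, replace with the URL of your own Space.
BASE_URL = "https://YOUR_SPACE.hf.space"


def health_url(base_url: str) -> str:
    """Build the /health URL, tolerating a trailing slash on the base URL."""
    return base_url.rstrip("/") + "/health"


def is_healthy(base_url: str, timeout: float = 10.0) -> bool:
    """Return True if GET /health answers with HTTP 200.

    Assumption: the status code alone is used; the body is not inspected.
    """
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, and timeouts;
        # ValueError covers malformed URLs.
        return False
```

Calling `is_healthy(BASE_URL)` returns `False` rather than raising when the Space cannot be reached, which makes it convenient for a retry loop.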
### POST /transcribe
Simple STT + speaker diarization (no speaker identification)
**Parameters (multipart/form-data):**
- `audio`: audio file (required)
- `language`: language code (default: ko)
- `hf_token`: HuggingFace token (required)
### POST /process
Full pipeline: speaker diarization + speaker identification + STT
**Parameters (multipart/form-data):**
- `audio`: audio file to analyze (required)
- `voice_sample`: voice sample file of the speaker (optional)
- `speaker_name`: name of the speaker to identify (default: speaker)
- `language`: language code (default: ko)
- `hf_token`: HuggingFace token (required)
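The required and optional parameters above can be assembled on the client side roughly as follows. This is a sketch, not part of the API: `build_process_request` is a hypothetical helper, and its defaults simply mirror the documented ones. It returns file paths rather than open handles so that the caller decides when to open and close the files.

```python
from typing import Dict, Optional, Tuple


def build_process_request(
    audio_path: str,
    hf_token: str,
    voice_sample_path: Optional[str] = None,
    speaker_name: str = "speaker",
    language: str = "ko",
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (form_fields, file_paths) for POST /process.

    `audio` and `hf_token` are required; `voice_sample` is omitted
    entirely when no sample path is given.
    """
    if not audio_path:
        raise ValueError("audio is required")
    if not hf_token:
        raise ValueError("hf_token is required")

    fields = {
        "speaker_name": speaker_name,
        "language": language,
        "hf_token": hf_token,
    }
    file_paths = {"audio": audio_path}
    if voice_sample_path is not None:
        file_paths["voice_sample"] = voice_sample_path
    return fields, file_paths
```

To send the request, open each path in binary mode and pass the handles as `files=` and the fields as `data=` to `requests.post`.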
## Usage Example
### cURL
```bash
# Simple STT
curl -X POST "https://YOUR_SPACE.hf.space/transcribe" \
  -F "audio=@audio.wav" \
  -F "language=ko" \
  -F "hf_token=hf_YOUR_TOKEN"

# With speaker identification
curl -X POST "https://YOUR_SPACE.hf.space/process" \
  -F "audio=@conversation.wav" \
  -F "voice_sample=@speaker_sample.wav" \
  -F "speaker_name=홍길동" \
  -F "language=ko" \
  -F "hf_token=hf_YOUR_TOKEN"
```
### Python
```python
import requests

# Simple STT; the context manager closes the file after the upload
with open("audio.wav", "rb") as audio:
    response = requests.post(
        "https://YOUR_SPACE.hf.space/transcribe",
        files={"audio": audio},
        data={"language": "ko", "hf_token": "hf_YOUR_TOKEN"},
    )
print(response.json())

# With speaker identification
with open("conversation.wav", "rb") as audio, open("speaker_sample.wav", "rb") as sample:
    response = requests.post(
        "https://YOUR_SPACE.hf.space/process",
        files={"audio": audio, "voice_sample": sample},
        data={
            "speaker_name": "홍길동",
            "language": "ko",
            "hf_token": "hf_YOUR_TOKEN",
        },
    )
print(response.json())
```
### JavaScript/Node.js
```javascript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

// Top-level await is not available in CommonJS, so wrap the call in an async function.
async function main() {
  const form = new FormData();
  form.append('audio', fs.createReadStream('audio.wav'));
  form.append('language', 'ko');
  form.append('hf_token', 'hf_YOUR_TOKEN');

  const response = await axios.post(
    'https://YOUR_SPACE.hf.space/transcribe',
    form,
    { headers: form.getHeaders() }
  );
  console.log(response.data);
}

main().catch(console.error);
```
## Response Format
```json
{
  "success": true,
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "안녕하세요",
      "speaker": "홍길동",
      "similarity": 85.3
    }
  ],
  "speaker_stats": {
    "홍길동": {
      "count": 10,
      "duration": 45.5
    }
  },
  "total_segments": 20
}
```
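A response of this shape can be turned into a readable transcript with a few lines of Python. The sketch below makes some assumptions that the format above does not state explicitly: `start`, `end`, and `duration` are taken to be seconds, `similarity` is taken to be a percentage, and `similarity` is treated as possibly absent (for example on `/transcribe`, which does no speaker identification). `format_segments` and `speaking_time` are hypothetical helper names.

```python
from typing import Any, Dict, List


def format_segments(result: Dict[str, Any]) -> List[str]:
    """Render each segment as '[start - end] speaker: text (similarity%)'."""
    if not result.get("success"):
        raise ValueError("the API reported a failure")

    lines = []
    for seg in result.get("segments", []):
        speaker = seg.get("speaker", "unknown")
        line = f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {speaker}: {seg['text']}"
        # Assumption: similarity may be missing when no identification was run.
        if "similarity" in seg:
            line += f" ({seg['similarity']:.1f}%)"
        lines.append(line)
    return lines


def speaking_time(result: Dict[str, Any]) -> Dict[str, float]:
    """Map each speaker name to its total duration from speaker_stats."""
    stats = result.get("speaker_stats", {})
    return {name: entry["duration"] for name, entry in stats.items()}
```

Both helpers take the dictionary returned by `response.json()` in the Python usage example.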
## Notes
- A speaker is matched when the ECAPA-TDNN similarity is at or above the 25% threshold
- The GPU is used automatically when one is available
- Supported audio formats: wav, mp3, m4a, ogg, flac, aac
- API documentation: https://YOUR_SPACE.hf.space/docs
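Two of these notes lend themselves to simple client-side checks: rejecting unsupported files before uploading, and filtering segments by similarity. The sketch below is an illustration only. The helper names are hypothetical, the format check looks at the file extension rather than the actual audio content, and the threshold comparison assumes `similarity` is reported as a percentage, as in the response example.

```python
from pathlib import Path
from typing import Any, Dict, List

# Formats listed in the notes above.
SUPPORTED_FORMATS = {"wav", "mp3", "m4a", "ogg", "flac", "aac"}

# Match threshold from the notes above, in percent.
MATCH_THRESHOLD = 25.0


def is_supported_audio(path: str) -> bool:
    """Check the file extension (case-insensitive) against the supported formats."""
    return Path(path).suffix.lower().lstrip(".") in SUPPORTED_FORMATS


def matched_segments(
    segments: List[Dict[str, Any]], threshold: float = MATCH_THRESHOLD
) -> List[Dict[str, Any]]:
    """Keep segments whose similarity is at or above the threshold.

    Segments without a similarity value are treated as unmatched.
    """
    return [s for s in segments if s.get("similarity", 0.0) >= threshold]
```

Passing a higher `threshold` than the default lets a client apply a stricter cut than the server without changing the API.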