---
title: Speechlib API
emoji: 🎀
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
---

# Speechlib REST API (ECAPA-TDNN)

ν™”μž 뢄리(Speaker Diarization) + ν™”μž 식별(Speaker Identification) + μŒμ„± 인식(STT) REST API

## Features

- **ν™”μž 뢄리**: pyannote/speaker-diarization-3.1둜 μ—¬λŸ¬ ν™”μž ꡬ뢄
- **ν™”μž 식별**: speechbrain ECAPA-TDNN으둜 λ“±λ‘λœ ν™”μž 식별 (κ³ μ •λ°€)
- **μŒμ„± 인식**: faster-whisper (large-v3-turbo)λ₯Ό μ‚¬μš©ν•œ STT

## API Endpoints

### GET /
Checks the API status.

### GET /health
Health check.

### POST /transcribe
Simple STT + speaker diarization (no speaker identification)

**Parameters (multipart/form-data):**
- `audio`: audio file (required)
- `language`: language code (default: ko)
- `hf_token`: HuggingFace token (required)

### POST /process
Full pipeline: speaker diarization + speaker identification + STT

**Parameters (multipart/form-data):**
- `audio`: audio file to analyze (required)
- `voice_sample`: speaker sample file (optional)
- `speaker_name`: name of the speaker to identify (default: speaker)
- `language`: language code (default: ko)
- `hf_token`: HuggingFace token (required)
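
The parameter list above can be collected by a small client-side helper before the request is sent. This is a minimal sketch: `build_process_fields` is a hypothetical name, not part of the API, and it only mirrors the required/optional flags and defaults documented above without doing any network I/O.

```python
from typing import Dict, Optional, Tuple


def build_process_fields(
    audio_path: str,
    hf_token: str,
    voice_sample_path: Optional[str] = None,
    speaker_name: str = "speaker",
    language: str = "ko",
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Group the documented /process parameters into (file_paths, data).

    Hypothetical client-side helper; it performs no network I/O.
    """
    if not audio_path:
        raise ValueError("audio is required")
    if not hf_token:
        raise ValueError("hf_token is required")

    file_paths = {"audio": audio_path}
    if voice_sample_path is not None:
        # voice_sample is optional, so include it only when provided
        file_paths["voice_sample"] = voice_sample_path

    data = {
        "speaker_name": speaker_name,
        "language": language,
        "hf_token": hf_token,
    }
    return file_paths, data


file_paths, data = build_process_fields(
    "conversation.wav", "hf_YOUR_TOKEN", voice_sample_path="speaker_sample.wav"
)
print(file_paths)
print(data)
```

The returned `file_paths` can be opened and passed as `files=` and `data` as `data=` in a `requests.post` call.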

## Usage Example

### cURL

```bash
# Simple STT
curl -X POST "https://YOUR_SPACE.hf.space/transcribe" \
  -F "audio=@audio.wav" \
  -F "language=ko" \
  -F "hf_token=hf_YOUR_TOKEN"

# ν™”μž 식별 포함
curl -X POST "https://YOUR_SPACE.hf.space/process" \
  -F "audio=@conversation.wav" \
  -F "voice_sample=@speaker_sample.wav" \
  -F "speaker_name=홍길동" \
  -F "language=ko" \
  -F "hf_token=hf_YOUR_TOKEN"
```

### Python

```python
import requests

# Simple STT (the with-block closes the file once the upload finishes)
with open("audio.wav", "rb") as audio:
    response = requests.post(
        "https://YOUR_SPACE.hf.space/transcribe",
        files={"audio": audio},
        data={"language": "ko", "hf_token": "hf_YOUR_TOKEN"},
    )
print(response.json())

# With speaker identification
with open("conversation.wav", "rb") as audio, open("speaker_sample.wav", "rb") as sample:
    response = requests.post(
        "https://YOUR_SPACE.hf.space/process",
        files={"audio": audio, "voice_sample": sample},
        data={
            "speaker_name": "홍길동",
            "language": "ko",
            "hf_token": "hf_YOUR_TOKEN",
        },
    )
print(response.json())
```

### JavaScript/Node.js

```javascript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

// Top-level await is not available in CommonJS, so the call is wrapped in an async function.
async function main() {
  const form = new FormData();
  form.append('audio', fs.createReadStream('audio.wav'));
  form.append('language', 'ko');
  form.append('hf_token', 'hf_YOUR_TOKEN');

  const response = await axios.post(
    'https://YOUR_SPACE.hf.space/transcribe',
    form,
    { headers: form.getHeaders() }
  );
  console.log(response.data);
}

main().catch(console.error);
```

## Response Format

```json
{
  "success": true,
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "안녕하세요",
      "speaker": "홍길동",
      "similarity": 85.3
    }
  ],
  "speaker_stats": {
    "홍길동": {
      "count": 10,
      "duration": 45.5
    }
  },
  "total_segments": 20
}
```
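
A response of this shape can be turned into a readable transcript with a few lines of Python. The sketch below is illustrative only: `summarize_response` is a hypothetical helper, and it assumes that `similarity` may be missing from a segment (for example when no speaker identification is performed) and that a failed request sets `success` to false; neither behavior is confirmed here.

```python
from typing import List


def summarize_response(payload: dict) -> List[str]:
    """Return one formatted line per segment of a response payload."""
    if not payload.get("success"):
        # Assumption: failures are signalled through the "success" flag
        raise ValueError("request did not succeed")

    lines = []
    for seg in payload.get("segments", []):
        label = seg["speaker"]
        if seg.get("similarity") is not None:
            # Append the similarity score when the segment carries one
            label += f" ({seg['similarity']:.1f})"
        lines.append(f"[{seg['start']:.1f}-{seg['end']:.1f}] {label}: {seg['text']}")
    return lines


sample = {
    "success": True,
    "segments": [
        {"start": 0.0, "end": 2.5, "text": "안녕하세요", "speaker": "홍길동", "similarity": 85.3}
    ],
    "speaker_stats": {"홍길동": {"count": 10, "duration": 45.5}},
    "total_segments": 20,
}
for line in summarize_response(sample):
    print(line)
```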

## Notes

- ECAPA-TDNN matches a speaker when the similarity is at or above the 25% threshold
- The GPU is used automatically when one is available
- Supported audio formats: wav, mp3, m4a, ogg, flac, aac
- API documentation: https://YOUR_SPACE.hf.space/docs
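
The format list and the similarity threshold from the notes above can be mirrored on the client side, for example to reject unsupported files before uploading. This is a rough sketch under stated assumptions: the helper names are hypothetical, the extension check says nothing about how the server validates input, and the threshold comparison assumes `similarity` is reported on the same percent scale as the 25% figure.

```python
from pathlib import Path

# Extensions taken from the supported-format note
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".ogg", ".flac", ".aac"}

# Threshold from the note, assumed to be on the same scale as "similarity"
MATCH_THRESHOLD = 25.0


def is_supported_audio(filename: str) -> bool:
    """Check the file extension against the documented formats (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def is_match(similarity: float, threshold: float = MATCH_THRESHOLD) -> bool:
    """Return True when the similarity is at or above the threshold."""
    return similarity >= threshold


print(is_supported_audio("meeting.MP3"))
print(is_match(85.3))
```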