---
title: Speechlib API
emoji: 🎀
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
---

# Speechlib REST API (ECAPA-TDNN)

Speaker diarization + speaker identification + speech-to-text (STT) REST API

## Features

- Speaker diarization: separates multiple speakers using pyannote/speaker-diarization-3.1
- Speaker identification: matches enrolled speakers with the speechbrain ECAPA-TDNN model (high precision)
- Speech recognition: STT powered by faster-whisper (large-v3-turbo)

## API Endpoints

### GET /

Check API status.

### GET /health

Health check.

### POST /transcribe

STT + speaker diarization only (no speaker identification).

Parameters (multipart/form-data):

- audio: audio file (required)
- language: language code (default: ko)
- hf_token: HuggingFace token (required)

### POST /process

Full pipeline: speaker diarization + speaker identification + STT.

Parameters (multipart/form-data):

- audio: audio file to analyze (required)
- voice_sample: reference voice sample of the target speaker (optional)
- speaker_name: name to assign to the identified speaker (default: speaker)
- language: language code (default: ko)
- hf_token: HuggingFace token (required)

## Usage Example

### cURL

```bash
# STT only
curl -X POST "https://YOUR_SPACE.hf.space/transcribe" \
  -F "audio=@audio.wav" \
  -F "language=ko" \
  -F "hf_token=hf_YOUR_TOKEN"

# With speaker identification
curl -X POST "https://YOUR_SPACE.hf.space/process" \
  -F "audio=@conversation.wav" \
  -F "voice_sample=@speaker_sample.wav" \
  -F "speaker_name=홍길동" \
  -F "language=ko" \
  -F "hf_token=hf_YOUR_TOKEN"
```

### Python

```python
import requests

# STT only
response = requests.post(
    "https://YOUR_SPACE.hf.space/transcribe",
    files={"audio": open("audio.wav", "rb")},
    data={"language": "ko", "hf_token": "hf_YOUR_TOKEN"}
)
print(response.json())

# With speaker identification
response = requests.post(
    "https://YOUR_SPACE.hf.space/process",
    files={
        "audio": open("conversation.wav", "rb"),
        "voice_sample": open("speaker_sample.wav", "rb")
    },
    data={
        "speaker_name": "홍길동",
        "language": "ko",
        "hf_token": "hf_YOUR_TOKEN"
    }
)
print(response.json())
```

### JavaScript/Node.js

```javascript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

// Top-level await is not available in CommonJS modules,
// so the request is wrapped in an async function.
async function transcribe() {
  const form = new FormData();
  form.append('audio', fs.createReadStream('audio.wav'));
  form.append('language', 'ko');
  form.append('hf_token', 'hf_YOUR_TOKEN');

  const response = await axios.post(
    'https://YOUR_SPACE.hf.space/transcribe',
    form,
    { headers: form.getHeaders() }
  );
  console.log(response.data);
}

transcribe();
```

## Response Format

```json
{
  "success": true,
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "μ•ˆλ…•ν•˜μ„Έμš”",
      "speaker": "홍길동",
      "similarity": 85.3
    }
  ],
  "speaker_stats": {
    "홍길동": {
      "count": 10,
      "duration": 45.5
    }
  },
  "total_segments": 20
}
```
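As a minimal sketch of consuming this response, the snippet below recomputes per-speaker speaking time from the `segments` list, which should agree with the `duration` values reported in `speaker_stats`. The sample data and the `speaking_time` helper are illustrative, not part of the API:

```python
# Example response in the format documented above (sample values only).
response_json = {
    "success": True,
    "segments": [
        {"start": 0.0, "end": 2.5, "text": "...", "speaker": "A", "similarity": 85.3},
        {"start": 2.5, "end": 4.0, "text": "...", "speaker": "B", "similarity": 30.1},
    ],
    "speaker_stats": {"A": {"count": 1, "duration": 2.5}, "B": {"count": 1, "duration": 1.5}},
    "total_segments": 2,
}

def speaking_time(resp):
    """Sum segment durations (end - start) per speaker from the segments list."""
    totals = {}
    for seg in resp["segments"]:
        totals[seg["speaker"]] = totals.get(seg["speaker"], 0.0) + (seg["end"] - seg["start"])
    return totals

print(speaking_time(response_json))  # {'A': 2.5, 'B': 1.5}
```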

## Notes

- ECAPA-TDNN matches a segment to the enrolled speaker only when the similarity score is at least the 25% threshold
- The GPU is used automatically when one is available
- Supported audio formats: wav, mp3, m4a, ogg, flac, aac
- API docs: https://YOUR_SPACE.hf.space/docs
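The server-side matching code is not shown in this README, but the 25% threshold from the first note implies decision logic along these lines. `label_segment` is a hypothetical helper written for illustration, not an actual function of this API:

```python
# Similarity threshold (percent) from the note above; an assumption about
# how the server applies it, not code taken from the service itself.
SIMILARITY_THRESHOLD = 25.0

def label_segment(similarity, speaker_name):
    """Assign the enrolled speaker's name only when similarity clears the
    threshold; otherwise fall back to a generic label (hypothetical sketch)."""
    return speaker_name if similarity >= SIMILARITY_THRESHOLD else "unknown"

print(label_segment(85.3, "speaker"))  # speaker
print(label_segment(10.0, "speaker"))  # unknown
```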