---
title: Speechlib API
emoji: 🎀
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
---

# Speechlib REST API (ECAPA-TDNN)

Speaker diarization + speaker identification + speech-to-text (STT) REST API

## Features

- Speaker diarization: separates multiple speakers using pyannote/speaker-diarization-3.1
- Speaker identification: matches enrolled speakers with the speechbrain ECAPA-TDNN model (high precision)
- Speech recognition: STT powered by faster-whisper (large-v3-turbo)

## API Endpoints

### GET /

Check API status.

### GET /health

Health check.

### POST /transcribe

STT + speaker diarization only (no speaker identification).

Parameters (multipart/form-data):

- audio: audio file (required)
- language: language code (default: ko)
- hf_token: HuggingFace token (required)

### POST /process

Full pipeline: speaker diarization + speaker identification + STT.

Parameters (multipart/form-data):

- audio: audio file to analyze (required)
- voice_sample: reference voice sample of the target speaker (optional)
- speaker_name: name to assign to the identified speaker (default: speaker)
- language: language code (default: ko)
- hf_token: HuggingFace token (required)

## Usage Example

### cURL

```bash
# STT only
curl -X POST "https://YOUR_SPACE.hf.space/transcribe" \
  -F "audio=@audio.wav" \
  -F "language=ko" \
  -F "hf_token=hf_YOUR_TOKEN"

# With speaker identification
curl -X POST "https://YOUR_SPACE.hf.space/process" \
  -F "audio=@conversation.wav" \
  -F "voice_sample=@speaker_sample.wav" \
  -F "speaker_name=홍길동" \
  -F "language=ko" \
  -F "hf_token=hf_YOUR_TOKEN"
```

### Python

```python
import requests

# STT only
response = requests.post(
    "https://YOUR_SPACE.hf.space/transcribe",
    files={"audio": open("audio.wav", "rb")},
    data={"language": "ko", "hf_token": "hf_YOUR_TOKEN"}
)
print(response.json())

# With speaker identification
response = requests.post(
    "https://YOUR_SPACE.hf.space/process",
    files={
        "audio": open("conversation.wav", "rb"),
        "voice_sample": open("speaker_sample.wav", "rb")
    },
    data={
        "speaker_name": "홍길동",
        "language": "ko",
        "hf_token": "hf_YOUR_TOKEN"
    }
)
print(response.json())
```

### JavaScript/Node.js

```javascript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

// Top-level await is not available in CommonJS modules,
// so the request is wrapped in an async function.
async function transcribe() {
  const form = new FormData();
  form.append('audio', fs.createReadStream('audio.wav'));
  form.append('language', 'ko');
  form.append('hf_token', 'hf_YOUR_TOKEN');

  const response = await axios.post(
    'https://YOUR_SPACE.hf.space/transcribe',
    form,
    { headers: form.getHeaders() }
  );
  console.log(response.data);
}

transcribe();
```

## Response Format

```json
{
  "success": true,
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "μ•ˆλ…•ν•˜μ„Έμš”",
      "speaker": "홍길동",
      "similarity": 85.3
    }
  ],
  "speaker_stats": {
    "홍길동": {
      "count": 10,
      "duration": 45.5
    }
  },
  "total_segments": 20
}
```
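As a minimal sketch of consuming this response, the snippet below recomputes per-speaker speaking time from the `segments` list, which should agree with the `duration` values reported in `speaker_stats`. The sample data and the `speaking_time` helper are illustrative, not part of the API:

```python
# Example response in the format documented above (sample values only).
response_json = {
    "success": True,
    "segments": [
        {"start": 0.0, "end": 2.5, "text": "...", "speaker": "A", "similarity": 85.3},
        {"start": 2.5, "end": 4.0, "text": "...", "speaker": "B", "similarity": 30.1},
    ],
    "speaker_stats": {"A": {"count": 1, "duration": 2.5}, "B": {"count": 1, "duration": 1.5}},
    "total_segments": 2,
}

def speaking_time(resp):
    """Sum segment durations (end - start) per speaker from the segments list."""
    totals = {}
    for seg in resp["segments"]:
        totals[seg["speaker"]] = totals.get(seg["speaker"], 0.0) + (seg["end"] - seg["start"])
    return totals

print(speaking_time(response_json))  # {'A': 2.5, 'B': 1.5}
```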

## Notes

- ECAPA-TDNN matches a segment to the enrolled speaker only when the similarity score is at least the 25% threshold
- The GPU is used automatically when one is available
- Supported audio formats: wav, mp3, m4a, ogg, flac, aac
- API docs: https://YOUR_SPACE.hf.space/docs
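The server-side matching code is not shown in this README, but the 25% threshold from the first note implies decision logic along these lines. `label_segment` is a hypothetical helper written for illustration, not an actual function of this API:

```python
# Similarity threshold (percent) from the note above; an assumption about
# how the server applies it, not code taken from the service itself.
SIMILARITY_THRESHOLD = 25.0

def label_segment(similarity, speaker_name):
    """Assign the enrolled speaker's name only when similarity clears the
    threshold; otherwise fall back to a generic label (hypothetical sketch)."""
    return speaker_name if similarity >= SIMILARITY_THRESHOLD else "unknown"

print(label_segment(85.3, "speaker"))  # speaker
print(label_segment(10.0, "speaker"))  # unknown
```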