---
title: Speechlib API
emoji: 🎀
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
---
# Speechlib REST API (ECAPA-TDNN)
ν™”μž 뢄리(Speaker Diarization) + ν™”μž 식별(Speaker Identification) + μŒμ„± 인식(STT) REST API
## Features
- **ν™”μž 뢄리**: pyannote/speaker-diarization-3.1둜 μ—¬λŸ¬ ν™”μž ꡬ뢄
- **ν™”μž 식별**: speechbrain ECAPA-TDNN으둜 λ“±λ‘λœ ν™”μž 식별 (κ³ μ •λ°€)
- **μŒμ„± 인식**: faster-whisper (large-v3-turbo)λ₯Ό μ‚¬μš©ν•œ STT
## API Endpoints
### GET /
API μƒνƒœ 확인
### GET /health
Health check.
### POST /transcribe
Simple STT + speaker diarization (no speaker identification)
**Parameters (multipart/form-data):**
- `audio`: audio file (required)
- `language`: language code (default: ko)
- `hf_token`: HuggingFace token (required)
### POST /process
Full pipeline: speaker diarization + speaker identification + STT
**Parameters (multipart/form-data):**
- `audio`: audio file to analyze (required)
- `voice_sample`: voice sample of the speaker to identify (optional)
- `speaker_name`: name to assign to the identified speaker (default: speaker)
- `language`: language code (default: ko)
- `hf_token`: HuggingFace token (required)
## Usage Example
### cURL
```bash
# Simple STT
curl -X POST "https://YOUR_SPACE.hf.space/transcribe" \
-F "audio=@audio.wav" \
-F "language=ko" \
-F "hf_token=hf_YOUR_TOKEN"
# ν™”μž 식별 포함
curl -X POST "https://YOUR_SPACE.hf.space/process" \
-F "audio=@conversation.wav" \
-F "voice_sample=@speaker_sample.wav" \
-F "speaker_name=홍길동" \
-F "language=ko" \
-F "hf_token=hf_YOUR_TOKEN"
```
### Python
```python
import requests

# Simple STT
with open("audio.wav", "rb") as f:
    response = requests.post(
        "https://YOUR_SPACE.hf.space/transcribe",
        files={"audio": f},
        data={"language": "ko", "hf_token": "hf_YOUR_TOKEN"},
    )
print(response.json())

# With speaker identification
with open("conversation.wav", "rb") as audio, open("speaker_sample.wav", "rb") as sample:
    response = requests.post(
        "https://YOUR_SPACE.hf.space/process",
        files={"audio": audio, "voice_sample": sample},
        data={
            "speaker_name": "홍길동",
            "language": "ko",
            "hf_token": "hf_YOUR_TOKEN",
        },
    )
print(response.json())
```
### JavaScript/Node.js
```javascript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

// `await` needs an async context in a CommonJS script.
async function transcribe() {
  const form = new FormData();
  form.append('audio', fs.createReadStream('audio.wav'));
  form.append('language', 'ko');
  form.append('hf_token', 'hf_YOUR_TOKEN');

  const response = await axios.post(
    'https://YOUR_SPACE.hf.space/transcribe',
    form,
    { headers: form.getHeaders() }
  );
  console.log(response.data);
}

transcribe();
```
## Response Format
```json
{
"success": true,
"segments": [
{
"start": 0.0,
"end": 2.5,
"text": "μ•ˆλ…•ν•˜μ„Έμš”",
"speaker": "홍길동",
"similarity": 85.3
}
],
"speaker_stats": {
"홍길동": {
"count": 10,
"duration": 45.5
}
},
"total_segments": 20
}
```
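The `speaker_stats` block is derivable from `segments`, so clients can also recompute it locally, e.g. to filter by speaker first. A minimal sketch, using only the field names shown in the example response above (the `response` dict here is sample data, not real output):

```python
from collections import defaultdict

# Sample response shaped like the documented format above.
response = {
    "success": True,
    "segments": [
        {"start": 0.0, "end": 2.5, "text": "μ•ˆλ…•ν•˜μ„Έμš”", "speaker": "홍길동", "similarity": 85.3},
        {"start": 2.5, "end": 4.0, "text": "...", "speaker": "SPEAKER_01", "similarity": 12.1},
    ],
}

# Recompute per-speaker segment counts and speaking time from the segments.
stats = defaultdict(lambda: {"count": 0, "duration": 0.0})
for seg in response["segments"]:
    entry = stats[seg["speaker"]]
    entry["count"] += 1
    entry["duration"] += seg["end"] - seg["start"]

print(dict(stats))
```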
## Notes
- ECAPA-TDNN matches a segment to an enrolled speaker when the similarity score is 25% or higher
- The GPU is used automatically when one is available
- Supported audio formats: wav, mp3, m4a, ogg, flac, aac
- API docs: https://YOUR_SPACE.hf.space/docs
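The 25% threshold in the first note can be illustrated with a minimal sketch. The mapping of embedding cosine similarity to a percentage is an assumption about how the service scores matches, not the actual server code, and `match_speaker` is a hypothetical helper:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def match_speaker(segment_emb, enrolled_emb, threshold=25.0):
    """Assumed scoring: cosine similarity scaled to percent, matched at >= threshold."""
    similarity = cosine_similarity(segment_emb, enrolled_emb) * 100.0
    return similarity >= threshold, similarity
```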