title: Reachy SpeechBrain API
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false

Reachy SpeechBrain API

FastAPI-based Speaker Recognition API for Reachy robots.

Reachy SpeechBrain API is a lightweight speaker recognition service built with FastAPI and SpeechBrain, designed to run on Hugging Face Spaces (Docker) or locally, and to be easily integrated with Reachy robots or any backend.

Features

🎤 Speaker recognition powered by SpeechBrain ECAPA-TDNN
👤 Speaker enrollment, identification, and verification
⚡ FastAPI HTTP API (simple & stateless)
🐳 Hugging Face Docker Space compatible
🧠 CPU-friendly speaker embeddings
🤖 Ready to integrate with Reachy Mini
📦 Dependency management with uv
🗄️ Flexible storage: local or Hugging Face Hub dataset

API Endpoints

Health check

GET /health

Response:

{ "status": "ok" }

List speakers

GET /speakers

Response:

{
  "speakers": ["alice", "bob", "charlie"]
}

Enroll a speaker

POST /speakers/{name}/enroll

Request

multipart/form-data
Field: file (audio file: WAV, MP3, FLAC, etc.)

Example

curl -X POST \
  -F "file=@voice_sample.wav" \
  http://localhost:7860/speakers/alice/enroll

Response

{
  "message": "Speaker 'alice' enrolled successfully",
  "embedding_size": 192
}
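For callers that prefer Python to curl, the same upload can be sketched with the standard library alone. This is a minimal illustration, not code from this repository: the helper name `build_enroll_request` is hypothetical, and the `application/octet-stream` part type is an assumption (the server may accept or ignore it).

```python
import uuid
import urllib.request
from urllib.parse import quote


def build_enroll_request(
    base_url: str, name: str, filename: str, audio: bytes
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /speakers/{name}/enroll."""
    boundary = uuid.uuid4().hex
    # One form part named "file", matching the field documented above.
    body = b"".join(
        [
            f"--{boundary}\r\n".encode(),
            (
                'Content-Disposition: form-data; name="file"; '
                f'filename="{filename}"\r\n'
            ).encode(),
            b"Content-Type: application/octet-stream\r\n\r\n",
            audio,
            f"\r\n--{boundary}--\r\n".encode(),
        ]
    )
    # Percent-encode the speaker name so it is safe inside the URL path.
    url = f"{base_url.rstrip('/')}/speakers/{quote(name, safe='')}/enroll"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

To send it, pass the returned object to `urllib.request.urlopen(...)` and decode the JSON body of the response.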

Delete a speaker

DELETE /speakers/{name}

Example

curl -X DELETE http://localhost:7860/speakers/alice

Response

{
  "message": "Speaker 'alice' deleted successfully"
}

Identify speaker

POST /identify

Identifies which of the enrolled speakers is talking in the uploaded audio.

Request

multipart/form-data
Field: file (audio file)

Example

curl -X POST \
  -F "file=@unknown_voice.wav" \
  http://localhost:7860/identify

Response

{
  "identified": true,
  "speaker": "alice",
  "confidence": 0.85,
  "threshold": 0.25
}
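The fields in this response suggest a best-match search followed by a threshold test. The sketch below shows one way that could work, assuming the score is the cosine similarity between speaker embeddings and that a match needs a score at or above the threshold. Neither assumption is confirmed by this README, and the function names are hypothetical.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(
    embedding: List[float],
    enrolled: Dict[str, List[float]],
    threshold: float = 0.25,
) -> dict:
    """Return the closest enrolled speaker, if the score clears the threshold."""
    best_name, best_score = None, float("-inf")
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    # Assumption: scores at or above the threshold count as a match.
    identified = best_name is not None and best_score >= threshold
    return {
        "identified": identified,
        "speaker": best_name if identified else None,
        "confidence": best_score if best_name is not None else 0.0,
        "threshold": threshold,
    }
```

A low threshold such as 0.25 accepts more matches at the cost of more false identifications; raising it makes the service stricter.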

Verify speaker

POST /verify?name={speaker_name}

Verifies whether the audio matches a specific enrolled speaker.

Request

Query param: name (speaker name to verify against)
multipart/form-data
Field: file (audio file)

Example

curl -X POST \
  -F "file=@voice.wav" \
  "http://localhost:7860/verify?name=alice"

Response

{
  "verified": true,
  "speaker": "alice",
  "confidence": 0.92,
  "threshold": 0.25
}
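Because the speaker name travels in the query string, names containing spaces or other reserved characters need URL encoding. A small client-side sketch follows; the helper name `verify_url` is hypothetical and not part of this project.

```python
from urllib.parse import urlencode


def verify_url(base_url: str, name: str) -> str:
    """Build the /verify URL with a safely encoded `name` query parameter."""
    return f"{base_url.rstrip('/')}/verify?{urlencode({'name': name})}"
```

For example, `verify_url("http://localhost:7860", "alice")` yields the same URL as the curl call above.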

Deployment (Hugging Face Space)

Recommended setup:

Space type: Docker
Hardware: CPU (default) or GPU
Exposed port: 7860

Repository structure

reachy-speechbrain-api/
├── app.py              # FastAPI application
├── storage.py          # Storage backends (local & HuggingFace)
├── Dockerfile          # Docker image definition
├── pyproject.toml      # Project configuration and dependencies
├── uv.lock             # Lockfile for reproducible builds
├── .gitignore          # Git ignore rules
├── speakers/           # Speaker embeddings storage (created at runtime)
├── tests/              # Test suite
│   ├── __init__.py
│   ├── conftest.py     # Pytest fixtures
│   ├── test_api.py     # API tests
│   └── test_storage.py # Storage tests
└── README.md

Once pushed, the Space will automatically build and expose:

https://<username>-<space-name>.hf.space

Docker (local run)

docker build -t reachy-speechbrain-api .
docker run -p 7860:7860 reachy-speechbrain-api

Storage Configuration

Speaker embeddings can be stored locally or on Hugging Face Hub.

Local storage (default)

By default, embeddings are stored in speakers/embeddings.json. No configuration is needed.
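To illustrate what a file-backed store of this kind might look like, here is a minimal sketch. The JSON layout (speaker name mapped to a list of floats) and the class name `LocalEmbeddingStore` are assumptions made for illustration; the actual format used by storage.py is not documented here.

```python
import json
from pathlib import Path
from typing import Dict, List


class LocalEmbeddingStore:
    """Hypothetical JSON-file store: {"<speaker name>": [float, ...]}."""

    def __init__(self, path: str = "speakers/embeddings.json") -> None:
        self.path = Path(path)

    def load(self) -> Dict[str, List[float]]:
        # A missing file simply means nobody has been enrolled yet.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def _write(self, data: Dict[str, List[float]]) -> None:
        # Create the speakers/ directory on first write.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(data), encoding="utf-8")

    def save(self, name: str, embedding: List[float]) -> None:
        data = self.load()
        data[name] = list(embedding)
        self._write(data)

    def delete(self, name: str) -> bool:
        """Remove a speaker; return False if the name was not enrolled."""
        data = self.load()
        if name not in data:
            return False
        del data[name]
        self._write(data)
        return True
```

A single JSON file keeps the sketch simple, but it rewrites the whole file on every change, so it suits a small number of speakers.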

Hugging Face Hub storage

To persist embeddings in a Hugging Face dataset (useful for sharing between instances):

# Set environment variables
export HF_EMBEDDINGS_REPO="username/my-speaker-embeddings"
export HF_TOKEN="hf_xxxxxxxxxxxxx"  # Optional if logged in via `huggingface-cli login`

# Run the API
uv run uvicorn app:app --host 0.0.0.0 --port 7860

Or in Docker:

docker run -p 7860:7860 \
  -e HF_EMBEDDINGS_REPO="username/my-speaker-embeddings" \
  -e HF_TOKEN="hf_xxxxxxxxxxxxx" \
  reachy-speechbrain-api

The dataset will be created automatically (as private) if it doesn't exist.
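The choice between the two backends can be pictured as a simple check on the environment. The sketch below is an assumption about how such selection could be written, not the project's actual logic; `select_storage` and the returned dictionary keys are hypothetical.

```python
import os
from typing import Mapping, Optional


def select_storage(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick the Hub backend when HF_EMBEDDINGS_REPO is set, else local."""
    env = os.environ if env is None else env
    repo = env.get("HF_EMBEDDINGS_REPO", "").strip()
    if repo:
        # HF_TOKEN is optional: a cached login may provide credentials.
        return {"backend": "hub", "repo": repo, "token": env.get("HF_TOKEN") or None}
    return {"backend": "local", "path": "speakers/embeddings.json"}
```

Passing the environment as a parameter keeps the function easy to test without touching real environment variables.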

Dependencies

Dependencies are managed using uv.

Main dependencies:

fastapi
uvicorn
speechbrain (develop branch)
torchaudio
python-multipart
requests
huggingface-hub

The lockfile (uv.lock) ensures reproducible builds.

Development

Install dev dependencies:

uv sync --extra dev

Tools

ruff - Linter and formatter
mypy - Static type checker
pytest - Testing framework
pytest-cov - Code coverage

Run tests

uv run pytest

A coverage report is generated in htmlcov/ and displayed in the terminal.

Lint and format

uv run ruff check .
uv run ruff format .

Type checking

uv run mypy .

Release workflow

This project uses commitizen for versioning and changelog generation.

To trigger a new release, push a commit to main with the message chore: release a new version:

git commit --allow-empty -m "chore: release a new version"
git push origin main

This will:

Bump the version based on conventional commits
Generate/update the CHANGELOG
Create a GitHub Release
Sync to Hugging Face Space

Usage with Reachy

This API is designed to be called from:

Reachy Mini
A central VPS backend
Another Hugging Face Space

Typical flow:

Enrollment: Record voice samples from known users and enroll them
Identification: When someone speaks, send the audio to /identify to determine who is speaking
Verification: Use /verify to confirm a claimed identity
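On the robot side, the result of the identification step can drive a personalized reaction. The following sketch assumes the response shape shown in the /identify section; the function name `greeting_for` and the greeting texts are purely illustrative.

```python
def greeting_for(result: dict) -> str:
    """Choose a greeting from a parsed /identify response."""
    if result.get("identified") and result.get("speaker"):
        return f"Hello, {result['speaker']}!"
    # Unknown or low-confidence voice: fall back to a neutral greeting.
    return "Hello! I don't think we've met."
```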

Use cases:

Personalized interactions based on who is speaking
Access control for voice commands
Multi-user conversation tracking

License

This project is licensed under the MIT License - see the LICENSE file for details.