title: Reachy SpeechBrain API
emoji: 🎀
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false

Reachy SpeechBrain API

FastAPI-based Speaker Recognition API for Reachy robots.

Reachy SpeechBrain API is a lightweight speaker recognition service built with FastAPI and SpeechBrain, designed to run on Hugging Face Spaces (Docker) or locally, and to be easily integrated with Reachy robots or any backend.


Features

  • 🎀 Speaker recognition powered by SpeechBrain ECAPA-TDNN
  • 👤 Speaker enrollment, identification, and verification
  • ⚡ FastAPI HTTP API (simple & stateless)
  • 🐳 Hugging Face Docker Space compatible
  • 🧠 CPU-friendly speaker embeddings
  • 🤖 Ready to integrate with Reachy Mini
  • 📦 Dependency management with uv
  • 🗄️ Flexible storage: local or Hugging Face Hub dataset

API Endpoints

Health check

GET /health

Response:

{ "status": "ok" }

List speakers

GET /speakers

Response:

{
  "speakers": ["alice", "bob", "charlie"]
}

Enroll a speaker

POST /speakers/{name}/enroll

Request

  • multipart/form-data
  • Field: file (audio file: WAV, MP3, FLAC, etc.)

Example

curl -X POST \
  -F "file=@voice_sample.wav" \
  http://localhost:7860/speakers/alice/enroll

Response

{
  "message": "Speaker 'alice' enrolled successfully",
  "embedding_size": 192
}
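
The same upload can be built from Python. The following is a minimal sketch using only the standard library, assuming the endpoint accepts a single form part named `file` as described above. The helper name `build_enroll_request` is hypothetical, and it only constructs the request; nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import urllib.request
import uuid
from urllib.parse import quote


def build_enroll_request(
    base_url: str, name: str, filename: str, audio: bytes
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST for /speakers/{name}/enroll."""
    boundary = uuid.uuid4().hex

    # One form part named "file", matching the documented field name.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + audio + tail

    # Percent-encode the speaker name so spaces or slashes cannot alter the path.
    url = f"{base_url.rstrip('/')}/speakers/{quote(name, safe='')}/enroll"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

With the API running locally, `urllib.request.urlopen(build_enroll_request("http://localhost:7860", "alice", "voice_sample.wav", data))` would send the request, where `data` holds the bytes of the audio file.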

Delete a speaker

DELETE /speakers/{name}

Example

curl -X DELETE http://localhost:7860/speakers/alice

Response

{
  "message": "Speaker 'alice' deleted successfully"
}

Identify speaker

POST /identify

Identifies which of the enrolled speakers is speaking.

Request

  • multipart/form-data
  • Field: file (audio file)

Example

curl -X POST \
  -F "file=@unknown_voice.wav" \
  http://localhost:7860/identify

Response

{
  "identified": true,
  "speaker": "alice",
  "confidence": 0.85,
  "threshold": 0.25
}
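
The response fields suggest a best-match search with a cutoff. The sketch below illustrates that idea under stated assumptions: `confidence` is taken to be the cosine similarity between the query embedding and an enrolled embedding, and `identified` is taken to mean the best score reached `threshold`. The function names and the toy 3-dimensional vectors are hypothetical (real embeddings have 192 dimensions, per the enroll response), and the service's actual scoring may differ.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(
    query: List[float],
    enrolled: Dict[str, List[float]],
    threshold: float = 0.25,
) -> dict:
    """Return the closest enrolled speaker, or no match if below the threshold."""
    best_name, best_score = None, -1.0
    for name, embedding in enrolled.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score

    # Assumption: a score equal to the threshold counts as a match.
    identified = best_name is not None and best_score >= threshold
    return {
        "identified": identified,
        "speaker": best_name if identified else None,
        "confidence": round(best_score, 4) if best_name is not None else 0.0,
        "threshold": threshold,
    }
```

For example, `identify([0.9, 0.1, 0.0], {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]})` picks `alice`, because the query points almost in the same direction as her embedding.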

Verify speaker

POST /verify?name={speaker_name}

Verifies whether the audio matches a specific enrolled speaker.

Request

  • Query param: name (speaker name to verify against)
  • multipart/form-data
  • Field: file (audio file)

Example

curl -X POST \
  -F "file=@voice.wav" \
  "http://localhost:7860/verify?name=alice"

Response

{
  "verified": true,
  "speaker": "alice",
  "confidence": 0.92,
  "threshold": 0.25
}
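
Because the speaker name travels in the query string, a client should URL-encode it. The sketch below shows one way to build the URL and interpret the response on the client side. Both helper names are hypothetical, and the optional `min_confidence` argument is an illustrative client-side addition for stricter use cases such as access control, not a feature of the API.

```python
from typing import Optional
from urllib.parse import urlencode


def build_verify_url(base_url: str, name: str) -> str:
    """Return the /verify URL with the speaker name safely encoded."""
    return f"{base_url.rstrip('/')}/verify?{urlencode({'name': name})}"


def is_verified(response: dict, min_confidence: Optional[float] = None) -> bool:
    """Accept the server's decision, optionally requiring a higher confidence."""
    if response.get("verified") is not True:
        return False
    if min_confidence is None:
        return True
    # Stricter client-side cutoff on top of the server-side threshold.
    return float(response.get("confidence", 0.0)) >= min_confidence
```

A caller guarding a sensitive voice command might use `is_verified(response, min_confidence=0.8)`, while a caller that only personalizes a greeting can rely on the server's `verified` flag alone.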

Deployment (Hugging Face Space)

Recommended setup:

  • Space type: Docker
  • Hardware: CPU (default) or GPU
  • Exposed port: 7860

Repository structure

reachy-speechbrain-api/
├── app.py              # FastAPI application
├── storage.py          # Storage backends (local & HuggingFace)
├── Dockerfile          # Docker image definition
├── pyproject.toml      # Project configuration and dependencies
├── uv.lock             # Lockfile for reproducible builds
├── .gitignore          # Git ignore rules
├── speakers/           # Speaker embeddings storage (created at runtime)
├── tests/              # Test suite
│   ├── __init__.py
│   ├── conftest.py     # Pytest fixtures
│   ├── test_api.py     # API tests
│   └── test_storage.py # Storage tests
└── README.md

Once pushed, the Space will automatically build and expose:

https://<username>-<space-name>.hf.space

Docker (local run)

docker build -t reachy-speechbrain-api .
docker run -p 7860:7860 reachy-speechbrain-api

Storage Configuration

Speaker embeddings can be stored locally or on Hugging Face Hub.

Local storage (default)

By default, embeddings are stored in speakers/embeddings.json. No configuration needed.
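
To illustrate what a JSON-file store of this kind could look like, here is a minimal sketch. It assumes the file maps each speaker name to a list of floats; the class name `LocalJsonStore` and its methods are hypothetical, and the actual contents of `storage.py` and the real file layout may differ.

```python
import json
from pathlib import Path
from typing import Dict, List


class LocalJsonStore:
    """Hypothetical store keeping all embeddings in a single JSON file."""

    def __init__(self, path: str = "speakers/embeddings.json") -> None:
        self.path = Path(path)

    def load(self) -> Dict[str, List[float]]:
        # A missing file simply means nobody has been enrolled yet.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def save(self, name: str, embedding: List[float]) -> None:
        data = self.load()
        data[name] = list(embedding)
        # Create the speakers/ directory on first write.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(data), encoding="utf-8")

    def delete(self, name: str) -> bool:
        data = self.load()
        if name not in data:
            return False
        del data[name]
        self.path.write_text(json.dumps(data), encoding="utf-8")
        return True
```

Rewriting the whole file on every change keeps the sketch simple; that is adequate for a handful of speakers but would not protect against concurrent writers.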

Hugging Face Hub storage

To persist embeddings in a Hugging Face dataset (useful for sharing between instances):

# Set environment variables
export HF_EMBEDDINGS_REPO="username/my-speaker-embeddings"
export HF_TOKEN="hf_xxxxxxxxxxxxx"  # Optional if logged in via `huggingface-cli login`

# Run the API
uv run uvicorn app:app --host 0.0.0.0 --port 7860

Or in Docker:

docker run -p 7860:7860 \
  -e HF_EMBEDDINGS_REPO="username/my-speaker-embeddings" \
  -e HF_TOKEN="hf_xxxxxxxxxxxxx" \
  reachy-speechbrain-api

The dataset will be created automatically (as private) if it doesn't exist.
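
The choice between the two backends can be pictured as a small decision on the environment. The sketch below assumes the backend is selected by whether `HF_EMBEDDINGS_REPO` is set; the function name `storage_config_from_env` and the keys of the returned dictionary are hypothetical and only serve to illustrate the logic.

```python
import os
from typing import Mapping, Optional


def storage_config_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick a storage backend from environment variables (illustrative only)."""
    env = os.environ if env is None else env
    repo = env.get("HF_EMBEDDINGS_REPO", "").strip()

    # No repository configured: fall back to the local JSON file.
    if not repo:
        return {"backend": "local", "path": "speakers/embeddings.json"}

    # HF_TOKEN is optional; None leaves authentication to a prior CLI login.
    return {
        "backend": "huggingface",
        "repo_id": repo,
        "token": env.get("HF_TOKEN") or None,
    }
```

Passing a plain dictionary instead of reading `os.environ` directly makes the decision easy to exercise in tests without touching the real environment.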


Dependencies

Dependencies are managed using uv.

Main dependencies:

  • fastapi
  • uvicorn
  • speechbrain (develop branch)
  • torchaudio
  • python-multipart
  • requests
  • huggingface-hub

The lockfile (uv.lock) ensures reproducible builds.


Development

Install dev dependencies:

uv sync --extra dev

Tools

  • ruff - Linter and formatter
  • mypy - Static type checker
  • pytest - Testing framework
  • pytest-cov - Code coverage

Run tests

uv run pytest

A coverage report is generated in htmlcov/ and displayed in the terminal.

Lint and format

uv run ruff check .
uv run ruff format .

Type checking

uv run mypy .

Release workflow

This project uses commitizen for versioning and changelog generation.

To trigger a new release, push a commit to main with the message chore: release a new version:

git commit --allow-empty -m "chore: release a new version"
git push origin main

This will:

  1. Bump the version based on conventional commits
  2. Generate/update the CHANGELOG
  3. Create a GitHub Release
  4. Sync to Hugging Face Space

Usage with Reachy

This API is designed to be called from:

  • Reachy Mini
  • A central VPS backend
  • Another Hugging Face Space

Typical flow:

  1. Enrollment: Record voice samples from known users and enroll them
  2. Identification: When someone speaks, send the audio to /identify to determine who is speaking
  3. Verification: Use /verify to confirm a claimed identity

Use cases:

  • Personalized interactions based on who is speaking
  • Access control for voice commands
  • Multi-user conversation tracking
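
As a small illustration of the personalized-interaction use case, the sketch below turns an /identify response into a greeting. The function name `greeting_for` and the greeting texts are hypothetical; only the `identified` and `speaker` fields come from the response format documented above.

```python
def greeting_for(
    response: dict, default: str = "Hello! I don't think we've met."
) -> str:
    """Choose what the robot says based on an /identify response."""
    # Greet by name only when the API reports a match above its threshold.
    if response.get("identified") and response.get("speaker"):
        return f"Hello, {response['speaker']}!"
    return default
```

Falling back to a neutral greeting when `identified` is false avoids addressing an unknown voice by the wrong name.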

License

This project is licensed under the MIT License - see the LICENSE file for details.