goabonga's picture
feat: add speaker recognition API with SpeechBrain ECAPA-TDNN
7323d5e unverified
---
title: Reachy SpeechBrain API
emoji: 🎀
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---
# Reachy SpeechBrain API
**FastAPI-based Speaker Recognition API for Reachy robots.**
Reachy SpeechBrain API is a lightweight speaker recognition service built with **FastAPI** and **SpeechBrain**, designed to run on **Hugging Face Spaces (Docker)** or locally, and to be easily integrated with **Reachy robots** or any backend.
---
## Features
- 🎀 Speaker recognition powered by **SpeechBrain ECAPA-TDNN**
- πŸ‘€ Speaker enrollment, identification, and verification
- ⚑ FastAPI HTTP API (simple & stateless)
- 🐳 Hugging Face **Docker Space** compatible
- 🧠 CPU-friendly speaker embeddings
- πŸ€– Ready to integrate with **Reachy Mini**
- πŸ“¦ Dependency management with **uv**
- πŸ—„οΈ Flexible storage: local or **Hugging Face Hub** dataset
---
## API Endpoints
### Health check
```
GET /health
```
Response:
```json
{ "status": "ok" }
```
---
### List speakers
```
GET /speakers
```
Response:
```json
{
"speakers": ["alice", "bob", "charlie"]
}
```
---
### Enroll a speaker
```
POST /speakers/{name}/enroll
```
**Request**
- `multipart/form-data`
- Field: `file` (audio file: WAV, MP3, FLAC, etc.)
**Example**
```bash
curl -X POST \
-F "file=@voice_sample.wav" \
http://localhost:7860/speakers/alice/enroll
```
**Response**
```json
{
"message": "Speaker 'alice' enrolled successfully",
"embedding_size": 192
}
```
---
### Delete a speaker
```
DELETE /speakers/{name}
```
**Example**
```bash
curl -X DELETE http://localhost:7860/speakers/alice
```
**Response**
```json
{
"message": "Speaker 'alice' deleted successfully"
}
```
---
### Identify speaker
```
POST /identify
```
Identifies who is speaking from the enrolled speakers.
**Request**
- `multipart/form-data`
- Field: `file` (audio file)
**Example**
```bash
curl -X POST \
-F "file=@unknown_voice.wav" \
http://localhost:7860/identify
```
**Response**
```json
{
"identified": true,
"speaker": "alice",
"confidence": 0.85,
"threshold": 0.25
}
```
---
### Verify speaker
```
POST /verify?name={speaker_name}
```
Verifies if the audio matches a specific speaker.
**Request**
- Query param: `name` (speaker name to verify against)
- `multipart/form-data`
- Field: `file` (audio file)
**Example**
```bash
curl -X POST \
-F "file=@voice.wav" \
"http://localhost:7860/verify?name=alice"
```
**Response**
```json
{
"verified": true,
"speaker": "alice",
"confidence": 0.92,
"threshold": 0.25
}
```
---
## Deployment (Hugging Face Space)
Recommended setup:
- **Space type**: `Docker`
- **Hardware**: CPU (default) or GPU
- **Exposed port**: `7860`
### Repository structure
```
reachy-speechbrain-api/
β”œβ”€β”€ app.py # FastAPI application
β”œβ”€β”€ storage.py # Storage backends (local & HuggingFace)
β”œβ”€β”€ Dockerfile # Docker image definition
β”œβ”€β”€ pyproject.toml # Project configuration and dependencies
β”œβ”€β”€ uv.lock # Lockfile for reproducible builds
β”œβ”€β”€ .gitignore # Git ignore rules
β”œβ”€β”€ speakers/ # Speaker embeddings storage (created at runtime)
β”œβ”€β”€ tests/ # Test suite
β”‚ β”œβ”€β”€ __init__.py
β”‚ β”œβ”€β”€ conftest.py # Pytest fixtures
β”‚ β”œβ”€β”€ test_api.py # API tests
β”‚ └── test_storage.py # Storage tests
└── README.md
```
Once pushed, the Space will automatically build and expose:
```
https://<username>-<space-name>.hf.space
```
---
## Docker (local run)
```bash
docker build -t reachy-speechbrain-api .
docker run -p 7860:7860 reachy-speechbrain-api
```
---
## Storage Configuration
Speaker embeddings can be stored locally or on Hugging Face Hub.
### Local storage (default)
By default, embeddings are stored in `speakers/embeddings.json`. No configuration needed.
### Hugging Face Hub storage
To persist embeddings in a Hugging Face dataset (useful for sharing between instances):
```bash
# Set environment variables
export HF_EMBEDDINGS_REPO="username/my-speaker-embeddings"
export HF_TOKEN="hf_xxxxxxxxxxxxx" # Optional if logged in via `huggingface-cli login`
# Run the API
uv run uvicorn app:app --host 0.0.0.0 --port 7860
```
Or in Docker:
```bash
docker run -p 7860:7860 \
-e HF_EMBEDDINGS_REPO="username/my-speaker-embeddings" \
-e HF_TOKEN="hf_xxxxxxxxxxxxx" \
reachy-speechbrain-api
```
The dataset will be created automatically (as private) if it doesn't exist.
---
## Dependencies
Dependencies are managed using **uv**.
Main dependencies:
- `fastapi`
- `uvicorn`
- `speechbrain` (develop branch)
- `torchaudio`
- `python-multipart`
- `requests`
- `huggingface-hub`
The lockfile (`uv.lock`) ensures reproducible builds.
---
## Development
Install dev dependencies:
```bash
uv sync --extra dev
```
### Tools
- **ruff** - Linter and formatter
- **mypy** - Static type checker
- **pytest** - Testing framework
- **pytest-cov** - Code coverage
### Run tests
```bash
uv run pytest
```
Coverage report is generated in `htmlcov/` and displayed in terminal.
### Lint and format
```bash
uv run ruff check .
uv run ruff format .
```
### Type checking
```bash
uv run mypy .
```
### Release workflow
This project uses [commitizen](https://commitizen-tools.github.io/commitizen/) for versioning and changelog generation.
To trigger a new release, push a commit to `main` with the message `chore: release a new version`:
```bash
git commit --allow-empty -m "chore: release a new version"
git push origin main
```
This will:
1. Bump the version based on conventional commits
2. Generate/update the CHANGELOG
3. Create a GitHub Release
4. Sync to Hugging Face Space
---
## Usage with Reachy
This API is designed to be called from:
- Reachy Mini
- A central VPS backend
- Another Hugging Face Space
Typical flow:
1. **Enrollment**: Record voice samples from known users and enroll them
2. **Identification**: When someone speaks, send audio to `/identify` to know who it is
3. **Verification**: Use `/verify` to confirm a claimed identity
Use cases:
- Personalized interactions based on who is speaking
- Access control for voice commands
- Multi-user conversation tracking
---
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.