---
title: Voice Agent
emoji: 🐨
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
---
# Speech AI Agent
FastAPI backend + Streamlit UI for a voice agent using Azure Speech (STT/TTS) and Azure AI Foundry Agents (Azure AI Projects SDK).
## Setup
- Create a `.env` file (copy from `.env.example` and fill in values).
- Create a virtual environment and install dependencies (from the project root):
```bash
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
```
## Azure AI Foundry auth (local dev)
Foundry Agent auth uses Entra ID. For local dev, run:

```bash
az login
```
Alternatively, set a service principal in your environment: `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`.
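`DefaultAzureCredential` (from `azure-identity`) tries the service-principal environment variables before falling back to the Azure CLI login. A minimal sketch of that precedence check — the helper name is hypothetical, not part of this project:

```python
import os

# The three service-principal variables named above.
SP_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")

def service_principal_configured(env=os.environ) -> bool:
    """True only when all three variables are set and non-empty."""
    return all(env.get(name) for name in SP_VARS)

# A partial configuration counts as not configured, so the credential
# chain would fall through to the `az login` (CLI) token.
partial = {"AZURE_TENANT_ID": "t", "AZURE_CLIENT_ID": "c"}
print(service_principal_configured(partial))  # → False
```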
## Run backend

```bash
python -m uvicorn src.app.main:app --reload --host 0.0.0.0 --port 8000
```
## Run Streamlit UI

```bash
streamlit run ui/streamlit_app.py
```
If the backend isn't on `localhost:8000`, set:

```bash
SPEECH_AGENT_WS_URL=ws://<host>:<port>/ws/voice
SPEECH_AGENT_HTTP_URL=http://<host>:<port>
```
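The UI's exact configuration code isn't shown here; a sketch of resolving these variables with the localhost defaults this README assumes (the helper name is hypothetical):

```python
import os

def backend_urls(env=os.environ) -> dict:
    """Resolve backend endpoints, falling back to the local defaults."""
    return {
        "ws": env.get("SPEECH_AGENT_WS_URL", "ws://localhost:8000/ws/voice"),
        "http": env.get("SPEECH_AGENT_HTTP_URL", "http://localhost:8000"),
    }

print(backend_urls({}))  # defaults when neither variable is set
```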
For local agent RAG, configure:

```bash
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=<your-embeddings-deployment>
```
## Quick tests
Health check:

```bash
curl -s http://localhost:8000/health | jq
```
Test audio upload (expects base64 in the response):

```bash
curl -s -X POST "http://localhost:8000/v1/voice/file" \
  -F "file=@./sample.wav" \
  -F "prompt=Answer briefly." | jq -r '.transcript, .reply_text'
```
Extract audio from the response:

```bash
curl -s -X POST "http://localhost:8000/v1/voice/file" \
  -F "file=@./sample.wav" \
  -F "prompt=Answer briefly." \
  | python -c "import sys, json, base64; d=json.load(sys.stdin); open('reply.wav','wb').write(base64.b64decode(d['reply_audio_base64']))"
```
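The decoding one-liner can be expanded into a readable script. The sketch below simulates the response body so it runs standalone; the field names match this README, but the audio bytes are a stand-in, not real WAV data:

```python
import base64
import json

# Simulated /v1/voice/file response (in practice, read from the curl output).
response_body = json.dumps({
    "transcript": "hello",
    "reply_text": "hi there",
    "reply_audio_base64": base64.b64encode(b"RIFF....WAVEdata").decode(),
})

d = json.loads(response_body)
audio = base64.b64decode(d["reply_audio_base64"])
with open("reply.wav", "wb") as f:
    f.write(audio)
print(d["reply_text"], "-", len(audio), "bytes saved")
```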
## WebSocket streaming (send after stop)

Server endpoint: `ws://localhost:8000/ws/voice`
Client flow:

- Send `{"event":"start","content_type":"audio/pcm;rate=16000;bits=16;channels=1","return_audio":true}`
- Send binary audio chunks (PCM16 mono @ 16 kHz)
- Send `{"event":"stop","prompt":"Answer briefly."}`
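The flow above can be sketched client-side as message framing. The chunk size and the `websockets`-style usage in the trailing comment are assumptions for illustration, not part of the server contract:

```python
import json

# Stand-in for real 16 kHz, 16-bit mono PCM: one second of silence.
pcm = b"\x00\x00" * 16000

start = json.dumps({
    "event": "start",
    "content_type": "audio/pcm;rate=16000;bits=16;channels=1",
    "return_audio": True,
})
# 3200 bytes = 100 ms of PCM16 mono at 16 kHz (chunk size is an assumption).
chunks = [pcm[i:i + 3200] for i in range(0, len(pcm), 3200)]
stop = json.dumps({"event": "stop", "prompt": "Answer briefly."})

# With an async client such as the `websockets` package, the order would be:
#   await ws.send(start)
#   for c in chunks: await ws.send(c)   # binary frames
#   await ws.send(stop)
print(len(chunks), "binary chunks of", len(chunks[0]), "bytes")
```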
Browser note: the demo page streams raw PCM (not container audio) to avoid format issues.
Optional dev demo page: `http://localhost:8000/ws-demo`
## Local Agent (tools + RAG + memory)
The UI includes an Agent toggle. When selected, it uses the local agent pipeline with tools, local RAG (from `data/`), and memory.

RAG uses FAISS + Azure OpenAI embeddings. Supported file types: `txt`, `md`, `pdf`, `docx`, `csv`.
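The backend's actual splitting strategy isn't documented here; one common step before FAISS indexing is cutting extracted text into fixed-size overlapping chunks, sketched below with hypothetical parameters:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (parameters are illustrative)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# 1200 characters → two full windows plus a shorter tail.
chunks = chunk_text("x" * 1200)
print(len(chunks), [len(c) for c in chunks])  # → 3 [500, 500, 300]
```

Each chunk would then be embedded via the `AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT` model and added to the FAISS index.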
Endpoints:

- Upload files for RAG: `POST /v1/agent/upload` (multipart `files`)
- Reset session data: `POST /v1/agent/reset`
Example:

```bash
curl -s -X POST "http://localhost:8000/v1/agent/upload" \
  -F "files=@./notes.txt"
```
## Hugging Face (Docker Space)

Deploy both FastAPI + Streamlit in a single Docker Space.

- Create a Docker Space and push this repo.
- Set these Space Secrets/Variables:
  - `AZURE_SPEECH_KEY`
  - `AZURE_SPEECH_REGION`
  - `FOUNDRY_PROJECT_CONN_STR`
  - `FOUNDRY_AGENT_ID`
  - `AZURE_TENANT_ID`
  - `AZURE_CLIENT_ID`
  - `AZURE_CLIENT_SECRET`
  - `SPEECH_AGENT_WS_URL=wss://<your-space>.hf.space/ws/voice`
- The provided `Dockerfile` + `docker/start.sh` + `docker/nginx.conf.template` will:
  - run FastAPI on `:8000`
  - run Streamlit on `:8501`
  - expose everything via nginx on `:$PORT` (HF default 7860)
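The real `docker/nginx.conf.template` may differ; a minimal sketch of the routing it describes — API paths and `/ws/voice` to `:8000`, everything else to Streamlit on `:8501`, with WebSocket upgrade headers on both:

```nginx
server {
    listen ${PORT};  # HF Spaces injects PORT (default 7860)

    # FastAPI WebSocket endpoint needs the upgrade headers
    location /ws/ {
        proxy_pass http://127.0.0.1:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    # FastAPI HTTP routes
    location /v1/    { proxy_pass http://127.0.0.1:8000; }
    location /health { proxy_pass http://127.0.0.1:8000; }

    # Streamlit UI (its own internal WebSocket also needs the upgrade)
    location / {
        proxy_pass http://127.0.0.1:8501;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```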