---
title: Voice Agent
emoji: 🐨
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
---

# Speech AI Agent

FastAPI backend + Streamlit UI for a voice agent using **Azure Speech** (STT/TTS) and **Azure AI Foundry Agents** (Azure AI Projects SDK).

## Setup

1) Create a `.env` file (copy from `.env.example` and fill in the values).
2) Create a virtual environment and install dependencies (from the project root):

```bash
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
```

### Azure AI Foundry auth (local dev)

Foundry Agent auth uses Entra ID. For local dev, run:

```bash
az login
```

Alternatively, set a service principal in your environment: `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`.

## Run backend

```bash
python -m uvicorn src.app.main:app --reload --host 0.0.0.0 --port 8000
```

## Run Streamlit UI

```bash
streamlit run ui/streamlit_app.py
```

If the backend isn’t on `localhost:8000`, set:

```bash
SPEECH_AGENT_WS_URL=ws://<host>:<port>/ws/voice
SPEECH_AGENT_HTTP_URL=http://<host>:<port>
```

For local agent RAG, configure:

```bash
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=<embeddings-deployment-name>
```

## Quick tests

Health check:

```bash
curl -s http://localhost:8000/health | jq
```

Test audio upload (the response contains base64-encoded audio):

```bash
curl -s -X POST "http://localhost:8000/v1/voice/file" \
  -F "file=@./sample.wav" \
  -F "prompt=Answer briefly." | jq -r '.transcript, .reply_text'
```

Extract the reply audio from the response:

```bash
curl -s -X POST "http://localhost:8000/v1/voice/file" \
  -F "file=@./sample.wav" \
  -F "prompt=Answer briefly." \
  | python -c "import sys, json, base64; d=json.load(sys.stdin); open('reply.wav','wb').write(base64.b64decode(d['reply_audio_base64']))"
```

## WebSocket streaming (send after stop)

Server endpoint: `ws://localhost:8000/ws/voice`

Client flow:

1) Send `{"event":"start","content_type":"audio/pcm;rate=16000;bits=16;channels=1","return_audio":true}`
2) Send binary audio chunks (PCM16 mono @ 16 kHz)
3) Send `{"event":"stop","prompt":"Answer briefly."}`

Browser note: the demo page streams raw PCM (not container audio) to avoid format issues.

Optional dev demo page: http://localhost:8000/ws-demo

## Local Agent (tools + RAG + memory)

The UI includes an **Agent** toggle. When selected, it uses the local agent pipeline with tools, local RAG (from `data/`), and memory. RAG uses FAISS + Azure OpenAI embeddings. Supported file types: `txt`, `md`, `pdf`, `docx`, `csv`.

Endpoints:

- Upload files for RAG: `POST /v1/agent/upload` (multipart `files`)
- Reset session data: `POST /v1/agent/reset`

Example:

```bash
curl -s -X POST "http://localhost:8000/v1/agent/upload" \
  -F "files=@./notes.txt"
```

## Hugging Face (Docker Space)

Deploy both FastAPI + Streamlit in a single Docker Space.

1) Create a **Docker Space** and push this repo.
2) Set these **Space Secrets/Variables**:
   - `AZURE_SPEECH_KEY`
   - `AZURE_SPEECH_REGION`
   - `FOUNDRY_PROJECT_CONN_STR`
   - `FOUNDRY_AGENT_ID`
   - `AZURE_TENANT_ID`
   - `AZURE_CLIENT_ID`
   - `AZURE_CLIENT_SECRET`
   - `SPEECH_AGENT_WS_URL=wss://<your-space>.hf.space/ws/voice`
3) The provided `Dockerfile` + `docker/start.sh` + `docker/nginx.conf.template` will:
   - run FastAPI on `:8000`
   - run Streamlit on `:8501`
   - expose everything via nginx on `:$PORT` (HF default: 7860)
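## Appendix: example WebSocket client

The start → binary chunks → stop flow from the "WebSocket streaming" section above can be sketched as a small Python client. This is an illustrative sketch, not code from this repo: the helper names, the 100 ms chunk size, and the use of the third-party `websockets` package are all assumptions; only the message shapes come from this README.

```python
# Sketch of a client for the /ws/voice endpoint (assumptions noted above).
import asyncio
import json

WS_URL = "ws://localhost:8000/ws/voice"
# 16000 samples/s * 2 bytes/sample * 1 channel * 0.1 s = 3200 bytes per 100 ms
CHUNK_BYTES = 3200


def start_message(return_audio: bool = True) -> str:
    """JSON control frame sent before any audio, per the README's client flow."""
    return json.dumps({
        "event": "start",
        "content_type": "audio/pcm;rate=16000;bits=16;channels=1",
        "return_audio": return_audio,
    })


def stop_message(prompt: str) -> str:
    """JSON control frame sent after the last audio chunk."""
    return json.dumps({"event": "stop", "prompt": prompt})


def pcm_chunks(pcm: bytes, size: int = CHUNK_BYTES):
    """Split raw PCM16 mono @ 16 kHz audio into fixed-size binary frames."""
    for i in range(0, len(pcm), size):
        yield pcm[i:i + size]


async def send_audio(pcm: bytes, prompt: str) -> None:
    # Requires: pip install websockets
    import websockets

    async with websockets.connect(WS_URL) as ws:
        await ws.send(start_message())
        for chunk in pcm_chunks(pcm):
            await ws.send(chunk)  # binary frame
        await ws.send(stop_message(prompt))
        # Print server messages until the server closes the connection.
        async for message in ws:
            if isinstance(message, str):
                print(message)
            else:
                print(f"<{len(message)} audio bytes>")


# Usage (with the backend running):
#   asyncio.run(send_audio(open("sample.pcm", "rb").read(), "Answer briefly."))
```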