---
title: Voice Agent
emoji: 🐨
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
---
# Speech AI Agent

FastAPI backend + Streamlit UI for a voice agent using **Azure Speech** (STT/TTS) and **Azure AI Foundry Agents** (Azure AI Projects SDK).
## Setup

1) Create a `.env` file (copy from `.env.example` and fill in values).
2) Create a virtual environment and install dependencies (from the project root):

```bash
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
```
### Azure AI Foundry auth (local dev)

Foundry Agent auth uses Entra ID. For local dev, run:

```bash
az login
```

Alternatively, set a service principal in your environment:
`AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`.
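Both options work because `DefaultAzureCredential` (from the `azure-identity` package) tries the service-principal environment variables before falling back to the Azure CLI session. A minimal sketch of that selection logic — a hypothetical helper for illustration, assuming the backend relies on `DefaultAzureCredential`'s default chain:

```python
import os

# Hypothetical helper mirroring DefaultAzureCredential's fallback order:
# service-principal environment variables first, then the Azure CLI login.
def credential_mode(env: dict = os.environ) -> str:
    sp_vars = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")
    if all(env.get(v) for v in sp_vars):
        return "service_principal"
    return "azure_cli"

if __name__ == "__main__":
    print(credential_mode({}))  # azure_cli (no service principal set)
```

Either path yields a credential the Azure AI Projects SDK can use; no code change is needed to switch between them.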
## Run backend

```bash
python -m uvicorn src.app.main:app --reload --host 0.0.0.0 --port 8000
```
## Run Streamlit UI

```bash
streamlit run ui/streamlit_app.py
```

If the backend isn’t on localhost:8000, set:

```bash
SPEECH_AGENT_WS_URL=ws://<host>:<port>/ws/voice
SPEECH_AGENT_HTTP_URL=http://<host>:<port>
```
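The UI presumably falls back to the localhost defaults when these variables are unset; the lookup likely amounts to something like this (variable names from this README, defaults assumed to match the local dev setup):

```python
import os

# Assumed defaults for local dev; the actual UI code may differ.
WS_URL = os.environ.get("SPEECH_AGENT_WS_URL", "ws://localhost:8000/ws/voice")
HTTP_URL = os.environ.get("SPEECH_AGENT_HTTP_URL", "http://localhost:8000")

print(WS_URL, HTTP_URL)
```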
For local agent RAG, configure:

```bash
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=<your-embeddings-deployment>
```
## Quick tests

Health check:

```bash
curl -s http://localhost:8000/health | jq
```

Test audio upload (expects base64 audio in the response):

```bash
curl -s -X POST "http://localhost:8000/v1/voice/file" \
  -F "file=@./sample.wav" \
  -F "prompt=Answer briefly." | jq -r '.transcript, .reply_text'
```

Extract audio from the response:

```bash
curl -s -X POST "http://localhost:8000/v1/voice/file" \
  -F "file=@./sample.wav" \
  -F "prompt=Answer briefly." \
  | python -c "import sys, json, base64; d=json.load(sys.stdin); open('reply.wav','wb').write(base64.b64decode(d['reply_audio_base64']))"
```
## WebSocket streaming (send after stop)

Server endpoint: `ws://localhost:8000/ws/voice`

Client flow:

1) Send `{"event":"start","content_type":"audio/pcm;rate=16000;bits=16;channels=1","return_audio":true}`
2) Send binary audio chunks (PCM16 mono @ 16kHz)
3) Send `{"event":"stop","prompt":"Answer briefly."}`

Browser note: the demo page streams raw PCM (not container audio) to avoid format issues.

Optional dev demo page: http://localhost:8000/ws-demo
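The three-step flow above can be sketched in Python. The framing below mirrors the protocol as documented here; a real client would send these frames over a WebSocket library such as `websockets`, which is omitted to keep the sketch self-contained:

```python
import json

# JSON control frames for the /ws/voice endpoint, as documented above.
def start_frame(return_audio: bool = True) -> str:
    return json.dumps({
        "event": "start",
        "content_type": "audio/pcm;rate=16000;bits=16;channels=1",
        "return_audio": return_audio,
    })

def stop_frame(prompt: str) -> str:
    return json.dumps({"event": "stop", "prompt": prompt})

def pcm_chunks(pcm: bytes, chunk_size: int = 3200):
    # 3200 bytes = 100 ms of PCM16 mono @ 16 kHz (16000 samples/s * 2 bytes * 0.1 s)
    for i in range(0, len(pcm), chunk_size):
        yield pcm[i:i + chunk_size]

if __name__ == "__main__":
    audio = b"\x00\x00" * 16000               # 1 second of silence
    print(start_frame())
    print(sum(1 for _ in pcm_chunks(audio)))  # → 10 chunks of 100 ms each
    print(stop_frame("Answer briefly."))
```

Because the server replies only after the stop frame ("send after stop"), the client can stream chunks as fast as it likes and then wait for a single response.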
## Local Agent (tools + RAG + memory)

The UI includes an **Agent** toggle. When selected, it uses the local agent
pipeline with tools, local RAG (from `data/`), and memory.

RAG uses FAISS + Azure OpenAI embeddings. Supported file types:
`txt`, `md`, `pdf`, `docx`, `csv`.

Endpoints:

- Upload files for RAG: `POST /v1/agent/upload` (multipart `files`)
- Reset session data: `POST /v1/agent/reset`

Example:

```bash
curl -s -X POST "http://localhost:8000/v1/agent/upload" \
  -F "files=@./notes.txt"
```
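Before embedding, RAG pipelines like this one typically split uploaded documents into overlapping chunks so each embedding covers a bounded span of text. The helper below is a hypothetical illustration of that step, not the project's actual code:

```python
# Hypothetical fixed-size splitter with overlap, illustrating the chunking
# step that usually precedes embedding and FAISS indexing.
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap  # each chunk starts `step` chars after the last
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
    return chunks

if __name__ == "__main__":
    parts = chunk_text("x" * 1200)
    print(len(parts))                        # → 3 chunks
    print(all(len(p) <= 500 for p in parts)) # → True
```

The overlap keeps sentences that straddle a boundary retrievable from at least one chunk.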
## Hugging Face (Docker Space)

Deploy both FastAPI + Streamlit in a single Docker Space.

1) Create a **Docker Space** and push this repo.
2) Set these **Space Secrets/Variables**:
   - `AZURE_SPEECH_KEY`
   - `AZURE_SPEECH_REGION`
   - `FOUNDRY_PROJECT_CONN_STR`
   - `FOUNDRY_AGENT_ID`
   - `AZURE_TENANT_ID`
   - `AZURE_CLIENT_ID`
   - `AZURE_CLIENT_SECRET`
   - `SPEECH_AGENT_WS_URL=wss://<your-space>.hf.space/ws/voice`
3) The provided `Dockerfile` + `docker/start.sh` + `docker/nginx.conf.template` will:
   - run FastAPI on `:8000`
   - run Streamlit on `:8501`
   - expose everything via nginx on `:$PORT` (HF default 7860)
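Presumably nginx forwards the API and WebSocket paths to FastAPI and everything else to Streamlit. A hypothetical sketch of that routing decision — the path prefixes here are assumptions; check `docker/nginx.conf.template` for the real rules:

```python
# Hypothetical routing table mirroring what the nginx template presumably does.
# The actual path prefixes live in docker/nginx.conf.template.
API_PREFIXES = ("/ws/", "/v1/", "/health", "/ws-demo")

def upstream_for(path: str) -> str:
    if path.startswith(API_PREFIXES):
        return "http://127.0.0.1:8000"  # FastAPI backend
    return "http://127.0.0.1:8501"      # Streamlit UI

if __name__ == "__main__":
    print(upstream_for("/ws/voice"))  # backend
    print(upstream_for("/"))          # Streamlit UI
```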