---
title: Voice Agent
emoji: 🐨
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
---

# Speech AI Agent

FastAPI backend + Streamlit UI for a voice agent using **Azure Speech** (STT/TTS) and **Azure AI Foundry Agents** (Azure AI Projects SDK).

## Setup

1) Create a `.env` file (copy from `.env.example` and fill in the values).
2) Create a virtual environment and install dependencies (from the project root):

```bash
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
```

### Azure AI Foundry auth (local dev)

Foundry Agent auth uses Entra ID. For local dev, run:

```bash
az login
```

Alternatively, set a service principal in your environment: `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`.

## Run backend

```bash
python -m uvicorn src.app.main:app --reload --host 0.0.0.0 --port 8000
```

## Run Streamlit UI

```bash
streamlit run ui/streamlit_app.py
```

If the backend isn’t on `localhost:8000`, set:

```bash
SPEECH_AGENT_WS_URL=ws://<host>:<port>/ws/voice
SPEECH_AGENT_HTTP_URL=http://<host>:<port>
```

For local agent RAG, configure:

```bash
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=<embeddings-deployment-name>
```

## Quick tests

Health check:

```bash
curl -s http://localhost:8000/health | jq
```

Test audio upload (the response contains base64-encoded audio):

```bash
curl -s -X POST "http://localhost:8000/v1/voice/file" \
  -F "file=@./sample.wav" \
  -F "prompt=Answer briefly." | jq -r '.transcript, .reply_text'
```

Extract the reply audio from the response:

```bash
curl -s -X POST "http://localhost:8000/v1/voice/file" \
  -F "file=@./sample.wav" \
  -F "prompt=Answer briefly." \
  | python -c "import sys, json, base64; d=json.load(sys.stdin); open('reply.wav','wb').write(base64.b64decode(d['reply_audio_base64']))"
```

## WebSocket streaming (send after stop)

Server endpoint: `ws://localhost:8000/ws/voice`

Client flow:

1) Send `{"event":"start","content_type":"audio/pcm;rate=16000;bits=16;channels=1","return_audio":true}`
2) Send binary audio chunks (PCM16 mono @ 16 kHz)
3) Send `{"event":"stop","prompt":"Answer briefly."}`

Browser note: the demo page streams raw PCM (not container audio) to avoid format issues.

Optional dev demo page: http://localhost:8000/ws-demo

## Local Agent (tools + RAG + memory)

The UI includes an **Agent** toggle. When selected, it uses the local agent pipeline with tools, local RAG (from `data/`), and memory. RAG uses FAISS + Azure OpenAI embeddings. Supported file types: `txt`, `md`, `pdf`, `docx`, `csv`.

Endpoints:

- Upload files for RAG: `POST /v1/agent/upload` (multipart `files`)
- Reset session data: `POST /v1/agent/reset`

Example:

```bash
curl -s -X POST "http://localhost:8000/v1/agent/upload" \
  -F "files=@./notes.txt"
```

## Hugging Face (Docker Space)

Deploy both FastAPI + Streamlit in a single Docker Space.

1) Create a **Docker Space** and push this repo.
2) Set these **Space Secrets/Variables**:
   - `AZURE_SPEECH_KEY`
   - `AZURE_SPEECH_REGION`
   - `FOUNDRY_PROJECT_CONN_STR`
   - `FOUNDRY_AGENT_ID`
   - `AZURE_TENANT_ID`
   - `AZURE_CLIENT_ID`
   - `AZURE_CLIENT_SECRET`
   - `SPEECH_AGENT_WS_URL=wss://<your-space>.hf.space/ws/voice`
3) The provided `Dockerfile` + `docker/start.sh` + `docker/nginx.conf.template` will:
   - run FastAPI on `:8000`
   - run Streamlit on `:8501`
   - expose everything via nginx on `:$PORT` (HF default: 7860)
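## Appendix: example WebSocket client

The start → binary chunks → stop flow from the "WebSocket streaming" section above can be sketched as a small Python client. This is an illustrative sketch, not code from this repo: the helper names, the 100 ms chunk size, and the use of the third-party `websockets` package are all assumptions; only the message shapes come from this README.

```python
# Sketch of a client for the /ws/voice endpoint (assumptions noted above).
import asyncio
import json

WS_URL = "ws://localhost:8000/ws/voice"
# 16000 samples/s * 2 bytes/sample * 1 channel * 0.1 s = 3200 bytes per 100 ms
CHUNK_BYTES = 3200


def start_message(return_audio: bool = True) -> str:
    """JSON control frame sent before any audio, per the README's client flow."""
    return json.dumps({
        "event": "start",
        "content_type": "audio/pcm;rate=16000;bits=16;channels=1",
        "return_audio": return_audio,
    })


def stop_message(prompt: str) -> str:
    """JSON control frame sent after the last audio chunk."""
    return json.dumps({"event": "stop", "prompt": prompt})


def pcm_chunks(pcm: bytes, size: int = CHUNK_BYTES):
    """Split raw PCM16 mono @ 16 kHz audio into fixed-size binary frames."""
    for i in range(0, len(pcm), size):
        yield pcm[i:i + size]


async def send_audio(pcm: bytes, prompt: str) -> None:
    # Requires: pip install websockets
    import websockets

    async with websockets.connect(WS_URL) as ws:
        await ws.send(start_message())
        for chunk in pcm_chunks(pcm):
            await ws.send(chunk)  # binary frame
        await ws.send(stop_message(prompt))
        # Print server messages until the server closes the connection.
        async for message in ws:
            if isinstance(message, str):
                print(message)
            else:
                print(f"<{len(message)} audio bytes>")


# Usage (with the backend running):
#   asyncio.run(send_audio(open("sample.pcm", "rb").read(), "Answer briefly."))
```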