---
title: Voice Agent
emoji: 🐨
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
---
# Speech AI Agent
FastAPI backend + Streamlit UI for a voice agent using **Azure Speech** (STT/TTS) and **Azure AI Foundry Agents** (Azure AI Projects SDK).
## Setup
1) Create a `.env` file (copy from `.env.example` and fill values).
2) Create a virtual environment and install dependencies (from the project root):
```bash
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
```
### Azure AI Foundry auth (local dev)
Foundry Agent auth uses Entra ID. For local dev, run:
```bash
az login
```
Alternatively, provide service-principal credentials via environment variables:
`AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`.
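As a quick sanity check before starting the backend, a small stdlib-only sketch (the variable names match the list above; the helper name is illustrative) can confirm the service-principal variables are set:

```python
import os

# Service-principal variables expected in the environment (per the list above).
REQUIRED = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")

def missing_sp_vars(env=os.environ):
    """Return the names of any unset/empty service-principal variables."""
    return [name for name in REQUIRED if not env.get(name)]

if __name__ == "__main__":
    missing = missing_sp_vars()
    if missing:
        print("Missing (use `az login` instead):", ", ".join(missing))
    else:
        print("Service principal configured.")
```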
## Run backend
```bash
python -m uvicorn src.app.main:app --reload --host 0.0.0.0 --port 8000
```
## Run Streamlit UI
```bash
streamlit run ui/streamlit_app.py
```
If the backend isn’t on `localhost:8000`, point the UI at it with:
```bash
SPEECH_AGENT_WS_URL=ws://<host>:<port>/ws/voice
SPEECH_AGENT_HTTP_URL=http://<host>:<port>
```
For local agent RAG, configure:
```bash
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=<your-embeddings-deployment>
```
## Quick tests
Health check:
```bash
curl -s http://localhost:8000/health | jq
```
Test the audio upload endpoint (the reply audio is returned base64-encoded):
```bash
curl -s -X POST "http://localhost:8000/v1/voice/file" \
-F "file=@./sample.wav" \
-F "prompt=Answer briefly." | jq -r '.transcript, .reply_text'
```
Extract audio from response:
```bash
curl -s -X POST "http://localhost:8000/v1/voice/file" \
-F "file=@./sample.wav" \
-F "prompt=Answer briefly." \
| python -c "import sys, json, base64; d=json.load(sys.stdin); open('reply.wav','wb').write(base64.b64decode(d['reply_audio_base64']))"
```
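The shell pipeline above can also be written as a small Python helper. `save_reply_audio` is a hypothetical name; the `reply_audio_base64` key matches the response shown above:

```python
import base64
import json
from pathlib import Path

def save_reply_audio(response_json: str, out_path: str = "reply.wav") -> int:
    """Decode the base64 reply audio from a /v1/voice/file response
    and write it to disk. Returns the number of bytes written."""
    payload = json.loads(response_json)
    audio = base64.b64decode(payload["reply_audio_base64"])
    Path(out_path).write_bytes(audio)
    return len(audio)
```

Feed it the raw response, e.g. `save_reply_audio(sys.stdin.read())` at the end of a curl pipeline.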
## WebSocket streaming (send after stop)
Server endpoint: `ws://localhost:8000/ws/voice`
Client flow:
1) Send `{"event":"start","content_type":"audio/pcm;rate=16000;bits=16;channels=1","return_audio":true}`
2) Send binary audio chunks (PCM16 mono @ 16kHz)
3) Send `{"event":"stop","prompt":"Answer briefly."}`
Browser note: the demo page streams raw PCM (not container audio) to avoid format issues.
Optional dev demo page: http://localhost:8000/ws-demo
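The three-step flow above can be sketched with stdlib-only helpers that build the control frames and split PCM into chunks (helper names are illustrative; the event payloads match the protocol above):

```python
import json

def start_frame(return_audio: bool = True) -> str:
    """Step 1: JSON control frame announcing the audio format."""
    return json.dumps({
        "event": "start",
        "content_type": "audio/pcm;rate=16000;bits=16;channels=1",
        "return_audio": return_audio,
    })

def pcm_chunks(pcm: bytes, chunk_size: int = 3200):
    """Step 2: yield binary chunks (3200 bytes = 100 ms of PCM16 mono @ 16 kHz)."""
    for i in range(0, len(pcm), chunk_size):
        yield pcm[i:i + chunk_size]

def stop_frame(prompt: str = "Answer briefly.") -> str:
    """Step 3: JSON control frame ending the utterance."""
    return json.dumps({"event": "stop", "prompt": prompt})
```

With a WebSocket client library (e.g. `websockets`), you would send `start_frame()` as text, each chunk as binary, then `stop_frame()`.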
## Local Agent (tools + RAG + memory)
The UI includes an **Agent** toggle. When selected, it uses the local agent
pipeline with tools, local RAG (from `data/`), and memory.
RAG uses FAISS + Azure OpenAI embeddings. Supported file types:
`txt`, `md`, `pdf`, `docx`, `csv`.
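Before embedding, documents are typically split into overlapping chunks so a fact that straddles a boundary still appears whole in at least one chunk. The pipeline's actual splitter lives in the repo; a hypothetical minimal version of the idea:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50):
    """Split text into fixed-size overlapping chunks.
    Each chunk would later be embedded and stored in the FAISS index."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks
```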
Endpoints:
- Upload files for RAG: `POST /v1/agent/upload` (multipart `files`)
- Reset session data: `POST /v1/agent/reset`
Example:
```bash
curl -s -X POST "http://localhost:8000/v1/agent/upload" \
-F "files=@./notes.txt"
```
## Hugging Face (Docker Space)
Deploy FastAPI and Streamlit together in a single Docker Space.
1) Create a **Docker Space** and push this repo.
2) Set these **Space Secrets/Variables**:
- `AZURE_SPEECH_KEY`
- `AZURE_SPEECH_REGION`
- `FOUNDRY_PROJECT_CONN_STR`
- `FOUNDRY_AGENT_ID`
- `AZURE_TENANT_ID`
- `AZURE_CLIENT_ID`
- `AZURE_CLIENT_SECRET`
- `SPEECH_AGENT_WS_URL=wss://<your-space>.hf.space/ws/voice`
3) The provided `Dockerfile` + `docker/start.sh` + `docker/nginx.conf.template` will:
- run FastAPI on `:8000`
- run Streamlit on `:8501`
- expose everything via nginx on `:$PORT` (HF default 7860)
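The shipped `docker/nginx.conf.template` handles this routing; a hypothetical minimal fragment of the idea (paths and ports per the list above — the exact location rules in the repo may differ) looks like:

```nginx
server {
    listen ${PORT};  # Hugging Face injects PORT (default 7860)

    # API + WebSocket endpoints -> FastAPI
    location ~ ^/(v1|ws|health|ws-demo) {
        proxy_pass http://127.0.0.1:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;  # required for /ws/voice
        proxy_set_header Connection "upgrade";
    }

    # Everything else -> Streamlit UI
    location / {
        proxy_pass http://127.0.0.1:8501;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;  # Streamlit also uses WebSockets
        proxy_set_header Connection "upgrade";
    }
}
```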