title: Voice RAG Bot
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
Voice RAG Bot
A voice-enabled RAG (Retrieval Augmented Generation) bot.
Quick Overview
Voice RAG Bot is an intelligent AI customer support system that:
- Accepts voice input via microphone or audio file upload
- Processes with LLM (Groq) for intent detection and response generation
- Retrieves relevant context from knowledge base and customer history using vector search
- Analyzes sentiment to provide empathetic, sentiment-aware responses
- Generates speech output via text-to-speech
- Orchestrates 9-node workflow using LangGraph
Tech Stack: Faster Whisper (STT) → LangGraph (9 nodes) → Groq LLM → Qdrant (Vector DB) → gTTS (TTS)
Quick Start (3 Steps)
Step 1: Prerequisites
- Docker Desktop running (for Qdrant)
- Python 3.11+
- Git (optional)
Step 2: Start Qdrant (Vector Database)
docker run -p 6333:6333 qdrant/qdrant:latest
Leave this running in the background. The system will auto-create collections.
Step 3: Start Voice RAG Bot
# Quote the path: it contains spaces
cd "d:\Voice RAG Bot\voice-rag-bot"
# Activate virtual environment
.\venv\Scripts\Activate.ps1
# Run startup script (starts backend + Streamlit)
.\START_SYSTEM.ps1
Or start services manually:
Terminal 1 (Backend):
.\venv\Scripts\Activate.ps1
python backend/main.py
# Runs on http://localhost:8000
Terminal 2 (Frontend):
.\venv\Scripts\Activate.ps1
streamlit run frontend/streamlit_app.py
# Opens http://localhost:8501
Docker Deployment
Option A: Docker Compose (Recommended for Development)
Start all services (Backend + Frontend + Qdrant + Redis):
docker-compose up -d
Access Points:
- Frontend: http://localhost:8501
- Backend: http://localhost:8000
- Qdrant: http://localhost:6333
- Redis: localhost:6379
Stop Services:
docker-compose down
Option B: Individual Docker Images
Build Image:
docker build -t voice-rag-bot:latest .
Run Backend:
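# Inside a container, localhost refers to the container itself. On Docker Desktop,
# host.docker.internal reaches the Qdrant port published on the host.
docker run -p 8000:8000 \
  -e APP_TYPE=backend \
  -e GROQ_API_KEY=your_key \
  -e QDRANT_URL=http://host.docker.internal:6333 \
  voice-rag-bot:latest
Run Frontend:
docker run -p 8501:8501 \
  -e APP_TYPE=frontend \
  -e GROQ_API_KEY=your_key \
  -e QDRANT_URL=http://host.docker.internal:6333 \
  voice-rag-bot:latest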
GitHub Actions CI/CD
Setup GitHub Secrets
Add these secrets to your GitHub repository (Settings → Secrets and Variables → Actions):
| Secret Name | Value | Description |
|---|---|---|
| GROQ_API_KEY | gsk_xxxxxxxxxxxx | Groq API key for LLM |
| HF_USERNAME | your_username | HuggingFace username |
| HF_TOKEN | hf_xxxxxxxxxxxx | HuggingFace access token |
| HF_SPACE_REPO | username/voice-rag-bot | HF Spaces repo path |
How to Add Secrets:
- Go to GitHub repository → Settings
- Click "Secrets and variables" → "Actions"
- Click "New repository secret"
- Add each secret with name and value
Automatic Deployment
The workflow (.github/workflows/docker-build.yml) automatically:
On main branch push:
- Builds Docker image
- Pushes to GitHub Container Registry (GHCR)
- Deploys to HuggingFace Spaces
- Generates tags: main, latest, sha-xxxxx
On Pull Request:
- Builds Docker image (no push)
- Validates Dockerfile syntax
- Tests image build
Workflow File:
- Location: .github/workflows/docker-build.yml
- Triggers: Push to main/develop, Pull requests
- Status: View in GitHub → Actions tab
Access Docker Images:
docker pull ghcr.io/your-username/voice-rag-bot:latest
docker pull ghcr.io/your-username/voice-rag-bot:main
HuggingFace Spaces Deployment
Option A: Automatic Deployment (Via GitHub Actions)
Create HuggingFace Space: https://huggingface.co/spaces
- Name: voice-rag-bot
- License: OpenRAIL
- Private/Public: Your choice
Get HF credentials:
- Username: Your HF account name
- Token: https://huggingface.co/settings/tokens (create "write" token)
Add GitHub Secrets (see above):
- HF_USERNAME
- HF_TOKEN
- HF_SPACE_REPO=username/voice-rag-bot
Push to main branch → Automatic deployment!
Option B: Manual Deployment to HF Spaces
Create HF Space (if it does not exist):
huggingface-cli repo create voice-rag-bot --type space --space-sdk streamlit
Clone & Push:
git clone https://huggingface.co/spaces/your-username/voice-rag-bot
cd voice-rag-bot
# Add your project files
cp -r /path/to/voice-rag-bot/* .
# Push to HF Spaces
git add .
git commit -m "Deploy Voice RAG Bot"
git push origin main
Configure Secrets in HF Spaces:
- Go to Space Settings → Variables and secrets
- Add: GROQ_API_KEY, QDRANT_URL, etc.
App File: app.py (automatically created)
HF Spaces Configuration (spaces.yaml)
title: Voice RAG Bot
description: Voice-enabled RAG chatbot
app_file: app.py
sdk: streamlit
sdk_version: "1.28.0"
python_version: "3.11"
cpu: true
gpu: true
startup_duration_timeout: 600
HF Spaces Requirements
Note: HuggingFace Spaces runs Streamlit frontend only (no backend microservices).
Options:
Use External Backend:
- Deploy backend separately (Railway, Render, Heroku)
- Update BACKEND_URL in Streamlit config
- Spaces frontend connects to external backend
Self-contained (Frontend Only):
- Remove backend API calls
- Use Streamlit session state for data
- Limited functionality (no vector DB, LLM caching)
Docker-based Space (Advanced):
- Deploy full stack in Docker container
- Requires HF Spaces Docker runtime
- Use Dockerfile + docker-compose.yml
Recommended: Use external FastAPI backend on Render/Railway + Streamlit on HF Spaces
Environment Variables for Deployment
Local Development
GROQ_API_KEY=gsk_xxxxxxxxxxxx
QDRANT_URL=http://localhost:6333
DEBUG=True
LOG_LEVEL=INFO
Docker Compose
GROQ_API_KEY=gsk_xxxxxxxxxxxx
QDRANT_URL=http://qdrant:6333
BACKEND_URL=http://backend:8000
DEBUG=False
LOG_LEVEL=INFO
HuggingFace Spaces
GROQ_API_KEY=gsk_xxxxxxxxxxxx
BACKEND_URL=https://your-backend-api.herokuapp.com
FRONTEND_MODE=SPACES
GitHub Actions (Auto-set)
- REGISTRY: ghcr.io
- IMAGE_NAME: ${{ github.repository }}
- Secrets: See above
Usage Guide
Via Streamlit Frontend (Recommended)
- Open Browser: http://localhost:8501
- Enter Customer ID: Unique identifier for the customer (enables history tracking)
- Choose Input Method:
  - Option A: Click Record → Speak your message → Process Audio
  - Option B: Upload audio file (MP3/WAV)
  - Option C: Type message directly in text area
- View Results (automatically displayed):
  - Generated Response
  - Detected Intent (+ confidence)
  - Sentiment Analysis (+ confidence)
  - Extracted Entities
  - Knowledge Base context (if relevant)
  - Customer History (if relevant)
  - Audio playback of response
Via REST API (For Integration)
Process Audio:
curl -X POST "http://localhost:8000/process-audio?customer_id=CUST_001" \
-F "file=@voice_message.wav"
Process Text:
curl -X POST "http://localhost:8000/process-text" \
-d "user_input=I want to return my laptop&customer_id=CUST_001"
Health Check:
curl http://localhost:8000/health
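The text endpoint can also be called from Python. The following is a minimal sketch using only the standard library, assuming the backend accepts the same form-encoded fields as the curl example (`user_input`, `customer_id`) and replies with JSON. The response format is not documented here, so the helper falls back to raw text. The function names `build_text_request` and `send` are illustrative and not part of the project.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # backend address from the Quick Start


def build_text_request(user_input: str, customer_id: str) -> urllib.request.Request:
    """Build the POST /process-text request with form-encoded fields, mirroring the curl call."""
    body = urllib.parse.urlencode(
        {"user_input": user_input, "customer_id": customer_id}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/process-text",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def send(request: urllib.request.Request, timeout: float = 60.0):
    """Send a request and decode the reply. JSON is an assumption; raw text is the fallback."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read().decode("utf-8")
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw
```

With the backend running, `send(build_text_request("I want to return my laptop", "CUST_001"))` issues the same call as the curl example, and `send(urllib.request.Request(f"{BASE_URL}/health"))` performs the health check.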
System Architecture
Input Layer
├─ Audio Input (Streamlit st.audio_input)
└─ Text Input (Streamlit text area)
↓
Speech-to-Text
└─ Faster Whisper (base model, CPU inference)
↓
Orchestration Layer (LangGraph - 9 Nodes)
1. sentiment_analysis (DistilBERT)
2. entity_extraction (BERT-base-NER)
3. intent_detection (Groq LLM)
4. retrieval_router (Qdrant search)
5. context_builder (Format prompt)
6. response_generation (Groq LLM)
7. validation (Hallucination checks)
8. memory_persistence (Qdrant upsert)
9. tts_generation (gTTS)
↓
Output Layer
├─ Text Response
├─ Sentiment-aware Tone
├─ Audio File (MP3)
└─ Intent Classification
Configuration
Environment Variables (.env):
GROQ_API_KEY=your_groq_api_key_here
QDRANT_URL=http://localhost:6333
BACKEND_URL=http://localhost:8000
VECTOR_DIMENSION=1024
EMBEDDING_MODEL=BAAI/bge-m3
GROQ_MODEL=openai/gpt-oss-20b
KB_COLLECTION_NAME=knowledge_base
HISTORY_COLLECTION_NAME=customer_history
WHISPER_MODEL=base
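As an illustration of how these variables might be consumed, here is a small sketch that reads them from the process environment and falls back to the values listed above. It is not the project's actual `backend/config.py`, whose contents are not shown here; the `Settings` class and `load_settings` function are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in the .env example."""
    groq_api_key: str
    qdrant_url: str
    backend_url: str
    vector_dimension: int
    embedding_model: str
    groq_model: str
    kb_collection_name: str
    history_collection_name: str
    whisper_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), using the README values as defaults."""
    env = os.environ if env is None else env
    api_key = env.get("GROQ_API_KEY", "")
    if not api_key:
        # The key has no sensible default, so fail early with a clear message.
        raise ValueError("GROQ_API_KEY is not set")
    return Settings(
        groq_api_key=api_key,
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        backend_url=env.get("BACKEND_URL", "http://localhost:8000"),
        vector_dimension=int(env.get("VECTOR_DIMENSION", "1024")),
        embedding_model=env.get("EMBEDDING_MODEL", "BAAI/bge-m3"),
        groq_model=env.get("GROQ_MODEL", "openai/gpt-oss-20b"),
        kb_collection_name=env.get("KB_COLLECTION_NAME", "knowledge_base"),
        history_collection_name=env.get("HISTORY_COLLECTION_NAME", "customer_history"),
        whisper_model=env.get("WHISPER_MODEL", "base"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test without touching the real environment.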
Sample Data
Load sample data (4 KB documents + 4 customer history records):
.\venv\Scripts\Activate.ps1
python data/load_sample_data.py
Included Data:
- KB Documents: Return Policy, Shipping Info, Warranty Info, Account Management
- Customer History: 4 interactions (complaints, refunds, inquiries)
Testing
Quick Verification
# Test complete pipeline (end-to-end)
.\venv\Scripts\Activate.ps1
python tests/test_full_integration.py
Expected Output: FULL INTEGRATION TEST PASSED
Component Status
- All 9 nodes connected and working
- FastAPI endpoints operational
- Qdrant vector search functional
- LLM integration responding
- Audio processing working
- Sample data loadable
Intent Types Supported
| Intent | Example | Response |
|---|---|---|
| refund_request | "I want to return this" | Empathetic, processing info |
| order_status | "Where's my order?" | Tracking info |
| product_inquiry | "Tell me about...?" | Product details |
| billing_issue | "My charge was wrong" | Empathetic, billing process |
| warranty_claim | "Product broke" | Warranty eligibility info |
| account_management | "Change my password" | Account instructions |
| general_support | "How do I...?" | General assistance |
| complaint | "This is unacceptable" | Empathetic, resolution steps |
| other | Misc questions | General help |
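Because the intent label comes from an LLM, it can be useful to normalize it against the nine intents in the table. The sketch below shows one way to do that, assuming the model returns a free-text label. The `Intent` enum and `parse_intent` helper are hypothetical names and may differ from what `intent_detection.py` actually does.

```python
from enum import Enum


class Intent(str, Enum):
    """The nine intents listed in the table above."""
    REFUND_REQUEST = "refund_request"
    ORDER_STATUS = "order_status"
    PRODUCT_INQUIRY = "product_inquiry"
    BILLING_ISSUE = "billing_issue"
    WARRANTY_CLAIM = "warranty_claim"
    ACCOUNT_MANAGEMENT = "account_management"
    GENERAL_SUPPORT = "general_support"
    COMPLAINT = "complaint"
    OTHER = "other"


def parse_intent(label: str) -> Intent:
    """Map a raw model label to a known intent, falling back to OTHER for anything unrecognized."""
    normalized = label.strip().lower().replace(" ", "_")
    try:
        return Intent(normalized)
    except ValueError:
        return Intent.OTHER
```

Falling back to `other` keeps downstream nodes from having to handle labels outside the table.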
Response Quality Factors
- Sentiment Detection: POSITIVE/NEGATIVE/NEUTRAL classification
- Confidence Scores: 0-1 for both intent and sentiment
- Context Retrieval: Up to 3 KB documents + customer history
- Tone Matching: Empathetic for negative, professional for neutral, friendly for positive
- Hallucination Prevention: Validation layer checks for accuracy
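The tone-matching rule above can be sketched as a small lookup. This is an illustration only: the `min_confidence` cutoff, and the choice to fall back to a professional tone when the classifier is unsure, are assumptions rather than documented behavior.

```python
# Tone per sentiment label, as listed under Tone Matching.
TONE_BY_SENTIMENT = {
    "NEGATIVE": "empathetic",
    "NEUTRAL": "professional",
    "POSITIVE": "friendly",
}


def select_tone(sentiment: str, confidence: float, min_confidence: float = 0.5) -> str:
    """Pick a response tone from a sentiment label and its 0-1 confidence score."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence < min_confidence:
        # Assumption: a low-confidence prediction is treated like a neutral one.
        return "professional"
    return TONE_BY_SENTIMENT.get(sentiment.strip().upper(), "professional")
```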
Troubleshooting
Issue: "Backend Not Connected"
Solution: Ensure FastAPI backend is running
python backend/main.py
Issue: "Qdrant Connection Error"
Solution: Start Qdrant Docker container
docker run -p 6333:6333 qdrant/qdrant:latest
Issue: "Groq API Error"
Solution: Check GROQ_API_KEY in .env file
# Verify key is set
echo $env:GROQ_API_KEY
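The command above prints the variable from the current shell session. To inspect the .env file itself, a short script along the following lines may help. It is a sketch with a deliberately simple `KEY=VALUE` parser, and the helper names are hypothetical. The placeholder it checks for is the sample value from the Configuration section.

```python
from typing import Dict, Optional

PLACEHOLDER = "your_groq_api_key_here"  # sample value shown in the Configuration section


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def groq_key_problem(env_text: str) -> Optional[str]:
    """Return a description of what is wrong with GROQ_API_KEY, or None if it looks set."""
    values = parse_env_text(env_text)
    if "GROQ_API_KEY" not in values:
        return "GROQ_API_KEY is missing from .env"
    if values["GROQ_API_KEY"] in ("", PLACEHOLDER):
        return "GROQ_API_KEY is empty or still the placeholder"
    return None
```

Running `groq_key_problem(open(".env").read())` from the project root returns `None` when a key is present. It does not verify that Groq accepts the key.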
Issue: "Audio Processing Timeout"
Solution: Processing may take 30-60 seconds for audio
- First run downloads models (Whisper, BGE-M3, DistilBERT)
- Subsequent runs are faster
- Ensure sufficient disk space (~5GB)
Issue: "Module Not Found"
Solution: Reinstall dependencies
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt
Project Structure
d:\Voice RAG Bot\voice-rag-bot\
├── backend/
│   ├── main.py                      FastAPI server
│   └── config.py                    Configuration
├── frontend/
│   └── streamlit_app.py             Web UI
├── orchestration/
│   ├── langgraph_workflow.py        9-node workflow
│   ├── state.py                     State management
│   └── nodes/                       Individual nodes
│       ├── sentiment_analysis.py
│       ├── entity_extraction.py
│       ├── intent_detection.py
│       ├── retrieval_router.py
│       ├── context_builder.py
│       ├── response_generation.py
│       ├── validation.py
│       ├── memory_persistence.py
│       └── tts_generation.py
├── rag/
│   ├── qdrant_manager.py            Vector DB client
│   └── embedding_manager.py         BGE-M3 embeddings
├── data/
│   ├── load_sample_data.py          Sample data loader
│   └── audio_output/                Generated audio files
├── tests/
│   └── test_full_integration.py     End-to-end test
├── .env                             Configuration
├── requirements.txt                 Dependencies
├── START_SYSTEM.ps1                 Quick start script
└── venv/                            Python environment
Workflow Execution (Behind the Scenes)
- sentiment_analysis: Input → DistilBERT → POSITIVE/NEGATIVE/NEUTRAL
- entity_extraction: Input → BERT-NER → Extract names, locations, etc.
- intent_detection: Input → Groq LLM → 9-intent classification
- retrieval_router: Intent → Qdrant search → 3 KB docs + customer history
- context_builder: Format contexts → Unified prompt
- response_generation: Prompt → Groq LLM → Response text
- validation: Check hallucinations → Retry if needed
- memory_persistence: Embed response → Upsert to Qdrant
- tts_generation: Response text → gTTS → MP3 audio file
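The control flow of these nine steps, including the retry after a failed validation, can be sketched in plain Python. This deliberately does not use LangGraph's API, and every node is a stub that only records its name. The retry cap (`max_retries`) and the rule that the first draft is rejected are assumptions chosen to exercise the retry path, not the behavior of the real nodes.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Node = Callable[[State], State]


def make_stub(name: str) -> Node:
    """Return a placeholder node that only appends its name to the trace."""
    def node(state: State) -> State:
        return {**state, "trace": [*state["trace"], name]}
    return node


def stub_generate(state: State) -> State:
    """Stand-in for the Groq call: counts attempts instead of calling an LLM."""
    attempts = state["attempts"] + 1
    return {**state, "attempts": attempts, "response": f"draft {attempts}",
            "trace": [*state["trace"], "response_generation"]}


def stub_validate(state: State) -> State:
    """Stand-in check that rejects the first draft so the retry path runs."""
    return {**state, "is_valid": state["attempts"] >= 2,
            "trace": [*state["trace"], "validation"]}


def run_workflow(user_input: str, max_retries: int = 2) -> State:
    """Run the nodes in the listed order, regenerating while validation fails."""
    state: State = {"input": user_input, "trace": [], "attempts": 0, "is_valid": False}
    for name in ("sentiment_analysis", "entity_extraction", "intent_detection",
                 "retrieval_router", "context_builder"):
        state = make_stub(name)(state)
    for _ in range(max_retries + 1):
        state = stub_validate(stub_generate(state))
        if state["is_valid"]:
            break
    for name in ("memory_persistence", "tts_generation"):
        state = make_stub(name)(state)
    return state
```

Each node takes the shared state and returns an updated copy, which is the same general shape a graph-based orchestrator works with. Here the wiring is a hand-written loop.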
Performance Metrics (Approximate)
| Component | Time | Notes |
|---|---|---|
| STT (Audio → Text) | 5-15s | Depends on audio length |
| Sentiment Analysis | 0.5s | DistilBERT inference |
| Entity Extraction | 0.5s | BERT-NER inference |
| Intent Detection | 1-2s | Groq API call |
| KB Search | 0.2s | Qdrant vector search |
| Response Generation | 2-5s | Groq streaming |
| Validation | 0.5s | Local checks |
| TTS Generation | 2-5s | gTTS processing |
| Total End-to-End | 12-35s | First run slower (model loading) |
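As a quick cross-check of the table, the itemized stages can be summed. The sketch below uses only the figures from the rows above. The itemized sum comes to roughly 11.7-28.7s, so the stated 12-35s total presumably includes overhead that is not itemized (for example memory persistence or network latency). That interpretation is an assumption.

```python
from typing import Dict, Tuple

# (min seconds, max seconds) per stage, copied from the table above.
STAGES: Dict[str, Tuple[float, float]] = {
    "STT": (5.0, 15.0),
    "Sentiment Analysis": (0.5, 0.5),
    "Entity Extraction": (0.5, 0.5),
    "Intent Detection": (1.0, 2.0),
    "KB Search": (0.2, 0.2),
    "Response Generation": (2.0, 5.0),
    "Validation": (0.5, 0.5),
    "TTS Generation": (2.0, 5.0),
}


def total_range(stages: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Sum the per-stage minimums and maximums, rounded to one decimal place."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return round(low, 1), round(high, 1)


print(total_range(STAGES))  # (11.7, 28.7)
```

Dropping the STT row shows the effect of using text input instead of audio: the remaining stages sum to about 6.7-13.7s.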
Tips & Tricks
Faster Processing
- Use text input instead of audio (skips STT)
- System caches models after first run
- Keep audio messages under 30 seconds
Better Responses
- Use clear, grammatically correct input
- Provide context ("purchased last week" vs "bought before")
- Specify what you need (return, refund, replacement)
Debugging
- Check backend/main.py logs for errors
- View Qdrant collections: http://localhost:6333/api/swagger/index.html
- Monitor Streamlit server in terminal for issues
Next Steps
- Load Sample Data: python data/load_sample_data.py
- Test with Demo Scenarios: Use Streamlit to test various intents
- Customize KB Documents: Add your own documents to Qdrant
- Fine-tune Prompts: Edit prompts in prompts/ directory
- Production Deployment: Add authentication, rate limiting, monitoring
Support & References
Documentation Files:
- data/DATA_REQUIREMENTS.md - Data schema documentation
- .env - Environment configuration
API Endpoints:
- POST /process-audio - Audio input endpoint
- POST /process-text - Text input endpoint
- GET /health - Health check
Backend Logs:
- Location: Console output when running python backend/main.py
- Check for errors, model loading, API calls
License & Attribution
Components:
- Groq LLM: Free tier, gpt-oss-20b model
- Faster Whisper: SYSTRAN's reimplementation of OpenAI Whisper (MIT License)
- LangGraph: LangChain (Open Source)
- Qdrant: Open source vector database
- BGE-M3: BAAI embeddings model
- DistilBERT: Hugging Face transformers
- gTTS: Google Text-to-Speech
Verification Checklist
Before considering system "ready for production":
- Backend running on http://localhost:8000
- Qdrant running on http://localhost:6333
- Streamlit frontend accessible at http://localhost:8501
- Sample data loaded (python data/load_sample_data.py)
- Integration test passing (python tests/test_full_integration.py)
- Audio input working (record or upload)
- All 9 nodes executing (check logs)
- Response generation working
- Audio playback working
- History tracking working (multiple messages same customer)
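The first three checklist items can be scripted. The sketch below only tests whether each port accepts a TCP connection, using the hosts and ports listed above. It does not confirm that the service behind the port is healthy, and the helper names are hypothetical.

```python
import socket
from typing import Dict, Tuple

# Host and port of each service, taken from the checklist above.
SERVICES: Dict[str, Tuple[str, int]] = {
    "backend": ("localhost", 8000),
    "qdrant": ("localhost", 6333),
    "frontend": ("localhost", 8501),
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services: Dict[str, Tuple[str, int]] = SERVICES) -> Dict[str, bool]:
    """Check every service and report which ones accept connections."""
    return {name: is_listening(host, port) for name, (host, port) in services.items()}
```

Calling `check_services()` returns a dictionary such as `{"backend": True, ...}`, which makes it easy to see which service still needs to be started.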
Built with ❤️ | Last Updated: May 30, 2026