---
title: Voice RAG Bot
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---

# Voice RAG Bot

A voice-enabled RAG (Retrieval Augmented Generation) bot.

## 📋 Quick Overview

Voice RAG Bot is an intelligent AI customer support system that:

- 🎤 **Accepts voice input** via microphone or audio file upload
- 🧠 **Processes with LLM** (Groq) for intent detection and response generation
- 📚 **Retrieves relevant context** from knowledge base and customer history using vector search
- 😊 **Analyzes sentiment** to provide empathetic, sentiment-aware responses
- 🔊 **Generates speech output** via text-to-speech
- 📊 **Orchestrates 9-node workflow** using LangGraph

**Tech Stack**: Faster Whisper (STT) → LangGraph (9 nodes) → Groq LLM → Qdrant (Vector DB) → gTTS (TTS)

---

## 🚀 Quick Start (3 Steps)

### Step 1: Prerequisites

- Docker Desktop running (for Qdrant)
- Python 3.11+
- Git (optional)

### Step 2: Start Qdrant (Vector Database)

```bash
docker run -p 6333:6333 qdrant/qdrant:latest
```

Leave this running in background. ✅ System will auto-create collections.
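
Before moving on, it can help to confirm that Qdrant is actually answering. The sketch below is not part of the project: it assumes Qdrant's REST API is exposed on port 6333 as in the command above and queries the `GET /collections` endpoint using only the standard library. The helper names `collection_names` and `qdrant_collections` are hypothetical.

```python
import json
import urllib.error
import urllib.request


def collection_names(payload: dict) -> list[str]:
    """Extract collection names from a Qdrant `GET /collections` response body."""
    collections = payload.get("result", {}).get("collections", [])
    return [item["name"] for item in collections]


def qdrant_collections(base_url: str = "http://localhost:6333", timeout: float = 3.0):
    """Return the list of collection names, or None if Qdrant is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/collections", timeout=timeout) as resp:
            return collection_names(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return None


if __name__ == "__main__":
    names = qdrant_collections()
    if names is None:
        print("Qdrant is not reachable on http://localhost:6333")
    else:
        print("Qdrant is up; collections:", names or "(none yet)")
```

An empty list is expected on a fresh container; the collections should only appear once the backend has started or sample data has been loaded.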

### Step 3: Start Voice RAG Bot

```powershell
cd "d:\Voice RAG Bot\voice-rag-bot"

# Activate virtual environment
.\venv\Scripts\Activate.ps1

# Run startup script (starts backend + Streamlit)
.\START_SYSTEM.ps1
```

**Or start services manually:**

Terminal 1 (Backend):

```powershell
.\venv\Scripts\Activate.ps1
python backend/main.py
# Runs on http://localhost:8000
```

Terminal 2 (Frontend):

```powershell
.\venv\Scripts\Activate.ps1
streamlit run frontend/streamlit_app.py
# Opens http://localhost:8501
```

---

## Docker Deployment

### Option A: Docker Compose (Recommended for Development)

Start all services (Backend + Frontend + Qdrant + Redis):

```bash
docker-compose up -d
```

**Access Points:**

- 🎤 Frontend: http://localhost:8501
- ⚙️ Backend: http://localhost:8000
- 📚 Qdrant: http://localhost:6333
- 💾 Redis: localhost:6379

**Stop Services:**

```bash
docker-compose down
```

### Option B: Individual Docker Images

**Build Image:**

```bash
docker build -t voice-rag-bot:latest .
```

**Run Backend:**

```bash
docker run -p 8000:8000 \
  -e APP_TYPE=backend \
  -e GROQ_API_KEY=your_key \
  -e QDRANT_URL=http://localhost:6333 \
  voice-rag-bot:latest
```

**Run Frontend:**

```bash
docker run -p 8501:8501 \
  -e APP_TYPE=frontend \
  -e GROQ_API_KEY=your_key \
  -e QDRANT_URL=http://localhost:6333 \
  voice-rag-bot:latest
```

**Note:** Inside a container, `localhost` refers to the container itself. If Qdrant runs on the host, point `QDRANT_URL` at an address the container can reach (for example `http://host.docker.internal:6333` on Docker Desktop).

---

## 🚀 GitHub Actions CI/CD

### Setup GitHub Secrets

Add these secrets to your GitHub repository (Settings → Secrets and variables → Actions):

| Secret Name | Value | Description |
|------------|-------|-------------|
| `GROQ_API_KEY` | `gsk_xxxxxxxxxxxx` | Groq API key for LLM |
| `HF_USERNAME` | `your_username` | HuggingFace username |
| `HF_TOKEN` | `hf_xxxxxxxxxxxx` | HuggingFace access token |
| `HF_SPACE_REPO` | `username/voice-rag-bot` | HF Spaces repo path |

**How to Add Secrets:**

1. Go to GitHub repository → Settings
2. Click "Secrets and variables" → "Actions"
3. Click "New repository secret"
4. Add each secret with name and value

### Automatic Deployment

The workflow (`.github/workflows/docker-build.yml`) automatically:

1. **On `main` branch push:**
   - Builds Docker image
   - Pushes to GitHub Container Registry (GHCR)
   - Deploys to HuggingFace Spaces
   - Generates tags: `main`, `latest`, `sha-xxxxx`

2. **On Pull Request:**
   - Builds Docker image (no push)
   - Validates Dockerfile syntax
   - Tests image build

**Workflow File:**

- Location: `.github/workflows/docker-build.yml`
- Triggers: Push to `main`/`develop`, Pull requests
- Status: View in GitHub → Actions tab

**Access Docker Images:**

```bash
docker pull ghcr.io/your-username/voice-rag-bot:latest
docker pull ghcr.io/your-username/voice-rag-bot:main
```

---

## 🤗 HuggingFace Spaces Deployment

### Option A: Automatic Deployment (Via GitHub Actions)

1. Create HuggingFace Space: https://huggingface.co/spaces
   - Name: `voice-rag-bot`
   - License: OpenRAIL
   - Private/Public: Your choice

2. Get HF credentials:
   - Username: Your HF account name
   - Token: https://huggingface.co/settings/tokens (create "write" token)

3. Add GitHub Secrets (see above):
   - `HF_USERNAME`
   - `HF_TOKEN`
   - `HF_SPACE_REPO` = `username/voice-rag-bot`

4. **Push to main branch → Automatic deployment!**

### Option B: Manual Deployment to HF Spaces

1. **Create HF Space (if not exists):**

   ```bash
   huggingface-cli repo create voice-rag-bot --type space --space-sdk streamlit
   ```

2. **Clone & Push:**

   ```bash
   git clone https://huggingface.co/spaces/your-username/voice-rag-bot
   cd voice-rag-bot

   # Add your project files
   cp -r /path/to/voice-rag-bot/* .

   # Push to HF Spaces
   git add .
   git commit -m "Deploy Voice RAG Bot"
   git push origin main
   ```

3. **Configure Secrets in HF Spaces:**
   - Go to Space Settings → Variables and secrets
   - Add: `GROQ_API_KEY`, `QDRANT_URL`, etc.

4. **App File:** `app.py` (automatically created)

### HF Spaces Configuration (`spaces.yaml`)

```yaml
title: Voice RAG Bot
description: Voice-enabled RAG chatbot
app_file: app.py
sdk: streamlit
sdk_version: "1.28.0"
python_version: "3.11"
cpu: true
gpu: true
startup_duration_timeout: 600
```

**Note:** Hugging Face Spaces reads its configuration from the YAML front matter at the top of `README.md`, so keep these values in sync with that block (this README's front matter declares `sdk: docker`).

### HF Spaces Requirements

**Note:** A Space that uses the Streamlit SDK runs the Streamlit frontend only (no backend microservices).

**Options:**

1. **Use External Backend:**
   - Deploy backend separately (Railway, Render, Heroku)
   - Update `BACKEND_URL` in Streamlit config
   - Spaces frontend connects to external backend

2. **Self-contained (Frontend Only):**
   - Remove backend API calls
   - Use Streamlit session state for data
   - Limited functionality (no vector DB, LLM caching)

3. **Docker-based Space (Advanced):**
   - Deploy full stack in Docker container
   - Requires HF Spaces Docker runtime
   - Use `Dockerfile` + `docker-compose.yml`

**Recommended:** Use external FastAPI backend on Render/Railway + Streamlit on HF Spaces

---

## 🔧 Environment Variables for Deployment

### Local Development

```
GROQ_API_KEY=gsk_xxxxxxxxxxxx
QDRANT_URL=http://localhost:6333
DEBUG=True
LOG_LEVEL=INFO
```

### Docker Compose

```
GROQ_API_KEY=gsk_xxxxxxxxxxxx
QDRANT_URL=http://qdrant:6333
BACKEND_URL=http://backend:8000
DEBUG=False
LOG_LEVEL=INFO
```

### HuggingFace Spaces

```
GROQ_API_KEY=gsk_xxxxxxxxxxxx
BACKEND_URL=https://your-backend-api.herokuapp.com
FRONTEND_MODE=SPACES
```

### GitHub Actions (Auto-set)

- `REGISTRY`: `ghcr.io`
- `IMAGE_NAME`: `${{ github.repository }}`
- Secrets: See above

---

## 📖 Usage Guide

### Via Streamlit Frontend (Recommended)

1. **Open Browser**: http://localhost:8501
2. **Enter Customer ID**: Unique identifier for customer (enables history tracking)
3. **Choose Input Method**:
   - **Option A**: Click 🎤 **Record** → Speak your message → **Process Audio**
   - **Option B**: Upload audio file (MP3/WAV)
   - **Option C**: Type message directly in text area
4. **View Results** (automatically displayed):
   - 📝 Generated Response
   - 🎯 Detected Intent (+ confidence)
   - 😊 Sentiment Analysis (+ confidence)
   - 🏷️ Extracted Entities
   - 📚 Knowledge Base context (if relevant)
   - 📜 Customer History (if relevant)
   - 🔊 Audio playback of response

### Via REST API (For Integration)

**Process Audio:**

```bash
curl -X POST "http://localhost:8000/process-audio?customer_id=CUST_001" \
  -F "file=@voice_message.wav"
```

**Process Text:**

```bash
# --data-urlencode keeps the form body valid when the message contains spaces or symbols
curl -X POST "http://localhost:8000/process-text" \
  --data-urlencode "user_input=I want to return my laptop" \
  --data-urlencode "customer_id=CUST_001"
```

**Health Check:**

```bash
curl http://localhost:8000/health
```

---

## 📊 System Architecture

```
Input Layer
  ├─ 🎤 Audio Input (Streamlit st.audio_input)
  └─ 📝 Text Input (Streamlit text area)
        ↓
Speech-to-Text
  └─ Faster Whisper (base model, CPU inference)
        ↓
Orchestration Layer (LangGraph - 9 Nodes)
  1. sentiment_analysis (DistilBERT)
  2. entity_extraction (BERT-base-NER)
  3. intent_detection (Groq LLM)
  4. retrieval_router (Qdrant search)
  5. context_builder (Format prompt)
  6. response_generation (Groq LLM)
  7. validation (Hallucination checks)
  8. memory_persistence (Qdrant upsert)
  9. tts_generation (gTTS)
        ↓
Output Layer
  ├─ 📝 Text Response
  ├─ 😊 Sentiment-aware Tone
  ├─ 🔊 Audio File (MP3)
  └─ 🎯 Intent Classification
```

---

## 🔧 Configuration

**Environment Variables** (`.env`):

```
GROQ_API_KEY=your_groq_api_key_here
QDRANT_URL=http://localhost:6333
BACKEND_URL=http://localhost:8000
VECTOR_DIMENSION=1024
EMBEDDING_MODEL=BAAI/bge-m3
GROQ_MODEL=openai/gpt-oss-20b
KB_COLLECTION_NAME=knowledge_base
HISTORY_COLLECTION_NAME=customer_history
WHISPER_MODEL=base
```

---

## 📁 Sample Data

Load sample data (4 KB documents + 4 customer history records):

```powershell
.\venv\Scripts\Activate.ps1
python data/load_sample_data.py
```

**Included Data:**

- KB Documents: Return Policy, Shipping Info, Warranty Info, Account Management
- Customer History: 4 interactions (complaints, refunds, inquiries)

---

## 🧪 Testing

### Quick Verification

```powershell
# Test complete pipeline (end-to-end)
.\venv\Scripts\Activate.ps1
python tests/test_full_integration.py
```

**Expected Output**: ✅ FULL INTEGRATION TEST PASSED

### Component Status

- ✅ All 9 nodes connected and working
- ✅ FastAPI endpoints operational
- ✅ Qdrant vector search functional
- ✅ LLM integration responding
- ✅ Audio processing working
- ✅ Sample data loadable

---

## 🎯 Intent Types Supported

| Intent | Example | Response |
|--------|---------|----------|
| `refund_request` | "I want to return this" | Empathetic, processing info |
| `order_status` | "Where's my order?" | Tracking info |
| `product_inquiry` | "Tell me about...?" | Product details |
| `billing_issue` | "My charge was wrong" | Empathetic, billing process |
| `warranty_claim` | "Product broke" | Warranty eligibility info |
| `account_management` | "Change my password" | Account instructions |
| `general_support` | "How do I...?" | General assistance |
| `complaint` | "This is unacceptable" | Empathetic, resolution steps |
| `other` | Misc questions | General help |

---

## 📊 Response Quality Factors

1. **Sentiment Detection**: POSITIVE/NEGATIVE/NEUTRAL classification
2. **Confidence Scores**: 0-1 for both intent and sentiment
3. **Context Retrieval**: Up to 3 KB documents + customer history
4. **Tone Matching**: Empathetic for negative, professional for neutral, friendly for positive
5. **Hallucination Prevention**: Validation layer checks for accuracy

---

## 🐛 Troubleshooting

### Issue: "Backend Not Connected"

**Solution**: Ensure FastAPI backend is running

```bash
python backend/main.py
```

### Issue: "Qdrant Connection Error"

**Solution**: Start Qdrant Docker container

```bash
docker run -p 6333:6333 qdrant/qdrant:latest
```

### Issue: "Groq API Error"

**Solution**: Check GROQ_API_KEY in `.env` file

```powershell
# Verify key is set in the current PowerShell session
echo $env:GROQ_API_KEY
```

### Issue: "Audio Processing Timeout"

**Solution**: Audio processing may take 30-60 seconds, especially on the first run

- First run downloads models (Whisper, BGE-M3, DistilBERT)
- Subsequent runs are faster
- Ensure sufficient disk space (~5GB)

### Issue: "Module Not Found"

**Solution**: Reinstall dependencies

```powershell
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

---

## 📁 Project Structure

```
d:\Voice RAG Bot\voice-rag-bot\
├── backend/
│   ├── main.py                    FastAPI server
│   └── config.py                  Configuration
├── frontend/
│   └── streamlit_app.py           Web UI
├── orchestration/
│   ├── langgraph_workflow.py      9-node workflow
│   ├── state.py                   State management
│   └── nodes/                     Individual nodes
│       ├── sentiment_analysis.py
│       ├── entity_extraction.py
│       ├── intent_detection.py
│       ├── retrieval_router.py
│       ├── context_builder.py
│       ├── response_generation.py
│       ├── validation.py
│       ├── memory_persistence.py
│       └── tts_generation.py
├── rag/
│   ├── qdrant_manager.py          Vector DB client
│   └── embedding_manager.py       BGE-M3 embeddings
├── data/
│   ├── load_sample_data.py        Sample data loader
│   └── audio_output/              Generated audio files
├── tests/
│   └── test_full_integration.py   End-to-end test
├── .env                           Configuration
├── requirements.txt               Dependencies
├── START_SYSTEM.ps1               Quick start script
└── venv/                          Python environment
```

---

## 🔄 Workflow Execution (Behind the Scenes)

1. **sentiment_analysis**: Input → DistilBERT → POSITIVE/NEGATIVE/NEUTRAL
2. **entity_extraction**: Input → BERT-NER → Extract names, locations, etc.
3. **intent_detection**: Input → Groq LLM → 9-intent classification
4. **retrieval_router**: Intent → Qdrant search → 3 KB docs + customer history
5. **context_builder**: Format contexts → Unified prompt
6. **response_generation**: Prompt → Groq LLM → Response text
7. **validation**: Check hallucinations → Retry if needed
8. **memory_persistence**: Embed response → Upsert to Qdrant
9. **tts_generation**: Response text → gTTS → MP3 audio file

---

## 📊 Performance Metrics (Approximate)

| Component | Time | Notes |
|-----------|------|-------|
| STT (Audio → Text) | 5-15s | Depends on audio length |
| Sentiment Analysis | 0.5s | DistilBERT inference |
| Entity Extraction | 0.5s | BERT-NER inference |
| Intent Detection | 1-2s | Groq API call |
| KB Search | 0.2s | Qdrant vector search |
| Response Generation | 2-5s | Groq streaming |
| Validation | 0.5s | Local checks |
| TTS Generation | 2-5s | gTTS processing |
| **Total End-to-End** | **12-35s** | First run slower (model loading) |

---

## 💡 Tips & Tricks

### Faster Processing

- Use text input instead of audio (skips STT)
- System caches models after first run
- Keep audio messages under 30 seconds

### Better Responses

- Use clear, grammatically correct input
- Provide context ("purchased last week" vs "bought before")
- Specify what you need (return, refund, replacement)

### Debugging

- Check the console output of `python backend/main.py` for errors
- View Qdrant collections in the Qdrant Web UI: http://localhost:6333/dashboard
- Monitor Streamlit server in terminal for issues

---

## 🚀 Next Steps

1. **Load Sample Data**: `python data/load_sample_data.py`
2. **Test with Demo Scenarios**: Use Streamlit to test various intents
3. **Customize KB Documents**: Add your own documents to Qdrant
4. **Fine-tune Prompts**: Edit prompts in `prompts/` directory
5. **Production Deployment**: Add authentication, rate limiting, monitoring

---

## 📞 Support & References

**Documentation Files:**

- `data/DATA_REQUIREMENTS.md` - Data schema documentation
- `.env` - Environment configuration

**API Endpoints:**

- `POST /process-audio` - Audio input endpoint
- `POST /process-text` - Text input endpoint
- `GET /health` - Health check

**Backend Logs:**

- Location: Console output when running `python backend/main.py`
- Check for errors, model loading, API calls

---

## 📝 License & Attribution

**Components**:

- **Groq LLM**: Free tier, gpt-oss-20b model
- **Faster Whisper**: SYSTRAN's CTranslate2 reimplementation of OpenAI Whisper (MIT License)
- **LangGraph**: LangChain (Open Source)
- **Qdrant**: Open source vector database
- **BGE-M3**: BAAI embeddings model
- **DistilBERT**: Hugging Face transformers
- **gTTS**: Python library for Google Translate's text-to-speech API

---

## ✅ Verification Checklist

Before considering the system "ready for production":

- [ ] Backend running on http://localhost:8000
- [ ] Qdrant running on http://localhost:6333
- [ ] Streamlit frontend accessible at http://localhost:8501
- [ ] Sample data loaded (`python data/load_sample_data.py`)
- [ ] Integration test passing (`python tests/test_full_integration.py`)
- [ ] Audio input working (record or upload)
- [ ] All 9 nodes executing (check logs)
- [ ] Response generation working
- [ ] Audio playback working
- [ ] History tracking working (multiple messages same customer)

---

**Built with ❤️ | Last Updated: May 30, 2026**