title: Voice RAG Bot
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false

Voice RAG Bot

A voice-enabled RAG (Retrieval-Augmented Generation) bot.

📋 Quick Overview

Voice RAG Bot is an AI customer support system that:

  • 🎤 Accepts voice input via microphone or audio file upload
  • 🧠 Uses an LLM (Groq) for intent detection and response generation
  • 📚 Retrieves relevant context from the knowledge base and customer history using vector search
  • 😊 Analyzes sentiment to produce empathetic, sentiment-aware responses
  • 🔊 Generates speech output via text-to-speech
  • 📊 Orchestrates a 9-node workflow using LangGraph

Tech Stack: Faster Whisper (STT) → LangGraph (9 nodes) → Groq LLM → Qdrant (Vector DB) → gTTS (TTS)


🚀 Quick Start (3 Steps)

Step 1: Prerequisites

  • Docker Desktop running (for Qdrant)
  • Python 3.11+
  • Git (optional)

Step 2: Start Qdrant (Vector Database)

docker run -p 6333:6333 qdrant/qdrant:latest

Leave this running in the background. ✅ The system creates its collections automatically.

Step 3: Start Voice RAG Bot

# Quote the path: it contains spaces
cd "d:\Voice RAG Bot\voice-rag-bot"

# Activate virtual environment
.\venv\Scripts\Activate.ps1

# Run startup script (starts backend + Streamlit)
.\START_SYSTEM.ps1

Or start services manually:

Terminal 1 (Backend):

.\venv\Scripts\Activate.ps1
python backend/main.py
# Runs on http://localhost:8000

Terminal 2 (Frontend):

.\venv\Scripts\Activate.ps1
streamlit run frontend/streamlit_app.py
# Opens http://localhost:8501

Docker Deployment

Option A: Docker Compose (Recommended for Development)

Start all services (Backend + Frontend + Qdrant + Redis):

docker-compose up -d

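Access Points (assuming the compose file publishes the same ports used elsewhere in this README):

  • Frontend (Streamlit): http://localhost:8501
  • Backend (FastAPI): http://localhost:8000
  • Qdrant: http://localhost:6333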

Stop Services:

docker-compose down

Option B: Individual Docker Images

Build Image:

docker build -t voice-rag-bot:latest .

Run Backend:

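# Assumes Qdrant is published on the host (see Step 2). Inside a container, localhost
# is the container itself, so use host.docker.internal (available on Docker Desktop).
docker run -p 8000:8000 \
  -e APP_TYPE=backend \
  -e GROQ_API_KEY=your_key \
  -e QDRANT_URL=http://host.docker.internal:6333 \
  voice-rag-bot:latest

Run Frontend:

docker run -p 8501:8501 \
  -e APP_TYPE=frontend \
  -e GROQ_API_KEY=your_key \
  -e QDRANT_URL=http://host.docker.internal:6333 \
  voice-rag-bot:latest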

🚀 GitHub Actions CI/CD

Setup GitHub Secrets

Add these secrets to your GitHub repository (Settings → Secrets and variables → Actions):

| Secret Name | Value | Description |
|---|---|---|
| GROQ_API_KEY | gsk_xxxxxxxxxxxx | Groq API key for LLM |
| HF_USERNAME | your_username | HuggingFace username |
| HF_TOKEN | hf_xxxxxxxxxxxx | HuggingFace access token |
| HF_SPACE_REPO | username/voice-rag-bot | HF Spaces repo path |

How to Add Secrets:

  1. Go to GitHub repository → Settings
  2. Click "Secrets and variables" → "Actions"
  3. Click "New repository secret"
  4. Add each secret with name and value

Automatic Deployment

The workflow (.github/workflows/docker-build.yml) automatically:

  1. On main branch push:

    • Builds Docker image
    • Pushes to GitHub Container Registry (GHCR)
    • Deploys to HuggingFace Spaces
    • Generates tags: main, latest, sha-xxxxx
  2. On Pull Request:

    • Builds Docker image (no push)
    • Validates Dockerfile syntax
    • Tests image build

Workflow File:

  • Location: .github/workflows/docker-build.yml
  • Triggers: Push to main/develop, Pull requests
  • Status: View in GitHub β†’ Actions tab

Access Docker Images:

docker pull ghcr.io/your-username/voice-rag-bot:latest
docker pull ghcr.io/your-username/voice-rag-bot:main

🤗 HuggingFace Spaces Deployment

Option A: Automatic Deployment (Via GitHub Actions)

  1. Create HuggingFace Space: https://huggingface.co/spaces

    • Name: voice-rag-bot
    • License: OpenRAIL
    • Private/Public: Your choice
  2. Get HF credentials:

  3. Add GitHub Secrets (see above):

    • HF_USERNAME
    • HF_TOKEN
    • HF_SPACE_REPO = username/voice-rag-bot
  4. Push to the main branch → automatic deployment

Option B: Manual Deployment to HF Spaces

  1. Create HF Space (if not exists):

    huggingface-cli repo create voice-rag-bot --type space --space-sdk streamlit
    
  2. Clone & Push:

    git clone https://huggingface.co/spaces/your-username/voice-rag-bot
    cd voice-rag-bot
    
    # Add your project files
    cp -r /path/to/voice-rag-bot/* .
    
    # Push to HF Spaces
    git add .
    git commit -m "Deploy Voice RAG Bot"
    git push origin main
    
  3. Configure Secrets in HF Spaces:

    • Go to Space Settings → Variables and secrets
    • Add: GROQ_API_KEY, QDRANT_URL, etc.
  4. App File: app.py (automatically created)

HF Spaces Configuration (spaces.yaml)

title: Voice RAG Bot
description: Voice-enabled RAG chatbot
app_file: app.py
sdk: streamlit
sdk_version: "1.28.0"
python_version: "3.11"
cpu: true
gpu: true
startup_duration_timeout: 600

HF Spaces Requirements

Note: A Space using the Streamlit SDK runs only the Streamlit frontend (no backend microservices).

Options:

  1. Use External Backend:

    • Deploy backend separately (Railway, Render, Heroku)
    • Update BACKEND_URL in Streamlit config
    • Spaces frontend connects to external backend
  2. Self-contained (Frontend Only):

    • Remove backend API calls
    • Use Streamlit session state for data
    • Limited functionality (no vector DB, LLM caching)
  3. Docker-based Space (Advanced):

    • Deploy full stack in Docker container
    • Requires HF Spaces Docker runtime
    • Use Dockerfile + docker-compose.yml

Recommended: Use external FastAPI backend on Render/Railway + Streamlit on HF Spaces


🔧 Environment Variables for Deployment

Local Development

GROQ_API_KEY=gsk_xxxxxxxxxxxx
QDRANT_URL=http://localhost:6333
DEBUG=True
LOG_LEVEL=INFO

Docker Compose

GROQ_API_KEY=gsk_xxxxxxxxxxxx
QDRANT_URL=http://qdrant:6333
BACKEND_URL=http://backend:8000
DEBUG=False
LOG_LEVEL=INFO

HuggingFace Spaces

GROQ_API_KEY=gsk_xxxxxxxxxxxx
BACKEND_URL=https://your-backend-api.herokuapp.com
FRONTEND_MODE=SPACES

GitHub Actions (Auto-set)

  • REGISTRY: ghcr.io
  • IMAGE_NAME: ${{ github.repository }}
  • Secrets: See above

📖 Usage Guide

Via Streamlit Frontend (Recommended)

  1. Open Browser: http://localhost:8501
  2. Enter Customer ID: Unique identifier for customer (enables history tracking)
  3. Choose Input Method:
    • Option A: Click 🎤 Record → Speak your message → Process Audio
    • Option B: Upload audio file (MP3/WAV)
    • Option C: Type message directly in text area
  4. View Results (automatically displayed):
    • 📝 Generated Response
    • 🎯 Detected Intent (+ confidence)
    • 😊 Sentiment Analysis (+ confidence)
    • 🏷️ Extracted Entities
    • 📚 Knowledge Base context (if relevant)
    • 📜 Customer History (if relevant)
    • 🔊 Audio playback of response

Via REST API (For Integration)

Process Audio:

curl -X POST "http://localhost:8000/process-audio?customer_id=CUST_001" \
  -F "file=@voice_message.wav"

Process Text:

curl -X POST "http://localhost:8000/process-text" \
  --data-urlencode "user_input=I want to return my laptop" \
  --data-urlencode "customer_id=CUST_001"

Health Check:

curl http://localhost:8000/health
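
For integration from Python, the text endpoint can be called with the standard library alone. The sketch below mirrors the curl example; it assumes that `/process-text` accepts form-encoded fields and that the backend replies with JSON, neither of which is confirmed here. `build_text_request` and `send` are illustrative names, not part of the project.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # backend address used throughout this README


def build_text_request(user_input: str, customer_id: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /process-text request without sending it."""
    # urlencode escapes spaces and '&' so the message cannot break the form body.
    body = urllib.parse.urlencode(
        {"user_input": user_input, "customer_id": customer_id}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/process-text",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send a request and decode the reply (assumes a JSON response body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `send(build_text_request("I want to return my laptop", "CUST_001"))` would return the decoded response. The generous default timeout reflects the processing times listed under Performance Metrics.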

📊 System Architecture

Input Layer
  ├─ 🎤 Audio Input (Streamlit st.audio_input)
  └─ 📝 Text Input (Streamlit text area)
         ↓
Speech-to-Text
  └─ Faster Whisper (base model, CPU inference)
         ↓
Orchestration Layer (LangGraph - 9 Nodes)
  1. sentiment_analysis (DistilBERT)
  2. entity_extraction (BERT-base-NER)
  3. intent_detection (Groq LLM)
  4. retrieval_router (Qdrant search)
  5. context_builder (Format prompt)
  6. response_generation (Groq LLM)
  7. validation (Hallucination checks)
  8. memory_persistence (Qdrant upsert)
  9. tts_generation (gTTS)
         ↓
Output Layer
  ├─ 📝 Text Response
  ├─ 😊 Sentiment-aware Tone
  ├─ 🔊 Audio File (MP3)
  └─ 🎯 Intent Classification

🔧 Configuration

Environment Variables (.env):

GROQ_API_KEY=your_groq_api_key_here
QDRANT_URL=http://localhost:6333
BACKEND_URL=http://localhost:8000
VECTOR_DIMENSION=1024
EMBEDDING_MODEL=BAAI/bge-m3
GROQ_MODEL=openai/gpt-oss-20b
KB_COLLECTION_NAME=knowledge_base
HISTORY_COLLECTION_NAME=customer_history
WHISPER_MODEL=base
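
One way to consume these variables in Python is a small typed settings object. The sketch below is not the project's `backend/config.py`; the `Settings` class and `load_settings` function are hypothetical, and the defaults simply repeat the values listed above. It reads the process environment, so a `.env` file would still need to be loaded by whatever mechanism the project uses.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    qdrant_url: str
    backend_url: str
    vector_dimension: int
    embedding_model: str
    groq_model: str
    kb_collection_name: str
    history_collection_name: str
    whisper_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("GROQ_API_KEY", "")
    if not api_key:
        # Fail early: every LLM call needs this key.
        raise ValueError("GROQ_API_KEY is not set")
    return Settings(
        groq_api_key=api_key,
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        backend_url=env.get("BACKEND_URL", "http://localhost:8000"),
        vector_dimension=int(env.get("VECTOR_DIMENSION", "1024")),
        embedding_model=env.get("EMBEDDING_MODEL", "BAAI/bge-m3"),
        groq_model=env.get("GROQ_MODEL", "openai/gpt-oss-20b"),
        kb_collection_name=env.get("KB_COLLECTION_NAME", "knowledge_base"),
        history_collection_name=env.get("HISTORY_COLLECTION_NAME", "customer_history"),
        whisper_model=env.get("WHISPER_MODEL", "base"),
    )
```

Passing a plain dictionary instead of the real environment keeps the loader easy to test.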

📝 Sample Data

Load sample data (4 KB documents + 4 customer history records):

.\venv\Scripts\Activate.ps1
python data/load_sample_data.py

Included Data:

  • KB Documents: Return Policy, Shipping Info, Warranty Info, Account Management
  • Customer History: 4 interactions (complaints, refunds, inquiries)

🧪 Testing

Quick Verification

# Test complete pipeline (end-to-end)
.\venv\Scripts\Activate.ps1
python tests/test_full_integration.py

Expected Output: ✅ FULL INTEGRATION TEST PASSED

Component Status

  • ✅ All 9 nodes connected and working
  • ✅ FastAPI endpoints operational
  • ✅ Qdrant vector search functional
  • ✅ LLM integration responding
  • ✅ Audio processing working
  • ✅ Sample data loadable

🎯 Intent Types Supported

| Intent | Example | Response |
|---|---|---|
| refund_request | "I want to return this" | Empathetic, processing info |
| order_status | "Where's my order?" | Tracking info |
| product_inquiry | "Tell me about...?" | Product details |
| billing_issue | "My charge was wrong" | Empathetic, billing process |
| warranty_claim | "Product broke" | Warranty eligibility info |
| account_management | "Change my password" | Account instructions |
| general_support | "How do I...?" | General assistance |
| complaint | "This is unacceptable" | Empathetic, resolution steps |
| other | Misc questions | General help |
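
When consuming the detected intent in client code, it can help to map the label onto a closed set. The sketch below lists the nine intents from the table; the `Intent` enum and `parse_intent` helper are illustrative and assume the backend returns the label as a plain string.

```python
from enum import Enum


class Intent(str, Enum):
    REFUND_REQUEST = "refund_request"
    ORDER_STATUS = "order_status"
    PRODUCT_INQUIRY = "product_inquiry"
    BILLING_ISSUE = "billing_issue"
    WARRANTY_CLAIM = "warranty_claim"
    ACCOUNT_MANAGEMENT = "account_management"
    GENERAL_SUPPORT = "general_support"
    COMPLAINT = "complaint"
    OTHER = "other"


def parse_intent(label: str) -> Intent:
    """Normalize a raw label and fall back to OTHER for anything unknown."""
    normalized = label.strip().lower().replace(" ", "_")
    try:
        return Intent(normalized)
    except ValueError:
        return Intent.OTHER
```

Falling back to `other` rather than raising means an unexpected label from the LLM is handled like a miscellaneous question.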

📊 Response Quality Factors

  1. Sentiment Detection: POSITIVE/NEGATIVE/NEUTRAL classification
  2. Confidence Scores: 0-1 for both intent and sentiment
  3. Context Retrieval: Up to 3 KB documents + customer history
  4. Tone Matching: Empathetic for negative, professional for neutral, friendly for positive
  5. Hallucination Prevention: Validation layer checks for accuracy
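
The tone-matching rule in item 4 amounts to a lookup on the sentiment label. The sketch below is a guess at the shape of that logic, not the project's code: `pick_tone` is a hypothetical name, and the low-confidence fallback to a professional tone (with a 0.5 threshold) is an added assumption.

```python
TONE_BY_SENTIMENT = {
    "NEGATIVE": "empathetic",
    "NEUTRAL": "professional",
    "POSITIVE": "friendly",
}


def pick_tone(sentiment: str, confidence: float, min_confidence: float = 0.5) -> str:
    """Choose a response tone from a sentiment label and its 0-1 confidence."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Assumption: a low-confidence classification falls back to the neutral tone.
    if confidence < min_confidence:
        return "professional"
    return TONE_BY_SENTIMENT.get(sentiment.upper(), "professional")
```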

🐛 Troubleshooting

Issue: "Backend Not Connected"

Solution: Ensure FastAPI backend is running

python backend/main.py

Issue: "Qdrant Connection Error"

Solution: Start Qdrant Docker container

docker run -p 6333:6333 qdrant/qdrant:latest

Issue: "Groq API Error"

Solution: Check GROQ_API_KEY in .env file

# Verify key is set
echo $env:GROQ_API_KEY

Issue: "Audio Processing Timeout"

Solution: Processing may take 30-60 seconds for audio

  • First run downloads models (Whisper, BGE-M3, DistilBERT)
  • Subsequent runs are faster
  • Ensure sufficient disk space (~5GB)

Issue: "Module Not Found"

Solution: Reinstall dependencies

.\venv\Scripts\Activate.ps1
pip install -r requirements.txt

📁 Project Structure

d:\Voice RAG Bot\voice-rag-bot\
├── backend/
│   ├── main.py                 FastAPI server
│   └── config.py               Configuration
├── frontend/
│   └── streamlit_app.py        Web UI
├── orchestration/
│   ├── langgraph_workflow.py   9-node workflow
│   ├── state.py                State management
│   └── nodes/                  Individual nodes
│       ├── sentiment_analysis.py
│       ├── entity_extraction.py
│       ├── intent_detection.py
│       ├── retrieval_router.py
│       ├── context_builder.py
│       ├── response_generation.py
│       ├── validation.py
│       ├── memory_persistence.py
│       └── tts_generation.py
├── rag/
│   ├── qdrant_manager.py       Vector DB client
│   └── embedding_manager.py    BGE-M3 embeddings
├── data/
│   ├── load_sample_data.py     Sample data loader
│   └── audio_output/           Generated audio files
├── tests/
│   └── test_full_integration.py End-to-end test
├── .env                        Configuration
├── requirements.txt            Dependencies
├── START_SYSTEM.ps1            Quick start script
└── venv/                       Python environment

🔄 Workflow Execution (Behind the Scenes)

  1. sentiment_analysis: Input → DistilBERT → POSITIVE/NEGATIVE/NEUTRAL
  2. entity_extraction: Input → BERT-NER → Extract names, locations, etc.
  3. intent_detection: Input → Groq LLM → 9-intent classification
  4. retrieval_router: Intent → Qdrant search → 3 KB docs + customer history
  5. context_builder: Format contexts → Unified prompt
  6. response_generation: Prompt → Groq LLM → Response text
  7. validation: Check hallucinations → Retry if needed
  8. memory_persistence: Embed response → Upsert to Qdrant
  9. tts_generation: Response text → gTTS → MP3 audio file
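
The data flow above can be pictured as nine functions that each read a shared state and return the keys they add. The sketch below is a plain-Python stand-in rather than the actual LangGraph graph: the node names and order come from the list, while the state keys and every stub output are placeholders. The validation retry is left out.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Node = Callable[[State], State]


def stub(key: str, value: object) -> Node:
    """Return a placeholder node that writes one fixed value into the state."""
    def node(state: State) -> State:
        return {key: value}
    return node


# Names and order follow the list above; all output values are placeholders.
NODES: List[Tuple[str, Node]] = [
    ("sentiment_analysis", stub("sentiment", "NEUTRAL")),
    ("entity_extraction", stub("entities", [])),
    ("intent_detection", stub("intent", "other")),
    ("retrieval_router", stub("documents", [])),
    ("context_builder", stub("prompt", "")),
    ("response_generation", stub("response", "")),
    ("validation", stub("is_valid", True)),
    ("memory_persistence", stub("persisted", True)),
    ("tts_generation", stub("audio_path", None)),
]


def run_workflow(user_input: str, customer_id: str) -> State:
    """Run every node in order, merging each node's output into the state."""
    state: State = {"user_input": user_input, "customer_id": customer_id}
    trace: List[str] = []
    for name, node in NODES:
        state.update(node(state))
        trace.append(name)
    state["trace"] = trace  # execution order, useful when checking logs
    return state
```

Each node returning only the keys it changes keeps the steps independent, which is the same pattern a graph-based orchestrator relies on.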

📊 Performance Metrics (Approximate)

| Component | Time | Notes |
|---|---|---|
| STT (Audio → Text) | 5-15s | Depends on audio length |
| Sentiment Analysis | 0.5s | DistilBERT inference |
| Entity Extraction | 0.5s | BERT-NER inference |
| Intent Detection | 1-2s | Groq API call |
| KB Search | 0.2s | Qdrant vector search |
| Response Generation | 2-5s | Groq streaming |
| Validation | 0.5s | Local checks |
| TTS Generation | 2-5s | gTTS processing |
| Total End-to-End | 12-35s | First run slower (model loading) |
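
As a sanity check on the table, the listed components can be summed directly. The sketch below uses only the figures above. The listed rows add up to roughly 11.7 to 28.7 seconds, so the upper end of the stated total presumably also covers steps without their own row (context building, memory persistence, network overhead); that reading is an inference, not something the table states.

```python
from typing import Dict, Iterable, Tuple

# (min_seconds, max_seconds) per component, copied from the table above.
COMPONENT_SECONDS: Dict[str, Tuple[float, float]] = {
    "STT": (5.0, 15.0),
    "Sentiment Analysis": (0.5, 0.5),
    "Entity Extraction": (0.5, 0.5),
    "Intent Detection": (1.0, 2.0),
    "KB Search": (0.2, 0.2),
    "Response Generation": (2.0, 5.0),
    "Validation": (0.5, 0.5),
    "TTS Generation": (2.0, 5.0),
}


def total_range(components: Dict[str, Tuple[float, float]],
                skip: Iterable[str] = ()) -> Tuple[float, float]:
    """Sum the lower and upper bounds, optionally skipping some components."""
    skipped = set(skip)
    low = sum(lo for name, (lo, _) in components.items() if name not in skipped)
    high = sum(hi for name, (_, hi) in components.items() if name not in skipped)
    return round(low, 1), round(high, 1)
```

Calling `total_range(COMPONENT_SECONDS, skip=["STT"])` gives the estimate for text input, which skips speech-to-text entirely.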

💡 Tips & Tricks

Faster Processing

  • Use text input instead of audio (skips STT)
  • System caches models after first run
  • Keep audio messages under 30 seconds

Better Responses

  • Use clear, grammatically correct input
  • Provide context ("purchased last week" vs "bought before")
  • Specify what you need (return, refund, replacement)

Debugging


🚀 Next Steps

  1. Load Sample Data: python data/load_sample_data.py
  2. Test with Demo Scenarios: Use Streamlit to test various intents
  3. Customize KB Documents: Add your own documents to Qdrant
  4. Fine-tune Prompts: Edit prompts in prompts/ directory
  5. Production Deployment: Add authentication, rate limiting, monitoring

📞 Support & References

Documentation Files:

  • data/DATA_REQUIREMENTS.md - Data schema documentation
  • .env - Environment configuration

API Endpoints:

  • POST /process-audio - Audio input endpoint
  • POST /process-text - Text input endpoint
  • GET /health - Health check

Backend Logs:

  • Location: Console output when running python backend/main.py
  • Check for errors, model loading, API calls

📝 License & Attribution

Components:

  • Groq LLM: Free tier, gpt-oss-20b model
  • Faster Whisper: SYSTRAN's CTranslate2 reimplementation of OpenAI Whisper (MIT License)
  • LangGraph: LangChain (Open Source)
  • Qdrant: Open source vector database
  • BGE-M3: BAAI embeddings model
  • DistilBERT: Hugging Face transformers
  • gTTS: Google Text-to-Speech

✅ Verification Checklist

Before considering the system "ready for production", confirm each item:

  • Backend running on http://localhost:8000
  • Qdrant running on http://localhost:6333
  • Streamlit frontend accessible at http://localhost:8501
  • Sample data loaded (python data/load_sample_data.py)
  • Integration test passing (python tests/test_full_integration.py)
  • Audio input working (record or upload)
  • All 9 nodes executing (check logs)
  • Response generation working
  • Audio playback working
  • History tracking working (multiple messages same customer)
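
The first three checklist items can be verified with a quick TCP probe. The sketch below only tests that something is listening on each port listed above; it does not confirm that the service behind it is healthy. `is_port_open` and `check_services` are illustrative helpers, not part of the project.

```python
import socket
from typing import Dict, Tuple

# Hosts and ports taken from the checklist above.
SERVICES: Dict[str, Tuple[str, int]] = {
    "backend": ("localhost", 8000),
    "qdrant": ("localhost", 6333),
    "frontend": ("localhost", 8501),
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services: Dict[str, Tuple[str, int]] = SERVICES) -> Dict[str, bool]:
    """Probe every service and report which ones accept connections."""
    return {name: is_port_open(host, port) for name, (host, port) in services.items()}
```

Running `check_services()` returns a dictionary such as one entry per service with a boolean; a `False` value points to the matching Troubleshooting entry.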

Built with ❤️ | Last Updated: May 30, 2026