---
title: QUESTRAG Backend
emoji: 🏦
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---
# 🏦 QUESTRAG - Banking QUEries and Support system via Trained Reinforced RAG
An intelligent banking chatbot powered by Retrieval-Augmented Generation (RAG) and Reinforcement Learning (RL) to provide accurate, context-aware responses to Indian banking queries while optimizing token costs.
## 📋 Table of Contents
- Overview
- Key Features
- System Architecture
- Technology Stack
- Installation
- Configuration
- Usage
- Project Structure
- Datasets
- Performance Metrics
- API Documentation
- Deployment
- Contributing
- License
- Acknowledgments
- Contact
- Status
- Links
## 🎯 Overview
QUESTRAG is an advanced banking chatbot designed to revolutionize customer support in the Indian banking sector. By combining Retrieval-Augmented Generation (RAG) with Reinforcement Learning (RL), the system intelligently decides when to fetch external context from a knowledge base and when to respond directly, reducing token costs by up to 31% while maintaining high accuracy.
### Problem Statement

Existing banking chatbots suffer from:

- ❌ Limited response flexibility (rigid, rule-based systems)
- ❌ Poor handling of informal/real-world queries
- ❌ Lack of contextual understanding
- ❌ High operational costs due to inefficient token usage
- ❌ Low user satisfaction and trust

### Solution

QUESTRAG addresses these challenges through:

- ✅ Domain-specific RAG trained on 19,000+ banking query and support records
- ✅ RL-optimized policy network (BERT-based) for smart context-fetching decisions
- ✅ Fine-tuned retriever model (E5-base-v2) using InfoNCE + Triplet Loss
- ✅ Groq LLM with HuggingFace fallback for reliable, fast responses
- ✅ Full-stack web application with modern UI/UX and JWT authentication
## 🚀 Key Features

### 🤖 Intelligent RAG Pipeline
- FAISS-powered retrieval for fast similarity search across 19,352 documents
- Fine-tuned embedding model (`e5-base-v2`) trained on English + Hinglish paraphrases
- Context-aware response generation using Llama 3 models (8B & 70B) via Groq

### 🧠 Reinforcement Learning System
- BERT-based policy network (`bert-base-uncased`) for FETCH/NO_FETCH decisions
- Reward-driven optimization (+2.0 accurate, +0.5 needed fetch, -0.5 incorrect)
- 31% token cost reduction via optimized retrieval
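The reward scheme above can be sketched as a small function. Note this is an illustration only: the exact semantics of combining the +0.5 "needed fetch" bonus with the accuracy reward are an assumption, not the project's actual training code.

```python
# Hypothetical sketch of the reward scheme described above; the real
# training code may combine the signals differently.

def reward(answer_correct: bool, fetch_needed: bool, fetched: bool) -> float:
    """Assign a scalar reward to one policy decision.

    +2.0 -> the final answer was accurate
    +0.5 -> bonus when a fetch was genuinely needed and performed
    -0.5 -> penalty for an incorrect answer
    """
    r = 2.0 if answer_correct else -0.5
    if fetch_needed and fetched:
        r += 0.5
    return r

# An accurate answer that required a fetch earns the full 2.5.
best_case = reward(answer_correct=True, fetch_needed=True, fetched=True)
```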
### 🎨 Modern Web Interface
- React 18 + Vite with Tailwind CSS
- Real-time chat, conversation history, JWT authentication
- Responsive design for desktop and mobile

### 🔒 Enterprise-Ready Backend
- FastAPI + MongoDB Atlas for scalable async operations
- JWT authentication with secure password hashing (bcrypt)
- Multi-provider LLM (Groq → HuggingFace automatic fallback)
- Deployed on HuggingFace Spaces with Docker containerization
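The Groq → HuggingFace fallback pattern can be sketched as follows. The provider names and `generate()` call signature are illustrative assumptions, not the project's actual `llm_manager` API.

```python
# Illustrative sketch of an ordered multi-provider LLM fallback, as
# described above. Provider callables are stand-ins for real clients.

from typing import Callable

def generate_with_fallback(prompt: str,
                           providers: list[tuple[str, Callable[[str], str]]]) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, response)."""
    last_error: Exception | None = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. rate limit, timeout, outage
            last_error = exc
    raise RuntimeError("all LLM providers failed") from last_error

# Stubbed usage: Groq "fails", HuggingFace answers.
def groq_stub(prompt: str) -> str:
    raise TimeoutError("rate limited")

def hf_stub(prompt: str) -> str:
    return "fallback answer"

provider, answer = generate_with_fallback(
    "hello", [("groq", groq_stub), ("huggingface", hf_stub)]
)
```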
## 🏗️ System Architecture

### 🔄 Workflow

1. User Query → FastAPI receives the query via REST API
2. Policy Decision → BERT-based RL model decides FETCH or NO_FETCH
3. Conditional Retrieval → If FETCH → retrieve top-5 docs from FAISS using E5-base-v2
4. Response Generation → Llama 3 (via Groq) generates the final answer
5. Evaluation & Logging → Logged in MongoDB + reward-based model update
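The five steps above can be sketched end to end with the policy network, retriever, and LLM stubbed out. Function names and shapes are illustrative only; the real service wires these together in `chat_service.py`.

```python
# Minimal sketch of the workflow above with all models stubbed out.
# Thresholds mirror the CONFIDENCE_THRESHOLD and TOP_K settings.

CONFIDENCE_THRESHOLD = 0.7
TOP_K = 5

def answer_query(query: str, policy, retriever, llm) -> dict:
    # Steps 1-2: policy estimates whether external context is needed.
    p_fetch = policy(query)
    action = "FETCH" if p_fetch >= CONFIDENCE_THRESHOLD else "NO_FETCH"

    # Step 3: conditional retrieval from the vector index.
    docs = retriever(query, TOP_K) if action == "FETCH" else []

    # Step 4: generation, with or without retrieved context.
    response = llm(query, docs)

    # Step 5: metadata of the kind that would be logged to MongoDB.
    return {"response": response,
            "policy_action": action,
            "documents_retrieved": len(docs)}

# Stubbed usage:
result = answer_query(
    "What are home loan rates?",
    policy=lambda q: 0.9,                # confident FETCH
    retriever=lambda q, k: ["doc"] * k,  # pretend top-k documents
    llm=lambda q, docs: "answer",
)
```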
### 🔄 Sequence Diagram

## 🛠️ Technology Stack
### Frontend
- ⚛️ React 18.3.1 + Vite 5.4.2
- 🎨 Tailwind CSS 3.4.1
- 🔄 React Context API + Axios + React Router DOM

### Backend
- 🚀 FastAPI 0.104.1
- 🗄️ MongoDB Atlas + Motor (async driver)
- 🔐 JWT Auth + Passlib (bcrypt)
- 🤖 PyTorch 2.9.1, Transformers 4.57, FAISS 1.13.0
- 💬 Groq (Llama 3.1 8B Instant / Llama 3.3 70B Versatile)
- 🎯 Sentence Transformers 5.1.2

### Machine Learning
- 🧠 Policy Network: BERT-base-uncased (trained with RL)
- 🔍 Retriever: E5-base-v2 (fine-tuned with InfoNCE + Triplet Loss)
- 📚 Vector Store: FAISS (19,352 documents)

### Deployment
- 🐳 Docker (HuggingFace Spaces)
- 🤗 HuggingFace Hub (model storage)
- ☁️ MongoDB Atlas (cloud database)
- 🐍 Python 3.12 + uvicorn
## ⚙️ Installation

### 🧩 Prerequisites
- Python 3.12+
- Node.js 18+
- MongoDB Atlas account (or local MongoDB 6.0+)
- Groq API key (or HuggingFace token)
### 🔧 Backend Setup (Local Development)

```bash
# Navigate to backend
cd backend

# Create virtual environment
python -m venv venv

# Activate it
source venv/bin/activate   # Linux/Mac
venv\Scripts\activate      # Windows

# Install dependencies
pip install -r requirements.txt

# Create environment file
cp .env.example .env
# Edit .env with your credentials (see Configuration section)

# Build FAISS index (one-time setup)
python build_faiss_index.py

# Start backend server
uvicorn app.main:app --reload --port 8000
```
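As a rough mental model of what the index-building step provides (not the actual `build_faiss_index.py` script): every knowledge-base entry is embedded once, and queries are then answered by nearest-neighbour search over those vectors. A toy pure-Python stand-in, where `embed()` is a placeholder for the fine-tuned E5-base-v2 encoder and the brute-force loop is what FAISS replaces with optimized index structures:

```python
# Toy stand-in for a vector index: embed every document once, then
# return the most similar entries by inner product. NOT the real script;
# embed() here is a crude character-frequency placeholder, not a model.

def embed(text: str) -> list[float]:
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]  # unit-normalized

def build_index(docs: list[str]) -> list[tuple[str, list[float]]]:
    # "Index build": embed each document once, up front.
    return [(d, embed(d)) for d in docs]

def search(index, query: str, top_k: int = 5) -> list[str]:
    # Inner product of unit vectors == cosine similarity.
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, v)), d) for d, v in index]
    return [d for _, d in sorted(scored, reverse=True)[:top_k]]

index = build_index(["atm card blocked", "home loan rates", "net banking login"])
hits = search(index, "blocked atm card", top_k=1)
```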
### 💻 Frontend Setup

```bash
# Navigate to frontend
cd frontend

# Install dependencies
npm install

# Create environment file
cp .env.example .env
# Update VITE_API_URL to point to your backend

# Start dev server
npm run dev
```
## ⚙️ Configuration

### 🔑 Backend .env (Key Parameters)

| Category | Key | Example / Description |
|---|---|---|
| Environment | `ENVIRONMENT` | `development` or `production` |
| MongoDB | `MONGODB_URI` | `mongodb+srv://user:pass@cluster.mongodb.net/` |
| Authentication | `SECRET_KEY` | Generate with `python -c "import secrets; print(secrets.token_urlsafe(32))"` |
| | `ALGORITHM` | `HS256` |
| | `ACCESS_TOKEN_EXPIRE_MINUTES` | `1440` (24 hours) |
| Groq API | `GROQ_API_KEY_1` | Your primary Groq API key |
| | `GROQ_API_KEY_2` | Secondary key (optional) |
| | `GROQ_API_KEY_3` | Tertiary key (optional) |
| | `GROQ_CHAT_MODEL` | `llama-3.1-8b-instant` |
| | `GROQ_EVAL_MODEL` | `llama-3.3-70b-versatile` |
| HuggingFace | `HF_TOKEN_1` | HuggingFace token (fallback LLM) |
| | `HF_MODEL_REPO` | `eeshanyaj/questrag_models` (for model download) |
| Model Paths | `POLICY_MODEL_PATH` | `app/models/best_policy_model.pth` |
| | `RETRIEVER_MODEL_PATH` | `app/models/best_retriever_model.pth` |
| | `FAISS_INDEX_PATH` | `app/models/faiss_index.pkl` |
| | `KB_PATH` | `app/data/final_knowledge_base.jsonl` |
| Device | `DEVICE` | `cpu` or `cuda` |
| RAG Params | `TOP_K` | `5` (number of documents to retrieve) |
| | `SIMILARITY_THRESHOLD` | `0.5` (minimum similarity score) |
| Policy Network | `CONFIDENCE_THRESHOLD` | `0.7` (policy decision confidence) |
| CORS | `ALLOWED_ORIGINS` | `http://localhost:5173` or `*` |
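The project's `config.py` reads these variables into a settings object. A minimal stdlib sketch of that pattern, using the keys and defaults from the table above (the actual implementation may differ, e.g. it may use pydantic-settings):

```python
# Minimal sketch of loading the settings above from the environment.
# Field names and defaults follow the configuration table; the actual
# app/config.py may be implemented differently.

import os
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Settings:
    environment: str = field(default_factory=lambda: os.getenv("ENVIRONMENT", "development"))
    mongodb_uri: str = field(default_factory=lambda: os.getenv("MONGODB_URI", ""))
    device: str = field(default_factory=lambda: os.getenv("DEVICE", "cpu"))
    top_k: int = field(default_factory=lambda: int(os.getenv("TOP_K", "5")))
    similarity_threshold: float = field(
        default_factory=lambda: float(os.getenv("SIMILARITY_THRESHOLD", "0.5")))
    confidence_threshold: float = field(
        default_factory=lambda: float(os.getenv("CONFIDENCE_THRESHOLD", "0.7")))

settings = Settings()  # values come from the process environment
```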
### 🔑 Frontend .env

```bash
# Local development
VITE_API_URL=http://localhost:8000

# Production (HuggingFace Spaces)
VITE_API_URL=https://eeshanyaj-questrag-backend.hf.space
```
## 🚀 Usage

### 🖥️ Local Development

**Start Backend Server**

```bash
cd backend
source venv/bin/activate   # or venv\Scripts\activate
uvicorn app.main:app --reload --port 8000
```

- Backend: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Health Check: http://localhost:8000/health

**Start Frontend Dev Server**

```bash
cd frontend
npm run dev
```

- Frontend: http://localhost:5173
### 🌐 Production (HuggingFace Spaces)

**Backend API:**
- Base URL: https://eeshanyaj-questrag-backend.hf.space
- API Docs: https://eeshanyaj-questrag-backend.hf.space/docs
- Health Check: https://eeshanyaj-questrag-backend.hf.space/health

**Frontend (Coming Soon):**
- Will be deployed on Vercel/Netlify
## 📁 Project Structure

```
questrag/
│
├── backend/
│   ├── app/
│   │   ├── api/v1/
│   │   │   ├── auth.py                     # Auth endpoints (register, login)
│   │   │   └── chat.py                     # Chat endpoints
│   │   ├── core/
│   │   │   ├── llm_manager.py              # Groq + HF LLM orchestration
│   │   │   └── security.py                 # JWT & password hashing
│   │   ├── ml/
│   │   │   ├── policy_network.py           # RL policy model (BERT)
│   │   │   └── retriever.py                # E5-base-v2 retriever
│   │   ├── db/
│   │   │   ├── mongodb.py                  # MongoDB connection
│   │   │   └── repositories/               # User & conversation repos
│   │   ├── services/
│   │   │   └── chat_service.py             # Orchestration logic
│   │   ├── models/
│   │   │   ├── best_policy_model.pth       # Trained policy network
│   │   │   ├── best_retriever_model.pth    # Fine-tuned retriever
│   │   │   └── faiss_index.pkl             # FAISS vector store
│   │   ├── data/
│   │   │   └── final_knowledge_base.jsonl  # 19,352 Q&A pairs
│   │   ├── config.py                       # Settings & env vars
│   │   └── main.py                         # FastAPI app entry point
│   ├── Dockerfile                          # Docker config for HF Spaces
│   ├── requirements.txt
│   └── .env.example
│
└── frontend/
    ├── src/
    │   ├── components/                     # UI components
    │   ├── context/                        # Auth context
    │   ├── pages/                          # Login, Register, Chat
    │   ├── services/api.js                 # Axios client
    │   ├── App.jsx
    │   └── main.jsx
    ├── package.json
    └── .env
```
## 📊 Datasets

### 1. Final Knowledge Base
- Size: 19,352 question-answer pairs
- Categories: 15 banking categories
- Intents: 22 unique intents (ATM, CARD, LOAN, ACCOUNT, etc.)
- Source: Combination of:
  - Bitext Retail Banking Dataset (Hugging Face)
  - RetailBanking-Conversations Dataset
  - Manually curated FAQs from SBI, ICICI, HDFC, Yes Bank, Axis Bank

### 2. Retriever Training Dataset
- Size: 11,655 paraphrases
- Source: 1,665 unique FAQs from the knowledge base
- Paraphrases per FAQ:
  - 4 English paraphrases
  - 2 Hinglish paraphrases
  - Original FAQ
- Training: InfoNCE Loss + Triplet Loss with E5-base-v2
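The InfoNCE objective used above pulls a query embedding toward its matching paraphrase (positive) and away from other documents (negatives). A minimal numeric sketch of that loss (real training computes it over batches of E5-base-v2 embeddings; the temperature value here is illustrative):

```python
# Minimal numeric sketch of the InfoNCE objective:
#   loss = -log( exp(sim(q,p)/t) / sum_d exp(sim(q,d)/t) )
# where d ranges over the positive p and all negatives.

import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def info_nce(query, positive, negatives, temperature=0.05) -> float:
    sims = [cosine(query, positive)] + [cosine(query, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)  # log-sum-exp trick for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)  # negative log-softmax of the positive

q = [1.0, 0.0]
# Loss is small when the positive is close to the query...
loss_good = info_nce(q, positive=[0.9, 0.1], negatives=[[0.0, 1.0]])
# ...and large when a negative is closer than the positive.
loss_bad = info_nce(q, positive=[0.0, 1.0], negatives=[[0.9, 0.1]])
```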
### 3. Policy Network Training Dataset
- Size: 182 queries from 6 chat sessions
- Format: (state, action, reward) tuples
- Actions: FETCH (1) or NO_FETCH (0)
- Rewards: +2.0 (correct), +0.5 (needed fetch), -0.5 (incorrect)
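How (state, action, reward) tuples drive training can be illustrated with a REINFORCE-style update on a toy one-parameter logistic policy. The real policy network is BERT-based and trained with an optimizer; this sketch only shows the direction of the gradient step, not the project's training code.

```python
# Toy REINFORCE update: w += lr * reward * d/dw log p(action | state),
# for a 1-parameter policy p(FETCH | state) = sigmoid(w * state).

import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def reinforce_step(w: float, state: float, action: int,
                   reward: float, lr: float = 0.1) -> float:
    p_fetch = sigmoid(w * state)
    # Gradient of log-prob of the taken action (1=FETCH, 0=NO_FETCH):
    grad = (action - p_fetch) * state
    return w + lr * reward * grad

# A positive reward for choosing FETCH raises p(FETCH) for that state.
w = 0.0
p_before = sigmoid(w * 1.0)                              # 0.5, undecided
w = reinforce_step(w, state=1.0, action=1, reward=2.0)   # rewarded FETCH
p_after = sigmoid(w * 1.0)
```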
## 📈 Performance Metrics

Coming soon: detailed performance metrics including accuracy, response time, token cost reduction, and user satisfaction scores.
## 📖 API Documentation

### Authentication

**Register**

```http
POST /api/v1/auth/register
Content-Type: application/json

{
  "username": "john_doe",
  "email": "john@example.com",
  "password": "securepassword123"
}
```

Response:

```json
{
  "message": "User registered successfully",
  "user_id": "507f1f77bcf86cd799439011"
}
```

**Login**

```http
POST /api/v1/auth/login
Content-Type: application/json

{
  "username": "john_doe",
  "password": "securepassword123"
}
```

Response:

```json
{
  "access_token": "eyJhbGciOiJIUzI1NiIs...",
  "token_type": "bearer"
}
```

### Chat

**Send Message**

```http
POST /api/v1/chat/
Authorization: Bearer <token>
Content-Type: application/json

{
  "query": "What are the interest rates for home loans?",
  "conversation_id": "optional-session-id"
}
```

Response:

```json
{
  "response": "Current home loan interest rates range from 8.5% to 9.5% per annum...",
  "conversation_id": "abc123",
  "metadata": {
    "policy_action": "FETCH",
    "retrieval_score": 0.89,
    "documents_retrieved": 5,
    "llm_provider": "groq"
  }
}
```

**Get Conversation History**

```http
GET /api/v1/chat/conversations/{conversation_id}
Authorization: Bearer <token>
```

Response:

```json
{
  "conversation_id": "abc123",
  "messages": [
    {
      "role": "user",
      "content": "What are the interest rates?",
      "timestamp": "2025-11-28T10:30:00Z"
    },
    {
      "role": "assistant",
      "content": "Current rates are...",
      "timestamp": "2025-11-28T10:30:05Z",
      "metadata": {
        "policy_action": "FETCH"
      }
    }
  ]
}
```

**List All Conversations**

```http
GET /api/v1/chat/conversations
Authorization: Bearer <token>
```

**Delete Conversation**

```http
DELETE /api/v1/chat/conversation/{conversation_id}
Authorization: Bearer <token>
```
## 🚀 Deployment

### HuggingFace Spaces (Backend)

The backend is deployed on HuggingFace Spaces using Docker:

- Models are stored on HuggingFace Hub: `eeshanyaj/questrag_models`
- On first startup, models are automatically downloaded from HF Hub
- The Docker container runs FastAPI with uvicorn on port 7860
- Environment secrets are securely managed in HF Space settings

**Deployment Steps:**

```bash
# 1. Upload models to HuggingFace Hub
huggingface-cli upload eeshanyaj/questrag_models \
  app/models/best_policy_model.pth \
  models/best_policy_model.pth

# 2. Push backend code to HF Space
git remote add space https://huggingface.co/spaces/eeshanyaj/questrag-backend
git push space main

# 3. Add environment secrets in HF Space Settings
#    (MongoDB URI, Groq keys, JWT secret, etc.)
```

### Frontend Deployment (Vercel/Netlify)

```bash
# Build for production
npm run build

# Deploy to Vercel
vercel --prod

# Update .env.production with backend URL
VITE_API_URL=https://eeshanyaj-questrag-backend.hf.space
```
## 🤝 Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

### Development Guidelines
- Follow PEP 8 for Python code
- Use ESLint + Prettier for JavaScript/React
- Write comprehensive docstrings and comments
- Add unit tests for new features
- Update documentation accordingly
## 📄 License

MIT License - see LICENSE
## 🙏 Acknowledgments

### Research Inspiration
- Main Paper: "Optimizing Retrieval Augmented Generation for Domain-Specific Chatbots with Reinforcement Learning" (AAAI 2024)
- Additional References:
  - "Evaluating BERT-based Rewards for Question Generation with RL"
  - "Self-Reasoning for Retrieval-Augmented Language Models"

### Open Source Resources

**Datasets**

**Technologies**
## 📬 Contact

**Eeshanya Amit Joshi**
- 📧 Email
- 💼 LinkedIn
## 📊 Status

✅ **Backend Deployed & Live!**
- 🚀 Backend API running on HuggingFace Spaces
- 📖 API Documentation available at /docs
- 💚 Health status: Check here

🚧 **Frontend Deployment - Coming Soon!**
- Will be deployed on Vercel/Netlify
- Stay tuned for the full application link! ❤️
## 🔗 Links

- Live Backend API: https://eeshanyaj-questrag-backend.hf.space
- API Documentation: https://eeshanyaj-questrag-backend.hf.space/docs
- Health Check: https://eeshanyaj-questrag-backend.hf.space/health
- HuggingFace Space: https://huggingface.co/spaces/eeshanyaj/questrag-backend
- Model Repository: https://huggingface.co/eeshanyaj/questrag_models
- Research Paper: AAAI 2024 Workshop

✨ Made with ❤️ for the Banking Industry ✨

Powered by HuggingFace 🤗 | Groq ⚡ | MongoDB 🍃 | Docker 🐳