---
title: QUESTRAG Backend
emoji: 🏦
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---

🏦 QUESTRAG - Banking QUEries and Support system via Trained Reinforced RAG

Python 3.12 FastAPI React License: MIT Deployed on HuggingFace

An intelligent banking chatbot powered by Retrieval-Augmented Generation (RAG) and Reinforcement Learning (RL) to provide accurate, context-aware responses to Indian banking queries while optimizing token costs.




🎯 Overview

QUESTRAG is an advanced banking chatbot designed to revolutionize customer support in the Indian banking sector. By combining Retrieval-Augmented Generation (RAG) with Reinforcement Learning (RL), the system intelligently decides when to fetch external context from a knowledge base and when to respond directly, reducing token costs by up to 31% while maintaining high accuracy.

Problem Statement

Existing banking chatbots suffer from:

  • ❌ Limited response flexibility (rigid, rule-based systems)
  • ❌ Poor handling of informal/real-world queries
  • ❌ Lack of contextual understanding
  • ❌ High operational costs due to inefficient token usage
  • ❌ Low user satisfaction and trust

Solution

QUESTRAG addresses these challenges through:

  • ✅ Domain-specific RAG trained on 19,000+ banking query and support records
  • ✅ RL-optimized policy network (BERT-based) for smart context-fetching decisions
  • ✅ Fine-tuned retriever model (E5-base-v2) using InfoNCE + Triplet Loss
  • ✅ Groq LLM with HuggingFace fallback for reliable, fast responses
  • ✅ Full-stack web application with modern UI/UX and JWT authentication

🌟 Key Features

🤖 Intelligent RAG Pipeline

  • FAISS-powered retrieval for fast similarity search across 19,352 documents
  • Fine-tuned embedding model (e5-base-v2) trained on English + Hinglish paraphrases
  • Context-aware response generation using Llama 3 models (8B & 70B) via Groq
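For intuition, the inner-product top-k search that FAISS runs over the embedded documents can be sketched as a brute-force NumPy equivalent (the toy vectors and dimensionality below are illustrative only, not the real index):

```python
import numpy as np

def top_k_search(query_vec, doc_matrix, k=5):
    """Brute-force stand-in for the FAISS inner-product search:
    score every document embedding against the query and return
    the k best (index, score) pairs."""
    # E5 embeddings are L2-normalised, so inner product == cosine similarity
    scores = doc_matrix @ query_vec
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]

# Toy corpus of four "document" embeddings (dimension 3 for illustration)
docs = np.array([[1, 0, 0], [0, 1, 0], [0.9, 0.1, 0], [0, 0, 1]], dtype=np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = np.array([1.0, 0.0, 0.0], dtype=np.float32)

hits = top_k_search(query, docs, k=2)   # best two documents with their scores
```

FAISS replaces the full matrix scan with an index structure, which is what makes the search fast at the scale of the full knowledge base.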

🧠 Reinforcement Learning System

  • BERT-based policy network (bert-base-uncased) for FETCH/NO_FETCH decisions
  • Reward-driven optimization (+2.0 accurate, +0.5 needed fetch, -0.5 incorrect)
  • 31% token cost reduction via optimized retrieval
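A sketch of that reward scheme as code; exactly how the three signals combine in the actual training loop is our assumption:

```python
FETCH, NO_FETCH = 1, 0

def reward(action, answer_correct, fetch_was_needed):
    """Reward shaping from the bullet above: +2.0 for an accurate
    answer, -0.5 for an incorrect one, plus +0.5 when the policy
    fetched context that was genuinely needed (the combination
    rule here is illustrative)."""
    r = 2.0 if answer_correct else -0.5
    if action == FETCH and fetch_was_needed:
        r += 0.5
    return r
```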

🎨 Modern Web Interface

  • React 18 + Vite with Tailwind CSS
  • Real-time chat, conversation history, JWT authentication
  • Responsive design for desktop and mobile

πŸ” Enterprise-Ready Backend

  • FastAPI + MongoDB Atlas for scalable async operations
  • JWT authentication with secure password hashing (bcrypt)
  • Multi-provider LLM (Groq → HuggingFace automatic fallback)
  • Deployed on HuggingFace Spaces with Docker containerization
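The Groq → HuggingFace fallback can be sketched as a provider loop; the function name and error handling details here are illustrative, not the actual llm_manager.py code:

```python
def generate_with_fallback(prompt, providers):
    """Try each (name, call) pair in order, Groq first, then the
    HuggingFace fallback, and return the first completion that
    succeeds along with the provider that produced it."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:           # rate limit, timeout, outage, ...
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all LLM providers failed: {errors}")

def flaky_groq(prompt):
    raise TimeoutError("rate limited")     # simulate a Groq outage

provider_name, text = generate_with_fallback(
    "hello", [("groq", flaky_groq), ("huggingface", lambda p: f"echo: {p}")]
)
```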

πŸ—οΈ System Architecture

System Architecture Diagram

🔄 Workflow

  1. User Query → FastAPI receives the query via REST API
  2. Policy Decision → BERT-based RL model decides FETCH or NO_FETCH
  3. Conditional Retrieval → if FETCH, retrieve top-5 docs from FAISS using E5-base-v2
  4. Response Generation → Llama 3 (via Groq) generates the final answer
  5. Evaluation & Logging → logged in MongoDB + reward-based model update
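The steps above can be sketched end to end with the ML components stubbed out (CONFIDENCE_THRESHOLD and TOP_K come from the Configuration section; the stub implementations passed in are assumptions):

```python
CONFIDENCE_THRESHOLD = 0.7   # policy decision confidence (see Configuration)
TOP_K = 5                    # documents retrieved on a FETCH

def answer(query, policy, retriever, llm):
    """Steps 2-4 of the workflow. `policy`, `retriever` and `llm`
    are stand-ins for the BERT policy network, the E5/FAISS
    retriever and the Groq client."""
    p_fetch = policy(query)                      # step 2: FETCH probability
    context = retriever(query, TOP_K) if p_fetch >= CONFIDENCE_THRESHOLD else []
    prompt = "\n".join(context + [query])        # step 4: grounded prompt
    return llm(prompt), ("FETCH" if context else "NO_FETCH")

reply, action = answer(
    "What is the ATM withdrawal limit?",
    policy=lambda q: 0.9,                        # confident FETCH
    retriever=lambda q, k: ["Daily ATM limit is Rs. 25,000."],
    llm=lambda p: p.splitlines()[0],
)
```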

🔄 Sequence Diagram

Sequence Diagram


🛠️ Technology Stack

Frontend

  • ⚛️ React 18.3.1 + Vite 5.4.2
  • 🎨 Tailwind CSS 3.4.1
  • 🔄 React Context API + Axios + React Router DOM

Backend

  • 🚀 FastAPI 0.104.1
  • 🗄️ MongoDB Atlas + Motor (async driver)
  • 🔑 JWT Auth + Passlib (bcrypt)
  • 🤖 PyTorch 2.9.1, Transformers 4.57, FAISS 1.13.0
  • 💬 Groq (Llama 3.1 8B Instant / Llama 3.3 70B Versatile)
  • 🎯 Sentence Transformers 5.1.2

Machine Learning

  • 🧠 Policy Network: BERT-base-uncased (trained with RL)
  • 🔍 Retriever: E5-base-v2 (fine-tuned with InfoNCE + Triplet Loss)
  • 📊 Vector Store: FAISS (19,352 documents)

Deployment

  • 🐳 Docker (HuggingFace Spaces)
  • 🤗 HuggingFace Hub (model storage)
  • ☁️ MongoDB Atlas (cloud database)
  • 🌐 Python 3.12 + uvicorn

βš™οΈ Installation

🧩 Prerequisites

  • Python 3.12+
  • Node.js 18+
  • MongoDB Atlas account (or local MongoDB 6.0+)
  • Groq API key (or HuggingFace token)

🔧 Backend Setup (Local Development)

# Navigate to backend
cd backend

# Create virtual environment
python -m venv venv

# Activate it
source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt

# Create environment file
cp .env.example .env
# Edit .env with your credentials (see Configuration section)

# Build FAISS index (one-time setup)
python build_faiss_index.py

# Start backend server
uvicorn app.main:app --reload --port 8000

💻 Frontend Setup

# Navigate to frontend
cd frontend

# Install dependencies
npm install

# Create environment file
cp .env.example .env
# Update VITE_API_URL to point to your backend

# Start dev server
npm run dev

βš™οΈ Configuration

πŸ”‘ Backend .env (Key Parameters)

| Category | Key | Example / Description |
|---|---|---|
| Environment | ENVIRONMENT | development or production |
| MongoDB | MONGODB_URI | mongodb+srv://user:pass@cluster.mongodb.net/ |
| Authentication | SECRET_KEY | Generate with python -c "import secrets; print(secrets.token_urlsafe(32))" |
| | ALGORITHM | HS256 |
| | ACCESS_TOKEN_EXPIRE_MINUTES | 1440 (24 hours) |
| Groq API | GROQ_API_KEY_1 | Your primary Groq API key |
| | GROQ_API_KEY_2 | Secondary key (optional) |
| | GROQ_API_KEY_3 | Tertiary key (optional) |
| | GROQ_CHAT_MODEL | llama-3.1-8b-instant |
| | GROQ_EVAL_MODEL | llama-3.3-70b-versatile |
| HuggingFace | HF_TOKEN_1 | HuggingFace token (fallback LLM) |
| | HF_MODEL_REPO | eeshanyaj/questrag_models (for model download) |
| Model Paths | POLICY_MODEL_PATH | app/models/best_policy_model.pth |
| | RETRIEVER_MODEL_PATH | app/models/best_retriever_model.pth |
| | FAISS_INDEX_PATH | app/models/faiss_index.pkl |
| | KB_PATH | app/data/final_knowledge_base.jsonl |
| Device | DEVICE | cpu or cuda |
| RAG Params | TOP_K | 5 (number of documents to retrieve) |
| | SIMILARITY_THRESHOLD | 0.5 (minimum similarity score) |
| Policy Network | CONFIDENCE_THRESHOLD | 0.7 (policy decision confidence) |
| CORS | ALLOWED_ORIGINS | http://localhost:5173 or * |
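The backend's app/config.py reads these values from the environment. A minimal stdlib sketch of that pattern, covering a subset of the keys above (the real app may well use pydantic settings instead; this is an assumption):

```python
import os
from dataclasses import dataclass, field

@dataclass
class Settings:
    """Subset of the .env keys above, with their documented defaults."""
    mongodb_uri: str = field(default_factory=lambda: os.getenv("MONGODB_URI", ""))
    device: str = field(default_factory=lambda: os.getenv("DEVICE", "cpu"))
    top_k: int = field(default_factory=lambda: int(os.getenv("TOP_K", "5")))
    similarity_threshold: float = field(
        default_factory=lambda: float(os.getenv("SIMILARITY_THRESHOLD", "0.5")))
    confidence_threshold: float = field(
        default_factory=lambda: float(os.getenv("CONFIDENCE_THRESHOLD", "0.7")))

settings = Settings()
```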

🌐 Frontend .env

# Local development
VITE_API_URL=http://localhost:8000

# Production (HuggingFace Spaces)
VITE_API_URL=https://eeshanyaj-questrag-backend.hf.space

🚀 Usage

🖥️ Local Development

Start Backend Server

cd backend
source venv/bin/activate  # or venv\Scripts\activate
uvicorn app.main:app --reload --port 8000

Start Frontend Dev Server

cd frontend
npm run dev

🌐 Production (HuggingFace Spaces)

Backend API: https://eeshanyaj-questrag-backend.hf.space

Frontend (Coming Soon):

  • Will be deployed on Vercel/Netlify

πŸ“ Project Structure

questrag/
│
├── backend/
│   ├── app/
│   │   ├── api/v1/
│   │   │   ├── auth.py              # Auth endpoints (register, login)
│   │   │   └── chat.py              # Chat endpoints
│   │   ├── core/
│   │   │   ├── llm_manager.py       # Groq + HF LLM orchestration
│   │   │   └── security.py          # JWT & password hashing
│   │   ├── ml/
│   │   │   ├── policy_network.py    # RL policy model (BERT)
│   │   │   └── retriever.py         # E5-base-v2 retriever
│   │   ├── db/
│   │   │   ├── mongodb.py           # MongoDB connection
│   │   │   └── repositories/        # User & conversation repos
│   │   ├── services/
│   │   │   └── chat_service.py      # Orchestration logic
│   │   ├── models/
│   │   │   ├── best_policy_model.pth      # Trained policy network
│   │   │   ├── best_retriever_model.pth   # Fine-tuned retriever
│   │   │   └── faiss_index.pkl            # FAISS vector store
│   │   ├── data/
│   │   │   └── final_knowledge_base.jsonl # 19,352 Q&A pairs
│   │   ├── config.py                # Settings & env vars
│   │   └── main.py                  # FastAPI app entry point
│   ├── Dockerfile                   # Docker config for HF Spaces
│   ├── requirements.txt
│   └── .env.example
│
└── frontend/
    ├── src/
    │   ├── components/              # UI components
    │   ├── context/                 # Auth context
    │   ├── pages/                   # Login, Register, Chat
    │   ├── services/api.js          # Axios client
    │   ├── App.jsx
    │   └── main.jsx
    ├── package.json
    └── .env

📊 Datasets

1. Final Knowledge Base

  • Size: 19,352 question-answer pairs
  • Categories: 15 banking categories
  • Intents: 22 unique intents (ATM, CARD, LOAN, ACCOUNT, etc.)
  • Source: Combination of:
    • Bitext Retail Banking Dataset (Hugging Face)
    • RetailBanking-Conversations Dataset
    • Manually curated FAQs from SBI, ICICI, HDFC, Yes Bank, Axis Bank

2. Retriever Training Dataset

  • Size: 11,655 paraphrases
  • Source: 1,665 unique FAQs from knowledge base
  • Paraphrases per FAQ:
    • 4 English paraphrases
    • 2 Hinglish paraphrases
    • Original FAQ
  • Training: InfoNCE Loss + Triplet Loss with E5-base-v2
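A minimal NumPy sketch of the two training objectives (the temperature and margin values are illustrative, not the project's actual hyperparameters; real training operates on E5 embedding batches):

```python
import numpy as np

def info_nce(q, pos, negs, tau=0.05):
    """InfoNCE: softmax cross-entropy of the positive's similarity
    against the negatives' at temperature tau."""
    sims = np.array([q @ pos] + [q @ n for n in negs]) / tau
    sims -= sims.max()                        # numerical stability
    return float(-np.log(np.exp(sims[0]) / np.exp(sims).sum()))

def triplet(q, pos, neg, margin=0.3):
    """Triplet loss: require the positive to score at least
    `margin` higher than the negative."""
    return max(0.0, float(q @ neg - q @ pos) + margin)

q   = np.array([1.0, 0.0])          # anchor query embedding
pos = np.array([1.0, 0.0])          # e.g. a Hinglish paraphrase of q
neg = np.array([0.0, 1.0])          # unrelated FAQ
```

Both losses pull paraphrases toward their source FAQ while pushing unrelated FAQs away, which is what makes informal and Hinglish phrasings retrievable.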

3. Policy Network Training Dataset

  • Size: 182 queries from 6 chat sessions
  • Format: (state, action, reward) tuples
  • Actions: FETCH (1) or NO_FETCH (0)
  • Rewards: +2.0 (correct), +0.5 (needed fetch), -0.5 (incorrect)
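How such (state, action, reward) tuples drive a policy update can be illustrated with a one-step REINFORCE sketch, using a logistic policy as a stand-in for the BERT policy network:

```python
import numpy as np

def reinforce_step(theta, x, action, reward, lr=0.01):
    """One REINFORCE update from a single (state, action, reward)
    tuple, for a logistic policy pi(FETCH|x) = sigmoid(theta . x)."""
    p_fetch = 1.0 / (1.0 + np.exp(-theta @ x))
    # gradient of log pi(action|x) for a Bernoulli policy
    grad_logp = (action - p_fetch) * x
    return theta + lr * reward * grad_logp

theta = np.zeros(2)
x = np.array([1.0, 1.0])                                 # toy query features
theta = reinforce_step(theta, x, action=1, reward=2.0)   # rewarded FETCH
```

A positively rewarded FETCH nudges the policy toward fetching on similar queries; a negative reward nudges it the other way.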

📈 Performance Metrics

Coming soon: Detailed performance metrics including accuracy, response time, token cost reduction, and user satisfaction scores.


📚 API Documentation

Authentication

Register

POST /api/v1/auth/register
Content-Type: application/json

{
  "username": "john_doe",
  "email": "john@example.com",
  "password": "securepassword123"
}

Response:

{
  "message": "User registered successfully",
  "user_id": "507f1f77bcf86cd799439011"
}

Login

POST /api/v1/auth/login
Content-Type: application/json

{
  "username": "john_doe",
  "password": "securepassword123"
}

Response:

{
  "access_token": "eyJhbGciOiJIUzI1NiIs...",
  "token_type": "bearer"
}

Chat

Send Message

POST /api/v1/chat/
Authorization: Bearer <token>
Content-Type: application/json

{
  "query": "What are the interest rates for home loans?",
  "conversation_id": "optional-session-id"
}

Response:

{
  "response": "Current home loan interest rates range from 8.5% to 9.5% per annum...",
  "conversation_id": "abc123",
  "metadata": {
    "policy_action": "FETCH",
    "retrieval_score": 0.89,
    "documents_retrieved": 5,
    "llm_provider": "groq"
  }
}

Get Conversation History

GET /api/v1/chat/conversations/{conversation_id}
Authorization: Bearer <token>

Response:

{
  "conversation_id": "abc123",
  "messages": [
    {
      "role": "user",
      "content": "What are the interest rates?",
      "timestamp": "2025-11-28T10:30:00Z"
    },
    {
      "role": "assistant",
      "content": "Current rates are...",
      "timestamp": "2025-11-28T10:30:05Z",
      "metadata": {
        "policy_action": "FETCH"
      }
    }
  ]
}

List All Conversations

GET /api/v1/chat/conversations
Authorization: Bearer <token>

Delete Conversation

DELETE /api/v1/chat/conversation/{conversation_id}
Authorization: Bearer <token>
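From the client side, a chat request can be assembled as follows; the helper name is ours, and the resulting dict can be sent with any HTTP client:

```python
import json

API_BASE = "https://eeshanyaj-questrag-backend.hf.space"  # deployed backend

def build_chat_request(token, query, conversation_id=None):
    """Assemble the POST /api/v1/chat/ call shown above: bearer-auth
    header plus the JSON body. Omitting `conversation_id` starts a
    new conversation."""
    body = {"query": query}
    if conversation_id:
        body["conversation_id"] = conversation_id
    return {
        "url": f"{API_BASE}/api/v1/chat/",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "data": json.dumps(body),
    }

req = build_chat_request("eyJhbGciOi...", "What are the interest rates for home loans?")
```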

🚀 Deployment

HuggingFace Spaces (Backend)

The backend is deployed on HuggingFace Spaces using Docker:

  1. Models are stored on HuggingFace Hub: eeshanyaj/questrag_models
  2. On first startup, models are automatically downloaded from HF Hub
  3. Docker container runs FastAPI with uvicorn on port 7860
  4. Environment secrets are securely managed in HF Space settings
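Step 2's download-once behaviour can be sketched like this; `ensure_model` is our name for the pattern, and the real startup code presumably calls hf_hub_download against eeshanyaj/questrag_models:

```python
from pathlib import Path

def ensure_model(local_path, download):
    """Download-once startup logic: skip the Hub download when the
    file is already present (e.g. a warm container restart).
    `download` stands in for the actual Hub download call."""
    path = Path(local_path)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        download(path)
    return path

# Demonstrate with a fake downloader that records being called
import tempfile
calls = []
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "models" / "best_policy_model.pth"
    ensure_model(target, lambda p: (calls.append(1), p.write_bytes(b"w")))
    ensure_model(target, lambda p: calls.append(2))   # already cached, skipped
```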

Deployment Steps:

# 1. Upload models to HuggingFace Hub
huggingface-cli upload eeshanyaj/questrag_models \
  app/models/best_policy_model.pth \
  models/best_policy_model.pth

# 2. Push backend code to HF Space
git remote add space https://huggingface.co/spaces/eeshanyaj/questrag-backend
git push space main

# 3. Add environment secrets in HF Space Settings
# (MongoDB URI, Groq keys, JWT secret, etc.)

Frontend Deployment (Vercel/Netlify)

# Build for production
npm run build

# Deploy to Vercel
vercel --prod

# Update .env.production with backend URL
VITE_API_URL=https://eeshanyaj-questrag-backend.hf.space

🤝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Guidelines

  • Follow PEP 8 for Python code
  • Use ESLint + Prettier for JavaScript/React
  • Write comprehensive docstrings and comments
  • Add unit tests for new features
  • Update documentation accordingly

📄 License

MIT License, see LICENSE


πŸ™ Acknowledgments

Research Inspiration

  • Main Paper: "Optimizing Retrieval Augmented Generation for Domain-Specific Chatbots with Reinforcement Learning" (AAAI 2024)
  • Additional References:
    • "Evaluating BERT-based Rewards for Question Generation with RL"
    • "Self-Reasoning for Retrieval-Augmented Language Models"

Open Source Resources

Datasets

Technologies


📞 Contact

Eeshanya Amit Joshi
📧 Email
💼 LinkedIn


📈 Status

✅ Backend Deployed & Live!

🚧 Frontend Deployment - Coming Soon!

  • Will be deployed on Vercel/Netlify
  • Stay tuned for full application link! ❀️

🔗 Links


✨ Made with ❤️ for the Banking Industry ✨

Powered by HuggingFace 🤗 | Groq ⚡ | MongoDB 🍃 | Docker 🐳