---
title: QUESTRAG Backend
emoji: 🏦
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---
# 🏦 QUESTRAG - Banking QUEries and Support system via Trained Reinforced RAG
[Python](https://www.python.org/downloads/) · [FastAPI](https://fastapi.tiangolo.com/) · [React](https://reactjs.org/) · [MIT License](https://opensource.org/licenses/MIT) · [Live on HF Spaces](https://huggingface.co/spaces/eeshanyaj/questrag-backend)
> An intelligent banking chatbot powered by **Retrieval-Augmented Generation (RAG)** and **Reinforcement Learning (RL)** to provide accurate, context-aware responses to Indian banking queries while optimizing token costs.
---
## 📋 Table of Contents
- [Overview](#overview)
- [Key Features](#key-features)
- [System Architecture](#system-architecture)
- [Technology Stack](#technology-stack)
- [Installation](#installation)
- [Configuration](#configuration)
- [Usage](#usage)
- [Project Structure](#project-structure)
- [Datasets](#datasets)
- [Performance Metrics](#performance-metrics)
- [API Documentation](#api-documentation)
- [Deployment](#deployment)
- [Contributing](#contributing)
- [License](#license)
- [Acknowledgments](#acknowledgments)
- [Contact](#contact)
- [Status](#status)
- [Links](#links)
---
## 🎯 Overview
QUESTRAG is an **advanced banking chatbot** designed to revolutionize customer support in the Indian banking sector. By combining **Retrieval-Augmented Generation (RAG)** with **Reinforcement Learning (RL)**, the system intelligently decides when to fetch external context from a knowledge base and when to respond directly, **reducing token costs by up to 31%** while maintaining high accuracy.
### Problem Statement
Existing banking chatbots suffer from:
- ❌ Limited response flexibility (rigid, rule-based systems)
- ❌ Poor handling of informal/real-world queries
- ❌ Lack of contextual understanding
- ❌ High operational costs due to inefficient token usage
- ❌ Low user satisfaction and trust
### Solution
QUESTRAG addresses these challenges through:
- ✅ **Domain-specific RAG** trained on 19,352 banking Q&A pairs and support conversations
- ✅ **RL-optimized policy network** (BERT-based) for smart context-fetching decisions
- ✅ **Fine-tuned retriever model** (E5-base-v2) using InfoNCE + Triplet Loss
- ✅ **Groq LLM with HuggingFace fallback** for reliable, fast responses
- ✅ **Full-stack web application** with modern UI/UX and JWT authentication
---
## 🌟 Key Features
### 🤖 Intelligent RAG Pipeline
- **FAISS-powered retrieval** for fast similarity search across 19,352 documents
- **Fine-tuned embedding model** (`e5-base-v2`) trained on English + Hinglish paraphrases
- **Context-aware response generation** using Llama 3 models (8B & 70B) via Groq
### 🧠 Reinforcement Learning System
- **BERT-based policy network** (`bert-base-uncased`) for FETCH/NO_FETCH decisions
- **Reward-driven optimization** (+2.0 accurate, +0.5 needed fetch, -0.5 incorrect)
- **31% token cost reduction** via optimized retrieval
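One plausible reading of the reward scheme above, sketched as a plain function (the function name and exact branching are illustrative, not taken from the QUESTRAG codebase):

```python
def policy_reward(answer_correct: bool, fetch_was_needed: bool) -> float:
    """Scalar reward for a FETCH/NO_FETCH decision, per the scheme above:
    +2.0 accurate answer, +0.5 when a fetch was genuinely needed, -0.5 otherwise."""
    if answer_correct:
        return 2.0
    if fetch_was_needed:
        return 0.5
    return -0.5

print(policy_reward(answer_correct=True, fetch_was_needed=False))   # 2.0
print(policy_reward(answer_correct=False, fetch_was_needed=True))   # 0.5
print(policy_reward(answer_correct=False, fetch_was_needed=False))  # -0.5
```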
### 🎨 Modern Web Interface
- **React 18 + Vite** with Tailwind CSS
- **Real-time chat**, conversation history, JWT authentication
- **Responsive design** for desktop and mobile
### 🔐 Enterprise-Ready Backend
- **FastAPI + MongoDB Atlas** for scalable async operations
- **JWT authentication** with secure password hashing (bcrypt)
- **Multi-provider LLM** (Groq → HuggingFace automatic fallback)
- **Deployed on HuggingFace Spaces** with Docker containerization
---
## 🏗️ System Architecture
### 🔄 Workflow
1. **User Query** → FastAPI receives query via REST API
2. **Policy Decision** → BERT-based RL model decides FETCH or NO_FETCH
3. **Conditional Retrieval** → If FETCH → Retrieve top-5 docs from FAISS using E5-base-v2
4. **Response Generation** → Llama 3 (via Groq) generates final answer
5. **Evaluation & Logging** → Logged in MongoDB + reward-based model update
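The five steps above can be sketched as a single request handler. All names here (`handle_query`, the `policy`/`retriever`/`llm` callables) are illustrative stand-ins for the actual QUESTRAG modules, wired together under the `TOP_K` and `CONFIDENCE_THRESHOLD` settings from the configuration section:

```python
TOP_K = 5                    # documents to retrieve on FETCH
CONFIDENCE_THRESHOLD = 0.7   # minimum P(FETCH) to trigger retrieval

def handle_query(query: str, policy, retriever, llm, log: list) -> str:
    # 2. Policy decision: BERT-based model scores the query
    fetch_prob = policy(query)  # P(FETCH | query)
    action = "FETCH" if fetch_prob >= CONFIDENCE_THRESHOLD else "NO_FETCH"

    # 3. Conditional retrieval: hit FAISS only when the policy says so
    context = retriever(query, TOP_K) if action == "FETCH" else []

    # 4. Response generation with (optional) retrieved context
    answer = llm(query, context)

    # 5. Log the turn for later reward-based policy updates
    log.append({"query": query, "action": action, "answer": answer})
    return answer
```

Skipping retrieval on high-confidence NO_FETCH turns is where the token savings come from: no retrieved documents means a much shorter LLM prompt.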
---
## 🛠️ Technology Stack
### **Frontend**
- ⚛️ React 18.3.1 + Vite 5.4.2
- 🎨 Tailwind CSS 3.4.1
- 🔄 React Context API + Axios + React Router DOM
### **Backend**
- 🚀 FastAPI 0.104.1
- 🗄️ MongoDB Atlas + Motor (async driver)
- 🔑 JWT Auth + Passlib (bcrypt)
- 🤖 PyTorch 2.9.1, Transformers 4.57, FAISS 1.13.0
- 💬 Groq (Llama 3.1 8B Instant / Llama 3.3 70B Versatile)
- 🎯 Sentence Transformers 5.1.2
### **Machine Learning**
- 🧠 **Policy Network**: BERT-base-uncased (trained with RL)
- 🔍 **Retriever**: E5-base-v2 (fine-tuned with InfoNCE + Triplet Loss)
- 📊 **Vector Store**: FAISS (19,352 documents)
### **Deployment**
- 🐳 Docker (HuggingFace Spaces)
- 🤗 HuggingFace Hub (model storage)
- ☁️ MongoDB Atlas (cloud database)
- 🌐 Python 3.12 + uvicorn
---
## ⚙️ Installation
### 🧩 Prerequisites
- Python 3.12+
- Node.js 18+
- MongoDB Atlas account (or local MongoDB 6.0+)
- Groq API key (or HuggingFace token)
### 🔧 Backend Setup (Local Development)
```bash
# Navigate to backend
cd backend
# Create virtual environment
python -m venv venv
# Activate it
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Create environment file
cp .env.example .env
# Edit .env with your credentials (see Configuration section)
# Build FAISS index (one-time setup)
python build_faiss_index.py
# Start backend server
uvicorn app.main:app --reload --port 8000
```
### 💻 Frontend Setup
```bash
# Navigate to frontend
cd frontend
# Install dependencies
npm install
# Create environment file
cp .env.example .env
# Update VITE_API_URL to point to your backend
# Start dev server
npm run dev
```
---
## ⚙️ Configuration
### 🔑 Backend `.env` (Key Parameters)
| **Category** | **Key** | **Example / Description** |
|-------------------|----------------------------------|--------------------------------------------------|
| Environment | `ENVIRONMENT` | `development` or `production` |
| MongoDB | `MONGODB_URI` | `mongodb+srv://user:pass@cluster.mongodb.net/` |
| Authentication | `SECRET_KEY` | Generate with `python -c "import secrets; print(secrets.token_urlsafe(32))"` |
| | `ALGORITHM` | `HS256` |
| | `ACCESS_TOKEN_EXPIRE_MINUTES` | `1440` (24 hours) |
| Groq API | `GROQ_API_KEY_1` | Your primary Groq API key |
| | `GROQ_API_KEY_2` | Secondary key (optional) |
| | `GROQ_API_KEY_3` | Tertiary key (optional) |
| | `GROQ_CHAT_MODEL` | `llama-3.1-8b-instant` |
| | `GROQ_EVAL_MODEL` | `llama-3.3-70b-versatile` |
| HuggingFace | `HF_TOKEN_1` | HuggingFace token (fallback LLM) |
| | `HF_MODEL_REPO` | `eeshanyaj/questrag_models` (for model download) |
| Model Paths | `POLICY_MODEL_PATH` | `app/models/best_policy_model.pth` |
| | `RETRIEVER_MODEL_PATH` | `app/models/best_retriever_model.pth` |
| | `FAISS_INDEX_PATH` | `app/models/faiss_index.pkl` |
| | `KB_PATH` | `app/data/final_knowledge_base.jsonl` |
| Device | `DEVICE` | `cpu` or `cuda` |
| RAG Params | `TOP_K` | `5` (number of documents to retrieve) |
| | `SIMILARITY_THRESHOLD` | `0.5` (minimum similarity score) |
| Policy Network | `CONFIDENCE_THRESHOLD` | `0.7` (policy decision confidence) |
| CORS | `ALLOWED_ORIGINS` | `http://localhost:5173` or `*` |
### 🌐 Frontend `.env`
```bash
# Local development
VITE_API_URL=http://localhost:8000
# Production (HuggingFace Spaces)
VITE_API_URL=https://eeshanyaj-questrag-backend.hf.space
```
---
## 🚀 Usage
### 🖥️ Local Development
#### Start Backend Server
```bash
cd backend
source venv/bin/activate # or venv\Scripts\activate
uvicorn app.main:app --reload --port 8000
```
- **Backend**: http://localhost:8000
- **API Docs**: http://localhost:8000/docs
- **Health Check**: http://localhost:8000/health
#### Start Frontend Dev Server
```bash
cd frontend
npm run dev
```
- **Frontend**: http://localhost:5173
### 🌐 Production (HuggingFace Spaces)
**Backend API**:
- **Base URL**: https://eeshanyaj-questrag-backend.hf.space
- **API Docs**: https://eeshanyaj-questrag-backend.hf.space/docs
- **Health Check**: https://eeshanyaj-questrag-backend.hf.space/health
**Frontend** (Coming Soon):
- Will be deployed on Vercel/Netlify
---
## 📁 Project Structure
```
questrag/
│
├── backend/
│   ├── app/
│   │   ├── api/v1/
│   │   │   ├── auth.py                    # Auth endpoints (register, login)
│   │   │   └── chat.py                    # Chat endpoints
│   │   ├── core/
│   │   │   ├── llm_manager.py             # Groq + HF LLM orchestration
│   │   │   └── security.py                # JWT & password hashing
│   │   ├── ml/
│   │   │   ├── policy_network.py          # RL policy model (BERT)
│   │   │   └── retriever.py               # E5-base-v2 retriever
│   │   ├── db/
│   │   │   ├── mongodb.py                 # MongoDB connection
│   │   │   └── repositories/              # User & conversation repos
│   │   ├── services/
│   │   │   └── chat_service.py            # Orchestration logic
│   │   ├── models/
│   │   │   ├── best_policy_model.pth      # Trained policy network
│   │   │   ├── best_retriever_model.pth   # Fine-tuned retriever
│   │   │   └── faiss_index.pkl            # FAISS vector store
│   │   ├── data/
│   │   │   └── final_knowledge_base.jsonl # 19,352 Q&A pairs
│   │   ├── config.py                      # Settings & env vars
│   │   └── main.py                        # FastAPI app entry point
│   ├── Dockerfile                         # Docker config for HF Spaces
│   ├── requirements.txt
│   └── .env.example
│
└── frontend/
    ├── src/
    │   ├── components/                    # UI components
    │   ├── context/                       # Auth context
    │   ├── pages/                         # Login, Register, Chat
    │   ├── services/api.js                # Axios client
    │   ├── App.jsx
    │   └── main.jsx
    ├── package.json
    └── .env
```
---
## 📊 Datasets
### 1. Final Knowledge Base
- **Size**: 19,352 question-answer pairs
- **Categories**: 15 banking categories
- **Intents**: 22 unique intents (ATM, CARD, LOAN, ACCOUNT, etc.)
- **Source**: Combination of:
- Bitext Retail Banking Dataset (Hugging Face)
- RetailBanking-Conversations Dataset
- Manually curated FAQs from SBI, ICICI, HDFC, Yes Bank, Axis Bank
### 2. Retriever Training Dataset
- **Size**: 11,655 paraphrases
- **Source**: 1,665 unique FAQs from knowledge base
- **Paraphrases per FAQ**:
- 4 English paraphrases
- 2 Hinglish paraphrases
- Original FAQ
- **Training**: InfoNCE Loss + Triplet Loss with E5-base-v2
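To make the two training objectives concrete, here is the bare arithmetic of both losses on raw similarity/distance scores. Real training applies these to E5-base-v2 embedding batches in PyTorch; the temperature and margin values below are illustrative defaults, not the project's actual hyperparameters:

```python
import math

def info_nce(pos_sim: float, neg_sims: list[float], temperature: float = 0.05) -> float:
    """InfoNCE: -log( exp(s_pos/t) / sum_i exp(s_i/t) ) over the positive
    and all in-batch negatives. Lower when the positive outscores negatives."""
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    log_denom = math.log(sum(math.exp(x) for x in logits))
    return -(pos_sim / temperature - log_denom)

def triplet_loss(d_anchor_pos: float, d_anchor_neg: float, margin: float = 0.3) -> float:
    """Triplet loss: max(0, d(a,p) - d(a,n) + margin) pushes the positive
    paraphrase closer to the anchor FAQ than any negative, by at least `margin`."""
    return max(0.0, d_anchor_pos - d_anchor_neg + margin)
```

In this setup the anchor is the original FAQ, positives are its English/Hinglish paraphrases, and negatives are unrelated FAQs from the same batch.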
### 3. Policy Network Training Dataset
- **Size**: 182 queries from 6 chat sessions
- **Format**: (state, action, reward) tuples
- **Actions**: FETCH (1) or NO_FETCH (0)
- **Rewards**: +2.0 (correct), +0.5 (needed fetch), -0.5 (incorrect)
---
## 📈 Performance Metrics
*Coming soon: Detailed performance metrics including accuracy, response time, token cost reduction, and user satisfaction scores.*
---
## 📚 API Documentation
### Authentication
#### Register
```http
POST /api/v1/auth/register
Content-Type: application/json
{
"username": "john_doe",
"email": "john@example.com",
"password": "securepassword123"
}
```
**Response:**
```json
{
"message": "User registered successfully",
"user_id": "507f1f77bcf86cd799439011"
}
```
#### Login
```http
POST /api/v1/auth/login
Content-Type: application/json
{
"username": "john_doe",
"password": "securepassword123"
}
```
**Response:**
```json
{
"access_token": "eyJhbGciOiJIUzI1NiIs...",
"token_type": "bearer"
}
```
---
### Chat
#### Send Message
```http
POST /api/v1/chat/
Authorization: Bearer <access_token>
Content-Type: application/json
{
"query": "What are the interest rates for home loans?",
"conversation_id": "optional-session-id"
}
```
**Response:**
```json
{
"response": "Current home loan interest rates range from 8.5% to 9.5% per annum...",
"conversation_id": "abc123",
"metadata": {
"policy_action": "FETCH",
"retrieval_score": 0.89,
"documents_retrieved": 5,
"llm_provider": "groq"
}
}
```
#### Get Conversation History
```http
GET /api/v1/chat/conversations/{conversation_id}
Authorization: Bearer <access_token>
```
**Response:**
```json
{
"conversation_id": "abc123",
"messages": [
{
"role": "user",
"content": "What are the interest rates?",
"timestamp": "2025-11-28T10:30:00Z"
},
{
"role": "assistant",
"content": "Current rates are...",
"timestamp": "2025-11-28T10:30:05Z",
"metadata": {
"policy_action": "FETCH"
}
}
]
}
```
#### List All Conversations
```http
GET /api/v1/chat/conversations
Authorization: Bearer <access_token>
```
#### Delete Conversation
```http
DELETE /api/v1/chat/conversation/{conversation_id}
Authorization: Bearer <access_token>
```
---
## 🚀 Deployment
### HuggingFace Spaces (Backend)
The backend is deployed on HuggingFace Spaces using Docker:
1. **Models are stored** on HuggingFace Hub: `eeshanyaj/questrag_models`
2. **On first startup**, models are automatically downloaded from HF Hub
3. **Docker container** runs FastAPI with uvicorn on port 7860
4. **Environment secrets** are securely managed in HF Space settings
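The first-startup download (step 2) can be sketched as a download-if-missing loop. `hf_hub_download` is the real `huggingface_hub` helper; the surrounding function and the idea of injecting it as `download_fn` are illustrative, not the project's actual startup code:

```python
from pathlib import Path

MODEL_FILES = ["best_policy_model.pth", "best_retriever_model.pth", "faiss_index.pkl"]

def ensure_models(model_dir: Path, download_fn) -> list[str]:
    """Download any model file not already on disk; return the names fetched."""
    model_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for name in MODEL_FILES:
        if not (model_dir / name).exists():
            download_fn(repo_id="eeshanyaj/questrag_models", filename=name,
                        local_dir=str(model_dir))
            fetched.append(name)
    return fetched

# In production, pass huggingface_hub.hf_hub_download as download_fn,
# e.g. ensure_models(Path("app/models"), hf_hub_download).
```

Checking for existing files keeps container restarts fast: only a cold start with an empty volume pays the download cost.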
**Deployment Steps:**
```bash
# 1. Upload models to HuggingFace Hub
huggingface-cli upload eeshanyaj/questrag_models \
app/models/best_policy_model.pth \
models/best_policy_model.pth
# 2. Push backend code to HF Space
git remote add space https://huggingface.co/spaces/eeshanyaj/questrag-backend
git push space main
# 3. Add environment secrets in HF Space Settings
# (MongoDB URI, Groq keys, JWT secret, etc.)
```
### Frontend Deployment (Vercel/Netlify)
```bash
# Build for production
npm run build
# Deploy to Vercel
vercel --prod
# Update .env.production with backend URL
VITE_API_URL=https://eeshanyaj-questrag-backend.hf.space
```
---
## 🤝 Contributing
Contributions are welcome! Please follow these steps:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
### Development Guidelines
- Follow PEP 8 for Python code
- Use ESLint + Prettier for JavaScript/React
- Write comprehensive docstrings and comments
- Add unit tests for new features
- Update documentation accordingly
---
## 📄 License
MIT License — see [LICENSE](LICENSE)
---
## 🙏 Acknowledgments
### Research Inspiration
- **Main Paper**: "Optimizing Retrieval Augmented Generation for Domain-Specific Chatbots with Reinforcement Learning" (AAAI 2024)
- **Additional References**:
- "Evaluating BERT-based Rewards for Question Generation with RL"
- "Self-Reasoning for Retrieval-Augmented Language Models"
### Open Source Resources
- [RL-Self-Improving-RAG](https://github.com/subrata-samanta/RL-Self-Improving-RAG)
- [ARENA](https://github.com/ren258/ARENA)
- [RAGTechniques](https://github.com/NirDiamant/RAGTechniques)
- [Financial-RAG-From-Scratch](https://github.com/cse-amarjeet/Financial-RAG-From-Scratch)
### Datasets
- [Bitext Retail Banking Dataset](https://huggingface.co/datasets/bitext/Bitext-retail-banking-llm-chatbot-training-dataset)
- [RetailBanking-Conversations](https://huggingface.co/datasets/oopere/RetailBanking-Conversations)
### Technologies
- [FastAPI](https://fastapi.tiangolo.com/)
- [React](https://reactjs.org/)
- [HuggingFace](https://huggingface.co/)
- [Groq](https://groq.com/)
- [MongoDB Atlas](https://www.mongodb.com/cloud/atlas)
---
## 📞 Contact
**Eeshanya Amit Joshi**
📧 [Email](mailto:eeshanyajoshi@gmail.com)
💼 [LinkedIn](https://www.linkedin.com/in/eeshanyajoshi/)
---
## 📈 Status
### ✅ **Backend Deployed & Live!**
- 🚀 Backend API running on [HuggingFace Spaces](https://eeshanyaj-questrag-backend.hf.space)
- 📚 API Documentation available at [/docs](https://eeshanyaj-questrag-backend.hf.space/docs)
- 💚 Health status: [Check here](https://eeshanyaj-questrag-backend.hf.space/health)
### 🚧 **Frontend Deployment - Coming Soon!**
- Will be deployed on Vercel/Netlify
- Stay tuned for full application link! ❤️
---
## 🔗 Links
- **Live Backend API:** https://eeshanyaj-questrag-backend.hf.space
- **API Documentation:** https://eeshanyaj-questrag-backend.hf.space/docs
- **Health Check:** https://eeshanyaj-questrag-backend.hf.space/health
- **HuggingFace Space:** https://huggingface.co/spaces/eeshanyaj/questrag-backend
- **Model Repository:** https://huggingface.co/eeshanyaj/questrag_models
- **Research Paper:** [AAAI 2024 Workshop](https://arxiv.org/abs/2401.06800)
---
✨ Made with ❤️ for the Banking Industry ✨
Powered by HuggingFace 🤗 | Groq ⚡ | MongoDB 🍃 | Docker 🐳