---
title: "Odisha Disaster RAG Chatbot"
emoji: "🌊"
colorFrom: green
colorTo: yellow
sdk: docker
app_file: app.py
pinned: false
license: mit
---
# 🌀 Odisha Disaster Management RAG Chatbot

## 📌 Overview
Odisha faces recurring disasters such as **floods, cyclones, and droughts** every year.  
While the state has a strong disaster management authority (OSDMA), information is often scattered across reports, research papers, and government documents.  

This project builds a **Retrieval-Augmented Generation (RAG) chatbot** that gives citizens, researchers, and policymakers **clear, reliable, and contextual answers** about Odisha’s disaster management practices.

---

## ✨ Features
- Handles **132 PDFs** and **12 text files** (OSDMA, IMD, NDMA, research papers).  
- **Preprocessing pipeline**: PDF/text extraction, cleaning, normalization, chunking.  
- **Embeddings** with `sentence-transformers/all-MiniLM-L6-v2`.  
- **FAISS vector database** for fast retrieval.  
- **RAG pipeline**:  
  1. User query → query structuring (handles poor English and spelling errors).  
  2. Retrieve relevant chunks from FAISS.  
  3. If no relevant results → no LLM call (saves cost).  
  4. If relevant → LLM generates a structured, contextual answer.  
- **Prompt engineering** for better accuracy and fewer hallucinations.  
- Backend: **FastAPI**.  
- Frontend: **HTML, CSS, JS chatbot interface**.  
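The cleaning and chunking steps of the preprocessing pipeline can be sketched as follows. This is a minimal illustration, not the project's actual script: the function names `clean_text` and `chunk_text`, the character-based windowing, and the default `chunk_size`/`overlap` values are all assumptions made for the example.

```python
import re


def clean_text(raw: str) -> str:
    """Normalize extracted text (hypothetical helper, not from the repo)."""
    # Re-join words that were hyphenated across a line break.
    text = re.sub(r"-\s*\n\s*", "", raw)
    # Collapse runs of whitespace (newlines, tabs, spaces) into one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows that overlap by `overlap` characters.

    The sizes are illustrative defaults, not values taken from the project.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Overlapping windows keep a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of some duplicated text in the index.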

---

## 🏗️ Architecture

 **User Query → Query Structuring → FAISS Retriever → Relevant Chunks → LLM → Answer**
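The flow above, including the "no relevant results, no LLM call" gate, can be sketched with the standard library only. To stay runnable without extra dependencies, this toy version uses bag-of-words vectors and cosine similarity in place of the MiniLM embeddings and the FAISS index; the names `embed`, `retrieve`, `answer`, and the `min_score` threshold are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2, min_score: float = 0.2):
    """Return up to top_k (score, chunk) pairs scoring at least min_score."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks), reverse=True)
    return [(s, c) for s, c in scored[:top_k] if s >= min_score]


def answer(query: str, chunks: list[str], llm) -> str:
    """Call the LLM only when retrieval found something relevant."""
    hits = retrieve(query, chunks)
    if not hits:
        # Gate: skip the (paid) LLM call entirely.
        return "Sorry, I could not find relevant information in the documents."
    context = "\n".join(c for _, c in hits)
    return llm(f"Context:\n{context}\n\nQuestion: {query}")
```

Here `llm` is any callable that takes a prompt string and returns text, so the same gate works whichever provider generates the answer.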

---

## 🛠️ Tech Stack

-  **Python** (data handling & backend)  
-  **PyPDF, TextLoader** → PDF/Text extraction  
-  **FAISS** → Vector database  
-  **HuggingFace Sentence Transformers** → Embeddings  
-  **FastAPI** → Backend API  
-  **HTML, CSS, JavaScript** → Frontend chatbot UI  
-  **LLM (OpenAI / HuggingFace)** → Answer generation  

---

## ⚙️ Installation

### 1. Clone the repository
```bash
git clone https://github.com/subhakanta156/odisha-disaster-knowledge-assistant.git
cd odisha-disaster-knowledge-assistant
```
### 2. Create virtual environment & install dependencies
```bash
python -m venv venv
# Activate the environment: run only the line for your platform
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows

pip install -r requirements.txt
```
### 3. Prepare the data
- Place all PDFs and text files inside the `data/` folder.
- Run the preprocessing and embedding script:
```bash
python scripts/build_vector_store.py
```
### 4. Run the FastAPI backend
```bash
uvicorn app.main:app --reload
```
### 5. Open the frontend
- Open `frontend/index.html` in your browser.
  
## 🚀 Usage  

Ask questions like:  

- “How does Odisha’s disaster proneness compare with other Indian states?”  
- “Provide details of relief funds sanctioned for Odisha during the 1999 Super Cyclone.” 
- “Which Odisha agency is primarily responsible for issuing cyclone alerts?” 
- “Explain the key steps the Odisha government takes when lives are lost in a disaster.”


The system retrieves the most relevant chunks from the indexed documents and uses them as context for the LLM, which generates a structured answer.  

---

## 📊 Optimizations  

-  Added query filtering → No LLM call if retrieval fails (reduces cost).  
-  Handled poor English queries via query restructuring.  
-  Improved prompt engineering to minimize hallucinations.  
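One common way to reduce hallucinations through prompting is to restrict the model to the retrieved context and give it an explicit way out when the context is insufficient. The sketch below shows that idea only; the function name `build_prompt` and the exact wording of the instructions are assumptions, not the prompt used in this project.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a grounded prompt from retrieved chunks (illustrative wording)."""
    # Number the chunks so the model can cite which one supports each claim.
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "You are an assistant for Odisha disaster management questions.\n"
        "Answer using ONLY the numbered context below and cite chunk numbers.\n"
        "If the context does not contain the answer, reply \"I don't know\".\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string can be passed to whichever LLM client the backend uses.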

---

## 📌 Future Improvements  

-  Add multilingual support (Odia/Hindi queries).  
-  Deploy to a cloud platform (AWS/GCP/Azure) with Docker.  
-  Use a larger embedding model (e.g., `all-mpnet-base-v2`) for higher retrieval accuracy.  
-  Add real-time updates (e.g., cyclone alerts).  

---

## 👨‍💻 Author  

**Subhakanta Rath**  

MSc AI & ML @ IIIT Lucknow  

Passionate about AI/ML and data engineering