---
title: "Odisha Disaster RAG Chatbot"
emoji: "🌊"
colorFrom: green
colorTo: yellow
sdk: docker
app_file: app.py
pinned: false
license: mit
---
# 🌀 Odisha Disaster Management RAG Chatbot
## 📌 Overview
Odisha faces recurring disasters every year such as **floods, cyclones, and droughts**.
While the state has a strong disaster management authority (OSDMA), information is often scattered across reports, research papers, and government documents.
This project builds a **Retrieval-Augmented Generation (RAG) based chatbot** that provides citizens, researchers, and policymakers with **clear, reliable, and contextual answers** related to Odisha's disaster management practices.
---
## ✨ Features
- Handles **132 PDFs** and **12 text files** (OSDMA, IMD, NDMA, research papers).
- **Preprocessing pipeline**: PDF/text extraction, cleaning, normalization, chunking.
- **Embeddings** with `sentence-transformers/all-MiniLM-L6-v2`.
- **FAISS Vector Database** for fast and efficient retrieval.
- **RAG pipeline**:
  1. User query → query structuring (handles poor English, spelling issues).
  2. Retrieve relevant chunks from FAISS.
  3. If no relevant results → no LLM call (saves cost).
  4. If relevant → LLM generates structured, contextual answers.
- **Prompt engineering** for better accuracy and reduced hallucinations.
- Backend: **FastAPI**.
- Frontend: **HTML, CSS, JS chatbot interface**.
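
Steps 2 to 4 of the RAG pipeline above (retrieve, skip the LLM when nothing relevant is found, otherwise generate) can be sketched as follows. This is a minimal illustration, not the project's code: a plain-Python cosine search stands in for FAISS, the vectors are toy values rather than `all-MiniLM-L6-v2` embeddings, and the names `retrieve`, `answer`, `NO_ANSWER`, and the `min_score` threshold of 0.5 are all assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical fallback message returned when retrieval finds nothing relevant.
NO_ANSWER = "Sorry, I could not find relevant information in the documents."


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: Sequence[float],
    index: List[Tuple[str, Sequence[float]]],
    k: int = 3,
    min_score: float = 0.5,  # assumed threshold; tune on real data
) -> List[Tuple[str, float]]:
    """Return up to k (chunk, score) pairs whose similarity is at least min_score."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(text, score) for text, score in scored[:k] if score >= min_score]


def answer(
    query_vec: Sequence[float],
    index: List[Tuple[str, Sequence[float]]],
    llm: Callable[[List[str]], str],
) -> str:
    """Call the LLM only when retrieval returns at least one relevant chunk."""
    hits = retrieve(query_vec, index)
    if not hits:
        return NO_ANSWER  # no LLM call, so no generation cost
    return llm([text for text, _ in hits])
```

Passing the LLM in as a callable keeps the gate easy to test with a stub before wiring in a real model.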
---
## 🏗️ Architecture
**User Query → Query Structuring → FAISS Retriever → Relevant Chunks → LLM → Answer**
## 🛠️ Tech Stack
- **Python** (data handling & backend)
- **PyPDF, TextLoader** → PDF/Text extraction
- **FAISS** → Vector database
- **HuggingFace Sentence Transformers** → Embeddings
- **FastAPI** → Backend API
- **HTML, CSS, JavaScript** → Frontend chatbot UI
- **LLM (OpenAI / HuggingFace)** → Answer generation
---
## ⚙️ Installation
### 1. Clone the repository
```bash
git clone https://github.com/subhakanta156/odisha-disaster-knowledge-assistant.git
cd odisha-disaster-knowledge-assistant
```
### 2. Create virtual environment & install dependencies
```bash
python -m venv venv
# Activate the environment: run only the line for your OS
source venv/bin/activate     # Linux/Mac
# venv\Scripts\activate      # Windows (Command Prompt)
pip install -r requirements.txt
```
### 3. Prepare the data
- Place all PDFs/text files inside the `data/` folder.
- Run preprocessing & embedding script:
```bash
python scripts/build_vector_store.py
```
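
This README does not show the internals of `scripts/build_vector_store.py`. As an illustration of the cleaning and chunking step listed under Features, here is a minimal sketch of fixed-size chunking with overlap. The function name `chunk_text` and the default sizes (500 characters, 50 overlap) are assumptions, not values taken from the project.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into character chunks that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    # Normalize whitespace so line breaks from PDF extraction do not split words oddly.
    cleaned = " ".join(text.split())

    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(cleaned), step):
        chunks.append(cleaned[start:start + chunk_size])
        # Stop once this chunk reaches the end of the text.
        if start + chunk_size >= len(cleaned):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.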
### 4. Run the FastAPI backend
```bash
uvicorn app.main:app --reload
```
### 5. Open the frontend
- Open `frontend/index.html` in your browser.
## 🚀 Usage
Ask questions like:
- "How does Odisha's disaster proneness compare with other Indian states?"
- "Provide details of relief funds sanctioned for Odisha during the 1999 Super Cyclone."
- "Which Odisha agency is primarily responsible for issuing cyclone alerts?"
- "Explain the key steps taken by the Odisha government if lives are lost in a disaster."
The system retrieves relevant chunks from reports and generates reliable, structured answers.
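
To make the "retrieve, then generate" step concrete, the sketch below shows one way retrieved chunks could be folded into a grounded prompt. The template wording, the `build_prompt` name, and the numbered-context format are assumptions for illustration; the project's actual prompt is not shown in this README.

```python
from typing import List

# Hypothetical template: instructs the model to stay within the retrieved context.
PROMPT_TEMPLATE = (
    "You are an assistant for Odisha disaster management questions.\n"
    "Answer using only the numbered context below. "
    "If the context does not contain the answer, say that you do not know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, chunks: List[str]) -> str:
    """Number each retrieved chunk and insert it, with the question, into the template."""
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return PROMPT_TEMPLATE.format(context=context, question=question.strip())
```

Numbering the chunks lets the model cite which passage supports each statement, and the explicit "say that you do not know" instruction is a common way to discourage answers that go beyond the retrieved text.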
---
## 📊 Optimizations
- Added query filtering → No LLM call if retrieval fails (reduces cost).
- Handled poor English queries via query restructuring.
- Improved prompt engineering to minimize hallucinations.
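
The README does not say how query restructuring is implemented (it may well be LLM-based). As one lightweight possibility, misspelled domain terms can be corrected against a small vocabulary before embedding the query. In this sketch, `DOMAIN_TERMS`, `normalize_query`, and the 0.8 cutoff are all hypothetical.

```python
import difflib
import re
from typing import List, Sequence

# Hypothetical vocabulary; a real one could be built from the indexed documents.
DOMAIN_TERMS: List[str] = [
    "cyclone", "flood", "drought", "odisha",
    "relief", "shelter", "evacuation", "osdma",
]


def normalize_query(
    query: str,
    vocabulary: Sequence[str] = DOMAIN_TERMS,
    cutoff: float = 0.8,  # assumed similarity threshold
) -> str:
    """Lowercase the query, drop punctuation, and fix near-miss domain terms."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    fixed = []
    for token in tokens:
        # Best vocabulary match at or above the cutoff, if any.
        match = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        fixed.append(match[0] if match else token)
    return " ".join(fixed)
```

For example, `normalize_query("Cyclon sheltr in Odisa??")` yields `"cyclone shelter in odisha"`. A lower cutoff catches more typos but also rewrites more legitimate words, so the threshold needs checking against real queries.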
---
## 📌 Future Improvements
- Add multilingual support (Odia/Hindi queries).
- Deploy on cloud (AWS/GCP/Azure) with Docker.
- Use advanced embeddings (e.g., `all-mpnet-base-v2`) for higher accuracy.
- Add real-time updates (e.g., cyclone alerts).
---
## 👨‍💻 Author
**Subhakanta Rath**
MSc AI & ML @ IIIT Lucknow
Passionate about AI/ML, Data Engineering