---
title: "Odisha Disaster RAG Chatbot"
emoji: "🌊"
colorFrom: green
colorTo: yellow
sdk: docker
app_file: app.py
pinned: false
license: mit
---

# 🌀 Odisha Disaster Management RAG Chatbot

## 📌 Overview

Odisha faces recurring disasters every year, such as **floods, cyclones, and droughts**. While the state has a strong disaster management authority (OSDMA), information is often scattered across reports, research papers, and government documents.

This project builds a **Retrieval-Augmented Generation (RAG) chatbot** that gives citizens, researchers, and policymakers **clear, reliable, and contextual answers** about Odisha's disaster management practices.

---

## ✨ Features

- Handles **132 PDFs** and **12 text files** (OSDMA, IMD, NDMA, research papers).
- **Preprocessing pipeline**: PDF/text extraction, cleaning, normalization, chunking.
- **Embeddings** with `sentence-transformers/all-MiniLM-L6-v2`.
- **FAISS vector database** for fast and efficient retrieval.
- **RAG pipeline**:
   1. User query → query structuring (handles poor English and spelling issues).
   2. Retrieve relevant chunks from FAISS.
   3. If no relevant results → no LLM call (saves cost).
   4. If relevant → LLM generates structured, contextual answers.
- **Prompt engineering** for better accuracy and fewer hallucinations.
- Backend: **FastAPI**.
- Frontend: **HTML, CSS, JS chatbot interface**.

---

## 🏗️ Architecture

**User Query → Query Structuring → FAISS Retriever → Relevant Chunks → LLM → Answer**

---

## 🛠️ Tech Stack

- **Python** (data handling & backend)
- **PyPDF, TextLoader** → PDF/text extraction
- **FAISS** → Vector database
- **HuggingFace Sentence Transformers** → Embeddings
- **FastAPI** → Backend API
- **HTML, CSS, JavaScript** → Frontend chatbot UI
- **LLM (OpenAI / HuggingFace)** → Answer generation

---

## ⚙️ Installation

### 1. Clone the repository

```bash
git clone https://github.com/subhakanta156/odisha-disaster-knowledge-assistant.git
cd odisha-disaster-knowledge-assistant
```

### 2. Create virtual environment & install dependencies

```bash
python -m venv venv
source venv/bin/activate   # Linux/Mac
venv\Scripts\activate      # Windows
pip install -r requirements.txt
```

### 3. Prepare the data

- Place all PDFs/text files inside the `data/` folder.
- Run the preprocessing & embedding script:

```bash
python scripts/build_vector_store.py
```

### 4. Run the FastAPI backend

```bash
uvicorn app.main:app --reload
```

### 5. Open the frontend

- Open `frontend/index.html` in your browser.

---

## 🚀 Usage

Ask questions like:

- “How does Odisha's disaster proneness compare with other Indian states?”
- “Provide details of relief funds sanctioned for Odisha during the 1999 Super Cyclone.”
- “Which Odisha agency is primarily responsible for issuing cyclone alerts?”
- “Explain the key steps taken by the Odisha government if lives are lost in a disaster.”

The system retrieves relevant chunks from the reports and generates reliable, structured answers.

---

## 📊 Optimizations

- Added query filtering → no LLM call if retrieval fails (reduces cost).
- Handled poor English queries via query restructuring.
- Improved prompt engineering to minimize hallucinations.

---

## 📌 Future Improvements

- Add multilingual support (Odia/Hindi queries).
- Deploy on cloud (AWS/GCP/Azure) with Docker.
- Use advanced embeddings (e.g., `all-mpnet-base-v2`) for higher accuracy.
- Add real-time updates (e.g., cyclone alerts).

---

## 👨‍💻 Author

**Subhakanta Rath**

MSc AI & ML @ IIIT Lucknow

Passionate about AI/ML, Data Engineering
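
---

## 🧪 Example Sketch: Skipping the LLM Call

The Features and Optimizations sections describe making no LLM call when retrieval finds nothing relevant. The following is a minimal sketch of how such a gate might look, not the code used in this repository. To stay dependency-free, `embed` is a toy bag-of-words vector standing in for `all-MiniLM-L6-v2` embeddings, and a linear scan stands in for the FAISS search. The names `SCORE_THRESHOLD`, `retrieve`, `answer`, and the `llm` callable are all hypothetical, and the sample chunks are made-up placeholder text.

```python
import math
import re
from collections import Counter

# Hypothetical cutoff; a real value would need tuning on actual queries.
SCORE_THRESHOLD = 0.3


def embed(text):
    """Toy bag-of-words vector (stand-in for a sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=2):
    """Return the top_k (score, chunk) pairs, best first (stand-in for FAISS)."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def answer(query, chunks, llm, threshold=SCORE_THRESHOLD):
    """Call the LLM only if at least one chunk clears the relevance threshold."""
    hits = [(s, c) for s, c in retrieve(query, chunks) if s >= threshold]
    if not hits:
        # Retrieval found nothing relevant: skip the LLM call entirely.
        return "No relevant information found in the indexed documents."
    context = "\n".join(c for _, c in hits)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm(prompt)


if __name__ == "__main__":
    sample_chunks = [
        "Sample chunk about cyclone shelters and evacuation routes.",
        "Sample chunk about drought relief and crop insurance.",
    ]
    fake_llm = lambda prompt: "stub answer"  # placeholder for a real LLM client
    print(answer("cyclone shelters evacuation", sample_chunks, fake_llm))
    print(answer("football score yesterday", sample_chunks, fake_llm))
```

With real embeddings the similarity scale differs from this toy version, so the threshold would have to be chosen empirically rather than copied from the sketch.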