Subhakanta156 committed · Commit 03c774f · 1 Parent(s): 4787e22

Update README.md

  1. README.md +120 -109
README.md CHANGED
@@ -1,109 +1,120 @@
- # 🌀 Odisha Disaster Management RAG Chatbot
-
- ## 📌 Overview
- Odisha faces recurring disasters every year such as **floods, cyclones, and droughts**.
- While the state has a strong disaster management authority (OSDMA), information is often scattered across reports, research papers, and government documents.
-
- This project builds a **Retrieval-Augmented Generation (RAG) based chatbot** that provides citizens, researchers, and policymakers with **clear, reliable, and contextual answers** related to Odisha’s disaster management practices.
-
- ---
-
- ## ✨ Features
- - Handles **132 PDFs** and **12 text files** (OSDMA, IMD, NDMA, research papers).
- - **Preprocessing pipeline**: PDF/text extraction, cleaning, normalization, chunking.
- - **Embeddings** with `sentence-transformers/all-MiniLM-L6-v2`.
- - **FAISS Vector Database** for fast and efficient retrieval.
- - **RAG pipeline**:
- 1. User query → query structuring (handles poor English, spelling issues).
- 2. Retrieve relevant chunks from FAISS.
- 3. If no relevant results → no LLM call (saves cost).
- 4. If relevant → LLM generates structured, contextual answers.
- - **Prompt engineering** for better accuracy and reduced hallucinations.
- - Backend: **FastAPI**.
- - Frontend: **HTML, CSS, JS chatbot interface**.
-
- ---
-
- ## 🏗️ Architecture
-
- **User Query → Query Structuring → FAISS Retriever → Relevant Chunks → LLM → Answer**
-
- # 🛠️ Tech Stack
-
- - **Python** (data handling & backend)
- - **PyPDF, TextLoader** → PDF/Text extraction
- - **FAISS** → Vector database
- - **HuggingFace Sentence Transformers** → Embeddings
- - **FastAPI** → Backend API
- - **HTML, CSS, JavaScript** → Frontend chatbot UI
- - **LLM (OpenAI / HuggingFace)** → Answer generation
-
- ---
-
- ## ⚙️ Installation
-
- ### 1. Clone the repository
- ```bash
- git clone https://github.com/subhakanta156/odisha-disaster-knowledge-assistant.git
- ```
- ### 2. Create virtual environment & install dependencies
- ```bash
- python -m venv venv
- source venv/bin/activate # Linux/Mac
- venv\Scripts\activate # Windows
-
- pip install -r requirements.txt
- ```
- ### 3. Prepare the data
- - Place all PDFs/text files inside the data/ folder.
- - Run preprocessing & embedding script:
- ```bash
- python scripts/build_vector_store.py
- ```
- ### 4. Run the FastAPI backend
- ```bash
- uvicorn app.main:app --reload
- ```
- ### 5. Open the frontend
- - Open `frontend/index.html` in your browser.
-
- ## 🚀 Usage
-
- Ask questions like:
-
- - “How does Odisha’s disaster proneness compare with other Indian states?”
- - “Provide details of relief funds sanctioned for Odisha during the 1999 Super Cyclone.”
- - “Which Odisha agency is primarily responsible for issuing cyclone alerts?”
- - “Explain the key steps taken by the Odisha government if lives are lost in a disaster?”
-
-
- The system retrieves relevant chunks from reports and generates reliable, structured answers.
-
- ---
-
- ## 📊 Optimizations
-
- - Added query filtering → No LLM call if retrieval fails (reduces cost).
- - Handled poor English queries via query restructuring.
- - Improved prompt engineering to minimize hallucinations.
-
- ---
-
- ## 📌 Future Improvements
-
- - Add multilingual support (Odia/Hindi queries).
- - Deploy on cloud (AWS/GCP/Azure) with Docker.
- - Use advanced embeddings (e.g., `all-mpnet-base-v2`) for higher accuracy.
- - Add real-time updates (e.g., cyclone alerts).
-
- ---
-
- ## 👨‍💻 Author
-
- **Subhakanta Rath**
-
- MSc AI & ML @ IIIT Lucknow
-
- Passionate about AI/ML, Data Engineering
-
-
+
+ ---
+ title: "Odisha Disaster RAG Chatbot"
+ emoji: "🌊"
+ colorFrom: green
+ colorTo: yellow
+ sdk: docker
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+ # 🌀 Odisha Disaster Management RAG Chatbot
+
+ ## 📌 Overview
+ Odisha faces recurring disasters every year such as **floods, cyclones, and droughts**.
+ While the state has a strong disaster management authority (OSDMA), information is often scattered across reports, research papers, and government documents.
+
+ This project builds a **Retrieval-Augmented Generation (RAG) based chatbot** that provides citizens, researchers, and policymakers with **clear, reliable, and contextual answers** related to Odisha’s disaster management practices.
+
+ ---
+
+ ## ✨ Features
+ - Handles **132 PDFs** and **12 text files** (OSDMA, IMD, NDMA, research papers).
+ - **Preprocessing pipeline**: PDF/text extraction, cleaning, normalization, chunking.
+ - **Embeddings** with `sentence-transformers/all-MiniLM-L6-v2`.
+ - **FAISS Vector Database** for fast and efficient retrieval.
+ - **RAG pipeline**:
+ 1. User query → query structuring (handles poor English, spelling issues).
+ 2. Retrieve relevant chunks from FAISS.
+ 3. If no relevant results → no LLM call (saves cost).
+ 4. If relevant → LLM generates structured, contextual answers.
+ - **Prompt engineering** for better accuracy and reduced hallucinations.
+ - Backend: **FastAPI**.
+ - Frontend: **HTML, CSS, JS chatbot interface**.
+
+ ---
+
+ ## 🏗️ Architecture
+
+ **User Query → Query Structuring → FAISS Retriever → Relevant Chunks → LLM → Answer**
+
+ # 🛠️ Tech Stack
+
+ - **Python** (data handling & backend)
+ - **PyPDF, TextLoader** → PDF/Text extraction
+ - **FAISS** → Vector database
+ - **HuggingFace Sentence Transformers** → Embeddings
+ - **FastAPI** → Backend API
+ - **HTML, CSS, JavaScript** → Frontend chatbot UI
+ - **LLM (OpenAI / HuggingFace)** → Answer generation
+
+ ---
+
+ ## ⚙️ Installation
+
+ ### 1. Clone the repository
+ ```bash
+ git clone https://github.com/subhakanta156/odisha-disaster-knowledge-assistant.git
+ ```
+ ### 2. Create virtual environment & install dependencies
+ ```bash
+ python -m venv venv
+ source venv/bin/activate # Linux/Mac
+ venv\Scripts\activate # Windows
+
+ pip install -r requirements.txt
+ ```
+ ### 3. Prepare the data
+ - Place all PDFs/text files inside the data/ folder.
+ - Run preprocessing & embedding script:
+ ```bash
+ python scripts/build_vector_store.py
+ ```
+ ### 4. Run the FastAPI backend
+ ```bash
+ uvicorn app.main:app --reload
+ ```
+ ### 5. Open the frontend
+ - Open `frontend/index.html` in your browser.
+
+ ## 🚀 Usage
+
+ Ask questions like:
+
+ - “How does Odisha’s disaster proneness compare with other Indian states?”
+ - “Provide details of relief funds sanctioned for Odisha during the 1999 Super Cyclone.”
+ - “Which Odisha agency is primarily responsible for issuing cyclone alerts?”
+ - “Explain the key steps taken by the Odisha government if lives are lost in a disaster?”
+
+
+ The system retrieves relevant chunks from reports and generates reliable, structured answers.
+
+ ---
+
+ ## 📊 Optimizations
+
+ - Added query filtering → No LLM call if retrieval fails (reduces cost).
+ - Handled poor English queries via query restructuring.
+ - Improved prompt engineering to minimize hallucinations.
+
+ ---
+
+ ## 📌 Future Improvements
+
+ - Add multilingual support (Odia/Hindi queries).
+ - Deploy on cloud (AWS/GCP/Azure) with Docker.
+ - Use advanced embeddings (e.g., `all-mpnet-base-v2`) for higher accuracy.
+ - Add real-time updates (e.g., cyclone alerts).
+
+ ---
+
+ ## 👨‍💻 Author
+
+ **Subhakanta Rath**
+
+ MSc AI & ML @ IIIT Lucknow
+
+ Passionate about AI/ML, Data Engineering
+
+
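
Note on the RAG pipeline described in the README (steps 2 to 4: retrieve chunks, skip the LLM call when nothing relevant is found, otherwise generate an answer). The gate can be sketched roughly as below. This is an illustration only, not the repository's code: it uses plain-Python cosine similarity over toy vectors in place of FAISS and the MiniLM embeddings, and the names `retrieve`, `answer_query`, `MIN_SCORE`, and the `llm` callable, as well as the threshold value, are assumptions made for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical relevance threshold; the README does not state one.
MIN_SCORE = 0.5


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: Sequence[float],
    index: List[Tuple[Sequence[float], str]],
    k: int = 2,
    min_score: float = MIN_SCORE,
) -> List[str]:
    """Return up to k chunk texts whose similarity meets the threshold."""
    scored = sorted(
        ((cosine(query_vec, vec), text) for vec, text in index),
        reverse=True,
    )
    return [text for score, text in scored[:k] if score >= min_score]


def answer_query(
    query: str,
    query_vec: Sequence[float],
    index: List[Tuple[Sequence[float], str]],
    llm: Callable[[str], str],
) -> str:
    """Call the LLM only when retrieval found relevant chunks."""
    chunks = retrieve(query_vec, index)
    if not chunks:
        # Step 3: nothing relevant, so no LLM call is made.
        return "No relevant information found."
    # Step 4: pass the retrieved context along with the question.
    prompt = "Context:\n" + "\n".join(chunks) + "\n\nQuestion: " + query
    return llm(prompt)
```

In a real deployment the toy `index` list would be replaced by the FAISS store built by `scripts/build_vector_store.py`, and the threshold would need tuning against the actual embedding scores.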