huytrao123 committed on
Commit f4ef748 · verified · 1 Parent(s): ced61cd

Update README.md

Files changed (1)
  1. README.md +319 -310
README.md CHANGED
@@ -1,310 +1,319 @@
1
- # RAG Personal Diary Chatbot
2
-
3
- ## 📖 Project Description
4
-
5
- RAG Personal Diary Chatbot is an intelligent chatbot application that uses RAG (Retrieval-Augmented Generation) architecture to interact with users' personal diaries. The application allows users to ask questions about diary content and receive accurate answers based on actual data.
6
-
7
- ## ✨ Key Features
8
-
9
-
10
- ## 🏗️ System Architecture
11
-
12
- ```
13
- ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
14
- │   Streamlit UI  │    │   FastAPI       │    │   Vector        │
15
- │   (Frontend)    │◄──►│   Backend       │◄──►│   Database      │
16
- └─────────────────┘    └─────────────────┘    └─────────────────┘
17
-                                 │
18
-                                 ▼
19
-                        ┌─────────────────┐
20
-                        │   RAG Engine    │
21
-                        │   (LLM +        │
22
-                        │   Retrieval)    │
23
-                        └─────────────────┘
24
- ```
25
-
26
- ## 🚀 Installation and Setup
27
-
28
- ### System Requirements
29
-
30
- ### Install Dependencies
31
- ```bash
32
- # Create virtual environment
33
- python -m venv .venv
34
-
35
- # Activate virtual environment
36
- # Windows
37
- .venv\Scripts\activate
38
- # Linux/Mac
39
- source .venv/bin/activate
40
-
41
- # Install packages
42
- pip install -r requirements.txt
43
- ```
44
-
45
- ### Environment Configuration
46
-
47
- Create a `.env` file in the project root directory with the following structure:
48
-
49
- ```env
50
- # API Keys
51
- OPENAI_API_KEY=your_openai_api_key_here
52
- GOOGLE_API_KEY=your_google_api_key_here
53
- ANTHROPIC_API_KEY=your_anthropic_api_key_here
54
-
55
- # Database Configuration
56
- DATABASE_URL=sqlite:///./user_database/auth.db
57
- VECTOR_DB_PATH=./VectorDB
58
-
59
- # Model Configuration
60
- EMBEDDING_MODEL=google-universal-sentence-encoder
61
- LLM_MODEL=gpt-3.5-turbo
62
- CHUNK_SIZE=1000
63
- CHUNK_OVERLAP=200
64
-
65
- # Server Configuration
66
- RAG_SERVICE_PORT=8001
67
- STREAMLIT_PORT=8501
68
- FASTAPI_PORT=8000
69
-
70
- # Security
71
- SECRET_KEY=your_secret_key_here
72
- JWT_SECRET=your_jwt_secret_here
73
-
74
- # Logging
75
- LOG_LEVEL=INFO
76
- LOG_FILE=./logs/app.log
77
-
78
- # Vector Database
79
- CHROMA_DB_PATH=./VectorDB
80
- PERSIST_DIRECTORY=./VectorDB
81
-
82
- # File Processing
83
- SUPPORTED_FORMATS=pdf,docx,txt,md
84
- MAX_FILE_SIZE=10485760
85
- TEMP_DIR=./temp
86
-
87
- # RAG Configuration
88
- TOP_K_RESULTS=5
89
- SIMILARITY_THRESHOLD=0.7
90
- MAX_TOKENS=4096
91
- TEMPERATURE=0.7
92
- ```
93
-
94
- **Important Notes:**
95
-
96
- ### Run the Application
97
-
98
- #### 1. Start RAG Service
99
- ```bash
100
- python start_rag_service.py
101
- ```
102
- Service will run at: http://127.0.0.1:8001
103
-
104
- #### 2. Start Streamlit UI
105
- ```bash
106
- cd src/streamlit_app
107
- streamlit run interface.py
108
- ```
109
- UI will run at: http://localhost:8501
110
-
111
- ## 📁 Directory Structure
112
-
113
- ```
114
- RAG-Personal-Diary-Chatbot/
115
- ├── src/
116
- │   ├── Indexingstep/              # Data indexing pipeline
117
- │   ├── Retrivel_And_Generation/   # RAG engine
118
- │   ├── rag_service/               # FastAPI backend
119
- │   ├── streamlit_app/             # User interface
120
- │   └── VectorDB/                  # Vector database
121
- ├── notebook/                      # Jupyter notebooks
122
- ├── tests/                         # Unit tests
123
- ├── images/                        # Documentation images
124
- ├── start_rag_service.py           # Service startup script
125
- ├── .env                           # Environment variables (create from template)
126
- ├── env_template.txt               # Environment variables template
127
- └── README.md
128
- ```
129
-
130
- ## 🔧 Configuration
131
-
132
- ### Vector Database
133
-
134
- ### AI Models
135
-
136
- ## 📊 Performance
137
-
138
-
139
- ## 🧪 Testing
140
-
141
- ```bash
142
- # Run all tests
143
- python -m pytest tests/
144
-
145
- # Run specific test
146
- python -m pytest tests/test_rag_system.py
147
- ```
148
-
149
- ## 🤝 Contributing
150
-
151
- 1. Fork the project
152
- 2. Create feature branch (`git checkout -b feature/AmazingFeature`)
153
- 3. Commit changes (`git commit -m 'Add some AmazingFeature'`)
154
- 4. Push to branch (`git push origin feature/AmazingFeature`)
155
- 5. Open Pull Request
156
-
157
- ## 📝 License
158
-
159
- This project is distributed under the MIT License. See the `LICENSE` file for more details.
160
-
161
- ## 📞 Contact
162
-
163
-
164
- ## 🙏 Acknowledgments
165
-
166
-
167
- ## 📖 Project Description
168
-
169
- RAG Personal Diary Chatbot is an intelligent chatbot application that leverages Retrieval-Augmented Generation (RAG) architecture to interact with users' personal diaries. Users can ask questions about their diary content and receive accurate, context-based answers.
170
-
171
- ## ✨ Key Features
172
-
173
- - **Diary Indexing**: Automatically processes and indexes diary files (PDF, DOCX, TXT)
174
- - **Semantic Search**: Uses a vector database for semantic search
175
- - **AI Chatbot**: Natural interaction with diary data
176
- - **User Isolation**: Each user has a separate vector database
177
- - **Web Interface**: Easy-to-use Streamlit UI
178
- - **REST API**: FastAPI backend for integration
179
-
180
- ## 🏗️ System Architecture
181
-
182
- ```
183
- ┌───────────────┐    ┌───────────────┐    ┌───────────────┐
184
- │  Streamlit UI │◄──►│   FastAPI     │◄──►│   Vector DB   │
185
- │   (Frontend)  │    │   Backend     │    │   (ChromaDB)  │
186
- └───────────────┘    └───────────────┘    └───────────────┘
187
-                              │
188
-                              ▼
189
-                      ┌───────────────┐
190
-                      │   RAG Engine  │
191
-                      │   (LLM +      │
192
-                      │   Retrieval)  │
193
-                      └───────────────┘
194
- ```
195
-
196
- ## 🚀 Installation and Setup
197
-
198
- ### System Requirements
199
-
200
- - Python 3.8+
201
-
202
- ### Install Dependencies
203
-
204
- ```bash
205
- # Create virtual environment
206
- python -m venv .venv
207
-
208
- # Activate virtual environment
209
- # Windows
210
- .venv\Scripts\activate
211
- # Linux/Mac
212
- source .venv/bin/activate
213
-
214
- # Install packages
215
- pip install -r requirements.txt
216
- ```
217
-
218
- ### Environment Configuration
219
-
220
- Create a `.env` file in the project root directory with the following structure:
221
-
222
- ```env
223
- # Google API Configuration for RAG System
224
- GOOGLE_API_KEY=[Google API key]
225
-
226
- # Database Configuration
227
- DATABASE_PATH=./src/streamlit_app/backend/diary.db
228
-
229
- # Vector Database Configuration
230
- VECTOR_DB_PATH=./src/Indexingstep/diary_vector_db_enhanced
231
- COLLECTION_NAME=diary_entries
232
-
233
- # RAG Configuration
234
- EMBEDDING_MODEL=models/embedding-001
235
- CHAT_MODEL=gemini-2.5-flash
236
- ```
237
-
238
- **Important Notes:**
239
- - Replace all placeholder values with your actual API keys and configuration
240
- - Keep your `.env` file secure and never commit it to version control
241
- - The `.env` file is already included in `.gitignore`
242
- - Use `env_template.txt` as a reference to create your `.env` file
243
-
244
- ### Run the Application
245
-
246
- ```bash
247
- # Start the RAG backend service
248
- python start_rag_service.py
249
-
250
- # Start the Streamlit UI
251
- streamlit run src/streamlit_app/interface.py
252
- ```
253
-
254
- ## 📁 Directory Structure
255
-
256
- ```
257
- RAG-Personal-Diary-Chatbot/
258
- ├── src/
259
- │   ├── Indexingstep/              # Data indexing pipeline
260
- │   ├── Retrivel_And_Generation/   # RAG engine
261
- │   ├── rag_service/               # FastAPI backend
262
- │   ├── streamlit_app/             # User interface
263
- │   └── VectorDB/                  # Vector database
264
- ├── notebook/                      # Jupyter notebooks
265
- ├── tests/                         # Unit tests
266
- ├── images/                        # Documentation images
267
- ├── start_rag_service.py           # Service startup script
268
- ├── .env                           # Environment variables (create from template)
269
- ├── env_template.txt               # Environment variables template
270
- └── README.md
271
- ```
272
-
273
- ## 🔧 Configuration
274
-
275
- ### Vector Database
276
- - **ChromaDB**: Main database for vector embeddings
277
- - **Chunk size**: 1000 characters (customizable)
278
- - **Overlap**: 200 characters between chunks
279
-
280
- ### AI Models
281
- - **Embedding**: Google's Universal Sentence Encoder
282
- - **LLM**: Google Gemini (can be replaced with other models)
283
-
284
- ## 📊 Performance
285
-
286
- - **Processing time**: ~2-5 seconds per question
287
- - **Accuracy**: 85-95% depending on data quality
288
- - **Scalability**: Supports thousands of diaries
289
-
290
-
291
- ## 🤝 Contributing
292
-
293
- 1. Fork the project
294
- 2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
295
- 3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
296
- 4. Push to the branch (`git push origin feature/AmazingFeature`)
297
- 5. Open a Pull Request
298
- ## 📞 Contact
299
-
300
- - **Author**: [DongAnh]
301
- - **Email**: [donganhng098@gmail.com]
302
- - **GitHub**: [github.com/DongAnh]
303
-
304
- ## 🙏 Acknowledgments
305
-
306
- - Google for the Gemini models
307
- - Google for Universal Sentence Encoder
308
- - ChromaDB team for vector database
309
- - FastAPI and Streamlit communities
310
- - RAG architecture
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ title: FastAPI + Streamlit Demo
3
+ emoji: 🐳
4
+ colorFrom: blue
5
+ colorTo: indigo
6
+ sdk: docker
7
+ app_file: Dockerfile
8
+ pinned: false
9
+ ---
10
+ # RAG Personal Diary Chatbot
11
+
12
+ ## 📖 Project Description
13
+
14
+ RAG Personal Diary Chatbot is an intelligent chatbot application that uses RAG (Retrieval-Augmented Generation) architecture to interact with users' personal diaries. The application allows users to ask questions about diary content and receive accurate answers based on actual data.
15
+
16
+ ## ✨ Key Features
17
+
18
+
19
+ ## 🏗️ System Architecture
20
+
21
+ ```
22
+ ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
23
+ │   Streamlit UI  │    │   FastAPI       │    │   Vector        │
24
+ │   (Frontend)    │◄──►│   Backend       │◄──►│   Database      │
25
+ └─────────────────┘    └─────────────────┘    └─────────────────┘
26
+                                 │
27
+                                 ▼
28
+                        ┌─────────────────┐
29
+                        │   RAG Engine    │
30
+                        │   (LLM +        │
31
+                        │   Retrieval)    │
32
+                        └─────────────────┘
33
+ ```
34
+
35
+ ## 🚀 Installation and Setup
36
+
37
+ ### System Requirements
38
+
39
+ ### Install Dependencies
40
+ ```bash
41
+ # Create virtual environment
42
+ python -m venv .venv
43
+
44
+ # Activate virtual environment
45
+ # Windows
46
+ .venv\Scripts\activate
47
+ # Linux/Mac
48
+ source .venv/bin/activate
49
+
50
+ # Install packages
51
+ pip install -r requirements.txt
52
+ ```
53
+
54
+ ### Environment Configuration
55
+
56
+ Create a `.env` file in the project root directory with the following structure:
57
+
58
+ ```env
59
+ # API Keys
60
+ OPENAI_API_KEY=your_openai_api_key_here
61
+ GOOGLE_API_KEY=your_google_api_key_here
62
+ ANTHROPIC_API_KEY=your_anthropic_api_key_here
63
+
64
+ # Database Configuration
65
+ DATABASE_URL=sqlite:///./user_database/auth.db
66
+ VECTOR_DB_PATH=./VectorDB
67
+
68
+ # Model Configuration
69
+ EMBEDDING_MODEL=google-universal-sentence-encoder
70
+ LLM_MODEL=gpt-3.5-turbo
71
+ CHUNK_SIZE=1000
72
+ CHUNK_OVERLAP=200
73
+
74
+ # Server Configuration
75
+ RAG_SERVICE_PORT=8001
76
+ STREAMLIT_PORT=8501
77
+ FASTAPI_PORT=8000
78
+
79
+ # Security
80
+ SECRET_KEY=your_secret_key_here
81
+ JWT_SECRET=your_jwt_secret_here
82
+
83
+ # Logging
84
+ LOG_LEVEL=INFO
85
+ LOG_FILE=./logs/app.log
86
+
87
+ # Vector Database
88
+ CHROMA_DB_PATH=./VectorDB
89
+ PERSIST_DIRECTORY=./VectorDB
90
+
91
+ # File Processing
92
+ SUPPORTED_FORMATS=pdf,docx,txt,md
93
+ MAX_FILE_SIZE=10485760
94
+ TEMP_DIR=./temp
95
+
96
+ # RAG Configuration
97
+ TOP_K_RESULTS=5
98
+ SIMILARITY_THRESHOLD=0.7
99
+ MAX_TOKENS=4096
100
+ TEMPERATURE=0.7
101
+ ```
102
+
103
+ **Important Notes:**
104
+
105
+ ### Run the Application
106
+
107
+ #### 1. Start RAG Service
108
+ ```bash
109
+ python start_rag_service.py
110
+ ```
111
+ Service will run at: http://127.0.0.1:8001
112
+
113
+ #### 2. Start Streamlit UI
114
+ ```bash
115
+ cd src/streamlit_app
116
+ streamlit run interface.py
117
+ ```
118
+ UI will run at: http://localhost:8501
119
+
120
+ ## 📁 Directory Structure
121
+
122
+ ```
123
+ RAG-Personal-Diary-Chatbot/
124
+ ├── src/
125
+ │   ├── Indexingstep/              # Data indexing pipeline
126
+ │   ├── Retrivel_And_Generation/   # RAG engine
127
+ │   ├── rag_service/               # FastAPI backend
128
+ │   ├── streamlit_app/             # User interface
129
+ │   └── VectorDB/                  # Vector database
130
+ ├── notebook/                      # Jupyter notebooks
131
+ ├── tests/                         # Unit tests
132
+ ├── images/                        # Documentation images
133
+ ├── start_rag_service.py           # Service startup script
134
+ ├── .env                           # Environment variables (create from template)
135
+ ├── env_template.txt               # Environment variables template
136
+ └── README.md
137
+ ```
138
+
139
+ ## 🔧 Configuration
140
+
141
+ ### Vector Database
142
+
143
+ ### AI Models
144
+
145
+ ## 📊 Performance
146
+
147
+
148
+ ## 🧪 Testing
149
+
150
+ ```bash
151
+ # Run all tests
152
+ python -m pytest tests/
153
+
154
+ # Run specific test
155
+ python -m pytest tests/test_rag_system.py
156
+ ```
157
+
158
+ ## 🤝 Contributing
159
+
160
+ 1. Fork the project
161
+ 2. Create feature branch (`git checkout -b feature/AmazingFeature`)
162
+ 3. Commit changes (`git commit -m 'Add some AmazingFeature'`)
163
+ 4. Push to branch (`git push origin feature/AmazingFeature`)
164
+ 5. Open Pull Request
165
+
166
+ ## 📝 License
167
+
168
+ This project is distributed under the MIT License. See the `LICENSE` file for more details.
169
+
170
+ ## 📞 Contact
171
+
172
+
173
+ ## 🙏 Acknowledgments
174
+
175
+
176
+ ## 📖 Project Description
177
+
178
+ RAG Personal Diary Chatbot is an intelligent chatbot application that leverages Retrieval-Augmented Generation (RAG) architecture to interact with users' personal diaries. Users can ask questions about their diary content and receive accurate, context-based answers.
179
+
180
+ ## ✨ Key Features
181
+
182
+ - **Diary Indexing**: Automatically processes and indexes diary files (PDF, DOCX, TXT)
183
+ - **Semantic Search**: Uses a vector database for semantic search
184
+ - **AI Chatbot**: Natural interaction with diary data
185
+ - **User Isolation**: Each user has a separate vector database
186
+ - **Web Interface**: Easy-to-use Streamlit UI
187
+ - **REST API**: FastAPI backend for integration
188
+
189
+ ## 🏗️ System Architecture
190
+
191
+ ```
192
+ ┌───────────────┐    ┌───────────────┐    ┌───────────────┐
193
+ │  Streamlit UI │◄──►│   FastAPI     │◄──►│   Vector DB   │
194
+ │   (Frontend)  │    │   Backend     │    │   (ChromaDB)  │
195
+ └───────────────┘    └───────────────┘    └───────────────┘
196
+                              │
197
+                              ▼
198
+                      ┌───────────────┐
199
+                      │   RAG Engine  │
200
+                      │   (LLM +      │
201
+                      │   Retrieval)  │
202
+                      └───────────────┘
203
+ ```
204
+
205
+ ## 🚀 Installation and Setup
206
+
207
+ ### System Requirements
208
+
209
+ - Python 3.8+
210
+
211
+ ### Install Dependencies
212
+
213
+ ```bash
214
+ # Create virtual environment
215
+ python -m venv .venv
216
+
217
+ # Activate virtual environment
218
+ # Windows
219
+ .venv\Scripts\activate
220
+ # Linux/Mac
221
+ source .venv/bin/activate
222
+
223
+ # Install packages
224
+ pip install -r requirements.txt
225
+ ```
226
+
227
+ ### Environment Configuration
228
+
229
+ Create a `.env` file in the project root directory with the following structure:
230
+
231
+ ```env
232
+ # Google API Configuration for RAG System
233
+ GOOGLE_API_KEY=[Google API key]
234
+
235
+ # Database Configuration
236
+ DATABASE_PATH=./src/streamlit_app/backend/diary.db
237
+
238
+ # Vector Database Configuration
239
+ VECTOR_DB_PATH=./src/Indexingstep/diary_vector_db_enhanced
240
+ COLLECTION_NAME=diary_entries
241
+
242
+ # RAG Configuration
243
+ EMBEDDING_MODEL=models/embedding-001
244
+ CHAT_MODEL=gemini-2.5-flash
245
+ ```
246
+
247
+ **Important Notes:**
248
+ - Replace all placeholder values with your actual API keys and configuration
249
+ - Keep your `.env` file secure and never commit it to version control
250
+ - The `.env` file is already included in `.gitignore`
251
+ - Use `env_template.txt` as a reference to create your `.env` file
252
+
253
+ ### Run the Application
254
+
255
+ ```bash
256
+ # Start the RAG backend service
257
+ python start_rag_service.py
258
+
259
+ # Start the Streamlit UI
260
+ streamlit run src/streamlit_app/interface.py
261
+ ```
262
+
263
+ ## 📁 Directory Structure
264
+
265
+ ```
266
+ RAG-Personal-Diary-Chatbot/
267
+ ├── src/
268
+ │   ├── Indexingstep/              # Data indexing pipeline
269
+ │   ├── Retrivel_And_Generation/   # RAG engine
270
+ │   ├── rag_service/               # FastAPI backend
271
+ │   ├── streamlit_app/             # User interface
272
+ │   └── VectorDB/                  # Vector database
273
+ ├── notebook/                      # Jupyter notebooks
274
+ ├── tests/                         # Unit tests
275
+ ├── images/                        # Documentation images
276
+ ├── start_rag_service.py           # Service startup script
277
+ ├── .env                           # Environment variables (create from template)
278
+ ├── env_template.txt               # Environment variables template
279
+ └── README.md
280
+ ```
281
+
282
+ ## 🔧 Configuration
283
+
284
+ ### Vector Database
285
+ - **ChromaDB**: Main database for vector embeddings
286
+ - **Chunk size**: 1000 characters (customizable)
287
+ - **Overlap**: 200 characters between chunks
288
+
289
+ ### AI Models
290
+ - **Embedding**: Google's Universal Sentence Encoder
291
+ - **LLM**: Google Gemini (can be replaced with other models)
292
+
293
+ ## 📊 Performance
294
+
295
+ - **Processing time**: ~2-5 seconds per question
296
+ - **Accuracy**: 85-95% depending on data quality
297
+ - **Scalability**: Supports thousands of diaries
298
+
299
+
300
+ ## 🤝 Contributing
301
+
302
+ 1. Fork the project
303
+ 2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
304
+ 3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
305
+ 4. Push to the branch (`git push origin feature/AmazingFeature`)
306
+ 5. Open a Pull Request
307
+ ## 📞 Contact
308
+
309
+ - **Author**: [DongAnh]
310
+ - **Email**: [donganhng098@gmail.com]
311
+ - **GitHub**: [github.com/DongAnh]
312
+
313
+ ## 🙏 Acknowledgments
314
+
315
+ - Google for the Gemini models
316
+ - Google for Universal Sentence Encoder
317
+ - ChromaDB team for vector database
318
+ - FastAPI and Streamlit communities
319
+ - RAG architecture
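
The README's Configuration section gives a chunk size of 1000 characters with a 200-character overlap but does not show how the indexing pipeline splits text. The following is a minimal sketch of character-based chunking under those two settings. The function name `chunk_text` and its parameters are hypothetical and are not taken from the repository, whose actual splitter may work differently (for example, by respecting sentence or paragraph boundaries).

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into character windows that share `overlap` characters.

    Hypothetical helper for illustration only; the defaults mirror the
    chunk size and overlap stated in the README's Configuration section.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made up only of overlap is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "x" * 2500
    print([len(chunk) for chunk in chunk_text(sample)])
```

With the defaults the window advances 800 characters at a time, so the last 200 characters of each chunk reappear at the start of the next one. That overlap is what lets a sentence that straddles a chunk boundary still be retrieved in one piece.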