"""
JanSahayak Architecture Overview
================================

SYSTEM COMPONENTS
-----------------

1. AGENTS (agents/)
   - profiling_agent.py     → User Profile Extraction
   - scheme_agent.py        → Government Scheme Recommendations
   - exam_agent.py          → Competitive Exam Recommendations
   - search_agent.py        → Live Web Search (Tavily)
   - rag_agent.py           → Vector Database Retrieval
   - document_agent.py      → PDF/Image Text Extraction
   - benefit_agent.py       → Missed Benefits Calculator

2. PROMPTS (prompts/)
   - profiling_prompt.py    → User profiling instructions
   - scheme_prompt.py       → Scheme recommendation template
   - exam_prompt.py         → Exam recommendation template
   - rag_prompt.py          → RAG retrieval instructions

3. RAG SYSTEM (rag/)
   - embeddings.py          → HuggingFace embeddings (CPU)
   - scheme_vectorstore.py  → FAISS store for schemes
   - exam_vectorstore.py    → FAISS store for exams

4. TOOLS (tools/)
   - tavily_tool.py         → Live government website search

5. WORKFLOW (graph/)
   - workflow.py            → LangGraph orchestration

6. I/O HANDLERS (agent_io/)
   - profiling_io.py        → Profiling agent I/O
   - scheme_io.py           → Scheme agent I/O
   - exam_io.py             → Exam agent I/O
   - benefit_io.py          → Benefit agent I/O

7. DATA (data/)
   - schemes_pdfs/          → Government scheme PDFs
   - exams_pdfs/            → Competitive exam PDFs

8. OUTPUTS (outputs/)
   - results_*.json         → Generated analysis results

9. CONFIGURATION
   - config.py              → Configuration loader
   - .env                   → API keys (user creates)
   - requirements.txt       → Python dependencies

10. ENTRY POINTS
    - main.py               → Main application
    - setup.py              → Setup wizard
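
The configuration loader in config.py could look roughly like the sketch
below. The variable names GROQ_API_KEY and TAVILY_API_KEY and the function
load_config are assumptions for illustration only; the real config.py and
.env may use different names.

```python
import os

# Hypothetical variable names; check the project's .env for the real ones.
REQUIRED_KEYS = ("GROQ_API_KEY", "TAVILY_API_KEY")


def load_config(env=None):
    # Return the required API keys, or raise if any of them is missing.
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing keys in .env: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

Failing at startup with the list of missing keys is usually easier to debug
than a failed API call in the middle of a workflow run.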


WORKFLOW EXECUTION
------------------

User Input
    ↓
[Profiling Agent]
    ↓
    ├─→ [Scheme Agent] ──→ [Benefit Agent] ──┐
    │         ↓                              │
    │     [RAG Search]                       │
    │         ↓                              │
    │   [Tavily Search]                      │
    │                                        │
    └─→ [Exam Agent] ────────────────────────┤
              ↓                              │
          [RAG Search]                       │
              ↓                              │
        [Tavily Search]                      │
                                             ↓
                                    [Final Output]
                                             ↓
                                   [JSON Results File]
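
The ordering in the diagram can be sketched in plain Python as below. This is
not the LangGraph code from workflow.py; every function name and state key
here is a hypothetical stand-in, and each node simply returns an updated copy
of a shared state dictionary.

```python
def profiling_node(state):
    # Stand-in for the Profiling Agent: produce a structured profile.
    return {**state, "profile": {"raw": state["user_input"]}}


def scheme_node(state):
    # Stand-in for the Scheme Agent (RAG search, then Tavily search).
    return {**state, "schemes": ["scheme recommendations go here"]}


def benefit_node(state):
    # Runs after the Scheme Agent because it needs its recommendations.
    return {**state, "benefits": "based on %d scheme(s)" % len(state["schemes"])}


def exam_node(state):
    # Stand-in for the Exam Agent (RAG search, then Tavily search).
    return {**state, "exams": ["exam recommendations go here"]}


def run_workflow(user_input):
    state = profiling_node({"user_input": user_input})
    scheme_branch = benefit_node(scheme_node(state))
    exam_branch = exam_node(state)
    # Join both branches into the final output.
    return {**scheme_branch, **exam_branch}
```

The two branches only share the profile, so they can be joined by merging
their state dictionaries at the end.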


TECHNOLOGY STACK
----------------

LLM & AI:
- Groq API (llama-3.3-70b-versatile) → Fast inference
- LangChain → Agent framework
- LangGraph → Workflow orchestration

Embeddings & Search:
- HuggingFace Transformers → sentence-transformers/all-MiniLM-L6-v2
- FAISS (CPU) → Vector similarity search

Web Search:
- Tavily API → Government website search

Document Processing:
- PyPDF → PDF text extraction
- Pytesseract → OCR for images
- Pillow → Image processing

Infrastructure:
- Python 3.8+
- CPU-only deployment (no GPU needed)
- PyTorch CPU version


DATA FLOW
---------

1. User Input Processing:
   Raw Text → Profiling Agent → Structured JSON Profile

2. Scheme Recommendation:
   Profile → RAG Query → Vectorstore Search → Top-K Documents
   Profile + Documents → Tavily Search (optional) → Web Results
   Profile + Documents + Web Results → LLM → Recommendations

3. Exam Recommendation:
   Profile → RAG Query → Vectorstore Search → Top-K Documents
   Profile + Documents → Tavily Search (optional) → Web Results
   Profile + Documents + Web Results → LLM → Recommendations

4. Benefit Calculation:
   Profile + Scheme Recommendations → LLM → Missed Benefits Analysis

5. Final Output:
   All Results → JSON Compilation → File Save → User Display
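
Step 5 might be implemented along the lines of the sketch below. The function
name save_results and the timestamp-based file name are assumptions; the
source only states that results land in outputs/ as results_*.json.

```python
import json
import time
from pathlib import Path


def save_results(results, output_dir="outputs"):
    # Write results to <output_dir>/results_<timestamp>.json, return the path.
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / ("results_%s.json" % time.strftime("%Y%m%d_%H%M%S"))
    with path.open("w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps Hindi and other non-ASCII text readable.
        json.dump(results, handle, indent=2, ensure_ascii=False)
    return path
```

With a one-second timestamp, two runs within the same second would write to
the same file; a counter or UUID suffix would avoid that if it matters.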


API INTERACTIONS
----------------

1. Groq API:
   - Used by: All LLM-powered agents
   - Model: llama-3.3-70b-versatile
   - Purpose: Natural language understanding & generation
   - Rate: Per-request basis

2. Tavily API:
   - Used by: search_agent, scheme_agent, exam_agent
   - Purpose: Live government website search
   - Filter: .gov.in domains preferred
   - Depth: Advanced search mode

3. HuggingFace:
   - Used by: embeddings module
   - Model: sentence-transformers/all-MiniLM-L6-v2
   - Purpose: Document embeddings for RAG
   - Local: Runs on CPU, cached after first download
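
The ".gov.in domains preferred" filter could be applied to search results as
sketched below. This is an illustration using only the standard library, not
the Tavily client; it assumes each result is a dictionary with a "url" key,
and the helper names are hypothetical.

```python
from urllib.parse import urlparse


def is_gov_in(url):
    # True for gov.in itself and for any subdomain of it.
    host = (urlparse(url).hostname or "").lower()
    return host == "gov.in" or host.endswith(".gov.in")


def prefer_gov_in(results):
    # Stable sort: .gov.in results first, original order otherwise kept.
    return sorted(results, key=lambda item: not is_gov_in(item["url"]))
```

Matching on the parsed hostname rather than on the raw URL string avoids
treating a look-alike such as notgov.in as a government site.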


VECTORSTORE ARCHITECTURE
------------------------

Scheme Vectorstore (rag/scheme_index/):
├── index.faiss          → FAISS index file
├── index.pkl            → Metadata pickle
└── [Embedded chunks from schemes_pdfs/]

Exam Vectorstore (rag/exam_index/):
├── index.faiss          → FAISS index file
├── index.pkl            → Metadata pickle
└── [Embedded chunks from exams_pdfs/]

Embedding Dimension: 384
Similarity Metric: Cosine similarity
Chunk Size: Auto (from PyPDF)
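
To make the similarity metric concrete, here is a brute-force sketch of
cosine-similarity retrieval in pure Python. It is not how FAISS works
internally and is not taken from the project code; the function names are
hypothetical, and in the real system each vector would have 384 dimensions.

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=3):
    # chunks: list of (text, vector) pairs; returns the k best-scoring texts.
    scored = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:k]]
```

A linear scan like this is fine for a handful of chunks; an index such as
FAISS exists to make the same lookup fast over many thousands of vectors.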


AGENT SPECIALIZATIONS
---------------------

1. Profiling Agent:
   - Extraction-focused
   - Low temperature (0.1)
   - JSON output required
   - No external tools

2. Scheme Agent:
   - RAG + Web search
   - Temperature: 0.3
   - Tools: Vectorstore, Tavily
   - Output: Detailed scheme info

3. Exam Agent:
   - RAG + Web search
   - Temperature: 0.3
   - Tools: Vectorstore, Tavily
   - Output: Detailed exam info

4. Benefit Agent:
   - Calculation-focused
   - Temperature: 0.2
   - No external tools
   - Output: Financial analysis

5. Search Agent:
   - Web search only
   - Tool: Tavily API
   - Focus: .gov.in domains
   - Output: Live search results

6. RAG Agent:
   - Vectorstore query only
   - Tool: FAISS
   - Similarity search
   - Output: Relevant documents

7. Document Agent:
   - File processing
   - Tools: PyPDF, Pytesseract
   - Supports: PDF, Images
   - Output: Extracted text
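
The temperatures listed above could be kept in one place, as in the sketch
below. The dictionary and function names are hypothetical; only the four
values come from this document, and the fallback default is an assumption.

```python
# Temperatures copied from the list above. The Search, RAG, and Document
# agents have no temperature listed, so they are omitted.
AGENT_TEMPERATURES = {
    "profiling": 0.1,
    "scheme": 0.3,
    "exam": 0.3,
    "benefit": 0.2,
}


def temperature_for(agent_name, default=0.3):
    # The default for unlisted agents is an assumption, not a documented value.
    return AGENT_TEMPERATURES.get(agent_name, default)
```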


SECURITY & PRIVACY
------------------

- API keys stored in .env (not committed to git)
- User data processed locally except LLM calls
- No data stored on external servers (except API providers)
- PDF data remains local
- Vectorstores are local
- Output files saved locally


SCALABILITY NOTES
-----------------

Current Setup (Single User):
- Synchronous workflow
- Local vectorstores
- CPU processing

Potential Scaling:
- Add Redis for caching
- Use cloud vectorstore (Pinecone, Weaviate)
- Parallel agent execution
- GPU acceleration for embeddings
- Database for user profiles
- API service deployment


ERROR HANDLING
--------------

Each agent includes:
- try/except blocks
- Error state tracking
- Graceful degradation
- Partial results on failure
- Error reporting in final output
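
A minimal sketch of this pattern, assuming each agent is a function that
takes the shared state and returns its result. The wrapper name
run_agent_safely and the "errors" state key are hypothetical.

```python
def run_agent_safely(name, agent_fn, state):
    # Run one agent; on failure, record the error and keep earlier results.
    try:
        return {**state, name: agent_fn(state)}
    except Exception as exc:  # broad on purpose: one agent must not stop the run
        errors = state.get("errors", []) + ["%s: %s" % (name, exc)]
        return {**state, "errors": errors}
```

Because a failed agent only adds an entry to the error list, later agents
still run and the final output can report both partial results and errors.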


MONITORING & LOGGING
--------------------

Current:
- Console print statements
- Agent start/completion messages
- Error messages
- Final output summary

Future Enhancement:
- Structured logging (logging module)
- Performance metrics
- API usage tracking
- User feedback collection
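
Moving from print statements to the standard logging module could start with
a helper like the one sketched below. The logger namespace "jansahayak" and
the function name get_agent_logger are assumptions.

```python
import logging


def get_agent_logger(agent_name):
    # One named logger per agent, configured only once.
    logger = logging.getLogger("jansahayak." + agent_name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

An agent would then call, for example, get_agent_logger("scheme").info("started")
in place of a print statement.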


EXTENSIBILITY
-------------

Adding New Agent:
1. Create agent file in agents/
2. Add prompt template in prompts/
3. Create node function in workflow.py
4. Add node to graph
5. Define edges (connections)
6. Optional: Create I/O handler

Adding New Data Source:
1. Create vectorstore module in rag/
2. Add PDFs to data/ subdirectory
3. Build vectorstore
4. Create agent or modify existing

Adding New Tool:
1. Create tool in tools/
2. Import in agent
3. Use in agent logic


PERFORMANCE BENCHMARKS (Typical)
---------------------------------

Vectorstore Building:
- 10 PDFs: ~2-5 minutes
- 100 PDFs: ~20-30 minutes

Query Performance:
- Profiling: ~1-2 seconds
- RAG Search: ~0.5-1 second
- LLM Call: ~1-3 seconds
- Web Search: ~2-4 seconds
- Full Workflow: ~10-20 seconds

Memory Usage:
- Base: ~500 MB
- With models: ~2-3 GB
- With large PDFs: +500 MB per 100 PDFs
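
Figures like these can be reproduced on a given machine with a small timing
helper such as the sketch below. The name timed and the timings dictionary
are hypothetical and not part of the project code.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    # Record the wall-clock duration of the with-block under the given label.
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start
```

Wrapping each stage, for example "with timed('rag_search', timings): ...",
gives a per-stage breakdown of the full workflow time.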


FUTURE ENHANCEMENTS
-------------------

1. Multilingual Support (Hindi, regional languages)
2. Voice input/output
3. Mobile app integration
4. Database for user history
5. Notification system for deadlines
6. Document upload interface
7. Real-time scheme updates
8. Community feedback integration
9. State-specific customization
10. Integration with government portals


END OF ARCHITECTURE DOCUMENT
"""