| """ | |
| JanSahayak Architecture Overview | |
| ================================ | |
| SYSTEM COMPONENTS | |
| ----------------- | |
| 1. AGENTS (agents/) | |
| - profiling_agent.py → User Profile Extraction | |
| - scheme_agent.py → Government Scheme Recommendations | |
| - exam_agent.py → Competitive Exam Recommendations | |
| - search_agent.py → Live Web Search (Tavily) | |
| - rag_agent.py → Vector Database Retrieval | |
| - document_agent.py → PDF/Image Text Extraction | |
| - benefit_agent.py → Missed Benefits Calculator | |
| 2. PROMPTS (prompts/) | |
| - profiling_prompt.py → User profiling instructions | |
| - scheme_prompt.py → Scheme recommendation template | |
| - exam_prompt.py → Exam recommendation template | |
| - rag_prompt.py → RAG retrieval instructions | |
| 3. RAG SYSTEM (rag/) | |
| - embeddings.py → HuggingFace embeddings (CPU) | |
| - scheme_vectorstore.py → FAISS store for schemes | |
| - exam_vectorstore.py → FAISS store for exams | |
| 4. TOOLS (tools/) | |
| - tavily_tool.py → Live government website search | |
| 5. WORKFLOW (graph/) | |
| - workflow.py → LangGraph orchestration | |
| 6. I/O HANDLERS (agent_io/) | |
| - profiling_io.py → Profiling agent I/O | |
| - scheme_io.py → Scheme agent I/O | |
| - exam_io.py → Exam agent I/O | |
| - benefit_io.py → Benefit agent I/O | |
| 7. DATA (data/) | |
| - schemes_pdfs/ → Government scheme PDFs | |
| - exams_pdfs/ → Competitive exam PDFs | |
| 8. OUTPUTS (outputs/) | |
| - results_*.json → Generated analysis results | |
| 9. CONFIGURATION | |
| - config.py → Configuration loader | |
| - .env → API keys (user creates) | |
| - requirements.txt → Python dependencies | |
| 10. ENTRY POINTS | |
| - main.py → Main application | |
| - setup.py → Setup wizard | |
| WORKFLOW EXECUTION | |
| ------------------ | |
| User Input | |
|     ↓ | |
| [Profiling Agent] | |
|     ↓ | |
|     ├──→ [Scheme Agent] ──→ [Benefit Agent] ──┐ | |
|     │          ↓                              │ | |
|     │     [RAG Search]                        │ | |
|     │          ↓                              │ | |
|     │   [Tavily Search]                       │ | |
|     │                                         │ | |
|     └──→ [Exam Agent] ────────────────────────┤ | |
|                ↓                              │ | |
|           [RAG Search]                        │ | |
|                ↓                              │ | |
|         [Tavily Search]                       │ | |
|                                               ↓ | |
|                                        [Final Output] | |
|                                               ↓ | |
|                                      [JSON Results File] | |
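The branching order in the diagram above can be sketched in plain Python. This is only an illustration of the control flow, not the LangGraph code in workflow.py: every function body is a stand-in, and the state keys ("profile", "schemes", "exams", "benefits") are assumed names.

```python
from typing import Any, Dict, List

State = Dict[str, Any]

def profiling_agent(state: State) -> State:
    # Stand-in: the real agent asks the LLM for a structured profile.
    return {**state, "profile": {"raw": state["user_input"]}}

def rag_search(query: str, kind: str) -> List[str]:
    # Stand-in for a vectorstore lookup.
    return [f"{kind} document for: {query}"]

def tavily_search(query: str, kind: str) -> List[str]:
    # Stand-in for a live web search.
    return [f"{kind} web result for: {query}"]

def scheme_agent(state: State) -> State:
    query = state["user_input"]
    found = {
        "documents": rag_search(query, "scheme"),
        "web_results": tavily_search(query, "scheme"),
    }
    return {**state, "schemes": found}

def exam_agent(state: State) -> State:
    query = state["user_input"]
    found = {
        "documents": rag_search(query, "exam"),
        "web_results": tavily_search(query, "exam"),
    }
    return {**state, "exams": found}

def benefit_agent(state: State) -> State:
    # Runs after the scheme agent because it needs the scheme results.
    return {**state, "benefits": {"inputs_used": sorted(state["schemes"])}}

def run_workflow(user_input: str) -> State:
    state = profiling_agent({"user_input": user_input})
    scheme_branch = benefit_agent(scheme_agent(state))  # upper branch
    exam_branch = exam_agent(state)                     # lower branch
    # Both branches join in the final output.
    return {
        "profile": state["profile"],
        "schemes": scheme_branch["schemes"],
        "benefits": scheme_branch["benefits"],
        "exams": exam_branch["exams"],
    }
```

Calling run_workflow("farmer in Bihar") returns one dictionary holding the four result groups, which mirrors the single join point before [Final Output].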
| TECHNOLOGY STACK | |
| ---------------- | |
| LLM & AI: | |
| - Groq API (llama-3.3-70b-versatile) → Fast inference | |
| - LangChain → Agent framework | |
| - LangGraph → Workflow orchestration | |
| Embeddings & Search: | |
| - HuggingFace Transformers → sentence-transformers/all-MiniLM-L6-v2 | |
| - FAISS (CPU) → Vector similarity search | |
| Web Search: | |
| - Tavily API → Government website search | |
| Document Processing: | |
| - PyPDF → PDF text extraction | |
| - Pytesseract → OCR for images | |
| - Pillow → Image processing | |
| Infrastructure: | |
| - Python 3.8+ | |
| - CPU-only deployment (no GPU needed) | |
| - PyTorch CPU version | |
| DATA FLOW | |
| --------- | |
| 1. User Input Processing: | |
| Raw Text → Profiling Agent → Structured JSON Profile | |
| 2. Scheme Recommendation: | |
| Profile → RAG Query → Vectorstore Search → Top-K Documents | |
| Profile + Documents → Tavily Search (optional) → Web Results | |
| Profile + Documents + Web Results → LLM → Recommendations | |
| 3. Exam Recommendation: | |
| Profile → RAG Query → Vectorstore Search → Top-K Documents | |
| Profile + Documents → Tavily Search (optional) → Web Results | |
| Profile + Documents + Web Results → LLM → Recommendations | |
| 4. Benefit Calculation: | |
| Profile + Scheme Recommendations → LLM → Missed Benefits Analysis | |
| 5. Final Output: | |
| All Results → JSON Compilation → File Save → User Display | |
| API INTERACTIONS | |
| ---------------- | |
| 1. Groq API: | |
| - Used by: All LLM-powered agents | |
| - Model: llama-3.3-70b-versatile | |
| - Purpose: Natural language understanding & generation | |
| - Rate: Per-request basis | |
| 2. Tavily API: | |
| - Used by: search_agent, scheme_agent, exam_agent | |
| - Purpose: Live government website search | |
| - Filter: .gov.in domains preferred | |
| - Depth: Advanced search mode | |
| 3. HuggingFace: | |
| - Used by: embeddings module | |
| - Model: sentence-transformers/all-MiniLM-L6-v2 | |
| - Purpose: Document embeddings for RAG | |
| - Local: Runs on CPU, cached after first download | |
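The ".gov.in domains preferred" filter for Tavily results can be illustrated with a small post-processing step. This sketch assumes each search result is a dictionary with a "url" key; the helper names is_gov_in and prefer_gov_in are hypothetical, and the project may instead pass a domain filter to the search API itself.

```python
from typing import Dict, List
from urllib.parse import urlparse

def is_gov_in(url: str) -> bool:
    # Compare the hostname, not the raw string, so that a URL such as
    # https://example.com/?q=gov.in does not count as a government site.
    host = (urlparse(url).hostname or "").lower()
    return host == "gov.in" or host.endswith(".gov.in")

def prefer_gov_in(results: List[Dict[str, str]]) -> List[Dict[str, str]]:
    # sorted() is stable, so the original ranking is kept inside each group.
    # False sorts before True, hence the negation.
    return sorted(results, key=lambda r: not is_gov_in(r.get("url", "")))
```

Results from other domains are moved down rather than dropped, which matches "preferred" rather than "required".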
| VECTORSTORE ARCHITECTURE | |
| ------------------------ | |
| Scheme Vectorstore (rag/scheme_index/): | |
| ├── index.faiss → FAISS index file | |
| ├── index.pkl → Metadata pickle | |
| └── [Embedded chunks from schemes_pdfs/] | |
| Exam Vectorstore (rag/exam_index/): | |
| ├── index.faiss → FAISS index file | |
| ├── index.pkl → Metadata pickle | |
| └── [Embedded chunks from exams_pdfs/] | |
| Embedding Dimension: 384 | |
| Similarity Metric: Cosine similarity | |
| Chunk Size: Auto (from PyPDF) | |
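To make the similarity metric concrete, here is a brute-force cosine similarity ranking in plain Python. It is not how FAISS works internally (FAISS uses optimized index structures); it only shows what "top-K by cosine similarity" means. Real embeddings from all-MiniLM-L6-v2 have 384 dimensions; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

def top_k(
    query: Sequence[float],
    chunks: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    # Score every chunk against the query, then keep the k best.
    scored = [(text, cosine_similarity(query, vec)) for text, vec in chunks]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

A chunk pointing in the same direction as the query scores 1.0, an orthogonal one scores 0.0, regardless of vector length.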
| AGENT SPECIALIZATIONS | |
| --------------------- | |
| 1. Profiling Agent: | |
| - Extraction-focused | |
| - Low temperature (0.1) | |
| - JSON output required | |
| - No external tools | |
| 2. Scheme Agent: | |
| - RAG + Web search | |
| - Temperature: 0.3 | |
| - Tools: Vectorstore, Tavily | |
| - Output: Detailed scheme info | |
| 3. Exam Agent: | |
| - RAG + Web search | |
| - Temperature: 0.3 | |
| - Tools: Vectorstore, Tavily | |
| - Output: Detailed exam info | |
| 4. Benefit Agent: | |
| - Calculation-focused | |
| - Temperature: 0.2 | |
| - No external tools | |
| - Output: Financial analysis | |
| 5. Search Agent: | |
| - Web search only | |
| - Tool: Tavily API | |
| - Focus: .gov.in domains | |
| - Output: Live search results | |
| 6. RAG Agent: | |
| - Vectorstore query only | |
| - Tool: FAISS | |
| - Similarity search | |
| - Output: Relevant documents | |
| 7. Document Agent: | |
| - File processing | |
| - Tools: PyPDF, Pytesseract | |
| - Supports: PDF, Images | |
| - Output: Extracted text | |
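The settings listed above can be collected in one lookup table, which makes it easy to check which agents share a tool. The temperatures and tools are taken from the list; the AgentSpec class, the dictionary name, and the lowercase tool labels are assumptions for illustration. A temperature of None marks agents for which no temperature is listed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class AgentSpec:
    temperature: Optional[float]  # None: no temperature listed for this agent
    tools: Tuple[str, ...]

AGENT_SPECS: Dict[str, AgentSpec] = {
    "profiling": AgentSpec(0.1, ()),
    "scheme": AgentSpec(0.3, ("vectorstore", "tavily")),
    "exam": AgentSpec(0.3, ("vectorstore", "tavily")),
    "benefit": AgentSpec(0.2, ()),
    "search": AgentSpec(None, ("tavily",)),
    "rag": AgentSpec(None, ("faiss",)),
    "document": AgentSpec(None, ("pypdf", "pytesseract")),
}

def agents_using(tool: str) -> List[str]:
    # Names of all agents whose tool list contains the given tool.
    return sorted(name for name, spec in AGENT_SPECS.items() if tool in spec.tools)
```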
| SECURITY & PRIVACY | |
| ------------------ | |
| - API keys are stored in .env (not committed to git) | |
| - User data is processed locally, except for text sent to the LLM and web search APIs | |
| - No data is stored on external servers (except by the API providers) | |
| - PDF data remains local | |
| - Vectorstores are local | |
| - Output files are saved locally | |
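A configuration loader that reads the API keys from the environment might look like the sketch below. The variable names GROQ_API_KEY and TAVILY_API_KEY are assumptions, as is the function name; the real config.py may differ, and loading the .env file into the environment is left to whatever mechanism the project uses. The error message names the missing keys but never prints key values.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed variable names; adjust to match the project's .env file.
REQUIRED_KEYS = ("GROQ_API_KEY", "TAVILY_API_KEY")

def load_api_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    # Read from os.environ by default; a mapping can be passed in for tests.
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not source.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return {name: source[name].strip() for name in REQUIRED_KEYS}
```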
| SCALABILITY NOTES | |
| ----------------- | |
| Current Setup (Single User): | |
| - Synchronous workflow | |
| - Local vectorstores | |
| - CPU processing | |
| Potential Scaling: | |
| - Add Redis for caching | |
| - Use cloud vectorstore (Pinecone, Weaviate) | |
| - Parallel agent execution | |
| - GPU acceleration for embeddings | |
| - Database for user profiles | |
| - API service deployment | |
| ERROR HANDLING | |
| -------------- | |
| Each agent includes: | |
| - try/except blocks | |
| - Error state tracking | |
| - Graceful degradation | |
| - Partial results on failure | |
| - Error reporting in final output | |
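The combination of error state tracking and partial results can be sketched as a wrapper around each agent call. The wrapper name run_safely and the "errors" state key are hypothetical; the two demo agents exist only to show the behavior.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]

def run_safely(name: str, agent: Callable[[State], State], state: State) -> State:
    # Run one agent. On failure, keep the state collected so far and
    # record the error instead of aborting the whole workflow.
    try:
        return agent(state)
    except Exception as exc:
        errors = list(state.get("errors", []))
        errors.append({"agent": name, "error": f"{type(exc).__name__}: {exc}"})
        return {**state, "errors": errors}

def demo_scheme_agent(state: State) -> State:
    return {**state, "schemes": ["example scheme"]}

def demo_exam_agent(state: State) -> State:
    raise ValueError("vectorstore not built")

state: State = {}
state = run_safely("scheme", demo_scheme_agent, state)
state = run_safely("exam", demo_exam_agent, state)
```

After both calls, the scheme results are still present and the exam failure appears under "errors", so the final output can report both.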
| MONITORING & LOGGING | |
| -------------------- | |
| Current: | |
| - Console print statements | |
| - Agent start/completion messages | |
| - Error messages | |
| - Final output summary | |
| Future Enhancement: | |
| - Structured logging (logging module) | |
| - Performance metrics | |
| - API usage tracking | |
| - User feedback collection | |
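Moving from print statements to the logging module, with per-agent timing as a first performance metric, could start from a sketch like this. The logger name "jansahayak", the key=value message layout, and the helper names are assumptions.

```python
import logging
import time
from contextlib import contextmanager
from typing import Iterator

logger = logging.getLogger("jansahayak")

def configure_logging(level: int = logging.INFO) -> None:
    # Call once at startup, for example at the top of main.py.
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )

@contextmanager
def log_agent_run(agent_name: str) -> Iterator[None]:
    # Log start, failure (with traceback), and elapsed time for one agent.
    start = time.perf_counter()
    logger.info("agent=%s event=start", agent_name)
    try:
        yield
    except Exception:
        logger.exception("agent=%s event=error", agent_name)
        raise
    finally:
        elapsed = time.perf_counter() - start
        logger.info("agent=%s event=finished elapsed_s=%.3f", agent_name, elapsed)
```

Wrapping an agent call in "with log_agent_run('profiling'):" replaces the separate start and completion print statements and adds a duration to every run.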
| EXTENSIBILITY | |
| ------------- | |
| Adding New Agent: | |
| 1. Create agent file in agents/ | |
| 2. Add prompt template in prompts/ | |
| 3. Create node function in workflow.py | |
| 4. Add node to graph | |
| 5. Define edges (connections) | |
| 6. Optional: Create I/O handler | |
| Adding New Data Source: | |
| 1. Create vectorstore module in rag/ | |
| 2. Add PDFs to data/ subdirectory | |
| 3. Build vectorstore | |
| 4. Create agent or modify existing | |
| Adding New Tool: | |
| 1. Create tool in tools/ | |
| 2. Import in agent | |
| 3. Use in agent logic | |
| PERFORMANCE BENCHMARKS (Typical) | |
| --------------------------------- | |
| Vectorstore Building: | |
| - 10 PDFs: ~2-5 minutes | |
| - 100 PDFs: ~20-30 minutes | |
| Query Performance: | |
| - Profiling: ~1-2 seconds | |
| - RAG Search: ~0.5-1 second | |
| - LLM Call: ~1-3 seconds | |
| - Web Search: ~2-4 seconds | |
| - Full Workflow: ~10-20 seconds | |
| Memory Usage: | |
| - Base: ~500 MB | |
| - With models: ~2-3 GB | |
| - With large PDFs: +500 MB per 100 PDFs | |
| FUTURE ENHANCEMENTS | |
| ------------------- | |
| 1. Multilingual Support (Hindi, regional languages) | |
| 2. Voice input/output | |
| 3. Mobile app integration | |
| 4. Database for user history | |
| 5. Notification system for deadlines | |
| 6. Document upload interface | |
| 7. Real-time scheme updates | |
| 8. Community feedback integration | |
| 9. State-specific customization | |
| 10. Integration with government portals | |
| END OF ARCHITECTURE DOCUMENT | |
| """ | |