# Setu 🇳🇵

**An AI-powered platform for legal assistance in Nepal** - making legal documents accessible, generating official letters, and detecting bias in legal text.

## 🎯 Project Overview

Setu is a legal assistance platform that uses AI/ML to help Nepali citizens work with legal documents and government processes. It consists of three main modules, plus a PDF-processing utility, integrated with a Next.js web interface.
## 🎥 Demo Video

Watch the platform in action: [View Demo Video](https://drive.google.com/file/d/12j2J-_g7SHdcQTwU3hQU_uiWldB2RFUz/view?usp=drive_link)
## Features

### Module A: Law Explanation (RAG-Based Chatbot)

- **Intelligent Q&A**: Ask questions about Nepali laws in natural language (English/Nepali)
- **Retrieval-Augmented Generation**: Retrieves relevant legal text and generates explanations grounded in it
- **Source References**: Provides exact article/section references
- **Vector Database**: ChromaDB with semantic search capabilities
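
To make the retrieval step concrete, here is a minimal, dependency-free sketch. It is not Module A's implementation: it swaps the Sentence Transformers embeddings and ChromaDB for toy word-count vectors and a brute-force cosine-similarity ranking, and the `Chunk` type and the sample passages are made up for illustration. What it does show is how each retrieved chunk carries its source reference, so the generation step can cite it.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # hypothetical: the article/section reference stored with the text


def embed(text: str) -> Counter:
    # Toy stand-in for a Sentence Transformers embedding: lowercase word counts.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[tuple[float, Chunk]]:
    # Score every chunk against the query and keep the k most similar.
    q = embed(query)
    scored = [(cosine(q, embed(c.text)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up sample passages, not real statute text.
chunks = [
    Chunk("A person born to a citizen parent may obtain citizenship by descent.", "Sample Act, Section 3"),
    Chunk("Every person has the right to acquire and sell property.", "Sample Act, Section 7"),
    Chunk("No person shall be discriminated against on the basis of caste.", "Sample Act, Section 9"),
]

for score, chunk in retrieve("How do I obtain citizenship?", chunks):
    print(f"{score:.2f}  [{chunk.source}] {chunk.text}")
```

In the real pipeline the top chunks and their references would be passed to the LLM as context; a vector database replaces the linear scan once the corpus grows.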
### Module B: Multi-Category Bias Detection

- **10+ Bias Categories**: Detects bias categories including gender, caste, religion, age, disability, appearance, social status, political, and ambiguity
- **Fine-tuned DistilBERT**: Custom model trained on Nepali legal texts
- **Sentence Analysis**: Analyzes individual sentences or processes them in batches
- **Debiasing Suggestions**: Provides bias-free alternatives for detected biases
- **Confidence Scoring**: Returns a confidence score for each detection
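
How a per-detection confidence score can be derived is easiest to see in a small sketch. It assumes a single-label classifier whose raw outputs (logits) are turned into probabilities with a softmax; the real Module B head may be set up differently (for example, multi-label with per-class sigmoids), and both the `LABELS` list (including its `neutral` entry) and the 0.5 threshold are hypothetical.

```python
import math

# Hypothetical label set; the real labels ship with the fine-tuned model.
LABELS = ["neutral", "gender", "caste", "religion", "age"]


def softmax(logits: list) -> list:
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: list, threshold: float = 0.5) -> dict:
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {
        "label": LABELS[best],
        "confidence": round(probs[best], 4),
        # Flag low-confidence predictions for review instead of trusting them blindly.
        "uncertain": probs[best] < threshold,
    }


print(classify([0.1, 3.2, 0.4, -1.0, 0.0]))
```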
### Module C: Letter Generation

- **Template-Based Generation**: RAG-based intelligent template selection
- **Natural Language Input**: Describe your need, get the right letter
- **Smart Field Extraction**: Automatically extracts name, date, district, etc.
- **Official Formats**: Generates proper Nepali government letter formats
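
A rough sketch of field extraction plus template filling follows. Module C may extract fields quite differently; this version uses simple regular expressions, and the template text, placeholder names, and patterns are all hypothetical. It relies on `string.Template.safe_substitute`, which leaves unfilled placeholders in place, so any remaining `$field` can be reported back as missing (the kind of check `/api/v1/analyze-requirements` is described as doing).

```python
import re
from string import Template

# Hypothetical template; the real templates live in data/module-C/.
TEMPLATE = Template(
    "To,\nThe Chief District Officer,\nDistrict Administration Office, $district\n\n"
    "Date: $date\n\nSubject: Request for $purpose\n\nSincerely,\n$name"
)

# Hypothetical patterns for a few common fields.
PATTERNS = {
    "date": r"\b(\d{4}-\d{2}-\d{2})\b",
    "district": r"\b([A-Z][a-z]+) district\b",
    "name": r"\b[Mm]y name is ([A-Z][a-z]+(?: [A-Z][a-z]+)*)",
}


def extract_fields(request: str) -> dict:
    """Pull whichever fields the patterns can find out of free text."""
    fields = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, request)
        if match:
            fields[field] = match.group(1)
    return fields


def missing_fields(template: Template, fields: dict) -> set:
    """Placeholders in the template that extraction did not fill."""
    placeholders = set(re.findall(r"\$(\w+)", template.template))
    return placeholders - set(fields)


request = "My name is Sita Sharma and I need a letter for Kathmandu district dated 2024-05-01."
fields = extract_fields(request)
print(fields)
print("Missing:", missing_fields(TEMPLATE, fields))
# safe_substitute leaves $purpose in place instead of raising KeyError.
print(TEMPLATE.safe_substitute(fields))
```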
### Utility: PDF Processing

- **Text Extraction**: Extracts text from legal PDFs (English & Nepali)
- **Multi-method Support**: PyMuPDF and pdfplumber, with automatic fallback between them
- **OCR Ready**: Handles scanned documents
- **Integrated Pipeline**: Feeds extracted text directly into bias detection
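
The fallback behavior can be sketched as an ordered list of extractors where the first one that returns non-empty text wins. The two wrapper functions use the public PyMuPDF (`fitz`) and pdfplumber APIs, but their order and the `extract_text` helper are assumptions, not the contents of `utility/pdf_processor.py`; the demo uses stand-in extractors so it runs without either library or a real PDF.

```python
from typing import Callable, List, Tuple

Extractor = Callable[[str], str]


def with_pymupdf(path: str) -> str:
    import fitz  # PyMuPDF; imported lazily so the sketch runs without it

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def with_pdfplumber(path: str) -> str:
    import pdfplumber  # imported lazily for the same reason

    with pdfplumber.open(path) as pdf:
        # extract_text() returns None for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_text(path: str, extractors: List[Tuple[str, Extractor]]) -> Tuple[str, str]:
    """Return (method_name, text) from the first extractor that yields text."""
    errors = []
    for name, extractor in extractors:
        try:
            text = extractor(path)
        except Exception as exc:  # broad on purpose: any failure means "try the next one"
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: no text found")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))


# Stand-ins so the demo runs without PyMuPDF, pdfplumber, or a real file.
def broken(path: str) -> str:
    raise ValueError("cannot open file")


def empty(path: str) -> str:
    return "   "


def working(path: str) -> str:
    return "Section 1. Short title."


print(extract_text("sample.pdf", [("broken", broken), ("empty", empty), ("working", working)]))
# Real use: extract_text("law.pdf", [("pymupdf", with_pymupdf), ("pdfplumber", with_pdfplumber)])
```

Treating empty output as a failure matters for scanned PDFs: both libraries may return blank text when a page has no text layer, which is the point where an OCR step would take over.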
## 🛠️ Tech Stack

**Backend:**

- FastAPI (Python) - RESTful API
- ChromaDB - Vector database for embeddings
- Mistral AI - LLM for generation
- Sentence Transformers - Embeddings
- PyMuPDF, pdfplumber - PDF processing

**Frontend:**

- Next.js 16 - React framework
- TypeScript - Type safety
- Tailwind CSS - Styling
- Radix UI - Component library
- shadcn/ui - UI components

**ML/AI:**

- Hugging Face Transformers
- Sentence Transformers
- Custom fine-tuned models (Module B)
## Prerequisites

- **Python**: 3.9+ (recommended: 3.13)
- **Node.js**: 20.9+ (the minimum for Next.js 16) with pnpm
- **API Keys**: Mistral AI API key
- **System**: Linux/macOS/Windows
## ⚙️ Installation

### 1. Clone the Repository

```bash
git clone https://github.com/KhagendraN/Setu.git
cd Setu
```

### 2. Backend Setup

Create a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Create a `.env` file in the project root:

```bash
MISTRAL_API_KEY=your_mistral_api_key_here
```
### 3. Build Vector Databases

**Module A (Law Explanation):**

```bash
# Place your legal PDFs in data/module-A/law/
python -m module_a.process_documents
python -m module_a.build_vector_db
```
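
Before embedding, long legal documents are usually split into smaller chunks. The sketch below shows one common approach, fixed-size word windows with overlap, so that text near a chunk boundary also appears in the neighboring chunk. Whether `module_a.process_documents` chunks this way (rather than, say, by article or section) is an assumption, and the default sizes are placeholders.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list:
    """Split text into windows of `size` words, each sharing `overlap` words
    with the previous window."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(1, 11))
for chunk in chunk_words(text, size=4, overlap=1):
    print(chunk)
# w1 w2 w3 w4
# w4 w5 w6 w7
# w7 w8 w9 w10
```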
**Module C (Letter Generation):**

```bash
# Templates are already in data/module-C/
python -m module_c.indexer
```

### 4. Frontend Setup

```bash
cd Frontend
pnpm install
cd ..
```

## Running the Application

You need **two terminals** to run the full application:

### Terminal 1: Backend API

```bash
# Activate virtual environment
source venv/bin/activate

# Start the API server
uvicorn api.main:app --reload --port 8000
```

Backend will run at: `http://localhost:8000`

API docs available at: `http://localhost:8000/docs`
### Terminal 2: Frontend

```bash
cd Frontend
pnpm dev
```

Frontend will run at: `http://localhost:3000`

## 🐳 Docker Usage (Recommended)

The easiest way to run the entire platform is with Docker Compose.

### 1. Prerequisites

- Docker and Docker Compose installed
- A `.env` file with `MISTRAL_API_KEY` in the root directory

### 2. Run with Docker Compose

```bash
# With Docker Compose V2 the equivalent is: docker compose up --build
docker-compose up --build
```

This will:

- Build and start the Backend API (port 8000)
- Build and start the Frontend (port 3000)
- Automatically run the vector database build scripts

The application will be available at `http://localhost:3000`.
## Project Structure

```
Setu/
├── api/                          # Main API endpoints
│   ├── main.py                   # FastAPI application
│   ├── routes/
│   │   ├── law_explanation.py    # Module A endpoints
│   │   ├── letter_generation.py  # Module C endpoints
│   │   ├── bias_detection.py     # Module B endpoints
│   │   └── pdf_processing.py     # PDF utility endpoints
│   └── schemas.py                # Pydantic models
│
├── module_a/                     # Law Explanation (RAG)
│   ├── rag_chain.py              # RAG pipeline
│   ├── vector_db.py              # ChromaDB interface
│   ├── process_documents.py      # Document processing
│   └── README.md
│
├── module_b/                     # Bias Detection
│   ├── inference.py              # Model inference
│   ├── fine_tuning/              # Training scripts
│   └── dataset/                  # Training data
│
├── module_c/                     # Letter Generation
│   ├── interface.py              # Main API
│   ├── retriever.py              # Template retrieval
│   ├── generator.py              # Letter generation
│   ├── indexer.py                # Vector DB indexing
│   └── README.md
│
├── utility/                      # PDF Processing
│   ├── pdf_processor.py          # PDF extraction
│   └── README.md
│
├── Frontend/                     # Next.js application
│   ├── app/
│   │   ├── chatbot/              # Module A UI
│   │   ├── letter-generator/     # Module C UI
│   │   ├── bias-checker/         # Module B UI
│   │   ├── dashboard/            # Main dashboard
│   │   └── login/                # Authentication pages
│   └── components/               # Reusable components
│
└── data/                         # Data storage
    ├── module-A/                 # Law documents & vector DB
    ├── module-C/                 # Letter templates & vector DB
    └── module-B/                 # Bias detection datasets
```
## API Endpoints

### Authentication

- `POST /api/v1/signup` - Register a new user
- `POST /api/v1/login` - User login
- `GET /api/v1/me` - Get the current user's profile
- `POST /api/v1/refresh` - Refresh access token

### Law Explanation (Module A)

- `POST /api/v1/law-explanation/explain` - Ask legal questions (basic)
- `POST /api/v1/law-explanation/chat` - Context-aware chat with conversation history
- `GET /api/v1/law-explanation/sources` - Get source documents only

### Chat History

- `POST /api/v1/chat-history/conversations` - Create a new conversation
- `GET /api/v1/chat-history/conversations` - List all user conversations
- `GET /api/v1/chat-history/conversations/{id}` - Get a specific conversation with its messages
- `DELETE /api/v1/chat-history/conversations/{id}` - Delete a conversation
- `POST /api/v1/chat-history/messages` - Save a message to a conversation

### Letter Generation (Module C)

- `POST /api/v1/search-template` - Search for letter templates
- `POST /api/v1/get-template-details` - Get template requirements
- `POST /api/v1/fill-template` - Fill a template with user data
- `POST /api/v1/generate-letter` - Generate a complete letter (smart generation)
- `POST /api/v1/analyze-requirements` - Analyze missing fields in a template

### Bias Detection (Module B)

- `POST /api/v1/detect-bias` - Detect bias in text
- `POST /api/v1/detect-bias/batch` - Batch bias detection
- `POST /api/v1/debias-sentence` - Get debiased alternatives
- `POST /api/v1/debias-sentence/batch` - Batch debiasing
- `GET /api/v1/health` - Health check

### Bias Detection HITL (Human-in-the-Loop)

- `POST /api/v1/bias-detection-hitl/detect` - Detect bias with HITL workflow
- `POST /api/v1/bias-detection-hitl/approve` - Approve bias detection results
- `POST /api/v1/bias-detection-hitl/regenerate` - Regenerate debiased suggestions
- `POST /api/v1/bias-detection-hitl/generate-pdf` - Generate PDF report
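
Read as a workflow, the HITL endpoints suggest an order: detect, optionally regenerate suggestions until the reviewer is satisfied, approve, then generate the PDF report. The sketch below encodes that reading as a tiny state machine so a client can sanity-check a planned sequence of calls. The transition table is an interpretation of the endpoint names, not something the API is documented to enforce.

```python
# Hypothetical transition table inferred from the endpoint names; the server
# may not enforce this order, or may allow other sequences.
NEXT_STEPS = {
    "start": {"detect"},
    "detect": {"regenerate", "approve"},
    "regenerate": {"regenerate", "approve"},
    "approve": {"generate-pdf"},
    "generate-pdf": set(),
}


def is_valid_sequence(steps: list) -> bool:
    """Check that each step is allowed to follow the previous one."""
    state = "start"
    for step in steps:
        if step not in NEXT_STEPS[state]:
            return False
        state = step
    return True


print(is_valid_sequence(["detect", "regenerate", "approve", "generate-pdf"]))  # True
print(is_valid_sequence(["detect", "generate-pdf"]))  # False: results not approved yet
```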
### PDF Processing (Utility)

- `POST /api/v1/process-pdf` - Extract text from PDF
- `POST /api/v1/process-pdf-to-bias` - Extract PDF and detect bias
- `GET /api/v1/pdf-health` - Health check

### System

- `GET /` - API welcome message
- `GET /health` - System health check

Full API documentation: `http://localhost:8000/docs` (when server is running)
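
For calling the API from Python without extra dependencies, a small `urllib` helper is enough. Two assumptions are baked in: request bodies are JSON (the `{"text": ...}` payload for `/api/v1/detect-bias` is a guess, so check the real schema at `http://localhost:8000/docs`), and authenticated routes accept the access token from `/api/v1/login` as a bearer token. `build_request` only constructs the request; `send` performs it and needs the backend running.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"


def build_request(path: str, payload: Optional[dict] = None,
                  token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET (no payload) or JSON POST request for the Setu API."""
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token is not None:
        # Assumption: protected routes take the access token as a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request: urllib.request.Request) -> dict:
    """Perform the call and decode the JSON response (backend must be running)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical body: the "text" field name is a guess; see /docs for the real schema.
req = build_request("/api/v1/detect-bias", {"text": "The chairman shall decide the matter."})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

With the backend running, `send(build_request("/health"))` calls the system health check listed above.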
## 🎨 Frontend Features

- **Dashboard**: Overview of all modules
- **Chatbot**: Interactive law explanation interface
- **Letter Generator**: Step-by-step letter creation wizard
- **Bias Checker**: Upload documents or paste text for analysis
- **User Profile**: User account management
- **Responsive Design**: Works on desktop and mobile

## 🧪 Testing

### Test Module A (Law Explanation)

```bash
python -m module_a.test_rag
```

### Test Module C (Letter Generation)

```bash
python -m module_c.test_generation
python -m module_c.test_interactive
```

### Test PDF Processing

```bash
python -m utility.test_pdf_processor
```

### Test API Endpoints

```bash
python -m api.test_api
```

## Configuration

### Environment Variables (.env)

```bash
# Required
MISTRAL_API_KEY=your_api_key_here

# Optional - MongoDB (if using Auth Backend)
# MONGODB_URL=mongodb://localhost:27017
# SECRET_KEY=your_secret_key
```
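
A quick way to confirm the `.env` file is complete before starting the backend is sketched below. It uses a deliberately minimal `KEY=VALUE` parser (blank lines and comments skipped, surrounding quotes stripped) instead of a library such as python-dotenv, and the `REQUIRED`/`OPTIONAL` split simply mirrors the comments in the file above; the backend's own loading logic may differ.

```python
from pathlib import Path

REQUIRED = ["MISTRAL_API_KEY"]
OPTIONAL = ["MONGODB_URL", "SECRET_KEY"]  # only needed with the auth backend


def parse_env(text: str) -> dict:
    """Minimal KEY=VALUE parser: skips blank lines and comments, strips quotes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_required(values: dict) -> list:
    """Required variables that are absent or empty."""
    return [name for name in REQUIRED if not values.get(name)]


def check_file(path: str = ".env") -> list:
    """Parse the file if it exists and report missing required variables."""
    env_file = Path(path)
    values = parse_env(env_file.read_text(encoding="utf-8")) if env_file.exists() else {}
    return missing_required(values)


sample = "# Required\nMISTRAL_API_KEY=abc123\n# Optional\n# MONGODB_URL=mongodb://localhost:27017\n"
values = parse_env(sample)
print("Missing required:", missing_required(values))
print("Optional set:", [name for name in OPTIONAL if values.get(name)])
```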
### Module Configurations

- **Module A**: [module_a/config.py](module_a/config.py)
- **Module C**: [module_c/config.py](module_c/config.py)

## Troubleshooting

### Backend Issues

- **Import errors**: Make sure the virtual environment is activated
- **Vector DB empty**: Run the build scripts for modules A & C
- **API key errors**: Check that the `.env` file contains a valid `MISTRAL_API_KEY`

### Frontend Issues

- **Port 3000 in use**: Change the port with `pnpm dev -p 3001`
- **Module not found**: Run `pnpm install` in the `Frontend` directory
- **API connection failed**: Ensure the backend is running on port 8000
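
For the "Port 3000 in use" and "API connection failed" cases, a quick TCP check shows whether anything is listening on the expected ports. This is a generic standard-library sketch, not a project script; a `True` result only means some process accepted the connection, not that it is the Setu backend or frontend.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused and timeouts
        return False


for name, port in [("backend", 8000), ("frontend", 3000)]:
    state = "listening" if port_open("localhost", port) else "not reachable"
    print(f"{name} (port {port}): {state}")
```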
### Common Errors

```bash
# Reinstall dependencies
pip install --upgrade -r requirements.txt

# Rebuild vector databases
python -m module_a.build_vector_db
python -m module_c.indexer

# Clear pnpm cache
cd Frontend
pnpm store prune
pnpm install
```

## Documentation

- [Module A Documentation](module_a/README.md) - Law Explanation RAG Pipeline
- [Module C Documentation](module_c/README.md) - Letter Generation
- [PDF Processing Guide](utility/README.md) - PDF text extraction
- [Implementation Guides](docs/) - Detailed implementation workflows

---

> This project is under development as part of a hackathon.