# Setu 🇳🇵

**An AI-powered platform for legal assistance in Nepal** - making legal documents accessible, generating official letters, and detecting bias in legal text.

## 🎯 Project Overview

Setu is a comprehensive legal assistance platform that leverages AI/ML to help Nepali citizens interact with legal documents and government processes. The system consists of three main modules integrated with a modern web interface.

## 🎥 Demo Video

Watch the platform in action: [View Demo Video](https://drive.google.com/file/d/12j2J-_g7SHdcQTwU3hQU_uiWldB2RFUz/view?usp=drive_link)

## 🚀 Features

### Module A: Law Explanation (RAG-Based Chatbot)

- **Intelligent Q&A**: Ask questions about Nepali laws in natural language (English/Nepali)
- **Retrieval-Augmented Generation**: Retrieves relevant legal text and generates accurate explanations
- **Source References**: Provides exact article/section references
- **Vector Database**: ChromaDB with semantic search capabilities

### Module B: Multi-Category Bias Detection

- **10+ Bias Categories**: Detects gender, caste, religion, age, disability, appearance, social status, political, and ambiguity biases
- **Fine-tuned DistilBERT**: Custom model trained on Nepali legal texts
- **Sentence Analysis**: Analyzes individual sentences or batch processing
- **Debiasing Suggestions**: Provides bias-free alternatives for detected biases
- **Confidence Scoring**: Returns confidence scores for each detection

### Module C: Letter Generation

- **Template-Based Generation**: RAG-based intelligent template selection
- **Natural Language Input**: Describe your need, get the right letter
- **Smart Field Extraction**: Automatically extracts name, date, district, etc.
- **Official Formats**: Generates proper Nepali government letter formats

### Utility: PDF Processing

- **Text Extraction**: Extract text from legal PDFs (English & Nepali)
- **Multi-method Support**: PyMuPDF, pdfplumber with intelligent fallback
- **OCR Ready**: Handles scanned documents
- **Integrated Pipeline**: Direct integration with bias detection

## 🛠️ Tech Stack

**Backend:**

- FastAPI (Python) - RESTful API
- ChromaDB - Vector database for embeddings
- Mistral AI - LLM for generation
- Sentence Transformers - Embeddings
- PyMuPDF, PDFPlumber - PDF processing

**Frontend:**

- Next.js 16 - React framework
- TypeScript - Type safety
- Tailwind CSS - Styling
- Radix UI - Component library
- shadcn/ui - UI components

**ML/AI:**

- Hugging Face Transformers
- Sentence Transformers
- Custom fine-tuned models (Module B)

## 📋 Prerequisites

- **Python**: 3.9+ (recommended: 3.13)
- **Node.js**: 18+ with pnpm
- **API Keys**: Mistral AI API key
- **System**: Linux/macOS/Windows

## ⚙️ Installation

### 1. Clone the Repository

```bash
git clone https://github.com/KhagendraN/Setu.git
cd Setu
```

### 2. Backend Setup

Create a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Create `.env` file in the project root:

```bash
MISTRAL_API_KEY=your_mistral_api_key_here
```

### 3. Build Vector Databases

**Module A (Law Explanation):**

```bash
# Place your legal PDFs in data/module-A/law/
python -m module_a.process_documents
python -m module_a.build_vector_db
```

**Module C (Letter Generation):**

```bash
# Templates are already in data/module-C/
python -m module_c.indexer
```

### 4. Frontend Setup

```bash
cd Frontend
pnpm install
cd ..
```

## 🚀 Running the Application

You need **TWO terminals** to run the full application:

### Terminal 1: Backend API

```bash
# Activate virtual environment
source venv/bin/activate

# Start the API server
uvicorn api.main:app --reload --port 8000
```

Backend will run at: `http://localhost:8000`

API docs available at: `http://localhost:8000/docs`

### Terminal 2: Frontend

```bash
cd Frontend
pnpm dev
```

Frontend will run at: `http://localhost:3000`

## 🐳 Docker Usage (Recommended)

The easiest way to run the entire platform is using Docker Compose.

### 1. Prerequisites

- Docker and Docker Compose installed
- `.env` file with `MISTRAL_API_KEY` in the root directory

### 2. Run with Docker Compose

```bash
docker-compose up --build
```

This will:

- Build and start the Backend API (port 8000)
- Build and start the Frontend (port 3000)
- Automatically run the vector database build scripts

The application will be available at `http://localhost:3000`.

## 📁 Project Structure

```
Setu/
├── api/                          # Main API endpoints
│   ├── main.py                   # FastAPI application
│   ├── routes/
│   │   ├── law_explanation.py    # Module A endpoints
│   │   ├── letter_generation.py  # Module C endpoints
│   │   ├── bias_detection.py     # Module B endpoints
│   │   └── pdf_processing.py     # PDF utility endpoints
│   └── schemas.py                # Pydantic models
│
├── module_a/                     # Law Explanation (RAG)
│   ├── rag_chain.py              # RAG pipeline
│   ├── vector_db.py              # ChromaDB interface
│   ├── process_documents.py      # Document processing
│   └── README.md
│
├── module_b/                     # Bias Detection
│   ├── inference.py              # Model inference
│   ├── fine_tuning/              # Training scripts
│   └── dataset/                  # Training data
│
├── module_c/                     # Letter Generation
│   ├── interface.py              # Main API
│   ├── retriever.py              # Template retrieval
│   ├── generator.py              # Letter generation
│   ├── indexer.py                # Vector DB indexing
│   └── README.md
│
├── utility/                      # PDF Processing
│   ├── pdf_processor.py          # PDF extraction
│   └── README.md
│
├── Frontend/                     # Next.js application
│   ├── app/
│   │   ├── chatbot/              # Module A UI
│   │   ├── letter-generator/     # Module C UI
│   │   ├── bias-checker/         # Module B UI
│   │   ├── dashboard/            # Main dashboard
│   │   └── login/                # Authentication pages
│   └── components/               # Reusable components
│
└── data/                         # Data storage
    ├── module-A/                 # Law documents & vector DB
    ├── module-C/                 # Letter templates & vector DB
    └── module-B/                 # Bias detection datasets
```

## 🔌 API Endpoints

### Authentication

- `POST /api/v1/signup` - Register a new user
- `POST /api/v1/login` - User login
- `GET /api/v1/me` - Get current user profile
- `POST /api/v1/refresh` - Refresh access token

### Law Explanation (Module A)

- `POST /api/v1/law-explanation/explain` - Ask legal questions (basic)
- `POST /api/v1/law-explanation/chat` - Context-aware chat with conversation history
- `GET /api/v1/law-explanation/sources` - Get source documents only

### Chat History

- `POST /api/v1/chat-history/conversations` - Create a new conversation
- `GET /api/v1/chat-history/conversations` - List all user conversations
- `GET /api/v1/chat-history/conversations/{id}` - Get specific conversation with messages
- `DELETE /api/v1/chat-history/conversations/{id}` - Delete a conversation
- `POST /api/v1/chat-history/messages` - Save a message to conversation

### Letter Generation (Module C)

- `POST /api/v1/search-template` - Search for letter templates
- `POST /api/v1/get-template-details` - Get template requirements
- `POST /api/v1/fill-template` - Fill template with user data
- `POST /api/v1/generate-letter` - Generate complete letter (smart generation)
- `POST /api/v1/analyze-requirements` - Analyze missing fields in template

### Bias Detection (Module B)

- `POST /api/v1/detect-bias` - Detect bias in text
- `POST /api/v1/detect-bias/batch` - Batch bias detection
- `POST /api/v1/debias-sentence` - Get debiased alternatives
- `POST /api/v1/debias-sentence/batch` - Batch debiasing
- `GET /api/v1/health` - Health check

### Bias Detection HITL (Human-in-the-Loop)

- `POST /api/v1/bias-detection-hitl/detect` - Detect bias with HITL workflow
- `POST /api/v1/bias-detection-hitl/approve` - Approve bias detection results
- `POST /api/v1/bias-detection-hitl/regenerate` - Regenerate debiased suggestions
- `POST /api/v1/bias-detection-hitl/generate-pdf` - Generate PDF report

### PDF Processing (Utility)

- `POST /api/v1/process-pdf` - Extract text from PDF
- `POST /api/v1/process-pdf-to-bias` - Extract PDF and detect bias
- `GET /api/v1/pdf-health` - Health check

### System

- `GET /` - API welcome message
- `GET /health` - System health check

Full API documentation: `http://localhost:8000/docs` (when server is running)

## 🎨 Frontend Features

- **Dashboard**: Overview of all modules
- **Chatbot**: Interactive law explanation interface
- **Letter Generator**: Step-by-step letter creation wizard
- **Bias Checker**: Upload documents or paste text for analysis
- **User Profile**: User account management
- **Responsive Design**: Works on desktop and mobile

## 🧪 Testing

### Test Module A (Law Explanation)

```bash
python -m module_a.test_rag
```

### Test Module C (Letter Generation)

```bash
python -m module_c.test_generation
python -m module_c.test_interactive
```

### Test PDF Processing

```bash
python -m utility.test_pdf_processor
```

### Test API Endpoints

```bash
python -m api.test_api
```

## 📝 Configuration

### Environment Variables (.env)

```bash
# Required
MISTRAL_API_KEY=your_api_key_here

# Optional - MongoDB (if using Auth Backend)
# MONGODB_URL=mongodb://localhost:27017
# SECRET_KEY=your_secret_key
```

### Module Configurations

- **Module A**: [module_a/config.py](module_a/config.py)
- **Module C**: [module_c/config.py](module_c/config.py)

## 🐛 Troubleshooting

### Backend Issues

- **Import errors**: Make sure virtual environment is activated
- **Vector DB empty**: Run the build scripts for modules A & C
- **API key errors**: Check `.env` file has valid `MISTRAL_API_KEY`

### Frontend Issues

- **Port 3000 in use**: Change port with `pnpm dev -- -p 3001`
- **Module not found**: Run `pnpm install` in Frontend directory
- **API connection failed**: Ensure backend is running on port 8000

### Common Errors

```bash
# Reinstall dependencies
pip install --upgrade -r requirements.txt

# Rebuild vector databases
python -m module_a.build_vector_db
python -m module_c.indexer

# Clear pnpm cache
cd Frontend
pnpm store prune
pnpm install
```

## 📚 Documentation

- [Module A Documentation](module_a/README.md) - Law Explanation RAG Pipeline
- [Module C Documentation](module_c/README.md) - Letter Generation
- [PDF Processing Guide](utility/README.md) - PDF text extraction
- [Implementation Guides](docs/) - Detailed implementation workflows

---

> This project is under development as part of a hackathon.
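
To check that a locally running backend is reachable, a small smoke-test script along the following lines may help. It is only a sketch using the Python standard library, not part of the repository. The `/health` and `/api/v1/detect-bias` paths come from the endpoint list in this README, but the helper names `build_request` and `call` are hypothetical, and the `"text"` field in the request body is an assumption. The actual request schema should be confirmed at `http://localhost:8000/docs`.

```python
"""Smoke-test a locally running Setu backend (sketch; payload shape is assumed)."""
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # backend address given in this README


def build_request(path, payload=None, base_url=BASE_URL):
    """Build a GET request, or a JSON POST request when a payload is given."""
    url = base_url.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(request, timeout=10):
    """Send the request; return (status, body) or (None, error message)."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return exc.code, exc.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError) as exc:
        # The server could not be reached (not running, wrong port, timeout).
        return None, str(exc)


if __name__ == "__main__":
    print(call(build_request("/health")))
    # ASSUMPTION: the "text" field name is a guess; check /docs for the real schema.
    print(call(build_request("/api/v1/detect-bias", {"text": "Example sentence."})))
```

Run it after starting the backend with `uvicorn`. A status of `None` means the server was not reachable, which usually points to the "API connection failed" case under Troubleshooting.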