# Setu 🇳🇵
**An AI-powered platform for legal assistance in Nepal** - making legal documents accessible, generating official letters, and detecting bias in legal text.
## 🎯 Project Overview
Setu is a comprehensive legal assistance platform that leverages AI/ML to help Nepali citizens interact with legal documents and government processes. The system consists of three main modules integrated with a modern web interface.
## 🎥 Demo Video
Watch the platform in action: [View Demo Video](https://drive.google.com/file/d/12j2J-_g7SHdcQTwU3hQU_uiWldB2RFUz/view?usp=drive_link)
## 🚀 Features
### Module A: Law Explanation (RAG-Based Chatbot)
- **Intelligent Q&A**: Ask questions about Nepali laws in natural language (English/Nepali)
- **Retrieval-Augmented Generation**: Retrieves relevant legal text and generates accurate explanations
- **Source References**: Provides exact article/section references
- **Vector Database**: ChromaDB with semantic search capabilities
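The retrieve-then-generate flow listed above can be sketched roughly as follows. This is a toy illustration, not Setu's actual pipeline: it swaps ChromaDB's embedding search for plain word-overlap scoring, the passages are placeholders rather than real legal text, and `tokenize`, `retrieve`, and `build_prompt` are hypothetical names.

```python
import re

# Placeholder passages; in the real system the text comes from the indexed legal PDFs.
PASSAGES = [
    {"ref": "Sample Act, Section 1", "text": "Placeholder text about land ownership and property registration."},
    {"ref": "Sample Act, Section 2", "text": "Placeholder text about citizenship certificates and applications."},
    {"ref": "Sample Act, Section 3", "text": "Placeholder text about marriage registration procedures."},
]


def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, passages, top_k=2):
    """Rank passages by word overlap with the question (a stand-in for semantic search)."""
    q = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p["text"])), reverse=True)
    return ranked[:top_k]


def build_prompt(question, hits):
    """Assemble an LLM prompt that carries the retrieved text and its references."""
    context = "\n".join(f"[{h['ref']}] {h['text']}" for h in hits)
    return (
        "Answer using only the context below and cite the references.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


question = "How do I apply for a citizenship certificate?"
hits = retrieve(question, PASSAGES, top_k=1)
prompt = build_prompt(question, hits)
```

Keeping the reference next to each retrieved passage in the prompt is one way to let the generated answer point back to a specific article or section.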
### Module B: Multi-Category Bias Detection
- **10+ Bias Categories**: Detects gender, caste, religion, age, disability, appearance, social-status, political, and ambiguity bias, among other categories
- **Fine-tuned DistilBERT**: Custom model trained on Nepali legal texts
- **Sentence Analysis**: Analyzes individual sentences or processes them in batches
- **Debiasing Suggestions**: Provides bias-free alternatives for detected biases
- **Confidence Scoring**: Returns confidence scores for each detection
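As a rough illustration of how a caller might use the confidence scores, the sketch below keeps only detections above a threshold. The dictionary keys (`sentence`, `category`, `confidence`), the `filter_detections` helper, and the 0.7 cut-off are assumptions for illustration; the real response shape is defined by the API schema.

```python
def filter_detections(detections, min_confidence=0.7):
    """Keep detections at or above the threshold, highest confidence first."""
    kept = [d for d in detections if d["confidence"] >= min_confidence]
    return sorted(kept, key=lambda d: d["confidence"], reverse=True)


# Made-up sample results in a hypothetical shape.
sample = [
    {"sentence": "Sentence one.", "category": "gender", "confidence": 0.91},
    {"sentence": "Sentence two.", "category": "age", "confidence": 0.42},
    {"sentence": "Sentence three.", "category": "caste", "confidence": 0.78},
]

flagged = filter_detections(sample)
```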
### Module C: Letter Generation
- **Template-Based Generation**: RAG-based intelligent template selection
- **Natural Language Input**: Describe your need, get the right letter
- **Smart Field Extraction**: Automatically extracts name, date, district, etc.
- **Official Formats**: Generates proper Nepali government letter formats
### Utility: PDF Processing
- **Text Extraction**: Extract text from legal PDFs (English & Nepali)
- **Multi-method Support**: PyMuPDF, pdfplumber with intelligent fallback
- **OCR Ready**: Handles scanned documents
- **Integrated Pipeline**: Direct integration with bias detection
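The multi-method fallback can be sketched along these lines. This is a minimal sketch under assumptions, not the code in `utility/pdf_processor.py`: the function names are hypothetical, and only the PyMuPDF and pdfplumber calls shown (`fitz.open`, `page.get_text`, `pdfplumber.open`, `page.extract_text`) are taken from those libraries.

```python
def extract_with_pymupdf(path):
    """Extract text with PyMuPDF (imported lazily so the sketch loads without it)."""
    import fitz  # PyMuPDF

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_with_pdfplumber(path):
    """Extract text with pdfplumber; extract_text() may return None for empty pages."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_text(path, extractors=(extract_with_pymupdf, extract_with_pdfplumber)):
    """Try each extractor in order and return the first non-empty result."""
    errors = []
    for extractor in extractors:
        try:
            text = extractor(path)
        except Exception as exc:  # a failed method should not stop the fallback chain
            errors.append(f"{extractor.__name__}: {exc}")
            continue
        if text and text.strip():
            return text
    raise RuntimeError("No extractor produced text: " + "; ".join(errors))
```

If every extractor returns empty text, the PDF is likely a scanned image, which is the point where an OCR step would take over.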
## 🛠️ Tech Stack
**Backend:**
- FastAPI (Python) - RESTful API
- ChromaDB - Vector database for embeddings
- Mistral AI - LLM for generation
- Sentence Transformers - Embeddings
- PyMuPDF, PDFPlumber - PDF processing
**Frontend:**
- Next.js 16 - React framework
- TypeScript - Type safety
- Tailwind CSS - Styling
- Radix UI - Component library
- shadcn/ui - UI components
**ML/AI:**
- Hugging Face Transformers
- Sentence Transformers
- Custom fine-tuned models (Module B)
## 📋 Prerequisites
- **Python**: 3.9+ (recommended: 3.13)
- **Node.js**: 20.9+ with pnpm (Next.js 16 no longer supports Node.js 18)
- **API Keys**: Mistral AI API key
- **System**: Linux/macOS/Windows
## ⚙️ Installation
### 1. Clone the Repository
```bash
git clone https://github.com/KhagendraN/Setu.git
cd Setu
```
### 2. Backend Setup
Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
Install dependencies:
```bash
pip install -r requirements.txt
```
Create a `.env` file in the project root:
```bash
MISTRAL_API_KEY=your_mistral_api_key_here
```
### 3. Build Vector Databases
**Module A (Law Explanation):**
```bash
# Place your legal PDFs in data/module-A/law/
python -m module_a.process_documents
python -m module_a.build_vector_db
```
**Module C (Letter Generation):**
```bash
# Templates are already in data/module-C/
python -m module_c.indexer
```
### 4. Frontend Setup
```bash
cd Frontend
pnpm install
cd ..
```
## 🚀 Running the Application
You need **TWO terminals** to run the full application:
### Terminal 1: Backend API
```bash
# Activate virtual environment
source venv/bin/activate
# Start the API server
uvicorn api.main:app --reload --port 8000
```
Backend will run at: `http://localhost:8000`
API docs available at: `http://localhost:8000/docs`
### Terminal 2: Frontend
```bash
cd Frontend
pnpm dev
```
Frontend will run at: `http://localhost:3000`
## 🐳 Docker Usage (Recommended)
The easiest way to run the entire platform is using Docker Compose.
### 1. Prerequisites
- Docker and Docker Compose installed
- `.env` file with `MISTRAL_API_KEY` in the root directory
### 2. Run with Docker Compose
```bash
docker-compose up --build
```
This will:
- Build and start the Backend API (port 8000)
- Build and start the Frontend (port 3000)
- Automatically run the vector database build scripts
The application will be available at `http://localhost:3000`.
## 📁 Project Structure
```
Setu/
├── api/                          # Main API endpoints
│   ├── main.py                   # FastAPI application
│   ├── routes/
│   │   ├── law_explanation.py    # Module A endpoints
│   │   ├── letter_generation.py  # Module C endpoints
│   │   ├── bias_detection.py     # Module B endpoints
│   │   └── pdf_processing.py     # PDF utility endpoints
│   └── schemas.py                # Pydantic models
│
├── module_a/                     # Law Explanation (RAG)
│   ├── rag_chain.py              # RAG pipeline
│   ├── vector_db.py              # ChromaDB interface
│   ├── process_documents.py      # Document processing
│   └── README.md
│
├── module_b/                     # Bias Detection
│   ├── inference.py              # Model inference
│   ├── fine_tuning/              # Training scripts
│   └── dataset/                  # Training data
│
├── module_c/                     # Letter Generation
│   ├── interface.py              # Main API
│   ├── retriever.py              # Template retrieval
│   ├── generator.py              # Letter generation
│   ├── indexer.py                # Vector DB indexing
│   └── README.md
│
├── utility/                      # PDF Processing
│   ├── pdf_processor.py          # PDF extraction
│   └── README.md
│
├── Frontend/                     # Next.js application
│   ├── app/
│   │   ├── chatbot/              # Module A UI
│   │   ├── letter-generator/     # Module C UI
│   │   ├── bias-checker/         # Module B UI
│   │   ├── dashboard/            # Main dashboard
│   │   └── login/                # Authentication pages
│   └── components/               # Reusable components
│
└── data/                         # Data storage
    ├── module-A/                 # Law documents & vector DB
    ├── module-C/                 # Letter templates & vector DB
    └── module-B/                 # Bias detection datasets
```
## 🔌 API Endpoints
### Authentication
- `POST /api/v1/signup` - Register a new user
- `POST /api/v1/login` - User login
- `GET /api/v1/me` - Get current user profile
- `POST /api/v1/refresh` - Refresh access token
### Law Explanation (Module A)
- `POST /api/v1/law-explanation/explain` - Ask legal questions (basic)
- `POST /api/v1/law-explanation/chat` - Context-aware chat with conversation history
- `GET /api/v1/law-explanation/sources` - Get source documents only
### Chat History
- `POST /api/v1/chat-history/conversations` - Create a new conversation
- `GET /api/v1/chat-history/conversations` - List all user conversations
- `GET /api/v1/chat-history/conversations/{id}` - Get specific conversation with messages
- `DELETE /api/v1/chat-history/conversations/{id}` - Delete a conversation
- `POST /api/v1/chat-history/messages` - Save a message to conversation
### Letter Generation (Module C)
- `POST /api/v1/search-template` - Search for letter templates
- `POST /api/v1/get-template-details` - Get template requirements
- `POST /api/v1/fill-template` - Fill template with user data
- `POST /api/v1/generate-letter` - Generate complete letter (smart generation)
- `POST /api/v1/analyze-requirements` - Analyze missing fields in template
### Bias Detection (Module B)
- `POST /api/v1/detect-bias` - Detect bias in text
- `POST /api/v1/detect-bias/batch` - Batch bias detection
- `POST /api/v1/debias-sentence` - Get debiased alternatives
- `POST /api/v1/debias-sentence/batch` - Batch debiasing
- `GET /api/v1/health` - Health check
### Bias Detection HITL (Human-in-the-Loop)
- `POST /api/v1/bias-detection-hitl/detect` - Detect bias with HITL workflow
- `POST /api/v1/bias-detection-hitl/approve` - Approve bias detection results
- `POST /api/v1/bias-detection-hitl/regenerate` - Regenerate debiased suggestions
- `POST /api/v1/bias-detection-hitl/generate-pdf` - Generate PDF report
### PDF Processing (Utility)
- `POST /api/v1/process-pdf` - Extract text from PDF
- `POST /api/v1/process-pdf-to-bias` - Extract PDF and detect bias
- `GET /api/v1/pdf-health` - Health check
### System
- `GET /` - API welcome message
- `GET /health` - System health check
Full API documentation: `http://localhost:8000/docs` (when server is running)
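For a quick look at how one of these endpoints might be called from Python, here is a minimal sketch that uses only the standard library. The `build_json_post` helper is hypothetical, and the `text` field in the payload is an assumed name; the interactive docs at `/docs` show the actual request schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_json_post(path, payload):
    """Build (but do not send) a JSON POST request for the Setu API."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(BASE_URL + path, data=body, headers=headers, method="POST")


# "text" is a hypothetical field name; check /docs for the real schema.
request = build_json_post("/api/v1/detect-bias", {"text": "Sample sentence to analyze."})

# To actually send it (requires the backend to be running):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes the URL and body easy to inspect before anything goes over the network.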
## 🎨 Frontend Features
- **Dashboard**: Overview of all modules
- **Chatbot**: Interactive law explanation interface
- **Letter Generator**: Step-by-step letter creation wizard
- **Bias Checker**: Upload documents or paste text for analysis
- **User Profile**: User account management
- **Responsive Design**: Works on desktop and mobile
## 🧪 Testing
### Test Module A (Law Explanation)
```bash
python -m module_a.test_rag
```
### Test Module C (Letter Generation)
```bash
python -m module_c.test_generation
python -m module_c.test_interactive
```
### Test PDF Processing
```bash
python -m utility.test_pdf_processor
```
### Test API Endpoints
```bash
python -m api.test_api
```
## 📝 Configuration
### Environment Variables (.env)
```bash
# Required
MISTRAL_API_KEY=your_api_key_here
# Optional - MongoDB (if using Auth Backend)
# MONGODB_URL=mongodb://localhost:27017
# SECRET_KEY=your_secret_key
```
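A small startup check along these lines can catch a missing key early. It is a sketch, assuming the values from `.env` have already been loaded into the process environment (for example by the application or by exporting them in the shell); `missing_settings` is a hypothetical helper, not part of Setu.

```python
import os


def missing_settings(required=("MISTRAL_API_KEY",), env=None):
    """Return the names of required settings that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_settings()
if missing:
    print("Missing required settings:", ", ".join(missing))
```

When the auth backend is in use, the optional names can be passed in too, for example `missing_settings(required=("MISTRAL_API_KEY", "MONGODB_URL", "SECRET_KEY"))`.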
### Module Configurations
- **Module A**: [module_a/config.py](module_a/config.py)
- **Module C**: [module_c/config.py](module_c/config.py)
## 🐛 Troubleshooting
### Backend Issues
- **Import errors**: Make sure the virtual environment is activated
- **Vector DB empty**: Run the build scripts for Modules A and C
- **API key errors**: Check that the `.env` file contains a valid `MISTRAL_API_KEY`
### Frontend Issues
- **Port 3000 in use**: Change port with `pnpm dev -- -p 3001`
- **Module not found**: Run `pnpm install` in the `Frontend` directory
- **API connection failed**: Ensure the backend is running on port 8000
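To tell a port conflict from a server that is not running, a quick check like the one below can help. It is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper, and it probes the IPv4 loopback address only.

```python
import socket


def port_is_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


for name, port in (("backend", 8000), ("frontend", 3000)):
    state = "in use" if port_is_open(port) else "free"
    print(f"{name} port {port}: {state}")
```

Before starting the servers, "in use" on port 3000 points to a conflict; after starting them, "free" on port 8000 suggests the backend is not up.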
### Common Errors
```bash
# Reinstall dependencies
pip install --upgrade -r requirements.txt
# Rebuild vector databases
python -m module_a.build_vector_db
python -m module_c.indexer
# Clear pnpm cache
cd Frontend
pnpm store prune
pnpm install
```
## 📚 Documentation
- [Module A Documentation](module_a/README.md) - Law Explanation RAG Pipeline
- [Module C Documentation](module_c/README.md) - Letter Generation
- [PDF Processing Guide](utility/README.md) - PDF text extraction
- [Implementation Guides](docs/) - Detailed implementation workflows
---
> This project is under development as part of a hackathon.