
Setu πŸ‡³πŸ‡΅

An AI-powered platform for legal assistance in Nepal - making legal documents accessible, generating official letters, and detecting bias in legal text.

🎯 Project Overview

Setu is a comprehensive legal assistance platform that leverages AI/ML to help Nepali citizens interact with legal documents and government processes. The system consists of three main modules integrated with a modern web interface.

πŸŽ₯ Demo Video

Watch the platform in action: View Demo Video

πŸš€ Features

Module A: Law Explanation (RAG-Based Chatbot)

  • Intelligent Q&A: Ask questions about Nepali laws in natural language (English/Nepali)
  • Retrieval-Augmented Generation: Retrieves relevant legal text and generates accurate explanations
  • Source References: Provides exact article/section references
  • Vector Database: ChromaDB with semantic search capabilities
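
A minimal client sketch for the explain endpoint. The request body shape (`question`) is an assumption for illustration; the real schema lives in `api/schemas.py` and is browsable at http://localhost:8000/docs.

```python
import json
from urllib import request

# Assumed endpoint and body shape; check http://localhost:8000/docs for the real schema.
EXPLAIN_URL = "http://localhost:8000/api/v1/law-explanation/explain"

def build_explain_payload(question: str) -> bytes:
    """Serialize a legal question into a JSON request body."""
    return json.dumps({"question": question}).encode("utf-8")

def ask(question: str) -> dict:
    """POST the question and return the decoded JSON response."""
    req = request.Request(
        EXPLAIN_URL,
        data=build_explain_payload(question),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Usage (requires the backend running on port 8000):
# answer = ask("What does Nepali law say about property inheritance?")
# print(answer)
```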

Module B: Multi-Category Bias Detection

  • 10+ Bias Categories: Detects more than ten categories of bias, including gender, caste, religion, age, disability, appearance, social status, political affiliation, and ambiguity
  • Fine-tuned DistilBERT: Custom model trained on Nepali legal texts
  • Sentence Analysis: Analyzes individual sentences or whole batches of text
  • Debiasing Suggestions: Provides bias-free alternatives for detected biases
  • Confidence Scoring: Returns confidence scores for each detection
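
Since each detection carries a confidence score, a client can filter results by threshold. The response shape below is an assumption for illustration, not the API's actual schema:

```python
# Hypothetical detection results in the shape the API is assumed to return:
# a list of {"category": ..., "confidence": ...} entries per sentence.
def flag_biased(detections: list, threshold: float = 0.5) -> list:
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d.get("confidence", 0.0) >= threshold]

sample = [
    {"category": "gender", "confidence": 0.91},
    {"category": "caste", "confidence": 0.32},
    {"category": "age", "confidence": 0.74},
]
high_confidence = flag_biased(sample, threshold=0.5)
# keeps the "gender" and "age" entries; "caste" falls below the threshold
```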

Module C: Letter Generation

  • Template-Based Generation: RAG-based intelligent template selection
  • Natural Language Input: Describe your need, get the right letter
  • Smart Field Extraction: Automatically extracts name, date, district, etc.
  • Official Formats: Generates proper Nepali government letter formats
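
As a toy illustration of the field-extraction idea (not the actual implementation in `module_c`), a regex pass over the user's free-text description might look like:

```python
import re

# Toy patterns for illustration only; the real extractor lives in module_c.
FIELD_PATTERNS = {
    "name": re.compile(r"[Mm]y name is ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
    "district": re.compile(r"(?:from|in) ([A-Z][a-z]+) [Dd]istrict"),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
}

def extract_fields(text: str) -> dict:
    """Pull name, district, and date out of a free-text letter request."""
    fields = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[field] = match.group(1)
    return fields

request_text = (
    "My name is Sita Sharma, from Kaski district, "
    "and I need a letter dated 2025-01-15."
)
print(extract_fields(request_text))
```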

Utility: PDF Processing

  • Text Extraction: Extract text from legal PDFs (English & Nepali)
  • Multi-method Support: PyMuPDF, pdfplumber with intelligent fallback
  • OCR Ready: Handles scanned documents
  • Integrated Pipeline: Direct integration with bias detection
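
The multi-method fallback can be sketched generically: try each extractor in order and return the first non-empty result. This is a pattern sketch with stand-in functions, not the code in `utility/pdf_processor.py`:

```python
def extract_with_fallback(pdf_path: str, extractors) -> tuple:
    """Try (name, function) extractor pairs in order; return the first
    non-empty result, or ("", "") if every extractor fails."""
    for name, extract in extractors:
        try:
            text = extract(pdf_path)
        except Exception:
            continue  # this extractor crashed; try the next one
        if text and text.strip():
            return name, text
    return "", ""

# In the real pipeline the extractors would be PyMuPDF and pdfplumber;
# here, stand-ins demonstrate the fallback behaviour.
def broken_extractor(path):
    raise RuntimeError("cannot parse")

def working_extractor(path):
    return "extracted legal text"

name, text = extract_with_fallback(
    "doc.pdf", [("pymupdf", broken_extractor), ("pdfplumber", working_extractor)]
)
print(name, text)  # pdfplumber extracted legal text
```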

πŸ› οΈ Tech Stack

Backend:

  • FastAPI (Python) - RESTful API
  • ChromaDB - Vector database for embeddings
  • Mistral AI - LLM for generation
  • Sentence Transformers - Embeddings
  • PyMuPDF, pdfplumber - PDF processing

Frontend:

  • Next.js 16 - React framework
  • TypeScript - Type safety
  • Tailwind CSS - Styling
  • Radix UI - Component library
  • shadcn/ui - UI components

ML/AI:

  • Hugging Face Transformers
  • Sentence Transformers
  • Custom fine-tuned models (Module B)

πŸ“‹ Prerequisites

  • Python: 3.9+ (recommended: 3.13)
  • Node.js: 18+ with pnpm
  • API Keys: Mistral AI API key
  • System: Linux/macOS/Windows

βš™οΈ Installation

1. Clone the Repository

git clone https://github.com/KhagendraN/Setu.git
cd Setu

2. Backend Setup

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Create .env file in the project root:

MISTRAL_API_KEY=your_mistral_api_key_here
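
A small startup check helps the server fail fast with a clear message when the key is missing. This is an illustrative sketch; the project may load the variable differently:

```python
import os

def require_mistral_key() -> str:
    """Return MISTRAL_API_KEY or raise a clear error if it is unset."""
    key = os.environ.get("MISTRAL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "MISTRAL_API_KEY is not set; create a .env file in the project root "
            "or export the variable before starting the server."
        )
    return key
```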

3. Build Vector Databases

Module A (Law Explanation):

# Place your legal PDFs in data/module-A/law/
python -m module_a.process_documents
python -m module_a.build_vector_db

Module C (Letter Generation):

# Templates are already in data/module-C/
python -m module_c.indexer

4. Frontend Setup

cd Frontend
pnpm install
cd ..

πŸš€ Running the Application

You need TWO terminals to run the full application:

Terminal 1: Backend API

# Activate virtual environment
source venv/bin/activate

# Start the API server
uvicorn api.main:app --reload --port 8000

Backend will run at: http://localhost:8000
API docs available at: http://localhost:8000/docs

Terminal 2: Frontend

cd Frontend
pnpm dev

Frontend will run at: http://localhost:3000

🐳 Docker Usage (Recommended)

The easiest way to run the entire platform is using Docker Compose.

1. Prerequisites

  • Docker and Docker Compose installed
  • .env file with MISTRAL_API_KEY in the root directory

2. Run with Docker Compose

docker-compose up --build

This will:

  • Build and start the Backend API (port 8000)
  • Build and start the Frontend (port 3000)
  • Automatically run the vector database build scripts

The application will be available at http://localhost:3000.

πŸ“ Project Structure

Setu/
β”œβ”€β”€ api/                          # Main API endpoints
β”‚   β”œβ”€β”€ main.py                   # FastAPI application
β”‚   β”œβ”€β”€ routes/
β”‚   β”‚   β”œβ”€β”€ law_explanation.py    # Module A endpoints
β”‚   β”‚   β”œβ”€β”€ letter_generation.py  # Module C endpoints
β”‚   β”‚   β”œβ”€β”€ bias_detection.py     # Module B endpoints
β”‚   β”‚   └── pdf_processing.py     # PDF utility endpoints
β”‚   └── schemas.py                # Pydantic models
β”‚
β”œβ”€β”€ module_a/                     # Law Explanation (RAG)
β”‚   β”œβ”€β”€ rag_chain.py             # RAG pipeline
β”‚   β”œβ”€β”€ vector_db.py             # ChromaDB interface
β”‚   β”œβ”€β”€ process_documents.py     # Document processing
β”‚   └── README.md
β”‚
β”œβ”€β”€ module_b/                     # Bias Detection
β”‚   β”œβ”€β”€ inference.py             # Model inference
β”‚   β”œβ”€β”€ fine_tuning/             # Training scripts
β”‚   └── dataset/                 # Training data
β”‚
β”œβ”€β”€ module_c/                     # Letter Generation
β”‚   β”œβ”€β”€ interface.py             # Main API
β”‚   β”œβ”€β”€ retriever.py             # Template retrieval
β”‚   β”œβ”€β”€ generator.py             # Letter generation
β”‚   β”œβ”€β”€ indexer.py               # Vector DB indexing
β”‚   └── README.md
β”‚
β”œβ”€β”€ utility/                      # PDF Processing
β”‚   β”œβ”€β”€ pdf_processor.py         # PDF extraction
β”‚   └── README.md
β”‚
β”œβ”€β”€ Frontend/                     # Next.js application
β”‚   β”œβ”€β”€ app/
β”‚   β”‚   β”œβ”€β”€ chatbot/             # Module A UI
β”‚   β”‚   β”œβ”€β”€ letter-generator/    # Module C UI
β”‚   β”‚   β”œβ”€β”€ bias-checker/        # Module B UI
β”‚   β”‚   β”œβ”€β”€ dashboard/           # Main dashboard
β”‚   β”‚   └── login/               # Authentication pages
β”‚   └── components/              # Reusable components
β”‚
└── data/                        # Data storage
    β”œβ”€β”€ module-A/                # Law documents & vector DB
    β”œβ”€β”€ module-C/                # Letter templates & vector DB
    └── module-B/                # Bias detection datasets

πŸ”Œ API Endpoints

Authentication

  • POST /api/v1/signup - Register a new user
  • POST /api/v1/login - User login
  • GET /api/v1/me - Get current user profile
  • POST /api/v1/refresh - Refresh access token
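
Assuming the login endpoint returns a bearer token in an `access_token` field (the exact response shape is defined in `api/schemas.py`), authenticated requests carry it in an Authorization header:

```python
import json
from urllib import request

LOGIN_URL = "http://localhost:8000/api/v1/login"  # assumed JSON login endpoint

def auth_header(access_token: str) -> dict:
    """Build the Authorization header for authenticated endpoints."""
    return {"Authorization": f"Bearer {access_token}"}

def login(username: str, password: str) -> str:
    """POST credentials and return the access token (field name assumed)."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    req = request.Request(
        LOGIN_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["access_token"]

# Usage (requires the backend running on port 8000):
# token = login("user", "secret")
# then pass {**auth_header(token), "Content-Type": "application/json"}
# as headers on subsequent requests.
```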

Law Explanation (Module A)

  • POST /api/v1/law-explanation/explain - Ask legal questions (basic)
  • POST /api/v1/law-explanation/chat - Context-aware chat with conversation history
  • GET /api/v1/law-explanation/sources - Get source documents only

Chat History

  • POST /api/v1/chat-history/conversations - Create a new conversation
  • GET /api/v1/chat-history/conversations - List all user conversations
  • GET /api/v1/chat-history/conversations/{id} - Get specific conversation with messages
  • DELETE /api/v1/chat-history/conversations/{id} - Delete a conversation
  • POST /api/v1/chat-history/messages - Save a message to conversation

Letter Generation (Module C)

  • POST /api/v1/search-template - Search for letter templates
  • POST /api/v1/get-template-details - Get template requirements
  • POST /api/v1/fill-template - Fill template with user data
  • POST /api/v1/generate-letter - Generate complete letter (smart generation)
  • POST /api/v1/analyze-requirements - Analyze missing fields in template

Bias Detection (Module B)

  • POST /api/v1/detect-bias - Detect bias in text
  • POST /api/v1/detect-bias/batch - Batch bias detection
  • POST /api/v1/debias-sentence - Get debiased alternatives
  • POST /api/v1/debias-sentence/batch - Batch debiasing
  • GET /api/v1/health - Health check

Bias Detection HITL (Human-in-the-Loop)

  • POST /api/v1/bias-detection-hitl/detect - Detect bias with HITL workflow
  • POST /api/v1/bias-detection-hitl/approve - Approve bias detection results
  • POST /api/v1/bias-detection-hitl/regenerate - Regenerate debiased suggestions
  • POST /api/v1/bias-detection-hitl/generate-pdf - Generate PDF report

PDF Processing (Utility)

  • POST /api/v1/process-pdf - Extract text from PDF
  • POST /api/v1/process-pdf-to-bias - Extract PDF and detect bias
  • GET /api/v1/pdf-health - Health check

System

  • GET / - API welcome message
  • GET /health - System health check

Full API documentation: http://localhost:8000/docs (when server is running)

🎨 Frontend Features

  • Dashboard: Overview of all modules
  • Chatbot: Interactive law explanation interface
  • Letter Generator: Step-by-step letter creation wizard
  • Bias Checker: Upload documents or paste text for analysis
  • User Profile: User account management
  • Responsive Design: Works on desktop and mobile

πŸ§ͺ Testing

Test Module A (Law Explanation)

python -m module_a.test_rag

Test Module C (Letter Generation)

python -m module_c.test_generation
python -m module_c.test_interactive

Test PDF Processing

python -m utility.test_pdf_processor

Test API Endpoints

python -m api.test_api

πŸ“ Configuration

Environment Variables (.env)

# Required
MISTRAL_API_KEY=your_api_key_here

# Optional - MongoDB (if using Auth Backend)
# MONGODB_URL=mongodb://localhost:27017
# SECRET_KEY=your_secret_key

πŸ› Troubleshooting

Backend Issues

  • Import errors: Make sure virtual environment is activated
  • Vector DB empty: Run the build scripts for modules A & C
  • API key errors: Check .env file has valid MISTRAL_API_KEY

Frontend Issues

  • Port 3000 in use: Change port with pnpm dev -- -p 3001
  • Module not found: Run pnpm install in Frontend directory
  • API connection failed: Ensure backend is running on port 8000

Common Errors

# Reinstall dependencies
pip install --upgrade -r requirements.txt

# Rebuild vector databases
python -m module_a.build_vector_db
python -m module_c.indexer

# Clear pnpm cache
cd Frontend
pnpm store prune
pnpm install

This project is under development as part of a hackathon.