
Setu πŸ‡³πŸ‡΅

An AI-powered platform for legal assistance in Nepal - making legal documents accessible, generating official letters, and detecting bias in legal text.

🎯 Project Overview

Setu is a comprehensive legal assistance platform that leverages AI/ML to help Nepali citizens interact with legal documents and government processes. The system consists of three main modules integrated with a modern web interface.

πŸŽ₯ Demo Video

Watch the platform in action: View Demo Video

πŸš€ Features

Module A: Law Explanation (RAG-Based Chatbot)

  • Intelligent Q&A: Ask questions about Nepali laws in natural language (English/Nepali)
  • Retrieval-Augmented Generation: Retrieves relevant legal text and generates accurate explanations
  • Source References: Provides exact article/section references
  • Vector Database: ChromaDB with semantic search capabilities
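
A minimal client sketch for the explain endpoint. The request body shape (`question`) is an assumption for illustration; the real schema lives in `api/schemas.py` and is browsable at http://localhost:8000/docs.

```python
import json
from urllib import request

# Assumed endpoint and body shape; check http://localhost:8000/docs for the real schema.
EXPLAIN_URL = "http://localhost:8000/api/v1/law-explanation/explain"

def build_explain_payload(question: str) -> bytes:
    """Serialize a legal question into a JSON request body."""
    return json.dumps({"question": question}).encode("utf-8")

def ask(question: str) -> dict:
    """POST the question and return the decoded JSON response."""
    req = request.Request(
        EXPLAIN_URL,
        data=build_explain_payload(question),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Usage (requires the backend running on port 8000):
# answer = ask("What does Nepali law say about property inheritance?")
# print(answer)
```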

Module B: Multi-Category Bias Detection

  • 10+ Bias Categories: Detects more than ten categories of bias, including gender, caste, religion, age, disability, appearance, social status, political affiliation, and ambiguity
  • Fine-tuned DistilBERT: Custom model trained on Nepali legal texts
  • Sentence Analysis: Analyzes individual sentences or whole batches of text
  • Debiasing Suggestions: Provides bias-free alternatives for detected biases
  • Confidence Scoring: Returns confidence scores for each detection
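
Since each detection carries a confidence score, a client can filter results by threshold. The response shape below is an assumption for illustration, not the API's actual schema:

```python
# Hypothetical detection results in the shape the API is assumed to return:
# a list of {"category": ..., "confidence": ...} entries per sentence.
def flag_biased(detections: list, threshold: float = 0.5) -> list:
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d.get("confidence", 0.0) >= threshold]

sample = [
    {"category": "gender", "confidence": 0.91},
    {"category": "caste", "confidence": 0.32},
    {"category": "age", "confidence": 0.74},
]
high_confidence = flag_biased(sample, threshold=0.5)
# keeps the "gender" and "age" entries; "caste" falls below the threshold
```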

Module C: Letter Generation

  • Template-Based Generation: RAG-based intelligent template selection
  • Natural Language Input: Describe your need, get the right letter
  • Smart Field Extraction: Automatically extracts name, date, district, etc.
  • Official Formats: Generates proper Nepali government letter formats
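
As a toy illustration of the field-extraction idea (not the actual implementation in `module_c`), a regex pass over the user's free-text description might look like:

```python
import re

# Toy patterns for illustration only; the real extractor lives in module_c.
FIELD_PATTERNS = {
    "name": re.compile(r"[Mm]y name is ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
    "district": re.compile(r"(?:from|in) ([A-Z][a-z]+) [Dd]istrict"),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
}

def extract_fields(text: str) -> dict:
    """Pull name, district, and date out of a free-text letter request."""
    fields = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[field] = match.group(1)
    return fields

request_text = (
    "My name is Sita Sharma, from Kaski district, "
    "and I need a letter dated 2025-01-15."
)
print(extract_fields(request_text))
```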

Utility: PDF Processing

  • Text Extraction: Extract text from legal PDFs (English & Nepali)
  • Multi-method Support: PyMuPDF, pdfplumber with intelligent fallback
  • OCR Ready: Handles scanned documents
  • Integrated Pipeline: Direct integration with bias detection
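
The multi-method fallback can be sketched generically: try each extractor in order and return the first non-empty result. This is a pattern sketch with stand-in functions, not the code in `utility/pdf_processor.py`:

```python
def extract_with_fallback(pdf_path: str, extractors) -> tuple:
    """Try (name, function) extractor pairs in order; return the first
    non-empty result, or ("", "") if every extractor fails."""
    for name, extract in extractors:
        try:
            text = extract(pdf_path)
        except Exception:
            continue  # this extractor crashed; try the next one
        if text and text.strip():
            return name, text
    return "", ""

# In the real pipeline the extractors would be PyMuPDF and pdfplumber;
# here, stand-ins demonstrate the fallback behaviour.
def broken_extractor(path):
    raise RuntimeError("cannot parse")

def working_extractor(path):
    return "extracted legal text"

name, text = extract_with_fallback(
    "doc.pdf", [("pymupdf", broken_extractor), ("pdfplumber", working_extractor)]
)
print(name, text)  # pdfplumber extracted legal text
```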

πŸ› οΈ Tech Stack

Backend:

  • FastAPI (Python) - RESTful API
  • ChromaDB - Vector database for embeddings
  • Mistral AI - LLM for generation
  • Sentence Transformers - Embeddings
  • PyMuPDF, pdfplumber - PDF processing

Frontend:

  • Next.js 16 - React framework
  • TypeScript - Type safety
  • Tailwind CSS - Styling
  • Radix UI - Component library
  • shadcn/ui - UI components

ML/AI:

  • Hugging Face Transformers
  • Sentence Transformers
  • Custom fine-tuned models (Module B)

πŸ“‹ Prerequisites

  • Python: 3.9+ (recommended: 3.13)
  • Node.js: 18+ with pnpm
  • API Keys: Mistral AI API key
  • System: Linux/macOS/Windows

βš™οΈ Installation

1. Clone the Repository

git clone https://github.com/KhagendraN/Setu.git
cd Setu

2. Backend Setup

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Create .env file in the project root:

MISTRAL_API_KEY=your_mistral_api_key_here
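
A small startup check helps the server fail fast with a clear message when the key is missing. This is an illustrative sketch; the project may load the variable differently:

```python
import os

def require_mistral_key() -> str:
    """Return MISTRAL_API_KEY or raise a clear error if it is unset."""
    key = os.environ.get("MISTRAL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "MISTRAL_API_KEY is not set; create a .env file in the project root "
            "or export the variable before starting the server."
        )
    return key
```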

3. Build Vector Databases

Module A (Law Explanation):

# Place your legal PDFs in data/module-A/law/
python -m module_a.process_documents
python -m module_a.build_vector_db

Module C (Letter Generation):

# Templates are already in data/module-C/
python -m module_c.indexer

4. Frontend Setup

cd Frontend
pnpm install
cd ..

πŸš€ Running the Application

You need TWO terminals to run the full application:

Terminal 1: Backend API

# Activate virtual environment
source venv/bin/activate

# Start the API server
uvicorn api.main:app --reload --port 8000

Backend will run at: http://localhost:8000
API docs available at: http://localhost:8000/docs

Terminal 2: Frontend

cd Frontend
pnpm dev

Frontend will run at: http://localhost:3000

🐳 Docker Usage (Recommended)

The easiest way to run the entire platform is using Docker Compose.

1. Prerequisites

  • Docker and Docker Compose installed
  • .env file with MISTRAL_API_KEY in the root directory

2. Run with Docker Compose

docker-compose up --build

This will:

  • Build and start the Backend API (port 8000)
  • Build and start the Frontend (port 3000)
  • Automatically run the vector database build scripts

The application will be available at http://localhost:3000.

πŸ“ Project Structure

Setu/
β”œβ”€β”€ api/                          # Main API endpoints
β”‚   β”œβ”€β”€ main.py                   # FastAPI application
β”‚   β”œβ”€β”€ routes/
β”‚   β”‚   β”œβ”€β”€ law_explanation.py    # Module A endpoints
β”‚   β”‚   β”œβ”€β”€ letter_generation.py  # Module C endpoints
β”‚   β”‚   β”œβ”€β”€ bias_detection.py     # Module B endpoints
β”‚   β”‚   └── pdf_processing.py     # PDF utility endpoints
β”‚   └── schemas.py                # Pydantic models
β”‚
β”œβ”€β”€ module_a/                     # Law Explanation (RAG)
β”‚   β”œβ”€β”€ rag_chain.py             # RAG pipeline
β”‚   β”œβ”€β”€ vector_db.py             # ChromaDB interface
β”‚   β”œβ”€β”€ process_documents.py     # Document processing
β”‚   └── README.md
β”‚
β”œβ”€β”€ module_b/                     # Bias Detection
β”‚   β”œβ”€β”€ inference.py             # Model inference
β”‚   β”œβ”€β”€ fine_tuning/             # Training scripts
β”‚   └── dataset/                 # Training data
β”‚
β”œβ”€β”€ module_c/                     # Letter Generation
β”‚   β”œβ”€β”€ interface.py             # Main API
β”‚   β”œβ”€β”€ retriever.py             # Template retrieval
β”‚   β”œβ”€β”€ generator.py             # Letter generation
β”‚   β”œβ”€β”€ indexer.py               # Vector DB indexing
β”‚   └── README.md
β”‚
β”œβ”€β”€ utility/                      # PDF Processing
β”‚   β”œβ”€β”€ pdf_processor.py         # PDF extraction
β”‚   └── README.md
β”‚
β”œβ”€β”€ Frontend/                     # Next.js application
β”‚   β”œβ”€β”€ app/
β”‚   β”‚   β”œβ”€β”€ chatbot/             # Module A UI
β”‚   β”‚   β”œβ”€β”€ letter-generator/    # Module C UI
β”‚   β”‚   β”œβ”€β”€ bias-checker/        # Module B UI
β”‚   β”‚   β”œβ”€β”€ dashboard/           # Main dashboard
β”‚   β”‚   └── login/               # Authentication pages
β”‚   └── components/              # Reusable components
β”‚
└── data/                        # Data storage
    β”œβ”€β”€ module-A/                # Law documents & vector DB
    β”œβ”€β”€ module-C/                # Letter templates & vector DB
    └── module-B/                # Bias detection datasets

πŸ”Œ API Endpoints

Authentication

  • POST /api/v1/signup - Register a new user
  • POST /api/v1/login - User login
  • GET /api/v1/me - Get current user profile
  • POST /api/v1/refresh - Refresh access token
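
Assuming the login endpoint returns a bearer token in an `access_token` field (the exact response shape is defined in `api/schemas.py`), authenticated requests carry it in an Authorization header:

```python
import json
from urllib import request

LOGIN_URL = "http://localhost:8000/api/v1/login"  # assumed JSON login endpoint

def auth_header(access_token: str) -> dict:
    """Build the Authorization header for authenticated endpoints."""
    return {"Authorization": f"Bearer {access_token}"}

def login(username: str, password: str) -> str:
    """POST credentials and return the access token (field name assumed)."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    req = request.Request(
        LOGIN_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["access_token"]

# Usage (requires the backend running on port 8000):
# token = login("user", "secret")
# then pass {**auth_header(token), "Content-Type": "application/json"}
# as headers on subsequent requests.
```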

Law Explanation (Module A)

  • POST /api/v1/law-explanation/explain - Ask legal questions (basic)
  • POST /api/v1/law-explanation/chat - Context-aware chat with conversation history
  • GET /api/v1/law-explanation/sources - Get source documents only

Chat History

  • POST /api/v1/chat-history/conversations - Create a new conversation
  • GET /api/v1/chat-history/conversations - List all user conversations
  • GET /api/v1/chat-history/conversations/{id} - Get specific conversation with messages
  • DELETE /api/v1/chat-history/conversations/{id} - Delete a conversation
  • POST /api/v1/chat-history/messages - Save a message to conversation

Letter Generation (Module C)

  • POST /api/v1/search-template - Search for letter templates
  • POST /api/v1/get-template-details - Get template requirements
  • POST /api/v1/fill-template - Fill template with user data
  • POST /api/v1/generate-letter - Generate complete letter (smart generation)
  • POST /api/v1/analyze-requirements - Analyze missing fields in template

Bias Detection (Module B)

  • POST /api/v1/detect-bias - Detect bias in text
  • POST /api/v1/detect-bias/batch - Batch bias detection
  • POST /api/v1/debias-sentence - Get debiased alternatives
  • POST /api/v1/debias-sentence/batch - Batch debiasing
  • GET /api/v1/health - Health check

Bias Detection HITL (Human-in-the-Loop)

  • POST /api/v1/bias-detection-hitl/detect - Detect bias with HITL workflow
  • POST /api/v1/bias-detection-hitl/approve - Approve bias detection results
  • POST /api/v1/bias-detection-hitl/regenerate - Regenerate debiased suggestions
  • POST /api/v1/bias-detection-hitl/generate-pdf - Generate PDF report

PDF Processing (Utility)

  • POST /api/v1/process-pdf - Extract text from PDF
  • POST /api/v1/process-pdf-to-bias - Extract PDF and detect bias
  • GET /api/v1/pdf-health - Health check

System

  • GET / - API welcome message
  • GET /health - System health check

Full API documentation: http://localhost:8000/docs (when server is running)

🎨 Frontend Features

  • Dashboard: Overview of all modules
  • Chatbot: Interactive law explanation interface
  • Letter Generator: Step-by-step letter creation wizard
  • Bias Checker: Upload documents or paste text for analysis
  • User Profile: User account management
  • Responsive Design: Works on desktop and mobile

πŸ§ͺ Testing

Test Module A (Law Explanation)

python -m module_a.test_rag

Test Module C (Letter Generation)

python -m module_c.test_generation
python -m module_c.test_interactive

Test PDF Processing

python -m utility.test_pdf_processor

Test API Endpoints

python -m api.test_api

πŸ“ Configuration

Environment Variables (.env)

# Required
MISTRAL_API_KEY=your_api_key_here

# Optional - MongoDB (if using Auth Backend)
# MONGODB_URL=mongodb://localhost:27017
# SECRET_KEY=your_secret_key

πŸ› Troubleshooting

Backend Issues

  • Import errors: Make sure virtual environment is activated
  • Vector DB empty: Run the build scripts for modules A & C
  • API key errors: Check .env file has valid MISTRAL_API_KEY

Frontend Issues

  • Port 3000 in use: Change port with pnpm dev -- -p 3001
  • Module not found: Run pnpm install in Frontend directory
  • API connection failed: Ensure backend is running on port 8000

Common Errors

# Reinstall dependencies
pip install --upgrade -r requirements.txt

# Rebuild vector databases
python -m module_a.build_vector_db
python -m module_c.indexer

# Clear pnpm cache
cd Frontend
pnpm store prune
pnpm install

This project is under development as part of a hackathon.