# Setu 🇳🇵
**An AI-powered platform for legal assistance in Nepal** - making legal documents accessible, generating official letters, and detecting bias in legal text.
## Project Overview
Setu is a comprehensive legal assistance platform that leverages AI/ML to help Nepali citizens interact with legal documents and government processes. The system consists of three main modules integrated with a modern web interface.
## Demo Video
Watch the platform in action: [View Demo Video](https://drive.google.com/file/d/12j2J-_g7SHdcQTwU3hQU_uiWldB2RFUz/view?usp=drive_link)
## Features
### Module A: Law Explanation (RAG-Based Chatbot)
- **Intelligent Q&A**: Ask questions about Nepali laws in natural language (English/Nepali)
- **Retrieval-Augmented Generation**: Retrieves relevant legal text and generates explanations grounded in it
- **Source References**: Provides exact article/section references
- **Vector Database**: ChromaDB with semantic search capabilities
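
As a rough illustration of the retrieve-then-generate flow described above, the sketch below ranks a few made-up text chunks against a question and assembles a prompt with source references. It is not the project's pipeline: the bag-of-words similarity stands in for Sentence Transformers embeddings and ChromaDB, and `CHUNKS`, `retrieve`, and `build_prompt` are hypothetical names chosen for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercased bag of words; a stand-in for real sentence embeddings."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, tokenize(c["text"])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, hits: list) -> str:
    """Put the retrieved text and its source labels in front of the question."""
    context = "\n".join(f"[{h['source']}] {h['text']}" for h in hits)
    return (
        "Answer using only the context below and cite the sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Made-up placeholder chunks, not real legal text.
CHUNKS = [
    {"source": "Sample Act, Section 1",
     "text": "Every citizen has the right to obtain a citizenship certificate."},
    {"source": "Sample Act, Section 2",
     "text": "Land registration requires an application to the land revenue office."},
    {"source": "Sample Act, Section 3",
     "text": "A marriage must be registered within the prescribed period."},
]

question = "How do I get a citizenship certificate?"
hits = retrieve(question, CHUNKS, k=1)
prompt = build_prompt(question, hits)  # this string would be sent to the LLM
```

In the real system the ranking step would be a semantic vector search, and the prompt would go to the LLM, which is what allows answers to carry article/section references.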
### Module B: Multi-Category Bias Detection
- **10+ Bias Categories**: Detects gender, caste, religion, age, disability, appearance, social status, political, and ambiguity biases, among others
- **Fine-tuned DistilBERT**: Custom model trained on Nepali legal texts
- **Sentence Analysis**: Analyzes individual sentences or processes them in batches
- **Debiasing Suggestions**: Provides bias-free alternatives for detected biases
- **Confidence Scoring**: Returns confidence scores for each detection
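
To make the idea of per-detection confidence scores concrete, here is a minimal sketch that turns raw per-category model outputs into thresholded confidences. It assumes a multi-label setup with one logit per category and a 0.5 cutoff; the actual model head, category identifiers, and threshold used by Module B may differ, and `score_sentence` is a hypothetical helper.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def score_sentence(logits: dict, threshold: float = 0.5) -> list:
    """Keep categories whose confidence clears the threshold, highest first."""
    detections = []
    for category, logit in logits.items():
        confidence = round(sigmoid(logit), 3)
        if confidence >= threshold:
            detections.append((category, confidence))
    return sorted(detections, key=lambda d: d[1], reverse=True)


# Illustrative logits only; a real run would take them from the classifier.
example_logits = {"gender": 2.0, "caste": -3.0, "religion": 0.5}
detections = score_sentence(example_logits)
```

Raising the threshold trades recall for precision, which matters when detections feed a human review step.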
### Module C: Letter Generation
- **Template-Based Generation**: RAG-based intelligent template selection
- **Natural Language Input**: Describe your need, get the right letter
- **Smart Field Extraction**: Automatically extracts name, date, district, etc.
- **Official Formats**: Generates proper Nepali government letter formats
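
The field extraction and template filling steps can be sketched as follows. This is only an approximation under stated assumptions: the template text, the regular expressions, and the helper names (`extract_fields`, `missing_fields`, `fill`) are invented for illustration, and the real module may extract fields with an LLM rather than patterns.

```python
import re
from string import Template

# Hypothetical template; the project's real templates live in data/module-C/.
TEMPLATE = Template(
    "To: The Chief District Officer, $district\n"
    "Subject: $subject\n\n"
    "I, $name, respectfully submit this application."
)

# Simple patterns standing in for smarter extraction logic.
PATTERNS = {
    "name": re.compile(r"[Mm]y name is ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
    "district": re.compile(r"in ([A-Z][a-z]+) district"),
}


def extract_fields(text: str) -> dict:
    """Pull out whichever known fields appear in the user's description."""
    fields = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[key] = match.group(1)
    return fields


def missing_fields(template: Template, fields: dict) -> list:
    """List placeholders in the template that have no value yet."""
    needed = set(re.findall(r"\$(\w+)", template.template))
    return sorted(needed - set(fields))


def fill(template: Template, fields: dict) -> str:
    """Fill known fields and leave unknown placeholders visible."""
    return template.safe_substitute(fields)


request = "My name is Sita Sharma and I live in Kaski district."
fields = extract_fields(request)
missing = missing_fields(TEMPLATE, fields)
letter = fill(TEMPLATE, fields)
```

Reporting the missing placeholders separately lets the interface ask the user for only the values it could not extract.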
### Utility: PDF Processing
- **Text Extraction**: Extract text from legal PDFs (English & Nepali)
- **Multi-method Support**: PyMuPDF and pdfplumber, with automatic fallback between them
- **OCR Ready**: Handles scanned documents
- **Integrated Pipeline**: Direct integration with bias detection
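
One way to picture the multi-method fallback is a loop that tries each extractor in order and accepts the first one that yields enough text. The sketch below is an assumption about the approach, not the code in `utility/pdf_processor.py`; the extractors are stand-in callables, where real ones would wrap PyMuPDF, pdfplumber, or an OCR step.

```python
from typing import Callable, Iterable, Optional, Tuple

Extractor = Callable[[str], str]


def extract_with_fallback(
    path: str,
    extractors: Iterable[Tuple[str, Extractor]],
    min_chars: int = 20,
) -> Tuple[Optional[str], str]:
    """Try each extractor in order; return (method name, text) for the first usable result."""
    for name, extract in extractors:
        try:
            text = extract(path)
        except Exception:
            continue  # this method failed, move on to the next one
        if len(text.strip()) >= min_chars:
            return name, text
    return None, ""  # nothing worked, e.g. a scanned PDF with no OCR available


# Stand-in extractors so the sketch runs without any PDF library installed.
def broken_extractor(path: str) -> str:
    raise RuntimeError("cannot open file")


def sparse_extractor(path: str) -> str:
    return "   "  # e.g. a scanned page with no text layer


def working_extractor(path: str) -> str:
    return "Section 1. This Act may be cited as the Sample Act."


method, text = extract_with_fallback(
    "sample.pdf",
    [("first", broken_extractor), ("second", sparse_extractor), ("third", working_extractor)],
)
```

The `min_chars` check is what catches pages where a library returns successfully but finds almost no text.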
## Tech Stack
**Backend:**
- FastAPI (Python) - RESTful API
- ChromaDB - Vector database for embeddings
- Mistral AI - LLM for generation
- Sentence Transformers - Embeddings
- PyMuPDF, pdfplumber - PDF processing
**Frontend:**
- Next.js 16 - React framework
- TypeScript - Type safety
- Tailwind CSS - Styling
- Radix UI - Component library
- shadcn/ui - UI components
**ML/AI:**
- Hugging Face Transformers
- Sentence Transformers
- Custom fine-tuned models (Module B)
## Prerequisites
- **Python**: 3.9+ (recommended: 3.13)
- **Node.js**: 18+ with pnpm
- **API Keys**: Mistral AI API key
- **System**: Linux/macOS/Windows
## Installation
### 1. Clone the Repository
```bash
git clone https://github.com/KhagendraN/Setu.git
cd Setu
```
### 2. Backend Setup
Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
Install dependencies:
```bash
pip install -r requirements.txt
```
Create a `.env` file in the project root:
```bash
MISTRAL_API_KEY=your_mistral_api_key_here
```
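
Before starting the backend, it can help to confirm that the key was actually filled in. The following is a small standalone check, written as a sketch: `parse_env` and `check_env` are hypothetical helpers rather than part of the project, and the parser only handles plain `KEY=value` lines.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(text: str, required=("MISTRAL_API_KEY",)) -> list:
    """Return required keys that are missing, empty, or still a placeholder."""
    values = parse_env(text)
    return [
        key for key in required
        if not values.get(key) or values[key].startswith("your_")
    ]


# With a real file: check_env(open(".env", encoding="utf-8").read())
placeholder_problems = check_env("MISTRAL_API_KEY=your_mistral_api_key_here")
filled_problems = check_env("# local settings\nMISTRAL_API_KEY=abc123\n")
```

An empty list means every required key has a non-placeholder value.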
### 3. Build Vector Databases
**Module A (Law Explanation):**
```bash
# Place your legal PDFs in data/module-A/law/
python -m module_a.process_documents
python -m module_a.build_vector_db
```
**Module C (Letter Generation):**
```bash
# Templates are already in data/module-C/
python -m module_c.indexer
```
### 4. Frontend Setup
```bash
cd Frontend
pnpm install
cd ..
```
## Running the Application
You need **TWO terminals** to run the full application:
### Terminal 1: Backend API
```bash
# Activate virtual environment
source venv/bin/activate
# Start the API server
uvicorn api.main:app --reload --port 8000
```
Backend will run at: `http://localhost:8000`
API docs available at: `http://localhost:8000/docs`
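
To verify from a script that the backend is reachable, a short check against the `GET /health` endpoint listed under API Endpoints could look like the sketch below. It uses only the standard library, and `health_url` and `backend_is_up` are hypothetical helper names. It treats any HTTP 200 as healthy and does not inspect the response body, whose shape is not documented here.

```python
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Build the health-check URL from the backend base URL."""
    return base_url.rstrip("/") + "/health"


def backend_is_up(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False


# Example: print("backend up:", backend_is_up())
```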
### Terminal 2: Frontend
```bash
cd Frontend
pnpm dev
```
Frontend will run at: `http://localhost:3000`
## Docker Usage (Recommended)
The easiest way to run the entire platform is using Docker Compose.
### 1. Prerequisites
- Docker and Docker Compose installed
- `.env` file with `MISTRAL_API_KEY` in the root directory
### 2. Run with Docker Compose
```bash
docker-compose up --build
```
This will:
- Build and start the Backend API (port 8000)
- Build and start the Frontend (port 3000)
- Automatically run the vector database build scripts
The application will be available at `http://localhost:3000`.
## Project Structure
```
Setu/
├── api/                          # Main API endpoints
│   ├── main.py                   # FastAPI application
│   ├── routes/
│   │   ├── law_explanation.py    # Module A endpoints
│   │   ├── letter_generation.py  # Module C endpoints
│   │   ├── bias_detection.py     # Module B endpoints
│   │   └── pdf_processing.py     # PDF utility endpoints
│   └── schemas.py                # Pydantic models
│
├── module_a/                     # Law Explanation (RAG)
│   ├── rag_chain.py              # RAG pipeline
│   ├── vector_db.py              # ChromaDB interface
│   ├── process_documents.py      # Document processing
│   └── README.md
│
├── module_b/                     # Bias Detection
│   ├── inference.py              # Model inference
│   ├── fine_tuning/              # Training scripts
│   └── dataset/                  # Training data
│
├── module_c/                     # Letter Generation
│   ├── interface.py              # Main API
│   ├── retriever.py              # Template retrieval
│   ├── generator.py              # Letter generation
│   ├── indexer.py                # Vector DB indexing
│   └── README.md
│
├── utility/                      # PDF Processing
│   ├── pdf_processor.py          # PDF extraction
│   └── README.md
│
├── Frontend/                     # Next.js application
│   ├── app/
│   │   ├── chatbot/              # Module A UI
│   │   ├── letter-generator/     # Module C UI
│   │   ├── bias-checker/         # Module B UI
│   │   ├── dashboard/            # Main dashboard
│   │   └── login/                # Authentication pages
│   └── components/               # Reusable components
│
└── data/                         # Data storage
    ├── module-A/                 # Law documents & vector DB
    ├── module-C/                 # Letter templates & vector DB
    └── module-B/                 # Bias detection datasets
```
## API Endpoints
### Authentication
- `POST /api/v1/signup` - Register a new user
- `POST /api/v1/login` - User login
- `GET /api/v1/me` - Get current user profile
- `POST /api/v1/refresh` - Refresh access token
### Law Explanation (Module A)
- `POST /api/v1/law-explanation/explain` - Ask legal questions (basic)
- `POST /api/v1/law-explanation/chat` - Context-aware chat with conversation history
- `GET /api/v1/law-explanation/sources` - Get source documents only
### Chat History
- `POST /api/v1/chat-history/conversations` - Create a new conversation
- `GET /api/v1/chat-history/conversations` - List all user conversations
- `GET /api/v1/chat-history/conversations/{id}` - Get specific conversation with messages
- `DELETE /api/v1/chat-history/conversations/{id}` - Delete a conversation
- `POST /api/v1/chat-history/messages` - Save a message to conversation
### Letter Generation (Module C)
- `POST /api/v1/search-template` - Search for letter templates
- `POST /api/v1/get-template-details` - Get template requirements
- `POST /api/v1/fill-template` - Fill template with user data
- `POST /api/v1/generate-letter` - Generate complete letter (smart generation)
- `POST /api/v1/analyze-requirements` - Analyze missing fields in template
### Bias Detection (Module B)
- `POST /api/v1/detect-bias` - Detect bias in text
- `POST /api/v1/detect-bias/batch` - Batch bias detection
- `POST /api/v1/debias-sentence` - Get debiased alternatives
- `POST /api/v1/debias-sentence/batch` - Batch debiasing
- `GET /api/v1/health` - Health check
### Bias Detection HITL (Human-in-the-Loop)
- `POST /api/v1/bias-detection-hitl/detect` - Detect bias with HITL workflow
- `POST /api/v1/bias-detection-hitl/approve` - Approve bias detection results
- `POST /api/v1/bias-detection-hitl/regenerate` - Regenerate debiased suggestions
- `POST /api/v1/bias-detection-hitl/generate-pdf` - Generate PDF report
### PDF Processing (Utility)
- `POST /api/v1/process-pdf` - Extract text from PDF
- `POST /api/v1/process-pdf-to-bias` - Extract PDF and detect bias
- `GET /api/v1/pdf-health` - Health check
### System
- `GET /` - API welcome message
- `GET /health` - System health check
Full API documentation: `http://localhost:8000/docs` (when server is running)
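
As a sketch of how a client might call one of the POST endpoints above, the snippet below builds a JSON request for `/api/v1/detect-bias` with the standard library. The `"text"` field in the payload is an assumption made for illustration, and `build_json_post` is a hypothetical helper; the actual request schema is defined in `api/schemas.py` and shown in the interactive docs.

```python
import json
import urllib.request

API_PREFIX = "/api/v1"


def build_json_post(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for an API path."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + API_PREFIX + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "text" is an assumed field name; check /docs for the real schema.
request = build_json_post(
    "http://localhost:8000",
    "/detect-bias",
    {"text": "Example sentence to analyze."},
)

# To send it once the backend is running:
# with urllib.request.urlopen(request) as resp:
#     result = json.load(resp)
```

Encoding with `ensure_ascii=False` keeps Nepali text readable in the request body instead of escaping it.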
## Frontend Features
- **Dashboard**: Overview of all modules
- **Chatbot**: Interactive law explanation interface
- **Letter Generator**: Step-by-step letter creation wizard
- **Bias Checker**: Upload documents or paste text for analysis
- **User Profile**: User account management
- **Responsive Design**: Works on desktop and mobile
## Testing
### Test Module A (Law Explanation)
```bash
python -m module_a.test_rag
```
### Test Module C (Letter Generation)
```bash
python -m module_c.test_generation
python -m module_c.test_interactive
```
### Test PDF Processing
```bash
python -m utility.test_pdf_processor
```
### Test API Endpoints
```bash
python -m api.test_api
```
## Configuration
### Environment Variables (.env)
```bash
# Required
MISTRAL_API_KEY=your_api_key_here
# Optional - MongoDB (if using Auth Backend)
# MONGODB_URL=mongodb://localhost:27017
# SECRET_KEY=your_secret_key
```
### Module Configurations
- **Module A**: [module_a/config.py](module_a/config.py)
- **Module C**: [module_c/config.py](module_c/config.py)
## Troubleshooting
### Backend Issues
- **Import errors**: Make sure the virtual environment is activated
- **Vector DB empty**: Run the build scripts for Modules A and C
- **API key errors**: Check that the `.env` file contains a valid `MISTRAL_API_KEY`
### Frontend Issues
- **Port 3000 in use**: Change port with `pnpm dev -- -p 3001`
- **Module not found**: Run `pnpm install` in the `Frontend` directory
- **API connection failed**: Ensure the backend is running on port 8000
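
When diagnosing port conflicts or a backend that will not connect, a quick way to see which of the two default ports already has a listener is a small socket probe. This is a generic sketch, not a project script, and `port_in_use` is a hypothetical helper.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


for name, port in (("frontend", 3000), ("backend", 8000)):
    state = "in use" if port_in_use(port) else "free"
    print(f"{name} port {port}: {state}")
```

If port 8000 shows as free while the frontend reports connection failures, the backend is most likely not running.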
### Common Errors
```bash
# Reinstall dependencies
pip install --upgrade -r requirements.txt
# Rebuild vector databases
python -m module_a.build_vector_db
python -m module_c.indexer
# Clear pnpm cache
cd Frontend
pnpm store prune
pnpm install
```
## Documentation
- [Module A Documentation](module_a/README.md) - Law Explanation RAG Pipeline
- [Module C Documentation](module_c/README.md) - Letter Generation
- [PDF Processing Guide](utility/README.md) - PDF text extraction
- [Implementation Guides](docs/) - Detailed implementation workflows
---
> This project is under development as part of a hackathon.