# Setu 🇳🇵

**An AI-powered platform for legal assistance in Nepal** - making legal documents accessible, generating official letters, and detecting bias in legal text.

## 🎯 Project Overview

Setu is a comprehensive legal assistance platform that leverages AI/ML to help Nepali citizens interact with legal documents and government processes. The system consists of three main modules integrated with a modern web interface.

## 🎥 Demo Video

Watch the platform in action: [View Demo Video](https://drive.google.com/file/d/12j2J-_g7SHdcQTwU3hQU_uiWldB2RFUz/view?usp=drive_link)

## 🚀 Features

### Module A: Law Explanation (RAG-Based Chatbot)
- **Intelligent Q&A**: Ask questions about Nepali laws in natural language (English/Nepali)
- **Retrieval-Augmented Generation**: Retrieves relevant legal text and generates accurate explanations
- **Source References**: Provides exact article/section references
- **Vector Database**: ChromaDB with semantic search capabilities
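
The retrieve-then-generate flow behind these bullets can be sketched in a few lines. This is a toy illustration only, not the project's pipeline: it swaps ChromaDB and Sentence Transformers for a bag-of-words cosine similarity so that it runs with the standard library alone, and `PASSAGES`, `retrieve`, and `build_prompt` are hypothetical names with placeholder text.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder corpus: (source reference, passage text).
# In the real system the passages would come from the indexed legal PDFs.
PASSAGES = [
    ("Sample Act, Section 1", "placeholder passage about applying for a citizenship certificate"),
    ("Sample Act, Section 2", "placeholder passage about registering a land ownership transfer"),
    ("Sample Act, Section 3", "placeholder passage about registering a marriage"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, passages, k=2):
    """Return up to k (reference, text) pairs most similar to the question."""
    query = Counter(tokenize(question))
    scored = [(cosine(query, Counter(tokenize(text))), ref, text) for ref, text in passages]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(ref, text) for score, ref, text in scored[:k] if score > 0]


def build_prompt(question, hits):
    """Assemble an LLM prompt that carries the source references along."""
    context = "\n".join(f"[{ref}] {text}" for ref, text in hits)
    return (
        "Answer using only the context below and cite the bracketed sources.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Keeping the reference next to each passage in the prompt is what lets the generated answer point back to a specific article or section.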

### Module B: Multi-Category Bias Detection
- **10+ Bias Categories**: Detects categories including gender, caste, religion, age, disability, appearance, social status, political, and ambiguity bias
- **Fine-tuned DistilBERT**: Custom model trained on Nepali legal texts
- **Sentence Analysis**: Analyzes individual sentences or processes them in batches
- **Debiasing Suggestions**: Provides bias-free alternatives for detected biases
- **Confidence Scoring**: Returns confidence scores for each detection
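
As a rough sketch of how per-category confidence scores might be turned into detections, the snippet below filters a score dictionary by a threshold. The `BiasDetection` type, the `neutral` label, the threshold value, and the example scores are all assumptions made for illustration; the fine-tuned model's actual output format is not documented here.

```python
from dataclasses import dataclass


@dataclass
class BiasDetection:
    sentence: str
    category: str
    confidence: float


def select_detections(sentence, scores, threshold=0.5):
    """Keep bias categories whose score meets the threshold, highest first.

    `scores` maps a category label to a confidence in [0, 1]; a hypothetical
    "neutral" label is ignored because it does not indicate bias.
    """
    hits = [
        BiasDetection(sentence, category, score)
        for category, score in scores.items()
        if category != "neutral" and score >= threshold
    ]
    return sorted(hits, key=lambda d: d.confidence, reverse=True)


if __name__ == "__main__":
    # Made-up scores, not real model output.
    example = {"gender": 0.91, "caste": 0.12, "age": 0.55, "neutral": 0.70}
    for detection in select_detections("Sample sentence.", example):
        print(detection.category, detection.confidence)
```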

### Module C: Letter Generation
- **Template-Based Generation**: RAG-based intelligent template selection
- **Natural Language Input**: Describe your need, get the right letter
- **Smart Field Extraction**: Automatically extracts name, date, district, etc.
- **Official Formats**: Generates proper Nepali government letter formats
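
Field extraction and template filling can be pictured roughly as follows. This is a minimal sketch under stated assumptions: the regular expressions, the `TEMPLATE` text, and the field names are hypothetical, and the actual module may well rely on the LLM rather than regexes to pull fields out of the request.

```python
import re
from string import Template

# Hypothetical template; the project's real templates live in data/module-C/.
TEMPLATE = Template("To: The Ward Office, $district\nDate: $date\n\nApplicant: $name\n")
REQUIRED_FIELDS = ("name", "district", "date")

# Illustrative patterns only; they cover simple English phrasing.
FIELD_PATTERNS = {
    "name": re.compile(r"[Mm]y name is ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
    "district": re.compile(r"\bin ([A-Z][a-z]+) district"),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
}


def extract_fields(request):
    """Pull whichever known fields appear in a free-text request."""
    fields = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(request)
        if match:
            fields[field] = match.group(1)
    return fields


def missing_fields(fields):
    """List required fields the user still has to provide, in template order."""
    return [field for field in REQUIRED_FIELDS if field not in fields]


def fill_template(fields):
    """Fill known fields; unknown placeholders are left visible for follow-up."""
    return TEMPLATE.safe_substitute(fields)
```

Using `safe_substitute` leaves unfilled placeholders such as `$date` in the draft, which makes it easy to ask the user only for what is still missing.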

### Utility: PDF Processing
- **Text Extraction**: Extract text from legal PDFs (English & Nepali)
- **Multi-method Support**: PyMuPDF, pdfplumber with intelligent fallback
- **OCR Ready**: Handles scanned documents
- **Integrated Pipeline**: Direct integration with bias detection
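
One way to picture the fallback between extraction methods is an ordered chain that moves on whenever a backend fails or returns too little text. The sketch below is an assumption about the general pattern, not the contents of `utility/pdf_processor.py`; `extract_with_fallback` and `min_chars` are hypothetical, while the PyMuPDF and pdfplumber calls shown are the libraries' standard text-extraction APIs.

```python
def pymupdf_text(path):
    """Extract text with PyMuPDF (imported lazily so the sketch loads without it)."""
    import fitz  # PyMuPDF

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def pdfplumber_text(path):
    """Extract text with pdfplumber (imported lazily)."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_with_fallback(path, extractors, min_chars=20):
    """Try each (name, extractor) pair in order and return (name, text).

    A backend that raises, or that yields fewer than `min_chars` characters
    (typical for scanned pages without a text layer), is skipped.
    """
    errors = []
    for name, extract in extractors:
        try:
            text = extract(path).strip()
        except Exception as exc:  # one failing backend should not abort the chain
            errors.append(f"{name}: {exc}")
            continue
        if len(text) >= min_chars:
            return name, text
        errors.append(f"{name}: only {len(text)} characters extracted")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))


DEFAULT_EXTRACTORS = [("pymupdf", pymupdf_text), ("pdfplumber", pdfplumber_text)]
```

An OCR step for scanned documents would slot in as one more entry at the end of the list.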

## 🛠️ Tech Stack

**Backend:**
- FastAPI (Python) - RESTful API
- ChromaDB - Vector database for embeddings
- Mistral AI - LLM for generation
- Sentence Transformers - Embeddings
- PyMuPDF, PDFPlumber - PDF processing

**Frontend:**
- Next.js 16 - React framework
- TypeScript - Type safety
- Tailwind CSS - Styling
- Radix UI - Component library
- shadcn/ui - UI components

**ML/AI:**
- Hugging Face Transformers
- Sentence Transformers
- Custom fine-tuned models (Module B)

## 📋 Prerequisites

- **Python**: 3.9+ (recommended: 3.13)
- **Node.js**: 20.9+ with pnpm (Next.js 16 no longer supports Node.js 18)
- **API Keys**: Mistral AI API key
- **System**: Linux/macOS/Windows

## ⚙️ Installation

### 1. Clone the Repository
```bash
git clone https://github.com/KhagendraN/Setu.git
cd Setu
```

### 2. Backend Setup

Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies:
```bash
pip install -r requirements.txt
```

Create `.env` file in the project root:
```bash
MISTRAL_API_KEY=your_mistral_api_key_here
```
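
Before starting the server it can help to confirm the key is actually set. The following is a small standard-library sketch, assuming the simple `KEY=value` format shown above; `parse_env` and `require_key` are hypothetical helpers, and the backend itself may load the file differently (for example through a dotenv library).

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_key(values, name="MISTRAL_API_KEY"):
    """Return the key, or fail early if it is absent or still the placeholder."""
    value = values.get(name, "")
    if not value or value.startswith("your_"):
        raise RuntimeError(f"{name} is missing or still set to the placeholder")
    return value


if __name__ == "__main__":
    env_file = Path(".env")
    settings = parse_env(env_file.read_text()) if env_file.exists() else {}
    require_key(settings)
    print("MISTRAL_API_KEY found")
```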

### 3. Build Vector Databases

**Module A (Law Explanation):**
```bash
# Place your legal PDFs in data/module-A/law/
python -m module_a.process_documents
python -m module_a.build_vector_db
```

**Module C (Letter Generation):**
```bash
# Templates are already in data/module-C/
python -m module_c.indexer
```

### 4. Frontend Setup
```bash
cd Frontend
pnpm install
cd ..
```

## 🚀 Running the Application

You need **TWO terminals** to run the full application:

### Terminal 1: Backend API
```bash
# Activate virtual environment
source venv/bin/activate

# Start the API server
uvicorn api.main:app --reload --port 8000
```

- Backend will run at: `http://localhost:8000`
- API docs available at: `http://localhost:8000/docs`

### Terminal 2: Frontend
```bash
cd Frontend
pnpm dev
```

Frontend will run at: `http://localhost:3000`

## 🐳 Docker Usage (Recommended)

The easiest way to run the entire platform is using Docker Compose.

### 1. Prerequisites
- Docker and Docker Compose installed
- `.env` file with `MISTRAL_API_KEY` in the root directory

### 2. Run with Docker Compose
```bash
docker-compose up --build
```

This will:
- Build and start the Backend API (port 8000)
- Build and start the Frontend (port 3000)
- Automatically run the vector database build scripts

The application will be available at `http://localhost:3000`.

## 📁 Project Structure

```
Setu/
├── api/                          # Main API endpoints
│   ├── main.py                   # FastAPI application
│   ├── routes/
│   │   ├── law_explanation.py    # Module A endpoints
│   │   ├── letter_generation.py  # Module C endpoints
│   │   ├── bias_detection.py     # Module B endpoints
│   │   └── pdf_processing.py     # PDF utility endpoints
│   └── schemas.py                # Pydantic models
│
├── module_a/                     # Law Explanation (RAG)
│   ├── rag_chain.py              # RAG pipeline
│   ├── vector_db.py              # ChromaDB interface
│   ├── process_documents.py      # Document processing
│   └── README.md
│
├── module_b/                     # Bias Detection
│   ├── inference.py              # Model inference
│   ├── fine_tuning/              # Training scripts
│   └── dataset/                  # Training data
│
├── module_c/                     # Letter Generation
│   ├── interface.py              # Main API
│   ├── retriever.py              # Template retrieval
│   ├── generator.py              # Letter generation
│   ├── indexer.py                # Vector DB indexing
│   └── README.md
│
├── utility/                      # PDF Processing
│   ├── pdf_processor.py          # PDF extraction
│   └── README.md
│
├── Frontend/                     # Next.js application
│   ├── app/
│   │   ├── chatbot/              # Module A UI
│   │   ├── letter-generator/     # Module C UI
│   │   ├── bias-checker/         # Module B UI
│   │   ├── dashboard/            # Main dashboard
│   │   └── login/                # Authentication pages
│   └── components/               # Reusable components
│
└── data/                         # Data storage
    ├── module-A/                 # Law documents & vector DB
    ├── module-C/                 # Letter templates & vector DB
    └── module-B/                 # Bias detection datasets
```

## 🔌 API Endpoints

### Authentication
- `POST /api/v1/signup` - Register a new user
- `POST /api/v1/login` - User login
- `GET /api/v1/me` - Get current user profile
- `POST /api/v1/refresh` - Refresh access token

### Law Explanation (Module A)
- `POST /api/v1/law-explanation/explain` - Ask legal questions (basic)
- `POST /api/v1/law-explanation/chat` - Context-aware chat with conversation history
- `GET /api/v1/law-explanation/sources` - Get source documents only

### Chat History
- `POST /api/v1/chat-history/conversations` - Create a new conversation
- `GET /api/v1/chat-history/conversations` - List all user conversations
- `GET /api/v1/chat-history/conversations/{id}` - Get specific conversation with messages
- `DELETE /api/v1/chat-history/conversations/{id}` - Delete a conversation
- `POST /api/v1/chat-history/messages` - Save a message to conversation

### Letter Generation (Module C)
- `POST /api/v1/search-template` - Search for letter templates
- `POST /api/v1/get-template-details` - Get template requirements
- `POST /api/v1/fill-template` - Fill template with user data
- `POST /api/v1/generate-letter` - Generate complete letter (smart generation)
- `POST /api/v1/analyze-requirements` - Analyze missing fields in template

### Bias Detection (Module B)
- `POST /api/v1/detect-bias` - Detect bias in text
- `POST /api/v1/detect-bias/batch` - Batch bias detection
- `POST /api/v1/debias-sentence` - Get debiased alternatives
- `POST /api/v1/debias-sentence/batch` - Batch debiasing
- `GET /api/v1/health` - Health check

### Bias Detection HITL (Human-in-the-Loop)
- `POST /api/v1/bias-detection-hitl/detect` - Detect bias with HITL workflow
- `POST /api/v1/bias-detection-hitl/approve` - Approve bias detection results
- `POST /api/v1/bias-detection-hitl/regenerate` - Regenerate debiased suggestions
- `POST /api/v1/bias-detection-hitl/generate-pdf` - Generate PDF report

### PDF Processing (Utility)
- `POST /api/v1/process-pdf` - Extract text from PDF
- `POST /api/v1/process-pdf-to-bias` - Extract PDF and detect bias
- `GET /api/v1/pdf-health` - Health check

### System
- `GET /` - API welcome message
- `GET /health` - System health check

Full API documentation: `http://localhost:8000/docs` (when server is running)
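
For a quick check from a script, a request to one of these endpoints might look like the sketch below. The endpoint path comes from the list above, but the request body (`{"text": ...}`) and the `Bearer` token header are assumptions; the authoritative schemas are in the interactive docs at `/docs`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def json_post(path, payload, base_url=BASE_URL, token=None):
    """Build a JSON POST request; the token scheme is an assumption."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(f"{base_url}{path}", data=data, headers=headers, method="POST")


def send(request, timeout=30.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical payload shape; check /docs for the real request model.
    request = json_post("/api/v1/detect-bias", {"text": "Sample sentence to check."})
    print(send(request))
```

Building the request separately from sending it keeps the payload easy to inspect when the server rejects it with a validation error.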

## 🎨 Frontend Features

- **Dashboard**: Overview of all modules
- **Chatbot**: Interactive law explanation interface
- **Letter Generator**: Step-by-step letter creation wizard
- **Bias Checker**: Upload documents or paste text for analysis
- **User Profile**: User account management
- **Responsive Design**: Works on desktop and mobile

## 🧪 Testing

### Test Module A (Law Explanation)
```bash
python -m module_a.test_rag
```

### Test Module C (Letter Generation)
```bash
python -m module_c.test_generation
python -m module_c.test_interactive
```

### Test PDF Processing
```bash
python -m utility.test_pdf_processor
```

### Test API Endpoints
```bash
python -m api.test_api
```

## 📝 Configuration

### Environment Variables (.env)
```bash
# Required
MISTRAL_API_KEY=your_api_key_here

# Optional - MongoDB (if using Auth Backend)
# MONGODB_URL=mongodb://localhost:27017
# SECRET_KEY=your_secret_key
```

### Module Configurations
- **Module A**: [module_a/config.py](module_a/config.py)
- **Module C**: [module_c/config.py](module_c/config.py)

## 🐛 Troubleshooting

### Backend Issues
- **Import errors**: Make sure virtual environment is activated
- **Vector DB empty**: Run the build scripts for modules A & C
- **API key errors**: Check `.env` file has valid `MISTRAL_API_KEY`

### Frontend Issues
- **Port 3000 in use**: Change port with `pnpm dev -- -p 3001`
- **Module not found**: Run `pnpm install` in Frontend directory
- **API connection failed**: Ensure backend is running on port 8000

### Common Errors
```bash
# Reinstall dependencies
pip install --upgrade -r requirements.txt

# Rebuild vector databases
python -m module_a.build_vector_db
python -m module_c.indexer

# Clear pnpm cache
cd Frontend
pnpm store prune
pnpm install
```

## 📚 Documentation

- [Module A Documentation](module_a/README.md) - Law Explanation RAG Pipeline
- [Module C Documentation](module_c/README.md) - Letter Generation
- [PDF Processing Guide](utility/README.md) - PDF text extraction
- [Implementation Guides](docs/) - Detailed implementation workflows

---

> This project is under development as part of a hackathon.