|
|
--- |
|
|
title: Setu |
|
|
emoji: π |
|
|
colorFrom: blue |
|
|
colorTo: red |
|
|
sdk: docker |
|
|
app_port: 7860 |
|
|
--- |
|
|
|
|
|
# Setu π³π΅ |
|
|
|
|
|
**An AI-powered platform for legal assistance in Nepal** - making legal documents accessible, generating official letters, and detecting bias in legal text. |
|
|
|
|
|
## π― Project Overview |
|
|
|
|
|
Setu is a comprehensive legal assistance platform that leverages AI/ML to help Nepali citizens interact with legal documents and government processes. The system consists of three main modules integrated with a modern web interface. |
|
|
|
|
|
## π₯ Team |
|
|
|
|
|
### Khagendra Neupane |
|
|
[GitHub](https://github.com/KhagendraN) | [Portfolio](https://www.khagendraneupane.com.np/) | [LinkedIn](https://www.linkedin.com/in/khagendra-neupane-37427532a/) |
|
|
|
|
|
|
|
|
### Sangam Silwal |
|
|
[GitHub](https://github.com/SangamSilwal) | [LinkedIn](https://www.linkedin.com/in/sangam-silwal-b0654b316?utm_source=share&utm_campaign=share_via&utm_content=profile&utm_medium=android_app) |
|
|
|
|
|
### Sambhav Regmi |
|
|
[GitHub](https://github.com/sambhav605) | [Portfolio](https://sambhavregmi.com.np/) | [LinkedIn](https://www.linkedin.com/in/sambhav-regmi-350512321?utm_source=share&utm_campaign=share_via&utm_content=profile&utm_medium=android_app) |
|
|
|
|
|
### Rupak Adhikari |
|
|
[GitHub](https://github.com/sampletestg) | [LinkedIn](https://www.linkedin.com/in/rupak-adhikari-40b436344/) |
|
|
|
|
|
|
|
|
## π₯ Demo Video |
|
|
|
|
|
Watch the platform in action: [View Demo Video](https://drive.google.com/file/d/12j2J-_g7SHdcQTwU3hQU_uiWldB2RFUz/view?usp=drive_link) |
|
|
|
|
|
## π Features |
|
|
|
|
|
### Module A: Law Explanation (RAG-Based Chatbot) |
|
|
- **Intelligent Q&A**: Ask questions about Nepali laws in natural language (English/Nepali) |
|
|
- **Retrieval-Augmented Generation**: Retrieves relevant legal text and generates accurate explanations |
|
|
- **Source References**: Provides exact article/section references |
|
|
- **Vector Database**: ChromaDB with semantic search capabilities |
|
|
|
|
|
### Module B: Multi-Category Bias Detection |
|
|
- **10+ Bias Categories**: Detects gender, caste, religion, age, disability, appearance, social status, political, and ambiguity biases |
|
|
- **Fine-tuned DistilBERT**: Custom model trained on Nepali legal texts |
|
|
- **Sentence Analysis**: Analyzes individual sentences or batch processing |
|
|
- **Debiasing Suggestions**: Provides bias-free alternatives for detected biases |
|
|
- **Confidence Scoring**: Returns confidence scores for each detection |
|
|
|
|
|
### Module C: Letter Generation |
|
|
- **Template-Based Generation**: RAG-based intelligent template selection |
|
|
- **Natural Language Input**: Describe your need, get the right letter |
|
|
- **Smart Field Extraction**: Automatically extracts name, date, district, etc. |
|
|
- **Official Formats**: Generates proper Nepali government letter formats |
|
|
|
|
|
### Utility: PDF Processing |
|
|
- **Text Extraction**: Extract text from legal PDFs (English & Nepali) |
|
|
- **Multi-method Support**: PyMuPDF, pdfplumber with intelligent fallback |
|
|
- **OCR Ready**: Handles scanned documents |
|
|
- **Integrated Pipeline**: Direct integration with bias detection |
|
|
|
|
|
## π οΈ Tech Stack |
|
|
|
|
|
**Backend:** |
|
|
- FastAPI (Python) - RESTful API |
|
|
- ChromaDB - Vector database for embeddings |
|
|
- Mistral AI - LLM for generation |
|
|
- Sentence Transformers - Embeddings |
|
|
- PyMuPDF, PDFPlumber - PDF processing |
|
|
|
|
|
**Frontend:** |
|
|
- Next.js 16 - React framework |
|
|
- TypeScript - Type safety |
|
|
- Tailwind CSS - Styling |
|
|
- Radix UI - Component library |
|
|
- shadcn/ui - UI components |
|
|
|
|
|
**ML/AI:** |
|
|
- Hugging Face Transformers |
|
|
- Sentence Transformers |
|
|
- Custom fine-tuned models (Module B) |
|
|
|
|
|
## π Prerequisites |
|
|
|
|
|
- **Python**: 3.9+ (recommended: 3.13) |
|
|
- **Node.js**: 18+ with pnpm |
|
|
- **API Keys**: Mistral AI API key |
|
|
- **System**: Linux/macOS/Windows |
|
|
|
|
|
## βοΈ Installation |
|
|
|
|
|
### 1. Clone the Repository |
|
|
```bash |
|
|
git clone https://github.com/KhagendraN/Setu.git |
|
|
cd Setu |
|
|
``` |
|
|
|
|
|
### 2. Backend Setup |
|
|
|
|
|
Create a virtual environment: |
|
|
```bash |
|
|
python -m venv venv |
|
|
source venv/bin/activate # On Windows: venv\Scripts\activate |
|
|
``` |
|
|
|
|
|
Install dependencies: |
|
|
```bash |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
Create `.env` file in the project root: |
|
|
```bash |
|
|
MISTRAL_API_KEY=your_mistral_api_key_here |
|
|
``` |
|
|
|
|
|
### 3. Build Vector Databases |
|
|
|
|
|
**Module A (Law Explanation):** |
|
|
```bash |
|
|
# Place your legal PDFs in data/module-A/law/ |
|
|
python -m module_a.process_documents |
|
|
python -m module_a.build_vector_db |
|
|
``` |
|
|
|
|
|
**Module C (Letter Generation):** |
|
|
```bash |
|
|
# Templates are already in data/module-C/ |
|
|
python -m module_c.indexer |
|
|
``` |
|
|
|
|
|
### 4. Frontend Setup |
|
|
```bash |
|
|
cd Frontend |
|
|
pnpm install |
|
|
cd .. |
|
|
``` |
|
|
|
|
|
## π Running the Application |
|
|
|
|
|
You need **TWO terminals** to run the full application: |
|
|
|
|
|
### Terminal 1: Backend API |
|
|
```bash |
|
|
# Activate virtual environment |
|
|
source venv/bin/activate |
|
|
|
|
|
# Start the API server |
|
|
uvicorn api.main:app --reload --port 8000 |
|
|
``` |
|
|
|
|
|
Backend will run at: `http://localhost:8000` |
|
|
API docs available at: `http://localhost:8000/docs` |
|
|
|
|
|
### Terminal 2: Frontend |
|
|
```bash |
|
|
cd Frontend |
|
|
pnpm dev |
|
|
``` |
|
|
|
|
|
Frontend will run at: `http://localhost:3000` |
|
|
|
|
|
## π³ Docker Usage (Recommended) |
|
|
|
|
|
The easiest way to run the entire platform is using Docker Compose. |
|
|
|
|
|
### 1. Prerequisites |
|
|
- Docker and Docker Compose installed |
|
|
- `.env` file with `MISTRAL_API_KEY` in the root directory |
|
|
|
|
|
### 2. Run with Docker Compose |
|
|
```bash |
|
|
docker-compose up --build |
|
|
``` |
|
|
|
|
|
This will: |
|
|
- Build and start the Backend API (port 8000) |
|
|
- Build and start the Frontend (port 3000) |
|
|
- Automatically run the vector database build scripts |
|
|
|
|
|
The application will be available at `http://localhost:3000`. |
|
|
|
|
|
## π Project Structure |
|
|
|
|
|
``` |
|
|
Setu/ |
|
|
βββ api/ # Main API endpoints |
|
|
β βββ main.py # FastAPI application |
|
|
β βββ routes/ |
|
|
β β βββ law_explanation.py # Module A endpoints |
|
|
β β βββ letter_generation.py # Module C endpoints |
|
|
β β βββ bias_detection.py # Module B endpoints |
|
|
β β βββ pdf_processing.py # PDF utility endpoints |
|
|
β βββ schemas.py # Pydantic models |
|
|
β |
|
|
βββ module_a/ # Law Explanation (RAG) |
|
|
β βββ rag_chain.py # RAG pipeline |
|
|
β βββ vector_db.py # ChromaDB interface |
|
|
β βββ process_documents.py # Document processing |
|
|
β βββ README.md |
|
|
β |
|
|
βββ module_b/ # Bias Detection |
|
|
β βββ inference.py # Model inference |
|
|
β βββ fine_tuning/ # Training scripts |
|
|
β βββ dataset/ # Training data |
|
|
β |
|
|
βββ module_c/ # Letter Generation |
|
|
β βββ interface.py # Main API |
|
|
β βββ retriever.py # Template retrieval |
|
|
β βββ generator.py # Letter generation |
|
|
β βββ indexer.py # Vector DB indexing |
|
|
β βββ README.md |
|
|
β |
|
|
βββ utility/ # PDF Processing |
|
|
β βββ pdf_processor.py # PDF extraction |
|
|
β βββ README.md |
|
|
β |
|
|
βββ Frontend/ # Next.js application |
|
|
β βββ app/ |
|
|
β β βββ chatbot/ # Module A UI |
|
|
β β βββ letter-generator/ # Module C UI |
|
|
β β βββ bias-checker/ # Module B UI |
|
|
β β βββ dashboard/ # Main dashboard |
|
|
β β βββ login/ # Authentication pages |
|
|
β βββ components/ # Reusable components |
|
|
β |
|
|
βββ data/ # Data storage |
|
|
βββ module-A/ # Law documents & vector DB |
|
|
βββ module-C/ # Letter templates & vector DB |
|
|
βββ module-B/ # Bias detection datasets |
|
|
``` |
|
|
|
|
|
## π API Endpoints |
|
|
|
|
|
### Authentication |
|
|
- `POST /api/v1/signup` - Register a new user |
|
|
- `POST /api/v1/login` - User login |
|
|
- `GET /api/v1/me` - Get current user profile |
|
|
- `POST /api/v1/refresh` - Refresh access token |
|
|
|
|
|
### Law Explanation (Module A) |
|
|
- `POST /api/v1/law-explanation/explain` - Ask legal questions (basic) |
|
|
- `POST /api/v1/law-explanation/chat` - Context-aware chat with conversation history |
|
|
- `GET /api/v1/law-explanation/sources` - Get source documents only |
|
|
|
|
|
### Chat History |
|
|
- `POST /api/v1/chat-history/conversations` - Create a new conversation |
|
|
- `GET /api/v1/chat-history/conversations` - List all user conversations |
|
|
- `GET /api/v1/chat-history/conversations/{id}` - Get specific conversation with messages |
|
|
- `DELETE /api/v1/chat-history/conversations/{id}` - Delete a conversation |
|
|
- `POST /api/v1/chat-history/messages` - Save a message to conversation |
|
|
|
|
|
### Letter Generation (Module C) |
|
|
- `POST /api/v1/search-template` - Search for letter templates |
|
|
- `POST /api/v1/get-template-details` - Get template requirements |
|
|
- `POST /api/v1/fill-template` - Fill template with user data |
|
|
- `POST /api/v1/generate-letter` - Generate complete letter (smart generation) |
|
|
- `POST /api/v1/analyze-requirements` - Analyze missing fields in template |
|
|
|
|
|
### Bias Detection (Module B) |
|
|
- `POST /api/v1/detect-bias` - Detect bias in text |
|
|
- `POST /api/v1/detect-bias/batch` - Batch bias detection |
|
|
- `POST /api/v1/debias-sentence` - Get debiased alternatives |
|
|
- `POST /api/v1/debias-sentence/batch` - Batch debiasing |
|
|
- `GET /api/v1/health` - Health check |
|
|
|
|
|
### Bias Detection HITL (Human-in-the-Loop) |
|
|
- `POST /api/v1/bias-detection-hitl/detect` - Detect bias with HITL workflow |
|
|
- `POST /api/v1/bias-detection-hitl/approve` - Approve bias detection results |
|
|
- `POST /api/v1/bias-detection-hitl/regenerate` - Regenerate debiased suggestions |
|
|
- `POST /api/v1/bias-detection-hitl/generate-pdf` - Generate PDF report |
|
|
|
|
|
### PDF Processing (Utility) |
|
|
- `POST /api/v1/process-pdf` - Extract text from PDF |
|
|
- `POST /api/v1/process-pdf-to-bias` - Extract PDF and detect bias |
|
|
- `GET /api/v1/pdf-health` - Health check |
|
|
|
|
|
### System |
|
|
- `GET /` - API welcome message |
|
|
- `GET /health` - System health check |
|
|
|
|
|
Full API documentation: `http://localhost:8000/docs` (when server is running) |
|
|
|
|
|
## π¨ Frontend Features |
|
|
|
|
|
- **Dashboard**: Overview of all modules |
|
|
- **Chatbot**: Interactive law explanation interface |
|
|
- **Letter Generator**: Step-by-step letter creation wizard |
|
|
- **Bias Checker**: Upload documents or paste text for analysis |
|
|
- **User Profile**: User account management |
|
|
- **Responsive Design**: Works on desktop and mobile |
|
|
|
|
|
## π§ͺ Testing |
|
|
|
|
|
### Test Module A (Law Explanation) |
|
|
```bash |
|
|
python -m module_a.test_rag |
|
|
``` |
|
|
|
|
|
### Test Module C (Letter Generation) |
|
|
```bash |
|
|
python -m module_c.test_generation |
|
|
python -m module_c.test_interactive |
|
|
``` |
|
|
|
|
|
### Test PDF Processing |
|
|
```bash |
|
|
python -m utility.test_pdf_processor |
|
|
``` |
|
|
|
|
|
### Test API Endpoints |
|
|
```bash |
|
|
python -m api.test_api |
|
|
``` |
|
|
|
|
|
## π Configuration |
|
|
|
|
|
### Environment Variables (.env) |
|
|
```bash |
|
|
# Required |
|
|
MISTRAL_API_KEY=your_api_key_here |
|
|
|
|
|
# Optional - MongoDB (if using Auth Backend) |
|
|
# MONGODB_URL=mongodb://localhost:27017 |
|
|
# SECRET_KEY=your_secret_key |
|
|
``` |
|
|
|
|
|
### Module Configurations |
|
|
- **Module A**: [module_a/config.py](module_a/config.py) |
|
|
- **Module C**: [module_c/config.py](module_c/config.py) |
|
|
|
|
|
## π Troubleshooting |
|
|
|
|
|
### Backend Issues |
|
|
- **Import errors**: Make sure virtual environment is activated |
|
|
- **Vector DB empty**: Run the build scripts for modules A & C |
|
|
- **API key errors**: Check `.env` file has valid `MISTRAL_API_KEY` |
|
|
|
|
|
### Frontend Issues |
|
|
- **Port 3000 in use**: Change port with `pnpm dev -- -p 3001` |
|
|
- **Module not found**: Run `pnpm install` in Frontend directory |
|
|
- **API connection failed**: Ensure backend is running on port 8000 |
|
|
|
|
|
### Common Errors |
|
|
```bash |
|
|
# Reinstall dependencies |
|
|
pip install --upgrade -r requirements.txt |
|
|
|
|
|
# Rebuild vector databases |
|
|
python -m module_a.build_vector_db |
|
|
python -m module_c.indexer |
|
|
|
|
|
# Clear pnpm cache |
|
|
cd Frontend |
|
|
pnpm store prune |
|
|
pnpm install |
|
|
``` |
|
|
|
|
|
## π Documentation |
|
|
|
|
|
- [Module A Documentation](module_a/README.md) - Law Explanation RAG Pipeline |
|
|
- [Module C Documentation](module_c/README.md) - Letter Generation |
|
|
- [PDF Processing Guide](utility/README.md) - PDF text extraction |
|
|
- [Implementation Guides](docs/) - Detailed implementation workflows |
|
|
|
|
|
--- |
|
|
|
|
|
> This project is under development as part of a hackathon. |
|
|
|