---
title: Lega.AI
emoji: ⚖️
colorFrom: pink
colorTo: indigo
sdk: docker
pinned: false
---

# Lega.AI

AI-powered legal document analysis and simplification platform that makes complex legal documents accessible to everyone.

![Python](https://img.shields.io/badge/Python-3.13+-blue.svg)
![Streamlit](https://img.shields.io/badge/Streamlit-1.49+-red.svg)
![LangChain](https://img.shields.io/badge/LangChain-0.3+-green.svg)
![License](https://img.shields.io/badge/License-MIT-yellow.svg)

## 📋 Table of Contents

- [🚀 Features](#-features)
- [🛠️ Tech Stack](#️-tech-stack)
- [📋 Prerequisites](#-prerequisites)
- [🚀 Quick Start](#-quick-start)
- [🐳 Docker Deployment](#-docker-deployment)
- [📁 Project Structure](#-project-structure)
- [🎯 Usage Guide](#-usage-guide)
- [📄 Sample Documents](#-sample-documents)
- [🚨 Document Types Supported](#-document-types-supported)
- [⚡ Key Features Deep Dive](#-key-features-deep-dive)
- [🔧 Configuration Options](#-configuration-options)
- [🔒 Privacy & Security](#-privacy--security)
- [🤝 Contributing](#-contributing)
- [📄 License](#-license)
- [🆘 Support](#-support)

## 🚀 Features

- **🔍 Advanced Document Analysis**: Upload PDF/DOCX/TXT files and get comprehensive AI-powered analysis using Google's Gemini
- **📝 Plain Language Translation**: Convert complex legal jargon into clear, understandable language with context-aware explanations
- **⚠️ Intelligent Risk Assessment**: Multi-dimensional risk scoring with color-coded severity levels and detailed explanations
- **💬 Interactive Q&A Assistant**: Ask specific questions about your documents and get instant, context-aware AI responses
- **🎯 Smart Clause Highlighting**: Visual highlighting of risky clauses with interactive tooltips and improvement suggestions
- **📊 Vector-Powered Similarity Search**: Find similar clauses across documents using Chroma vector database
- **📚 Persistent Document Library**: Organize, search, and manage all analyzed documents with metadata
- **⚠️ Risk Visualization**: Interactive charts and gauges showing risk distribution and severity
- **🗓️ Key Information Extraction**: Automatically identify important dates, deadlines, and financial terms
- **💾 Local Data Persistence**: Secure local storage of analysis results and vector embeddings
- **🎨 Modern UI/UX**: Responsive Streamlit interface with custom CSS and intuitive navigation

## 🛠️ Tech Stack

- **Frontend**: Streamlit with multi-page navigation and custom CSS styling
- **AI/ML**: LangChain + Google Generative AI (Gemini Pro)
- **Embeddings**: Google Generative AI Embeddings (models/text-embedding-004)
- **Vector Store**: Chroma for document similarity search and persistence
- **Document Processing**: PyPDF for PDF extraction, python-docx for Word documents
- **Package Management**: UV (modern Python package manager)
- **Configuration**: Python-dotenv for environment management
- **Visualization**: Plotly for interactive charts and analytics
- **UI Components**: Streamlit-option-menu for enhanced navigation

## 📋 Prerequisites

- Python 3.13 or newer (the minimum version this project targets)
- Google AI API key (get from [Google AI Studio](https://aistudio.google.com/))
- UV package manager (recommended for fast, reliable dependency management)

## 🚀 Quick Start

### 1. **Clone and navigate to the project**:

```bash
git clone <repository-url>
cd Lega.AI
```

### 2. **Install UV (if not already installed)**:

```bash
# On macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows (PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or using pip
pip install uv
```

### 3. **Set up environment and install dependencies**:

```bash
# Create the virtual environment (.venv) and install all dependencies
uv sync

# Or if you prefer traditional approach:
# uv venv
# source .venv/bin/activate  # On Windows: .venv\Scripts\activate
# uv pip install -r pyproject.toml
```

### 4. **Configure environment**:

```bash
# Copy the template file
cp .env.example .env

# Edit .env: set the required API key and, optionally, override any defaults
```

**Required Configuration:**

```env
# Get your API key from: https://aistudio.google.com/
GOOGLE_API_KEY=your-google-api-key-here
```

**Optional Configuration (with sensible defaults):**

```env
# Application Settings
DEBUG=True
LOG_LEVEL=INFO
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=localhost

# File Upload Settings
MAX_FILE_SIZE_MB=10
SUPPORTED_FILE_TYPES=pdf,docx,txt

# AI Model Settings
TEMPERATURE=0.2
MAX_TOKENS=2048
EMBEDDING_MODEL=models/text-embedding-004

# Storage Configuration
CHROMA_PERSIST_DIRECTORY=./data/chroma_db
UPLOAD_DIR=./uploads
DATA_DIR=./data
LOG_FILE=./data/app.log

# Security Settings
SECRET_KEY=your-secret-key-here
SESSION_TIMEOUT_MINUTES=60
```

### 5. **Run the application**:

```bash
# If using UV (recommended)
uv run streamlit run main.py

# Or with activated virtual environment
streamlit run main.py
```

### 6. **Open your browser** to `http://localhost:8501`

### 🎯 Try the Demo

Once running, you can immediately test the application with the included sample documents:

- Navigate to the **📄 Upload** page
- Try the included samples: an employment offer letter, a mutual NDA, a residential lease agreement, and a master services agreement
- Experience the full analysis workflow without needing your own documents

## 🐳 Docker Deployment

### Local Docker Deployment

```bash
# Build the Docker image
docker build -t lega-ai .

# Run the container (the app is then served at http://localhost:7860)
docker run -p 7860:7860 -e GOOGLE_API_KEY=your_api_key_here lega-ai
```

### Hugging Face Spaces Deployment

Lega.AI can be deployed to Hugging Face Spaces using the Docker SDK (the front matter at the top of this README already sets `sdk: docker`).

[![Deploy to Hugging Face Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/deploy-to-spaces-md.svg)](https://huggingface.co/spaces)

**Quick Setup:**

1. Create a new [Hugging Face Space](https://huggingface.co/spaces) with SDK: Docker
2. Upload this repository to your Space
3. Set `GOOGLE_API_KEY` in Space Settings → Variables
4. Your app will be live at `https://huggingface.co/spaces/[username]/[space-name]`

📋 **Detailed Instructions**: See [HUGGINGFACE_DEPLOYMENT.md](./HUGGINGFACE_DEPLOYMENT.md) for complete setup guide.

## 📁 Project Structure

```
Lega.AI/
├── main.py                 # Main Streamlit application entry point
├── pyproject.toml          # UV/pip package configuration and dependencies
├── requirements.txt        # Docker-compatible requirements file
├── uv.lock                 # UV lockfile for reproducible builds
├── setup.py                # Legacy Python package setup
├── Dockerfile              # Docker container configuration
├── .dockerignore          # Docker build optimization
├── start.sh               # Hugging Face Spaces startup script
├── .env.example           # Environment variables template
├── .env.hf                # Hugging Face Spaces configuration
├── README.md              # Project documentation
├── HUGGINGFACE_DEPLOYMENT.md # HF Spaces deployment guide
├── src/                   # Main application source code
│   ├── __init__.py
│   ├── models/
│   │   ├── __init__.py
│   │   └── document.py    # Document data models and schemas
│   ├── services/
│   │   ├── __init__.py
│   │   ├── document_processor.py  # PDF/DOCX text extraction
│   │   ├── ai_analyzer.py         # AI analysis and risk assessment
│   │   └── vector_store.py        # Chroma vector database management
│   ├── pages/
│   │   ├── __init__.py
│   │   ├── upload.py      # Document upload interface
│   │   ├── analysis.py    # Document analysis dashboard
│   │   ├── qa_assistant.py # Interactive Q&A chat interface
│   │   ├── library.py     # Document library management
│   │   └── settings.py    # Application settings and configuration
│   └── utils/
│       ├── __init__.py
│       ├── config.py      # Environment configuration management
│       ├── logger.py      # Logging utilities and setup
│       └── helpers.py     # Common helper functions
├── sample/                # Sample legal documents for testing
│   ├── Employment_Offer_Letter.pdf
│   ├── Master_Services_Agreement.pdf
│   ├── Mutual_NDA.pdf
│   └── Residential_Lease_Agreement.pdf
├── data/                  # Local data storage and persistence
│   ├── app.log           # Application logs
│   └── chroma_db/        # Vector database storage
└── uploads/              # Temporary file uploads directory
```

## 🎯 Usage Guide

### 1. Document Upload & Processing

- Navigate to **📄 Upload** page
- Upload PDF, DOCX, or TXT files (max 10 MB per file by default; configurable via `MAX_FILE_SIZE_MB`)
- Try the included sample documents for immediate testing
- Automatic document type detection and text extraction
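The upload limits come from `MAX_FILE_SIZE_MB` and `SUPPORTED_FILE_TYPES`. A minimal sketch of such a server-side check, assuming a hypothetical `validate_upload` helper (the project's actual validation code is not shown in this README and may differ):

```python
import os
from pathlib import Path


def allowed_extensions() -> set[str]:
    # Parse the comma-separated list (e.g. "pdf,docx,txt"), defaulting to the documented value
    raw = os.getenv("SUPPORTED_FILE_TYPES", "pdf,docx,txt")
    return {ext.strip().lower().lstrip(".") for ext in raw.split(",") if ext.strip()}


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Hypothetical helper: return validation errors (an empty list means the file is accepted)."""
    errors = []
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in allowed_extensions():
        errors.append(f"Unsupported file type: .{ext or '?'}")
    max_mb = float(os.getenv("MAX_FILE_SIZE_MB", "10"))
    if size_bytes > max_mb * 1024 * 1024:
        errors.append(f"File exceeds {max_mb:g} MB limit")
    if size_bytes == 0:
        errors.append("File is empty")
    return errors


if __name__ == "__main__":
    print(validate_upload("Mutual_NDA.pdf", 250_000))  # []
    print(validate_upload("scan.png", 250_000))        # ['Unsupported file type: .png']
```

Streamlit's `st.file_uploader` also accepts a `type` list that restricts the file picker, and Streamlit applies its own `server.maxUploadSize` setting (in MB); a check like this one applies the values from `.env` regardless of how the file arrived.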

### 2. Comprehensive Analysis Dashboard

Visit **📊 Analysis** to explore:

- **Risk Score Gauge**: Interactive 0-100 risk assessment with color coding
- **Side-by-Side Comparison**: Original text vs. simplified plain language
- **Risk Factor Breakdown**: Detailed explanations of identified risks with severity levels
- **Interactive Clause Highlighting**: Hover over highlighted text for tooltips with suggestions
- **Financial & Date Extraction**: Automatic identification of monetary amounts and key dates
- **Risk Visualization Charts**: Visual distribution of risk categories and severity

### 3. Interactive Q&A Assistant

- Use **💬 Q&A** for document-specific questions and analysis
- Get context-aware answers powered by vector similarity search
- Access suggested questions based on document type and content
- Chat history preservation for reference and record-keeping

### 4. Document Library Management

- **📚 Library** provides persistent storage of all analyzed documents
- Advanced filtering by document type, risk level, upload date
- Full-text search across document content and analysis results
- Quick re-analysis and direct access to Q&A for stored documents
- Document metadata and analysis summary views
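A minimal sketch of library filtering and search, assuming a hypothetical `LibraryEntry` record with document type, risk level, upload date, and extracted text (the real schema in `src/models/document.py` is not shown here):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LibraryEntry:
    # Hypothetical record shape, for illustration only
    name: str
    doc_type: str      # e.g. "NDA", "Lease", "Employment"
    risk_level: str    # e.g. "Low", "Medium", "High", "Critical"
    uploaded: date
    text: str


def filter_library(
    entries: list[LibraryEntry],
    doc_type: Optional[str] = None,
    risk_levels: Optional[set[str]] = None,
    since: Optional[date] = None,
    query: str = "",
) -> list[LibraryEntry]:
    """Apply the optional filters, then a case-insensitive substring search; newest first."""
    q = query.strip().casefold()
    hits = []
    for e in entries:
        if doc_type and e.doc_type != doc_type:
            continue
        if risk_levels and e.risk_level not in risk_levels:
            continue
        if since and e.uploaded < since:
            continue
        if q and q not in e.name.casefold() and q not in e.text.casefold():
            continue
        hits.append(e)
    return sorted(hits, key=lambda e: e.uploaded, reverse=True)
```

Substring matching is the simplest form of full-text search; the Chroma vector store could back a semantic search over the same records instead.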

### 5. Settings & Configuration

- **⚙️ Settings** for API key management and validation
- Application configuration and performance monitoring
- Usage statistics and system health information

## 🔧 Configuration Options

The application uses environment variables for configuration. All settings can be customized in the `.env` file based on the `.env.example` template.

### 🔑 Required Settings

| Variable         | Description                      | Example                                     |
| ---------------- | -------------------------------- | ------------------------------------------- |
| `GOOGLE_API_KEY` | Google AI API key for Gemini Pro | `your-google-api-key-here` (from AI Studio) |

### ⚙️ Application Settings

| Variable                   | Default        | Description                        |
| -------------------------- | -------------- | ---------------------------------- |
| `DEBUG`                    | `True`         | Enable debug mode and verbose logs |
| `LOG_LEVEL`                | `INFO`         | Logging level (DEBUG/INFO/WARNING) |
| `STREAMLIT_SERVER_PORT`    | `8501`         | Port for Streamlit server          |
| `STREAMLIT_SERVER_ADDRESS` | `localhost`    | Server address binding             |
| `MAX_FILE_SIZE_MB`         | `10`           | Maximum upload file size (MB)      |
| `SUPPORTED_FILE_TYPES`     | `pdf,docx,txt` | Allowed file extensions            |

### 🤖 AI Model Settings

| Variable          | Default                     | Description                                       |
| ----------------- | --------------------------- | ------------------------------------------------- |
| `TEMPERATURE`     | `0.2`                       | Sampling temperature (lower = more deterministic) |
| `MAX_TOKENS`      | `2048`                      | Maximum number of tokens per model response       |
| `EMBEDDING_MODEL` | `models/text-embedding-004` | Google AI embedding model                         |

### 💾 Storage Configuration

| Variable                   | Default            | Description                  |
| -------------------------- | ------------------ | ---------------------------- |
| `CHROMA_PERSIST_DIRECTORY` | `./data/chroma_db` | Vector database storage path |
| `UPLOAD_DIR`               | `./uploads`        | Temporary file uploads       |
| `DATA_DIR`                 | `./data`           | Application data directory   |
| `LOG_FILE`                 | `./data/app.log`   | Application log file path    |

### 🔒 Security Settings

| Variable                  | Default | Description                |
| ------------------------- | ------- | -------------------------- |
| `SECRET_KEY`              | None    | Application secret key     |
| `SESSION_TIMEOUT_MINUTES` | `60`    | Session timeout in minutes |

### Example `.env` configuration

```env
# Required
GOOGLE_API_KEY=your-google-ai-api-key

# Optional (with defaults shown)
DEBUG=True
LOG_LEVEL=INFO
MAX_FILE_SIZE_MB=10
SUPPORTED_FILE_TYPES=pdf,docx,txt
CHROMA_PERSIST_DIRECTORY=./data/chroma_db
TEMPERATURE=0.2
```

## 📄 Sample Documents

The project includes sample legal documents for testing and demonstration:

| Document Type                | Filename                          | Purpose                                  |
| ---------------------------- | --------------------------------- | ---------------------------------------- |
| **Employment Contract**      | `Employment_Offer_Letter.pdf`     | Test employment-related clause analysis  |
| **Service Agreement**        | `Master_Services_Agreement.pdf`   | Demonstrate commercial contract analysis |
| **Non-Disclosure Agreement** | `Mutual_NDA.pdf`                  | Show confidentiality clause assessment   |
| **Lease Agreement**          | `Residential_Lease_Agreement.pdf` | Test rental/property contract analysis   |

These documents are located in the `sample/` directory and can be uploaded directly through the application to:

- Experience the complete analysis workflow
- Test different document types and complexity levels
- Understand risk assessment capabilities
- Explore Q&A functionality with real legal content

## 🚨 Document Types Supported

Currently optimized for:

- **🏠 Rental/Lease Agreements**
- **💰 Loan Contracts**
- **💼 Employment Contracts**
- **🤝 Service Agreements**
- **🔒 Non-Disclosure Agreements (NDAs)**
- **📄 General Legal Documents**

## ⚡ Key Features Deep Dive

### 🔍 Advanced Risk Assessment Engine

- **Multi-dimensional Analysis**: Evaluates financial, legal commitment, and rights-related risks
- **Intelligent Severity Classification**: Categorizes risks as Low, Medium, High, or Critical
- **Contextual Risk Scoring**: Dynamic 0-100 scale based on document type and complexity
- **Actionable Recommendations**: Specific suggestions for improving problematic clauses
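This README does not document the exact scoring formula or the thresholds between severity levels. As an illustration only, here is a sketch that combines per-dimension scores with a hypothetical weighted average and maps the 0-100 result to the four severity levels using hypothetical cut-offs of 25, 50, and 75:

```python
from enum import Enum


class Severity(str, Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"
    CRITICAL = "Critical"


# Hypothetical cut-offs: each band covers scores below its upper bound; 75 and above is Critical
BANDS = [(25, Severity.LOW, "green"), (50, Severity.MEDIUM, "gold"), (75, Severity.HIGH, "orange")]


def classify(score: float) -> tuple[Severity, str]:
    """Map a 0-100 risk score to a severity level and a gauge color."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    for upper, severity, color in BANDS:
        if score < upper:
            return severity, color
    return Severity.CRITICAL, "red"


def overall_score(factor_scores: list[float], weights: list[float]) -> float:
    """Hypothetical weighted average of per-dimension scores (e.g. financial, commitment, rights)."""
    if len(factor_scores) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one weight per factor, with a positive total")
    return sum(s * w for s, w in zip(factor_scores, weights)) / sum(weights)


if __name__ == "__main__":
    score = overall_score([80, 40, 60], [0.5, 0.25, 0.25])
    print(score, classify(score)[0].value)  # 65.0 High
```

The returned color could then drive the `steps` of a Plotly `go.Indicator` gauge (`mode="gauge+number"`) like the one on the Analysis page.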

### 📝 AI-Powered Plain Language Translation

- **Context-Aware Simplification**: Aims to improve readability while preserving legal meaning
- **Jargon Definition System**: Interactive tooltips for complex legal terms
- **Document Type Optimization**: Tailored simplification based on contract category
- **Preservation of Legal Intent**: Aims to keep the original obligations and conditions intact in the simplified text
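Tailoring by document type can be as simple as varying the instructions sent to the model. A sketch of a prompt builder, where the `FOCUS` mapping and all wording are hypothetical and not the project's actual prompts:

```python
# Hypothetical per-document-type emphasis for the simplification prompt
FOCUS = {
    "lease": "rent, deposits, repairs, early termination, and renewal",
    "employment": "compensation, non-compete and confidentiality duties, and termination",
    "nda": "what counts as confidential, exclusions, duration, and remedies",
    "loan": "interest, fees, repayment schedule, default, and collateral",
    "service": "scope, payment terms, liability caps, indemnities, and termination",
}


def build_simplify_prompt(clause: str, doc_type: str = "general") -> str:
    """Return a plain-language rewrite prompt tailored to the document type."""
    focus = FOCUS.get(doc_type.lower(), "obligations, rights, deadlines, and costs")
    lines = [
        f"You are explaining a {doc_type} document to someone without legal training.",
        "Rewrite the clause below in plain English.",
        "Keep every obligation, condition, amount, and deadline; do not add or drop terms.",
        f"Pay particular attention to: {focus}.",
        "If you keep a legal term, define it briefly in parentheses.",
        "",
        "Clause:",
        '"""',
        clause.strip(),
        '"""',
    ]
    return "\n".join(lines)
```

The resulting string could be passed to a LangChain chat model, for example `ChatGoogleGenerativeAI(model=..., temperature=0.2).invoke(prompt)` from `langchain_google_genai`, whose returned message's `.content` holds the simplified text; the model name is deliberately left unspecified here.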

### 🎯 Interactive Clause Analysis

- **Smart Highlighting System**: Visual identification of risky and important clauses
- **Hover Tooltips**: Immediate access to explanations and suggestions
- **Clause Categorization**: Organized by risk type and legal significance
- **Improvement Suggestions**: Specific recommendations for clause modifications
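Hover tooltips of this kind can be produced by wrapping each flagged clause in an HTML element with a `title` attribute and rendering the result with `st.markdown(html_text, unsafe_allow_html=True)`. A sketch, assuming flagged clauses arrive as hypothetical non-overlapping `(start, end, severity, tip)` character spans; the project's actual highlighting code is not shown in this README.

```python
import html

# Hypothetical background colors per severity level
COLORS = {"Low": "#d4edda", "Medium": "#fff3cd", "High": "#ffe0b2", "Critical": "#f8d7da"}


def highlight(text: str, spans: list[tuple[int, int, str, str]]) -> str:
    """Wrap each (start, end, severity, tip) character span in a <span> with a hover tooltip.

    Spans must not overlap. All document text is HTML-escaped, because uploaded
    content is untrusted and will be rendered as HTML.
    """
    parts, pos = [], 0
    for start, end, severity, tip in sorted(spans):
        if start < pos or start >= end or end > len(text):
            raise ValueError(f"invalid or overlapping span {(start, end)}")
        parts.append(html.escape(text[pos:start]))  # unflagged text before the span
        color = COLORS.get(severity, "#eeeeee")
        title = html.escape(f"{severity}: {tip}", quote=True)
        parts.append(f'<span style="background:{color}" title="{title}">{html.escape(text[start:end])}</span>')
        pos = end
    parts.append(html.escape(text[pos:]))  # trailing unflagged text
    return "".join(parts)
```

Escaping matters here: without it, a contract containing `<script>` or stray angle brackets would be interpreted as markup once `unsafe_allow_html=True` is set.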

### 🔍 Vector-Powered Document Intelligence

- **Semantic Search**: Find similar clauses across your document library
- **Context-Aware Q&A**: Answers grounded in actual document content
- **Document Similarity**: Compare clauses against known patterns and standards
- **Persistent Knowledge Base**: Chroma vector database for fast, accurate retrieval
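In the app, retrieval is handled by Chroma with Google embeddings. To show the underlying idea without a database or API key, here is a sketch of cosine-similarity top-k retrieval over precomputed chunk vectors, followed by a prompt that grounds the answer in the retrieved excerpts. The toy vectors stand in for real embeddings, and the function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 3) -> list[tuple[float, str]]:
    """Rank (text, vector) chunks by similarity to the query vector and keep the best k."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def grounded_prompt(question: str, hits: list[tuple[float, str]]) -> str:
    """Build a prompt that asks the model to answer only from the retrieved excerpts."""
    context = "\n\n".join(f"[{i}] {text}" for i, (_, text) in enumerate(hits, start=1))
    return (
        "Answer using only the numbered excerpts below. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    chunks = [
        ("Rent is due on the 1st.", [1.0, 0.0, 0.0]),
        ("Deposit is refundable.", [0.0, 1.0, 0.0]),
        ("Late fee of $50.", [0.9, 0.1, 0.0]),
    ]
    print(grounded_prompt("When is rent due?", top_k([1.0, 0.0, 0.0], chunks, k=2)))
```

With LangChain's Chroma integration, the retrieval step corresponds roughly to calling `similarity_search(question, k=...)` on the vector store, which embeds the question and queries the persisted collection.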

### 📊 Advanced Visualization & Analytics

- **Interactive Risk Gauges**: Real-time visual risk assessment
- **Risk Distribution Charts**: Breakdown of risk categories and severity
- **Financial Terms Extraction**: Automatic identification of monetary obligations
- **Timeline Analysis**: Key dates and deadline extraction with visualization
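This README does not say whether financial terms and dates are extracted by the model or by pattern matching. As a lightweight illustration (not necessarily the project's method), here is a regex sketch that pulls US-dollar amounts and two common date formats out of contract text. Real contracts use many more formats, so an LLM or a dedicated date parser would be more robust.

```python
import re
from datetime import date, datetime

# Dollar amounts such as "$1,500.00", "$2500", or "$ 50"
MONEY = re.compile(r"\$\s?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?")
MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"
# "January 1, 2025"-style dates and US-style numeric dates such as "01/15/2025"
LONG_DATE = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")
NUMERIC_DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def extract_terms(text: str) -> dict[str, list]:
    """Return dollar amounts (in order of appearance) and parsed dates (sorted)."""
    amounts = [float(re.sub(r"[$,\s]", "", m.group())) for m in MONEY.finditer(text)]
    dates: list[date] = []
    for pattern, fmt in ((LONG_DATE, "%B %d, %Y"), (NUMERIC_DATE, "%m/%d/%Y")):
        for m in pattern.finditer(text):
            try:
                dates.append(datetime.strptime(m.group(), fmt).date())
            except ValueError:
                pass  # e.g. "13/45/2025" matches the pattern but is not a real date
    return {"amounts": amounts, "dates": sorted(dates)}
```

Numeric dates are read in US month/day order here; documents using day/month order would need a different format string.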

### 💾 Local Data Management

- **Local Data Persistence**: Secure storage of documents and analysis results
- **Document Library**: Organized management with search and filtering
- **Analysis History**: Complete audit trail of document processing
- **Metadata Extraction**: Automatic tagging and categorization

## 🔒 Privacy & Security

### 🛡️ Data Protection

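- **Local Processing**: Text extraction, storage, and search run locally; document text is sent to the Google AI (Gemini) API over HTTPS for analysis and embedding
- **No Additional Data Sharing**: Apart from those Google AI API calls, documents and results are not shared with third parties or stored outside your environment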
- **Secure Storage**: Vector embeddings and analysis results stored locally in Chroma database
- **Environment Security**: API keys managed through secure environment variables

### 🔐 Security Best Practices

- **API Key Protection**: Secure credential management with environment-based configuration
- **Local Vector Storage**: Document embeddings stored exclusively on your local system
- **Session Management**: Configurable session timeouts and secure state management
- **Input Validation**: Comprehensive file type and size validation for uploads

### 📋 Data Handling

- **Temporary Upload Storage**: Uploaded files processed and optionally removed from temp storage
- **Persistent Analysis**: Analysis results retained locally for document library functionality
- **User Control**: Complete control over data retention and deletion
- **Audit Trail**: Transparent logging of all document processing activities
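One way to keep uploaded files from lingering in `UPLOAD_DIR` is to write each upload to a temporary file and delete it once text extraction finishes, even if extraction fails. A sketch, where `process_upload` is hypothetical, `extract` stands in for whatever extraction function is used, and `keep` mirrors the "optionally removed" behavior described above:

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Optional


def process_upload(
    data: bytes,
    filename: str,
    extract: Callable[[Path], str],
    upload_dir: Optional[str] = None,
    keep: bool = False,
) -> str:
    """Save bytes to a temp file, run `extract` on it, and delete the file unless keep=True."""
    target_dir = Path(upload_dir or os.getenv("UPLOAD_DIR", "./uploads"))
    target_dir.mkdir(parents=True, exist_ok=True)
    # Keep the original extension so extension-based type detection still works
    fd, tmp_name = tempfile.mkstemp(suffix=Path(filename).suffix, dir=str(target_dir))
    tmp_path = Path(tmp_name)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        return extract(tmp_path)
    finally:
        if not keep:
            tmp_path.unlink(missing_ok=True)  # runs even if extraction raised
```

In Streamlit, the object returned by `st.file_uploader` exposes `.name` and `.getvalue()`, so a call might look like `process_upload(uploaded.getvalue(), uploaded.name, extract_text)`, where `extract_text` is a hypothetical extraction function.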

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request

## 📄 License

MIT License - see LICENSE file for details.

## 🆘 Support

### 📚 Documentation & Resources

- **In-Code Documentation**: Comprehensive docstrings and code comments throughout the project
- **Configuration Guide**: Detailed environment setup and configuration options above
- **Sample Documents**: Use included sample contracts to understand features and capabilities

### 🐛 Issues & Bug Reports

- **GitHub Issues**: Report bugs, request features, or ask questions via [GitHub Issues](https://github.com/your-repo/Lega.AI/issues)
- **Bug Reports**: Include system info, error logs, and steps to reproduce
- **Feature Requests**: Describe use cases and expected functionality

### 🛠️ Development & API References

- **Google AI Documentation**: [Google AI Developer Guide](https://ai.google.dev/) for Gemini API details
- **LangChain Documentation**: [LangChain Docs](https://python.langchain.com/) for framework reference
- **Streamlit Documentation**: [Streamlit Docs](https://docs.streamlit.io/) for UI framework guidance
- **Chroma Documentation**: [Chroma Docs](https://docs.trychroma.com/) for vector database operations

### 💡 Getting Help

1. **Check Documentation**: Review this README and in-code comments first
2. **Try Sample Documents**: Use provided samples to test functionality
3. **Check Logs**: Review `data/app.log` for detailed error information
4. **Environment Issues**: Verify `.env` configuration and API key validity
5. **Community Support**: Start a thread in GitHub Discussions for general questions

---

**Made with ❤️ using Streamlit, LangChain, and Google AI**