--- title: Lega.AI emoji: ⚖️ colorFrom: pink colorTo: indigo sdk: docker pinned: false --- # Lega.AI AI-powered legal document analysis and simplification platform that makes complex legal documents accessible to everyone. ![Python](https://img.shields.io/badge/Python-3.13+-blue.svg) ![Streamlit](https://img.shields.io/badge/Streamlit-1.49+-red.svg) ![LangChain](https://img.shields.io/badge/LangChain-0.3+-green.svg) ![License](https://img.shields.io/badge/License-MIT-yellow.svg) ## 📋 Table of Contents - [🚀 Features](#-features) - [🛠️ Tech Stack](#️-tech-stack) - [📋 Prerequisites](#-prerequisites) - [🚀 Quick Start](#-quick-start) - [🐳 Docker Deployment](#-docker-deployment) - [📁 Project Structure](#-project-structure) - [🎯 Usage Guide](#-usage-guide) - [📄 Sample Documents](#-sample-documents) - [🚨 Document Types Supported](#-document-types-supported) - [⚡ Key Features Deep Dive](#-key-features-deep-dive) - [🔧 Configuration Options](#-configuration-options) - [🔒 Privacy & Security](#-privacy--security) - [🤝 Contributing](#-contributing) - [🆘 Support](#-support) - [🎯 Roadmap](#-roadmap) ## 🚀 Features - **🔍 Advanced Document Analysis**: Upload PDF/DOCX/TXT files and get comprehensive AI-powered analysis using Google's Gemini - **📝 Plain Language Translation**: Convert complex legal jargon into clear, understandable language with context-aware explanations - **⚠️ Intelligent Risk Assessment**: Multi-dimensional risk scoring with color-coded severity levels and detailed explanations - **💬 Interactive Q&A Assistant**: Ask specific questions about your documents and get instant, context-aware AI responses - **🎯 Smart Clause Highlighting**: Visual highlighting of risky clauses with interactive tooltips and improvement suggestions - **📊 Vector-Powered Similarity Search**: Find similar clauses across documents using Chroma vector database - **📚 Persistent Document Library**: Organize, search, and manage all analyzed documents with metadata - **⚠️ Risk Visualization**: Interactive charts and gauges showing risk distribution and severity - **🗓️ Key Information Extraction**: Automatically identify important dates, deadlines, and financial terms - **💾 Local Data Persistence**: Secure local storage of analysis results and vector embeddings - **🎨 Modern UI/UX**: Responsive Streamlit interface with custom CSS and intuitive navigation ## 🛠️ Tech Stack - **Frontend**: Streamlit with multi-page navigation and custom CSS styling - **AI/ML**: LangChain + Google Generative AI (Gemini Pro) - **Embeddings**: Google Generative AI Embeddings (models/text-embedding-004) - **Vector Store**: Chroma for document similarity search and persistence - **Document Processing**: PyPDF for PDF extraction, python-docx for Word documents - **Package Management**: UV (modern Python package manager) - **Configuration**: Python-dotenv for environment management - **Visualization**: Plotly for interactive charts and analytics - **UI Components**: Streamlit-option-menu for enhanced navigation ## 📋 Prerequisites - Python 3.13+ (required for latest features and performance) - Google AI API key (get from [Google AI Studio](https://aistudio.google.com/)) - UV package manager (recommended for fast, reliable dependency management) ## 🚀 Quick Start ### 1. **Clone and navigate to the project**: ```bash git clone cd Lega.AI ``` ### 2. **Install UV (if not already installed)**: ```bash # On macOS/Linux curl -LsSf https://astral.sh/uv/install.sh | sh # On Windows (PowerShell) powershell -c "irm https://astral.sh/uv/install.ps1 | iex" # Or using pip pip install uv ``` ### 3. **Set up environment and install dependencies**: ```bash # Create and activate virtual environment with dependencies uv sync # Or if you prefer traditional approach: # uv venv # source .venv/bin/activate # On Windows: .venv\Scripts\activate # uv pip install -r pyproject.toml ``` ### 4. **Configure environment**: ```bash # Copy the template file cp .env.example .env # Edit .env file and update the following required settings: ``` **Required Configuration:** ```env # Get your API key from: https://aistudio.google.com/ GOOGLE_API_KEY=your-google-api-key-here ``` **Optional Configuration (with sensible defaults):** ```env # Application Settings DEBUG=True LOG_LEVEL=INFO STREAMLIT_SERVER_PORT=8501 STREAMLIT_SERVER_ADDRESS=localhost # File Upload Settings MAX_FILE_SIZE_MB=10 SUPPORTED_FILE_TYPES=pdf,docx,txt # AI Model Settings TEMPERATURE=0.2 MAX_TOKENS=2048 EMBEDDING_MODEL=models/text-embedding-004 # Storage Configuration CHROMA_PERSIST_DIRECTORY=./data/chroma_db UPLOAD_DIR=./uploads DATA_DIR=./data LOG_FILE=./data/app.log # Security Settings SECRET_KEY=your-secret-key-here SESSION_TIMEOUT_MINUTES=60 ``` ### 5. **Run the application**: ```bash # If using UV (recommended) uv run streamlit run main.py # Or with activated virtual environment streamlit run main.py ``` ### 6. **Open your browser** to `http://localhost:8501` ### 🎯 Try the Demo Once running, you can immediately test the application with the included sample documents: - Navigate to **📄 Upload** page - Try the sample documents: Employment contracts, NDAs, Lease agreements, Service agreements - Experience the full analysis workflow without needing your own documents ## 🐳 Docker Deployment ### Local Docker Deployment ```bash # Build the Docker image docker build -t lega-ai . # Run the container docker run -p 7860:7860 -e GOOGLE_API_KEY=your_api_key_here lega-ai ``` ### Hugging Face Spaces Deployment Deploy Lega.AI to Hugging Face Spaces with one click! [![Deploy to Hugging Face Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/deploy-to-spaces-md.svg)](https://huggingface.co/spaces) **Quick Setup:** 1. Create a new [Hugging Face Space](https://huggingface.co/spaces) with SDK: Docker 2. Upload this repository to your Space 3. Set `GOOGLE_API_KEY` in Space Settings → Variables 4. Your app will be live at `https://huggingface.co/spaces/[username]/[space-name]` 📋 **Detailed Instructions**: See [HUGGINGFACE_DEPLOYMENT.md](./HUGGINGFACE_DEPLOYMENT.md) for complete setup guide. ## 📁 Project Structure ``` Lega.AI/ ├── main.py # Main Streamlit application entry point ├── pyproject.toml # UV/pip package configuration and dependencies ├── requirements.txt # Docker-compatible requirements file ├── uv.lock # UV lockfile for reproducible builds ├── setup.py # Legacy Python package setup ├── Dockerfile # Docker container configuration ├── .dockerignore # Docker build optimization ├── start.sh # Hugging Face Spaces startup script ├── .env.example # Environment variables template ├── .env.hf # Hugging Face Spaces configuration ├── README.md # Project documentation ├── HUGGINGFACE_DEPLOYMENT.md # HF Spaces deployment guide ├── src/ # Main application source code │ ├── __init__.py │ ├── models/ │ │ ├── __init__.py │ │ └── document.py # Document data models and schemas │ ├── services/ │ │ ├── __init__.py │ │ ├── document_processor.py # PDF/DOCX text extraction │ │ ├── ai_analyzer.py # AI analysis and risk assessment │ │ └── vector_store.py # Chroma vector database management │ ├── pages/ │ │ ├── __init__.py │ │ ├── upload.py # Document upload interface │ │ ├── analysis.py # Document analysis dashboard │ │ ├── qa_assistant.py # Interactive Q&A chat interface │ │ ├── library.py # Document library management │ │ └── settings.py # Application settings and configuration │ └── utils/ │ ├── __init__.py │ ├── config.py # Environment configuration management │ ├── logger.py # Logging utilities and setup │ └── helpers.py # Common helper functions ├── sample/ # Sample legal documents for testing │ ├── Employment_Offer_Letter.pdf │ ├── Master_Services_Agreement.pdf │ ├── Mutual_NDA.pdf │ └── Residential_Lease_Agreement.pdf ├── data/ # Local data storage and persistence │ ├── app.log # Application logs │ └── chroma_db/ # Vector database storage └── uploads/ # Temporary file uploads directory ``` ## 🎯 Usage Guide ### 1. Document Upload & Processing - Navigate to **📄 Upload** page - Upload PDF, DOCX, or TXT files (max 10MB per file) - Try the included sample documents for immediate testing - Automatic document type detection and text extraction ### 2. Comprehensive Analysis Dashboard Visit **📊 Analysis** to explore: - **Risk Score Gauge**: Interactive 0-100 risk assessment with color coding - **Side-by-Side Comparison**: Original text vs. simplified plain language - **Risk Factor Breakdown**: Detailed explanations of identified risks with severity levels - **Interactive Clause Highlighting**: Hover over highlighted text for tooltips with suggestions - **Financial & Date Extraction**: Automatic identification of monetary amounts and key dates - **Risk Visualization Charts**: Visual distribution of risk categories and severity ### 3. Interactive Q&A Assistant - Use **💬 Q&A** for document-specific questions and analysis - Get context-aware answers powered by vector similarity search - Access suggested questions based on document type and content - Chat history preservation for reference and record-keeping ### 4. Document Library Management - **📚 Library** provides persistent storage of all analyzed documents - Advanced filtering by document type, risk level, upload date - Full-text search across document content and analysis results - Quick re-analysis and direct access to Q&A for stored documents - Document metadata and analysis summary views ### 5. Settings & Configuration - **⚙️ Settings** for API key management and validation - Application configuration and performance monitoring - Usage statistics and system health information ## 🔧 Configuration Options The application uses environment variables for configuration. All settings can be customized in the `.env` file based on the `.env.example` template. ### 🔑 Required Settings | Variable | Description | Example | | ---------------- | -------------------------------- | ----------------------------- | | `GOOGLE_API_KEY` | Google AI API key for Gemini Pro | `xyz` (from AI Studio) | ### ⚙️ Application Settings | Variable | Default | Description | | -------------------------- | -------------- | ---------------------------------- | | `DEBUG` | `True` | Enable debug mode and verbose logs | | `LOG_LEVEL` | `INFO` | Logging level (DEBUG/INFO/WARNING) | | `STREAMLIT_SERVER_PORT` | `8501` | Port for Streamlit server | | `STREAMLIT_SERVER_ADDRESS` | `localhost` | Server address binding | | `MAX_FILE_SIZE_MB` | `10` | Maximum upload file size | | `SUPPORTED_FILE_TYPES` | `pdf,docx,txt` | Allowed file extensions | ### 🤖 AI Model Settings | Variable | Default | Description | | ----------------- | ---------------------- | -------------------------------- | | `TEMPERATURE` | `0.2` | AI response creativity (0.0-1.0) | | `MAX_TOKENS` | `2048` | Maximum response length | | `EMBEDDING_MODEL` | `models/embedding-001` | Google AI embedding model | ### 💾 Storage Configuration | Variable | Default | Description | | -------------------------- | ------------------ | ---------------------------- | | `CHROMA_PERSIST_DIRECTORY` | `./data/chroma_db` | Vector database storage path | | `UPLOAD_DIR` | `./uploads` | Temporary file uploads | | `DATA_DIR` | `./data` | Application data directory | | `LOG_FILE` | `./data/app.log` | Application log file path | ### 🔒 Security Settings | Variable | Default | Description | | ------------------------- | ------- | ------------------------ | | `SECRET_KEY` | None | Application secret key | | `SESSION_TIMEOUT_MINUTES` | `60` | Session timeout duration | ### Example .env configuration: ```bash # Required GOOGLE_API_KEY=your-google-ai-api-key # Optional (with defaults shown) DEBUG=True LOG_LEVEL=INFO MAX_FILE_SIZE_MB=10 SUPPORTED_FILE_TYPES=pdf,docx,txt CHROMA_PERSIST_DIRECTORY=./data/chroma_db TEMPERATURE=0.2 ``` ## � Sample Documents The project includes professionally-crafted sample legal documents for testing and demonstration: | Document Type | Filename | Purpose | | ---------------------------- | --------------------------------- | ---------------------------------------- | | **Employment Contract** | `Employment_Offer_Letter.pdf` | Test employment-related clause analysis | | **Service Agreement** | `Master_Services_Agreement.pdf` | Demonstrate commercial contract analysis | | **Non-Disclosure Agreement** | `Mutual_NDA.pdf` | Show confidentiality clause assessment | | **Lease Agreement** | `Residential_Lease_Agreement.pdf` | Test rental/property contract analysis | These documents are located in the `sample/` directory and can be uploaded directly through the application to: - Experience the complete analysis workflow - Test different document types and complexity levels - Understand risk assessment capabilities - Explore Q&A functionality with real legal content ## �🚨 Document Types Supported Currently optimized for: - **🏠 Rental/Lease Agreements** - **💰 Loan Contracts** - **💼 Employment Contracts** - **🤝 Service Agreements** - **🔒 Non-Disclosure Agreements (NDAs)** - **📄 General Legal Documents** ## ⚡ Key Features Deep Dive ### 🔍 Advanced Risk Assessment Engine - **Multi-dimensional Analysis**: Evaluates financial, legal commitment, and rights-related risks - **Intelligent Severity Classification**: Categorizes risks as Low, Medium, High, or Critical - **Contextual Risk Scoring**: Dynamic 0-100 scale based on document type and complexity - **Actionable Recommendations**: Specific suggestions for improving problematic clauses ### 📝 AI-Powered Plain Language Translation - **Context-Aware Simplification**: Maintains legal accuracy while improving readability - **Jargon Definition System**: Interactive tooltips for complex legal terms - **Document Type Optimization**: Tailored simplification based on contract category - **Preservation of Legal Intent**: Ensures meaning is not lost in translation ### 🎯 Interactive Clause Analysis - **Smart Highlighting System**: Visual identification of risky and important clauses - **Hover Tooltips**: Immediate access to explanations and suggestions - **Clause Categorization**: Organized by risk type and legal significance - **Improvement Suggestions**: Specific recommendations for clause modifications ### 🔍 Vector-Powered Document Intelligence - **Semantic Search**: Find similar clauses across your document library - **Context-Aware Q&A**: Answers grounded in actual document content - **Document Similarity**: Compare clauses against known patterns and standards - **Persistent Knowledge Base**: Chroma vector database for fast, accurate retrieval ### 📊 Advanced Visualization & Analytics - **Interactive Risk Gauges**: Real-time visual risk assessment - **Risk Distribution Charts**: Breakdown of risk categories and severity - **Financial Terms Extraction**: Automatic identification of monetary obligations - **Timeline Analysis**: Key dates and deadline extraction with visualization ### 💾 Enterprise-Grade Data Management - **Local Data Persistence**: Secure storage of documents and analysis results - **Document Library**: Organized management with search and filtering - **Analysis History**: Complete audit trail of document processing - **Metadata Extraction**: Automatic tagging and categorization ## 🔒 Privacy & Security ### 🛡️ Data Protection - **Local Processing**: Documents analyzed locally with secure API calls to Google AI - **No Data Sharing**: Zero third-party data sharing or storage outside your environment - **Secure Storage**: Vector embeddings and analysis results stored locally in Chroma database - **Environment Security**: API keys managed through secure environment variables ### 🔐 Security Best Practices - **API Key Protection**: Secure credential management with environment-based configuration - **Local Vector Storage**: Document embeddings stored exclusively on your local system - **Session Management**: Configurable session timeouts and secure state management - **Input Validation**: Comprehensive file type and size validation for uploads ### 📋 Data Handling - **Temporary Upload Storage**: Uploaded files processed and optionally removed from temp storage - **Persistent Analysis**: Analysis results retained locally for document library functionality - **User Control**: Complete control over data retention and deletion - **Audit Trail**: Transparent logging of all document processing activities ## 🤝 Contributing 1. Fork the repository 2. Create a feature branch 3. Make your changes 4. Test thoroughly 5. Submit a pull request ## 📄 License MIT License - see LICENSE file for details. ## 🆘 Support ### 📚 Documentation & Resources - **In-Code Documentation**: Comprehensive docstrings and code comments throughout the project - **Configuration Guide**: Detailed environment setup and configuration options above - **Sample Documents**: Use included sample contracts to understand features and capabilities ### 🐛 Issues & Bug Reports - **GitHub Issues**: Report bugs, request features, or ask questions via [GitHub Issues](https://github.com/your-repo/Lega.AI/issues) - **Bug Reports**: Include system info, error logs, and steps to reproduce - **Feature Requests**: Describe use cases and expected functionality ### 🛠️ Development & API References - **Google AI Documentation**: [Google AI Developer Guide](https://ai.google.dev/) for Gemini API details - **LangChain Documentation**: [LangChain Docs](https://python.langchain.com/) for framework reference - **Streamlit Documentation**: [Streamlit Docs](https://docs.streamlit.io/) for UI framework guidance - **Chroma Documentation**: [Chroma Docs](https://docs.trychroma.com/) for vector database operations ### 💡 Getting Help 1. **Check Documentation**: Review this README and in-code comments first 2. **Try Sample Documents**: Use provided samples to test functionality 3. **Check Logs**: Review `data/app.log` for detailed error information 4. **Environment Issues**: Verify `.env` configuration and API key validity 5. **Community Support**: Open GitHub discussions for general questions --- **Made with ❤️ using Streamlit, LangChain, and Google AI**