---
title: Lega.AI
emoji: ⚖️
colorFrom: pink
colorTo: indigo
sdk: docker
pinned: false
---

Lega.AI

AI-powered legal document analysis and simplification platform that makes complex legal documents accessible to everyone.


🚀 Features

  • 🔍 Advanced Document Analysis: Upload PDF/DOCX/TXT files and get comprehensive AI-powered analysis using Google's Gemini
  • 📝 Plain Language Translation: Convert complex legal jargon into clear, understandable language with context-aware explanations
  • ⚠️ Intelligent Risk Assessment: Multi-dimensional risk scoring with color-coded severity levels and detailed explanations
  • 💬 Interactive Q&A Assistant: Ask specific questions about your documents and get instant, context-aware AI responses
  • 🎯 Smart Clause Highlighting: Visual highlighting of risky clauses with interactive tooltips and improvement suggestions
  • 📊 Vector-Powered Similarity Search: Find similar clauses across documents using Chroma vector database
  • 📚 Persistent Document Library: Organize, search, and manage all analyzed documents with metadata
  • ⚠️ Risk Visualization: Interactive charts and gauges showing risk distribution and severity
  • 🗓️ Key Information Extraction: Automatically identify important dates, deadlines, and financial terms
  • 💾 Local Data Persistence: Secure local storage of analysis results and vector embeddings
  • 🎨 Modern UI/UX: Responsive Streamlit interface with custom CSS and intuitive navigation

🛠️ Tech Stack

  • Frontend: Streamlit with multi-page navigation and custom CSS styling
  • AI/ML: LangChain + Google Generative AI (Gemini Pro)
  • Embeddings: Google Generative AI Embeddings (models/text-embedding-004)
  • Vector Store: Chroma for document similarity search and persistence
  • Document Processing: PyPDF for PDF extraction, python-docx for Word documents
  • Package Management: UV (modern Python package manager)
  • Configuration: Python-dotenv for environment management
  • Visualization: Plotly for interactive charts and analytics
  • UI Components: Streamlit-option-menu for enhanced navigation

📋 Prerequisites

  • Python 3.13+ (required for latest features and performance)
  • Google AI API key (get from Google AI Studio)
  • UV package manager (recommended for fast, reliable dependency management)

🚀 Quick Start

1. Clone and navigate to the project:

git clone <repository-url>
cd Lega.AI

2. Install UV (if not already installed):

# On macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows (PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or using pip
pip install uv

3. Set up environment and install dependencies:

# Create the virtual environment and install dependencies
# (no manual activation is needed when commands are run through `uv run`)
uv sync

# Or if you prefer traditional approach:
# uv venv
# source .venv/bin/activate  # On Windows: .venv\Scripts\activate
# uv pip install -r pyproject.toml

4. Configure environment:

# Copy the template file
cp .env.example .env

# Edit .env file and update the following required settings:

Required Configuration:

# Get your API key from: https://aistudio.google.com/
GOOGLE_API_KEY=your-google-api-key-here

Optional Configuration (with sensible defaults):

# Application Settings
DEBUG=True
LOG_LEVEL=INFO
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=localhost

# File Upload Settings
MAX_FILE_SIZE_MB=10
SUPPORTED_FILE_TYPES=pdf,docx,txt

# AI Model Settings
TEMPERATURE=0.2
MAX_TOKENS=2048
EMBEDDING_MODEL=models/text-embedding-004

# Storage Configuration
CHROMA_PERSIST_DIRECTORY=./data/chroma_db
UPLOAD_DIR=./uploads
DATA_DIR=./data
LOG_FILE=./data/app.log

# Security Settings
SECRET_KEY=your-secret-key-here
SESSION_TIMEOUT_MINUTES=60

5. Run the application:

# If using UV (recommended)
uv run streamlit run main.py

# Or with activated virtual environment
streamlit run main.py

6. Open your browser to http://localhost:8501

🎯 Try the Demo

Once running, you can immediately test the application with the included sample documents:

  • Navigate to the 📄 Upload page
  • Try the sample documents: Employment contracts, NDAs, Lease agreements, Service agreements
  • Experience the full analysis workflow without needing your own documents

🐳 Docker Deployment

Local Docker Deployment

# Build the Docker image
docker build -t lega-ai .

# Run the container
docker run -p 7860:7860 -e GOOGLE_API_KEY=your_api_key_here lega-ai

Hugging Face Spaces Deployment

Deploy Lega.AI to Hugging Face Spaces with one click!

Quick Setup:

  1. Create a new Hugging Face Space with SDK: Docker
  2. Upload this repository to your Space
  3. Set GOOGLE_API_KEY as a secret in Space Settings → Variables and secrets
  4. Your app will be live at https://huggingface.co/spaces/[username]/[space-name]

📋 Detailed Instructions: See HUGGINGFACE_DEPLOYMENT.md for the complete setup guide.

📁 Project Structure

Lega.AI/
├── main.py                 # Main Streamlit application entry point
├── pyproject.toml          # UV/pip package configuration and dependencies
├── requirements.txt        # Docker-compatible requirements file
├── uv.lock                 # UV lockfile for reproducible builds
├── setup.py                # Legacy Python package setup
├── Dockerfile              # Docker container configuration
├── .dockerignore          # Docker build optimization
├── start.sh               # Hugging Face Spaces startup script
├── .env.example           # Environment variables template
├── .env.hf                # Hugging Face Spaces configuration
├── README.md              # Project documentation
├── HUGGINGFACE_DEPLOYMENT.md # HF Spaces deployment guide
├── src/                   # Main application source code
│   ├── __init__.py
│   ├── models/
│   │   ├── __init__.py
│   │   └── document.py    # Document data models and schemas
│   ├── services/
│   │   ├── __init__.py
│   │   ├── document_processor.py  # PDF/DOCX text extraction
│   │   ├── ai_analyzer.py         # AI analysis and risk assessment
│   │   └── vector_store.py        # Chroma vector database management
│   ├── pages/
│   │   ├── __init__.py
│   │   ├── upload.py      # Document upload interface
│   │   ├── analysis.py    # Document analysis dashboard
│   │   ├── qa_assistant.py # Interactive Q&A chat interface
│   │   ├── library.py     # Document library management
│   │   └── settings.py    # Application settings and configuration
│   └── utils/
│       ├── __init__.py
│       ├── config.py      # Environment configuration management
│       ├── logger.py      # Logging utilities and setup
│       └── helpers.py     # Common helper functions
├── sample/                # Sample legal documents for testing
│   ├── Employment_Offer_Letter.pdf
│   ├── Master_Services_Agreement.pdf
│   ├── Mutual_NDA.pdf
│   └── Residential_Lease_Agreement.pdf
├── data/                  # Local data storage and persistence
│   ├── app.log           # Application logs
│   └── chroma_db/        # Vector database storage
└── uploads/              # Temporary file uploads directory

🎯 Usage Guide

1. Document Upload & Processing

  • Navigate to the 📄 Upload page
  • Upload PDF, DOCX, or TXT files (max 10MB per file)
  • Try the included sample documents for immediate testing
  • Automatic document type detection and text extraction
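
The upload limits above (allowed extensions and a 10 MB cap) can be checked before any extraction work starts. This is a minimal sketch under assumptions: `validate_upload` is a hypothetical name, and the constants mirror the documented defaults of `MAX_FILE_SIZE_MB` and `SUPPORTED_FILE_TYPES` rather than being read from the project's configuration.

```python
from pathlib import Path

# Assumed to mirror the documented defaults; the real app reads these from .env.
MAX_FILE_SIZE_MB = 10
SUPPORTED_FILE_TYPES = ("pdf", "docx", "txt")


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the upload is acceptable."""
    problems = []
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_FILE_TYPES:
        problems.append(f"Unsupported file type: '{extension or 'none'}'")
    if size_bytes > MAX_FILE_SIZE_MB * 1024 * 1024:
        problems.append(f"File exceeds {MAX_FILE_SIZE_MB} MB")
    return problems
```

Returning every problem at once, instead of raising on the first, lets the upload page show all issues in a single message.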

2. Comprehensive Analysis Dashboard

Visit 📊 Analysis to explore:

  • Risk Score Gauge: Interactive 0-100 risk assessment with color coding
  • Side-by-Side Comparison: Original text vs. simplified plain language
  • Risk Factor Breakdown: Detailed explanations of identified risks with severity levels
  • Interactive Clause Highlighting: Hover over highlighted text for tooltips with suggestions
  • Financial & Date Extraction: Automatic identification of monetary amounts and key dates
  • Risk Visualization Charts: Visual distribution of risk categories and severity

3. Interactive Q&A Assistant

  • Use 💬 Q&A for document-specific questions and analysis
  • Get context-aware answers powered by vector similarity search
  • Access suggested questions based on document type and content
  • Chat history preservation for reference and record-keeping

4. Document Library Management

  • 📚 Library provides persistent storage of all analyzed documents
  • Advanced filtering by document type, risk level, upload date
  • Full-text search across document content and analysis results
  • Quick re-analysis and direct access to Q&A for stored documents
  • Document metadata and analysis summary views

5. Settings & Configuration

  • ⚙️ Settings for API key management and validation
  • Application configuration and performance monitoring
  • Usage statistics and system health information

🔧 Configuration Options

The application uses environment variables for configuration. All settings can be customized in the .env file based on the .env.example template.

🔑 Required Settings

| Variable | Description | Example |
|---|---|---|
| GOOGLE_API_KEY | Google AI API key for Gemini Pro | xyz (from AI Studio) |

⚙️ Application Settings

| Variable | Default | Description |
|---|---|---|
| DEBUG | True | Enable debug mode and verbose logs |
| LOG_LEVEL | INFO | Logging level (DEBUG/INFO/WARNING) |
| STREAMLIT_SERVER_PORT | 8501 | Port for Streamlit server |
| STREAMLIT_SERVER_ADDRESS | localhost | Server address binding |
| MAX_FILE_SIZE_MB | 10 | Maximum upload file size |
| SUPPORTED_FILE_TYPES | pdf,docx,txt | Allowed file extensions |

🤖 AI Model Settings

| Variable | Default | Description |
|---|---|---|
| TEMPERATURE | 0.2 | AI response creativity (0.0-1.0) |
| MAX_TOKENS | 2048 | Maximum response length |
| EMBEDDING_MODEL | models/text-embedding-004 | Google AI embedding model |

💾 Storage Configuration

| Variable | Default | Description |
|---|---|---|
| CHROMA_PERSIST_DIRECTORY | ./data/chroma_db | Vector database storage path |
| UPLOAD_DIR | ./uploads | Temporary file uploads |
| DATA_DIR | ./data | Application data directory |
| LOG_FILE | ./data/app.log | Application log file path |

🔒 Security Settings

| Variable | Default | Description |
|---|---|---|
| SECRET_KEY | None | Application secret key |
| SESSION_TIMEOUT_MINUTES | 60 | Session timeout duration |

Example .env configuration:

# Required
GOOGLE_API_KEY=your-google-ai-api-key

# Optional (with defaults shown)
DEBUG=True
LOG_LEVEL=INFO
MAX_FILE_SIZE_MB=10
SUPPORTED_FILE_TYPES=pdf,docx,txt
CHROMA_PERSIST_DIRECTORY=./data/chroma_db
TEMPERATURE=0.2

Sample Documents

The project includes professionally-crafted sample legal documents for testing and demonstration:

| Document Type | Filename | Purpose |
|---|---|---|
| Employment Contract | Employment_Offer_Letter.pdf | Test employment-related clause analysis |
| Service Agreement | Master_Services_Agreement.pdf | Demonstrate commercial contract analysis |
| Non-Disclosure Agreement | Mutual_NDA.pdf | Show confidentiality clause assessment |
| Lease Agreement | Residential_Lease_Agreement.pdf | Test rental/property contract analysis |

These documents are located in the sample/ directory and can be uploaded directly through the application to:

  • Experience the complete analysis workflow
  • Test different document types and complexity levels
  • Understand risk assessment capabilities
  • Explore Q&A functionality with real legal content

🚨 Document Types Supported

Currently optimized for:

  • 🏠 Rental/Lease Agreements
  • 💰 Loan Contracts
  • 💼 Employment Contracts
  • 🤝 Service Agreements
  • 🔒 Non-Disclosure Agreements (NDAs)
  • 📄 General Legal Documents

⚡ Key Features Deep Dive

🔍 Advanced Risk Assessment Engine

  • Multi-dimensional Analysis: Evaluates financial, legal commitment, and rights-related risks
  • Intelligent Severity Classification: Categorizes risks as Low, Medium, High, or Critical
  • Contextual Risk Scoring: Dynamic 0-100 scale based on document type and complexity
  • Actionable Recommendations: Specific suggestions for improving problematic clauses
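
As an illustration of how a 0-100 score could map onto the four severity levels and a gauge color, consider the sketch below. The band boundaries (25/50/75) and the color names are assumptions made for this example; the README does not state the thresholds the application actually uses, and `classify_risk` is a hypothetical name.

```python
# Assumed bands: (inclusive upper bound, severity label, gauge color).
RISK_BANDS = (
    (25, "Low", "green"),
    (50, "Medium", "yellow"),
    (75, "High", "orange"),
    (100, "Critical", "red"),
)


def classify_risk(score: float) -> tuple[str, str]:
    """Map a 0-100 risk score to a (severity, color) pair."""
    if not 0 <= score <= 100:
        raise ValueError(f"Risk score must be between 0 and 100, got {score}")
    for upper_bound, severity, color in RISK_BANDS:
        if score <= upper_bound:
            return severity, color
    # Unreachable for validated input; kept so every path returns or raises.
    raise ValueError(f"No band found for score {score}")
```

Keeping the bands in one table means the gauge, the charts, and the library filters can all share the same cut-offs.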

📝 AI-Powered Plain Language Translation

  • Context-Aware Simplification: Maintains legal accuracy while improving readability
  • Jargon Definition System: Interactive tooltips for complex legal terms
  • Document Type Optimization: Tailored simplification based on contract category
  • Preservation of Legal Intent: Ensures meaning is not lost in translation

🎯 Interactive Clause Analysis

  • Smart Highlighting System: Visual identification of risky and important clauses
  • Hover Tooltips: Immediate access to explanations and suggestions
  • Clause Categorization: Organized by risk type and legal significance
  • Improvement Suggestions: Specific recommendations for clause modifications

🔍 Vector-Powered Document Intelligence

  • Semantic Search: Find similar clauses across your document library
  • Context-Aware Q&A: Answers grounded in actual document content
  • Document Similarity: Compare clauses against known patterns and standards
  • Persistent Knowledge Base: Chroma vector database for fast, accurate retrieval
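
The idea behind semantic search is that each clause is turned into an embedding vector and the closest vectors to a query are returned. The project delegates this to Chroma and Google's embedding model; the sketch below is only a pure-Python illustration of the ranking step, using cosine similarity (an assumed metric) over toy three-dimensional vectors. The names `cosine_similarity` and `most_similar` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(
    query: list[float], clauses: dict[str, list[float]], k: int = 3
) -> list[str]:
    """Return the names of the k clauses whose vectors are closest to the query."""
    ranked = sorted(
        clauses.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy vectors standing in for real embeddings.
clauses = {
    "termination": [1.0, 0.0, 0.0],
    "payment": [0.0, 1.0, 0.0],
    "notice": [0.9, 0.1, 0.0],
}
print(most_similar([1.0, 0.0, 0.0], clauses, k=2))
```

A vector database performs the same kind of ranking with an index, so that it does not need to compare the query against every stored vector.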

📊 Advanced Visualization & Analytics

  • Interactive Risk Gauges: Real-time visual risk assessment
  • Risk Distribution Charts: Breakdown of risk categories and severity
  • Financial Terms Extraction: Automatic identification of monetary obligations
  • Timeline Analysis: Key dates and deadline extraction with visualization
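
To make the extraction of financial terms and dates concrete, here is a deliberately simple regex-based sketch. It is a stand-in for illustration only: the README does not say how the application performs extraction (it may rely on the language model), the function name `extract_key_terms` is hypothetical, and the patterns only cover US-dollar amounts and ISO-style dates.

```python
import re

# Dollar amounts such as $500, $1,500.00, or $15000.
MONEY_PATTERN = re.compile(r"\$\s?\d+(?:,\d{3})*(?:\.\d{2})?")
# ISO-style dates such as 2025-01-31.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_key_terms(text: str) -> dict[str, list[str]]:
    """Collect monetary amounts and dates in the order they appear."""
    return {
        "amounts": MONEY_PATTERN.findall(text),
        "dates": DATE_PATTERN.findall(text),
    }


sample = "Rent of $1,500.00 is due by 2025-01-31; the deposit is $500."
print(extract_key_terms(sample))
```

Real contracts write dates and amounts in many more forms ("the first day of each month", "fifteen hundred dollars"), which is where a model-based extractor has the advantage over fixed patterns.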

💾 Enterprise-Grade Data Management

  • Local Data Persistence: Secure storage of documents and analysis results
  • Document Library: Organized management with search and filtering
  • Analysis History: Complete audit trail of document processing
  • Metadata Extraction: Automatic tagging and categorization

🔒 Privacy & Security

🛡️ Data Protection

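  • Local Processing: Text extraction and storage run locally; document content is sent to Google AI over its API for analysis and embedding
  • No Other Data Sharing: Apart from the Google AI API calls, no data is shared with third parties or stored outside your environment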
  • Secure Storage: Vector embeddings and analysis results stored locally in Chroma database
  • Environment Security: API keys managed through secure environment variables

🔐 Security Best Practices

  • API Key Protection: Secure credential management with environment-based configuration
  • Local Vector Storage: Document embeddings stored exclusively on your local system
  • Session Management: Configurable session timeouts and secure state management
  • Input Validation: Comprehensive file type and size validation for uploads

📋 Data Handling

  • Temporary Upload Storage: Uploaded files processed and optionally removed from temp storage
  • Persistent Analysis: Analysis results retained locally for document library functionality
  • User Control: Complete control over data retention and deletion
  • Audit Trail: Transparent logging of all document processing activities

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details.

🆘 Support

📚 Documentation & Resources

  • In-Code Documentation: Comprehensive docstrings and code comments throughout the project
  • Configuration Guide: Detailed environment setup and configuration options above
  • Sample Documents: Use included sample contracts to understand features and capabilities

🐛 Issues & Bug Reports

  • GitHub Issues: Report bugs, request features, or ask questions via GitHub Issues
  • Bug Reports: Include system info, error logs, and steps to reproduce
  • Feature Requests: Describe use cases and expected functionality

💡 Getting Help

  1. Check Documentation: Review this README and in-code comments first
  2. Try Sample Documents: Use provided samples to test functionality
  3. Check Logs: Review data/app.log for detailed error information
  4. Environment Issues: Verify .env configuration and API key validity
  5. Community Support: Open GitHub discussions for general questions

Made with ❤️ using Streamlit, LangChain, and Google AI