---
title: Lega.AI
emoji: ⚖️
colorFrom: pink
colorTo: indigo
sdk: docker
pinned: false
---
# Lega.AI
AI-powered legal document analysis and simplification platform that makes complex legal documents accessible to everyone.
![Python](https://img.shields.io/badge/Python-3.13+-blue.svg)
![Streamlit](https://img.shields.io/badge/Streamlit-1.49+-red.svg)
![LangChain](https://img.shields.io/badge/LangChain-0.3+-green.svg)
![License](https://img.shields.io/badge/License-MIT-yellow.svg)
## 📋 Table of Contents
- [🚀 Features](#-features)
- [🛠️ Tech Stack](#️-tech-stack)
- [📋 Prerequisites](#-prerequisites)
- [🚀 Quick Start](#-quick-start)
- [🐳 Docker Deployment](#-docker-deployment)
- [📁 Project Structure](#-project-structure)
- [🎯 Usage Guide](#-usage-guide)
- [🔧 Configuration Options](#-configuration-options)
- [📄 Sample Documents](#-sample-documents)
- [🚨 Document Types Supported](#-document-types-supported)
- [⚡ Key Features Deep Dive](#-key-features-deep-dive)
- [🔒 Privacy & Security](#-privacy--security)
- [🤝 Contributing](#-contributing)
- [📄 License](#-license)
- [🆘 Support](#-support)
## 🚀 Features
- **🔍 Advanced Document Analysis**: Upload PDF/DOCX/TXT files and get comprehensive AI-powered analysis using Google's Gemini
- **📝 Plain Language Translation**: Convert complex legal jargon into clear, understandable language with context-aware explanations
- **⚠️ Intelligent Risk Assessment**: Multi-dimensional risk scoring with color-coded severity levels and detailed explanations
- **💬 Interactive Q&A Assistant**: Ask specific questions about your documents and get instant, context-aware AI responses
- **🎯 Smart Clause Highlighting**: Visual highlighting of risky clauses with interactive tooltips and improvement suggestions
- **📊 Vector-Powered Similarity Search**: Find similar clauses across documents using Chroma vector database
- **📚 Persistent Document Library**: Organize, search, and manage all analyzed documents with metadata
- **⚠️ Risk Visualization**: Interactive charts and gauges showing risk distribution and severity
- **🗓️ Key Information Extraction**: Automatically identify important dates, deadlines, and financial terms
- **💾 Local Data Persistence**: Secure local storage of analysis results and vector embeddings
- **🎨 Modern UI/UX**: Responsive Streamlit interface with custom CSS and intuitive navigation
## 🛠️ Tech Stack
- **Frontend**: Streamlit with multi-page navigation and custom CSS styling
- **AI/ML**: LangChain + Google Generative AI (Gemini Pro)
- **Embeddings**: Google Generative AI Embeddings (models/text-embedding-004)
- **Vector Store**: Chroma for document similarity search and persistence
- **Document Processing**: PyPDF for PDF extraction, python-docx for Word documents
- **Package Management**: UV (modern Python package manager)
- **Configuration**: Python-dotenv for environment management
- **Visualization**: Plotly for interactive charts and analytics
- **UI Components**: Streamlit-option-menu for enhanced navigation
## 📋 Prerequisites
- Python 3.13+ (required for latest features and performance)
- Google AI API key (get from [Google AI Studio](https://aistudio.google.com/))
- UV package manager (recommended for fast, reliable dependency management)
## 🚀 Quick Start
### 1. **Clone and navigate to the project**:
```bash
git clone <repository-url>
cd Lega.AI
```
### 2. **Install UV (if not already installed)**:
```bash
# On macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# On Windows (PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
# Or using pip
pip install uv
```
### 3. **Set up environment and install dependencies**:
```bash
# Create and activate virtual environment with dependencies
uv sync
# Or if you prefer traditional approach:
# uv venv
# source .venv/bin/activate # On Windows: .venv\Scripts\activate
# uv pip install -r pyproject.toml
```
### 4. **Configure environment**:
```bash
# Copy the template file
cp .env.example .env
# Edit .env file and update the following required settings:
```
**Required Configuration:**
```env
# Get your API key from: https://aistudio.google.com/
GOOGLE_API_KEY=your-google-api-key-here
```
**Optional Configuration (with sensible defaults):**
```env
# Application Settings
DEBUG=True
LOG_LEVEL=INFO
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=localhost
# File Upload Settings
MAX_FILE_SIZE_MB=10
SUPPORTED_FILE_TYPES=pdf,docx,txt
# AI Model Settings
TEMPERATURE=0.2
MAX_TOKENS=2048
EMBEDDING_MODEL=models/text-embedding-004
# Storage Configuration
CHROMA_PERSIST_DIRECTORY=./data/chroma_db
UPLOAD_DIR=./uploads
DATA_DIR=./data
LOG_FILE=./data/app.log
# Security Settings
SECRET_KEY=your-secret-key-here
SESSION_TIMEOUT_MINUTES=60
```
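As a sketch of how these variables might be consumed at startup (illustrative only; the real `src/utils/config.py` may differ), a loader that applies the documented defaults could look like:

```python
import os

def load_settings() -> dict:
    """Read Lega.AI settings from the environment, falling back to the
    defaults documented above when a variable is unset.
    Illustrative sketch; the real src/utils/config.py may differ."""
    return {
        "google_api_key": os.getenv("GOOGLE_API_KEY", ""),
        "debug": os.getenv("DEBUG", "True").lower() == "true",
        "log_level": os.getenv("LOG_LEVEL", "INFO"),
        "max_file_size_mb": int(os.getenv("MAX_FILE_SIZE_MB", "10")),
        "supported_file_types": os.getenv("SUPPORTED_FILE_TYPES", "pdf,docx,txt").split(","),
        "temperature": float(os.getenv("TEMPERATURE", "0.2")),
        "chroma_persist_directory": os.getenv("CHROMA_PERSIST_DIRECTORY", "./data/chroma_db"),
    }

settings = load_settings()
```

With python-dotenv (as listed in the tech stack), the `.env` file is loaded into the process environment before a loader like this runs.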
### 5. **Run the application**:
```bash
# If using UV (recommended)
uv run streamlit run main.py
# Or with activated virtual environment
streamlit run main.py
```
### 6. **Open your browser** to `http://localhost:8501`
### 🎯 Try the Demo
Once running, you can immediately test the application with the included sample documents:
- Navigate to **📄 Upload** page
- Try the sample documents: Employment contracts, NDAs, Lease agreements, Service agreements
- Experience the full analysis workflow without needing your own documents
## 🐳 Docker Deployment
### Local Docker Deployment
```bash
# Build the Docker image
docker build -t lega-ai .
# Run the container
docker run -p 7860:7860 -e GOOGLE_API_KEY=your_api_key_here lega-ai
```
### Hugging Face Spaces Deployment
Deploy Lega.AI to Hugging Face Spaces with one click!
[![Deploy to Hugging Face Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/deploy-to-spaces-md.svg)](https://huggingface.co/spaces)
**Quick Setup:**
1. Create a new [Hugging Face Space](https://huggingface.co/spaces) with SDK: Docker
2. Upload this repository to your Space
3. Set `GOOGLE_API_KEY` in Space Settings → Variables
4. Your app will be live at `https://huggingface.co/spaces/[username]/[space-name]`
📋 **Detailed Instructions**: See [HUGGINGFACE_DEPLOYMENT.md](./HUGGINGFACE_DEPLOYMENT.md) for complete setup guide.
## 📁 Project Structure
```
Lega.AI/
├── main.py # Main Streamlit application entry point
├── pyproject.toml # UV/pip package configuration and dependencies
├── requirements.txt # Docker-compatible requirements file
├── uv.lock # UV lockfile for reproducible builds
├── setup.py # Legacy Python package setup
├── Dockerfile # Docker container configuration
├── .dockerignore # Docker build optimization
├── start.sh # Hugging Face Spaces startup script
├── .env.example # Environment variables template
├── .env.hf # Hugging Face Spaces configuration
├── README.md # Project documentation
├── HUGGINGFACE_DEPLOYMENT.md # HF Spaces deployment guide
├── src/ # Main application source code
│ ├── __init__.py
│ ├── models/
│ │ ├── __init__.py
│ │ └── document.py # Document data models and schemas
│ ├── services/
│ │ ├── __init__.py
│ │ ├── document_processor.py # PDF/DOCX text extraction
│ │ ├── ai_analyzer.py # AI analysis and risk assessment
│ │ └── vector_store.py # Chroma vector database management
│ ├── pages/
│ │ ├── __init__.py
│ │ ├── upload.py # Document upload interface
│ │ ├── analysis.py # Document analysis dashboard
│ │ ├── qa_assistant.py # Interactive Q&A chat interface
│ │ ├── library.py # Document library management
│ │ └── settings.py # Application settings and configuration
│ └── utils/
│ ├── __init__.py
│ ├── config.py # Environment configuration management
│ ├── logger.py # Logging utilities and setup
│ └── helpers.py # Common helper functions
├── sample/ # Sample legal documents for testing
│ ├── Employment_Offer_Letter.pdf
│ ├── Master_Services_Agreement.pdf
│ ├── Mutual_NDA.pdf
│ └── Residential_Lease_Agreement.pdf
├── data/ # Local data storage and persistence
│ ├── app.log # Application logs
│ └── chroma_db/ # Vector database storage
└── uploads/ # Temporary file uploads directory
```
## 🎯 Usage Guide
### 1. Document Upload & Processing
- Navigate to **📄 Upload** page
- Upload PDF, DOCX, or TXT files (max 10MB per file)
- Try the included sample documents for immediate testing
- Automatic document type detection and text extraction
### 2. Comprehensive Analysis Dashboard
Visit **📊 Analysis** to explore:
- **Risk Score Gauge**: Interactive 0-100 risk assessment with color coding
- **Side-by-Side Comparison**: Original text vs. simplified plain language
- **Risk Factor Breakdown**: Detailed explanations of identified risks with severity levels
- **Interactive Clause Highlighting**: Hover over highlighted text for tooltips with suggestions
- **Financial & Date Extraction**: Automatic identification of monetary amounts and key dates
- **Risk Visualization Charts**: Visual distribution of risk categories and severity
### 3. Interactive Q&A Assistant
- Use **💬 Q&A** for document-specific questions and analysis
- Get context-aware answers powered by vector similarity search
- Access suggested questions based on document type and content
- Chat history preservation for reference and record-keeping
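"Context-aware" here means the question is embedded and matched against stored chunk embeddings before answering. The app delegates this to Chroma with Google's embedding model, but the core ranking step reduces to cosine similarity, sketched below with made-up toy vectors (real embeddings have hundreds of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, chunks, k=2):
    """Rank stored (text, embedding) chunks by similarity to the query.
    Chroma performs this search internally; shown here for illustration."""
    return sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)[:k]

# Toy 3-dimensional "embeddings" for illustration only
chunks = [
    ("Tenant shall pay rent on the 1st of each month.", [0.9, 0.1, 0.0]),
    ("Either party may terminate with 30 days notice.", [0.1, 0.9, 0.1]),
    ("Security deposit equals one month's rent.",       [0.8, 0.2, 0.1]),
]
context = top_k([1.0, 0.0, 0.0], chunks)  # query vector for "when is rent due?"
```

The top-ranked chunks are then passed to Gemini as grounding context, which is why answers stay anchored to the actual document text.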
### 4. Document Library Management
- **📚 Library** provides persistent storage of all analyzed documents
- Advanced filtering by document type, risk level, upload date
- Full-text search across document content and analysis results
- Quick re-analysis and direct access to Q&A for stored documents
- Document metadata and analysis summary views
### 5. Settings & Configuration
- **⚙️ Settings** for API key management and validation
- Application configuration and performance monitoring
- Usage statistics and system health information
## 🔧 Configuration Options
The application uses environment variables for configuration. All settings can be customized in the `.env` file based on the `.env.example` template.
### 🔑 Required Settings
| Variable | Description | Example |
| ---------------- | -------------------------------- | ----------------------------- |
| `GOOGLE_API_KEY` | Google AI API key for Gemini Pro | `xyz` (from AI Studio) |
### ⚙️ Application Settings
| Variable | Default | Description |
| -------------------------- | -------------- | ---------------------------------- |
| `DEBUG` | `True` | Enable debug mode and verbose logs |
| `LOG_LEVEL` | `INFO` | Logging level (DEBUG/INFO/WARNING) |
| `STREAMLIT_SERVER_PORT` | `8501` | Port for Streamlit server |
| `STREAMLIT_SERVER_ADDRESS` | `localhost` | Server address binding |
| `MAX_FILE_SIZE_MB` | `10` | Maximum upload file size |
| `SUPPORTED_FILE_TYPES` | `pdf,docx,txt` | Allowed file extensions |
### 🤖 AI Model Settings
| Variable | Default | Description |
| ----------------- | ---------------------- | -------------------------------- |
| `TEMPERATURE` | `0.2` | AI response creativity (0.0-1.0) |
| `MAX_TOKENS` | `2048` | Maximum response length |
| `EMBEDDING_MODEL` | `models/text-embedding-004` | Google AI embedding model |
### 💾 Storage Configuration
| Variable | Default | Description |
| -------------------------- | ------------------ | ---------------------------- |
| `CHROMA_PERSIST_DIRECTORY` | `./data/chroma_db` | Vector database storage path |
| `UPLOAD_DIR` | `./uploads` | Temporary file uploads |
| `DATA_DIR` | `./data` | Application data directory |
| `LOG_FILE` | `./data/app.log` | Application log file path |
### 🔒 Security Settings
| Variable | Default | Description |
| ------------------------- | ------- | ------------------------ |
| `SECRET_KEY` | None | Application secret key |
| `SESSION_TIMEOUT_MINUTES` | `60` | Session timeout duration |
### Example .env configuration:
```bash
# Required
GOOGLE_API_KEY=your-google-ai-api-key
# Optional (with defaults shown)
DEBUG=True
LOG_LEVEL=INFO
MAX_FILE_SIZE_MB=10
SUPPORTED_FILE_TYPES=pdf,docx,txt
CHROMA_PERSIST_DIRECTORY=./data/chroma_db
TEMPERATURE=0.2
```
## 📄 Sample Documents
The project includes professionally crafted sample legal documents for testing and demonstration:
| Document Type | Filename | Purpose |
| ---------------------------- | --------------------------------- | ---------------------------------------- |
| **Employment Contract** | `Employment_Offer_Letter.pdf` | Test employment-related clause analysis |
| **Service Agreement** | `Master_Services_Agreement.pdf` | Demonstrate commercial contract analysis |
| **Non-Disclosure Agreement** | `Mutual_NDA.pdf` | Show confidentiality clause assessment |
| **Lease Agreement** | `Residential_Lease_Agreement.pdf` | Test rental/property contract analysis |
These documents are located in the `sample/` directory and can be uploaded directly through the application to:
- Experience the complete analysis workflow
- Test different document types and complexity levels
- Understand risk assessment capabilities
- Explore Q&A functionality with real legal content
## 🚨 Document Types Supported
Currently optimized for:
- **🏠 Rental/Lease Agreements**
- **💰 Loan Contracts**
- **💼 Employment Contracts**
- **🤝 Service Agreements**
- **🔒 Non-Disclosure Agreements (NDAs)**
- **📄 General Legal Documents**
## ⚡ Key Features Deep Dive
### 🔍 Advanced Risk Assessment Engine
- **Multi-dimensional Analysis**: Evaluates financial, legal commitment, and rights-related risks
- **Intelligent Severity Classification**: Categorizes risks as Low, Medium, High, or Critical
- **Contextual Risk Scoring**: Dynamic 0-100 scale based on document type and complexity
- **Actionable Recommendations**: Specific suggestions for improving problematic clauses
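One plausible way a 0-100 document score could be aggregated from per-clause severities is sketched below; the weights and the averaging formula are illustrative assumptions, not the app's actual scoring algorithm:

```python
# Hypothetical severity weights; the real scoring model may differ.
SEVERITY_WEIGHTS = {"Low": 10, "Medium": 35, "High": 70, "Critical": 100}

def overall_risk_score(findings: list[str]) -> int:
    """Aggregate per-clause severity labels into a 0-100 document score.
    Illustrative sketch: averages the weights of all findings."""
    if not findings:
        return 0
    return round(sum(SEVERITY_WEIGHTS[s] for s in findings) / len(findings))

score = overall_risk_score(["Low", "High", "Critical"])
```

A scheme like this keeps the score comparable across documents of different lengths, since it normalizes by the number of findings.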
### 📝 AI-Powered Plain Language Translation
- **Context-Aware Simplification**: Maintains legal accuracy while improving readability
- **Jargon Definition System**: Interactive tooltips for complex legal terms
- **Document Type Optimization**: Tailored simplification based on contract category
- **Preservation of Legal Intent**: Ensures meaning is not lost in translation
### 🎯 Interactive Clause Analysis
- **Smart Highlighting System**: Visual identification of risky and important clauses
- **Hover Tooltips**: Immediate access to explanations and suggestions
- **Clause Categorization**: Organized by risk type and legal significance
- **Improvement Suggestions**: Specific recommendations for clause modifications
### 🔍 Vector-Powered Document Intelligence
- **Semantic Search**: Find similar clauses across your document library
- **Context-Aware Q&A**: Answers grounded in actual document content
- **Document Similarity**: Compare clauses against known patterns and standards
- **Persistent Knowledge Base**: Chroma vector database for fast, accurate retrieval
### 📊 Advanced Visualization & Analytics
- **Interactive Risk Gauges**: Real-time visual risk assessment
- **Risk Distribution Charts**: Breakdown of risk categories and severity
- **Financial Terms Extraction**: Automatic identification of monetary obligations
- **Timeline Analysis**: Key dates and deadline extraction with visualization
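The extraction of monetary amounts and key dates can be approximated with simple patterns. This is a rough sketch for illustration; the app relies on the Gemini analysis for extraction, and these regexes only cover common US-style formats:

```python
import re

# Dollar amounts like "$1,250.00" and long-form dates like "January 1, 2025"
MONEY_RE = re.compile(r"\$\s?\d{1,3}(?:,\d{3})*(?:\.\d{2})?")
DATE_RE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{1,2},\s+\d{4}\b"
)

def extract_key_terms(text: str) -> dict:
    """Pull dollar amounts and long-form dates out of contract text."""
    return {"amounts": MONEY_RE.findall(text), "dates": DATE_RE.findall(text)}

clause = "Rent of $1,250.00 is due starting January 1, 2025."
terms = extract_key_terms(clause)
```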
### 💾 Enterprise-Grade Data Management
- **Local Data Persistence**: Secure storage of documents and analysis results
- **Document Library**: Organized management with search and filtering
- **Analysis History**: Complete audit trail of document processing
- **Metadata Extraction**: Automatic tagging and categorization
## 🔒 Privacy & Security
### 🛡️ Data Protection
- **Local Processing**: Documents analyzed locally with secure API calls to Google AI
- **No Data Sharing**: Zero third-party data sharing or storage outside your environment
- **Secure Storage**: Vector embeddings and analysis results stored locally in Chroma database
- **Environment Security**: API keys managed through secure environment variables
### 🔐 Security Best Practices
- **API Key Protection**: Secure credential management with environment-based configuration
- **Local Vector Storage**: Document embeddings stored exclusively on your local system
- **Session Management**: Configurable session timeouts and secure state management
- **Input Validation**: Comprehensive file type and size validation for uploads
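The upload validation described above can be sketched as follows. The limits mirror `MAX_FILE_SIZE_MB` and `SUPPORTED_FILE_TYPES` from the configuration section; the actual checks live in the upload page/service and may differ:

```python
from pathlib import Path

MAX_FILE_SIZE_MB = 10
SUPPORTED_FILE_TYPES = {"pdf", "docx", "txt"}

def validate_upload(filename: str, size_bytes: int) -> None:
    """Reject uploads with a disallowed extension or excessive size.
    Sketch of the checks described above; the real code may differ."""
    ext = Path(filename).suffix.lstrip(".").lower()
    if ext not in SUPPORTED_FILE_TYPES:
        raise ValueError(f"Unsupported file type: .{ext}")
    if size_bytes > MAX_FILE_SIZE_MB * 1024 * 1024:
        raise ValueError(f"File exceeds {MAX_FILE_SIZE_MB} MB limit")

validate_upload("lease.pdf", 2 * 1024 * 1024)  # passes silently
```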
### 📋 Data Handling
- **Temporary Upload Storage**: Uploaded files are processed and optionally removed from temporary storage
- **Persistent Analysis**: Analysis results retained locally for document library functionality
- **User Control**: Complete control over data retention and deletion
- **Audit Trail**: Transparent logging of all document processing activities
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request
## 📄 License
MIT License - see LICENSE file for details.
## 🆘 Support
### 📚 Documentation & Resources
- **In-Code Documentation**: Comprehensive docstrings and code comments throughout the project
- **Configuration Guide**: Detailed environment setup and configuration options above
- **Sample Documents**: Use included sample contracts to understand features and capabilities
### 🐛 Issues & Bug Reports
- **GitHub Issues**: Report bugs, request features, or ask questions via [GitHub Issues](https://github.com/your-repo/Lega.AI/issues)
- **Bug Reports**: Include system info, error logs, and steps to reproduce
- **Feature Requests**: Describe use cases and expected functionality
### 🛠️ Development & API References
- **Google AI Documentation**: [Google AI Developer Guide](https://ai.google.dev/) for Gemini API details
- **LangChain Documentation**: [LangChain Docs](https://python.langchain.com/) for framework reference
- **Streamlit Documentation**: [Streamlit Docs](https://docs.streamlit.io/) for UI framework guidance
- **Chroma Documentation**: [Chroma Docs](https://docs.trychroma.com/) for vector database operations
### 💡 Getting Help
1. **Check Documentation**: Review this README and in-code comments first
2. **Try Sample Documents**: Use provided samples to test functionality
3. **Check Logs**: Review `data/app.log` for detailed error information
4. **Environment Issues**: Verify `.env` configuration and API key validity
5. **Community Support**: Open GitHub discussions for general questions
---
**Made with ❤️ using Streamlit, LangChain, and Google AI**