Spaces: Princess3/l · Status: Build error

Princess3 committed on Aug 25, 2025

Commit 0be65bd (verified) · 1 parent: ff080b8

Update README.md

Files changed (1)

README.md +73 -264

README.md CHANGED

@@ -1,300 +1,109 @@
-# NZ Legislation Loophole Analysis Streamlit App
-A modern, AI-powered web application for analyzing New Zealand legislation to identify potential loopholes, ambiguities, and unintended consequences.
-## 🌟 Features
-### 🤖 AI-Powered Analysis
-- **Legal Expertise**: Specialized analysis for NZ legislation with Treaty of Waitangi references
-- **Multiple Analysis Types**: Standard, Detailed, and Comprehensive analysis modes
-- **Intelligent Chunking**: Sentence-aware text splitting with overlap for context preservation
-### 🧠 Context Memory Cache System
-- **Smart Caching**: Hash-based chunk identification prevents re-processing identical content
-- **Multi-level Storage**: In-memory LRU cache with optional SQLite persistence
-- **Performance Boost**: Significant speed improvements for large documents and batch processing
-- **Cache Management**: View statistics, export/import cache, and set TTL limits
-### 🎨 Modern Web Interface
-- **Multi-page Layout**: Organized navigation with Home, Upload, Analysis, Settings, and Performance pages
-- **Real-time Progress**: Live progress bars and processing status updates
-- **Interactive Dashboards**: Performance metrics, cache statistics, and analysis results
-- **Responsive Design**: Works on desktop and mobile devices
-### 📊 Advanced Analytics
-- **Quality Metrics**: Confidence scoring and analysis quality assessment
-- **Performance Monitoring**: Memory usage, CPU utilization, and processing times
-- **Batch Processing**: Handle multiple legislation files simultaneously
-- **Export Options**: Multiple formats (JSON, CSV, Excel) with metadata
 ## 🚀 Quick Start
-### Prerequisites
-```bash
-# Python 3.8 or higher
-python --version
-# Install dependencies
-pip install -r requirements.txt
-```
-### Running the Application
-```bash
-# Method 1: Use the run script (recommended)
-python run_streamlit_app.py
-# Method 2: Direct Streamlit command
-cd streamlit_app
-streamlit run app.py
-```
-The app will be available at: **http://localhost:8501**
-## 📁 Project Structure
-```
-streamlit_app/
-├── app.py                 # Main Streamlit application
-├── core/
-│   ├── cache_manager.py  # Context memory cache system
-│   ├── text_processor.py # Text cleaning and chunking
-│   ├── llm_analyzer.py   # LLM integration and analysis
-│   └── dataset_builder.py # Dataset creation and export
-├── utils/
-│   ├── config.py         # Configuration management
-│   ├── performance.py    # Performance monitoring
-│   └── ui_helpers.py     # UI components and formatting
-├── pages/                # Multi-page navigation
-├── assets/               # Custom styling and assets
-└── cache/                # Cache storage directory
-```
-## 🛠️ Configuration
-### Model Configuration
-The app supports both local GGUF models and HuggingFace models:
-```python
-# Local model
-model_path = "path/to/your/model.gguf"
-# HuggingFace model
-repo_id = "DavidAU/Qwen3-Zero-Coder-Reasoning-0.8B-NEO-EX-GGUF"
-filename = "model-file-name.gguf"
-```
-### Cache Configuration
-```python
-cache_config = {
-    'enabled': True,           # Enable/disable caching
-    'max_size_mb': 1024,       # Maximum memory for cache
-    'ttl_hours': 24,          # Time-to-live for cached entries
-    'persistent': True         # Use disk persistence
-}
-```
-### Processing Configuration
-```python
-processing_config = {
-    'chunk_size': 4096,        # Size of text chunks
-    'chunk_overlap': 256,      # Overlap between chunks
-    'batch_size': 16,          # Number of chunks to process at once
-    'clean_text': True         # Apply text cleaning
-}
-```
-## 📖 Usage Guide
-### 1. Home Page
-- Overview of the application capabilities
-- Current configuration status
-- Quick start guide
-### 2. Upload & Process Page
-- **File Upload**: Support for JSON lines, JSON arrays, and raw text files
-- **Configuration**: Adjust model, processing, and analysis parameters
-- **Batch Processing**: Upload multiple files for simultaneous analysis
-- **Real-time Progress**: Monitor processing status and performance
-### 3. Analysis Results Page
-- **Results Overview**: Summary metrics and statistics
-- **Detailed Analysis**: Expandable results with confidence scores
-- **Export Options**: Download results in multiple formats
-- **Quality Metrics**: Analysis quality assessment and recommendations
-### 4. Settings Page
-- **Model Settings**: Configure LLM parameters and model paths
-- **Processing Settings**: Adjust text processing parameters
-- **Cache Settings**: Manage cache behavior and persistence
-- **UI Settings**: Customize interface appearance
-### 5. Performance Dashboard
-- **Real-time Metrics**: Memory usage, CPU utilization, processing speed
-- **Performance History**: Charts showing performance over time
 - **Cache Statistics**: Hit rates, evictions, and cache efficiency
-- **System Information**: Hardware and software details
 - **Performance Recommendations**: Automated suggestions for optimization
-## 🔧 Advanced Features
-### Cache Management
-```python
-from core.cache_manager import get_cache_manager
-# Get cache instance
-cache = get_cache_manager()
-# View statistics
-stats = cache.get_stats()
-print(f"Hit Rate: {stats['hit_rate']:.1f}%")
-# Clear cache
-cache.clear_cache()
-# Export cache
-cache.export_cache('cache_backup.json')
-```
-### Custom Analysis Templates
-The app supports custom analysis templates for different legal domains:
-```python
-# Define custom template
-custom_template = {
-    'name': 'Commercial Law Analysis',
-    'depth': 'Detailed',
-    'focus_areas': [
-        'contractual loopholes',
-        'commercial implications',
-        'regulatory compliance',
-        'enforcement mechanisms'
-    ]
-}
-```
-### Performance Optimization
-- **Memory Management**: Automatic cache eviction based on memory limits
-- **Batch Processing**: Optimized for large document collections
-- **Concurrent Processing**: Thread-safe operations for multi-user scenarios
-- **Progress Callbacks**: Real-time progress updates during long operations
-## 📊 API Reference
-### Core Classes
-#### CacheManager
-```python
-class CacheManager:
-    def get(self, content, model_config, processing_config) -> Optional[Dict]
-    def put(self, content, analysis_result, model_config, processing_config)
-    def get_stats(self) -> Dict[str, Any]
-    def clear_cache(self)
-    def export_cache(self, filepath: str) -> bool
-    def import_cache(self, filepath: str) -> int
-```
-#### TextProcessor
-```python
-class TextProcessor:
-    def clean_text(self, text: str, preserve_structure: bool = True) -> str
-    def chunk_text(self, text: str, chunk_size: int = 4096, overlap: int = 256) -> List[str]
-    def extract_metadata(self, text: str) -> Dict[str, Any]
-    def preprocess_legislation_json(self, json_data: Dict) -> Dict
-```
-#### LLMAnalyzer
-```python
-class LLMAnalyzer:
-    def analyze_chunk(self, chunk: str, analysis_type: str = 'standard') -> Dict[str, Any]
-    def batch_analyze_chunks(self, chunks: List[str], analysis_type: str = 'standard') -> List[Dict]
-    def load_model(self) -> bool
-    def unload_model(self)
-```
-## 🔍 Analysis Output Format
-Each analysis result contains:
-```json
-{
-  "chunk": "original text chunk",
-  "analysis_type": "standard|detailed|comprehensive",
-  "model_config": {...},
-  "structured_analysis": {
-    "text_meaning": "explanation of text purpose",
-    "key_assumptions": ["list of assumptions"],
-    "exploitable_interpretations": ["potential interpretations"],
-    "critical_loopholes": ["identified loopholes"],
-    "circumvention_strategies": ["exploitation methods"],
-    "recommendations": ["suggested fixes"],
-    "confidence_score": 85,
-    "analysis_quality": "high|medium|low"
-  },
-  "processing_time": 2.34,
-  "chunk_size": 4096,
-  "word_count": 512
-}
-```
-## 🐛 Troubleshooting
-### Common Issues
-1. **Model Loading Errors**
-   - Ensure model file exists and is accessible
-   - Check model format (GGUF required)
-   - Verify sufficient RAM for model loading
-2. **Cache Performance Issues**
-   - Clear cache if memory usage is high
-   - Adjust cache size limits in settings
-   - Check persistent cache database integrity
-3. **Processing Slowdowns**
-   - Reduce batch size for large documents
-   - Increase chunk overlap for better context
-   - Consider using a more powerful model
-4. **Memory Errors**
-   - Reduce cache size in settings
-   - Process files individually instead of batch
-   - Monitor memory usage in performance dashboard
-### Debug Mode
-Enable debug mode in settings for detailed logging:
-```python
-# In settings, enable debug mode
-debug_mode = True
-log_level = "DEBUG"
-```
 ## 🤝 Contributing
 1. Fork the repository
-2. Create a feature branch
-3. Make your changes
-4. Add tests if applicable
-5. Submit a pull request
 ## 📄 License
-This project is licensed under the MIT License - see the LICENSE file for details.
-## 🆘 Support
-For support and questions:
-- Check the troubleshooting section above
-- Review the performance recommendations in the app
-- Examine the logs in the `streamlit_app/logs/` directory
-## 🔄 Migration from Original Script
-If you're migrating from the original `trl.py` script:
-1. **Configuration**: Settings are now managed through the UI
-2. **Output**: Results are displayed in the web interface
-3. **Caching**: Automatic caching with no manual intervention needed
-4. **Batch Processing**: Multiple files can be uploaded simultaneously
-5. **Progress Tracking**: Real-time progress bars and status updates
-The new app maintains all functionality of the original script while providing a modern, user-friendly interface and significant performance improvements through intelligent caching.

+---
+license: wtfpl
+sdk: streamlit
+---
+# NZ Legislation Loophole Analyzer
+A powerful AI-powered web application for analyzing New Zealand legislation to identify potential loopholes, ambiguities, and unintended consequences. Built with advanced caching and real-time performance monitoring.
+## 🌟 Key Features
+### 🤖 AI-Powered Legal Analysis
+- **Specialized NZ Legislation Analysis**: Optimized for New Zealand legal texts with Treaty of Waitangi references
+- **Multiple Analysis Depths**: Standard, Detailed, and Comprehensive analysis modes
+- **Intelligent Text Processing**: Sentence-aware chunking with legal document structure preservation
+### 🧠 Advanced Context Memory Cache
+- **Smart Caching System**: Hash-based identification prevents re-processing identical content
+- **Memory-Efficient**: Optimized for cloud environments with automatic cache management
+- **Performance Boost**: Significant speed improvements for large document analysis
+### 🎨 Modern Web Interface
+- **Streamlit-Powered**: Clean, responsive interface that works on any device
+- **Real-Time Progress**: Live progress bars and processing status updates
+- **Interactive Results**: Expandable analysis results with confidence scoring
 ## 🚀 Quick Start
+1. **Upload Legislation**: Use the file uploader to select NZ legislation files (JSON lines, JSON arrays, or raw text)
+2. **Configure Analysis**: Adjust model parameters and analysis settings
+3. **Process & Analyze**: Click "Start Processing" to begin AI-powered analysis
+4. **Review Results**: Explore detailed findings with interactive visualizations
+5. **Export Data**: Download results in JSON, CSV, or Excel formats
+## 📊 Analysis Capabilities
+- **Loophole Detection**: Identify potential legal ambiguities and exploitable interpretations
+- **Risk Assessment**: Evaluate legal risks and unintended consequences
+- **Circumvention Analysis**: Explore potential methods for bypassing legal provisions
+- **Recommendations**: Receive specific suggestions for legislative improvements
+## 🛠️ Technical Features
+- **Memory Optimized**: Designed for cloud deployment with efficient resource usage
+- **Session-Based Caching**: Intelligent caching that works within Spaces limitations
+- **Performance Monitoring**: Real-time metrics and performance recommendations
+- **Batch Processing**: Handle multiple files simultaneously
+- **Quality Metrics**: Confidence scoring and analysis validation
+## 🔧 Configuration
+### Model Settings
+- **Local Models**: Support for GGUF format models
+- **HuggingFace Integration**: Direct model downloads from HuggingFace Hub
+- **Parameter Tuning**: Adjustable temperature, context length, and sampling parameters
+### Processing Options
+- **Chunk Size**: Configurable text chunk sizes (256-8192 characters)
+- **Analysis Depth**: Three levels of analysis detail
+- **Cache Size**: Memory-efficient caching system
+## 📈 Performance & Monitoring
+- **Real-Time Metrics**: Memory usage, CPU utilization, and processing speed
 - **Cache Statistics**: Hit rates, evictions, and cache efficiency
 - **Performance Recommendations**: Automated suggestions for optimization
+## 🔍 Analysis Output
+Each analysis provides:
+- **Text Meaning**: Clear explanation of legal provision intent
+- **Key Assumptions**: Identified assumptions that could be exploited
+- **Critical Findings**: Specific loopholes and ambiguities
+- **Confidence Scores**: AI confidence in analysis results
+- **Recommendations**: Suggested improvements and clarifications
+## 🆘 Limitations & Recommendations
+### Spaces-Specific Considerations
+- **Memory Limits**: Optimized for 2-8GB RAM environments
+- **Session-Based**: Cache persists only during active sessions
+- **Model Size**: Choose appropriately sized models for Spaces constraints
+### Recommended Models
+- **Small Models**: Qwen 0.8B variants for faster processing
+- **Medium Models**: Qwen 1.5B-3B for balanced performance
+- **API Integration**: Consider using external APIs for larger models
+## 📚 Documentation
+For detailed documentation, see:
+- [Application Guide](README_Streamlit_App.md)
+- [Docker Deployment](README_Docker.md)
 ## 🤝 Contributing
+This is a demo application for Hugging Face Spaces. For improvements or modifications:
 1. Fork the repository
+2. Make your changes
+3. Test thoroughly
+4. Submit a pull request
 ## 📄 License
+MIT License - see LICENSE file for details.
+---
+**⚖️ Built with Streamlit & Llama.cpp | Optimized for Hugging Face Spaces**
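Both the removed and the added README describe a hash-based context cache that skips re-analysis of identical chunks, backed (per the removed text) by an in-memory LRU store with a time-to-live. The following is a minimal sketch of that idea, not the contents of the app's `core/cache_manager.py`. It assumes the key is a SHA-256 digest of the chunk together with the model and processing configs (loosely following the removed `CacheManager.get(content, model_config, processing_config)` signature). The class name `ChunkCache` and its `max_entries` bound are hypothetical; the README bounds the cache in megabytes via `max_size_mb` instead.

```python
import hashlib
import json
import time
from collections import OrderedDict
from typing import Any, Dict, Optional


class ChunkCache:
    """Hypothetical LRU cache keyed by a hash of the chunk and its configs."""

    def __init__(self, max_entries: int = 1000, ttl_hours: float = 24.0) -> None:
        self.max_entries = max_entries
        self.ttl_seconds = ttl_hours * 3600
        # key -> (timestamp, result); order tracks recency of use
        self._store: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    @staticmethod
    def make_key(content: str, model_config: Dict, processing_config: Dict) -> str:
        # sort_keys makes the key independent of dict insertion order
        payload = json.dumps(
            {"content": content, "model": model_config, "processing": processing_config},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, content: str, model_config: Dict, processing_config: Dict) -> Optional[Dict[str, Any]]:
        key = self.make_key(content, model_config, processing_config)
        entry = self._store.get(key)
        if entry is None:
            self.misses += 1
            return None
        stored_at, result = entry
        if time.time() - stored_at > self.ttl_seconds:
            del self._store[key]  # expired: drop it and report a miss
            self.misses += 1
            return None
        self._store.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return result

    def put(self, content: str, analysis_result: Dict[str, Any], model_config: Dict, processing_config: Dict) -> None:
        key = self.make_key(content, model_config, processing_config)
        self._store[key] = (time.time(), analysis_result)
        self._store.move_to_end(key)  # a re-inserted key becomes most recent
        while len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the least recently used entry
            self.evictions += 1

    def get_stats(self) -> Dict[str, Any]:
        total = self.hits + self.misses
        return {
            "entries": len(self._store),
            "hits": self.hits,
            "misses": self.misses,
            "evictions": self.evictions,
            "hit_rate": 100.0 * self.hits / total if total else 0.0,
        }


if __name__ == "__main__":
    cache = ChunkCache(max_entries=2)
    model_cfg = {"temperature": 0.3}
    proc_cfg = {"chunk_size": 4096, "chunk_overlap": 256}
    cache.put("Section 1 text", {"confidence_score": 80}, model_cfg, proc_cfg)
    print(cache.get("Section 1 text", model_cfg, proc_cfg))  # {'confidence_score': 80}
    print(cache.get("Section 2 text", model_cfg, proc_cfg))  # None
    print(f"Hit Rate: {cache.get_stats()['hit_rate']:.1f}%")  # Hit Rate: 50.0%
```

Hashing the configs into the key means that changing, say, `chunk_size` or the model produces a cache miss rather than returning a result computed under different settings.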
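The sentence-aware chunking with overlap that both README versions mention (defaults `chunk_size=4096` and `chunk_overlap=256` in the removed configuration example; the added text gives a 256-8192 character range) could work roughly as below. This is a sketch under stated assumptions, not the app's `TextProcessor.chunk_text`: it splits sentences with a naive regular expression, measures sizes in characters, and carries the overlap forward as whole trailing sentences.

```python
import re
from typing import List

# Naive sentence boundary (an assumption): whitespace after ".", "!" or "?"
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def _joined_length(parts: List[str]) -> int:
    """Length of the parts once joined with single spaces."""
    return sum(len(p) for p in parts) + max(len(parts) - 1, 0)


def chunk_text(text: str, chunk_size: int = 4096, overlap: int = 256) -> List[str]:
    """Pack whole sentences into chunks of at most chunk_size characters and
    repeat up to `overlap` characters of trailing sentences in the next chunk.
    A single sentence longer than chunk_size becomes its own oversized chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        if current and _joined_length(current + [sentence]) > chunk_size:
            chunks.append(" ".join(current))
            # Carry trailing sentences forward while they fit in the overlap budget
            tail: List[str] = []
            for prev in reversed(current):
                if _joined_length([prev] + tail) > overlap:
                    break
                tail.insert(0, prev)
            current = tail
            # Drop carried sentences if they would push the new chunk over the limit
            while current and _joined_length(current + [sentence]) > chunk_size:
                current.pop(0)
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    print(chunk_text("Aa. Bb. Cc. Dd. Ee. Ff.", chunk_size=20, overlap=10))
    # ['Aa. Bb. Cc. Dd. Ee.', 'Dd. Ee. Ff.']
```

A splitter tuned for legislation would also need to handle abbreviations and section references such as "s. 5", which this regex treats as sentence ends.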
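Results in the shape given by the removed README's "Analysis Output Format" section can be sanity-checked before they are displayed or exported. The sketch below is hypothetical (`validate_analysis` is not part of the app) and assumes `confidence_score` is on a 0-100 scale, which the README's example value of 85 suggests but does not state.

```python
from typing import Any, Dict, List

# Field names taken from the removed README's output example
LIST_FIELDS = (
    "key_assumptions",
    "exploitable_interpretations",
    "critical_loopholes",
    "circumvention_strategies",
    "recommendations",
)
QUALITY_LEVELS = {"high", "medium", "low"}
ANALYSIS_TYPES = {"standard", "detailed", "comprehensive"}


def validate_analysis(result: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one analysis result (empty if none)."""
    problems: List[str] = []
    if result.get("analysis_type") not in ANALYSIS_TYPES:
        problems.append(f"unknown analysis_type: {result.get('analysis_type')!r}")
    structured = result.get("structured_analysis")
    if not isinstance(structured, dict):
        return problems + ["missing structured_analysis object"]
    if not isinstance(structured.get("text_meaning"), str):
        problems.append("text_meaning should be a string")
    for field in LIST_FIELDS:
        value = structured.get(field)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{field} should be a list of strings")
    score = structured.get("confidence_score")
    # Assumed 0-100 scale; bool is rejected because it subclasses int
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0 <= score <= 100:
        problems.append("confidence_score should be a number in [0, 100]")
    if structured.get("analysis_quality") not in QUALITY_LEVELS:
        problems.append("analysis_quality should be high, medium or low")
    return problems


if __name__ == "__main__":
    # Placeholder values copied from the README's output example
    sample = {
        "chunk": "original text chunk",
        "analysis_type": "standard",
        "structured_analysis": {
            "text_meaning": "explanation of text purpose",
            "key_assumptions": ["list of assumptions"],
            "exploitable_interpretations": ["potential interpretations"],
            "critical_loopholes": ["identified loopholes"],
            "circumvention_strategies": ["exploitation methods"],
            "recommendations": ["suggested fixes"],
            "confidence_score": 85,
            "analysis_quality": "high",
        },
    }
    print(validate_analysis(sample))  # []
```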