Princess3 committed · verified
Commit 0be65bd · 1 Parent(s): ff080b8

Update README.md

Files changed (1):
  1. README.md +73 -264

README.md CHANGED
@@ -1,300 +1,109 @@
- # NZ Legislation Loophole Analysis Streamlit App
-
- A modern, AI-powered web application for analyzing New Zealand legislation to identify potential loopholes, ambiguities, and unintended consequences.
-
- ## 🌟 Features
-
- ### 🤖 AI-Powered Analysis
- - **Legal Expertise**: Specialized analysis for NZ legislation with Treaty of Waitangi references
- - **Multiple Analysis Types**: Standard, Detailed, and Comprehensive analysis modes
- - **Intelligent Chunking**: Sentence-aware text splitting with overlap for context preservation
-
- ### 🧠 Context Memory Cache System
- - **Smart Caching**: Hash-based chunk identification prevents re-processing identical content
- - **Multi-level Storage**: In-memory LRU cache with optional SQLite persistence
- - **Performance Boost**: Significant speed improvements for large documents and batch processing
- - **Cache Management**: View statistics, export/import cache, and set TTL limits
-
- ### 🎨 Modern Web Interface
- - **Multi-page Layout**: Organized navigation with Home, Upload, Analysis, Settings, and Performance pages
- - **Real-time Progress**: Live progress bars and processing status updates
- - **Interactive Dashboards**: Performance metrics, cache statistics, and analysis results
- - **Responsive Design**: Works on desktop and mobile devices
-
- ### 📊 Advanced Analytics
- - **Quality Metrics**: Confidence scoring and analysis quality assessment
- - **Performance Monitoring**: Memory usage, CPU utilization, and processing times
- - **Batch Processing**: Handle multiple legislation files simultaneously
- - **Export Options**: Multiple formats (JSON, CSV, Excel) with metadata
  ## 🚀 Quick Start

- ### Prerequisites
- ```bash
- # Python 3.8 or higher
- python --version
-
- # Install dependencies
- pip install -r requirements.txt
- ```
-
- ### Running the Application
- ```bash
- # Method 1: Use the run script (recommended)
- python run_streamlit_app.py
-
- # Method 2: Direct Streamlit command
- cd streamlit_app
- streamlit run app.py
- ```
-
- The app will be available at: **http://localhost:8501**
-
- ## 📁 Project Structure
-
- ```
- streamlit_app/
- ├── app.py                   # Main Streamlit application
- ├── core/
- │   ├── cache_manager.py     # Context memory cache system
- │   ├── text_processor.py    # Text cleaning and chunking
- │   ├── llm_analyzer.py      # LLM integration and analysis
- │   └── dataset_builder.py   # Dataset creation and export
- ├── utils/
- │   ├── config.py            # Configuration management
- │   ├── performance.py       # Performance monitoring
- │   └── ui_helpers.py        # UI components and formatting
- ├── pages/                   # Multi-page navigation
- ├── assets/                  # Custom styling and assets
- └── cache/                   # Cache storage directory
- ```
- ## 🛠️ Configuration
-
- ### Model Configuration
- The app supports both local GGUF models and HuggingFace models:
-
- ```python
- # Local model
- model_path = "path/to/your/model.gguf"
-
- # HuggingFace model
- repo_id = "DavidAU/Qwen3-Zero-Coder-Reasoning-0.8B-NEO-EX-GGUF"
- filename = "model-file-name.gguf"
- ```
-
- ### Cache Configuration
- ```python
- cache_config = {
-     'enabled': True,        # Enable/disable caching
-     'max_size_mb': 1024,    # Maximum memory for cache
-     'ttl_hours': 24,        # Time-to-live for cached entries
-     'persistent': True      # Use disk persistence
- }
- ```
-
- ### Processing Configuration
- ```python
- processing_config = {
-     'chunk_size': 4096,     # Size of text chunks
-     'chunk_overlap': 256,   # Overlap between chunks
-     'batch_size': 16,       # Number of chunks to process at once
-     'clean_text': True      # Apply text cleaning
- }
- ```
- ## 📖 Usage Guide
-
- ### 1. Home Page
- - Overview of the application capabilities
- - Current configuration status
- - Quick start guide
-
- ### 2. Upload & Process Page
- - **File Upload**: Support for JSON lines, JSON arrays, and raw text files
- - **Configuration**: Adjust model, processing, and analysis parameters
- - **Batch Processing**: Upload multiple files for simultaneous analysis
- - **Real-time Progress**: Monitor processing status and performance
-
- ### 3. Analysis Results Page
- - **Results Overview**: Summary metrics and statistics
- - **Detailed Analysis**: Expandable results with confidence scores
- - **Export Options**: Download results in multiple formats
- - **Quality Metrics**: Analysis quality assessment and recommendations
-
- ### 4. Settings Page
- - **Model Settings**: Configure LLM parameters and model paths
- - **Processing Settings**: Adjust text processing parameters
- - **Cache Settings**: Manage cache behavior and persistence
- - **UI Settings**: Customize interface appearance
-
- ### 5. Performance Dashboard
- - **Real-time Metrics**: Memory usage, CPU utilization, processing speed
- - **Performance History**: Charts showing performance over time
  - **Cache Statistics**: Hit rates, evictions, and cache efficiency
- - **System Information**: Hardware and software details
  - **Performance Recommendations**: Automated suggestions for optimization
- ## 🔧 Advanced Features
-
- ### Cache Management
- ```python
- from core.cache_manager import get_cache_manager
-
- # Get cache instance
- cache = get_cache_manager()
-
- # View statistics
- stats = cache.get_stats()
- print(f"Hit Rate: {stats['hit_rate']:.1f}%")
-
- # Clear cache
- cache.clear_cache()
-
- # Export cache
- cache.export_cache('cache_backup.json')
- ```
-
- ### Custom Analysis Templates
- The app supports custom analysis templates for different legal domains:
-
- ```python
- # Define custom template
- custom_template = {
-     'name': 'Commercial Law Analysis',
-     'depth': 'Detailed',
-     'focus_areas': [
-         'contractual loopholes',
-         'commercial implications',
-         'regulatory compliance',
-         'enforcement mechanisms'
-     ]
- }
- ```
-
- ### Performance Optimization
- - **Memory Management**: Automatic cache eviction based on memory limits
- - **Batch Processing**: Optimized for large document collections
- - **Concurrent Processing**: Thread-safe operations for multi-user scenarios
- - **Progress Callbacks**: Real-time progress updates during long operations
- ## 📊 API Reference
-
- ### Core Classes
-
- #### CacheManager
- ```python
- class CacheManager:
-     def get(self, content, model_config, processing_config) -> Optional[Dict]
-     def put(self, content, analysis_result, model_config, processing_config)
-     def get_stats(self) -> Dict[str, Any]
-     def clear_cache(self)
-     def export_cache(self, filepath: str) -> bool
-     def import_cache(self, filepath: str) -> int
- ```
-
- #### TextProcessor
- ```python
- class TextProcessor:
-     def clean_text(self, text: str, preserve_structure: bool = True) -> str
-     def chunk_text(self, text: str, chunk_size: int = 4096, overlap: int = 256) -> List[str]
-     def extract_metadata(self, text: str) -> Dict[str, Any]
-     def preprocess_legislation_json(self, json_data: Dict) -> Dict
- ```
-
- #### LLMAnalyzer
- ```python
- class LLMAnalyzer:
-     def analyze_chunk(self, chunk: str, analysis_type: str = 'standard') -> Dict[str, Any]
-     def batch_analyze_chunks(self, chunks: List[str], analysis_type: str = 'standard') -> List[Dict]
-     def load_model(self) -> bool
-     def unload_model(self)
- ```
- ## 🔍 Analysis Output Format
-
- Each analysis result contains:
-
- ```json
- {
-     "chunk": "original text chunk",
-     "analysis_type": "standard|detailed|comprehensive",
-     "model_config": {...},
-     "structured_analysis": {
-         "text_meaning": "explanation of text purpose",
-         "key_assumptions": ["list of assumptions"],
-         "exploitable_interpretations": ["potential interpretations"],
-         "critical_loopholes": ["identified loopholes"],
-         "circumvention_strategies": ["exploitation methods"],
-         "recommendations": ["suggested fixes"],
-         "confidence_score": 85,
-         "analysis_quality": "high|medium|low"
-     },
-     "processing_time": 2.34,
-     "chunk_size": 4096,
-     "word_count": 512
- }
- ```
-
- ## 🐛 Troubleshooting
-
- ### Common Issues
-
- 1. **Model Loading Errors**
-    - Ensure model file exists and is accessible
-    - Check model format (GGUF required)
-    - Verify sufficient RAM for model loading
-
- 2. **Cache Performance Issues**
-    - Clear cache if memory usage is high
-    - Adjust cache size limits in settings
-    - Check persistent cache database integrity
-
- 3. **Processing Slowdowns**
-    - Reduce batch size for large documents
-    - Increase chunk overlap for better context
-    - Consider using a more powerful model
-
- 4. **Memory Errors**
-    - Reduce cache size in settings
-    - Process files individually instead of batch
-    - Monitor memory usage in performance dashboard
-
- ### Debug Mode
- Enable debug mode in settings for detailed logging:
- ```python
- # In settings, enable debug mode
- debug_mode = True
- log_level = "DEBUG"
- ```
  ## 🤝 Contributing

  1. Fork the repository
- 2. Create a feature branch
- 3. Make your changes
- 4. Add tests if applicable
- 5. Submit a pull request

  ## 📄 License

- This project is licensed under the MIT License - see the LICENSE file for details.
-
- ## 🆘 Support
-
- For support and questions:
- - Check the troubleshooting section above
- - Review the performance recommendations in the app
- - Examine the logs in the `streamlit_app/logs/` directory
-
- ## 🔄 Migration from Original Script
-
- If you're migrating from the original `trl.py` script:
-
- 1. **Configuration**: Settings are now managed through the UI
- 2. **Output**: Results are displayed in the web interface
- 3. **Caching**: Automatic caching with no manual intervention needed
- 4. **Batch Processing**: Multiple files can be uploaded simultaneously
- 5. **Progress Tracking**: Real-time progress bars and status updates
-
- The new app maintains all functionality of the original script while providing a modern, user-friendly interface and significant performance improvements through intelligent caching.
+ ---
+ license: wtfpl
+ sdk: streamlit
+ ---
+
+ # NZ Legislation Loophole Analyzer
+
+ An AI-powered web application for analyzing New Zealand legislation to identify potential loopholes, ambiguities, and unintended consequences. Built with advanced caching and real-time performance monitoring.
+
+ ## 🌟 Key Features
+
+ ### 🤖 AI-Powered Legal Analysis
+ - **Specialized NZ Legislation Analysis**: Optimized for New Zealand legal texts with Treaty of Waitangi references
+ - **Multiple Analysis Depths**: Standard, Detailed, and Comprehensive analysis modes
+ - **Intelligent Text Processing**: Sentence-aware chunking with legal document structure preservation
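The sentence-aware chunking idea can be sketched as follows. This is a simplified, self-contained illustration, not the app's actual `TextProcessor` implementation; the function name and the naive regex sentence splitter are assumptions for the sake of the example.

```python
import re

def chunk_sentences(text: str, chunk_size: int = 4096, overlap: int = 256) -> list[str]:
    """Split text into chunks at sentence boundaries, carrying overlap for context.

    Note: a single sentence longer than chunk_size is not subdivided here.
    """
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > chunk_size:
            chunks.append(current)
            # Seed the next chunk with the tail of this one to preserve context.
            current = current[-overlap:] + " " + sentence
        else:
            current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks

doc = "Section 1 applies to all persons. " * 300
parts = chunk_sentences(doc, chunk_size=500, overlap=100)
```

Each chunk after the first begins with the last `overlap` characters of its predecessor, which is what lets the model keep cross-boundary context.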
 
+ ### 🧠 Advanced Context Memory Cache
+ - **Smart Caching System**: Hash-based identification prevents re-processing identical content
+ - **Memory-Efficient**: Optimized for cloud environments with automatic cache management
+ - **Performance Boost**: Significant speed improvements for large document analysis
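A minimal, self-contained sketch of the hash-based caching technique: key each result by a hash of the chunk text plus the analysis configuration, so an identical chunk under identical settings is never analyzed twice. The class and method names here are illustrative; the app's real cache manager additionally handles LRU eviction, TTLs, and persistence.

```python
import hashlib
from typing import Any, Callable

class ChunkCache:
    """Cache analysis results keyed by a hash of chunk text plus config."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, chunk: str, config: dict) -> str:
        # Sort config items so key generation is order-independent.
        payload = chunk + repr(sorted(config.items()))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(self, chunk: str, config: dict,
                       analyze: Callable[[str], Any]) -> Any:
        key = self._key(chunk, config)
        if key in self._store:
            self.hits += 1  # identical chunk + config: skip re-analysis
        else:
            self.misses += 1
            self._store[key] = analyze(chunk)
        return self._store[key]

cache = ChunkCache()
config = {"depth": "standard"}
cache.get_or_compute("Section 5 applies...", config, lambda c: {"loopholes": []})
cache.get_or_compute("Section 5 applies...", config, lambda c: {"loopholes": []})
print(cache.hits, cache.misses)  # → 1 1
```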
 
+ ### 🎨 Modern Web Interface
+ - **Streamlit-Powered**: Clean, responsive interface that works on any device
+ - **Real-Time Progress**: Live progress bars and processing status updates
+ - **Interactive Results**: Expandable analysis results with confidence scoring
 
  ## 🚀 Quick Start

+ 1. **Upload Legislation**: Use the file uploader to select NZ legislation files (JSON lines, JSON arrays, or raw text)
+ 2. **Configure Analysis**: Adjust model parameters and analysis settings
+ 3. **Process & Analyze**: Click "Start Processing" to begin AI-powered analysis
+ 4. **Review Results**: Explore detailed findings with interactive visualizations
+ 5. **Export Data**: Download results in JSON, CSV, or Excel formats
+ ## πŸ“Š Analysis Capabilities
36
 
37
+ - **Loophole Detection**: Identify potential legal ambiguities and exploitable interpretations
38
+ - **Risk Assessment**: Evaluate legal risks and unintended consequences
39
+ - **Circumvention Analysis**: Explore potential methods for bypassing legal provisions
40
+ - **Recommendations**: Receive specific suggestions for legislative improvements
41
 
42
+ ## πŸ› οΈ Technical Features
 
 
43
 
44
+ - **Memory Optimized**: Designed for cloud deployment with efficient resource usage
45
+ - **Session-Based Caching**: Intelligent caching that works within Spaces limitations
46
+ - **Performance Monitoring**: Real-time metrics and performance recommendations
47
+ - **Batch Processing**: Handle multiple files simultaneously
48
+ - **Quality Metrics**: Confidence scoring and analysis validation
49
 
50
+ ## πŸ”§ Configuration
 
 
 
 
 
 
 
 
51
 
52
+ ### Model Settings
53
+ - **Local Models**: Support for GGUF format models
54
+ - **HuggingFace Integration**: Direct model downloads from HuggingFace Hub
55
+ - **Parameter Tuning**: Adjustable temperature, context length, and sampling parameters
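The two model sources look roughly like this. The `filename` value is a placeholder (substitute the actual `.gguf` file from the repo), and the `params` keys are illustrative sampling settings, not a confirmed API of the underlying llama.cpp bindings.

```python
# Local GGUF model
model_path = "path/to/your/model.gguf"

# Or a HuggingFace Hub download
repo_id = "DavidAU/Qwen3-Zero-Coder-Reasoning-0.8B-NEO-EX-GGUF"
filename = "model-file-name.gguf"  # placeholder: use the real .gguf filename

# Illustrative tunable parameters (exact names depend on the bindings in use)
params = {"temperature": 0.7, "n_ctx": 4096, "top_p": 0.95}
```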
 
 
 
 
 
+ ### Processing Options
+ - **Chunk Size**: Configurable text chunk sizes (256-8192 characters)
+ - **Analysis Depth**: Three levels of analysis detail
+ - **Cache Size**: Memory-efficient caching system
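A representative processing configuration, shown as a sketch; the defaults here are illustrative and the live values are set through the UI.

```python
processing_config = {
    "chunk_size": 4096,    # characters per chunk (UI allows 256-8192)
    "chunk_overlap": 256,  # characters shared between consecutive chunks
    "batch_size": 16,      # chunks analyzed per batch
    "clean_text": True,    # apply text cleaning before chunking
}
```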
+ ## 📈 Performance & Monitoring
+
+ - **Real-Time Metrics**: Memory usage, CPU utilization, and processing speed
  - **Cache Statistics**: Hit rates, evictions, and cache efficiency
  - **Performance Recommendations**: Automated suggestions for optimization
+ ## 🔍 Analysis Output
+
+ Each analysis provides:
+ - **Text Meaning**: Clear explanation of legal provision intent
+ - **Key Assumptions**: Identified assumptions that could be exploited
+ - **Critical Findings**: Specific loopholes and ambiguities
+ - **Confidence Scores**: AI confidence in analysis results
+ - **Recommendations**: Suggested improvements and clarifications
+ - **Recommendations**: Suggested improvements and clarifications
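In a JSON export, a single result record looks roughly like this; field names follow the structured output listed above, and the values are illustrative.

```python
# One record from a JSON export (illustrative values)
result = {
    "chunk": "original text chunk",
    "analysis_type": "standard",  # or "detailed" / "comprehensive"
    "structured_analysis": {
        "text_meaning": "explanation of the provision's purpose",
        "key_assumptions": ["list of assumptions"],
        "critical_loopholes": ["identified loopholes"],
        "recommendations": ["suggested fixes"],
        "confidence_score": 85,   # 0-100
        "analysis_quality": "high",  # or "medium" / "low"
    },
    "processing_time": 2.34,  # seconds
    "word_count": 512,
}
```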
+ ## 🆘 Limitations & Recommendations
+
+ ### Spaces-Specific Considerations
+ - **Memory Limits**: Optimized for 2-8GB RAM environments
+ - **Session-Based**: Cache persists only during active sessions
+ - **Model Size**: Choose appropriately sized models for Spaces constraints
+
+ ### Recommended Models
+ - **Small Models**: Qwen 0.8B variants for faster processing
+ - **Medium Models**: Qwen 1.5B-3B for balanced performance
+ - **API Integration**: Consider using external APIs for larger models
+ ## 📚 Documentation
+
+ For detailed documentation, see:
+ - [Application Guide](README_Streamlit_App.md)
+ - [Docker Deployment](README_Docker.md)
 
  ## 🤝 Contributing

+ This is a demo application for Hugging Face Spaces. For improvements or modifications:
  1. Fork the repository
+ 2. Make your changes
+ 3. Test thoroughly
+ 4. Submit a pull request

  ## 📄 License

+ MIT License - see LICENSE file for details.

+ ---

+ **⚖️ Built with Streamlit & Llama.cpp | Optimized for Hugging Face Spaces**