egumasa committed
Commit 4b36911 · Parent(s): eab1374

added memory file handler
MEMORY_HANDLER_MIGRATION.md ADDED
@@ -0,0 +1,146 @@
# Memory Handler Migration Guide

## Why Memory-Based File Handling?

The original `FileUploadHandler` saves files to the `/tmp` directory, which can cause 403 Forbidden errors in restricted environments such as:
- Hugging Face Spaces
- Some cloud platforms with read-only filesystems
- Containers with security restrictions

The `MemoryFileHandler` processes files entirely in memory, avoiding filesystem access.

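The guide doesn't show what such a handler looks like internally, so here is a minimal sketch of the `process_uploaded_file` idea. The method name and `as_text` parameter come from this guide; the body is illustrative, and the real `MemoryFileHandler` in `web_app/utils` may differ.

```python
from io import BytesIO


class MemoryFileHandler:
    """Illustrative sketch: read an uploaded file without touching disk."""

    @staticmethod
    def process_uploaded_file(uploaded_file, as_text=True):
        # Works with any file-like object held in memory, e.g. a
        # Streamlit UploadedFile or a BytesIO.
        uploaded_file.seek(0)             # rewind in case it was read before
        data = uploaded_file.getvalue()   # raw bytes, straight from memory
        if not as_text:
            return data
        # Mirror the app's UTF-8 / UTF-16 fallback decoding
        for encoding in ("utf-8", "utf-16"):
            try:
                return data.decode(encoding)
            except UnicodeDecodeError:
                continue
        return None  # caller treats None as "could not decode"
```

Because nothing is written to disk, there is no temp path to clean up afterwards.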
## Caveats and Limitations

### 1. **Memory Usage**
- **Issue**: All file content is loaded into RAM
- **Impact**: Large files (near the 300MB limit) could cause memory issues
- **Mitigation**: The 300MB file size limit helps prevent OOM errors

### 2. **ZIP File Handling**
- **Issue**: ZIP files need special handling because they require file-like objects
- **Current approach**: Load the entire ZIP into memory using `BytesIO`
- **Limitation**: Extracting large ZIP files can spike memory usage

### 3. **Session State Persistence**
- **Issue**: Streamlit reruns can clear ordinary variables
- **Solution**: Store processed content in `st.session_state`
- **Limitation**: Session state also uses memory

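The session-state approach above can be a pair of thin helpers. `st.session_state` behaves like a dict, so the sketch below takes the mapping as an argument (the helper names echo `store_in_session`/`retrieve_from_session` used in the test scripts; the key prefix is an assumption):

```python
def store_in_session(state, key, content):
    # `state` is any mutable mapping, e.g. st.session_state or a plain dict.
    # Stored bytes survive Streamlit reruns, but still count against RAM.
    state[f"memfile_{key}"] = content


def retrieve_from_session(state, key):
    # Returns None if the key was never stored (or was evicted).
    return state.get(f"memfile_{key}")
```

In the app this would be called as `store_in_session(st.session_state, uploaded_file.name, data)`.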
### 4. **Multiple File Processing**
- **Issue**: Batch processing multiple files multiplies memory usage
- **Example**: 10 files × 30MB each = 300MB in memory
- **Mitigation**: Process files sequentially, not in parallel

### 5. **Binary vs Text Files**
- **Issue**: Binary files (images, etc.) need different handling
- **Solution**: The `as_text` parameter in `process_uploaded_file()`

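The sequential mitigation in caveat 4 can be sketched as a loop that keeps only one file's bytes alive at a time, so peak memory tracks the largest file rather than the sum (this assumes `process_one` returns a small summary, not the full content):

```python
def process_batch(uploaded_files, process_one):
    """Process uploaded files one at a time to bound peak memory."""
    results = {}
    for f in uploaded_files:
        f.seek(0)
        content = f.getvalue()   # only this file's bytes are held right now
        results[f.name] = process_one(content)
        del content              # drop the reference before the next file
    return results
```

A parallel map over the same list would instead hold every file's bytes simultaneously, which is exactly the 10 × 30MB = 300MB case above.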
## Implementation Status

### ✅ Completed:
- `ui_components.py` - Text input file uploads
- `comparison_functions.py` - Comparison file uploads
- `frequency_handlers.py` - Created `frequency_handlers_updated.py`
- `utils/__init__.py` - Exports both handlers

### ⚠️ Need Updates:
- `analysis_handlers.py` - Complex due to ZIP file handling
- `pos_handlers.py` - Batch file processing
- `reference_manager.py` - Custom reference uploads
- `config_manager.py` - YAML config uploads

## Migration Examples

### Simple File Upload
```python
# OLD - FileUploadHandler
temp_path = FileUploadHandler.save_to_temp(uploaded_file, prefix="text")
if temp_path:
    content = FileUploadHandler.read_from_temp(temp_path)
    # ... process content
    FileUploadHandler.cleanup_temp_file(temp_path)

# NEW - MemoryFileHandler
content = MemoryFileHandler.process_uploaded_file(uploaded_file, as_text=True)
if content:
    ...  # process content — no cleanup needed!
```

### ZIP File Handling
```python
# OLD - FileUploadHandler
zip_file = FileUploadHandler.handle_zip_file(uploaded_file)
with zip_file as zip_ref:
    for file_info in zip_ref.infolist():
        content = zip_ref.read(file_info.filename)

# NEW - MemoryFileHandler
file_contents = MemoryFileHandler.handle_zip_file(uploaded_file)
if file_contents:
    for filename, content in file_contents.items():
        ...  # process each file
```

### DataFrame Processing
```python
# OLD - Manual CSV parsing
content = FileUploadHandler.read_from_temp(temp_path)
df = pd.read_csv(StringIO(content.decode('utf-8')))

# NEW - Direct DataFrame creation
df = MemoryFileHandler.process_csv_tsv_file(uploaded_file)
```

## When to Use Which Handler

### Use MemoryFileHandler when:
- Deploying to restricted environments (Hugging Face Spaces)
- Files are reasonably sized (<100MB preferred)
- You need maximum compatibility

### Consider FileUploadHandler when:
- Processing very large files (>200MB)
- Running locally with full filesystem access
- You need to preserve files across sessions

## Complete Migration Steps

1. **Update imports**:
   ```python
   from web_app.utils import MemoryFileHandler
   ```

2. **Replace file operations**:
   - Remove `save_to_temp()` calls
   - Remove `cleanup_temp_file()` calls
   - Use `process_uploaded_file()` directly

3. **Update error handling**:
   - Remove 403-specific error messages
   - Add memory-related error handling

4. **Test thoroughly**:
   - Test with small files first
   - Test with maximum-size files
   - Test with multiple files

## Performance Considerations

### Memory Usage Formula:
```
Total Memory = File Size + Processing Overhead + Session State Storage
```

### Example for a 50MB file:
- File content: 50MB
- String conversion: ~50MB (if text)
- DataFrame creation: ~100-200MB (depends on the data)
- Total: ~200-300MB peak usage

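The worked example above can be turned into a rough pre-flight estimate. The multipliers here are assumptions derived from that example (a decoded text copy doubles the raw bytes; DataFrame parsing is taken at the upper end, 4× the raw size), not measured constants:

```python
MAX_UPLOAD_BYTES = 300 * 1024 * 1024  # the guide's 300MB cap


def estimate_peak_bytes(file_size, as_dataframe=False):
    """Rough worst-case peak memory for one uploaded file, in bytes."""
    peak = 2 * file_size        # raw bytes + decoded str copy
    if as_dataframe:
        peak += 4 * file_size   # assumed upper bound for DataFrame overhead
    return peak
```

Checking `estimate_peak_bytes(uploaded_file.size, as_dataframe=True)` against the memory you actually have lets the app refuse an upload before the spike, rather than OOM during parsing.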
## Recommendations

1. **For Hugging Face Spaces**: Use MemoryFileHandler exclusively
2. **For local deployment**: Either handler works; choose based on file sizes
3. **For production**: Consider implementing both with an automatic fallback
4. **Monitor memory**: Add memory usage tracking for large deployments
apply_memory_handler_fix.py ADDED
@@ -0,0 +1,87 @@
#!/usr/bin/env python3
"""
Apply the memory handler fix to all components.
This script backs up the original files and replaces them with memory-based versions.
"""

import os
import shutil
from datetime import datetime


def backup_and_replace(original_file, new_file):
    """Back up the original file and replace it with the new version."""
    if not os.path.exists(original_file):
        print(f"❌ Original file not found: {original_file}")
        return False

    if not os.path.exists(new_file):
        print(f"❌ New file not found: {new_file}")
        return False

    # Create backup
    backup_file = f"{original_file}.backup_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
    shutil.copy2(original_file, backup_file)
    print(f"📋 Backed up: {original_file} → {backup_file}")

    # Replace with new version
    shutil.copy2(new_file, original_file)
    print(f"✅ Replaced: {original_file}")

    return True


def main():
    """Apply memory handler updates to all components."""

    print("🔄 Applying Memory Handler Fix")
    print("=" * 60)
    print("This will back up original files and replace them with memory-based versions")
    print("=" * 60)

    # Files to update
    updates = [
        ("web_app/handlers/frequency_handlers.py", "web_app/handlers/frequency_handlers_updated.py"),
        ("web_app/handlers/analysis_handlers.py", "web_app/handlers/analysis_handlers_updated.py"),
    ]

    # Confirm with user
    response = input("\nProceed with updates? (y/n): ").lower()
    if response != 'y':
        print("❌ Cancelled")
        return

    print("\n🚀 Starting updates...")

    success_count = 0
    for original, updated in updates:
        if backup_and_replace(original, updated):
            success_count += 1

    print("\n" + "=" * 60)
    print(f"✅ Successfully updated {success_count}/{len(updates)} files")

    # Additional components that need manual updates
    print("\n⚠️ The following files still need manual updates:")
    manual_updates = [
        "web_app/handlers/pos_handlers.py",
        "web_app/reference_manager.py",
        "web_app/config_manager.py",
        "web_app/debug_utils.py",
    ]

    for file in manual_updates:
        print(f"  - {file}")

    print("\n💡 To complete the migration:")
    print("1. Update the remaining files manually")
    print("2. Test the application thoroughly")
    print("3. Remove the *_updated.py files after verification")

    print("\n📝 Key changes to make in remaining files:")
    print("- Replace: from web_app.utils import FileUploadHandler")
    print("  With:    from web_app.utils import MemoryFileHandler")
    print("- Replace: FileUploadHandler.save_to_temp() + read_from_temp()")
    print("  With:    MemoryFileHandler.process_uploaded_file()")
    print("- Remove:  All cleanup_temp_file() calls")


if __name__ == "__main__":
    main()
diagnose_upload_error.py ADDED
@@ -0,0 +1,162 @@
"""
Diagnostic script for file upload 403 errors.
"""
import streamlit as st
import os
import sys
import tempfile
import traceback
from pathlib import Path

st.set_page_config(page_title="Upload Diagnostic", layout="wide")

st.title("File Upload Diagnostic Tool")
st.write("This tool helps diagnose file upload issues and 403 errors")

# Check environment
st.subheader("1. Environment Check")
col1, col2 = st.columns(2)

with col1:
    st.write("**System Info:**")
    st.write(f"- Platform: {os.name}")
    st.write(f"- Python: {sys.version.split()[0]}")
    st.write(f"- Streamlit: {st.__version__}")
    st.write(f"- Working Dir: {os.getcwd()}")

with col2:
    st.write("**Temp Directory:**")
    st.write(f"- Temp Dir: {tempfile.gettempdir()}")
    st.write(f"- Writable: {os.access(tempfile.gettempdir(), os.W_OK)}")

    # Check disk space (os.statvfs is POSIX-only)
    try:
        stat = os.statvfs(tempfile.gettempdir())
        free_mb = (stat.f_frsize * stat.f_bavail) / (1024 * 1024)
        st.write(f"- Free Space: {free_mb:.1f} MB")
    except (AttributeError, OSError):
        st.write("- Free Space: Unknown")

# File upload test
st.subheader("2. File Upload Test")

uploaded_file = st.file_uploader(
    "Upload a test file",
    type=['txt', 'csv', 'tsv'],
    help="Upload any file to test"
)

if uploaded_file:
    st.write("**File received by Streamlit:**")
    st.write(f"- Name: {uploaded_file.name}")
    st.write(f"- Type: {uploaded_file.type}")
    st.write(f"- Size: {uploaded_file.size} bytes")

    # Test different read methods
    st.subheader("3. Testing Read Methods")

    # Method 1: Direct read
    with st.expander("Method 1: Direct read()"):
        try:
            uploaded_file.seek(0)
            content = uploaded_file.read()
            st.success(f"✅ Success! Read {len(content)} bytes")
            st.code(content[:200].decode('utf-8', errors='ignore') + "...")
        except Exception as e:
            st.error(f"❌ Failed: {type(e).__name__}: {str(e)}")
            st.code(traceback.format_exc())

    # Method 2: getvalue()
    with st.expander("Method 2: getvalue()"):
        try:
            uploaded_file.seek(0)
            content = uploaded_file.getvalue()
            st.success(f"✅ Success! Read {len(content)} bytes")
            st.code(content[:200].decode('utf-8', errors='ignore') + "...")
        except Exception as e:
            st.error(f"❌ Failed: {type(e).__name__}: {str(e)}")
            st.code(traceback.format_exc())

    # Method 3: getbuffer()
    with st.expander("Method 3: getbuffer()"):
        try:
            uploaded_file.seek(0)
            content = uploaded_file.getbuffer()
            st.success(f"✅ Success! Buffer size: {len(content)} bytes")
            st.code(str(bytes(content[:200])))
        except Exception as e:
            st.error(f"❌ Failed: {type(e).__name__}: {str(e)}")
            st.code(traceback.format_exc())

    # Method 4: Save to temp
    with st.expander("Method 4: Save to temp file"):
        try:
            # Try different temp locations
            temp_locations = [
                ("/tmp", "System /tmp"),
                (tempfile.gettempdir(), "Python tempdir"),
                (".", "Current directory"),
                (str(Path.home() / ".streamlit" / "temp"), "Streamlit temp")
            ]

            for temp_dir, desc in temp_locations:
                st.write(f"\n**Trying {desc}: {temp_dir}**")

                if not os.path.exists(temp_dir):
                    try:
                        os.makedirs(temp_dir, exist_ok=True)
                        st.write("✅ Created directory")
                    except OSError:
                        st.write("❌ Cannot create directory")
                        continue

                if not os.access(temp_dir, os.W_OK):
                    st.write("❌ Not writable")
                    continue

                try:
                    temp_path = os.path.join(temp_dir, f"test_{uploaded_file.name}")
                    with open(temp_path, 'wb') as f:
                        uploaded_file.seek(0)
                        f.write(uploaded_file.getbuffer())

                    if os.path.exists(temp_path):
                        size = os.path.getsize(temp_path)
                        st.success(f"✅ Saved successfully! Size: {size} bytes")
                        st.code(f"Path: {temp_path}")

                        # Try to read back
                        with open(temp_path, 'rb') as f:
                            content = f.read()
                        st.write(f"✅ Read back: {len(content)} bytes")

                        # Cleanup
                        os.remove(temp_path)
                        st.write("✅ Cleaned up")
                        break
                    else:
                        st.error("❌ File not found after saving")

                except Exception as e:
                    st.error(f"❌ Failed: {type(e).__name__}: {str(e)}")
                    if "403" in str(e):
                        st.error("**403 ERROR DETECTED!**")
                    st.code(traceback.format_exc())

        except Exception as e:
            st.error(f"❌ General failure: {str(e)}")
            st.code(traceback.format_exc())

# Network test
st.subheader("4. Network Configuration")
st.write("**Streamlit Server Config:**")
st.write(f"- Server Port: {os.environ.get('STREAMLIT_SERVER_PORT', '8501')}")
st.write(f"- Server Address: {os.environ.get('STREAMLIT_SERVER_ADDRESS', 'Not set')}")
st.write(f"- Browser Port: {os.environ.get('STREAMLIT_BROWSER_SERVER_PORT', 'Not set')}")

# Check for proxy settings
proxy_vars = ['HTTP_PROXY', 'HTTPS_PROXY', 'http_proxy', 'https_proxy']
for var in proxy_vars:
    if var in os.environ:
        st.write(f"- {var}: {os.environ[var]}")

st.info("If you see a 403 error above, please share the full error message and traceback.")
test_fix_403.py ADDED
@@ -0,0 +1,90 @@
"""
Test whether the memory-based file handler fixes the 403 error.
"""
import streamlit as st
import sys
import os

sys.path.append(os.path.dirname(__file__))

# Test both handlers
from web_app.utils import FileUploadHandler, MemoryFileHandler

st.set_page_config(page_title="403 Error Fix Test", layout="wide")

st.title("Test 403 Error Fix")
st.write("This test compares the old FileUploadHandler with the new MemoryFileHandler")

# File upload
uploaded_file = st.file_uploader(
    "Upload a test file to check for 403 errors",
    type=['txt', 'csv', 'tsv'],
    help="We'll test both handlers with this file"
)

if uploaded_file:
    col1, col2 = st.columns(2)

    with col1:
        st.subheader("❌ Old Method (FileUploadHandler)")
        st.write("This may cause 403 errors in restricted environments")

        try:
            # Try the old method
            temp_path = FileUploadHandler.save_to_temp(uploaded_file, prefix="test")
            if temp_path:
                st.success(f"✅ Saved to: {temp_path}")
                content = FileUploadHandler.read_from_temp(temp_path)
                if content:
                    st.success(f"✅ Read {len(content)} bytes")
                FileUploadHandler.cleanup_temp_file(temp_path)
            else:
                st.error("❌ Failed to save to temp")
        except Exception as e:
            st.error(f"❌ Error: {str(e)}")
            if "403" in str(e):
                st.error("**403 ERROR DETECTED!**")

    with col2:
        st.subheader("✅ New Method (MemoryFileHandler)")
        st.write("This keeps files in memory, avoiding the filesystem")

        try:
            # Reset file pointer
            uploaded_file.seek(0)

            # Try the new method
            content = MemoryFileHandler.process_uploaded_file(uploaded_file, as_text=False)
            if content:
                st.success(f"✅ Successfully read {len(content)} bytes")
                st.write("No filesystem access needed!")

                # Also test text mode
                uploaded_file.seek(0)
                text_content = MemoryFileHandler.process_uploaded_file(uploaded_file, as_text=True)
                if text_content:
                    st.success(f"✅ Text mode: {len(text_content)} characters")
            else:
                st.error("❌ Failed to read file")
        except Exception as e:
            st.error(f"❌ Error: {str(e)}")

    st.info("""
    **Summary:**
    - The old FileUploadHandler saves files to /tmp, which can trigger 403 errors
    - The new MemoryFileHandler processes files entirely in memory
    - To fix your app, replace all FileUploadHandler usage with MemoryFileHandler
    """)

# Quick implementation guide
with st.expander("📝 How to implement the fix in your app"):
    st.code("""
# Replace this:
from web_app.utils import FileUploadHandler
temp_path = FileUploadHandler.save_to_temp(uploaded_file)
content = FileUploadHandler.read_from_temp(temp_path)

# With this:
from web_app.utils import MemoryFileHandler
content = MemoryFileHandler.process_uploaded_file(uploaded_file)
""", language="python")
test_memory_upload.py ADDED
@@ -0,0 +1,90 @@
"""
Test the memory-based file upload approach.
"""
import streamlit as st
import sys
import os

sys.path.append(os.path.dirname(__file__))

from web_app.utils.memory_file_handler import MemoryFileHandler

st.set_page_config(page_title="Memory Upload Test", layout="wide")

st.title("Memory-Based File Upload Test")
st.write("This approach keeps files in memory to avoid filesystem 403 errors")

# File upload
uploaded_file = st.file_uploader(
    "Upload a test file",
    type=['txt', 'csv', 'tsv'],
    help="Files are processed entirely in memory"
)

if uploaded_file:
    st.write("### File Information")
    col1, col2 = st.columns(2)

    with col1:
        st.write("**File Details:**")
        st.write(f"- Name: {uploaded_file.name}")
        st.write(f"- Size: {uploaded_file.size:,} bytes")
        st.write(f"- Type: {uploaded_file.type}")

    with col2:
        st.write("**Processing Status:**")

    # Test text processing
    with st.expander("Test 1: Text Processing"):
        try:
            content = MemoryFileHandler.process_uploaded_file(uploaded_file, as_text=True)
            if content:
                st.success(f"✅ Successfully read {len(content):,} characters")
                st.text_area("Content Preview", content[:500] + "...", height=200)
            else:
                st.error("Failed to read file")
        except Exception as e:
            st.error(f"Error: {str(e)}")

    # Test binary processing
    with st.expander("Test 2: Binary Processing"):
        try:
            content = MemoryFileHandler.process_uploaded_file(uploaded_file, as_text=False)
            if content:
                st.success(f"✅ Successfully read {len(content):,} bytes")
                st.write(f"First 100 bytes: {content[:100]}")
            else:
                st.error("Failed to read file")
        except Exception as e:
            st.error(f"Error: {str(e)}")

    # Test DataFrame processing
    if uploaded_file.name.endswith(('.csv', '.tsv', '.txt')):
        with st.expander("Test 3: DataFrame Processing"):
            try:
                df = MemoryFileHandler.process_csv_tsv_file(uploaded_file)
                if df is not None:
                    st.success(f"✅ Successfully parsed {len(df):,} rows")
                    st.dataframe(df.head())
                else:
                    st.error("Failed to parse as DataFrame")
            except Exception as e:
                st.error(f"Error: {str(e)}")

    # Test session storage
    with st.expander("Test 4: Session Storage"):
        try:
            # Store in session (rewind first; earlier tests consumed the stream)
            uploaded_file.seek(0)
            MemoryFileHandler.store_in_session(f"test_file_{uploaded_file.name}", uploaded_file.read())
            st.success("✅ Stored in session")

            # Retrieve from session
            retrieved = MemoryFileHandler.retrieve_from_session(f"test_file_{uploaded_file.name}")
            if retrieved:
                st.success(f"✅ Retrieved {len(retrieved):,} bytes from session")
            else:
                st.error("Failed to retrieve from session")
        except Exception as e:
            st.error(f"Error: {str(e)}")

st.info("💡 This approach processes files entirely in memory without touching the filesystem, avoiding 403 errors.")
update_to_memory_handler.py ADDED
@@ -0,0 +1,113 @@
#!/usr/bin/env python3
"""
Script to update all file upload handlers to use MemoryFileHandler.
This prevents 403 errors in restricted environments such as Hugging Face Spaces.
"""

import os
import re
import shutil


def update_file(filepath):
    """Update a single file to use MemoryFileHandler."""

    with open(filepath, 'r', encoding='utf-8') as f:
        content = f.read()

    original_content = content

    # Update import statements
    content = re.sub(
        r'from web_app\.utils import FileUploadHandler',
        'from web_app.utils import MemoryFileHandler',
        content
    )

    # Update FileUploadHandler calls to MemoryFileHandler
    content = re.sub(r'FileUploadHandler\.', 'MemoryFileHandler.', content)

    # Replace save_to_temp/read_from_temp patterns.
    # Old pattern: temp_path = FileUploadHandler.save_to_temp(uploaded_file, prefix="...")
    # New pattern: content = MemoryFileHandler.process_uploaded_file(uploaded_file)
    # (the class name was already rewritten by the substitution above)
    content = re.sub(
        r'temp_path = MemoryFileHandler\.save_to_temp\(([^,]+), prefix="[^"]+"\)\s*'
        r'if temp_path:\s*'
        r'.*?= MemoryFileHandler\.read_from_temp\(temp_path\)',
        r'content = MemoryFileHandler.process_uploaded_file(\1)',
        content,
        flags=re.DOTALL
    )

    # Replace validate_file_size (MemoryFileHandler checks the size inline)
    content = re.sub(
        r'if not MemoryFileHandler\.validate_file_size\([^)]+\):\s*return',
        'if uploaded_file.size > 300 * 1024 * 1024:\n'
        '    st.error(f"File too large ({uploaded_file.size / 1024 / 1024:.1f} MB). Maximum allowed: 300MB")\n'
        '    return',
        content
    )

    # Remove cleanup_temp_file calls
    content = re.sub(
        r'MemoryFileHandler\.cleanup_temp_file\([^)]+\)\s*',
        '',
        content
    )

    # Remove cleanup_old_temp_files calls
    content = re.sub(
        r'MemoryFileHandler\.cleanup_old_temp_files\([^)]+\)\s*',
        '',
        content
    )

    if content != original_content:
        with open(filepath, 'w', encoding='utf-8') as f:
            f.write(content)
        print(f"✅ Updated: {filepath}")
        return True
    else:
        print(f"⏭️ No changes needed: {filepath}")
        return False


def main():
    """Update all relevant files."""

    # Files to update
    files_to_update = [
        'web_app/handlers/analysis_handlers.py',
        'web_app/handlers/pos_handlers.py',
        'web_app/handlers/frequency_handlers.py',
        'web_app/reference_manager.py',
        'web_app/config_manager.py',
        'web_app/debug_utils.py'
    ]

    updated_count = 0

    print("🔄 Updating file handlers to use MemoryFileHandler...")
    print("=" * 60)

    for file_path in files_to_update:
        if os.path.exists(file_path):
            if update_file(file_path):
                updated_count += 1
        else:
            print(f"❌ File not found: {file_path}")

    print("=" * 60)
    print(f"✅ Updated {updated_count} files")

    # Create a backup of the old FileUploadHandler
    old_handler_path = 'web_app/utils/file_upload_handler.py'
    backup_path = 'web_app/utils/file_upload_handler.py.backup'

    if os.path.exists(old_handler_path) and not os.path.exists(backup_path):
        shutil.copy2(old_handler_path, backup_path)
        print(f"📋 Created backup: {backup_path}")


if __name__ == "__main__":
    main()
web_app/components/comparison_functions.py CHANGED
@@ -8,7 +8,7 @@ import pandas as pd
 import numpy as np
 import plotly.graph_objects as go
 from scipy import stats
-from web_app.utils import FileUploadHandler
+from web_app.utils import MemoryFileHandler
 
 
 def get_text_input(label, key_suffix):
@@ -30,29 +30,12 @@ def get_text_input(label, key_suffix):
     )
     if uploaded_file:
         try:
-            # Use temp file approach for HF Spaces compatibility
-            temp_path = FileUploadHandler.save_to_temp(uploaded_file, prefix="compare")
-            if not temp_path:
-                st.error("Failed to save uploaded file. Please try again.")
+            # Use memory-based approach to avoid filesystem restrictions
+            text_content = MemoryFileHandler.process_uploaded_file(uploaded_file, as_text=True)
+            if not text_content:
+                st.error("Failed to read uploaded file. Please try again.")
                 return ""
 
-            # Read content from temp file
-            content = FileUploadHandler.read_from_temp(temp_path)
-            if isinstance(content, bytes):
-                try:
-                    text_content = content.decode('utf-8')
-                except UnicodeDecodeError:
-                    try:
-                        text_content = content.decode('utf-16')
-                    except UnicodeDecodeError:
-                        st.error("Unable to decode file. Please ensure it's a valid UTF-8 or UTF-16 text file.")
-                        return ""
-            else:
-                text_content = content
-
-            # Cleanup temp file
-            FileUploadHandler.cleanup_temp_file(temp_path)
-
         except Exception as e:
             st.error(f"Error reading uploaded file: {str(e)}")
             return ""
web_app/components/ui_components.py CHANGED
@@ -7,7 +7,7 @@ import streamlit as st
 import pandas as pd
 from typing import Dict, List, Any, Optional, Tuple
 from pathlib import Path
-from web_app.utils import FileUploadHandler
+from web_app.utils import MemoryFileHandler
 
 from web_app.config_manager import ConfigManager
 from web_app.session_manager import SessionManager
@@ -152,29 +152,12 @@ class UIComponents:
             )
             if uploaded_file:
                 try:
-                    # Use temp file approach for HF Spaces compatibility
-                    temp_path = FileUploadHandler.save_to_temp(uploaded_file, prefix="text")
-                    if not temp_path:
-                        st.error("Failed to save uploaded file. Please try again.")
+                    # Use memory-based approach to avoid filesystem restrictions
+                    text_content = MemoryFileHandler.process_uploaded_file(uploaded_file, as_text=True)
+                    if not text_content:
+                        st.error("Failed to read uploaded file. Please try again.")
                         return ""
 
-                    # Read content from temp file
-                    content = FileUploadHandler.read_from_temp(temp_path)
-                    if isinstance(content, bytes):
-                        try:
-                            text_content = content.decode('utf-8')
-                        except UnicodeDecodeError:
-                            try:
-                                text_content = content.decode('utf-16')
-                            except UnicodeDecodeError:
-                                st.error("Unable to decode file. Please ensure it's a valid UTF-8 or UTF-16 text file.")
-                                return ""
-                    else:
-                        text_content = content
-
-                    # Cleanup temp file
-                    FileUploadHandler.cleanup_temp_file(temp_path)
-
                 except Exception as e:
                     st.error(f"Error reading uploaded file: {str(e)}")
                     return ""
web_app/handlers/analysis_handlers.py CHANGED
@@ -1,112 +1,73 @@
 """
-Analysis handlers module for different types of text analysis.
-Handles single text, batch, and comparison analysis workflows.
 """
 
 import streamlit as st
 import pandas as pd
-import numpy as np
-import plotly.graph_objects as go
-from scipy import stats
-import tempfile
-import os
-from typing import Dict, List, Any, Optional, Tuple
-from pathlib import Path
-import zipfile
 import time
 
 from web_app.session_manager import SessionManager
 from web_app.components.ui_components import UIComponents
-from web_app.components.comparison_functions import get_text_input, display_comparison_results
-from web_app.reference_manager import ReferenceManager
-from web_app.utils import FileUploadHandler
 
 class AnalysisHandlers:
-    """Handles different types of text analysis workflows."""
 
     @staticmethod
-    def get_analyzer():
-        """Get or create lexical sophistication analyzer."""
-        if (st.session_state.analyzer is None or
-                st.session_state.analyzer.language != st.session_state.language or
-                st.session_state.analyzer.model_size != st.session_state.model_size):
-            try:
-                from text_analyzer.lexical_sophistication import LexicalSophisticationAnalyzer
-                st.session_state.analyzer = LexicalSophisticationAnalyzer(
-                    language=st.session_state.language,
-                    model_size=st.session_state.model_size
                 )
-            except Exception as e:
-                st.error(f"Error loading analyzer: {e}")
-                return None
-        return st.session_state.analyzer
-
-    @staticmethod
-    def get_pos_parser():
-        """Get or create POS parser."""
-        if (st.session_state.pos_parser is None or
-                st.session_state.pos_parser.language != st.session_state.language or
-                st.session_state.pos_parser.model_size != st.session_state.model_size):
-            try:
-                from text_analyzer.pos_parser import POSParser
-                st.session_state.pos_parser = POSParser(
-                    language=st.session_state.language,
-                    model_size=st.session_state.model_size
                 )
-            except Exception as e:
-                st.error(f"Error loading POS parser: {e}")
-                return None
-        return st.session_state.pos_parser
 
     @staticmethod
-    def handle_single_text_analysis(analyzer):
-        """Handle single text analysis workflow."""
-        st.subheader("Single Text Analysis")
-
-        # Text input
-        text_content = UIComponents.render_text_input("text to analyze", "single")
-
-        if not text_content:
-            st.info("Please provide text to analyze.")
-            return
-
-        # Reference list configuration
-        st.subheader("Reference Lists")
-        ReferenceManager.configure_reference_lists(analyzer)
-        ReferenceManager.render_custom_upload_section()
-
-        # Analysis options
-        apply_log, word_type_filter = UIComponents.render_analysis_options()
-
-        # Analysis button
-        if st.button("Analyze Text", type="primary"):
-            reference_lists = SessionManager.get_reference_lists()
-            if not reference_lists:
-                st.warning("Please select or upload reference lists first.")
-                return
-
-            with st.spinner("Analyzing text..."):
-                try:
-                    # Load reference lists
-                    analyzer.load_reference_lists(reference_lists)
-
-                    # Perform analysis
-                    results = analyzer.analyze_text(
-                        text_content,
-                        list(reference_lists.keys()),
-                        apply_log,
-                        word_type_filter
-                    )
-
-                    # Display results
-                    AnalysisHandlers.display_single_text_results(results)
-
-                except Exception as e:
-                    st.error(f"Error during analysis: {e}")
 
     @staticmethod
-    def handle_batch_analysis(analyzer):
-        """Handle batch analysis workflow."""
         st.subheader("Batch Analysis")
 
         # File upload
@@ -118,312 +79,528 @@ class AnalysisHandlers:
         )
 
         if not uploaded_files:
-            st.info("Please upload text files for batch analysis.")
            return
 
-        # Reference list configuration
-        st.subheader("Reference Lists")
-        ReferenceManager.configure_reference_lists(analyzer)
-        ReferenceManager.render_custom_upload_section()
-
-        # Analysis options
-        apply_log = st.checkbox("Apply log₁₀ transformation", key="batch_log")
-
-        # Analysis button
-        if st.button("Analyze Batch", type="primary"):
-            reference_lists = SessionManager.get_reference_lists()
-            if not reference_lists:
-                st.warning("Please select or upload reference lists first.")
                return
 
-            with st.spinner("Processing files..."):
-                try:
-                    # Extract files
-                    file_contents = AnalysisHandlers.extract_uploaded_files(uploaded_files)
-
-                    if not file_contents:
-                        st.error("No valid .txt files found in uploaded files.")
-                        return
-
-                    st.info(f"Found {len(file_contents)} files to process.")
-
-                    # Load reference lists
-                    analyzer.load_reference_lists(reference_lists)
-
-                    # Create progress tracking
-                    progress_bar = st.progress(0)
-                    status_text = st.empty()
-
-                    # Process files in memory
-                    batch_results = []
-                    selected_indices = list(reference_lists.keys())
160
-
161
- for i, (filename, text_content) in enumerate(file_contents):
162
- # Update progress
163
- progress = (i + 1) / len(file_contents)
164
- progress_bar.progress(progress)
165
- status_text.text(f"Processing file {i + 1}/{len(file_contents)}: {filename}")
166
-
167
- try:
168
- # Analyze for both content and function words
169
- result_row = {'filename': filename}
170
-
171
- for word_type in ['CW', 'FW']:
172
- analysis = analyzer.analyze_text(text_content, selected_indices, apply_log, word_type)
173
-
174
- # Extract summary scores
175
- if analysis and 'summary' in analysis:
176
- for index, stats in analysis['summary'].items():
177
- col_name = f"{index}_{word_type}"
178
- result_row[col_name] = stats['mean']
179
-
180
- batch_results.append(result_row)
181
- except Exception as e:
182
- st.warning(f"Error analyzing {filename}: {e}")
183
- continue
184
-
185
- # Convert to DataFrame
186
- results_df = pd.DataFrame(batch_results)
187
-
188
- # Display results
189
- st.success(f"Analysis complete! Processed {len(results_df)} files.")
190
- st.subheader("Results")
191
- st.dataframe(results_df, use_container_width=True)
192
-
193
- # Download link
194
- csv_data = results_df.to_csv(index=False)
195
- st.download_button(
196
- label="Download Results (CSV)",
197
- data=csv_data,
198
- file_name="lexical_sophistication_results.csv",
199
- mime="text/csv"
200
- )
201
-
202
- except Exception as e:
203
- st.error(f"Error during batch analysis: {e}")
204
 
205
  @staticmethod
206
- def handle_two_text_comparison(analyzer):
207
- """Handle two-text comparison analysis."""
208
- st.subheader("Two-Text Comparison")
209
-
210
- # Create two columns for text input
211
- col_a, col_b = st.columns(2)
212
 
213
- with col_a:
214
- st.subheader("📄 Text A")
215
- text_a = get_text_input("Text A", "a")
216
 
217
- with col_b:
218
- st.subheader("📄 Text B")
219
- text_b = get_text_input("Text B", "b")
220
 
221
- # Check if both texts are provided
222
- if not text_a or not text_b:
223
- st.info("Please provide both texts to compare.")
224
- return
225
- # Reference list configuration
226
- st.subheader("Reference Lists")
227
- ReferenceManager.configure_reference_lists(analyzer)
228
- ReferenceManager.render_custom_upload_section()
229
 
230
- # Analysis options
231
- col1, col2 = st.columns(2)
232
  with col1:
233
- apply_log = st.checkbox("Apply log₁₀ transformation", key="comparison_log")
234
  with col2:
235
- word_type_filter = st.selectbox(
236
- "Word Type Filter",
237
- options=[None, 'CW', 'FW'],
238
- format_func=lambda x: 'All Words' if x is None else ('Content Words' if x == 'CW' else 'Function Words'),
239
- key="comparison_word_type"
240
  )
241
-
242
- # Analysis button
243
- if st.button("🔍 Compare Texts", type="primary"):
244
- reference_lists = SessionManager.get_reference_lists()
245
- if not reference_lists:
246
- st.warning("Please select or upload reference lists first.")
247
- return
248
-
249
- with st.spinner("Analyzing texts..."):
250
- try:
251
- # Load reference lists
252
- analyzer.load_reference_lists(reference_lists)
253
-
254
- # Perform analysis on both texts
255
- selected_indices = list(reference_lists.keys())
256
-
257
- results_a = analyzer.analyze_text(text_a, selected_indices, apply_log, word_type_filter)
258
- results_b = analyzer.analyze_text(text_b, selected_indices, apply_log, word_type_filter)
259
-
260
- # Display comparison results
261
- display_comparison_results(results_a, results_b)
262
-
263
- except Exception as e:
264
- st.error(f"Error during comparison: {e}")
265
 
266
  @staticmethod
267
- def extract_uploaded_files(uploaded_files) -> List[Tuple[str, str]]:
268
- """Extract uploaded files and return list of (filename, content) tuples."""
269
- file_contents = []
270
- temp_paths = [] # Track temp files for cleanup
271
-
272
- try:
273
- for uploaded_file in uploaded_files:
274
- if uploaded_file.name.endswith('.zip'):
275
- # Handle ZIP files using temp file approach
276
- zip_file = FileUploadHandler.handle_zip_file(uploaded_file)
277
- if not zip_file:
278
- continue
279
-
280
- with zip_file as zip_ref:
281
- for file_info in zip_ref.infolist():
282
- if file_info.filename.endswith('.txt'):
283
- try:
284
- content = zip_ref.read(file_info.filename)
285
- # Decode content
286
- try:
287
- text_content = content.decode('utf-8')
288
- except UnicodeDecodeError:
289
- try:
290
- text_content = content.decode('utf-16')
291
- except UnicodeDecodeError:
292
- st.error(f"Unable to decode file {file_info.filename}. Skipping.")
293
- continue
294
- file_contents.append((file_info.filename, text_content))
295
- except Exception as e:
296
- st.error(f"Cannot read {file_info.filename}: {e}")
297
- continue
298
- elif uploaded_file.name.endswith('.txt'):
299
- # Handle individual text files using temp file approach
300
- try:
301
- # Save to temp and read content
302
- temp_path = FileUploadHandler.save_to_temp(uploaded_file, prefix="analysis")
303
- if not temp_path:
304
- st.error(f"Failed to save file {uploaded_file.name}")
305
- continue
306
-
307
- temp_paths.append(temp_path)
308
-
309
- # Read content with encoding handling
310
- content = FileUploadHandler.read_from_temp(temp_path)
311
- if isinstance(content, bytes):
312
- try:
313
- text_content = content.decode('utf-8')
314
- except UnicodeDecodeError:
315
- try:
316
- text_content = content.decode('utf-16')
317
- except UnicodeDecodeError:
318
- st.error(f"Unable to decode file {uploaded_file.name}. Skipping.")
319
- continue
320
- else:
321
- text_content = content
322
-
323
- file_contents.append((uploaded_file.name, text_content))
324
- except Exception as e:
325
- st.error(f"Cannot read file {uploaded_file.name}: {e}")
326
- continue
327
- else:
328
- st.warning(f"Skipping {uploaded_file.name}: Not a .txt or .zip file")
329
 
330
- return file_contents
331
 
332
- finally:
333
- # Cleanup temp files
334
- for temp_path in temp_paths:
335
- FileUploadHandler.cleanup_temp_file(temp_path)
336
 
337
  @staticmethod
338
- def display_single_text_results(results: Dict[str, Any]):
339
- """Display results for single text analysis."""
340
- st.subheader("Analysis Results")
341
-
342
- # Summary results
343
- if results['summary']:
344
- st.write("**Summary Statistics**")
345
- summary_data = []
346
- for key, stats in results['summary'].items():
347
- summary_data.append({
348
- 'Index': key,
349
- 'Mean': round(stats['mean'], 3),
350
- 'Std Dev': round(stats['std'], 3),
351
- 'Count': stats['count'],
352
- 'Min': round(stats['min'], 3),
353
- 'Max': round(stats['max'], 3)
354
- })
355
-
356
- summary_df = pd.DataFrame(summary_data)
357
- st.dataframe(summary_df, use_container_width=True)
358
-
359
- # Token details
360
- if results['token_details']:
361
- st.write("**Token Analysis**")
362
- token_df = pd.DataFrame(results['token_details'])
363
- st.dataframe(token_df, use_container_width=True)
364
 
365
- # Download token details
366
- csv_data = token_df.to_csv(index=False)
367
  st.download_button(
368
- label="Download Token Details (CSV)",
369
- data=csv_data,
370
- file_name="token_analysis.csv",
371
  mime="text/csv"
372
  )
373
 
374
- # Bigram and trigram details
375
- for detail_type in ['bigram_details', 'trigram_details']:
376
- if results.get(detail_type):
377
- st.write(f"**{detail_type.replace('_', ' ').title()}**")
378
- detail_df = pd.DataFrame(results[detail_type])
379
- st.dataframe(detail_df, use_container_width=True)
380
-
381
- # Density plots
382
- if results['summary']:
383
- st.write("**Score Distribution Plots**")
384
- AnalysisHandlers.create_density_plots(results)
385
 
386
  @staticmethod
387
- def create_density_plots(results: Dict[str, Any]):
388
- """Create density plots for score distributions."""
389
- if 'raw_scores' not in results:
390
  return
391
 
392
- for key, scores in results['raw_scores'].items():
393
- if len(scores) > 1: # Need at least 2 points for density
394
- # Create histogram with density curve
395
- fig = go.Figure()
396
-
397
- # Add histogram
398
- fig.add_trace(go.Histogram(
399
- x=scores,
400
- nbinsx=min(30, len(scores)),
401
- name='Histogram',
402
- opacity=0.7,
403
- histnorm='probability density'
404
- ))
405
-
406
- # Calculate and add KDE curve
407
- kde = stats.gaussian_kde(scores)
408
- x_range = np.linspace(min(scores), max(scores), 100)
409
- kde_values = kde(x_range)
410
 
411
- fig.add_trace(go.Scatter(
412
- x=x_range,
413
- y=kde_values,
414
- mode='lines',
415
- name='Density',
416
- line=dict(color='red', width=2)
417
- ))
418
 
419
- # Update layout
420
- fig.update_layout(
421
- title=f"Distribution of {key}",
422
- xaxis_title="Score",
423
- yaxis_title="Density",
424
- template='plotly_white',
425
- showlegend=True,
426
- bargap=0.05
427
  )
428
 
429
- st.plotly_chart(fig, use_container_width=True)
1
  """
2
+ Analysis Handlers for Streamlit Interface - Updated with MemoryFileHandler
 
3
  """
4
 
5
  import streamlit as st
6
  import pandas as pd
7
+ from typing import List, Tuple, Dict, Optional
8
  import time
9
+ from datetime import datetime
10
+ import zipfile
11
+ from io import BytesIO, StringIO
12
+ import sys
13
+ import os
14
 
15
+ # Add parent directory to path for imports
16
+ sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(__file__))))
17
+
18
+ from text_analyzer.lexical_sophistication import LexicalSophisticationAnalyzer
19
  from web_app.session_manager import SessionManager
20
  from web_app.components.ui_components import UIComponents
21
+ from web_app.utils import MemoryFileHandler
22
+
 
23
 
24
  class AnalysisHandlers:
25
+ """
26
+ Handles analysis-related UI components and workflows.
27
+ Updated to use MemoryFileHandler for better compatibility.
28
+ """
29
 
30
  @staticmethod
31
+ def handle_single_text_analysis(analyzer: LexicalSophisticationAnalyzer):
32
+ """Handle single text analysis workflow."""
33
+ # Get text input
34
+ text_content = UIComponents.render_text_input_section("Text for Analysis", "single_text")
35
+
36
+ if text_content and st.button("Analyze Text", type="primary", key="analyze_single"):
37
+ with st.spinner("Analyzing text..."):
38
+ start_time = time.time()
39
+
40
+ # Get analysis parameters
41
+ params = SessionManager.get_analysis_params()
42
+
43
+ # Run analysis
44
+ metrics = analyzer.analyze_text(
45
+ text_content,
46
+ include_pos_info=params['include_pos'],
47
+ min_word_length=params['min_word_length']
48
  )
49
+
50
+ analysis_time = time.time() - start_time
51
+
52
+ # Store results
53
+ SessionManager.store_analysis_results(
54
+ 'single_text',
55
+ metrics,
56
+ {'analysis_time': analysis_time}
 
57
  )
58
+
59
+ # Display results
60
+ AnalysisHandlers._display_analysis_results(metrics, analysis_time)
 
61
 
62
  @staticmethod
63
+ def handle_comparison_analysis(analyzer: LexicalSophisticationAnalyzer):
64
+ """Handle text comparison workflow."""
65
+ from web_app.components.comparison_functions import render_comparison_interface
66
+ render_comparison_interface(analyzer)
 
67
 
68
  @staticmethod
69
+ def handle_batch_analysis(analyzer: LexicalSophisticationAnalyzer):
70
+ """Handle batch analysis workflow with memory-based file handling."""
71
  st.subheader("Batch Analysis")
72
 
73
  # File upload
 
79
  )
80
 
81
  if not uploaded_files:
82
+ st.info("Please upload text files or a ZIP archive to begin batch analysis.")
83
  return
84
 
85
+ # Analysis parameters
86
+ with st.expander("Analysis Parameters", expanded=True):
87
+ params = AnalysisHandlers._render_batch_analysis_params()
88
+
89
+ if st.button("Start Batch Analysis", type="primary"):
90
+ # Process files
91
+ file_contents = AnalysisHandlers._process_batch_files_memory(uploaded_files)
92
+
93
+ if not file_contents:
94
+ st.error("No valid text files found.")
 
  return
96
 
97
+ # Run batch analysis
98
+ AnalysisHandlers._run_batch_analysis(analyzer, file_contents, params)
 
99
 
100
  @staticmethod
101
+ def _process_batch_files_memory(uploaded_files) -> List[Tuple[str, str]]:
102
+ """
103
+ Process uploaded files for batch analysis using memory-based approach.
 
 
105
+ Returns:
106
+ List of tuples (filename, content)
107
+ """
108
+ file_contents = []
109
 
110
+ with st.spinner("Processing uploaded files..."):
111
+ progress_bar = st.progress(0)
112
+ total_files = len(uploaded_files)
113
+
114
+ for idx, uploaded_file in enumerate(uploaded_files):
115
+ try:
116
+ if uploaded_file.name.endswith('.zip'):
117
+ # Handle ZIP files
118
+ zip_contents = MemoryFileHandler.handle_zip_file(uploaded_file)
119
+ if zip_contents:
120
+ for filename, content in zip_contents.items():
121
+ if filename.endswith('.txt'):
122
+ try:
123
+ # Decode bytes to text
124
+ if isinstance(content, bytes):
125
+ text_content = content.decode('utf-8')
126
+ else:
127
+ text_content = content
128
+ file_contents.append((filename, text_content))
129
+ except UnicodeDecodeError:
130
+ st.warning(f"Skipping {filename}: Unable to decode as UTF-8")
131
+
132
+ elif uploaded_file.name.endswith('.txt'):
133
+ # Handle individual text files
134
+ text_content = MemoryFileHandler.process_uploaded_file(uploaded_file, as_text=True)
135
+ if text_content:
136
+ file_contents.append((uploaded_file.name, text_content))
137
+ else:
138
+ st.warning(f"Could not read {uploaded_file.name}")
139
+
140
+ except Exception as e:
141
+ st.error(f"Error processing {uploaded_file.name}: {str(e)}")
142
+
143
+ # Update progress
144
+ progress_bar.progress((idx + 1) / total_files)
145
+
146
+ progress_bar.empty()
147
 
148
+ st.success(f"Processed {len(file_contents)} text files")
149
+ return file_contents
150
+
151
+ @staticmethod
152
+ def _render_batch_analysis_params() -> dict:
153
+ """Render batch analysis parameters."""
154
+ col1, col2, col3 = st.columns(3)
 
155
 
  with col1:
157
+ include_pos = st.checkbox(
158
+ "Include POS Analysis",
159
+ value=True,
160
+ help="Include part-of-speech tagging (slower but more detailed)"
161
+ )
162
+
163
+ min_word_length = st.number_input(
164
+ "Minimum Word Length",
165
+ min_value=1,
166
+ max_value=10,
167
+ value=1,
168
+ help="Exclude words shorter than this length"
169
+ )
170
+
171
  with col2:
172
+ analyze_readability = st.checkbox(
173
+ "Analyze Readability",
174
+ value=True,
175
+ help="Include readability metrics"
176
  )
177
+
178
+ analyze_diversity = st.checkbox(
179
+ "Analyze Diversity",
180
+ value=True,
181
+ help="Include lexical diversity metrics"
182
+ )
183
+
184
+ with col3:
185
+ export_format = st.selectbox(
186
+ "Export Format",
187
+ ["CSV", "Excel", "JSON"],
188
+ help="Format for results export"
189
+ )
190
+
191
+ include_raw_data = st.checkbox(
192
+ "Include Raw Data",
193
+ value=False,
194
+ help="Include word lists in export"
195
+ )
196
+
197
+ return {
198
+ 'include_pos': include_pos,
199
+ 'min_word_length': min_word_length,
200
+ 'analyze_readability': analyze_readability,
201
+ 'analyze_diversity': analyze_diversity,
202
+ 'export_format': export_format,
203
+ 'include_raw_data': include_raw_data
204
+ }
205
 
206
  @staticmethod
207
+ def _run_batch_analysis(analyzer: LexicalSophisticationAnalyzer,
208
+ file_contents: List[Tuple[str, str]],
209
+ params: dict):
210
+ """Run batch analysis on multiple files."""
211
+ results = []
212
+
213
+ # Progress tracking
214
+ progress_bar = st.progress(0)
215
+ status_text = st.empty()
216
+
217
+ start_time = time.time()
218
+
219
+ for idx, (filename, content) in enumerate(file_contents):
220
+ status_text.text(f"Analyzing {filename}...")
221
 
222
+ try:
223
+ # Analyze text
224
+ metrics = analyzer.analyze_text(
225
+ content,
226
+ include_pos_info=params['include_pos'],
227
+ min_word_length=params['min_word_length']
228
+ )
229
+
230
+ # Add filename to results
231
+ metrics['filename'] = filename
232
+ results.append(metrics)
233
+
234
+ except Exception as e:
235
+ st.error(f"Error analyzing {filename}: {str(e)}")
236
+ continue
237
 
238
+ # Update progress
239
+ progress_bar.progress((idx + 1) / len(file_contents))
240
+
241
+ # Clear progress indicators
242
+ progress_bar.empty()
243
+ status_text.empty()
244
+
245
+ total_time = time.time() - start_time
246
+
247
+ # Display results
248
+ AnalysisHandlers._display_batch_results(results, params, total_time)
249
 
250
  @staticmethod
251
+ def _display_batch_results(results: List[dict], params: dict, total_time: float):
252
+ """Display batch analysis results."""
253
+ st.success(f"✅ Analyzed {len(results)} files in {total_time:.1f} seconds")
254
+
255
+ if not results:
256
+ return
257
+
258
+ # Create results DataFrame
259
+ df_results = pd.DataFrame(results)
260
+
261
+ # Reorder columns for better display
262
+ priority_cols = ['filename', 'total_words', 'unique_words', 'avg_word_length',
263
+ 'lexical_diversity', 'avg_word_frequency']
264
+ other_cols = [col for col in df_results.columns if col not in priority_cols]
265
+ ordered_cols = [col for col in priority_cols if col in df_results.columns] + other_cols
266
+ df_results = df_results[ordered_cols]
267
+
268
+ # Display options
269
+ col1, col2 = st.columns([3, 1])
270
+ with col1:
271
+ st.subheader("Analysis Results")
272
+ with col2:
273
+ display_mode = st.radio("Display", ["Table", "Charts"], horizontal=True)
274
+
275
+ if display_mode == "Table":
276
+ # Display as table
277
+ st.dataframe(
278
+ df_results,
279
+ use_container_width=True,
280
+ hide_index=True,
281
+ height=400
282
+ )
283
 
284
+ # Summary statistics
285
+ with st.expander("Summary Statistics"):
286
+ st.write(df_results.describe())
287
+
288
+ else:
289
+ # Display as charts
290
+ AnalysisHandlers._render_batch_charts(df_results)
291
+
292
+ # Export results
293
+ st.subheader("Export Results")
294
+ col1, col2, col3 = st.columns(3)
295
+
296
+ with col1:
297
+ # CSV export
298
+ csv = df_results.to_csv(index=False)
299
  st.download_button(
300
+ label="📥 Download as CSV",
301
+ data=csv,
302
+ file_name=f"batch_analysis_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv",
303
  mime="text/csv"
304
  )
305
 
306
+ with col2:
307
+ # Excel export (using in-memory buffer)
308
+ if params['export_format'] == "Excel":
309
+ excel_buffer = BytesIO()
310
+ with pd.ExcelWriter(excel_buffer, engine='openpyxl') as writer:
311
+ df_results.to_excel(writer, sheet_name='Results', index=False)
312
+
313
+ # Add summary sheet
314
+ df_summary = df_results.describe()
315
+ df_summary.to_excel(writer, sheet_name='Summary')
316
+
317
+ excel_data = excel_buffer.getvalue()
318
+
319
+ st.download_button(
320
+ label="📥 Download as Excel",
321
+ data=excel_data,
322
+ file_name=f"batch_analysis_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.xlsx",
323
+ mime="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
324
+ )
325
+
326
+ with col3:
327
+ # JSON export
328
+ if params['export_format'] == "JSON":
329
+ json_str = df_results.to_json(orient='records', indent=2)
330
+ st.download_button(
331
+ label="📥 Download as JSON",
332
+ data=json_str,
333
+ file_name=f"batch_analysis_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json",
334
+ mime="application/json"
335
+ )
336
 
337
  @staticmethod
338
+ def _render_batch_charts(df_results: pd.DataFrame):
339
+ """Render charts for batch analysis results."""
340
+ import plotly.express as px
341
+
342
+ # Select metrics for visualization
343
+ numeric_cols = df_results.select_dtypes(include=['float64', 'int64']).columns
344
+ numeric_cols = [col for col in numeric_cols if col not in ['index']]
345
+
346
+ if len(numeric_cols) == 0:
347
+ st.warning("No numeric data available for visualization")
348
  return
349
 
350
+ # Metric selection
351
+ col1, col2 = st.columns(2)
352
+ with col1:
353
+ x_metric = st.selectbox("X-axis metric", numeric_cols, index=0)
354
+ with col2:
355
+ y_metric = st.selectbox("Y-axis metric", numeric_cols,
356
+ index=min(1, len(numeric_cols)-1))
357
+
358
+ # Create scatter plot
359
+ fig = px.scatter(
360
+ df_results,
361
+ x=x_metric,
362
+ y=y_metric,
363
+ hover_data=['filename'],
364
+ title=f"{y_metric} vs {x_metric}",
365
+ labels={x_metric: x_metric.replace('_', ' ').title(),
366
+ y_metric: y_metric.replace('_', ' ').title()}
367
+ )
368
+
369
+ fig.update_traces(marker=dict(size=10))
370
+ fig.update_layout(height=500)
371
+
372
+ st.plotly_chart(fig, use_container_width=True)
373
+
374
+ # Distribution plots
375
+ st.subheader("Metric Distributions")
376
+
377
+ selected_metric = st.selectbox(
378
+ "Select metric for distribution",
379
+ numeric_cols,
380
+ key="dist_metric"
381
+ )
382
+
383
+ col1, col2 = st.columns(2)
384
+
385
+ with col1:
386
+ # Histogram
387
+ fig_hist = px.histogram(
388
+ df_results,
389
+ x=selected_metric,
390
+ nbins=20,
391
+ title=f"Distribution of {selected_metric.replace('_', ' ').title()}"
392
+ )
393
+ fig_hist.update_layout(height=400)
394
+ st.plotly_chart(fig_hist, use_container_width=True)
395
+
396
+ with col2:
397
+ # Box plot
398
+ fig_box = px.box(
399
+ df_results,
400
+ y=selected_metric,
401
+ title=f"Box Plot of {selected_metric.replace('_', ' ').title()}",
402
+ points="all"
403
+ )
404
+ fig_box.update_layout(height=400)
405
+ st.plotly_chart(fig_box, use_container_width=True)
406
+
407
+ @staticmethod
408
+ def _display_analysis_results(metrics: dict, analysis_time: float):
409
+ """Display single text analysis results."""
410
+ st.success(f"✅ Analysis completed in {analysis_time:.2f} seconds")
411
+
412
+ # Render results in tabs
413
+ tab1, tab2, tab3, tab4 = st.tabs([
414
+ "📊 Overview",
415
+ "📈 Frequency Analysis",
416
+ "🎯 Advanced Metrics",
417
+ "📋 Raw Data"
418
+ ])
419
+
420
+ with tab1:
421
+ AnalysisHandlers._render_overview_metrics(metrics)
422
+
423
+ with tab2:
424
+ AnalysisHandlers._render_frequency_analysis(metrics)
425
+
426
+ with tab3:
427
+ AnalysisHandlers._render_advanced_metrics(metrics)
428
+
429
+ with tab4:
430
+ AnalysisHandlers._render_raw_data(metrics)
431
+
432
+ @staticmethod
433
+ def _render_overview_metrics(metrics: dict):
434
+ """Render overview metrics."""
435
+ col1, col2, col3, col4 = st.columns(4)
436
+
437
+ with col1:
438
+ st.metric("Total Words", f"{metrics.get('total_words', 0):,}")
439
+ st.metric("Sentences", f"{metrics.get('sentence_count', 0):,}")
440
+
441
+ with col2:
442
+ st.metric("Unique Words", f"{metrics.get('unique_words', 0):,}")
443
+ st.metric("Avg Sentence Length", f"{metrics.get('avg_sentence_length', 0):.1f}")
444
+
445
+ with col3:
446
+ st.metric("Lexical Diversity", f"{metrics.get('lexical_diversity', 0):.3f}")
447
+ st.metric("Avg Word Length", f"{metrics.get('avg_word_length', 0):.2f}")
448
+
449
+ with col4:
450
+ st.metric("Readability (Flesch)", f"{metrics.get('flesch_reading_ease', 0):.1f}")
451
+ st.metric("Grade Level", f"{metrics.get('flesch_kincaid_grade', 0):.1f}")
452
+
453
+ @staticmethod
454
+ def _render_frequency_analysis(metrics: dict):
455
+ """Render frequency analysis section."""
456
+ import plotly.graph_objects as go
457
+
458
+ if 'frequency_distribution' not in metrics:
459
+ st.info("Frequency distribution data not available")
460
+ return
461
+
462
+ freq_dist = metrics['frequency_distribution']
463
+
464
+ # Prepare data for visualization
465
+ words = list(freq_dist.keys())[:30] # Top 30 words
466
+ frequencies = [freq_dist[word] for word in words]
467
+
468
+ # Create bar chart
469
+ fig = go.Figure(data=[
470
+ go.Bar(x=words, y=frequencies)
471
+ ])
472
+
473
+ fig.update_layout(
474
+ title="Top 30 Most Frequent Words",
475
+ xaxis_title="Words",
476
+ yaxis_title="Frequency",
477
+ height=500
478
+ )
479
+
480
+ st.plotly_chart(fig, use_container_width=True)
481
+
482
+ # Word frequency statistics
483
+ col1, col2 = st.columns(2)
484
+
485
+ with col1:
486
+ st.write("**Frequency Statistics:**")
487
+ st.write(f"• Most common word: {words[0]} ({frequencies[0]} times)")
488
+ st.write(f"• Hapax legomena: {sum(1 for f in freq_dist.values() if f == 1)} words")
489
+ st.write(f"• Words appearing 2+ times: {sum(1 for f in freq_dist.values() if f >= 2)}")
490
+
491
+ with col2:
492
+ st.write("**Coverage Analysis:**")
493
+ total_words = sum(freq_dist.values())
494
+ top10_coverage = sum(frequencies[:10]) / total_words * 100
495
+ top30_coverage = sum(frequencies[:30]) / total_words * 100
496
+ st.write(f"• Top 10 words: {top10_coverage:.1f}% of text")
497
+ st.write(f"• Top 30 words: {top30_coverage:.1f}% of text")
498
+
499
+ @staticmethod
500
+ def _render_advanced_metrics(metrics: dict):
501
+ """Render advanced metrics section."""
502
+ # POS distribution if available
503
+ if 'pos_distribution' in metrics:
504
+ st.subheader("Part-of-Speech Distribution")
505
+
506
+ pos_dist = metrics['pos_distribution']
507
+ if pos_dist:
508
+ import plotly.express as px
509
 
510
+ # Prepare data
511
+ pos_df = pd.DataFrame([
512
+ {'POS': pos, 'Count': count}
513
+ for pos, count in pos_dist.items()
514
+ ])
515
+ pos_df = pos_df.sort_values('Count', ascending=False)
 
516
 
517
+ # Create pie chart
518
+ fig = px.pie(
519
+ pos_df,
520
+ values='Count',
521
+ names='POS',
522
+ title="Part-of-Speech Distribution"
523
  )
524
 
525
+ st.plotly_chart(fig, use_container_width=True)
526
+
527
+ # Sophistication metrics
528
+ st.subheader("Sophistication Metrics")
529
+
530
+ col1, col2 = st.columns(2)
531
+
532
+ with col1:
533
+ if 'avg_word_frequency' in metrics:
534
+ st.metric(
535
+ "Average Word Frequency",
536
+ f"{metrics['avg_word_frequency']:.2f}",
537
+ help="Average frequency of words in reference corpus"
538
+ )
539
+
540
+ if 'academic_words_ratio' in metrics:
541
+ st.metric(
542
+ "Academic Words Ratio",
543
+ f"{metrics['academic_words_ratio']:.2%}",
544
+ help="Percentage of academic vocabulary"
545
+ )
546
+
547
+ with col2:
548
+ if 'rare_words_ratio' in metrics:
549
+ st.metric(
550
+ "Rare Words Ratio",
551
+ f"{metrics['rare_words_ratio']:.2%}",
552
+ help="Percentage of infrequent words"
553
+ )
554
+
555
+ if 'lexical_sophistication_score' in metrics:
556
+ st.metric(
557
+ "Sophistication Score",
558
+ f"{metrics['lexical_sophistication_score']:.3f}",
559
+ help="Overall lexical sophistication"
560
+ )
561
+
562
+ @staticmethod
563
+ def _render_raw_data(metrics: dict):
564
+ """Render raw data section."""
565
+ st.write("**Available Metrics:**")
566
+
567
+ # Display all metrics in an expandable format
568
+ for key, value in metrics.items():
569
+ if isinstance(value, (dict, list)) and len(str(value)) > 100:
570
+ with st.expander(f"{key} (complex data)"):
571
+ if isinstance(value, dict):
572
+ st.json(value)
573
+ else:
574
+ st.write(value)
575
+ else:
576
+ st.write(f"• **{key}:** {value}")
577
+
578
+ # Export options
579
+ st.subheader("Export Data")
580
+
581
+ # Prepare export data
582
+ export_data = {k: v for k, v in metrics.items()
583
+ if not isinstance(v, (dict, list)) or k in ['pos_distribution']}
584
+
585
+ col1, col2 = st.columns(2)
586
+
587
+ with col1:
588
+ # JSON export
589
+ json_str = pd.Series(export_data).to_json(indent=2)
590
+ st.download_button(
591
+ label="📥 Download as JSON",
592
+ data=json_str,
593
+ file_name=f"analysis_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json",
594
+ mime="application/json"
595
+ )
596
+
597
+ with col2:
598
+ # CSV export
599
+ df_export = pd.DataFrame([export_data])
600
+ csv = df_export.to_csv(index=False)
601
+ st.download_button(
602
+ label="📥 Download as CSV",
603
+ data=csv,
604
+ file_name=f"analysis_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv",
605
+ mime="text/csv"
606
+ )
web_app/handlers/analysis_handlers.py.backup_20250726_162020 ADDED
@@ -0,0 +1,429 @@
+ """
+ Analysis handlers module for different types of text analysis.
+ Handles single text, batch, and comparison analysis workflows.
+ """
+
+ import streamlit as st
+ import pandas as pd
+ import numpy as np
+ import plotly.graph_objects as go
+ from scipy import stats
+ import tempfile
+ import os
+ from typing import Dict, List, Any, Optional, Tuple
+ from pathlib import Path
+ import zipfile
+ import time
+
+ from web_app.session_manager import SessionManager
+ from web_app.components.ui_components import UIComponents
+ from web_app.components.comparison_functions import get_text_input, display_comparison_results
+ from web_app.reference_manager import ReferenceManager
+ from web_app.utils import FileUploadHandler
+
+ class AnalysisHandlers:
+     """Handles different types of text analysis workflows."""
+
+     @staticmethod
+     def get_analyzer():
+         """Get or create lexical sophistication analyzer."""
+         if (st.session_state.analyzer is None or
+                 st.session_state.analyzer.language != st.session_state.language or
+                 st.session_state.analyzer.model_size != st.session_state.model_size):
+             try:
+                 from text_analyzer.lexical_sophistication import LexicalSophisticationAnalyzer
+                 st.session_state.analyzer = LexicalSophisticationAnalyzer(
+                     language=st.session_state.language,
+                     model_size=st.session_state.model_size
+                 )
+             except Exception as e:
+                 st.error(f"Error loading analyzer: {e}")
+                 return None
+         return st.session_state.analyzer
+
+     @staticmethod
+     def get_pos_parser():
+         """Get or create POS parser."""
+         if (st.session_state.pos_parser is None or
+                 st.session_state.pos_parser.language != st.session_state.language or
+                 st.session_state.pos_parser.model_size != st.session_state.model_size):
+             try:
+                 from text_analyzer.pos_parser import POSParser
+                 st.session_state.pos_parser = POSParser(
+                     language=st.session_state.language,
+                     model_size=st.session_state.model_size
+                 )
+             except Exception as e:
+                 st.error(f"Error loading POS parser: {e}")
+                 return None
+         return st.session_state.pos_parser
+
+     @staticmethod
+     def handle_single_text_analysis(analyzer):
+         """Handle single text analysis workflow."""
+         st.subheader("Single Text Analysis")
+
+         # Text input
+         text_content = UIComponents.render_text_input("text to analyze", "single")
+
+         if not text_content:
+             st.info("Please provide text to analyze.")
+             return
+
+         # Reference list configuration
+         st.subheader("Reference Lists")
+         ReferenceManager.configure_reference_lists(analyzer)
+         ReferenceManager.render_custom_upload_section()
+
+         # Analysis options
+         apply_log, word_type_filter = UIComponents.render_analysis_options()
+
+         # Analysis button
+         if st.button("Analyze Text", type="primary"):
+             reference_lists = SessionManager.get_reference_lists()
+             if not reference_lists:
+                 st.warning("Please select or upload reference lists first.")
+                 return
+
+             with st.spinner("Analyzing text..."):
+                 try:
+                     # Load reference lists
+                     analyzer.load_reference_lists(reference_lists)
+
+                     # Perform analysis
+                     results = analyzer.analyze_text(
+                         text_content,
+                         list(reference_lists.keys()),
+                         apply_log,
+                         word_type_filter
+                     )
+
+                     # Display results
+                     AnalysisHandlers.display_single_text_results(results)
+
+                 except Exception as e:
+                     st.error(f"Error during analysis: {e}")
+
+     @staticmethod
+     def handle_batch_analysis(analyzer):
+         """Handle batch analysis workflow."""
+         st.subheader("Batch Analysis")
+
+         # File upload
+         uploaded_files = st.file_uploader(
+             "Upload Text Files",
+             type=['txt', 'zip'],
+             accept_multiple_files=True,
+             help="Upload individual .txt files or a .zip archive containing .txt files"
+         )
+
+         if not uploaded_files:
+             st.info("Please upload text files for batch analysis.")
+             return
+
+         # Reference list configuration
+         st.subheader("Reference Lists")
+         ReferenceManager.configure_reference_lists(analyzer)
+         ReferenceManager.render_custom_upload_section()
+
+         # Analysis options
+         apply_log = st.checkbox("Apply log₁₀ transformation", key="batch_log")
+
+         # Analysis button
+         if st.button("Analyze Batch", type="primary"):
+             reference_lists = SessionManager.get_reference_lists()
+             if not reference_lists:
+                 st.warning("Please select or upload reference lists first.")
+                 return
+
+             with st.spinner("Processing files..."):
+                 try:
+                     # Extract files
+                     file_contents = AnalysisHandlers.extract_uploaded_files(uploaded_files)
+
+                     if not file_contents:
+                         st.error("No valid .txt files found in uploaded files.")
+                         return
+
+                     st.info(f"Found {len(file_contents)} files to process.")
+
+                     # Load reference lists
+                     analyzer.load_reference_lists(reference_lists)
+
+                     # Create progress tracking
+                     progress_bar = st.progress(0)
+                     status_text = st.empty()
+
+                     # Process files in memory
+                     batch_results = []
+                     selected_indices = list(reference_lists.keys())
+
+                     for i, (filename, text_content) in enumerate(file_contents):
+                         # Update progress
+                         progress = (i + 1) / len(file_contents)
+                         progress_bar.progress(progress)
+                         status_text.text(f"Processing file {i + 1}/{len(file_contents)}: {filename}")
+
+                         try:
+                             # Analyze for both content and function words
+                             result_row = {'filename': filename}
+
+                             for word_type in ['CW', 'FW']:
+                                 analysis = analyzer.analyze_text(text_content, selected_indices, apply_log, word_type)
+
+                                 # Extract summary scores
+                                 if analysis and 'summary' in analysis:
+                                     for index, stats in analysis['summary'].items():
+                                         col_name = f"{index}_{word_type}"
+                                         result_row[col_name] = stats['mean']
+
+                             batch_results.append(result_row)
+                         except Exception as e:
+                             st.warning(f"Error analyzing {filename}: {e}")
+                             continue
+
+                     # Convert to DataFrame
+                     results_df = pd.DataFrame(batch_results)
+
+                     # Display results
+                     st.success(f"Analysis complete! Processed {len(results_df)} files.")
+                     st.subheader("Results")
+                     st.dataframe(results_df, use_container_width=True)
+
+                     # Download link
+                     csv_data = results_df.to_csv(index=False)
+                     st.download_button(
+                         label="Download Results (CSV)",
+                         data=csv_data,
+                         file_name="lexical_sophistication_results.csv",
+                         mime="text/csv"
+                     )
+
+                 except Exception as e:
+                     st.error(f"Error during batch analysis: {e}")
+
+     @staticmethod
+     def handle_two_text_comparison(analyzer):
+         """Handle two-text comparison analysis."""
+         st.subheader("Two-Text Comparison")
+
+         # Create two columns for text input
+         col_a, col_b = st.columns(2)
+
+         with col_a:
+             st.subheader("📄 Text A")
+             text_a = get_text_input("Text A", "a")
+
+         with col_b:
+             st.subheader("📄 Text B")
+             text_b = get_text_input("Text B", "b")
+
+         # Check if both texts are provided
+         if not text_a or not text_b:
+             st.info("Please provide both texts to compare.")
+             return
+         # Reference list configuration
+         st.subheader("Reference Lists")
+         ReferenceManager.configure_reference_lists(analyzer)
+         ReferenceManager.render_custom_upload_section()
+
+         # Analysis options
+         col1, col2 = st.columns(2)
+         with col1:
+             apply_log = st.checkbox("Apply log₁₀ transformation", key="comparison_log")
+         with col2:
+             word_type_filter = st.selectbox(
+                 "Word Type Filter",
+                 options=[None, 'CW', 'FW'],
+                 format_func=lambda x: 'All Words' if x is None else ('Content Words' if x == 'CW' else 'Function Words'),
+                 key="comparison_word_type"
+             )
+
+         # Analysis button
+         if st.button("🔍 Compare Texts", type="primary"):
+             reference_lists = SessionManager.get_reference_lists()
+             if not reference_lists:
+                 st.warning("Please select or upload reference lists first.")
+                 return
+
+             with st.spinner("Analyzing texts..."):
+                 try:
+                     # Load reference lists
+                     analyzer.load_reference_lists(reference_lists)
+
+                     # Perform analysis on both texts
+                     selected_indices = list(reference_lists.keys())
+
+                     results_a = analyzer.analyze_text(text_a, selected_indices, apply_log, word_type_filter)
+                     results_b = analyzer.analyze_text(text_b, selected_indices, apply_log, word_type_filter)
+
+                     # Display comparison results
+                     display_comparison_results(results_a, results_b)
+
+                 except Exception as e:
+                     st.error(f"Error during comparison: {e}")
+
+     @staticmethod
+     def extract_uploaded_files(uploaded_files) -> List[Tuple[str, str]]:
+         """Extract uploaded files and return list of (filename, content) tuples."""
+         file_contents = []
+         temp_paths = []  # Track temp files for cleanup
+
+         try:
+             for uploaded_file in uploaded_files:
+                 if uploaded_file.name.endswith('.zip'):
+                     # Handle ZIP files using temp file approach
+                     zip_file = FileUploadHandler.handle_zip_file(uploaded_file)
+                     if not zip_file:
+                         continue
+
+                     with zip_file as zip_ref:
+                         for file_info in zip_ref.infolist():
+                             if file_info.filename.endswith('.txt'):
+                                 try:
+                                     content = zip_ref.read(file_info.filename)
+                                     # Decode content
+                                     try:
+                                         text_content = content.decode('utf-8')
+                                     except UnicodeDecodeError:
+                                         try:
+                                             text_content = content.decode('utf-16')
+                                         except UnicodeDecodeError:
+                                             st.error(f"Unable to decode file {file_info.filename}. Skipping.")
+                                             continue
+                                     file_contents.append((file_info.filename, text_content))
+                                 except Exception as e:
+                                     st.error(f"Cannot read {file_info.filename}: {e}")
+                                     continue
+                 elif uploaded_file.name.endswith('.txt'):
+                     # Handle individual text files using temp file approach
+                     try:
+                         # Save to temp and read content
+                         temp_path = FileUploadHandler.save_to_temp(uploaded_file, prefix="analysis")
+                         if not temp_path:
+                             st.error(f"Failed to save file {uploaded_file.name}")
+                             continue
+
+                         temp_paths.append(temp_path)
+
+                         # Read content with encoding handling
+                         content = FileUploadHandler.read_from_temp(temp_path)
+                         if isinstance(content, bytes):
+                             try:
+                                 text_content = content.decode('utf-8')
+                             except UnicodeDecodeError:
+                                 try:
+                                     text_content = content.decode('utf-16')
+                                 except UnicodeDecodeError:
+                                     st.error(f"Unable to decode file {uploaded_file.name}. Skipping.")
+                                     continue
+                         else:
+                             text_content = content
+
+                         file_contents.append((uploaded_file.name, text_content))
+                     except Exception as e:
+                         st.error(f"Cannot read file {uploaded_file.name}: {e}")
+                         continue
+                 else:
+                     st.warning(f"Skipping {uploaded_file.name}: Not a .txt or .zip file")
+
+             return file_contents
+
+         finally:
+             # Cleanup temp files
+             for temp_path in temp_paths:
+                 FileUploadHandler.cleanup_temp_file(temp_path)
+
+     @staticmethod
+     def display_single_text_results(results: Dict[str, Any]):
+         """Display results for single text analysis."""
+         st.subheader("Analysis Results")
+
+         # Summary results
+         if results['summary']:
+             st.write("**Summary Statistics**")
+             summary_data = []
+             for key, stats in results['summary'].items():
+                 summary_data.append({
+                     'Index': key,
+                     'Mean': round(stats['mean'], 3),
+                     'Std Dev': round(stats['std'], 3),
+                     'Count': stats['count'],
+                     'Min': round(stats['min'], 3),
+                     'Max': round(stats['max'], 3)
+                 })
+
+             summary_df = pd.DataFrame(summary_data)
+             st.dataframe(summary_df, use_container_width=True)
+
+         # Token details
+         if results['token_details']:
+             st.write("**Token Analysis**")
+             token_df = pd.DataFrame(results['token_details'])
+             st.dataframe(token_df, use_container_width=True)
+
+             # Download token details
+             csv_data = token_df.to_csv(index=False)
+             st.download_button(
+                 label="Download Token Details (CSV)",
+                 data=csv_data,
+                 file_name="token_analysis.csv",
+                 mime="text/csv"
+             )
+
+         # Bigram and trigram details
+         for detail_type in ['bigram_details', 'trigram_details']:
+             if results.get(detail_type):
+                 st.write(f"**{detail_type.replace('_', ' ').title()}**")
+                 detail_df = pd.DataFrame(results[detail_type])
+                 st.dataframe(detail_df, use_container_width=True)
+
+         # Density plots
+         if results['summary']:
+             st.write("**Score Distribution Plots**")
+             AnalysisHandlers.create_density_plots(results)
+
+     @staticmethod
+     def create_density_plots(results: Dict[str, Any]):
+         """Create density plots for score distributions."""
+         if 'raw_scores' not in results:
+             return
+
+         for key, scores in results['raw_scores'].items():
+             if len(scores) > 1:  # Need at least 2 points for density
+                 # Create histogram with density curve
+                 fig = go.Figure()
+
+                 # Add histogram
+                 fig.add_trace(go.Histogram(
+                     x=scores,
+                     nbinsx=min(30, len(scores)),
+                     name='Histogram',
+                     opacity=0.7,
+                     histnorm='probability density'
+                 ))
+
+                 # Calculate and add KDE curve
+                 kde = stats.gaussian_kde(scores)
+                 x_range = np.linspace(min(scores), max(scores), 100)
+                 kde_values = kde(x_range)
+
+                 fig.add_trace(go.Scatter(
+                     x=x_range,
+                     y=kde_values,
+                     mode='lines',
+                     name='Density',
+                     line=dict(color='red', width=2)
+                 ))
+
+                 # Update layout
+                 fig.update_layout(
+                     title=f"Distribution of {key}",
+                     xaxis_title="Score",
+                     yaxis_title="Density",
+                     template='plotly_white',
+                     showlegend=True,
+                     bargap=0.05
+                 )
+
+                 st.plotly_chart(fig, use_container_width=True)
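Both handler versions repeat the same inline UTF-8 then UTF-16 decode fallback. A standalone sketch of that fallback (hypothetical helper name, not part of the commit):

```python
from typing import Optional


def decode_text(content: bytes) -> Optional[str]:
    """Try UTF-8 first, then UTF-16; return None if neither decodes."""
    for encoding in ("utf-8", "utf-16"):
        try:
            return content.decode(encoding)
        except UnicodeDecodeError:
            continue
    return None
```

Factoring the fallback into one helper would also make it easy to extend the encoding list (e.g. latin-1 as a last resort) in a single place.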
web_app/handlers/analysis_handlers_updated.py ADDED
@@ -0,0 +1,606 @@
+ """
+ Analysis Handlers for Streamlit Interface - Updated with MemoryFileHandler
+ """
+
+ import streamlit as st
+ import pandas as pd
+ from typing import List, Tuple, Dict, Optional
+ import time
+ from datetime import datetime
+ import zipfile
+ from io import BytesIO, StringIO
+ import sys
+ import os
+
+ # Add parent directory to path for imports
+ sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(__file__))))
+
+ from text_analyzer.lexical_sophistication import LexicalSophisticationAnalyzer
+ from web_app.session_manager import SessionManager
+ from web_app.components.ui_components import UIComponents
+ from web_app.utils import MemoryFileHandler
+
+
+ class AnalysisHandlers:
+     """
+     Handles analysis-related UI components and workflows.
+     Updated to use MemoryFileHandler for better compatibility.
+     """
+
+     @staticmethod
+     def handle_single_text_analysis(analyzer: LexicalSophisticationAnalyzer):
+         """Handle single text analysis workflow."""
+         # Get text input
+         text_content = UIComponents.render_text_input_section("Text for Analysis", "single_text")
+
+         if text_content and st.button("Analyze Text", type="primary", key="analyze_single"):
+             with st.spinner("Analyzing text..."):
+                 start_time = time.time()
+
+                 # Get analysis parameters
+                 params = SessionManager.get_analysis_params()
+
+                 # Run analysis
+                 metrics = analyzer.analyze_text(
+                     text_content,
+                     include_pos_info=params['include_pos'],
+                     min_word_length=params['min_word_length']
+                 )
+
+                 analysis_time = time.time() - start_time
+
+                 # Store results
+                 SessionManager.store_analysis_results(
+                     'single_text',
+                     metrics,
+                     {'analysis_time': analysis_time}
+                 )
+
+                 # Display results
+                 AnalysisHandlers._display_analysis_results(metrics, analysis_time)
+
+     @staticmethod
+     def handle_comparison_analysis(analyzer: LexicalSophisticationAnalyzer):
+         """Handle text comparison workflow."""
+         from web_app.components.comparison_functions import render_comparison_interface
+         render_comparison_interface(analyzer)
+
+     @staticmethod
+     def handle_batch_analysis(analyzer: LexicalSophisticationAnalyzer):
+         """Handle batch analysis workflow with memory-based file handling."""
+         st.subheader("Batch Analysis")
+
+         # File upload
+         uploaded_files = st.file_uploader(
+             "Upload Text Files",
+             type=['txt', 'zip'],
+             accept_multiple_files=True,
+             help="Upload individual .txt files or a .zip archive containing .txt files"
+         )
+
+         if not uploaded_files:
+             st.info("Please upload text files or a ZIP archive to begin batch analysis.")
+             return
+
+         # Analysis parameters
+         with st.expander("Analysis Parameters", expanded=True):
+             params = AnalysisHandlers._render_batch_analysis_params()
+
+         if st.button("Start Batch Analysis", type="primary"):
+             # Process files
+             file_contents = AnalysisHandlers._process_batch_files_memory(uploaded_files)
+
+             if not file_contents:
+                 st.error("No valid text files found.")
+                 return
+
+             # Run batch analysis
+             AnalysisHandlers._run_batch_analysis(analyzer, file_contents, params)
+
+     @staticmethod
+     def _process_batch_files_memory(uploaded_files) -> List[Tuple[str, str]]:
+         """
+         Process uploaded files for batch analysis using memory-based approach.
+
+         Returns:
+             List of tuples (filename, content)
+         """
+         file_contents = []
+
+         with st.spinner("Processing uploaded files..."):
+             progress_bar = st.progress(0)
+             total_files = len(uploaded_files)
+
+             for idx, uploaded_file in enumerate(uploaded_files):
+                 try:
+                     if uploaded_file.name.endswith('.zip'):
+                         # Handle ZIP files
+                         zip_contents = MemoryFileHandler.handle_zip_file(uploaded_file)
+                         if zip_contents:
+                             for filename, content in zip_contents.items():
+                                 if filename.endswith('.txt'):
+                                     try:
+                                         # Decode bytes to text
+                                         if isinstance(content, bytes):
+                                             text_content = content.decode('utf-8')
+                                         else:
+                                             text_content = content
+                                         file_contents.append((filename, text_content))
+                                     except UnicodeDecodeError:
+                                         st.warning(f"Skipping {filename}: Unable to decode as UTF-8")
+
+                     elif uploaded_file.name.endswith('.txt'):
+                         # Handle individual text files
+                         text_content = MemoryFileHandler.process_uploaded_file(uploaded_file, as_text=True)
+                         if text_content:
+                             file_contents.append((uploaded_file.name, text_content))
+                         else:
+                             st.warning(f"Could not read {uploaded_file.name}")
+
+                 except Exception as e:
+                     st.error(f"Error processing {uploaded_file.name}: {str(e)}")
+
+                 # Update progress
+                 progress_bar.progress((idx + 1) / total_files)
+
+             progress_bar.empty()
+
+         st.success(f"Processed {len(file_contents)} text files")
+         return file_contents
+
+     @staticmethod
+     def _render_batch_analysis_params() -> dict:
+         """Render batch analysis parameters."""
+         col1, col2, col3 = st.columns(3)
+
+         with col1:
+             include_pos = st.checkbox(
+                 "Include POS Analysis",
+                 value=True,
+                 help="Include part-of-speech tagging (slower but more detailed)"
+             )
+
+             min_word_length = st.number_input(
+                 "Minimum Word Length",
+                 min_value=1,
+                 max_value=10,
+                 value=1,
+                 help="Exclude words shorter than this length"
+             )
+
+         with col2:
+             analyze_readability = st.checkbox(
+                 "Analyze Readability",
+                 value=True,
+                 help="Include readability metrics"
+             )
+
+             analyze_diversity = st.checkbox(
+                 "Analyze Diversity",
+                 value=True,
+                 help="Include lexical diversity metrics"
+             )
+
+         with col3:
+             export_format = st.selectbox(
+                 "Export Format",
+                 ["CSV", "Excel", "JSON"],
+                 help="Format for results export"
+             )
+
+             include_raw_data = st.checkbox(
+                 "Include Raw Data",
+                 value=False,
+                 help="Include word lists in export"
+             )
+
+         return {
+             'include_pos': include_pos,
+             'min_word_length': min_word_length,
+             'analyze_readability': analyze_readability,
+             'analyze_diversity': analyze_diversity,
+             'export_format': export_format,
+             'include_raw_data': include_raw_data
+         }
+
+     @staticmethod
+     def _run_batch_analysis(analyzer: LexicalSophisticationAnalyzer,
+                             file_contents: List[Tuple[str, str]],
+                             params: dict):
+         """Run batch analysis on multiple files."""
+         results = []
+
+         # Progress tracking
+         progress_bar = st.progress(0)
+         status_text = st.empty()
+
+         start_time = time.time()
+
+         for idx, (filename, content) in enumerate(file_contents):
+             status_text.text(f"Analyzing {filename}...")
+
+             try:
+                 # Analyze text
+                 metrics = analyzer.analyze_text(
+                     content,
+                     include_pos_info=params['include_pos'],
+                     min_word_length=params['min_word_length']
+                 )
+
+                 # Add filename to results
+                 metrics['filename'] = filename
+                 results.append(metrics)
+
+             except Exception as e:
+                 st.error(f"Error analyzing {filename}: {str(e)}")
236
+ continue
237
+
238
+ # Update progress
239
+ progress_bar.progress((idx + 1) / len(file_contents))
240
+
241
+ # Clear progress indicators
242
+ progress_bar.empty()
243
+ status_text.empty()
244
+
245
+ total_time = time.time() - start_time
246
+
247
+ # Display results
248
+ AnalysisHandlers._display_batch_results(results, params, total_time)
249
+
250
+ @staticmethod
251
+ def _display_batch_results(results: List[dict], params: dict, total_time: float):
252
+ """Display batch analysis results."""
253
+ st.success(f"✅ Analyzed {len(results)} files in {total_time:.1f} seconds")
254
+
255
+ if not results:
256
+ return
257
+
258
+ # Create results DataFrame
259
+ df_results = pd.DataFrame(results)
260
+
261
+ # Reorder columns for better display
262
+ priority_cols = ['filename', 'total_words', 'unique_words', 'avg_word_length',
263
+ 'lexical_diversity', 'avg_word_frequency']
264
+ other_cols = [col for col in df_results.columns if col not in priority_cols]
265
+ ordered_cols = [col for col in priority_cols if col in df_results.columns] + other_cols
266
+ df_results = df_results[ordered_cols]
267
+
268
+ # Display options
269
+ col1, col2 = st.columns([3, 1])
270
+ with col1:
271
+ st.subheader("Analysis Results")
272
+ with col2:
273
+ display_mode = st.radio("Display", ["Table", "Charts"], horizontal=True)
274
+
275
+ if display_mode == "Table":
276
+ # Display as table
277
+ st.dataframe(
278
+ df_results,
279
+ use_container_width=True,
280
+ hide_index=True,
281
+ height=400
282
+ )
283
+
284
+ # Summary statistics
285
+ with st.expander("Summary Statistics"):
286
+ st.write(df_results.describe())
287
+
288
+ else:
289
+ # Display as charts
290
+ AnalysisHandlers._render_batch_charts(df_results)
291
+
292
+ # Export results
293
+ st.subheader("Export Results")
294
+ col1, col2, col3 = st.columns(3)
295
+
296
+ with col1:
297
+ # CSV export
298
+ csv = df_results.to_csv(index=False)
299
+ st.download_button(
300
+ label="📥 Download as CSV",
301
+ data=csv,
302
+ file_name=f"batch_analysis_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv",
303
+ mime="text/csv"
304
+ )
305
+
306
+ with col2:
307
+ # Excel export (using in-memory buffer)
308
+ if params['export_format'] == "Excel":
309
+ excel_buffer = BytesIO()
310
+ with pd.ExcelWriter(excel_buffer, engine='openpyxl') as writer:
311
+ df_results.to_excel(writer, sheet_name='Results', index=False)
312
+
313
+ # Add summary sheet
314
+ df_summary = df_results.describe()
315
+ df_summary.to_excel(writer, sheet_name='Summary')
316
+
317
+ excel_data = excel_buffer.getvalue()
318
+
319
+ st.download_button(
320
+ label="📥 Download as Excel",
321
+ data=excel_data,
322
+ file_name=f"batch_analysis_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.xlsx",
323
+ mime="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
324
+ )
325
+
326
+ with col3:
327
+ # JSON export
328
+ if params['export_format'] == "JSON":
329
+ json_str = df_results.to_json(orient='records', indent=2)
330
+ st.download_button(
331
+ label="📥 Download as JSON",
332
+ data=json_str,
333
+ file_name=f"batch_analysis_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json",
334
+ mime="application/json"
335
+ )
336
+
337
+ @staticmethod
338
+ def _render_batch_charts(df_results: pd.DataFrame):
339
+ """Render charts for batch analysis results."""
340
+ import plotly.express as px
341
+
342
+ # Select metrics for visualization
343
+ numeric_cols = df_results.select_dtypes(include=['float64', 'int64']).columns
344
+ numeric_cols = [col for col in numeric_cols if col not in ['index']]
345
+
346
+ if len(numeric_cols) == 0:
347
+ st.warning("No numeric data available for visualization")
348
+ return
349
+
350
+ # Metric selection
351
+ col1, col2 = st.columns(2)
352
+ with col1:
353
+ x_metric = st.selectbox("X-axis metric", numeric_cols, index=0)
354
+ with col2:
355
+ y_metric = st.selectbox("Y-axis metric", numeric_cols,
356
+ index=min(1, len(numeric_cols)-1))
357
+
358
+ # Create scatter plot
359
+ fig = px.scatter(
360
+ df_results,
361
+ x=x_metric,
362
+ y=y_metric,
363
+ hover_data=['filename'],
364
+ title=f"{y_metric} vs {x_metric}",
365
+ labels={x_metric: x_metric.replace('_', ' ').title(),
366
+ y_metric: y_metric.replace('_', ' ').title()}
367
+ )
368
+
369
+ fig.update_traces(marker=dict(size=10))
370
+ fig.update_layout(height=500)
371
+
372
+ st.plotly_chart(fig, use_container_width=True)
373
+
374
+ # Distribution plots
375
+ st.subheader("Metric Distributions")
376
+
377
+ selected_metric = st.selectbox(
378
+ "Select metric for distribution",
379
+ numeric_cols,
380
+ key="dist_metric"
381
+ )
382
+
383
+ col1, col2 = st.columns(2)
384
+
385
+ with col1:
386
+ # Histogram
387
+ fig_hist = px.histogram(
388
+ df_results,
389
+ x=selected_metric,
390
+ nbins=20,
391
+ title=f"Distribution of {selected_metric.replace('_', ' ').title()}"
392
+ )
393
+ fig_hist.update_layout(height=400)
394
+ st.plotly_chart(fig_hist, use_container_width=True)
395
+
396
+ with col2:
397
+ # Box plot
398
+ fig_box = px.box(
399
+ df_results,
400
+ y=selected_metric,
401
+ title=f"Box Plot of {selected_metric.replace('_', ' ').title()}",
402
+ points="all"
403
+ )
404
+ fig_box.update_layout(height=400)
405
+ st.plotly_chart(fig_box, use_container_width=True)
406
+
407
+ @staticmethod
408
+ def _display_analysis_results(metrics: dict, analysis_time: float):
409
+ """Display single text analysis results."""
410
+ st.success(f"✅ Analysis completed in {analysis_time:.2f} seconds")
411
+
412
+ # Render results in tabs
413
+ tab1, tab2, tab3, tab4 = st.tabs([
414
+ "📊 Overview",
415
+ "📈 Frequency Analysis",
416
+ "🎯 Advanced Metrics",
417
+ "📋 Raw Data"
418
+ ])
419
+
420
+ with tab1:
421
+ AnalysisHandlers._render_overview_metrics(metrics)
422
+
423
+ with tab2:
424
+ AnalysisHandlers._render_frequency_analysis(metrics)
425
+
426
+ with tab3:
427
+ AnalysisHandlers._render_advanced_metrics(metrics)
428
+
429
+ with tab4:
430
+ AnalysisHandlers._render_raw_data(metrics)
431
+
432
+ @staticmethod
433
+ def _render_overview_metrics(metrics: dict):
434
+ """Render overview metrics."""
435
+ col1, col2, col3, col4 = st.columns(4)
436
+
437
+ with col1:
438
+ st.metric("Total Words", f"{metrics.get('total_words', 0):,}")
439
+ st.metric("Sentences", f"{metrics.get('sentence_count', 0):,}")
440
+
441
+ with col2:
442
+ st.metric("Unique Words", f"{metrics.get('unique_words', 0):,}")
443
+ st.metric("Avg Sentence Length", f"{metrics.get('avg_sentence_length', 0):.1f}")
444
+
445
+ with col3:
446
+ st.metric("Lexical Diversity", f"{metrics.get('lexical_diversity', 0):.3f}")
447
+ st.metric("Avg Word Length", f"{metrics.get('avg_word_length', 0):.2f}")
448
+
449
+ with col4:
450
+ st.metric("Readability (Flesch)", f"{metrics.get('flesch_reading_ease', 0):.1f}")
451
+ st.metric("Grade Level", f"{metrics.get('flesch_kincaid_grade', 0):.1f}")
452
+
453
+ @staticmethod
454
+ def _render_frequency_analysis(metrics: dict):
455
+ """Render frequency analysis section."""
456
+ import plotly.graph_objects as go
457
+
458
+ if 'frequency_distribution' not in metrics:
459
+ st.info("Frequency distribution data not available")
460
+ return
461
+
462
+ freq_dist = metrics['frequency_distribution']
463
+
464
+ # Prepare data for visualization
465
+ words = list(freq_dist.keys())[:30] # Top 30 words
466
+ frequencies = [freq_dist[word] for word in words]
467
+
468
+ # Create bar chart
469
+ fig = go.Figure(data=[
470
+ go.Bar(x=words, y=frequencies)
471
+ ])
472
+
473
+ fig.update_layout(
474
+ title="Top 30 Most Frequent Words",
475
+ xaxis_title="Words",
476
+ yaxis_title="Frequency",
477
+ height=500
478
+ )
479
+
480
+ st.plotly_chart(fig, use_container_width=True)
481
+
482
+ # Word frequency statistics
483
+ col1, col2 = st.columns(2)
484
+
485
+ with col1:
486
+ st.write("**Frequency Statistics:**")
487
+ st.write(f"• Most common word: {words[0]} ({frequencies[0]} times)")
488
+ st.write(f"• Hapax legomena: {sum(1 for f in freq_dist.values() if f == 1)} words")
489
+ st.write(f"• Words appearing 2+ times: {sum(1 for f in freq_dist.values() if f >= 2)}")
490
+
491
+ with col2:
492
+ st.write("**Coverage Analysis:**")
493
+ total_words = sum(freq_dist.values())
494
+ top10_coverage = sum(frequencies[:10]) / total_words * 100
495
+ top30_coverage = sum(frequencies[:30]) / total_words * 100
496
+ st.write(f"• Top 10 words: {top10_coverage:.1f}% of text")
497
+ st.write(f"• Top 30 words: {top30_coverage:.1f}% of text")
498
+
+     @staticmethod
+     def _render_advanced_metrics(metrics: dict):
+         """Render advanced metrics section."""
+         # POS distribution if available
+         if 'pos_distribution' in metrics:
+             st.subheader("Part-of-Speech Distribution")
+
+             pos_dist = metrics['pos_distribution']
+             if pos_dist:
+                 import plotly.express as px
+
+                 # Prepare data
+                 pos_df = pd.DataFrame([
+                     {'POS': pos, 'Count': count}
+                     for pos, count in pos_dist.items()
+                 ])
+                 pos_df = pos_df.sort_values('Count', ascending=False)
+
+                 # Create pie chart
+                 fig = px.pie(
+                     pos_df,
+                     values='Count',
+                     names='POS',
+                     title="Part-of-Speech Distribution"
+                 )
+
+                 st.plotly_chart(fig, use_container_width=True)
+
+         # Sophistication metrics
+         st.subheader("Sophistication Metrics")
+
+         col1, col2 = st.columns(2)
+
+         with col1:
+             if 'avg_word_frequency' in metrics:
+                 st.metric(
+                     "Average Word Frequency",
+                     f"{metrics['avg_word_frequency']:.2f}",
+                     help="Average frequency of words in reference corpus"
+                 )
+
+             if 'academic_words_ratio' in metrics:
+                 st.metric(
+                     "Academic Words Ratio",
+                     f"{metrics['academic_words_ratio']:.2%}",
+                     help="Percentage of academic vocabulary"
+                 )
+
+         with col2:
+             if 'rare_words_ratio' in metrics:
+                 st.metric(
+                     "Rare Words Ratio",
+                     f"{metrics['rare_words_ratio']:.2%}",
+                     help="Percentage of infrequent words"
+                 )
+
+             if 'lexical_sophistication_score' in metrics:
+                 st.metric(
+                     "Sophistication Score",
+                     f"{metrics['lexical_sophistication_score']:.3f}",
+                     help="Overall lexical sophistication"
+                 )
+
+     @staticmethod
+     def _render_raw_data(metrics: dict):
+         """Render raw data section."""
+         st.write("**Available Metrics:**")
+
+         # Display all metrics in an expandable format
+         for key, value in metrics.items():
+             if isinstance(value, (dict, list)) and len(str(value)) > 100:
+                 with st.expander(f"{key} (complex data)"):
+                     if isinstance(value, dict):
+                         st.json(value)
+                     else:
+                         st.write(value)
+             else:
+                 st.write(f"• **{key}:** {value}")
+
+         # Export options
+         st.subheader("Export Data")
+
+         # Prepare export data
+         export_data = {k: v for k, v in metrics.items()
+                        if not isinstance(v, (dict, list)) or k in ['pos_distribution']}
+
+         col1, col2 = st.columns(2)
+
+         with col1:
+             # JSON export
+             json_str = pd.Series(export_data).to_json(indent=2)
+             st.download_button(
+                 label="📥 Download as JSON",
+                 data=json_str,
+                 file_name=f"analysis_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json",
+                 mime="application/json"
+             )
+
+         with col2:
+             # CSV export
+             df_export = pd.DataFrame([export_data])
+             csv = df_export.to_csv(index=False)
+             st.download_button(
+                 label="📥 Download as CSV",
+                 data=csv,
+                 file_name=f"analysis_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv",
+                 mime="text/csv"
+             )
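The export section above whitelists scalar metrics plus `pos_distribution` before serializing through pandas. The same filtering can be sketched with the stdlib `json` module instead (the `exportable` helper is illustrative):

```python
import json

def exportable(metrics: dict, keep_complex=("pos_distribution",)) -> dict:
    """Keep scalar metrics, plus whitelisted complex ones, as in the export above."""
    return {k: v for k, v in metrics.items()
            if not isinstance(v, (dict, list)) or k in keep_complex}

metrics = {"ttr": 0.61, "tokens": 420,
           "pos_distribution": {"NOUN": 10}, "samples": [1, 2]}
json_str = json.dumps(exportable(metrics), indent=2)
```

`samples` is dropped because it is a non-whitelisted list; everything else round-trips.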
web_app/handlers/frequency_handlers.py CHANGED
@@ -4,6 +4,8 @@ Frequency Analysis Handlers for Streamlit Interface
  This module provides Streamlit interface handlers for word frequency visualization,
  including file upload, visualization controls, and results display.
  Supports flexible column mapping for diverse frequency data formats.
  """
 
  import streamlit as st
@@ -15,13 +17,13 @@ from typing import Dict, List, Optional
  import sys
  import os
  from pathlib import Path
- from io import StringIO
 
  # Add parent directory to path for imports
  sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(__file__))))
 
  from text_analyzer.frequency_analyzer import FrequencyAnalyzer
- from web_app.utils import FileUploadHandler
 
 
  class FrequencyHandlers:
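The import swap above replaces the temp-file based `FileUploadHandler` with `MemoryFileHandler`, whose definition lives in `web_app.utils` and is not part of this chunk. Its core idea — read the upload fully into memory instead of writing to `/tmp` — might look roughly like this (a sketch under that assumption, not the actual implementation):

```python
from io import BytesIO

class MemoryFileHandler:
    """Minimal sketch: process an upload entirely in memory, never touching /tmp."""

    @staticmethod
    def process_uploaded_file(uploaded_file, as_text: bool = False):
        data = uploaded_file.read()   # Streamlit upload objects expose .read()
        if as_text and isinstance(data, bytes):
            return data.decode("utf-8")
        return data

# A BytesIO stands in for the Streamlit upload object here.
content = MemoryFileHandler.process_uploaded_file(BytesIO(b"word\t5\n"), as_text=True)
```

The trade-off, as the migration guide notes, is RAM: the whole file lives in memory for the lifetime of the session.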
@@ -30,56 +32,48 @@ class FrequencyHandlers:
      """
 
      @staticmethod
-     def handle_frequency_analysis():
          """
-         Enhanced frequency analysis interface handler with persistent column selection.
          """
-         st.markdown("Upload a frequency data file (TSV/CSV) with flexible column mapping support. "
-                     "The system will automatically detect columns and let you choose which ones to use for analysis.")
 
-         # Initialize session state variables
-         if 'uploaded_file_name' not in st.session_state:
-             st.session_state.uploaded_file_name = None
-         if 'column_config' not in st.session_state:
-             st.session_state.column_config = None
          if 'analyzer' not in st.session_state:
              st.session_state.analyzer = None
          if 'format_info' not in st.session_state:
              st.session_state.format_info = None
-         if 'detected_cols' not in st.session_state:
-             st.session_state.detected_cols = None
          if 'uploaded_file_content' not in st.session_state:
              st.session_state.uploaded_file_content = None
 
-         # File upload section
-         uploaded_file = FrequencyHandlers.render_file_upload()
 
-         # Check if a new file was uploaded
-         if uploaded_file is not None:
-             current_file_name = uploaded_file.name
 
-             # Reset state if new file is uploaded
-             if st.session_state.uploaded_file_name != current_file_name:
-                 st.session_state.uploaded_file_name = current_file_name
-                 st.session_state.column_config = None
                  st.session_state.analyzer = None
                  st.session_state.format_info = None
-                 st.session_state.detected_cols = None
 
-             # Handle file content loading with /tmp approach for HF Spaces compatibility
              try:
-                 # Validate file size first
-                 if not FileUploadHandler.validate_file_size(uploaded_file, max_size_mb=300):
                      return
 
-                 # Save to temp and read content
-                 temp_path = FileUploadHandler.save_to_temp(uploaded_file, prefix="freq")
-                 if temp_path:
-                     st.session_state.uploaded_file_content = FileUploadHandler.read_from_temp(temp_path)
-                     st.session_state.temp_file_path = temp_path
-                     st.success(f"✅ File '{current_file_name}' ({len(st.session_state.uploaded_file_content):,} bytes) uploaded successfully")
                  else:
-                     st.error("Failed to save uploaded file. Please try again.")
                      return
              except Exception as e:
                  st.error(f"❌ Failed to read uploaded file: {str(e)}")
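The deleted hunk above delegated the 300 MB limit to `FileUploadHandler.validate_file_size`; the replacement code inlines the same check against `uploaded_file.size`. As a pure function the gate is just:

```python
MAX_MB = 300  # matches the FrequencyAnalyzer file_size_limit_mb used in this commit

def within_size_limit(size_bytes: int, max_mb: int = MAX_MB) -> bool:
    """Mirror the size gate used above: reject anything over max_mb megabytes."""
    return size_bytes <= max_mb * 1024 * 1024

ok = within_size_limit(10 * 1024 * 1024)        # 10 MB upload
too_big = within_size_limit(301 * 1024 * 1024)  # just over the limit
```

Keeping the check before any read also protects the memory-based handler, since an oversized upload is rejected before its bytes are ever loaded.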
@@ -92,50 +86,48 @@
          # Initialize analyzer and process file (only if needed)
          if st.session_state.analyzer is None or st.session_state.format_info is None:
              st.session_state.analyzer = FrequencyAnalyzer(file_size_limit_mb=300)
-             # Use the content we already read from temp file
              st.session_state.format_info = st.session_state.analyzer.detect_file_format(st.session_state.uploaded_file_content)
 
          # Show format detection results
          st.success(f"✅ File format detected: {st.session_state.format_info['separator']}-separated, "
-                    f"{'with' if st.session_state.format_info['has_header'] else 'without'} header, "
-                    f"~{st.session_state.format_info['estimated_columns']} columns")
-
-         # Prepare data for column detection (use already loaded content)
-         content = st.session_state.uploaded_file_content
-         if isinstance(content, bytes):
-             content = content.decode('utf-8')
-
-         # Read data for preview and column detection
-         df_preview = pd.read_csv(StringIO(content),
-                                  sep=st.session_state.format_info['separator'],
-                                  header=0 if st.session_state.format_info['has_header'] else None,
-                                  nrows=100)
-
-         # Detect available columns
-         st.session_state.detected_cols = st.session_state.analyzer.detect_columns(df_preview)
 
-         # Show data preview
-         FrequencyHandlers.render_data_preview(df_preview, st.session_state.detected_cols)
-
-         # ALWAYS show column selection if we have detected columns (persistent interface)
-         if st.session_state.detected_cols is not None:
-             with st.expander("🎯 Column Selection", expanded=True):
-                 column_config = FrequencyHandlers.render_persistent_column_selection(
-                     st.session_state.detected_cols,
-                     st.session_state.format_info,
-                     st.session_state.column_config
-                 )
-
-                 # Check if column configuration changed
-                 if column_config != st.session_state.column_config:
-                     st.session_state.column_config = column_config
-                     # Reload data with new configuration
-                     df = st.session_state.analyzer.load_frequency_data(st.session_state.uploaded_file_content, column_config)
-                     st.session_state.loaded_data = df
-                     st.rerun()
-
-         # ALWAYS show visualization controls if we have a column config
-         if st.session_state.column_config is not None:
              viz_config = FrequencyHandlers.render_enhanced_visualization_controls(st.session_state.analyzer, st.session_state.column_config)
 
              if viz_config:
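The deleted preview flow above leans on `FrequencyAnalyzer.detect_columns`, which is not shown in this diff. A rough stdlib-only analogue — split a header into word-like and frequency-like columns by whether all sampled values are numeric — could look like:

```python
def detect_columns(header, rows):
    """Rough analogue of detect_columns(): numeric columns are frequency
    candidates, the rest are word candidates (illustrative sketch only)."""
    detected = {"word_columns": [], "frequency_columns": []}
    for i, name in enumerate(header):
        values = [row[i] for row in rows]
        # allow one decimal point so "3.5" still counts as numeric
        if all(v.replace(".", "", 1).isdigit() for v in values):
            detected["frequency_columns"].append(name)
        else:
            detected["word_columns"].append(name)
    return detected

cols = detect_columns(["lemma", "freq"], [["the", "28891"], ["of", "1041"]])
```

The real implementation presumably also uses pandas dtypes, as the selectbox fallback later in this diff does.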
@@ -158,45 +150,18 @@
          else:
              with st.expander("Error Details"):
                  st.code(str(e))
-                 st.write("**Debug Information:**")
-                 st.write(f"- File size: {len(st.session_state.uploaded_file_content) if st.session_state.uploaded_file_content else 'Unknown'} bytes")
-                 st.write(f"- Session state keys: {list(st.session_state.keys())}")
-
-             st.info("Please ensure your file is a valid TSV/CSV with appropriate columns.")
-
-         elif st.session_state.column_config is not None and st.session_state.uploaded_file_content is not None:
-             # Show persistent interface even when no file is currently selected (using cached data)
-             with st.expander("🎯 Column Selection", expanded=False):
-                 column_config = FrequencyHandlers.render_persistent_column_selection(
-                     st.session_state.detected_cols,
-                     st.session_state.format_info,
-                     st.session_state.column_config
-                 )
-
-                 # Check if column configuration changed
-                 if column_config != st.session_state.column_config:
-                     st.session_state.column_config = column_config
-                     # Reload data with new configuration
-                     df = st.session_state.analyzer.load_frequency_data(st.session_state.uploaded_file_content, column_config)
-                     st.session_state.loaded_data = df
-                     st.rerun()
-
-             viz_config = FrequencyHandlers.render_enhanced_visualization_controls(st.session_state.analyzer, st.session_state.column_config)
 
-             if viz_config:
-                 # Generate analysis
-                 FrequencyHandlers.render_enhanced_rank_based_analysis(st.session_state.analyzer, viz_config)
-
-         # Cleanup old temp files periodically
-         try:
-             FileUploadHandler.cleanup_old_temp_files(max_age_hours=1)
-         except:
-             pass
 
      @staticmethod
-     def render_file_upload():
          """
-         Render enhanced file upload interface with sample files fallback.
 
          Returns:
              File-like object or None
@@ -297,437 +262,433 @@ and\t28891\t3"""
297
  df: Preview DataFrame
298
  detected_cols: Detected column categorization
299
  """
300
- st.subheader("📊 Data Preview")
301
-
302
- # Basic metrics
303
- col1, col2, col3 = st.columns(3)
304
- with col1:
305
- st.metric("Total Rows", len(df))
306
- with col2:
307
- st.metric("Total Columns", len(df.columns))
308
- with col3:
309
- word_cols = len(detected_cols.get('word_columns', []))
310
- freq_cols = len(detected_cols.get('frequency_columns', []))
311
- st.metric("Detected", f"{word_cols} word, {freq_cols} freq")
312
-
313
- # Show sample data
314
- st.write("**First 5 rows:**")
315
- st.dataframe(df.head(), use_container_width=True)
316
 
317
- # Show detected column categories
318
- with st.expander("🔍 Column Detection Results", expanded=True):
319
- col1, col2 = st.columns(2)
320
 
321
  with col1:
322
- st.write("**Word Columns (text data):**")
323
- word_cols = detected_cols.get('word_columns', [])
324
- if word_cols:
325
- for col in word_cols:
326
- st.write(f"- `{col}` ({df[col].dtype})")
327
- else:
328
- st.write("None detected")
329
-
330
- st.write("**POS Columns:**")
331
- pos_cols = detected_cols.get('pos_columns', [])
332
- if pos_cols:
333
- for col in pos_cols:
334
- st.write(f"- `{col}` ({df[col].dtype})")
335
- else:
336
- st.write("None detected")
337
 
338
  with col2:
339
- st.write("**Frequency Columns (numeric data):**")
340
- freq_cols = detected_cols.get('frequency_columns', [])
341
- if freq_cols:
342
- for col in freq_cols:
343
- sample_vals = df[col].dropna().head(3).tolist()
344
- st.write(f"- `{col}` ({df[col].dtype}) - e.g., {sample_vals}")
345
- else:
346
- st.write("None detected")
347
-
348
  st.write("**Other Columns:**")
349
- other_cols = detected_cols.get('other_columns', [])
350
- if other_cols:
351
- for col in other_cols[:5]: # Show max 5
352
- st.write(f"- `{col}` ({df[col].dtype})")
353
- if len(other_cols) > 5:
354
- st.write(f"... and {len(other_cols) - 5} more")
355
- else:
356
- st.write("None")
357
 
358
  @staticmethod
359
- def render_column_selection_simplified(detected_cols: Dict[str, List[str]], format_info: Dict) -> Optional[Dict[str, str]]:
360
  """
361
- Render simplified column selection interface without multi-frequency complexity.
362
 
363
  Args:
364
  detected_cols: Detected column categorization
365
- format_info: File format information
366
 
367
  Returns:
368
- Column configuration dict or None
369
  """
370
- st.subheader("🎯 Column Mapping")
371
- st.write("Select which columns to use for your frequency analysis:")
372
-
373
- word_cols = detected_cols.get('word_columns', [])
374
- freq_cols = detected_cols.get('frequency_columns', [])
375
- pos_cols = detected_cols.get('pos_columns', [])
376
-
377
- if not word_cols or not freq_cols:
378
- st.error("❌ Required columns not detected. Please ensure your file has:")
379
- st.write("- At least one text column (for words)")
380
- st.write("- At least one numeric column (for frequencies)")
381
- return None
382
 
383
  col1, col2 = st.columns(2)
384
 
385
  with col1:
386
- # Word column selection
387
- word_column = st.selectbox(
388
- "Word Column",
389
  options=word_cols,
390
- index=0,
391
- help="Column containing word forms or lemmas"
392
  )
393
-
394
- # POS column selection (optional)
395
- pos_column = None
396
- if pos_cols:
397
- use_pos = st.checkbox("Include POS column", value=False)
398
- if use_pos:
399
- pos_column = st.selectbox(
400
- "POS Column",
401
- options=pos_cols,
402
- index=0,
403
- help="Column containing part-of-speech tags (optional)"
404
- )
405
 
406
  with col2:
407
- # Frequency column selection
408
- frequency_column = st.selectbox(
409
  "Frequency Column",
410
  options=freq_cols,
411
- index=0,
412
- help="Column containing frequency values for analysis"
413
  )
414
 
415
- # Confirm button
416
- if st.button("🚀 Start Analysis", type="primary"):
417
- config = {
418
- 'word_column': word_column,
419
- 'frequency_column': frequency_column,
420
- 'separator': format_info['separator'],
421
- 'has_header': format_info['has_header']
422
- }
423
 
424
- if pos_column:
425
- config['pos_column'] = pos_column
426
 
427
- return config
428
 
429
  return None
430
 
431
  @staticmethod
432
- def render_visualization_controls_simplified(analyzer: FrequencyAnalyzer, column_config: Dict) -> Optional[Dict]:
433
  """
434
- Legacy method - redirects to enhanced controls for backward compatibility.
435
- """
436
- return FrequencyHandlers.render_enhanced_visualization_controls(analyzer, column_config)
437
-
438
- @staticmethod
439
- def render_rank_based_analysis_simplified(analyzer: FrequencyAnalyzer, viz_config: Dict):
440
- """
441
- Legacy method - redirects to enhanced analysis for backward compatibility.
442
- """
443
- return FrequencyHandlers.render_enhanced_rank_based_analysis(analyzer, viz_config)
444
-
445
- @staticmethod
446
- def render_persistent_column_selection(detected_cols: Dict[str, List[str]],
447
- format_info: Dict,
448
- current_config: Optional[Dict] = None) -> Dict[str, str]:
449
- """
450
- Render persistent column selection interface that doesn't disappear.
451
 
452
  Args:
453
- detected_cols: Detected column categorization
454
- format_info: File format information
455
- current_config: Current column configuration (for preserving selections)
456
 
457
  Returns:
458
- Column configuration dict
459
  """
460
- st.write("Select which columns to use for your frequency analysis:")
461
-
462
- word_cols = detected_cols.get('word_columns', [])
463
- freq_cols = detected_cols.get('frequency_columns', [])
464
- pos_cols = detected_cols.get('pos_columns', [])
465
-
466
- # Determine default selections
467
- default_word_idx = 0
468
- default_freq_idx = 0
469
- default_use_pos = False
470
- default_pos_idx = 0
471
-
472
- if current_config:
473
- # Preserve current selections
474
- if current_config['word_column'] in word_cols:
475
- default_word_idx = word_cols.index(current_config['word_column'])
476
- if current_config['frequency_column'] in freq_cols:
477
- default_freq_idx = freq_cols.index(current_config['frequency_column'])
478
- if 'pos_column' in current_config and current_config['pos_column'] in pos_cols:
479
- default_use_pos = True
480
- default_pos_idx = pos_cols.index(current_config['pos_column'])
481
 
482
- col1, col2 = st.columns(2)
 
484
  with col1:
485
- word_column = st.selectbox(
486
- "Word Column",
487
- options=word_cols,
488
- index=default_word_idx,
489
- help="Column containing word forms or lemmas",
490
- key="persistent_word_col"
491
  )
492
-
493
- # POS column selection (optional)
494
- pos_column = None
495
- if pos_cols:
496
- use_pos = st.checkbox("Include POS column", value=default_use_pos, key="persistent_use_pos")
497
- if use_pos:
498
- pos_column = st.selectbox(
499
- "POS Column",
500
- options=pos_cols,
501
- index=default_pos_idx,
502
- help="Column containing part-of-speech tags (optional)",
503
- key="persistent_pos_col"
504
- )
505
 
506
  with col2:
507
- frequency_column = st.selectbox(
508
- "Frequency Column",
509
- options=freq_cols,
510
- index=default_freq_idx,
511
- help="Column containing frequency values for analysis",
512
- key="persistent_freq_col"
513
- )
514
 
515
- # Show quick info about selected columns
516
- st.write("**Selected Configuration:**")
517
- st.write(f"• Words: `{word_column}`")
518
- st.write(f"• Frequencies: `{frequency_column}`")
519
- if pos_column:
520
- st.write(f"• POS: `{pos_column}`")
521
-
522
- # Always return configuration (no button needed)
523
- config = {
524
- 'word_column': word_column,
525
- 'frequency_column': frequency_column,
526
- 'separator': format_info['separator'],
527
- 'has_header': format_info['has_header']
528
- }
529
 
530
- if pos_column:
531
- config['pos_column'] = pos_column
 
 
 
 
532
 
533
- return config
534
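The deleted `render_persistent_column_selection` above keeps the user's earlier choices by looking the stored column name up in the freshly detected options, falling back to the first entry. That logic reduces to a small helper (illustrative name):

```python
def default_index(options, previous_choice):
    """Keep the user's previous column choice when it is still available,
    as the persistent selector above did; fall back to the first option."""
    if previous_choice in options:
        return options.index(previous_choice)
    return 0

idx = default_index(["word", "lemma"], "lemma")       # previous choice survives
fallback = default_index(["word", "lemma"], "token")  # gone -> first option
```

This is what makes the selection "persistent" across reruns: the widget's `index` is recomputed from the cached config rather than reset to zero.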
 
535
  @staticmethod
536
- def render_enhanced_visualization_controls(analyzer: FrequencyAnalyzer, column_config: Dict) -> Optional[Dict]:
537
  """
538
- Render enhanced visualization controls with max words limit.
539
 
540
  Args:
541
  analyzer: FrequencyAnalyzer instance with loaded data
542
- column_config: Column configuration from user selection
543
-
544
- Returns:
545
- Dict with visualization configuration or None
546
  """
547
- st.subheader("🎛️ Enhanced Visualization Controls")
 
 
 
 
 
 
 
 
548
 
549
- # Get the frequency column
550
- frequency_column = column_config['frequency_column']
551
552
  col1, col2, col3 = st.columns(3)
553
 
554
  with col1:
555
- # Bin size controls
556
- bin_size = st.slider(
557
- "Bin Size (words per group)",
558
- min_value=100,
559
- max_value=2000,
560
- value=500,
561
- step=100,
562
- help="Number of words to group together for rank-based analysis"
563
- )
564
 
565
  with col2:
566
- # Log transformation option
567
- log_transform = st.checkbox(
568
- "Apply log₁₀ transformation",
569
- value=False,
570
- help="Transform frequency values using log₁₀ for better visualization"
571
- )
 
 
572
 
573
  with col3:
574
- # Max words control
575
- max_words = st.number_input(
576
- "Max words to analyze",
577
- min_value=1000,
578
- max_value=200000,
579
- value=None,
580
- step=1000,
581
- help="Limit analysis to top N most frequent words (leave empty for no limit)",
582
- key="max_words_input"
583
  )
584
-
585
- # Quick preset buttons
586
- st.write("**Quick Presets:**")
587
- preset_cols = st.columns(4)
588
- if preset_cols[0].button("10K", key="preset_10k"):
589
- st.session_state.max_words_preset = 10000
590
- if preset_cols[1].button("25K", key="preset_25k"):
591
- st.session_state.max_words_preset = 25000
592
- if preset_cols[2].button("50K", key="preset_50k"):
593
- st.session_state.max_words_preset = 50000
594
- if preset_cols[3].button("All", key="preset_all"):
595
- st.session_state.max_words_preset = None
596
-
597
- # Use preset value if set
598
- if 'max_words_preset' in st.session_state:
599
- max_words = st.session_state.max_words_preset
600
- del st.session_state.max_words_preset
601
 
602
- # Generate visualization button
603
- if st.button("📊 Generate Enhanced Visualization", type="primary", key="generate_viz"):
604
- return {
605
- 'frequency_column': frequency_column,
606
- 'bin_size': bin_size,
607
- 'log_transform': log_transform,
608
- 'max_words_to_retain': max_words
609
- }
610
 
611
- return None
612
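The controls above (bin size, log₁₀ transform, max words) feed `create_rank_based_visualization_flexible`, whose implementation is outside this diff. Its core grouping step — average frequencies over fixed-size rank bins — can be sketched as:

```python
import math

def rank_bins(frequencies, bin_size, log_transform=False):
    """Group rank-sorted frequencies into fixed-size bins and average each bin,
    the essence of the rank-based visualization above (simplified sketch)."""
    ranked = sorted(frequencies, reverse=True)   # rank 1 = most frequent
    averages = []
    for start in range(0, len(ranked), bin_size):
        group = ranked[start:start + bin_size]
        avg = sum(group) / len(group)
        averages.append(math.log10(avg) if log_transform else avg)
    return averages

avgs = rank_bins([100, 10, 1, 1000], bin_size=2)
```

A `max_words_to_retain` limit would simply truncate `ranked` before binning, which is why capping the word count also caps memory and plot size.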
 
613
  @staticmethod
614
- def render_enhanced_rank_based_analysis(analyzer: FrequencyAnalyzer, viz_config: Dict):
615
- """
616
- Render enhanced rank-based analysis with improved sample words display.
617
 
618
- Args:
619
- analyzer: FrequencyAnalyzer instance with loaded data
620
- viz_config: Visualization configuration
621
- """
622
- st.subheader("📊 Enhanced Rank-Based Frequency Analysis")
623
 
624
- frequency_column = viz_config['frequency_column']
625
- bin_size = viz_config['bin_size']
626
- log_transform = viz_config['log_transform']
627
- max_words_to_retain = viz_config.get('max_words_to_retain')
 
 
628
 
629
- try:
630
- # Calculate statistics
631
- stats = analyzer.calculate_statistics(frequency_column)
632
-
633
- # Display basic statistics with word limit info
634
- col1, col2, col3, col4 = st.columns(4)
635
- with col1:
636
- words_analyzed = max_words_to_retain if max_words_to_retain and max_words_to_retain < stats['count'] else stats['count']
637
- st.metric("Words Analyzed", f"{words_analyzed:,}")
638
- with col2:
639
- st.metric("Mean Frequency", f"{stats['mean']:.2f}")
640
- with col3:
641
- st.metric("Median Frequency", f"{stats['median']:.2f}")
642
- with col4:
643
- st.metric("Std Deviation", f"{stats['std']:.2f}")
644
-
645
- # Show word limit info if applied
646
- if max_words_to_retain and max_words_to_retain < stats['count']:
647
- st.info(f"📊 Analysis limited to top {max_words_to_retain:,} most frequent words (out of {stats['count']:,} total)")
648
-
649
- # Create rank-based visualization with enhanced parameters
650
- result = analyzer.create_rank_based_visualization_flexible(
651
- column=frequency_column,
652
- bin_size=bin_size,
653
- log_transform=log_transform,
654
- max_words_to_retain=max_words_to_retain
655
- )
656
-
657
- # Create the main visualization
658
- fig = go.Figure()
659
-
660
- fig.add_trace(go.Bar(
661
- x=result['group_centers'],
662
- y=result['avg_frequencies'],
663
- name=f"Avg {frequency_column}",
664
- marker_color='steelblue',
665
- hovertemplate=(
666
- f"<b>Group %{{x}}</b><br>"
667
- f"Avg {'Log₁₀ ' if log_transform else ''}{frequency_column}: %{{y:.3f}}<br>"
668
- "<extra></extra>"
669
- )
670
- ))
671
-
672
- fig.update_layout(
673
- title=result.get('title_suffix', f"Enhanced Rank-Based Analysis - {frequency_column}"),
674
- xaxis_title=result.get('x_label', f"Rank Groups (bin size: {bin_size})"),
675
- yaxis_title=result.get('y_label', f"{'Log₁₀ ' if log_transform else ''}Average {frequency_column}"),
676
- showlegend=False,
677
- height=500
678
- )
679
-
680
- st.plotly_chart(fig, use_container_width=True)
681
-
682
- # Enhanced sample words display (up to 20 bins with 5 random samples each)
683
- st.write("### 🎯 Sample Words by Rank Group (5 Random Samples)")
684
-
685
- sample_words = result.get('sample_words', {})
686
- if sample_words:
687
- # Display up to 20 groups in a more organized layout
688
- num_groups = min(20, len(sample_words))
689
-
690
- if num_groups > 0:
691
- st.write(f"Showing sample words from top {num_groups} rank groups:")
692
-
693
- # Display in rows of 4 groups each
694
- for row_start in range(0, num_groups, 4):
695
- cols = st.columns(4)
696
- for col_idx in range(4):
697
- group_idx = row_start + col_idx
698
- if group_idx < num_groups and group_idx in sample_words:
699
- with cols[col_idx]:
700
- group_label = result['group_labels'][group_idx]
701
- words = sample_words[group_idx]
702
-
703
- st.write(f"**Group {group_label}:**")
704
- word_list = [w['word'] for w in words]
705
- # Display as bullet points for better readability
706
- for word in word_list:
707
- st.write(f"• {word}")
708
-
709
- # Add spacing between groups
710
- st.write("")
711
- else:
712
- st.write("No sample words available")
713
-
714
- # Show enhanced group statistics
715
- with st.expander("📈 Detailed Group Statistics"):
716
- group_stats = result.get('group_stats')
717
- if group_stats is not None and not group_stats.empty:
718
- display_stats = group_stats.copy()
719
-
720
- # Format numeric columns
721
- numeric_cols = display_stats.select_dtypes(include=[np.number]).columns
722
- for col in numeric_cols:
723
- if 'count' not in col.lower():
724
- display_stats[col] = display_stats[col].round(2)
725
-
726
- st.dataframe(display_stats, use_container_width=True)
727
- else:
728
- st.write("No detailed statistics available")
729
 
730
- except Exception as e:
731
- st.error(f"Error in enhanced rank-based analysis: {str(e)}")
732
- with st.expander("Error Details"):
733
- st.code(str(e))
  This module provides Streamlit interface handlers for word frequency visualization,
  including file upload, visualization controls, and results display.
  Supports flexible column mapping for diverse frequency data formats.
+
+ Updated to use MemoryFileHandler to avoid 403 errors on restricted environments.
  """
 
  import streamlit as st
 
  import sys
  import os
  from pathlib import Path
+ from io import StringIO, BytesIO
 
  # Add parent directory to path for imports
  sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(__file__))))
 
  from text_analyzer.frequency_analyzer import FrequencyAnalyzer
+ from web_app.utils import MemoryFileHandler
 
 
  class FrequencyHandlers:
      """
 
      @staticmethod
+     def render_frequency_visualization_interface():
          """
+         Main interface for frequency visualization analysis.
+         Manages state across multiple interactions.
          """
+         st.subheader("📊 Word Frequency Visualization")
 
+         # Initialize session state
          if 'analyzer' not in st.session_state:
              st.session_state.analyzer = None
          if 'format_info' not in st.session_state:
              st.session_state.format_info = None
          if 'uploaded_file_content' not in st.session_state:
              st.session_state.uploaded_file_content = None
+         if 'column_config' not in st.session_state:
+             st.session_state.column_config = None
 
+         # File selection
+         uploaded_file = FrequencyHandlers.render_file_selection_section()
 
+         if uploaded_file:
+             # Track file changes
+             current_file_name = uploaded_file.name if hasattr(uploaded_file, 'name') else 'sample_file'
 
+             if st.session_state.get('last_file_name') != current_file_name:
+                 st.session_state.last_file_name = current_file_name
                  st.session_state.analyzer = None
                  st.session_state.format_info = None
 
              try:
+                 # Check file size
+                 if hasattr(uploaded_file, 'size') and uploaded_file.size > 300 * 1024 * 1024:
+                     st.error(f"File too large ({uploaded_file.size / 1024 / 1024:.1f} MB). Maximum allowed: 300MB")
                      return
 
+                 # Process file using memory-based approach
+                 content = MemoryFileHandler.process_uploaded_file(uploaded_file, as_text=False)
+                 if content:
+                     st.session_state.uploaded_file_content = content
+                     st.success(f"✅ File '{current_file_name}' ({len(content):,} bytes) uploaded successfully")
                  else:
+                     st.error("Failed to read uploaded file. Please try again.")
                      return
              except Exception as e:
                  st.error(f"❌ Failed to read uploaded file: {str(e)}")
 
          # Initialize analyzer and process file (only if needed)
          if st.session_state.analyzer is None or st.session_state.format_info is None:
              st.session_state.analyzer = FrequencyAnalyzer(file_size_limit_mb=300)
+             # Use the content we already read
              st.session_state.format_info = st.session_state.analyzer.detect_file_format(st.session_state.uploaded_file_content)
 
          # Show format detection results
          st.success(f"✅ File format detected: {st.session_state.format_info['separator']}-separated, "
+                    f"{st.session_state.format_info['line_count']} lines")
+
+         # Parse the data if not already done
+         if st.session_state.analyzer.df is None:
+             with st.spinner("Parsing frequency data..."):
+                 try:
+                     # Create file-like object from content
+                     file_obj = BytesIO(st.session_state.uploaded_file_content)
+                     st.session_state.analyzer.read_frequency_data_from_content(file_obj)
+
+                     if st.session_state.analyzer.df is None or st.session_state.analyzer.df.empty:
+                         st.error("No data could be parsed from the file. Please check the file format.")
+                         return
+
+                 except Exception as e:
+                     st.error(f"Error parsing file: {str(e)}")
+                     return
+
+         # Display results
+         with st.expander("📋 Data Preview", expanded=True):
+             FrequencyHandlers.render_data_preview(
+                 st.session_state.analyzer.df.head(20),
+                 st.session_state.analyzer.detected_columns
+             )
+
+         # Column configuration - always allow user to change
+         st.session_state.column_config = FrequencyHandlers.render_enhanced_column_configuration(
+             st.session_state.analyzer.detected_columns,
+             st.session_state.analyzer.df
+         )
+
+         if st.session_state.column_config:
+             # Set the analyzer's columns based on user selection
+             st.session_state.analyzer.word_column = st.session_state.column_config['word_column']
+             st.session_state.analyzer.frequency_column = st.session_state.column_config['frequency_column']
 
+             # Visualization controls
              viz_config = FrequencyHandlers.render_enhanced_visualization_controls(st.session_state.analyzer, st.session_state.column_config)
 
              if viz_config:
 
          else:
              with st.expander("Error Details"):
                  st.code(str(e))
 
+         # Cleanup session for debugging
+         if st.sidebar.button("🔄 Reset Analysis", help="Clear all cached data and start fresh"):
+             for key in ['analyzer', 'format_info', 'uploaded_file_content', 'column_config', 'last_file_name']:
+                 if key in st.session_state:
+                     del st.session_state[key]
+             st.experimental_rerun()
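The reset button above clears cached keys from `st.session_state`, which behaves like a mutable mapping. In plain-dict form (a sketch; `state` stands in for the session object):

```python
def reset_analysis_state(session_state: dict) -> None:
    """The reset button above in plain-dict form: st.session_state behaves
    like a mutable mapping, so stale keys can simply be deleted."""
    for key in ['analyzer', 'format_info', 'uploaded_file_content',
                'column_config', 'last_file_name']:
        session_state.pop(key, None)   # pop() tolerates keys that never existed

state = {'analyzer': object(), 'column_config': {'word_column': 'lemma'}, 'theme': 'dark'}
reset_analysis_state(state)
```

Note that newer Streamlit releases deprecate `st.experimental_rerun()` in favor of `st.rerun()`; the behavior of the commit is unchanged either way.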
  @staticmethod
162
+ def render_file_selection_section():
163
  """
164
+ Render file selection section.
165
 
166
  Returns:
167
  File-like object or None
 
262
  df: Preview DataFrame
263
  detected_cols: Detected column categorization
264
  """
265
+ st.write("**File Preview:**")
266
+ st.dataframe(
267
+ df,
268
+ use_container_width=True,
269
+ hide_index=True,
270
+ height=400
271
+ )
272
+ st.caption(f"Showing first {len(df)} of total entries")
 
 
 
 
 
 
 
 
273
 
274
+ # Show detected columns
275
+ with st.expander("🔍 Detected Columns", expanded=False):
276
+ col1, col2, col3 = st.columns(3)
277
 
278
  with col1:
279
+ st.write("**Word Columns:**")
280
+ for col in detected_cols.get('word_columns', []):
281
+ st.write(f"• {col}")
 
 
 
 
 
 
 
 
 
 
 
 
282
 
283
  with col2:
284
+ st.write("**Frequency Columns:**")
285
+ for col in detected_cols.get('frequency_columns', []):
286
+ st.write(f"• {col}")
287
+
288
+ with col3:
 
 
 
 
289
  st.write("**Other Columns:**")
290
+ for col in detected_cols.get('other_columns', []):
291
+ st.write(f"• {col}")
 
 
 
 
 
 
292
 
293
  @staticmethod
294
+ def render_enhanced_column_configuration(detected_cols: Dict[str, List[str]], df: pd.DataFrame):
295
  """
296
+ Render enhanced column configuration with smart defaults.
297
 
298
  Args:
299
  detected_cols: Detected column categorization
300
+ df: The full DataFrame
301
 
302
  Returns:
303
+ Dictionary with column configuration or None
304
  """
305
+ st.subheader("⚙️ Column Configuration")
 
 
 
 
 
 
 
 
 
 
 
306
 
307
  col1, col2 = st.columns(2)
308
 
309
  with col1:
310
+ # Word column selection with smart default
311
+ word_cols = detected_cols.get('word_columns', [])
312
+ if not word_cols:
313
+ word_cols = list(df.columns)
314
+
315
+ default_word = 0
316
+ # Prioritize columns with 'word', 'token', 'lemma', etc.
317
+ for i, col in enumerate(word_cols):
318
+ if any(term in col.lower() for term in ['word', 'token', 'lemma', 'type']):
319
+ default_word = i
320
+ break
321
+
322
+ word_col = st.selectbox(
323
+ "Word/Token Column",
324
  options=word_cols,
325
+ index=default_word,
326
+ help="Select the column containing words or tokens"
327
   )
328
 
329
  with col2:
330
+ # Frequency column selection with smart default
331
+ freq_cols = detected_cols.get('frequency_columns', [])
332
+ if not freq_cols:
333
+ # Try to identify numeric columns
334
+ freq_cols = [col for col in df.columns if pd.api.types.is_numeric_dtype(df[col])]
335
+ if not freq_cols:
336
+ freq_cols = list(df.columns)
337
+
338
+ default_freq = 0
339
+ # Prioritize columns with 'freq', 'count', etc.
340
+ for i, col in enumerate(freq_cols):
341
+ if any(term in col.lower() for term in ['freq', 'count', 'occurrences']):
342
+ default_freq = i
343
+ break
344
+
345
+ freq_col = st.selectbox(
346
  "Frequency Column",
347
  options=freq_cols,
348
+ index=default_freq,
349
+ help="Select the column containing frequency counts"
350
  )
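The keyword-priority defaulting used for both selectboxes above can be checked in isolation; a minimal sketch (`smart_default_index` is a hypothetical helper name, not part of this module):

```python
# Pick the index of the first column whose name contains a priority
# keyword; fall back to index 0 when nothing matches (same logic as
# the loops that set default_word and default_freq above).
def smart_default_index(columns, keywords):
    for i, col in enumerate(columns):
        if any(term in col.lower() for term in keywords):
            return i
    return 0

print(smart_default_index(["rank", "lemma", "pos"], ["word", "token", "lemma", "type"]))  # → 1
print(smart_default_index(["a", "b"], ["freq", "count"]))  # → 0
```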
351
 
352
+ if word_col and freq_col:
353
+ # Validate configuration
354
+ if word_col == freq_col:
355
+ st.error("Word and frequency columns cannot be the same!")
356
+ return None
 
 
 
357
 
358
+ # Show sample data with selected columns
359
+ st.write("**Preview with selected columns:**")
360
+ preview_df = df[[word_col, freq_col]].head(5)
361
+ st.dataframe(preview_df, use_container_width=True, hide_index=True)
362
 
363
+ return {
364
+ 'word_column': word_col,
365
+ 'frequency_column': freq_col
366
+ }
367
 
368
  return None
369
 
370
  @staticmethod
371
+ def render_enhanced_visualization_controls(analyzer: FrequencyAnalyzer, column_config: Dict[str, str]):
372
  """
373
+ Render enhanced visualization controls.
374
 
375
  Args:
376
+ analyzer: FrequencyAnalyzer instance
377
+ column_config: Column configuration
 
378
 
379
  Returns:
380
+ Dictionary with visualization configuration or None
381
  """
382
+ st.subheader("📊 Visualization Settings")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
383
 
384
+ # Get data statistics
385
+ total_words = len(analyzer.df)
386
+ max_freq = analyzer.df[column_config['frequency_column']].max()
387
+ min_freq = analyzer.df[column_config['frequency_column']].min()
388
+
389
+ col1, col2, col3 = st.columns(3)
390
 
391
  with col1:
392
+ chart_type = st.selectbox(
393
+ "Chart Type",
394
+ ["Bar Chart", "Line Chart", "Area Chart", "Scatter Plot"],
395
+ help="Select visualization type"
 
 
396
   )
397
 
398
  with col2:
399
+ # Dynamic range based on data
400
+ max_words = min(total_words, 1000)
401
+ default_n = min(50, max_words)
 
 
 
 
402
 
403
+ top_n = st.slider(
404
+ "Number of Words",
405
+ min_value=10,
406
+ max_value=max_words,
407
+ value=default_n,
408
+ step=10,
409
+ help=f"Display top N words (total: {total_words:,})"
410
+ )
 
 
 
 
 
 
411
 
412
+ with col3:
413
+ scale = st.selectbox(
414
+ "Y-Axis Scale",
415
+ ["Linear", "Logarithmic"],
416
+ help="Logarithmic scale is useful for data with large frequency variations"
417
+ )
418
 
419
+ # Advanced options
420
+ with st.expander("🎨 Advanced Options", expanded=False):
421
+ col1, col2 = st.columns(2)
422
+
423
+ with col1:
424
+ color_scheme = st.selectbox(
425
+ "Color Scheme",
426
+ ["Viridis", "Blues", "Reds", "Turbo", "Rainbow"],
427
+ help="Select color scheme for visualization"
428
+ )
429
+
430
+ show_values = st.checkbox(
431
+ "Show Values on Chart",
432
+ value=False,
433
+ help="Display frequency values on the chart"
434
+ )
435
+
436
+ with col2:
437
+ orientation = st.radio(
438
+ "Orientation",
439
+ ["Vertical", "Horizontal"],
440
+ help="Chart orientation"
441
+ )
442
+
443
+ show_grid = st.checkbox(
444
+ "Show Grid",
445
+ value=True,
446
+ help="Display grid lines"
447
+ )
448
+
449
+ # Summary statistics
450
+ st.write("**Data Statistics:**")
451
+ stat_col1, stat_col2, stat_col3, stat_col4 = st.columns(4)
452
+
453
+ with stat_col1:
454
+ st.metric("Total Words", f"{total_words:,}")
455
+ with stat_col2:
456
+ st.metric("Max Frequency", f"{max_freq:,}")
457
+ with stat_col3:
458
+ st.metric("Min Frequency", f"{min_freq:,}")
459
+ with stat_col4:
460
+ mean_freq = analyzer.df[column_config['frequency_column']].mean()
461
+ st.metric("Mean Frequency", f"{mean_freq:,.1f}")
462
+
463
+ return {
464
+ 'chart_type': chart_type,
465
+ 'top_n': top_n,
466
+ 'scale': scale,
467
+ 'color_scheme': color_scheme.lower(),
468
+ 'show_values': show_values,
469
+ 'orientation': orientation.lower(),
470
+ 'show_grid': show_grid,
471
+ 'word_column': column_config['word_column'],
472
+ 'frequency_column': column_config['frequency_column']
473
+ }
474
 
475
  @staticmethod
476
+ def render_enhanced_rank_based_analysis(analyzer: FrequencyAnalyzer, viz_config: dict):
477
  """
478
+ Render enhanced rank-based frequency analysis.
479
 
480
  Args:
481
  analyzer: FrequencyAnalyzer instance with loaded data
482
+ viz_config: Visualization configuration
 
 
 
483
  """
484
+ st.subheader("📈 Frequency Analysis Results")
485
+
486
+ # Get top N words
487
+ top_n = viz_config['top_n']
488
+ word_col = viz_config['word_column']
489
+ freq_col = viz_config['frequency_column']
490
+
491
+ # Sort and get top N
492
+ df_sorted = analyzer.df.sort_values(by=freq_col, ascending=False).head(top_n).copy()
493
 
494
+ # Add rank column
495
+ df_sorted['rank'] = range(1, len(df_sorted) + 1)
496
+
497
+ # Create visualization
498
+ if viz_config['orientation'] == 'horizontal':
499
+ x_col, y_col = freq_col, word_col
500
+ # Reverse order for horizontal bar chart
501
+ df_sorted = df_sorted.iloc[::-1]
502
+ else:
503
+ x_col, y_col = word_col, freq_col
504
+
505
+ # Create figure based on chart type
506
+ if viz_config['chart_type'] == "Bar Chart":
507
+ fig = px.bar(
508
+ df_sorted,
509
+ x=x_col,
510
+ y=y_col,
511
+ color=freq_col,
512
+ color_continuous_scale=viz_config['color_scheme'],
513
+ title=f"Top {top_n} Most Frequent Words",
514
+ labels={freq_col: "Frequency", word_col: "Words"},
515
+ orientation='h' if viz_config['orientation'] == 'horizontal' else 'v'
516
+ )
517
+
518
+ elif viz_config['chart_type'] == "Line Chart":
519
+ fig = px.line(
520
+ df_sorted,
521
+ x=word_col,
522
+ y=freq_col,
523
+ markers=True,
524
+ title=f"Top {top_n} Most Frequent Words",
525
+ labels={freq_col: "Frequency", word_col: "Words"}
526
+ )
527
+ fig.update_traces(line_color=px.colors.qualitative.Plotly[0], line_width=3)
528
+
529
+ elif viz_config['chart_type'] == "Area Chart":
530
+ fig = px.area(
531
+ df_sorted,
532
+ x=word_col,
533
+ y=freq_col,
534
+ title=f"Top {top_n} Most Frequent Words",
535
+ labels={freq_col: "Frequency", word_col: "Words"}
536
+ )
537
 
538
+ else: # Scatter Plot
539
+ fig = px.scatter(
540
+ df_sorted,
541
+ x='rank',
542
+ y=freq_col,
543
+ text=word_col,
544
+ size=freq_col,
545
+ color=freq_col,
546
+ color_continuous_scale=viz_config['color_scheme'],
547
+ title=f"Rank-Frequency Distribution (Top {top_n})",
548
+ labels={freq_col: "Frequency", 'rank': "Rank"}
549
+ )
550
+ fig.update_traces(textposition='top center')
551
+
552
+ # Apply logarithmic scale if selected
553
+ if viz_config['scale'] == "Logarithmic":
554
+ if viz_config['orientation'] == 'horizontal':
555
+ fig.update_xaxes(type="log")
556
+ else:
557
+ fig.update_yaxes(type="log")
558
+
559
+ # Show values on chart if selected
560
+ if viz_config['show_values'] and viz_config['chart_type'] == "Bar Chart":
561
+ fig.update_traces(texttemplate='%{value:,.0f}', textposition='outside')
562
+
563
+ # Update layout
564
+ fig.update_layout(
565
+ showlegend=False,
566
+ height=600,
567
+ xaxis_tickangle=-45 if viz_config['orientation'] == 'vertical' else 0,
568
+ plot_bgcolor='white' if viz_config['show_grid'] else 'rgba(0,0,0,0)',
569
+ xaxis_showgrid=viz_config['show_grid'],
570
+ yaxis_showgrid=viz_config['show_grid']
571
+ )
572
+
573
+ # Display chart
574
+ st.plotly_chart(fig, use_container_width=True)
575
+
576
+ # Additional analyses
577
+ tab1, tab2, tab3 = st.tabs(["📊 Statistics", "📋 Data Table", "📈 Distribution Analysis"])
578
+
579
+ with tab1:
580
+ FrequencyHandlers.render_statistics_summary(df_sorted, freq_col, word_col)
581
+
582
+ with tab2:
583
+ FrequencyHandlers.render_data_table(df_sorted, word_col, freq_col)
584
+
585
+ with tab3:
586
+ FrequencyHandlers.render_distribution_analysis(analyzer, freq_col, viz_config)
587
+
588
+ @staticmethod
589
+ def render_statistics_summary(df: pd.DataFrame, freq_col: str, word_col: str):
590
+ """Render statistical summary of the frequency data."""
591
  col1, col2, col3 = st.columns(3)
592
 
593
  with col1:
594
+ st.write("**Frequency Statistics:**")
595
+ st.write(f"• Total frequency: {df[freq_col].sum():,}")
596
+ st.write(f" Mean frequency: {df[freq_col].mean():,.1f}")
597
+ st.write(f"• Median frequency: {df[freq_col].median():,.1f}")
 
 
 
 
 
598
 
599
  with col2:
600
+ st.write("**Coverage Analysis:**")
601
+ total_freq = df[freq_col].sum()
602
+ cumsum = df[freq_col].cumsum()
603
+ coverage_50 = len(cumsum[cumsum <= total_freq * 0.5])
604
+ coverage_80 = len(cumsum[cumsum <= total_freq * 0.8])
605
+ st.write(f"• Words for 50% coverage: {coverage_50}")
606
+ st.write(f"• Words for 80% coverage: {coverage_80}")
607
+ st.write(f"• Top 10 words: {(df[freq_col].head(10).sum() / total_freq * 100):.1f}%")
608
 
609
  with col3:
610
+ st.write("**Diversity Metrics:**")
611
+ st.write(f"• Unique words shown: {len(df)}")
612
+ st.write(f" Hapax legomena: {len(df[df[freq_col] == 1])}")
613
+ st.write(f"• Type-token ratio: {len(df) / df[freq_col].sum():.4f}")
614
+
615
+ @staticmethod
616
+ def render_data_table(df: pd.DataFrame, word_col: str, freq_col: str):
617
+ """Render interactive data table."""
618
+ # Add percentage column
619
+ df_display = df.copy()
620
+ df_display['percentage'] = (df_display[freq_col] / df_display[freq_col].sum() * 100).round(2)
621
+ df_display['cumulative_%'] = (df_display[freq_col].cumsum() / df_display[freq_col].sum() * 100).round(2)
622
+
623
+ # Display options
624
+ col1, col2 = st.columns([1, 3])
625
+ with col1:
626
+ show_cols = st.multiselect(
627
+ "Columns to show:",
628
+ options=df_display.columns.tolist(),
629
+ default=['rank', word_col, freq_col, 'percentage', 'cumulative_%']
630
   )
631
 
632
+ # Display table
633
+ st.dataframe(
634
+ df_display[show_cols],
635
+ use_container_width=True,
636
+ hide_index=True,
637
+ height=400
638
+ )
 
639
 
640
+ # Download button
641
+ csv = df_display[show_cols].to_csv(index=False)
642
+ st.download_button(
643
+ label="📥 Download as CSV",
644
+ data=csv,
645
+ file_name=f"frequency_analysis_top_{len(df)}.csv",
646
+ mime="text/csv"
647
+ )
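The `percentage` and `cumulative_%` columns built in `render_data_table` can be verified in isolation with the same two expressions:

```python
import pandas as pd

df = pd.DataFrame({"word": ["a", "b", "c"], "freq": [60, 30, 10]})
df["percentage"] = (df["freq"] / df["freq"].sum() * 100).round(2)
df["cumulative_%"] = (df["freq"].cumsum() / df["freq"].sum() * 100).round(2)
print(df["percentage"].tolist())    # → [60.0, 30.0, 10.0]
print(df["cumulative_%"].tolist())  # → [60.0, 90.0, 100.0]
```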
648
 
649
  @staticmethod
650
+ def render_distribution_analysis(analyzer: FrequencyAnalyzer, freq_col: str, viz_config: dict):
651
+ """Render frequency distribution analysis."""
652
+ # Zipf's law analysis
653
+ st.write("**Zipf's Law Analysis:**")
654
+
655
+ df_full = analyzer.df.sort_values(by=freq_col, ascending=False).copy()
656
+ df_full['rank'] = range(1, len(df_full) + 1)
657
+ df_full['log_rank'] = np.log10(df_full['rank'])
658
+ df_full['log_freq'] = np.log10(df_full[freq_col])
659
+
660
+ # Create Zipf plot
661
+ fig_zipf = px.scatter(
662
+ df_full.head(min(1000, len(df_full))),
663
+ x='log_rank',
664
+ y='log_freq',
665
+ title="Zipf's Law Distribution (Log-Log Plot)",
666
+ labels={'log_rank': 'log₁₀(Rank)', 'log_freq': 'log₁₀(Frequency)'},
667
+ trendline="ols"
668
+ )
669
 
670
+ fig_zipf.update_layout(height=400)
671
+ st.plotly_chart(fig_zipf, use_container_width=True)
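The OLS trendline on the log-log plot estimates the Zipf exponent; the slope can also be computed directly with `numpy.polyfit`. A sketch on idealized data (for a perfect Zipf distribution, frequency ∝ 1/rank, the slope is -1):

```python
import numpy as np

ranks = np.arange(1, 101)
freqs = 1000.0 / ranks   # ideal Zipf: frequency proportional to 1/rank

# Fit a line to the log-log data; the slope is the Zipf exponent.
slope, intercept = np.polyfit(np.log10(ranks), np.log10(freqs), 1)
print(round(slope, 2))   # → -1.0
```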
 
 
 
672
 
673
+ # Frequency bands analysis
674
+ st.write("**Frequency Bands:**")
675
+ bands = pd.cut(df_full[freq_col],
676
+ bins=[0, 1, 10, 100, 1000, 10000, float('inf')],
677
+ labels=['1', '2-10', '11-100', '101-1000', '1001-10000', '10000+'])
678
+ band_counts = bands.value_counts().sort_index()
679
 
680
+ col1, col2 = st.columns(2)
681
+ with col1:
682
+ st.write("Words per frequency band:")
683
+ for band, count in band_counts.items():
684
+ st.write(f"• {band}: {count:,} words")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
685
 
686
+ with col2:
687
+ # Pie chart of frequency bands
688
+ fig_pie = px.pie(
689
+ values=band_counts.values,
690
+ names=band_counts.index,
691
+ title="Distribution of Words by Frequency Band"
692
+ )
693
+ fig_pie.update_layout(height=300)
694
+ st.plotly_chart(fig_pie, use_container_width=True)
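The band labels come from `pd.cut` with log-spaced bin edges (right-inclusive by default); a quick check of how sample frequencies fall into the bands defined above:

```python
import pandas as pd

freqs = pd.Series([1, 7, 42, 500, 20000])
bands = pd.cut(freqs,
               bins=[0, 1, 10, 100, 1000, 10000, float("inf")],
               labels=["1", "2-10", "11-100", "101-1000", "1001-10000", "10000+"])
print(bands.tolist())  # → ['1', '2-10', '11-100', '101-1000', '10000+']
```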
web_app/handlers/frequency_handlers.py.backup_20250726_162020 ADDED
@@ -0,0 +1,733 @@
1
+ """
2
+ Frequency Analysis Handlers for Streamlit Interface
3
+
4
+ This module provides Streamlit interface handlers for word frequency visualization,
5
+ including file upload, visualization controls, and results display.
6
+ Supports flexible column mapping for diverse frequency data formats.
7
+ """
8
+
9
+ import streamlit as st
10
+ import pandas as pd
11
+ import plotly.graph_objects as go
12
+ import plotly.express as px
13
+ import numpy as np
14
+ from typing import Dict, List, Optional
15
+ import sys
16
+ import os
17
+ from pathlib import Path
18
+ from io import StringIO
19
+
20
+ # Add parent directory to path for imports
21
+ sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(__file__))))
22
+
23
+ from text_analyzer.frequency_analyzer import FrequencyAnalyzer
24
+ from web_app.utils import FileUploadHandler
25
+
26
+
27
+ class FrequencyHandlers:
28
+ """
29
+ Streamlit interface handlers for frequency analysis functionality.
30
+ """
31
+
32
+ @staticmethod
33
+ def handle_frequency_analysis():
34
+ """
35
+ Enhanced frequency analysis interface handler with persistent column selection.
36
+ """
37
+ st.markdown("Upload a frequency data file (TSV/CSV) with flexible column mapping support. "
38
+ "The system will automatically detect columns and let you choose which ones to use for analysis.")
39
+
40
+ # Initialize session state variables
41
+ if 'uploaded_file_name' not in st.session_state:
42
+ st.session_state.uploaded_file_name = None
43
+ if 'column_config' not in st.session_state:
44
+ st.session_state.column_config = None
45
+ if 'analyzer' not in st.session_state:
46
+ st.session_state.analyzer = None
47
+ if 'format_info' not in st.session_state:
48
+ st.session_state.format_info = None
49
+ if 'detected_cols' not in st.session_state:
50
+ st.session_state.detected_cols = None
51
+ if 'uploaded_file_content' not in st.session_state:
52
+ st.session_state.uploaded_file_content = None
53
+
54
+ # File upload section
55
+ uploaded_file = FrequencyHandlers.render_file_upload()
56
+
57
+ # Check if a new file was uploaded
58
+ if uploaded_file is not None:
59
+ current_file_name = uploaded_file.name
60
+
61
+ # Reset state if new file is uploaded
62
+ if st.session_state.uploaded_file_name != current_file_name:
63
+ st.session_state.uploaded_file_name = current_file_name
64
+ st.session_state.column_config = None
65
+ st.session_state.analyzer = None
66
+ st.session_state.format_info = None
67
+ st.session_state.detected_cols = None
68
+
69
+ # Handle file content loading with /tmp approach for HF Spaces compatibility
70
+ try:
71
+ # Validate file size first
72
+ if not FileUploadHandler.validate_file_size(uploaded_file, max_size_mb=300):
73
+ return
74
+
75
+ # Save to temp and read content
76
+ temp_path = FileUploadHandler.save_to_temp(uploaded_file, prefix="freq")
77
+ if temp_path:
78
+ st.session_state.uploaded_file_content = FileUploadHandler.read_from_temp(temp_path)
79
+ st.session_state.temp_file_path = temp_path
80
+ st.success(f"✅ File '{current_file_name}' ({len(st.session_state.uploaded_file_content):,} bytes) uploaded successfully")
81
+ else:
82
+ st.error("Failed to save uploaded file. Please try again.")
83
+ return
84
+ except Exception as e:
85
+ st.error(f"❌ Failed to read uploaded file: {str(e)}")
86
+ if "403" in str(e) or "Forbidden" in str(e):
87
+ st.error("**Upload Error**: File upload was blocked. This is a known issue on Hugging Face Spaces. "
88
+ "Please try using the sample files option or deploy locally.")
89
+ return
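Where `/tmp` writes are blocked (the scenario this commit's memory handler targets), the upload can be kept entirely in memory instead. A minimal sketch under that assumption — `read_upload_in_memory` is a hypothetical helper, and `uploaded_file` stands in for Streamlit's `UploadedFile`, which supports `getvalue()`:

```python
from io import BytesIO

def read_upload_in_memory(uploaded_file, max_size_mb=300):
    """Read an uploaded file's bytes without touching the filesystem."""
    content = uploaded_file.getvalue()  # UploadedFile and BytesIO both support this
    if len(content) > max_size_mb * 1024 * 1024:
        raise ValueError(f"File exceeds {max_size_mb}MB limit")
    return content

# Simulate an upload with BytesIO (mirrors the sample-file fallback below).
fake_upload = BytesIO(b"Type\tFreq\nthe\t69868\n")
print(len(read_upload_in_memory(fake_upload)))  # → 20
```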
90
+
91
+ try:
92
+ # Initialize analyzer and process file (only if needed)
93
+ if st.session_state.analyzer is None or st.session_state.format_info is None:
94
+ st.session_state.analyzer = FrequencyAnalyzer(file_size_limit_mb=300)
95
+ # Use the content we already read from temp file
96
+ st.session_state.format_info = st.session_state.analyzer.detect_file_format(st.session_state.uploaded_file_content)
97
+
98
+ # Show format detection results
99
+ st.success(f"✅ File format detected: {st.session_state.format_info['separator']}-separated, "
100
+ f"{'with' if st.session_state.format_info['has_header'] else 'without'} header, "
101
+ f"~{st.session_state.format_info['estimated_columns']} columns")
102
+
103
+ # Prepare data for column detection (use already loaded content)
104
+ content = st.session_state.uploaded_file_content
105
+ if isinstance(content, bytes):
106
+ content = content.decode('utf-8')
107
+
108
+ # Read data for preview and column detection
109
+ df_preview = pd.read_csv(StringIO(content),
110
+ sep=st.session_state.format_info['separator'],
111
+ header=0 if st.session_state.format_info['has_header'] else None,
112
+ nrows=100)
113
+
114
+ # Detect available columns
115
+ st.session_state.detected_cols = st.session_state.analyzer.detect_columns(df_preview)
116
+
117
+ # Show data preview
118
+ FrequencyHandlers.render_data_preview(df_preview, st.session_state.detected_cols)
119
+
120
+ # ALWAYS show column selection if we have detected columns (persistent interface)
121
+ if st.session_state.detected_cols is not None:
122
+ with st.expander("🎯 Column Selection", expanded=True):
123
+ column_config = FrequencyHandlers.render_persistent_column_selection(
124
+ st.session_state.detected_cols,
125
+ st.session_state.format_info,
126
+ st.session_state.column_config
127
+ )
128
+
129
+ # Check if column configuration changed
130
+ if column_config != st.session_state.column_config:
131
+ st.session_state.column_config = column_config
132
+ # Reload data with new configuration
133
+ df = st.session_state.analyzer.load_frequency_data(st.session_state.uploaded_file_content, column_config)
134
+ st.session_state.loaded_data = df
135
+ st.rerun()
136
+
137
+ # ALWAYS show visualization controls if we have a column config
138
+ if st.session_state.column_config is not None:
139
+ viz_config = FrequencyHandlers.render_enhanced_visualization_controls(st.session_state.analyzer, st.session_state.column_config)
140
+
141
+ if viz_config:
142
+ # Generate analysis
143
+ FrequencyHandlers.render_enhanced_rank_based_analysis(st.session_state.analyzer, viz_config)
144
+
145
+ except Exception as e:
146
+ st.error(f"Error processing file: {str(e)}")
147
+
148
+ # Provide specific error guidance
149
+ if "403" in str(e) or "Forbidden" in str(e):
150
+ st.error("**HTTP 403 Error**: File upload was blocked by the server.")
151
+ st.info("This is a known limitation on Hugging Face Spaces. Please use the sample files option or deploy the app locally for full functionality.")
152
+ elif "timeout" in str(e).lower():
153
+ st.error("**Timeout Error**: File processing took too long")
154
+ st.info("Try uploading a smaller file or check your internet connection")
155
+ elif "memory" in str(e).lower() or "RAM" in str(e).upper():
156
+ st.error("**Memory Error**: Not enough memory to process this file")
157
+ st.info("Try uploading a smaller file")
158
+ else:
159
+ with st.expander("Error Details"):
160
+ st.code(str(e))
161
+ st.write("**Debug Information:**")
162
+ st.write(f"- File size: {len(st.session_state.uploaded_file_content) if st.session_state.uploaded_file_content else 'Unknown'} bytes")
163
+ st.write(f"- Session state keys: {list(st.session_state.keys())}")
164
+
165
+ st.info("Please ensure your file is a valid TSV/CSV with appropriate columns.")
166
+
167
+ elif st.session_state.column_config is not None and st.session_state.uploaded_file_content is not None:
168
+ # Show persistent interface even when no file is currently selected (using cached data)
169
+ with st.expander("🎯 Column Selection", expanded=False):
170
+ column_config = FrequencyHandlers.render_persistent_column_selection(
171
+ st.session_state.detected_cols,
172
+ st.session_state.format_info,
173
+ st.session_state.column_config
174
+ )
175
+
176
+ # Check if column configuration changed
177
+ if column_config != st.session_state.column_config:
178
+ st.session_state.column_config = column_config
179
+ # Reload data with new configuration
180
+ df = st.session_state.analyzer.load_frequency_data(st.session_state.uploaded_file_content, column_config)
181
+ st.session_state.loaded_data = df
182
+ st.rerun()
183
+
184
+ viz_config = FrequencyHandlers.render_enhanced_visualization_controls(st.session_state.analyzer, st.session_state.column_config)
185
+
186
+ if viz_config:
187
+ # Generate analysis
188
+ FrequencyHandlers.render_enhanced_rank_based_analysis(st.session_state.analyzer, viz_config)
189
+
190
+ # Cleanup old temp files periodically
191
+ try:
192
+ FileUploadHandler.cleanup_old_temp_files(max_age_hours=1)
193
+ except:
194
+ pass
195
+
196
+ @staticmethod
197
+ def render_file_upload():
198
+ """
199
+ Render enhanced file upload interface with sample files fallback.
200
+
201
+ Returns:
202
+ File-like object or None
203
+ """
204
+ st.subheader("📄 Select Frequency Data")
205
+
206
+ # Data source selection
207
+ data_source = st.radio(
208
+ "Choose data source:",
209
+ ["Upload file", "Use sample files"],
210
+ help="Note: File uploads may experience issues on Hugging Face Spaces. Use sample files as a reliable alternative."
211
+ )
212
+
213
+ if data_source == "Upload file":
214
+ uploaded_file = st.file_uploader(
215
+ "Choose a frequency data file",
216
+ type=['tsv', 'csv', 'txt'],
217
+ help="Upload a TSV or CSV file with frequency data. Supports flexible column mapping.\n⚠️ If upload fails, try using sample files instead.",
218
+ accept_multiple_files=False
219
+ )
220
+ else:
221
+ # Sample files selection
222
+ sample_files = {
223
+ "word_freq.txt": "data/word_freq.txt",
224
+ "COCA_5000.txt": "data/COCA_5000.txt",
225
+ "jpn_word_freq.txt": "data/jpn_word_freq.txt"
226
+ }
227
+
228
+ selected_sample = st.selectbox(
229
+ "Choose a sample file:",
230
+ options=list(sample_files.keys()),
231
+ help="Pre-loaded frequency data files for testing"
232
+ )
233
+
234
+ if st.button("Load Sample File", type="primary"):
235
+ sample_path = sample_files[selected_sample]
236
+ if os.path.exists(sample_path):
237
+ try:
238
+ # Create a file-like object from sample file
239
+ from io import BytesIO
240
+ with open(sample_path, 'rb') as f:
241
+ content = f.read()
242
+
243
+ # Create BytesIO object that mimics uploaded file
244
+ uploaded_file = BytesIO(content)
245
+ uploaded_file.name = selected_sample
246
+ uploaded_file.type = 'text/tab-separated-values' if selected_sample.endswith('.txt') else 'text/csv'
247
+ uploaded_file.size = len(content)
248
+
249
+ # Store in session state to persist across reruns
250
+ st.session_state.sample_file = uploaded_file
251
+ st.session_state.sample_file_name = selected_sample
252
+ st.success(f"Loaded sample file: {selected_sample}")
253
+
254
+ except Exception as e:
255
+ st.error(f"Error loading sample file: {str(e)}")
256
+ uploaded_file = None
257
+ else:
258
+ st.error(f"Sample file not found: {sample_path}")
259
+ uploaded_file = None
260
+ else:
261
+ # Check if sample file was previously loaded
262
+ uploaded_file = st.session_state.get('sample_file', None)
263
+ if uploaded_file and 'sample_file_name' in st.session_state:
264
+ st.info(f"Using loaded sample file: {st.session_state.sample_file_name}")
265
+
266
+ if uploaded_file is None and data_source == "Upload file":
267
+ # Show example formats
268
+ st.info("**Supported formats:**")
269
+ col1, col2 = st.columns(2)
270
+
271
+ with col1:
272
+ st.write("**Traditional format:**")
273
+ example_traditional = """Type\tFreq\tRank
274
+ the\t69868\t1
275
+ of\t36426\t2
276
+ and\t28891\t3"""
277
+ st.code(example_traditional, language="text")
278
+
279
+ with col2:
280
+ st.write("**Rich corpus format:**")
281
+ example_rich = """rank\tlForm\tlemma\tpos\tfrequency\tpmw
282
+ 1\tノ\tの\t助詞\t5061558\t48383.9
283
+ 2\tニ\tに\t助詞\t3576558\t34188.7
284
+ 3\tテ\tて\t助詞\t3493117\t33391.0"""
285
+ st.code(example_rich, language="text")
286
+
287
+ st.write("**File size limit:** 300MB")
288
+
289
+ return uploaded_file
290
+
291
+ @staticmethod
292
+ def render_data_preview(df: pd.DataFrame, detected_cols: Dict[str, List[str]]):
293
+ """
294
+ Render enhanced data preview section with column detection results.
295
+
296
+ Args:
297
+ df: Preview DataFrame
298
+ detected_cols: Detected column categorization
299
+ """
300
+ st.subheader("📊 Data Preview")
301
+
302
+ # Basic metrics
303
+ col1, col2, col3 = st.columns(3)
304
+ with col1:
305
+ st.metric("Total Rows", len(df))
306
+ with col2:
307
+ st.metric("Total Columns", len(df.columns))
308
+ with col3:
309
+ word_cols = len(detected_cols.get('word_columns', []))
310
+ freq_cols = len(detected_cols.get('frequency_columns', []))
311
+ st.metric("Detected", f"{word_cols} word, {freq_cols} freq")
312
+
313
+ # Show sample data
314
+ st.write("**First 5 rows:**")
315
+ st.dataframe(df.head(), use_container_width=True)
316
+
317
+ # Show detected column categories
318
+ with st.expander("🔍 Column Detection Results", expanded=True):
319
+ col1, col2 = st.columns(2)
320
+
321
+ with col1:
322
+ st.write("**Word Columns (text data):**")
323
+ word_cols = detected_cols.get('word_columns', [])
324
+ if word_cols:
325
+ for col in word_cols:
326
+ st.write(f"- `{col}` ({df[col].dtype})")
327
+ else:
328
+ st.write("None detected")
329
+
330
+ st.write("**POS Columns:**")
331
+ pos_cols = detected_cols.get('pos_columns', [])
332
+ if pos_cols:
333
+ for col in pos_cols:
334
+ st.write(f"- `{col}` ({df[col].dtype})")
335
+ else:
336
+ st.write("None detected")
337
+
338
+ with col2:
339
+ st.write("**Frequency Columns (numeric data):**")
340
+ freq_cols = detected_cols.get('frequency_columns', [])
341
+ if freq_cols:
342
+ for col in freq_cols:
343
+ sample_vals = df[col].dropna().head(3).tolist()
344
+ st.write(f"- `{col}` ({df[col].dtype}) - e.g., {sample_vals}")
345
+ else:
346
+ st.write("None detected")
347
+
348
+ st.write("**Other Columns:**")
349
+ other_cols = detected_cols.get('other_columns', [])
350
+ if other_cols:
351
+ for col in other_cols[:5]: # Show max 5
352
+ st.write(f"- `{col}` ({df[col].dtype})")
353
+ if len(other_cols) > 5:
354
+ st.write(f"... and {len(other_cols) - 5} more")
355
+ else:
356
+ st.write("None")
357
+
358
+ @staticmethod
359
+ def render_column_selection_simplified(detected_cols: Dict[str, List[str]], format_info: Dict) -> Optional[Dict[str, str]]:
360
+ """
361
+ Render simplified column selection interface without multi-frequency complexity.
362
+
363
+ Args:
364
+ detected_cols: Detected column categorization
365
+ format_info: File format information
366
+
367
+ Returns:
368
+ Column configuration dict or None
369
+ """
370
+ st.subheader("🎯 Column Mapping")
371
+ st.write("Select which columns to use for your frequency analysis:")
372
+
373
+ word_cols = detected_cols.get('word_columns', [])
374
+ freq_cols = detected_cols.get('frequency_columns', [])
375
+ pos_cols = detected_cols.get('pos_columns', [])
376
+
377
+ if not word_cols or not freq_cols:
378
+ st.error("❌ Required columns not detected. Please ensure your file has:")
379
+ st.write("- At least one text column (for words)")
380
+ st.write("- At least one numeric column (for frequencies)")
381
+ return None
382
+
383
+ col1, col2 = st.columns(2)
384
+
385
+ with col1:
386
+ # Word column selection
387
+ word_column = st.selectbox(
388
+ "Word Column",
389
+ options=word_cols,
390
+ index=0,
391
+ help="Column containing word forms or lemmas"
392
+ )
393
+
394
+ # POS column selection (optional)
395
+ pos_column = None
396
+        if pos_cols:
+            use_pos = st.checkbox("Include POS column", value=False)
+            if use_pos:
+                pos_column = st.selectbox(
+                    "POS Column",
+                    options=pos_cols,
+                    index=0,
+                    help="Column containing part-of-speech tags (optional)"
+                )
+
+        with col2:
+            # Frequency column selection
+            frequency_column = st.selectbox(
+                "Frequency Column",
+                options=freq_cols,
+                index=0,
+                help="Column containing frequency values for analysis"
+            )
+
+        # Confirm button
+        if st.button("🚀 Start Analysis", type="primary"):
+            config = {
+                'word_column': word_column,
+                'frequency_column': frequency_column,
+                'separator': format_info['separator'],
+                'has_header': format_info['has_header']
+            }
+
+            if pos_column:
+                config['pos_column'] = pos_column
+
+            return config
+
+        return None
+
+    @staticmethod
+    def render_visualization_controls_simplified(analyzer: FrequencyAnalyzer, column_config: Dict) -> Optional[Dict]:
+        """
+        Legacy method - redirects to enhanced controls for backward compatibility.
+        """
+        return FrequencyHandlers.render_enhanced_visualization_controls(analyzer, column_config)
+
+    @staticmethod
+    def render_rank_based_analysis_simplified(analyzer: FrequencyAnalyzer, viz_config: Dict):
+        """
+        Legacy method - redirects to enhanced analysis for backward compatibility.
+        """
+        return FrequencyHandlers.render_enhanced_rank_based_analysis(analyzer, viz_config)
+
+    @staticmethod
+    def render_persistent_column_selection(detected_cols: Dict[str, List[str]],
+                                           format_info: Dict,
+                                           current_config: Optional[Dict] = None) -> Dict[str, str]:
+        """
+        Render persistent column selection interface that doesn't disappear.
+
+        Args:
+            detected_cols: Detected column categorization
+            format_info: File format information
+            current_config: Current column configuration (for preserving selections)
+
+        Returns:
+            Column configuration dict
+        """
+        st.write("Select which columns to use for your frequency analysis:")
+
+        word_cols = detected_cols.get('word_columns', [])
+        freq_cols = detected_cols.get('frequency_columns', [])
+        pos_cols = detected_cols.get('pos_columns', [])
+
+        # Determine default selections
+        default_word_idx = 0
+        default_freq_idx = 0
+        default_use_pos = False
+        default_pos_idx = 0
+
+        if current_config:
+            # Preserve current selections
+            if current_config['word_column'] in word_cols:
+                default_word_idx = word_cols.index(current_config['word_column'])
+            if current_config['frequency_column'] in freq_cols:
+                default_freq_idx = freq_cols.index(current_config['frequency_column'])
+            if 'pos_column' in current_config and current_config['pos_column'] in pos_cols:
+                default_use_pos = True
+                default_pos_idx = pos_cols.index(current_config['pos_column'])
+
+        col1, col2 = st.columns(2)
+
+        with col1:
+            word_column = st.selectbox(
+                "Word Column",
+                options=word_cols,
+                index=default_word_idx,
+                help="Column containing word forms or lemmas",
+                key="persistent_word_col"
+            )
+
+            # POS column selection (optional)
+            pos_column = None
+            if pos_cols:
+                use_pos = st.checkbox("Include POS column", value=default_use_pos, key="persistent_use_pos")
+                if use_pos:
+                    pos_column = st.selectbox(
+                        "POS Column",
+                        options=pos_cols,
+                        index=default_pos_idx,
+                        help="Column containing part-of-speech tags (optional)",
+                        key="persistent_pos_col"
+                    )
+
+        with col2:
+            frequency_column = st.selectbox(
+                "Frequency Column",
+                options=freq_cols,
+                index=default_freq_idx,
+                help="Column containing frequency values for analysis",
+                key="persistent_freq_col"
+            )
+
+        # Show quick info about selected columns
+        st.write("**Selected Configuration:**")
+        st.write(f"• Words: `{word_column}`")
+        st.write(f"• Frequencies: `{frequency_column}`")
+        if pos_column:
+            st.write(f"• POS: `{pos_column}`")
+
+        # Always return configuration (no button needed)
+        config = {
+            'word_column': word_column,
+            'frequency_column': frequency_column,
+            'separator': format_info['separator'],
+            'has_header': format_info['has_header']
+        }
+
+        if pos_column:
+            config['pos_column'] = pos_column
+
+        return config
+
+    @staticmethod
+    def render_enhanced_visualization_controls(analyzer: FrequencyAnalyzer, column_config: Dict) -> Optional[Dict]:
+        """
+        Render enhanced visualization controls with max words limit.
+
+        Args:
+            analyzer: FrequencyAnalyzer instance with loaded data
+            column_config: Column configuration from user selection
+
+        Returns:
+            Dict with visualization configuration or None
+        """
+        st.subheader("🎛️ Enhanced Visualization Controls")
+
+        # Get the frequency column
+        frequency_column = column_config['frequency_column']
+
+        col1, col2, col3 = st.columns(3)
+
+        with col1:
+            # Bin size controls
+            bin_size = st.slider(
+                "Bin Size (words per group)",
+                min_value=100,
+                max_value=2000,
+                value=500,
+                step=100,
+                help="Number of words to group together for rank-based analysis"
+            )
+
+        with col2:
+            # Log transformation option
+            log_transform = st.checkbox(
+                "Apply log₁₀ transformation",
+                value=False,
+                help="Transform frequency values using log₁₀ for better visualization"
+            )
+
+        with col3:
+            # Max words control
+            max_words = st.number_input(
+                "Max words to analyze",
+                min_value=1000,
+                max_value=200000,
+                value=None,
+                step=1000,
+                help="Limit analysis to top N most frequent words (leave empty for no limit)",
+                key="max_words_input"
+            )
+
+        # Quick preset buttons
+        st.write("**Quick Presets:**")
+        preset_cols = st.columns(4)
+        if preset_cols[0].button("10K", key="preset_10k"):
+            st.session_state.max_words_preset = 10000
+        if preset_cols[1].button("25K", key="preset_25k"):
+            st.session_state.max_words_preset = 25000
+        if preset_cols[2].button("50K", key="preset_50k"):
+            st.session_state.max_words_preset = 50000
+        if preset_cols[3].button("All", key="preset_all"):
+            st.session_state.max_words_preset = None
+
+        # Use preset value if set
+        if 'max_words_preset' in st.session_state:
+            max_words = st.session_state.max_words_preset
+            del st.session_state.max_words_preset
+
+        # Generate visualization button
+        if st.button("📊 Generate Enhanced Visualization", type="primary", key="generate_viz"):
+            return {
+                'frequency_column': frequency_column,
+                'bin_size': bin_size,
+                'log_transform': log_transform,
+                'max_words_to_retain': max_words
+            }
+
+        return None
+
+    @staticmethod
+    def render_enhanced_rank_based_analysis(analyzer: FrequencyAnalyzer, viz_config: Dict):
+        """
+        Render enhanced rank-based analysis with improved sample words display.
+
+        Args:
+            analyzer: FrequencyAnalyzer instance with loaded data
+            viz_config: Visualization configuration
+        """
+        st.subheader("📊 Enhanced Rank-Based Frequency Analysis")
+
+        frequency_column = viz_config['frequency_column']
+        bin_size = viz_config['bin_size']
+        log_transform = viz_config['log_transform']
+        max_words_to_retain = viz_config.get('max_words_to_retain')
+
+        try:
+            # Calculate statistics
+            stats = analyzer.calculate_statistics(frequency_column)
+
+            # Display basic statistics with word limit info
+            col1, col2, col3, col4 = st.columns(4)
+            with col1:
+                words_analyzed = max_words_to_retain if max_words_to_retain and max_words_to_retain < stats['count'] else stats['count']
+                st.metric("Words Analyzed", f"{words_analyzed:,}")
+            with col2:
+                st.metric("Mean Frequency", f"{stats['mean']:.2f}")
+            with col3:
+                st.metric("Median Frequency", f"{stats['median']:.2f}")
+            with col4:
+                st.metric("Std Deviation", f"{stats['std']:.2f}")
+
+            # Show word limit info if applied
+            if max_words_to_retain and max_words_to_retain < stats['count']:
+                st.info(f"📊 Analysis limited to top {max_words_to_retain:,} most frequent words (out of {stats['count']:,} total)")
+
+            # Create rank-based visualization with enhanced parameters
+            result = analyzer.create_rank_based_visualization_flexible(
+                column=frequency_column,
+                bin_size=bin_size,
+                log_transform=log_transform,
+                max_words_to_retain=max_words_to_retain
+            )
+
+            # Create the main visualization
+            fig = go.Figure()
+
+            fig.add_trace(go.Bar(
+                x=result['group_centers'],
+                y=result['avg_frequencies'],
+                name=f"Avg {frequency_column}",
+                marker_color='steelblue',
+                hovertemplate=(
+                    f"<b>Group %{{x}}</b><br>"
+                    f"Avg {'Log₁₀ ' if log_transform else ''}{frequency_column}: %{{y:.3f}}<br>"
+                    "<extra></extra>"
+                )
+            ))
+
+            fig.update_layout(
+                title=result.get('title_suffix', f"Enhanced Rank-Based Analysis - {frequency_column}"),
+                xaxis_title=result.get('x_label', f"Rank Groups (bin size: {bin_size})"),
+                yaxis_title=result.get('y_label', f"{'Log₁₀ ' if log_transform else ''}Average {frequency_column}"),
+                showlegend=False,
+                height=500
+            )
+
+            st.plotly_chart(fig, use_container_width=True)
+
+            # Enhanced sample words display (up to 20 bins with 5 random samples each)
+            st.write("### 🎯 Sample Words by Rank Group (5 Random Samples)")
+
+            sample_words = result.get('sample_words', {})
+            if sample_words:
+                # Display up to 20 groups in a more organized layout
+                num_groups = min(20, len(sample_words))
+
+                if num_groups > 0:
+                    st.write(f"Showing sample words from top {num_groups} rank groups:")
+
+                    # Display in rows of 4 groups each
+                    for row_start in range(0, num_groups, 4):
+                        cols = st.columns(4)
+                        for col_idx in range(4):
+                            group_idx = row_start + col_idx
+                            if group_idx < num_groups and group_idx in sample_words:
+                                with cols[col_idx]:
+                                    group_label = result['group_labels'][group_idx]
+                                    words = sample_words[group_idx]
+
+                                    st.write(f"**Group {group_label}:**")
+                                    word_list = [w['word'] for w in words]
+                                    # Display as bullet points for better readability
+                                    for word in word_list:
+                                        st.write(f"• {word}")
+
+                                    # Add spacing between groups
+                                    st.write("")
+            else:
+                st.write("No sample words available")
+
+            # Show enhanced group statistics
+            with st.expander("📈 Detailed Group Statistics"):
+                group_stats = result.get('group_stats')
+                if group_stats is not None and not group_stats.empty:
+                    display_stats = group_stats.copy()
+
+                    # Format numeric columns
+                    numeric_cols = display_stats.select_dtypes(include=[np.number]).columns
+                    for col in numeric_cols:
+                        if 'count' not in col.lower():
+                            display_stats[col] = display_stats[col].round(2)
+
+                    st.dataframe(display_stats, use_container_width=True)
+                else:
+                    st.write("No detailed statistics available")
+
+        except Exception as e:
+            st.error(f"Error in enhanced rank-based analysis: {str(e)}")
+            with st.expander("Error Details"):
+                st.code(str(e))
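The rank-based binning that `create_rank_based_visualization_flexible` is invoked with above (group words by rank into bins of `bin_size`, average each bin, optionally after a log₁₀ transform and a top-N cutoff) can be sketched in isolation. The analyzer's internals are not shown in this diff, so this is an illustrative assumption about its behavior, not its actual implementation:

```python
import numpy as np

def rank_bin_averages(frequencies, bin_size=500, log_transform=False,
                      max_words_to_retain=None):
    """Sort frequencies by rank (descending), optionally truncate and
    log-transform, then average each consecutive bin of `bin_size` words."""
    freqs = np.asarray(sorted(frequencies, reverse=True), dtype=float)
    if max_words_to_retain:
        freqs = freqs[:max_words_to_retain]  # keep only the top-N words
    if log_transform:
        freqs = np.log10(freqs)
    return [freqs[i:i + bin_size].mean() for i in range(0, len(freqs), bin_size)]

# 1000 words in bins of 500 -> two bin averages
avgs = rank_bin_averages(range(1, 1001), bin_size=500)
```

For Zipf-like frequency data the resulting bar chart drops off steeply, which is why the UI offers the log₁₀ option.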
web_app/handlers/frequency_handlers_memory.py ADDED
@@ -0,0 +1,317 @@
+"""
+Memory-based Frequency Handlers for file upload without filesystem access
+"""
+
+import streamlit as st
+import pandas as pd
+from typing import Optional, Dict, List, Any
+from io import BytesIO, StringIO
+
+# Import from parent directory
+import sys
+import os
+sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(__file__))))
+
+from text_analyzer.frequency_analyzer import FrequencyAnalyzer
+from web_app.utils.memory_file_handler import MemoryFileHandler
+
+
+class FrequencyHandlersMemory:
+    """Handlers for frequency analysis interface using memory-based file handling."""
+
+    @staticmethod
+    def render_frequency_visualization_interface():
+        """Main interface for frequency visualization with memory-based file handling."""
+        st.subheader("📊 Word Frequency Visualization")
+
+        # File selection
+        uploaded_file = FrequencyHandlersMemory.render_file_selection_section()
+
+        if uploaded_file:
+            # Process file using memory-based approach
+            FrequencyHandlersMemory.process_uploaded_file_memory(uploaded_file)
+
+    @staticmethod
+    def render_file_selection_section():
+        """
+        Render file selection section with memory-based handling.
+
+        Returns:
+            File-like object or None
+        """
+        st.subheader("📄 Select Frequency Data")
+
+        # Data source selection
+        data_source = st.radio(
+            "Choose data source:",
+            ["Upload file", "Use sample files"],
+            help="Upload your own frequency data or use pre-loaded samples"
+        )
+
+        if data_source == "Upload file":
+            uploaded_file = st.file_uploader(
+                "Choose a frequency data file",
+                type=['tsv', 'csv', 'txt'],
+                help="Upload a TSV or CSV file with frequency data. Maximum size: 300MB",
+                accept_multiple_files=False
+            )
+
+            if uploaded_file and uploaded_file.size > 300 * 1024 * 1024:
+                st.error(f"File too large ({uploaded_file.size / 1024 / 1024:.1f} MB). Maximum allowed: 300MB")
+                return None
+
+            return uploaded_file
+        else:
+            # Sample files selection (existing code)
+            return FrequencyHandlersMemory.handle_sample_files()
+
+    @staticmethod
+    def process_uploaded_file_memory(uploaded_file):
+        """Process uploaded file using memory-based approach."""
+
+        # Initialize session state
+        if 'analyzer' not in st.session_state:
+            st.session_state.analyzer = None
+        if 'format_info' not in st.session_state:
+            st.session_state.format_info = None
+        if 'file_content' not in st.session_state:
+            st.session_state.file_content = None
+
+        # Check if this is a new file
+        current_file_name = uploaded_file.name
+        if st.session_state.get('last_file_name') != current_file_name:
+            st.session_state.last_file_name = current_file_name
+            st.session_state.analyzer = None
+            st.session_state.format_info = None
+
+        try:
+            # Read file content directly into memory
+            st.info("📖 Reading file content...")
+            uploaded_file.seek(0)
+            content = uploaded_file.read()
+
+            # Store in session state
+            st.session_state.file_content = content
+            st.success(f"✅ File '{current_file_name}' ({len(content):,} bytes) loaded successfully")
+
+        except Exception as e:
+            st.error(f"❌ Failed to read file: {str(e)}")
+            return
+
+        # Process the file content
+        if st.session_state.file_content:
+            try:
+                # Initialize analyzer if needed
+                if st.session_state.analyzer is None:
+                    st.session_state.analyzer = FrequencyAnalyzer(file_size_limit_mb=300)
+                    st.session_state.format_info = st.session_state.analyzer.detect_file_format(
+                        st.session_state.file_content
+                    )
+
+                # Show format detection results
+                st.success(f"✅ File format detected: {st.session_state.format_info['separator']}-separated, "
+                           f"{st.session_state.format_info['line_count']} lines")
+
+                # Parse the data
+                if 'data' not in st.session_state or st.session_state.data is None:
+                    with st.spinner("Parsing frequency data..."):
+                        # Create a file-like object from the content
+                        file_obj = BytesIO(st.session_state.file_content)
+                        df = pd.read_csv(
+                            file_obj,
+                            delimiter=st.session_state.format_info['separator'],
+                            encoding='utf-8'
+                        )
+                        st.session_state.data = df
+                        st.session_state.analyzer.df = df
+
+                # Continue with visualization
+                if st.session_state.data is not None:
+                    # Data preview
+                    with st.expander("📋 Data Preview", expanded=True):
+                        st.dataframe(st.session_state.data.head(20))
+                        st.caption(f"Showing first 20 of {len(st.session_state.data):,} entries")
+
+                    # Column configuration
+                    st.session_state.column_config = FrequencyHandlersMemory.render_column_configuration(
+                        st.session_state.analyzer
+                    )
+
+                    if st.session_state.column_config:
+                        # Visualization controls
+                        viz_config = FrequencyHandlersMemory.render_visualization_controls(
+                            st.session_state.analyzer,
+                            st.session_state.column_config
+                        )
+
+                        if viz_config:
+                            # Generate visualization
+                            FrequencyHandlersMemory.render_visualization(
+                                st.session_state.analyzer,
+                                viz_config
+                            )
+
+            except Exception as e:
+                st.error(f"Error processing file: {str(e)}")
+                with st.expander("Error Details"):
+                    st.code(str(e))
+
+    @staticmethod
+    def handle_sample_files():
+        """Handle sample file selection (existing implementation)."""
+        sample_files = {
+            "word_freq.txt": "data/word_freq.txt",
+            "COCA_5000.txt": "data/COCA_5000.txt",
+            "jpn_word_freq.txt": "data/jpn_word_freq.txt"
+        }
+
+        selected_sample = st.selectbox(
+            "Choose a sample file:",
+            options=list(sample_files.keys()),
+            help="Pre-loaded frequency data files for testing"
+        )
+
+        if st.button("Load Sample File", type="primary"):
+            sample_path = sample_files[selected_sample]
+            if os.path.exists(sample_path):
+                try:
+                    with open(sample_path, 'rb') as f:
+                        content = f.read()
+
+                    # Create a mock uploaded file object (BytesIO imported at module level)
+                    mock_file = BytesIO(content)
+                    mock_file.name = selected_sample
+                    mock_file.size = len(content)
+
+                    return mock_file
+
+                except Exception as e:
+                    st.error(f"Error loading sample file: {str(e)}")
+
+        return None
+
+    @staticmethod
+    def render_column_configuration(analyzer):
+        """Render column configuration section."""
+        st.subheader("⚙️ Column Configuration")
+
+        detected_cols = analyzer.detected_columns
+
+        col1, col2 = st.columns(2)
+
+        with col1:
+            word_col = st.selectbox(
+                "Word/Token Column",
+                options=detected_cols.get('word_columns', []),
+                help="Column containing words or tokens"
+            )
+
+        with col2:
+            freq_col = st.selectbox(
+                "Frequency Column",
+                options=detected_cols.get('frequency_columns', []),
+                help="Column containing frequency counts"
+            )
+
+        if word_col and freq_col:
+            return {'word_column': word_col, 'frequency_column': freq_col}
+        return None
+
+    @staticmethod
+    def render_visualization_controls(analyzer, column_config):
+        """Render visualization controls."""
+        st.subheader("📊 Visualization Settings")
+
+        col1, col2, col3 = st.columns(3)
+
+        with col1:
+            chart_type = st.selectbox(
+                "Chart Type",
+                ["Bar Chart", "Line Chart", "Area Chart"],
+                help="Select visualization type"
+            )
+
+        with col2:
+            top_n = st.slider(
+                "Number of Words",
+                min_value=10,
+                max_value=100,
+                value=30,
+                step=5,
+                help="Number of top words to display"
+            )
+
+        with col3:
+            scale = st.selectbox(
+                "Y-Axis Scale",
+                ["Linear", "Logarithmic"],
+                help="Scale for frequency axis"
+            )
+
+        return {
+            'chart_type': chart_type,
+            'top_n': top_n,
+            'scale': scale,
+            'word_column': column_config['word_column'],
+            'frequency_column': column_config['frequency_column']
+        }
+
+    @staticmethod
+    def render_visualization(analyzer, viz_config):
+        """Render the actual visualization."""
+        import plotly.express as px
+        import plotly.graph_objects as go
+
+        # Get top N words
+        df = analyzer.df.nlargest(viz_config['top_n'], viz_config['frequency_column'])
+
+        # Create figure based on chart type
+        if viz_config['chart_type'] == "Bar Chart":
+            fig = px.bar(
+                df,
+                x=viz_config['word_column'],
+                y=viz_config['frequency_column'],
+                title=f"Top {viz_config['top_n']} Most Frequent Words"
+            )
+        elif viz_config['chart_type'] == "Line Chart":
+            fig = px.line(
+                df,
+                x=viz_config['word_column'],
+                y=viz_config['frequency_column'],
+                title=f"Top {viz_config['top_n']} Most Frequent Words",
+                markers=True
+            )
+        else:  # Area Chart
+            fig = px.area(
+                df,
+                x=viz_config['word_column'],
+                y=viz_config['frequency_column'],
+                title=f"Top {viz_config['top_n']} Most Frequent Words"
+            )
+
+        # Apply scale (Plotly's method is update_yaxes, not update_yaxis)
+        if viz_config['scale'] == "Logarithmic":
+            fig.update_yaxes(type="log")
+
+        # Update layout
+        fig.update_layout(
+            xaxis_title="Words",
+            yaxis_title="Frequency",
+            height=600,
+            showlegend=False
+        )
+
+        # Display
+        st.plotly_chart(fig, use_container_width=True)
+
+        # Summary statistics
+        with st.expander("📊 Summary Statistics"):
+            col1, col2, col3 = st.columns(3)
+
+            with col1:
+                st.metric("Total Words", f"{len(analyzer.df):,}")
+            with col2:
+                st.metric("Total Frequency", f"{analyzer.df[viz_config['frequency_column']].sum():,}")
+            with col3:
+                st.metric("Average Frequency", f"{analyzer.df[viz_config['frequency_column']].mean():.2f}")
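The core of the memory-based approach in the file above — read the upload's bytes once, then hand pandas a `BytesIO` instead of a temp-file path — reduces to a few lines. The function name and sample data here are illustrative, not part of the module:

```python
import pandas as pd
from io import BytesIO

def parse_frequency_bytes(content: bytes, separator: str = "\t") -> pd.DataFrame:
    """Parse raw TSV/CSV bytes without ever touching the filesystem."""
    # BytesIO gives pandas a file-like object backed entirely by RAM,
    # so no /tmp write (and no 403 on read-only hosts) is needed.
    return pd.read_csv(BytesIO(content), delimiter=separator, encoding="utf-8")

# Illustrative two-row TSV payload
df = parse_frequency_bytes(b"Type\tFreq\nthe\t69868\nof\t36426\n")
```

The trade-off, as the migration guide notes, is that the whole payload lives in RAM, which is why the handlers enforce the 300MB cap before reading.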
web_app/handlers/frequency_handlers_updated.py ADDED
@@ -0,0 +1,694 @@
+"""
+Frequency Analysis Handlers for Streamlit Interface
+
+This module provides Streamlit interface handlers for word frequency visualization,
+including file upload, visualization controls, and results display.
+Supports flexible column mapping for diverse frequency data formats.
+
+Updated to use MemoryFileHandler to avoid 403 errors on restricted environments.
+"""
+
+import streamlit as st
+import pandas as pd
+import plotly.graph_objects as go
+import plotly.express as px
+import numpy as np
+from typing import Dict, List, Optional
+import sys
+import os
+from pathlib import Path
+from io import StringIO, BytesIO
+
+# Add parent directory to path for imports
+sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(__file__))))
+
+from text_analyzer.frequency_analyzer import FrequencyAnalyzer
+from web_app.utils import MemoryFileHandler
+
+
+class FrequencyHandlers:
+    """
+    Streamlit interface handlers for frequency analysis functionality.
+    """
+
+    @staticmethod
+    def render_frequency_visualization_interface():
+        """
+        Main interface for frequency visualization analysis.
+        Manages state across multiple interactions.
+        """
+        st.subheader("📊 Word Frequency Visualization")
+
+        # Initialize session state
+        if 'analyzer' not in st.session_state:
+            st.session_state.analyzer = None
+        if 'format_info' not in st.session_state:
+            st.session_state.format_info = None
+        if 'uploaded_file_content' not in st.session_state:
+            st.session_state.uploaded_file_content = None
+        if 'column_config' not in st.session_state:
+            st.session_state.column_config = None
+
+        # File selection
+        uploaded_file = FrequencyHandlers.render_file_selection_section()
+
+        if uploaded_file:
+            # Track file changes
+            current_file_name = uploaded_file.name if hasattr(uploaded_file, 'name') else 'sample_file'
+
+            if st.session_state.get('last_file_name') != current_file_name:
+                st.session_state.last_file_name = current_file_name
+                st.session_state.analyzer = None
+                st.session_state.format_info = None
+
+            try:
+                # Check file size
+                if hasattr(uploaded_file, 'size') and uploaded_file.size > 300 * 1024 * 1024:
+                    st.error(f"File too large ({uploaded_file.size / 1024 / 1024:.1f} MB). Maximum allowed: 300MB")
+                    return
+
+                # Process file using memory-based approach
+                content = MemoryFileHandler.process_uploaded_file(uploaded_file, as_text=False)
+                if content:
+                    st.session_state.uploaded_file_content = content
+                    st.success(f"✅ File '{current_file_name}' ({len(content):,} bytes) uploaded successfully")
+                else:
+                    st.error("Failed to read uploaded file. Please try again.")
+                    return
+            except Exception as e:
+                st.error(f"❌ Failed to read uploaded file: {str(e)}")
+                if "403" in str(e) or "Forbidden" in str(e):
+                    st.error("**Upload Error**: File upload was blocked. This is a known issue on Hugging Face Spaces. "
+                             "Please try using the sample files option or deploy locally.")
+                return
+
+            try:
+                # Initialize analyzer and process file (only if needed)
+                if st.session_state.analyzer is None or st.session_state.format_info is None:
+                    st.session_state.analyzer = FrequencyAnalyzer(file_size_limit_mb=300)
+                    # Use the content we already read
+                    st.session_state.format_info = st.session_state.analyzer.detect_file_format(st.session_state.uploaded_file_content)
+
+                # Show format detection results
+                st.success(f"✅ File format detected: {st.session_state.format_info['separator']}-separated, "
+                           f"{st.session_state.format_info['line_count']} lines")
+
+                # Parse the data if not already done
+                if st.session_state.analyzer.df is None:
+                    with st.spinner("Parsing frequency data..."):
+                        try:
+                            # Create file-like object from content
+                            file_obj = BytesIO(st.session_state.uploaded_file_content)
+                            st.session_state.analyzer.read_frequency_data_from_content(file_obj)
+
+                            if st.session_state.analyzer.df is None or st.session_state.analyzer.df.empty:
+                                st.error("No data could be parsed from the file. Please check the file format.")
+                                return
+
+                        except Exception as e:
+                            st.error(f"Error parsing file: {str(e)}")
+                            return
+
+                # Display results
+                with st.expander("📋 Data Preview", expanded=True):
+                    FrequencyHandlers.render_data_preview(
+                        st.session_state.analyzer.df.head(20),
+                        st.session_state.analyzer.detected_columns
+                    )
+
+                # Column configuration - always allow user to change
+                st.session_state.column_config = FrequencyHandlers.render_enhanced_column_configuration(
+                    st.session_state.analyzer.detected_columns,
+                    st.session_state.analyzer.df
+                )
+
+                if st.session_state.column_config:
+                    # Set the analyzer's columns based on user selection
+                    st.session_state.analyzer.word_column = st.session_state.column_config['word_column']
+                    st.session_state.analyzer.frequency_column = st.session_state.column_config['frequency_column']
+
+                    # Visualization controls
+                    viz_config = FrequencyHandlers.render_enhanced_visualization_controls(st.session_state.analyzer, st.session_state.column_config)
+
+                    if viz_config:
+                        # Generate analysis
+                        FrequencyHandlers.render_enhanced_rank_based_analysis(st.session_state.analyzer, viz_config)
+
+            except Exception as e:
+                st.error(f"Error processing file: {str(e)}")
+
+                # Provide specific error guidance
+                if "403" in str(e) or "Forbidden" in str(e):
+                    st.error("**HTTP 403 Error**: File upload was blocked by the server.")
+                    st.info("This is a known limitation on Hugging Face Spaces. Please use the sample files option or deploy the app locally for full functionality.")
+                elif "timeout" in str(e).lower():
+                    st.error("**Timeout Error**: File processing took too long")
+                    st.info("Try uploading a smaller file or check your internet connection")
+                elif "memory" in str(e).lower() or "RAM" in str(e).upper():
+                    st.error("**Memory Error**: Not enough memory to process this file")
+                    st.info("Try uploading a smaller file")
+                else:
+                    with st.expander("Error Details"):
+                        st.code(str(e))
+
+        # Cleanup session for debugging
+        if st.sidebar.button("🔄 Reset Analysis", help="Clear all cached data and start fresh"):
+            for key in ['analyzer', 'format_info', 'uploaded_file_content', 'column_config', 'last_file_name']:
+                if key in st.session_state:
+                    del st.session_state[key]
+            st.experimental_rerun()
+
+    @staticmethod
+    def render_file_selection_section():
+        """
+        Render file selection section.
+
+        Returns:
+            File-like object or None
+        """
+        st.subheader("📄 Select Frequency Data")
+
+        # Data source selection
+        data_source = st.radio(
+            "Choose data source:",
+            ["Upload file", "Use sample files"],
+            help="Note: File uploads may experience issues on Hugging Face Spaces. Use sample files as a reliable alternative."
+        )
+
+        if data_source == "Upload file":
+            uploaded_file = st.file_uploader(
+                "Choose a frequency data file",
+                type=['tsv', 'csv', 'txt'],
+                help="Upload a TSV or CSV file with frequency data. Supports flexible column mapping.\n⚠️ If upload fails, try using sample files instead.",
+                accept_multiple_files=False
+            )
+        else:
+            # Sample files selection
+            sample_files = {
+                "word_freq.txt": "data/word_freq.txt",
+                "COCA_5000.txt": "data/COCA_5000.txt",
+                "jpn_word_freq.txt": "data/jpn_word_freq.txt"
+            }
+
+            selected_sample = st.selectbox(
+                "Choose a sample file:",
+                options=list(sample_files.keys()),
+                help="Pre-loaded frequency data files for testing"
+            )
+
+            if st.button("Load Sample File", type="primary"):
+                sample_path = sample_files[selected_sample]
+                if os.path.exists(sample_path):
+                    try:
+                        # Create a file-like object from the sample file
+                        with open(sample_path, 'rb') as f:
+                            content = f.read()
+
+                        # Create BytesIO object that mimics an uploaded file
+                        uploaded_file = BytesIO(content)
+                        uploaded_file.name = selected_sample
+                        uploaded_file.type = 'text/tab-separated-values' if selected_sample.endswith('.txt') else 'text/csv'
+                        uploaded_file.size = len(content)
+
+                        # Store in session state to persist across reruns
+                        st.session_state.sample_file = uploaded_file
+                        st.session_state.sample_file_name = selected_sample
+                        st.success(f"Loaded sample file: {selected_sample}")
+
+                    except Exception as e:
+                        st.error(f"Error loading sample file: {str(e)}")
+                        uploaded_file = None
+                else:
+                    st.error(f"Sample file not found: {sample_path}")
+                    uploaded_file = None
+            else:
+                # Check if sample file was previously loaded
+                uploaded_file = st.session_state.get('sample_file', None)
+                if uploaded_file and 'sample_file_name' in st.session_state:
+                    st.info(f"Using loaded sample file: {st.session_state.sample_file_name}")
+
+        if uploaded_file is None and data_source == "Upload file":
+            # Show example formats
+            st.info("**Supported formats:**")
+            col1, col2 = st.columns(2)
+
+            with col1:
+                st.write("**Traditional format:**")
+                example_traditional = """Type\tFreq\tRank
+the\t69868\t1
+of\t36426\t2
+and\t28891\t3"""
+                st.code(example_traditional, language="text")
+
+            with col2:
+                st.write("**Rich corpus format:**")
+                example_rich = """rank\tlForm\tlemma\tpos\tfrequency\tpmw
+1\tノ\tの\t助詞\t5061558\t48383.9
+2\tニ\tに\t助詞\t3576558\t34188.7
+3\tテ\tて\t助詞\t3493117\t33391.0"""
+                st.code(example_rich, language="text")
+
+            st.write("**File size limit:** 300MB")
+
+        return uploaded_file
+
+    @staticmethod
+    def render_data_preview(df: pd.DataFrame, detected_cols: Dict[str, List[str]]):
+        """
+        Render enhanced data preview section with column detection results.
+
+        Args:
+            df: Preview DataFrame
+            detected_cols: Detected column categorization
+        """
+        st.write("**File Preview:**")
+        st.dataframe(
+            df,
+            use_container_width=True,
+            hide_index=True,
+            height=400
+        )
+        st.caption(f"Showing first {len(df)} of total entries")
+
+        # Show detected columns
+        with st.expander("🔍 Detected Columns", expanded=False):
+            col1, col2, col3 = st.columns(3)
+
+            with col1:
+                st.write("**Word Columns:**")
+                for col in detected_cols.get('word_columns', []):
+                    st.write(f"• {col}")
+
+            with col2:
+                st.write("**Frequency Columns:**")
+                for col in detected_cols.get('frequency_columns', []):
+                    st.write(f"• {col}")
+
+            with col3:
+                st.write("**Other Columns:**")
+                for col in detected_cols.get('other_columns', []):
+                    st.write(f"• {col}")
+
+    @staticmethod
+    def render_enhanced_column_configuration(detected_cols: Dict[str, List[str]], df: pd.DataFrame):
+        """
+        Render enhanced column configuration with smart defaults.
+
+        Args:
+            detected_cols: Detected column categorization
+            df: The full DataFrame
+
+        Returns:
+            Dictionary with column configuration or None
+        """
+        st.subheader("⚙️ Column Configuration")
+
+        col1, col2 = st.columns(2)
+
+        with col1:
+            # Word column selection with smart default
+            word_cols = detected_cols.get('word_columns', [])
+            if not word_cols:
+                word_cols = list(df.columns)
+
+            default_word = 0
+            # Prioritize columns named 'word', 'token', 'lemma', etc.
+            for i, col in enumerate(word_cols):
+                if any(term in col.lower() for term in ['word', 'token', 'lemma', 'type']):
+                    default_word = i
+                    break
+
+            word_col = st.selectbox(
+                "Word/Token Column",
+                options=word_cols,
+                index=default_word,
+                help="Select the column containing words or tokens"
+            )
+
+        with col2:
+            # Frequency column selection with smart default
+            freq_cols = detected_cols.get('frequency_columns', [])
332
+ if not freq_cols:
333
+ # Try to identify numeric columns
334
+ freq_cols = [col for col in df.columns if pd.api.types.is_numeric_dtype(df[col])]
335
+ if not freq_cols:
336
+ freq_cols = list(df.columns)
337
+
338
+ default_freq = 0
339
+ # Prioritize columns with 'freq', 'count', etc.
340
+ for i, col in enumerate(freq_cols):
341
+ if any(term in col.lower() for term in ['freq', 'count', 'occurrences']):
342
+ default_freq = i
343
+ break
344
+
345
+ freq_col = st.selectbox(
346
+ "Frequency Column",
347
+ options=freq_cols,
348
+ index=default_freq,
349
+ help="Select the column containing frequency counts"
350
+ )
351
+
352
+ if word_col and freq_col:
353
+ # Validate configuration
354
+ if word_col == freq_col:
355
+ st.error("Word and frequency columns cannot be the same!")
356
+ return None
357
+
358
+ # Show sample data with selected columns
359
+ st.write("**Preview with selected columns:**")
360
+ preview_df = df[[word_col, freq_col]].head(5)
361
+ st.dataframe(preview_df, use_container_width=True, hide_index=True)
362
+
363
+ return {
364
+ 'word_column': word_col,
365
+ 'frequency_column': freq_col
366
+ }
367
+
368
+ return None
369
+
370
+ @staticmethod
371
+ def render_enhanced_visualization_controls(analyzer: FrequencyAnalyzer, column_config: Dict[str, str]):
372
+ """
373
+ Render enhanced visualization controls.
374
+
375
+ Args:
376
+ analyzer: FrequencyAnalyzer instance
377
+ column_config: Column configuration
378
+
379
+ Returns:
380
+ Dictionary with visualization configuration or None
381
+ """
382
+ st.subheader("📊 Visualization Settings")
383
+
384
+ # Get data statistics
385
+ total_words = len(analyzer.df)
386
+ max_freq = analyzer.df[column_config['frequency_column']].max()
387
+ min_freq = analyzer.df[column_config['frequency_column']].min()
388
+
389
+ col1, col2, col3 = st.columns(3)
390
+
391
+ with col1:
392
+ chart_type = st.selectbox(
393
+ "Chart Type",
394
+ ["Bar Chart", "Line Chart", "Area Chart", "Scatter Plot"],
395
+ help="Select visualization type"
396
+ )
397
+
398
+ with col2:
399
+ # Dynamic range based on data
400
+ max_words = min(total_words, 1000)
401
+ default_n = min(50, max_words)
402
+
403
+ top_n = st.slider(
404
+ "Number of Words",
405
+ min_value=10,
406
+ max_value=max_words,
407
+ value=default_n,
408
+ step=10,
409
+ help=f"Display top N words (total: {total_words:,})"
410
+ )
411
+
412
+ with col3:
413
+ scale = st.selectbox(
414
+ "Y-Axis Scale",
415
+ ["Linear", "Logarithmic"],
416
+ help="Logarithmic scale is useful for data with large frequency variations"
417
+ )
418
+
419
+ # Advanced options
420
+ with st.expander("🎨 Advanced Options", expanded=False):
421
+ col1, col2 = st.columns(2)
422
+
423
+ with col1:
424
+ color_scheme = st.selectbox(
425
+ "Color Scheme",
426
+ ["Viridis", "Blues", "Reds", "Turbo", "Rainbow"],
427
+ help="Select color scheme for visualization"
428
+ )
429
+
430
+ show_values = st.checkbox(
431
+ "Show Values on Chart",
432
+ value=False,
433
+ help="Display frequency values on the chart"
434
+ )
435
+
436
+ with col2:
437
+ orientation = st.radio(
438
+ "Orientation",
439
+ ["Vertical", "Horizontal"],
440
+ help="Chart orientation"
441
+ )
442
+
443
+ show_grid = st.checkbox(
444
+ "Show Grid",
445
+ value=True,
446
+ help="Display grid lines"
447
+ )
448
+
449
+ # Summary statistics
450
+ st.write("**Data Statistics:**")
451
+ stat_col1, stat_col2, stat_col3, stat_col4 = st.columns(4)
452
+
453
+ with stat_col1:
454
+ st.metric("Total Words", f"{total_words:,}")
455
+ with stat_col2:
456
+ st.metric("Max Frequency", f"{max_freq:,}")
457
+ with stat_col3:
458
+ st.metric("Min Frequency", f"{min_freq:,}")
459
+ with stat_col4:
460
+ mean_freq = analyzer.df[column_config['frequency_column']].mean()
461
+ st.metric("Mean Frequency", f"{mean_freq:,.1f}")
462
+
463
+ return {
464
+ 'chart_type': chart_type,
465
+ 'top_n': top_n,
466
+ 'scale': scale,
467
+ 'color_scheme': color_scheme.lower(),
468
+ 'show_values': show_values,
469
+ 'orientation': orientation.lower(),
470
+ 'show_grid': show_grid,
471
+ 'word_column': column_config['word_column'],
472
+ 'frequency_column': column_config['frequency_column']
473
+ }
474
+
475
+ @staticmethod
476
+ def render_enhanced_rank_based_analysis(analyzer: FrequencyAnalyzer, viz_config: dict):
477
+ """
478
+ Render enhanced rank-based frequency analysis.
479
+
480
+ Args:
481
+ analyzer: FrequencyAnalyzer instance with loaded data
482
+ viz_config: Visualization configuration
483
+ """
484
+ st.subheader("📈 Frequency Analysis Results")
485
+
486
+ # Get top N words
487
+ top_n = viz_config['top_n']
488
+ word_col = viz_config['word_column']
489
+ freq_col = viz_config['frequency_column']
490
+
491
+ # Sort and get top N
492
+ df_sorted = analyzer.df.sort_values(by=freq_col, ascending=False).head(top_n).copy()
493
+
494
+ # Add rank column
495
+ df_sorted['rank'] = range(1, len(df_sorted) + 1)
496
+
497
+ # Create visualization
498
+ if viz_config['orientation'] == 'horizontal':
499
+ x_col, y_col = freq_col, word_col
500
+ # Reverse order for horizontal bar chart
501
+ df_sorted = df_sorted.iloc[::-1]
502
+ else:
503
+ x_col, y_col = word_col, freq_col
504
+
505
+ # Create figure based on chart type
506
+ if viz_config['chart_type'] == "Bar Chart":
507
+ fig = px.bar(
508
+ df_sorted,
509
+ x=x_col,
510
+ y=y_col,
511
+ color=freq_col,
512
+ color_continuous_scale=viz_config['color_scheme'],
513
+ title=f"Top {top_n} Most Frequent Words",
514
+ labels={freq_col: "Frequency", word_col: "Words"},
515
+ orientation='h' if viz_config['orientation'] == 'horizontal' else 'v'
516
+ )
517
+
518
+ elif viz_config['chart_type'] == "Line Chart":
519
+ fig = px.line(
520
+ df_sorted,
521
+ x=word_col,
522
+ y=freq_col,
523
+ markers=True,
524
+ title=f"Top {top_n} Most Frequent Words",
525
+ labels={freq_col: "Frequency", word_col: "Words"}
526
+ )
527
+ fig.update_traces(line_color=px.colors.qualitative.Plotly[0], line_width=3)
528
+
529
+ elif viz_config['chart_type'] == "Area Chart":
530
+ fig = px.area(
531
+ df_sorted,
532
+ x=word_col,
533
+ y=freq_col,
534
+ title=f"Top {top_n} Most Frequent Words",
535
+ labels={freq_col: "Frequency", word_col: "Words"}
536
+ )
537
+
538
+ else: # Scatter Plot
539
+ fig = px.scatter(
540
+ df_sorted,
541
+ x='rank',
542
+ y=freq_col,
543
+ text=word_col,
544
+ size=freq_col,
545
+ color=freq_col,
546
+ color_continuous_scale=viz_config['color_scheme'],
547
+ title=f"Rank-Frequency Distribution (Top {top_n})",
548
+ labels={freq_col: "Frequency", 'rank': "Rank"}
549
+ )
550
+ fig.update_traces(textposition='top center')
551
+
552
+ # Apply logarithmic scale if selected
553
+ if viz_config['scale'] == "Logarithmic":
554
+ if viz_config['orientation'] == 'horizontal':
555
+ fig.update_xaxes(type="log")
556
+ else:
557
+ fig.update_yaxes(type="log")
558
+
559
+ # Show values on chart if selected
560
+ if viz_config['show_values'] and viz_config['chart_type'] == "Bar Chart":
561
+ fig.update_traces(texttemplate='%{value:,.0f}', textposition='outside')
562
+
563
+ # Update layout
564
+ fig.update_layout(
565
+ showlegend=False,
566
+ height=600,
567
+ xaxis_tickangle=-45 if viz_config['orientation'] == 'vertical' else 0,
568
+ plot_bgcolor='white' if viz_config['show_grid'] else 'rgba(0,0,0,0)',
569
+ xaxis_showgrid=viz_config['show_grid'],
570
+ yaxis_showgrid=viz_config['show_grid']
571
+ )
572
+
573
+ # Display chart
574
+ st.plotly_chart(fig, use_container_width=True)
575
+
576
+ # Additional analyses
577
+ tab1, tab2, tab3 = st.tabs(["📊 Statistics", "📋 Data Table", "📈 Distribution Analysis"])
578
+
579
+ with tab1:
580
+ FrequencyHandlers.render_statistics_summary(df_sorted, freq_col, word_col)
581
+
582
+ with tab2:
583
+ FrequencyHandlers.render_data_table(df_sorted, word_col, freq_col)
584
+
585
+ with tab3:
586
+ FrequencyHandlers.render_distribution_analysis(analyzer, freq_col, viz_config)
587
+
588
+ @staticmethod
589
+ def render_statistics_summary(df: pd.DataFrame, freq_col: str, word_col: str):
590
+ """Render statistical summary of the frequency data."""
591
+ col1, col2, col3 = st.columns(3)
592
+
593
+ with col1:
594
+ st.write("**Frequency Statistics:**")
595
+ st.write(f"• Total frequency: {df[freq_col].sum():,}")
596
+ st.write(f"• Mean frequency: {df[freq_col].mean():,.1f}")
597
+ st.write(f"• Median frequency: {df[freq_col].median():,.1f}")
598
+
599
+ with col2:
600
+ st.write("**Coverage Analysis:**")
601
+ total_freq = df[freq_col].sum()
602
+ cumsum = df[freq_col].cumsum()
603
+ coverage_50 = len(cumsum[cumsum <= total_freq * 0.5])
604
+ coverage_80 = len(cumsum[cumsum <= total_freq * 0.8])
605
+ st.write(f"• Words for 50% coverage: {coverage_50}")
606
+ st.write(f"• Words for 80% coverage: {coverage_80}")
607
+ st.write(f"• Top 10 words: {(df[freq_col].head(10).sum() / total_freq * 100):.1f}%")
608
+
609
+ with col3:
610
+ st.write("**Diversity Metrics:**")
611
+ st.write(f"• Unique words shown: {len(df)}")
612
+ st.write(f"• Hapax legomena: {len(df[df[freq_col] == 1])}")
613
+ st.write(f"• Type-token ratio: {len(df) / df[freq_col].sum():.4f}")
614
+
615
+ @staticmethod
616
+ def render_data_table(df: pd.DataFrame, word_col: str, freq_col: str):
617
+ """Render interactive data table."""
618
+ # Add percentage column
619
+ df_display = df.copy()
620
+ df_display['percentage'] = (df_display[freq_col] / df_display[freq_col].sum() * 100).round(2)
621
+ df_display['cumulative_%'] = (df_display[freq_col].cumsum() / df_display[freq_col].sum() * 100).round(2)
622
+
623
+ # Display options
624
+ col1, col2 = st.columns([1, 3])
625
+ with col1:
626
+ show_cols = st.multiselect(
627
+ "Columns to show:",
628
+ options=df_display.columns.tolist(),
629
+ default=['rank', word_col, freq_col, 'percentage', 'cumulative_%']
630
+ )
631
+
632
+ # Display table
633
+ st.dataframe(
634
+ df_display[show_cols],
635
+ use_container_width=True,
636
+ hide_index=True,
637
+ height=400
638
+ )
639
+
640
+ # Download button
641
+ csv = df_display[show_cols].to_csv(index=False)
642
+ st.download_button(
643
+ label="📥 Download as CSV",
644
+ data=csv,
645
+ file_name=f"frequency_analysis_top_{len(df)}.csv",
646
+ mime="text/csv"
647
+ )
648
+
649
+ @staticmethod
650
+ def render_distribution_analysis(analyzer: FrequencyAnalyzer, freq_col: str, viz_config: dict):
651
+ """Render frequency distribution analysis."""
652
+ # Zipf's law analysis
653
+ st.write("**Zipf's Law Analysis:**")
654
+
655
+ df_full = analyzer.df.sort_values(by=freq_col, ascending=False).copy()
656
+ df_full['rank'] = range(1, len(df_full) + 1)
657
+ df_full['log_rank'] = np.log10(df_full['rank'])
658
+ df_full['log_freq'] = np.log10(df_full[freq_col])
659
+
660
+ # Create Zipf plot
661
+ fig_zipf = px.scatter(
662
+ df_full.head(min(1000, len(df_full))),
663
+ x='log_rank',
664
+ y='log_freq',
665
+ title="Zipf's Law Distribution (Log-Log Plot)",
666
+ labels={'log_rank': 'log₁₀(Rank)', 'log_freq': 'log₁₀(Frequency)'},
667
+ trendline="ols"
668
+ )
669
+
670
+ fig_zipf.update_layout(height=400)
671
+ st.plotly_chart(fig_zipf, use_container_width=True)
672
+
673
+ # Frequency bands analysis
674
+ st.write("**Frequency Bands:**")
675
+ bands = pd.cut(df_full[freq_col],
676
+ bins=[0, 1, 10, 100, 1000, 10000, float('inf')],
677
+ labels=['1', '2-10', '11-100', '101-1000', '1001-10000', '10000+'])
678
+ band_counts = bands.value_counts().sort_index()
679
+
680
+ col1, col2 = st.columns(2)
681
+ with col1:
682
+ st.write("Words per frequency band:")
683
+ for band, count in band_counts.items():
684
+ st.write(f"• {band}: {count:,} words")
685
+
686
+ with col2:
687
+ # Pie chart of frequency bands
688
+ fig_pie = px.pie(
689
+ values=band_counts.values,
690
+ names=band_counts.index,
691
+ title="Distribution of Words by Frequency Band"
692
+ )
693
+ fig_pie.update_layout(height=300)
694
+ st.plotly_chart(fig_pie, use_container_width=True)
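Note: the coverage numbers in `render_statistics_summary` come from a cumulative-sum comparison that is easy to check outside Streamlit. A minimal sketch with a toy frequency table (this `df` is made up for illustration, not data from the commit):

```python
import pandas as pd

# Toy frequency list, already sorted descending as the handler expects
df = pd.DataFrame({"word": ["the", "of", "and", "to"],
                   "freq": [50, 30, 15, 5]})

total_freq = df["freq"].sum()   # 100
cumsum = df["freq"].cumsum()    # 50, 80, 95, 100

# Number of top-ranked words whose cumulative frequency stays within X% of the total
coverage_50 = len(cumsum[cumsum <= total_freq * 0.5])
coverage_80 = len(cumsum[cumsum <= total_freq * 0.8])
print(coverage_50, coverage_80)  # 1 word reaches 50% coverage, 2 words reach 80%
```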
web_app/utils/__init__.py CHANGED
@@ -1,5 +1,6 @@
  """Web app utilities package."""
 
  from .file_upload_handler import FileUploadHandler
+ from .memory_file_handler import MemoryFileHandler
 
- __all__ = ['FileUploadHandler']
+ __all__ = ['FileUploadHandler', 'MemoryFileHandler']
web_app/utils/memory_file_handler.py ADDED
@@ -0,0 +1,170 @@
+ """
+ Memory-based File Handler for Hugging Face Spaces Compatibility
+ 
+ This module provides an alternative to disk-based file handling by keeping
+ files in memory, avoiding 403 errors from filesystem restrictions.
+ """
+ 
+ import streamlit as st
+ from io import BytesIO, StringIO
+ from typing import Optional, Union, Dict, Any
+ import pandas as pd
+ import zipfile
+ 
+ 
+ class MemoryFileHandler:
+     """Handle files entirely in memory to avoid filesystem restrictions."""
+ 
+     @staticmethod
+     def process_uploaded_file(uploaded_file, as_text: bool = False, encoding: str = 'utf-8') -> Optional[Union[bytes, str]]:
+         """
+         Process uploaded file directly from Streamlit's UploadedFile object.
+ 
+         Args:
+             uploaded_file: Streamlit UploadedFile object
+             as_text: Whether to return content as decoded text
+             encoding: Text encoding to use if as_text is True
+ 
+         Returns:
+             File content as bytes or string, or None if error
+         """
+         try:
+             # Reset file pointer to beginning
+             uploaded_file.seek(0)
+ 
+             # Read content directly from uploaded file
+             if as_text:
+                 # For text mode, decode the bytes
+                 content = uploaded_file.read()
+                 if isinstance(content, bytes):
+                     return content.decode(encoding)
+                 return content
+             else:
+                 # For binary mode, return raw bytes
+                 return uploaded_file.read()
+ 
+         except Exception as e:
+             st.error(f"Failed to read file: {str(e)}")
+             return None
+ 
+     @staticmethod
+     def process_csv_tsv_file(uploaded_file, delimiter: Optional[str] = None) -> Optional[pd.DataFrame]:
+         """
+         Process CSV/TSV file directly into pandas DataFrame.
+ 
+         Args:
+             uploaded_file: Streamlit UploadedFile object
+             delimiter: Column delimiter (auto-detected if None)
+ 
+         Returns:
+             DataFrame or None if error
+         """
+         try:
+             # Reset file pointer
+             uploaded_file.seek(0)
+ 
+             # Auto-detect delimiter if not provided
+             if delimiter is None:
+                 # Read first few lines to detect delimiter
+                 uploaded_file.seek(0)
+                 sample = uploaded_file.read(1024).decode('utf-8', errors='ignore')
+                 uploaded_file.seek(0)
+ 
+                 if '\t' in sample:
+                     delimiter = '\t'
+                 else:
+                     delimiter = ','
+ 
+             # Read directly into DataFrame
+             df = pd.read_csv(uploaded_file, delimiter=delimiter, encoding='utf-8')
+             return df
+ 
+         except Exception as e:
+             st.error(f"Failed to process CSV/TSV file: {str(e)}")
+             return None
+ 
+     @staticmethod
+     def handle_zip_file(uploaded_file) -> Optional[Dict[str, bytes]]:
+         """
+         Handle ZIP file uploads by extracting contents to memory.
+ 
+         Args:
+             uploaded_file: Streamlit UploadedFile object (should be a ZIP file)
+ 
+         Returns:
+             Dictionary mapping filenames to file contents, or None if error
+         """
+         try:
+             # Reset file pointer
+             uploaded_file.seek(0)
+ 
+             # Read ZIP file into memory
+             zip_bytes = BytesIO(uploaded_file.read())
+ 
+             # Extract files to memory
+             file_contents = {}
+             with zipfile.ZipFile(zip_bytes, 'r') as zip_file:
+                 for filename in zip_file.namelist():
+                     if not filename.endswith('/'):  # Skip directories
+                         file_contents[filename] = zip_file.read(filename)
+ 
+             return file_contents
+ 
+         except Exception as e:
+             st.error(f"Failed to process ZIP file: {str(e)}")
+             return None
+ 
+     @staticmethod
+     def create_download_content(content: Union[str, bytes], filename: str) -> bytes:
+         """
+         Prepare content for download.
+ 
+         Args:
+             content: Content to download (string or bytes)
+             filename: Suggested filename for download
+ 
+         Returns:
+             Bytes ready for download
+         """
+         if isinstance(content, str):
+             return content.encode('utf-8')
+         return content
+ 
+     @staticmethod
+     def store_in_session(key: str, content: Any):
+         """
+         Store content in session state for persistence across reruns.
+ 
+         Args:
+             key: Session state key
+             content: Content to store
+         """
+         st.session_state[key] = content
+ 
+     @staticmethod
+     def retrieve_from_session(key: str) -> Optional[Any]:
+         """
+         Retrieve content from session state.
+ 
+         Args:
+             key: Session state key
+ 
+         Returns:
+             Stored content or None
+         """
+         return st.session_state.get(key, None)
+ 
+     @staticmethod
+     def clear_session_storage(prefix: str = ""):
+         """
+         Clear session storage.
+ 
+         Args:
+             prefix: Only clear keys starting with this prefix
+         """
+         if prefix:
+             keys_to_remove = [k for k in st.session_state.keys() if k.startswith(prefix)]
+             for key in keys_to_remove:
+                 del st.session_state[key]
+         else:
+             st.session_state.clear()
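Note: because `MemoryFileHandler` only needs a file-like object with `read`/`seek`, the delimiter auto-detection in `process_csv_tsv_file` can be exercised with a plain `BytesIO` standing in for Streamlit's `UploadedFile`. A minimal sketch (`read_table_in_memory` is a hypothetical stand-in that mirrors the method's logic without the `st.error` reporting):

```python
from io import BytesIO
import pandas as pd

def read_table_in_memory(data: bytes) -> pd.DataFrame:
    """Sniff the delimiter from the first 1KB, then parse entirely in memory."""
    buf = BytesIO(data)
    sample = buf.read(1024).decode("utf-8", errors="ignore")
    buf.seek(0)  # rewind so pandas sees the whole stream
    delimiter = "\t" if "\t" in sample else ","
    return pd.read_csv(buf, delimiter=delimiter, encoding="utf-8")

df = read_table_in_memory(b"Type\tFreq\tRank\nthe\t69868\t1\nof\t36426\t2\n")
print(df.shape)          # (2, 3)
print(list(df.columns))  # ['Type', 'Freq', 'Rank']
```

Nothing here touches the filesystem, which is the whole point of the handler: the same bytes could come from an upload widget, a ZIP member, or session state.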