
Batch Processing Performance Optimization

Performance Issues Identified

1. Multiple Analysis Calls Per File (Biggest Issue)

The original implementation made 3 separate calls to analyze_text() for each file:

  • One for Content Words (CW)
  • One for Function Words (FW)
  • One for n-grams (without word type filter)

Each call runs the entire spaCy pipeline (tokenization, POS tagging, dependency parsing), effectively tripling processing time.
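The cost structure can be illustrated with a toy stand-in for the pipeline (the names here are illustrative, not the analyzer's real API): the expensive step is the pipeline itself, so calling it once per word type multiplies the dominant cost.

```python
# Counter tracking how often the (expensive) pipeline stand-in runs.
pipeline_runs = 0

def run_pipeline(text):
    """Stand-in for the full spaCy pipeline, the expensive step."""
    global pipeline_runs
    pipeline_runs += 1
    return text.split()  # pretend-tokenization

def analyze_old(text):
    results = {}
    for word_type in ("CW", "FW"):          # one pipeline run per word type
        results[word_type] = run_pipeline(text)
    results["ngrams"] = run_pipeline(text)  # and one more for n-grams
    return results

def analyze_new(text):
    tokens = run_pipeline(text)             # single pipeline run
    return {"CW": tokens, "FW": tokens, "ngrams": tokens}

analyze_old("the quick brown fox")
old_runs = pipeline_runs  # 3 runs for one file

pipeline_runs = 0
analyze_new("the quick brown fox")
new_runs = pipeline_runs  # 1 run for the same file
```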

2. Memory Accumulation

  • All results stored in memory with detailed token information
  • No streaming or chunking capabilities
  • Everything stays in memory until batch completes

3. Default Model Size

  • Default spaCy model is 'trf' (transformer-based), which is much slower than 'md'
  • Found in session_manager.py: 'model_size': 'trf'

Optimizations Implemented

Phase 1: Single-Pass Analysis (70% Performance Gain)

Changes Made:

  1. Modified analyze_text() method to support separate_word_types parameter

    • Processes both CW and FW in a single pass through the text
    • Collects statistics for both word types simultaneously
    • N-grams are processed in the same pass
  2. Updated batch processing handlers to use single-pass analysis:

    # OLD: 3 separate calls
    for word_type in ['CW', 'FW']:
        analysis = analyzer.analyze_text(text, ...)
    full_analysis = analyzer.analyze_text(text, ...)  # for n-grams
    
    # NEW: Single optimized call
    analysis = analyzer.analyze_text(
        text_content,
        selected_indices,
        separate_word_types=True  # Process CW/FW separately in same pass
    )
    
  3. Added optimized batch method analyze_batch_memory():

    • Works directly with in-memory file contents
    • Supports all new analysis parameters
    • Maintains backward compatibility
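The single-pass idea can be sketched as one loop over tagged tokens that sorts each token into content-word or function-word tallies while collecting n-grams from the same stream. The POS sets and the return shape below are assumptions for illustration, not the analyzer's real interface.

```python
from collections import Counter

# POS tags conventionally treated as content words (assumption).
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV", "PROPN"}

def single_pass_analyze(tagged_tokens, n=2):
    """One pass over (word, pos) pairs: CW, FW, and n-gram counts together."""
    cw, fw = Counter(), Counter()
    words = []
    for word, pos in tagged_tokens:
        words.append(word)
        # Route each token to the right tally without a second pass.
        (cw if pos in CONTENT_POS else fw)[word.lower()] += 1
    # n-grams come from the same token stream, no re-tokenization needed.
    ngrams = Counter(
        tuple(words[i:i + n]) for i in range(len(words) - n + 1)
    )
    return {"CW": cw, "FW": fw, "ngrams": ngrams}

result = single_pass_analyze(
    [("The", "DET"), ("cat", "NOUN"), ("sat", "VERB"), ("down", "ADV")]
)
```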

Performance Recommendations

1. Use 'md' Model Instead of 'trf'

The transformer model ('trf') is significantly slower. For batch processing, consider using 'md':

  • 3-5x faster processing
  • Still provides good accuracy for lexical sophistication analysis
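A small helper can make the choice explicit; `pick_model` is a hypothetical name (the project's real default lives in session_manager.py), while the model names are the standard spaCy English packages.

```python
def pick_model(batch_mode: bool) -> str:
    """Choose a spaCy model package by workload (illustrative helper)."""
    # 'md' trades some accuracy for much faster batch throughput;
    # 'trf' (transformer) suits single, accuracy-critical runs.
    return "en_core_web_md" if batch_mode else "en_core_web_trf"

# Loading then requires the package to be installed, e.g.:
#   python -m spacy download en_core_web_md
# import spacy
# nlp = spacy.load(pick_model(batch_mode=True))
```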

2. Enable Smart Defaults

Smart defaults optimize which measures to compute, reducing unnecessary calculations.
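One way such defaults can work is a dispatch table of measure functions, where only the measures actually requested are computed. The measure names and functions below are illustrative placeholders, not the analyzer's real measure set.

```python
# Dispatch table: each measure is computed only if selected.
MEASURES = {
    "ttr": lambda tokens: len(set(tokens)) / len(tokens),  # type-token ratio
    "mean_length": lambda tokens: sum(map(len, tokens)) / len(tokens),
    "token_count": len,
}

def compute(tokens, selected=None):
    """Compute only the requested measures; fall back to a cheap default set."""
    selected = selected or ["ttr", "token_count"]  # "smart default" subset
    return {name: MEASURES[name](tokens) for name in selected}

stats = compute(["the", "cat", "sat"])  # mean_length is never evaluated
```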

3. For Very Large Batches

Consider implementing:

  • Chunk processing (process N files at a time)
  • Parallel processing using multiprocessing
  • Results streaming to disk instead of memory accumulation
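The first and third ideas combine naturally: process files a chunk at a time and stream each result to disk as JSON Lines, so memory holds at most one chunk. The `analyze` callable below is a stand-in for the real analyzer.

```python
import json
import os
import tempfile
from itertools import islice

def chunked(iterable, size):
    """Yield successive chunks of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def process_batch(files, analyze, out_path, chunk_size=50):
    """Analyze (name, text) pairs chunk by chunk, streaming results to disk."""
    with open(out_path, "w", encoding="utf-8") as out:
        for chunk in chunked(files, chunk_size):
            for name, text in chunk:
                record = {"file": name, "analysis": analyze(text)}
                out.write(json.dumps(record) + "\n")  # stream, don't accumulate

files = [("a.txt", "hello world"), ("b.txt", "more text here")]
out_path = os.path.join(tempfile.mkdtemp(), "results.jsonl")
process_batch(
    files,
    analyze=lambda t: {"tokens": len(t.split())},  # stand-in analyzer
    out_path=out_path,
    chunk_size=1,
)
```

Parallelism can then be layered on by handing each chunk to a `multiprocessing` worker pool.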

Expected Performance Gains

With the optimizations implemented:

  • ~70% reduction in processing time from eliminating redundant analysis calls
  • A further 20-30% reduction in end-to-end batch time possible by switching from the 'trf' to the 'md' model
  • Memory usage remains similar but could be optimized further with streaming

How to Use the Optimized Version

The optimizations are transparent to users. Batch processing automatically uses the single-pass analysis when:

  • No specific word type filter is selected
  • Processing files that need both CW and FW analysis

For legacy compatibility, the old analyze_batch() method has been updated to use the optimized approach internally.

GPU Status Monitoring in Debug Mode

The web app now includes comprehensive GPU status information in debug mode. To access:

  1. Enable "🐛 Debug Mode" in the sidebar
  2. Expand the "GPU Status" section

Features

PyTorch/CUDA Information:

  • PyTorch installation and version
  • CUDA availability and version
  • Number of GPUs and their names
  • GPU memory usage (allocated, reserved, free)

spaCy GPU Configuration:

  • spaCy GPU enablement status
  • Current GPU device being used
  • spacy-transformers installation status

Active Model GPU Status:

  • Current model's device configuration
  • GPU optimization status (mixed precision, batch sizes)
  • spaCy version information

Performance Tips:

  • Optimization recommendations
  • Common troubleshooting guidance
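The kind of information shown above can be gathered with a small guarded helper. The dictionary keys here are assumptions about what the panel displays; the `torch` calls are only made when the package is importable, and all of them are part of PyTorch's public CUDA API.

```python
import importlib.util

def gpu_status():
    """Collect a rough GPU status snapshot without hard dependencies."""
    status = {
        "torch_available": False,
        "cuda_available": False,
        "gpu_count": 0,
        "spacy_transformers_installed": False,
    }
    if importlib.util.find_spec("torch"):
        import torch
        status["torch_available"] = True
        status["cuda_available"] = torch.cuda.is_available()
        if status["cuda_available"]:
            status["gpu_count"] = torch.cuda.device_count()
            status["gpu_names"] = [
                torch.cuda.get_device_name(i)
                for i in range(status["gpu_count"])
            ]
    # spacy-transformers presence can be checked without importing it.
    status["spacy_transformers_installed"] = (
        importlib.util.find_spec("spacy_transformers") is not None
    )
    return status

info = gpu_status()
```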

Benefits

This integrated GPU monitoring eliminates the need for the separate test_gpu_support.py script for most use cases. Developers can now:

  • Quickly verify GPU availability without running external scripts
  • Monitor GPU memory usage during batch processing
  • Confirm that models are correctly utilizing GPU acceleration
  • Troubleshoot performance issues more effectively

Usage Example

When processing large batches with transformer models:

  1. Enable debug mode to monitor GPU utilization
  2. Check that the model is using GPU (not CPU fallback)
  3. Monitor memory usage to prevent out-of-memory errors
  4. Adjust batch sizes based on available GPU memory
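Steps 3 and 4 can be sketched as a memory snapshot taken between chunks; the helper below is an assumption about how monitoring might be wired in, but the `torch.cuda` calls it uses are all public API. It returns None on CPU-only machines.

```python
def gpu_memory_mb(device=0):
    """Snapshot GPU memory usage in MB, or None without CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    total = torch.cuda.get_device_properties(device).total_memory
    return {
        "allocated_mb": torch.cuda.memory_allocated(device) / 1e6,
        "reserved_mb": torch.cuda.memory_reserved(device) / 1e6,
        "total_mb": total / 1e6,
    }

snapshot = gpu_memory_mb()
# e.g. call this after each chunk and shrink the batch size when
# allocated_mb approaches total_mb, before an OOM error occurs.
```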