
Batch Processing Performance Optimization

Performance Issues Identified

1. Multiple Analysis Calls Per File (Biggest Issue)

The original implementation made 3 separate calls to analyze_text() for each file:

  • One for Content Words (CW)
  • One for Function Words (FW)
  • One for n-grams (without word type filter)

Each call runs the entire spaCy pipeline (tokenization, POS tagging, dependency parsing), effectively tripling processing time.
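The cost structure can be illustrated with a toy stand-in for the pipeline (the names here are illustrative, not the analyzer's real API): the expensive step is the pipeline itself, so calling it once per word type multiplies the dominant cost.

```python
# Counter tracking how often the (expensive) pipeline stand-in runs.
pipeline_runs = 0

def run_pipeline(text):
    """Stand-in for the full spaCy pipeline, the expensive step."""
    global pipeline_runs
    pipeline_runs += 1
    return text.split()  # pretend-tokenization

def analyze_old(text):
    results = {}
    for word_type in ("CW", "FW"):          # one pipeline run per word type
        results[word_type] = run_pipeline(text)
    results["ngrams"] = run_pipeline(text)  # and one more for n-grams
    return results

def analyze_new(text):
    tokens = run_pipeline(text)             # single pipeline run
    return {"CW": tokens, "FW": tokens, "ngrams": tokens}

analyze_old("the quick brown fox")
old_runs = pipeline_runs  # 3 runs for one file

pipeline_runs = 0
analyze_new("the quick brown fox")
new_runs = pipeline_runs  # 1 run for the same file
```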

2. Memory Accumulation

  • All results stored in memory with detailed token information
  • No streaming or chunking capabilities
  • Everything stays in memory until batch completes

3. Default Model Size

  • Default spaCy model is 'trf' (transformer-based), which is much slower than 'md'
  • Found in session_manager.py: 'model_size': 'trf'

Optimizations Implemented

Phase 1: Single-Pass Analysis (70% Performance Gain)

Changes Made:

  1. Modified analyze_text() method to support separate_word_types parameter

    • Processes both CW and FW in a single pass through the text
    • Collects statistics for both word types simultaneously
    • N-grams are processed in the same pass
  2. Updated batch processing handlers to use single-pass analysis:

    # OLD: 3 separate calls
    for word_type in ['CW', 'FW']:
        analysis = analyzer.analyze_text(text, ...)
    full_analysis = analyzer.analyze_text(text, ...)  # for n-grams
    
    # NEW: Single optimized call
    analysis = analyzer.analyze_text(
        text_content,
        selected_indices,
        separate_word_types=True  # Process CW/FW separately in same pass
    )
    
  3. Added optimized batch method analyze_batch_memory():

    • Works directly with in-memory file contents
    • Supports all new analysis parameters
    • Maintains backward compatibility
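The single-pass idea can be sketched as one loop over tagged tokens that sorts each token into content-word or function-word tallies while collecting n-grams from the same stream. The POS sets and the return shape below are assumptions for illustration, not the analyzer's real interface.

```python
from collections import Counter

# POS tags conventionally treated as content words (assumption).
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV", "PROPN"}

def single_pass_analyze(tagged_tokens, n=2):
    """One pass over (word, pos) pairs: CW, FW, and n-gram counts together."""
    cw, fw = Counter(), Counter()
    words = []
    for word, pos in tagged_tokens:
        words.append(word)
        # Route each token to the right tally without a second pass.
        (cw if pos in CONTENT_POS else fw)[word.lower()] += 1
    # n-grams come from the same token stream, no re-tokenization needed.
    ngrams = Counter(
        tuple(words[i:i + n]) for i in range(len(words) - n + 1)
    )
    return {"CW": cw, "FW": fw, "ngrams": ngrams}

result = single_pass_analyze(
    [("The", "DET"), ("cat", "NOUN"), ("sat", "VERB"), ("down", "ADV")]
)
```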

Performance Recommendations

1. Use 'md' Model Instead of 'trf'

The transformer model ('trf') is significantly slower. For batch processing, consider using 'md':

  • 3-5x faster processing
  • Still provides good accuracy for lexical sophistication analysis
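A small helper can make the choice explicit; `pick_model` is a hypothetical name (the project's real default lives in session_manager.py), while the model names are the standard spaCy English packages.

```python
def pick_model(batch_mode: bool) -> str:
    """Choose a spaCy model package by workload (illustrative helper)."""
    # 'md' trades some accuracy for much faster batch throughput;
    # 'trf' (transformer) suits single, accuracy-critical runs.
    return "en_core_web_md" if batch_mode else "en_core_web_trf"

# Loading then requires the package to be installed, e.g.:
#   python -m spacy download en_core_web_md
# import spacy
# nlp = spacy.load(pick_model(batch_mode=True))
```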

2. Enable Smart Defaults

Smart defaults optimize which measures to compute, reducing unnecessary calculations.
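One way such defaults can work is a dispatch table of measure functions, where only the measures actually requested are computed. The measure names and functions below are illustrative placeholders, not the analyzer's real measure set.

```python
# Dispatch table: each measure is computed only if selected.
MEASURES = {
    "ttr": lambda tokens: len(set(tokens)) / len(tokens),  # type-token ratio
    "mean_length": lambda tokens: sum(map(len, tokens)) / len(tokens),
    "token_count": len,
}

def compute(tokens, selected=None):
    """Compute only the requested measures; fall back to a cheap default set."""
    selected = selected or ["ttr", "token_count"]  # "smart default" subset
    return {name: MEASURES[name](tokens) for name in selected}

stats = compute(["the", "cat", "sat"])  # mean_length is never evaluated
```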

3. For Very Large Batches

Consider implementing:

  • Chunk processing (process N files at a time)
  • Parallel processing using multiprocessing
  • Results streaming to disk instead of memory accumulation
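The first and third ideas combine naturally: process files a chunk at a time and stream each result to disk as JSON Lines, so memory holds at most one chunk. The `analyze` callable below is a stand-in for the real analyzer.

```python
import json
import os
import tempfile
from itertools import islice

def chunked(iterable, size):
    """Yield successive chunks of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def process_batch(files, analyze, out_path, chunk_size=50):
    """Analyze (name, text) pairs chunk by chunk, streaming results to disk."""
    with open(out_path, "w", encoding="utf-8") as out:
        for chunk in chunked(files, chunk_size):
            for name, text in chunk:
                record = {"file": name, "analysis": analyze(text)}
                out.write(json.dumps(record) + "\n")  # stream, don't accumulate

files = [("a.txt", "hello world"), ("b.txt", "more text here")]
out_path = os.path.join(tempfile.mkdtemp(), "results.jsonl")
process_batch(
    files,
    analyze=lambda t: {"tokens": len(t.split())},  # stand-in analyzer
    out_path=out_path,
    chunk_size=1,
)
```

Parallelism can then be layered on by handing each chunk to a `multiprocessing` worker pool.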

Expected Performance Gains

With the optimizations implemented:

  • ~70% reduction in processing time from eliminating redundant analysis calls
  • A further 20-30% reduction in end-to-end batch time possible by switching from the 'trf' to the 'md' model
  • Memory usage remains similar but could be optimized further with streaming

How to Use the Optimized Version

The optimizations are transparent to users. Batch processing automatically uses the single-pass analysis when:

  • No specific word type filter is selected
  • Processing files that need both CW and FW analysis

For legacy compatibility, the old analyze_batch() method has been updated to use the optimized approach internally.

GPU Status Monitoring in Debug Mode

The web app now includes comprehensive GPU status information in debug mode. To access:

  1. Enable "🐛 Debug Mode" in the sidebar
  2. Expand the "GPU Status" section

Features

PyTorch/CUDA Information:

  • PyTorch installation and version
  • CUDA availability and version
  • Number of GPUs and their names
  • GPU memory usage (allocated, reserved, free)

spaCy GPU Configuration:

  • spaCy GPU enablement status
  • Current GPU device being used
  • spacy-transformers installation status

Active Model GPU Status:

  • Current model's device configuration
  • GPU optimization status (mixed precision, batch sizes)
  • spaCy version information

Performance Tips:

  • Optimization recommendations
  • Common troubleshooting guidance
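The kind of information shown above can be gathered with a small guarded helper. The dictionary keys here are assumptions about what the panel displays; the `torch` calls are only made when the package is importable, and all of them are part of PyTorch's public CUDA API.

```python
import importlib.util

def gpu_status():
    """Collect a rough GPU status snapshot without hard dependencies."""
    status = {
        "torch_available": False,
        "cuda_available": False,
        "gpu_count": 0,
        "spacy_transformers_installed": False,
    }
    if importlib.util.find_spec("torch"):
        import torch
        status["torch_available"] = True
        status["cuda_available"] = torch.cuda.is_available()
        if status["cuda_available"]:
            status["gpu_count"] = torch.cuda.device_count()
            status["gpu_names"] = [
                torch.cuda.get_device_name(i)
                for i in range(status["gpu_count"])
            ]
    # spacy-transformers presence can be checked without importing it.
    status["spacy_transformers_installed"] = (
        importlib.util.find_spec("spacy_transformers") is not None
    )
    return status

info = gpu_status()
```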

Benefits

This integrated GPU monitoring eliminates the need for the separate test_gpu_support.py script for most use cases. Developers can now:

  • Quickly verify GPU availability without running external scripts
  • Monitor GPU memory usage during batch processing
  • Confirm that models are correctly utilizing GPU acceleration
  • Troubleshoot performance issues more effectively

Usage Example

When processing large batches with transformer models:

  1. Enable debug mode to monitor GPU utilization
  2. Check that the model is using GPU (not CPU fallback)
  3. Monitor memory usage to prevent out-of-memory errors
  4. Adjust batch sizes based on available GPU memory
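Steps 3 and 4 can be sketched as a memory snapshot taken between chunks; the helper below is an assumption about how monitoring might be wired in, but the `torch.cuda` calls it uses are all public API. It returns None on CPU-only machines.

```python
def gpu_memory_mb(device=0):
    """Snapshot GPU memory usage in MB, or None without CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    total = torch.cuda.get_device_properties(device).total_memory
    return {
        "allocated_mb": torch.cuda.memory_allocated(device) / 1e6,
        "reserved_mb": torch.cuda.memory_reserved(device) / 1e6,
        "total_mb": total / 1e6,
    }

snapshot = gpu_memory_mb()
# e.g. call this after each chunk and shrink the batch size when
# allocated_mb approaches total_mb, before an OOM error occurs.
```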