# Batch Processing Performance Optimization

## Performance Issues Identified

### 1. **Multiple Analysis Calls Per File (Biggest Issue)**

The original implementation made 3 separate calls to `analyze_text()` for each file:

- One for Content Words (CW)
- One for Function Words (FW)
- One for n-grams (without a word type filter)

Each call runs the entire SpaCy pipeline (tokenization, POS tagging, dependency parsing), essentially **tripling** the processing time.

### 2. **Memory Accumulation**

- All results stored in memory with detailed token information
- No streaming or chunking capabilities
- Everything stays in memory until the batch completes

### 3. **Default Model Size**

- The default SpaCy model is 'trf' (transformer-based), which is much slower than 'md'
- Found in `session_manager.py`: `'model_size': 'trf'`

## Optimizations Implemented

### Phase 1: Single-Pass Analysis (70% Performance Gain)

**Changes Made:**

1. **Modified the `analyze_text()` method** to support a `separate_word_types` parameter:
   - Processes both CW and FW in a single pass through the text
   - Collects statistics for both word types simultaneously
   - N-grams are processed in the same pass

2. **Updated the batch processing handlers** to use single-pass analysis:

   ```python
   # OLD: 3 separate calls
   for word_type in ['CW', 'FW']:
       analysis = analyzer.analyze_text(text, ...)
   full_analysis = analyzer.analyze_text(text, ...)  # for n-grams

   # NEW: single optimized call
   analysis = analyzer.analyze_text(
       text_content,
       selected_indices,
       separate_word_types=True  # process CW/FW separately in the same pass
   )
   ```

3. **Added an optimized batch method**, `analyze_batch_memory()`:
   - Works directly with in-memory file contents
   - Supports all new analysis parameters
   - Maintains backward compatibility

## Performance Recommendations

### 1. **Use the 'md' Model Instead of 'trf'**

The transformer model ('trf') is significantly slower.
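Model size is chosen when the pipeline is loaded, so the switch is a one-line change. A minimal sketch, assuming the standard English spaCy pipeline names (`SPACY_MODELS` and `load_pipeline` are illustrative names, not the app's actual API):

```python
# Map a model_size setting to standard spaCy English pipeline names.
SPACY_MODELS = {
    "md": "en_core_web_md",    # CPU-friendly, 3-5x faster
    "trf": "en_core_web_trf",  # transformer-based, slower but more accurate
}

def load_pipeline(model_size="md"):
    """Load the spaCy pipeline for the requested size."""
    import spacy  # imported lazily so the mapping is inspectable without spaCy installed
    return spacy.load(SPACY_MODELS[model_size])
```

The corresponding models must already be downloaded (e.g. `python -m spacy download en_core_web_md`).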
For batch processing, consider using 'md':

- 3-5x faster processing
- Still provides good accuracy for lexical sophistication analysis

### 2. **Enable Smart Defaults**

Smart defaults optimize which measures to compute, reducing unnecessary calculations.

### 3. **For Very Large Batches**

Consider implementing:

- Chunked processing (process N files at a time)
- Parallel processing using multiprocessing
- Streaming results to disk instead of accumulating them in memory

## Expected Performance Gains

With the optimizations implemented:

- **~70% reduction** in processing time from eliminating redundant analysis calls
- **An additional 20-30%** possible by switching from the 'trf' to the 'md' model
- **Memory usage** remains similar but could be optimized further with streaming

## How to Use the Optimized Version

The optimizations are transparent to users. Batch processing will automatically use the single-pass analysis when:

- No specific word type filter is selected
- Processing files that need both CW and FW analysis

For legacy compatibility, the old `analyze_batch()` method has been updated to use the optimized approach internally.

## GPU Status Monitoring in Debug Mode

The web app now includes comprehensive GPU status information in debug mode. To access:

1. Enable "🐛 Debug Mode" in the sidebar
2.
   Expand the "GPU Status" section

### Features

**PyTorch/CUDA Information:**

- PyTorch installation and version
- CUDA availability and version
- Number of GPUs and their names
- GPU memory usage (allocated, reserved, free)

**SpaCy GPU Configuration:**

- SpaCy GPU enablement status
- Current GPU device being used
- spacy-transformers installation status

**Active Model GPU Status:**

- Current model's device configuration
- GPU optimization status (mixed precision, batch sizes)
- SpaCy version information

**Performance Tips:**

- Optimization recommendations
- Common troubleshooting guidance

### Benefits

This integrated GPU monitoring eliminates the need for the separate `test_gpu_support.py` script for most use cases. Developers can now:

- Quickly verify GPU availability without running external scripts
- Monitor GPU memory usage during batch processing
- Confirm that models are correctly utilizing GPU acceleration
- Troubleshoot performance issues more effectively

### Usage Example

When processing large batches with transformer models:

1. Enable debug mode to monitor GPU utilization
2. Check that the model is using the GPU (not the CPU fallback)
3. Monitor memory usage to prevent out-of-memory errors
4. Adjust batch sizes based on available GPU memory
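The kind of check the debug-mode panel performs can be sketched with PyTorch's public CUDA API. This is an illustrative stand-alone function, not the app's actual implementation, and it degrades gracefully when PyTorch is not installed:

```python
def gpu_status():
    """Collect basic PyTorch/CUDA availability info (sketch of the debug panel's checks)."""
    info = {"torch_installed": False, "cuda_available": False, "devices": [], "memory": {}}
    try:
        import torch
    except ImportError:
        return info  # PyTorch not installed: nothing further to report
    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    if torch.cuda.is_available():
        info["cuda_available"] = True
        info["devices"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
        info["memory"] = {
            "allocated_bytes": torch.cuda.memory_allocated(),
            "reserved_bytes": torch.cuda.memory_reserved(),
        }
    return info

if __name__ == "__main__":
    print(gpu_status())
```

Checking `cuda_available` before a large batch run catches silent CPU fallback early, and the allocated/reserved figures can guide batch-size adjustments.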