Memory Reduction Guide - 20% Reduction Applied

Overview

The training configuration has been adjusted to reduce memory requirements by approximately 20% while maintaining training quality.

Resolution

Original: 1288x1288
New: 1152x1152
Memory Impact: ~20% reduction in activation memory
Rationale: Image area reduced from 1,658,944 to 1,327,104 pixels (80% of original)
Note: 1152 is divisible by 64, maintaining efficiency
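
The pixel counts above can be checked with a few lines of arithmetic. This is a minimal sketch: the helper name `area_ratio` is illustrative and not part of the training code, and treating activation memory as proportional to pixel area is an approximation.

```python
def area_ratio(old_side: int, new_side: int) -> float:
    """Return the new pixel area as a fraction of the old area (square inputs)."""
    return (new_side * new_side) / (old_side * old_side)


old_side, new_side = 1288, 1152
ratio = area_ratio(old_side, new_side)

print(f"old area: {old_side ** 2:,} px")
print(f"new area: {new_side ** 2:,} px")
print(f"new / old: {ratio:.3f}")
print(f"{new_side} divisible by 64: {new_side % 64 == 0}")
```

The ratio comes out just under 0.8, which is where the ~20% figure comes from.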

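Gradient Accumulation

Original: 16 steps
New: 20 steps
Rationale: Keeps the effective batch size at or above the original (2×20=40 vs 2×16=32) without increasing memory
Note: Gradient accumulation adds computation time rather than activation memory, because only one micro-batch is processed at a time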
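
To illustrate why accumulation leaves the memory footprint alone, here is a framework-free sketch. It assumes a per-step batch size of 2 (the value listed under further reductions) and uses a toy one-parameter model; `grad_mse` and `accumulated_grad` are hypothetical helpers, not functions from the training script. Averaging the gradients of equally sized micro-batches reproduces the gradient of the full effective batch, while only one micro-batch is handled at a time.

```python
def grad_mse(w: float, batch: list[tuple[float, float]]) -> float:
    """Gradient of the mean squared error for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, data: list[tuple[float, float]], micro_batch_size: int) -> float:
    """Average per-micro-batch gradients, as gradient accumulation does."""
    chunks = [data[i:i + micro_batch_size] for i in range(0, len(data), micro_batch_size)]
    total = 0.0
    for chunk in chunks:  # only one micro-batch is processed per iteration
        total += grad_mse(w, chunk) / len(chunks)
    return total


micro_batch_size, accum_steps = 2, 20
effective_batch = micro_batch_size * accum_steps  # 2 * 20 = 40

# Synthetic data for the toy model (true weight is 3.0).
data = [(float(i), 3.0 * i) for i in range(effective_batch)]

full = grad_mse(0.5, data)                              # one big batch of 40
accum = accumulated_grad(0.5, data, micro_batch_size)   # 20 micro-batches of 2
print(effective_batch, full, accum)
```

The two gradients agree up to floating-point rounding; the cost is 20 forward/backward passes per optimizer step instead of 16.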

Breakdown:

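# Option 1: resume via the helper script
cd /workspace/soccer_cv_ball
./scripts/resume_training_low_memory.sh

# Option 2: call the training script directly with the low-memory config
cd /workspace/soccer_cv_ball
python scripts/train_ball.py \
    --config configs/resume_20_epochs_low_memory.yaml \
    --output-dir models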

# Monitor system RAM
free -h

# Monitor GPU memory (if using GPU)
nvidia-smi
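
For logging GPU memory alongside training rather than watching it by hand, the same tool can be queried in CSV form. The sketch below is one possible approach, not part of the project: `parse_gpu_memory` and `read_gpu_memory` are hypothetical helpers, and the sample string is made-up output used only to show the format. It relies on the `--query-gpu` and `--format=csv,noheader,nounits` options of `nvidia-smi`, which report memory in MiB.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> list[tuple[int, int]]:
    """Parse 'used, total' pairs in MiB, one line per GPU."""
    pairs = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        pairs.append((used, total))
    return pairs


def read_gpu_memory() -> list[tuple[int, int]]:
    """Query nvidia-smi; return an empty list when the tool is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Hypothetical sample output for two GPUs, used here instead of a live query.
sample = "10240, 24576\n512, 24576\n"
for idx, (used, total) in enumerate(parse_gpu_memory(sample)):
    print(f"GPU {idx}: {used}/{total} MiB ({100 * used / total:.1f}%)")
```

Calling `read_gpu_memory()` periodically during training gives a simple usage trace; on machines without a GPU it returns an empty list.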

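Further reductions possible:

Reduce resolution further: 1152 → 1024 (additional ~21% reduction in pixel area)
Reduce batch size: 2 → 1 (roughly halves per-step activation memory, but training is slower and the effective batch size drops to 20 unless accumulation steps are doubled)
Disable gradient accumulation: Reduces the effective batch size and saves little memory, since accumulation itself adds almost none
Enable gradient checkpointing: Trades extra compute for lower activation memory (if supported)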
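
When picking a lower resolution, it helps to list the sides that stay divisible by 64 together with their estimated saving relative to the original 1288. This is a rough sketch under the same assumption as before (activation memory roughly proportional to pixel area); `candidate_sides` is a hypothetical helper and the lower bound of 896 is an arbitrary choice for illustration.

```python
def candidate_sides(max_side: int, min_side: int, multiple: int = 64) -> list[int]:
    """List square side lengths from max_side down to min_side in steps of `multiple`."""
    start = (max_side // multiple) * multiple
    return list(range(start, min_side - 1, -multiple))


baseline = 1288  # original training resolution
sides = candidate_sides(1152, 896)

for side in sides:
    saving = 1 - (side * side) / (baseline * baseline)
    print(f"{side}x{side}: ~{saving:.0%} less pixel area than {baseline}x{baseline}")

# Step from the current low-memory setting down to 1024.
step_saving = 1 - (1024 * 1024) / (1152 * 1152)
print(f"1152 -> 1024: ~{step_saving:.0%} additional reduction")
```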

The low memory configuration is saved at:

Setting            Original   Low Memory   Reduction
Resolution         1288       1152         ~20% (pixel area)
Multi-scale        true       false        -
Expanded scales    true       false        -
num_workers        2          1            50%
pin_memory         true       false        -
prefetch_factor    default    1            -
grad_accum_steps   16         20           - (effective batch 32 → 40)

The checkpoint will be loaded correctly despite the resolution change (RF-DETR handles this)
Training will resume from epoch 20
All other training parameters remain the same (learning rate, optimizer, etc.)
Model architecture unchanged (RF-DETR Base)