SmolFactory / docs /TRAINER_SELECTION_SUMMARY.md

Tonic

adds sft , quantization, better readmes

40fd629 unverified 9 months ago

	# Trainer Selection Implementation Summary

	## ✅ Completed Implementation

	### 1. Configuration Changes
	- ✅ Added `trainer_type` field to base `SmolLM3Config` (default: "sft")
	- ✅ Added `trainer_type` field to `SmolLM3DPOConfig` (default: "dpo")
	- ✅ Updated config file generation in `launch.sh` to include `trainer_type`
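As an illustration of the two defaults listed above, here is a minimal sketch. It assumes the configs are dataclasses and that `SmolLM3DPOConfig` subclasses `SmolLM3Config`; only the `trainer_type` field is shown, and the real classes presumably carry many more fields.

```python
from dataclasses import dataclass


@dataclass
class SmolLM3Config:
    # Stand-in for the base config: only the field discussed here.
    trainer_type: str = "sft"


@dataclass
class SmolLM3DPOConfig(SmolLM3Config):
    # The DPO config overrides the inherited default.
    trainer_type: str = "dpo"


base_config = SmolLM3Config()
dpo_config = SmolLM3DPOConfig()
print(base_config.trainer_type, dpo_config.trainer_type)
```

Because the override lives in the subclass, any code that reads `config.trainer_type` works unchanged for either config.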

	### 2. Training Script Updates
	- ✅ Added `--trainer_type` argument to `src/train.py`
	- ✅ Added `--trainer-type` argument to `scripts/training/train.py`
	- ✅ Implemented trainer selection logic in `src/train.py`
	- ✅ Updated trainer instantiation to support both SFT and DPO

	### 3. Launch Script Updates
	- ✅ Added interactive trainer type selection (Step 3.5)
	- ✅ Updated configuration summary to show trainer type
	- ✅ Updated training parameters display to show trainer type
	- ✅ Updated training script call to pass the `trainer_type` argument
	- ✅ Updated summary report to include trainer type

	### 4. Documentation and Testing
	- ✅ Created comprehensive `TRAINER_SELECTION_GUIDE.md`
	- ✅ Created test script `tests/test_trainer_selection.py`
	- ✅ All tests passing (3/3)

	## 🎯 Key Features

	### Interactive Selection
	Users can now choose between SFT and DPO during the launch process:
	```
	Step 3.5: Trainer Type Selection
	====================================

	Select the type of training to perform:
	1. SFT (Supervised Fine-tuning) - Standard instruction tuning
	2. DPO (Direct Preference Optimization) - Preference-based training
	```

	### Command Line Override
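	Users can override the config's trainer type from the command line. The two scripts spell the flag differently: `src/train.py` takes `--trainer_type`, while `scripts/training/train.py` takes `--trainer-type`: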
	```bash
	python src/train.py config/train_smollm3.py --trainer_type dpo
	python scripts/training/train.py --config config/train_smollm3.py --trainer-type dpo
	```
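A hedged sketch of how such an override flag could be declared with `argparse`. The parser layout (a positional config path, a `None` default meaning "no override", and one argument accepting both spellings) is an assumption for illustration, not the actual code in either script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Trainer selection sketch")
    parser.add_argument("config", help="Path to the training config file")
    parser.add_argument(
        "--trainer_type",
        "--trainer-type",  # accept both spellings (assumption, for convenience)
        dest="trainer_type",
        choices=["sft", "dpo"],
        default=None,  # None means "do not override the config"
        help="Override the trainer type set in the config",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real CLI call.
args = build_parser().parse_args(["config/train_smollm3.py", "--trainer_type", "dpo"])
print(args.trainer_type)
```

Leaving the default as `None` lets the caller tell "flag not given" apart from an explicit `sft`, which matters when the config file specifies `dpo`.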

	### Configuration Priority
	1. Command line argument (highest priority)
	2. Config file `trainer_type` field (medium priority)
	3. Default value (lowest priority): "sft" for `SmolLM3Config`, "dpo" for `SmolLM3DPOConfig`
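The priority order above can be sketched as a small resolver. The function name `resolve_trainer_type` and the validation step are hypothetical; the actual logic in `src/train.py` may be structured differently.

```python
from types import SimpleNamespace
from typing import Optional

DEFAULT_TRAINER_TYPE = "sft"
VALID_TRAINER_TYPES = ("sft", "dpo")


def resolve_trainer_type(cli_value: Optional[str], config: object) -> str:
    """Return the trainer type: CLI value first, then config, then default."""
    if cli_value is not None:
        chosen = cli_value
    else:
        # Fall back to the default if the config has no (or an empty) field.
        chosen = getattr(config, "trainer_type", None) or DEFAULT_TRAINER_TYPE
    chosen = chosen.lower()
    if chosen not in VALID_TRAINER_TYPES:
        raise ValueError(f"Unknown trainer type: {chosen!r}")
    return chosen


# SimpleNamespace objects stand in for loaded config objects.
print(resolve_trainer_type("dpo", SimpleNamespace(trainer_type="sft")))
print(resolve_trainer_type(None, SimpleNamespace(trainer_type="dpo")))
print(resolve_trainer_type(None, SimpleNamespace()))
```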

	### Automatic Trainer Selection
	The system automatically selects the appropriate trainer:
	- SFT: Uses `SmolLM3Trainer` with `SFTTrainer` backend
	- DPO: Uses `SmolLM3DPOTrainer` with `DPOTrainer` backend
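One way to express this mapping is a small lookup table. In the sketch below the two trainer classes are empty stand-ins that only borrow the names listed above, and `select_trainer_class` is a hypothetical helper; the real classes live in `src/trainer.py` and take their own constructor arguments.

```python
class SmolLM3Trainer:
    """Stand-in for the SFT trainer wrapper."""


class SmolLM3DPOTrainer:
    """Stand-in for the DPO trainer wrapper."""


# Map each supported trainer type to the class that handles it.
TRAINER_CLASSES = {
    "sft": SmolLM3Trainer,
    "dpo": SmolLM3DPOTrainer,
}


def select_trainer_class(trainer_type: str) -> type:
    """Look up the trainer class, failing loudly on an unknown type."""
    try:
        return TRAINER_CLASSES[trainer_type.lower()]
    except KeyError:
        supported = ", ".join(sorted(TRAINER_CLASSES))
        raise ValueError(
            f"Unknown trainer type {trainer_type!r}; expected one of: {supported}"
        ) from None


print(select_trainer_class("dpo").__name__)
```

A table keeps the dispatch in one place, so adding another trainer type would only need one new entry rather than another `if` branch.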

	## 📋 Usage Examples

	### Launch Script (Interactive)
	```bash
	./launch.sh
	# Follow prompts and select SFT or DPO
	```

	### Direct Training
	```bash
	# SFT training (default)
	python src/train.py config/train_smollm3.py

	# DPO training
	python src/train.py config/train_smollm3_dpo.py

	# Override trainer type
	python src/train.py config/train_smollm3.py --trainer_type dpo
	```

	### Training Script
	```bash
	# SFT training
	python scripts/training/train.py --config config/train_smollm3.py

	# DPO training with override
	python scripts/training/train.py --config config/train_smollm3.py --trainer-type dpo
	```

	## 🔧 Technical Details

	### Files Modified
	1. `config/train_smollm3.py` - Added trainer_type field
	2. `config/train_smollm3_dpo.py` - Added trainer_type field
	3. `src/train.py` - Added trainer selection logic
	4. `scripts/training/train.py` - Added trainer_type argument
	5. `launch.sh` - Added interactive selection and config generation
	6. `src/trainer.py` - No changes needed; already had both trainer classes

	### Files Created
	1. `docs/TRAINER_SELECTION_GUIDE.md` - Comprehensive documentation
	2. `tests/test_trainer_selection.py` - Test suite
	3. `docs/TRAINER_SELECTION_SUMMARY.md` - This summary

	## ✅ Testing Results
	```
	🧪 Testing Trainer Selection Implementation
	==================================================
	Testing config trainer_type...
	✅ Base config trainer_type: sft
	✅ DPO config trainer_type: dpo
	Testing trainer class existence...
	✅ Trainer module imported successfully
	✅ Both trainer classes exist
	Testing config inheritance...
	✅ DPO config properly inherits from base config
	✅ Trainer type inheritance works correctly
	==================================================
	Tests passed: 3/3
	🎉 All tests passed!
	```

	## 🚀 Next Steps

	The trainer selection feature is now fully implemented and tested. Users can:

	1. Use the interactive launch script to select SFT or DPO
	2. Override the trainer type via command line arguments
	3. Use DPO configs that automatically select the DPO trainer
	4. Monitor training with the same Trackio integration for both trainers

	The implementation maintains backward compatibility while adding the new trainer selection capability.