# Strategy Comparison: Teacher vs Baselines
## Overview
This module compares three training strategies for the student agent:
1. **Random Strategy**: Student receives random questions from the task generator until it can confidently pass difficult questions
2. **Progressive Strategy**: Student receives questions in order of increasing difficulty (Easy → Medium → Hard), working through one family at a time
3. **Teacher Strategy**: RL teacher agent learns a curriculum using a UCB (Upper Confidence Bound) bandit algorithm
## Goal
Demonstrate that the **Teacher-trained student performs best**, achieving the highest accuracy on difficult questions.
## Running the Comparison
```bash
cd teacher_agent_dev
python compare_strategies.py
```
This will:
- Train all three strategies for 500 iterations
- Track accuracy on general questions and difficult questions
- Generate comparison plots showing all three strategies
- Print summary statistics
## Output
### Plot: `comparison_all_strategies.png`
The plot contains three subplots:
1. **General Accuracy Over Time**: Shows how student accuracy improves on medium-difficulty questions
2. **Difficult Question Accuracy**: **KEY METRIC** - Shows accuracy on hard questions (most important for demonstrating teacher superiority)
3. **Learning Efficiency**: Bar chart comparing the iterations needed to reach the 75% target against final performance
### Key Metrics Tracked
- **General Accuracy**: Student performance on medium-difficulty questions from all topics
- **Difficult Accuracy**: Student performance on hard-difficulty questions (target metric)
- **Iterations to Target**: How many iterations until student reaches 75% accuracy on difficult questions
- **Final Accuracy**: Final performance after 500 iterations
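As a rough illustration of how the last two metrics relate, the sketch below computes them from a per-iteration accuracy history. The helper name `iterations_to_target` and the sample values are hypothetical and are not taken from `compare_strategies.py`; the script may compute these differently.

```python
from typing import Optional, Sequence


def iterations_to_target(accuracies: Sequence[float],
                         target: float = 0.75) -> Optional[int]:
    """Return the 1-based iteration at which accuracy first reaches
    `target`, or None if the target is never reached."""
    for i, acc in enumerate(accuracies, start=1):
        if acc >= target:
            return i
    return None


# Made-up accuracy history on difficult questions, for illustration only.
history = [0.40, 0.55, 0.70, 0.78, 0.74, 0.88]

reached_at = iterations_to_target(history)  # first iteration at or above 0.75
final_accuracy = history[-1]                # accuracy after the last iteration
print(reached_at, final_accuracy)
```

Note that the two metrics can disagree: a strategy may cross the threshold early and still finish with a lower final accuracy.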
## Expected Results
The Teacher strategy should show:
- ✅ **Highest final accuracy** on difficult questions
- ✅ **Efficient learning** (good balance of speed and performance)
- ✅ **Better curriculum** (smarter topic/difficulty selection)
### Example Output
```
STRATEGY COMPARISON SUMMARY
======================================================================
Random | ✅ Reached | Iterations: 51 | Final Acc: 0.760
Progressive | ✅ Reached | Iterations: 310 | Final Acc: 0.520
Teacher | ✅ Reached | Iterations: 55 | Final Acc: 0.880
======================================================================
```
**Teacher wins with the highest final accuracy!**
## Strategy Details
### Random Strategy
- Completely random selection of topics and difficulties
- No curriculum structure
- Baseline for comparison
- May reach target quickly due to luck, but doesn't optimize learning
### Progressive Strategy
- Rigid curriculum: Easy → Medium → Hard for each topic sequentially
- No adaptation to student needs
- Slow to reach difficult questions
- Doesn't account for forgetting or optimal pacing
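A minimal sketch of such a fixed schedule is shown below. It assumes the iteration budget is split evenly across every (topic, difficulty) stage; the function name `progressive_schedule` and the topic names are hypothetical, and the actual script may pace the stages differently.

```python
from typing import Iterator, List, Tuple

DIFFICULTIES = ["easy", "medium", "hard"]


def progressive_schedule(topics: List[str],
                         num_iterations: int) -> Iterator[Tuple[str, str]]:
    """Yield one (topic, difficulty) pair per iteration, finishing
    easy -> medium -> hard for a topic before moving to the next one."""
    stages = [(topic, diff) for topic in topics for diff in DIFFICULTIES]
    # Assumption: every stage gets the same number of iterations.
    per_stage = max(1, num_iterations // len(stages))
    for i in range(num_iterations):
        # Clamp so leftover iterations stay on the final stage.
        yield stages[min(i // per_stage, len(stages) - 1)]


schedule = list(progressive_schedule(["algebra", "geometry"], 12))
print(schedule[0], schedule[-1])
```

Because the order is fixed in advance, hard questions for the last topic only appear near the end of training, which is one reason this baseline is slow to improve on the difficult-question metric.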
### Teacher Strategy
- **RL-based curriculum learning**
- Uses UCB bandit to balance exploration/exploitation
- Adapts based on student improvement (reward signal)
- Optimizes for efficient learning
- Can strategically review topics to prevent forgetting
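The exploration/exploitation trade-off can be sketched with the standard UCB1 rule, where each arm is a (topic, difficulty) pair and the reward stands in for student improvement. This is an illustrative sketch only: the class name `UCBTeacher`, the arm list, and the fixed reward values are assumptions, and the teacher in `train_teacher.py` may use a different UCB variant or reward definition.

```python
import math
from typing import List, Sequence, Tuple


class UCBTeacher:
    """UCB1 bandit over curriculum actions (hypothetical helper)."""

    def __init__(self, arms: Sequence[Tuple[str, str]],
                 exploration: float = 2.0) -> None:
        self.arms = list(arms)
        self.exploration = exploration
        self.counts: List[int] = [0] * len(self.arms)     # pulls per arm
        self.values: List[float] = [0.0] * len(self.arms)  # mean reward per arm
        self.total = 0                                     # total pulls so far

    def select(self) -> int:
        # Try every arm once before applying the UCB formula.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        scores = [
            mean + math.sqrt(self.exploration * math.log(self.total) / n)
            for mean, n in zip(self.values, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.total += 1
        # Incremental update of the running mean reward.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Made-up arms and rewards, standing in for measured student improvement.
arms = [("algebra", "easy"), ("algebra", "hard"), ("geometry", "medium")]
fake_improvement = [0.1, 0.6, 0.3]

teacher = UCBTeacher(arms)
for _ in range(500):
    arm = teacher.select()
    teacher.update(arm, fake_improvement[arm])

most_used = max(range(len(arms)), key=lambda i: teacher.counts[i])
print(arms[most_used], teacher.counts)
```

The bonus term shrinks as an arm is pulled more often, so rarely chosen arms are still revisited from time to time, which is what allows a bandit teacher to return to earlier topics.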
## Visualization Features
- **Color coding**: Teacher in green (highlighted as best), Random in red, Progressive in teal
- **Line styles**: Teacher with solid thick line, baselines with dashed/dotted
- **Annotations**: Final accuracy values labeled on plots
- **Target line**: 75% accuracy threshold marked on difficult question plot
- **Summary statistics**: Table showing which strategies reached target and when
## Customization
You can modify parameters in `compare_strategies.py`:
```python
num_iterations = 500 # Number of training iterations
target_accuracy = 0.75 # Target accuracy on difficult questions
seed = 42 # Random seed for reproducibility
```
## Files
- `compare_strategies.py` - Main comparison script
- `comparison_all_strategies.png` - Generated comparison plot
- `train_teacher.py` - Teacher training logic
- `mock_student.py` - Student agent implementation
- `mock_task_generator.py` - Task generator
## Notes
- All strategies use the same student parameters for fair comparison
- Evaluation uses held-out test sets
- Teacher strategy learns from rewards based on student improvement
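- Results may vary slightly due to randomness, but the Teacher should consistently reach the highest final accuracy on difficult questions. A baseline can still reach the 75% target in fewer iterations, as Random does in the example output.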