# Strategy Comparison: Teacher vs Baselines
## Overview
This module compares three training strategies for the student agent:
1. **Random Strategy**: The student receives random questions from the task generator until it can confidently pass difficult questions
2. **Progressive Strategy**: The student receives questions in order of increasing difficulty (Easy → Medium → Hard) within each family sequentially
3. **Teacher Strategy**: RL teacher agent learns optimal curriculum using UCB bandit algorithm
## Goal
Demonstrate that the **Teacher-trained student performs best**, achieving the highest accuracy on difficult questions.
## Running the Comparison
```bash
cd teacher_agent_dev
python compare_strategies.py
```
This will:
- Train all three strategies for 500 iterations
- Track accuracy on general questions and difficult questions
- Generate comparison plots showing all three strategies
- Print summary statistics
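The script's internal structure isn't reproduced here, but the core loop is roughly as follows. This is a minimal sketch: the class and method names (`MockStudent`, `MockTaskGenerator`, `train_on`, `evaluate`) are assumptions, not the exact API of `compare_strategies.py`.

```python
from mock_student import MockStudent
from mock_task_generator import MockTaskGenerator

def run_strategy(select_task, num_iterations=500, seed=42):
    """Train one student under a given task-selection strategy and track accuracy."""
    student = MockStudent(seed=seed)
    generator = MockTaskGenerator(seed=seed)
    history = {"general_acc": [], "difficult_acc": []}

    for step in range(num_iterations):
        topic, difficulty = select_task(step, student)   # strategy-specific choice
        task = generator.generate(topic, difficulty)
        student.train_on(task)                           # one training update

        # Track both metrics on held-out questions at every iteration
        history["general_acc"].append(student.evaluate(difficulty="medium"))
        history["difficult_acc"].append(student.evaluate(difficulty="hard"))
    return history
```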
## Output
### Plot: `comparison_all_strategies.png`
The plot contains three subplots:
1. **General Accuracy Over Time**: Shows how student accuracy improves on medium-difficulty questions
2. **Difficult Question Accuracy**: **KEY METRIC** - Shows accuracy on hard questions (most important for demonstrating teacher superiority)
3. **Learning Efficiency**: Bar chart showing iterations to reach 75% target vs final performance
### Key Metrics Tracked
- **General Accuracy**: Student performance on medium-difficulty questions from all topics
- **Difficult Accuracy**: Student performance on hard-difficulty questions (target metric)
- **Iterations to Target**: Number of iterations until the student reaches 75% accuracy on difficult questions
- **Final Accuracy**: Final performance after 500 iterations
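A small sketch of how these summary numbers can be derived from the per-iteration accuracy history; the function name and dictionary layout are illustrative assumptions, not the script's actual internals.

```python
def summarize(history, target_accuracy=0.75):
    difficult = history["difficult_acc"]
    # First iteration at which difficult-question accuracy reaches the target (None if never)
    iterations_to_target = next(
        (i for i, acc in enumerate(difficult) if acc >= target_accuracy), None
    )
    final_accuracy = difficult[-1]
    return {"iterations_to_target": iterations_to_target,
            "final_accuracy": final_accuracy}
```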
## Expected Results
The Teacher strategy should show:
- ✅ **Highest final accuracy** on difficult questions
- ✅ **Efficient learning** (good balance of speed and performance)
- ✅ **Better curriculum** (smarter topic/difficulty selection)
### Example Output
```
STRATEGY COMPARISON SUMMARY
======================================================================
Random      | ✅ Reached | Iterations: 51  | Final Acc: 0.760
Progressive | ✅ Reached | Iterations: 310 | Final Acc: 0.520
Teacher     | ✅ Reached | Iterations: 55  | Final Acc: 0.880
======================================================================
```
**Teacher wins with highest final accuracy!**
## Strategy Details
### Random Strategy
- Completely random selection of topics and difficulties
- No curriculum structure
- Baseline for comparison
- May reach target quickly due to luck, but doesn't optimize learning
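As a sketch, the baseline can be written as a strategy function that ignores the student entirely. The topic list below is illustrative only, not the generator's actual task families.

```python
import random

TOPICS = ["arithmetic", "algebra", "geometry"]   # illustrative topics
DIFFICULTIES = ["easy", "medium", "hard"]
_rng = random.Random(42)

def random_strategy(step, student):
    # Uniform random choice over topics and difficulties, independent of student state
    return _rng.choice(TOPICS), _rng.choice(DIFFICULTIES)
```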
### Progressive Strategy
- Rigid curriculum: Easy → Medium → Hard for each topic sequentially
- No adaptation to student needs
- Slow to reach difficult questions
- Doesn't account for forgetting or optimal pacing
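A corresponding sketch of the rigid schedule, reusing the illustrative `TOPICS`/`DIFFICULTIES` lists from the random-strategy sketch above; the stage length is an assumed parameter.

```python
def progressive_strategy(step, student, steps_per_stage=20):
    # Fixed schedule: for each topic, run easy -> medium -> hard blocks in order,
    # then advance to the next topic. Ignores how the student is actually doing.
    stage = step // steps_per_stage
    topic = TOPICS[(stage // len(DIFFICULTIES)) % len(TOPICS)]
    difficulty = DIFFICULTIES[stage % len(DIFFICULTIES)]
    return topic, difficulty
```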
### Teacher Strategy
- **RL-based curriculum learning**
- Uses a UCB bandit to balance exploration and exploitation (see the sketch after this list)
- Adapts based on student improvement (reward signal)
- Optimizes for efficient learning
- Can strategically review topics to prevent forgetting
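The following is a minimal UCB1 sketch of the idea, treating each (topic, difficulty) pair as a bandit arm; it illustrates the mechanism rather than the exact implementation in `train_teacher.py`.

```python
import math

class UCBTeacher:
    """Minimal UCB1 bandit over (topic, difficulty) arms (a sketch, not the repo's class)."""

    def __init__(self, arms, c=1.0):
        self.arms = arms                        # list of (topic, difficulty) pairs
        self.c = c                              # exploration coefficient
        self.counts = {a: 0 for a in arms}
        self.values = {a: 0.0 for a in arms}    # running mean reward per arm
        self.total = 0

    def select(self):
        self.total += 1
        # Play each arm once before applying the UCB formula
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm
        # Pick the arm with the highest mean reward plus exploration bonus
        return max(
            self.arms,
            key=lambda a: self.values[a]
            + self.c * math.sqrt(math.log(self.total) / self.counts[a]),
        )

    def update(self, arm, reward):
        # Reward is the student's improvement after training on this arm
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n
```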
## Visualization Features
- **Color coding**: Teacher in green (highlighted as best), Random in red, Progressive in teal
- **Line styles**: Teacher with solid thick line, baselines with dashed/dotted
- **Annotations**: Final accuracy values labeled on plots
- **Target line**: 75% accuracy threshold marked on difficult question plot
- **Summary statistics**: Table showing which strategies reached target and when
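A rough matplotlib sketch of the difficult-question subplot following these conventions; the `histories` layout is an assumption carried over from the loop sketch earlier in this README.

```python
import matplotlib.pyplot as plt

def plot_difficult_accuracy(histories, target_accuracy=0.75):
    styles = {
        "Teacher": dict(color="green", linestyle="-", linewidth=2.5),   # highlighted as best
        "Random": dict(color="red", linestyle="--"),
        "Progressive": dict(color="teal", linestyle=":"),
    }
    for name, history in histories.items():
        acc = history["difficult_acc"]
        plt.plot(acc, label=name, **styles[name])
        plt.annotate(f"{acc[-1]:.3f}", xy=(len(acc) - 1, acc[-1]))      # label final accuracy
    plt.axhline(target_accuracy, color="gray", linestyle="--", label="75% target")
    plt.xlabel("Iteration")
    plt.ylabel("Accuracy on difficult questions")
    plt.legend()
    plt.savefig("comparison_all_strategies.png")
```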
## Customization
You can modify parameters in `compare_strategies.py`:
```python
num_iterations = 500 # Number of training iterations
target_accuracy = 0.75 # Target accuracy on difficult questions
seed = 42 # Random seed for reproducibility
```
## Files
- `compare_strategies.py` - Main comparison script
- `comparison_all_strategies.png` - Generated comparison plot
- `train_teacher.py` - Teacher training logic
- `mock_student.py` - Student agent implementation
- `mock_task_generator.py` - Task generator
## Notes
- All strategies use the same student parameters for fair comparison
- Evaluation uses held-out test sets
- Teacher strategy learns from rewards based on student improvement
- Results may vary slightly due to randomness, but the Teacher strategy should consistently outperform both baselines
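Putting the pieces together, one plausible reward loop is sketched below. It reuses the hypothetical names from the earlier sketches and is an assumption about how `train_teacher.py` wires reward to the bandit, not its confirmed behavior.

```python
def teaching_step(teacher, student, generator):
    arm = teacher.select()                                 # (topic, difficulty) chosen by UCB
    acc_before = student.evaluate(difficulty="hard")
    student.train_on(generator.generate(*arm))
    acc_after = student.evaluate(difficulty="hard")
    teacher.update(arm, reward=acc_after - acc_before)     # improvement as the reward signal
```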