# Strategy Comparison: Teacher vs Baselines

## Overview

This module compares three training strategies for the student agent:

1. **Random Strategy**: the student receives random questions from the task generator until it can confidently pass difficult questions
2. **Progressive Strategy**: the student receives questions in progressive difficulty order (Easy → Medium → Hard) within each family, sequentially
3. **Teacher Strategy**: an RL teacher agent learns an optimal curriculum using a UCB bandit algorithm
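The UCB bandit idea behind the Teacher strategy can be sketched as follows. This is a minimal, illustrative UCB1 implementation, not the code in `train_teacher.py`: the arm definition (topic, difficulty) pairs, the topic names, and the exploration constant `c` are all assumptions made for the example.

```python
import math

class UCBTeacher:
    """Minimal UCB1 sketch: each arm is a (topic, difficulty) pair.

    Illustrative only -- the real teacher in train_teacher.py may define
    arms and rewards differently.
    """

    def __init__(self, arms, c=2.0):
        self.arms = list(arms)
        self.c = c                                  # exploration constant
        self.counts = {a: 0 for a in self.arms}     # pulls per arm
        self.values = {a: 0.0 for a in self.arms}   # running mean reward
        self.total = 0

    def select(self):
        # Play every arm once before applying the UCB formula.
        for a in self.arms:
            if self.counts[a] == 0:
                return a
        # Pick the arm maximizing mean reward + exploration bonus.
        return max(
            self.arms,
            key=lambda a: self.values[a]
            + self.c * math.sqrt(math.log(self.total) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.total += 1
        # Incremental update of the running mean.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Hypothetical topics/difficulties for illustration:
arms = [(t, d) for t in ("algebra", "geometry")
        for d in ("easy", "medium", "hard")]
teacher = UCBTeacher(arms)
arm = teacher.select()
teacher.update(arm, reward=0.1)  # reward = measured student improvement
```

In this scheme, arms whose questions drive student improvement accumulate higher mean reward and get selected more often, while the exploration bonus keeps the teacher occasionally revisiting under-sampled arms.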
## Goal

Demonstrate that the **Teacher-trained student performs best**, achieving the highest accuracy on difficult questions.

## Running the Comparison

```bash
cd teacher_agent_dev
python compare_strategies.py
```
This will:

- Train all three strategies for 500 iterations
- Track accuracy on general questions and on difficult questions
- Generate comparison plots showing all three strategies
- Print summary statistics
## Output

### Plot: `comparison_all_strategies.png`

The plot contains three subplots:

1. **General Accuracy Over Time**: how student accuracy improves on medium-difficulty questions
2. **Difficult Question Accuracy**: the **KEY METRIC** — accuracy on hard questions, the most important evidence of teacher superiority
3. **Learning Efficiency**: bar chart comparing iterations to reach the 75% target against final performance

### Key Metrics Tracked

- **General Accuracy**: student performance on medium-difficulty questions from all topics
- **Difficult Accuracy**: student performance on hard-difficulty questions (the target metric)
- **Iterations to Target**: how many iterations until the student reaches 75% accuracy on difficult questions
- **Final Accuracy**: performance after all 500 iterations
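The "Iterations to Target" metric can be computed from a per-iteration accuracy history with a small helper like the one below. This is an illustrative sketch; `compare_strategies.py` may track it differently, and the function name is hypothetical.

```python
def iterations_to_target(history, target=0.75):
    """Return the first 1-based iteration at which difficult-question
    accuracy meets the target, or None if it is never reached.

    `history` is a list of per-iteration accuracies (illustrative layout).
    """
    for i, acc in enumerate(history, start=1):
        if acc >= target:
            return i
    return None

# Target of 0.75 is first met at iteration 3 here:
iterations_to_target([0.4, 0.6, 0.8, 0.7])  # -> 3
```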
## Expected Results

The Teacher strategy should show:

- ✅ **Highest final accuracy** on difficult questions
- ✅ **Efficient learning** (good balance of speed and performance)
- ✅ **Better curriculum** (smarter topic/difficulty selection)
### Example Output

```
STRATEGY COMPARISON SUMMARY
======================================================================
Random      | ✓ Reached | Iterations: 51  | Final Acc: 0.760
Progressive | ✓ Reached | Iterations: 310 | Final Acc: 0.520
Teacher     | ✓ Reached | Iterations: 55  | Final Acc: 0.880
======================================================================
```

**Teacher wins with the highest final accuracy!**
## Strategy Details

### Random Strategy

- Completely random selection of topics and difficulties
- No curriculum structure
- Baseline for comparison
- May reach the target quickly by luck, but does not optimize learning
### Progressive Strategy

- Rigid curriculum: Easy → Medium → Hard for each topic, sequentially
- No adaptation to the student's needs
- Slow to reach difficult questions
- Does not account for forgetting or optimal pacing
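The rigid ordering above amounts to a simple nested loop. This generator is an illustrative sketch of the question-ordering policy only — the real strategy in `compare_strategies.py` presumably spends many iterations at each stage, and the topic names are hypothetical.

```python
def progressive_curriculum(topics, difficulties=("easy", "medium", "hard")):
    """Yield (topic, difficulty) pairs in the rigid order described above:
    every difficulty for one topic before moving on to the next topic."""
    for topic in topics:
        for difficulty in difficulties:
            yield topic, difficulty

# First four stages for two hypothetical topics:
list(progressive_curriculum(["algebra", "geometry"]))[:4]
# -> [('algebra', 'easy'), ('algebra', 'medium'),
#     ('algebra', 'hard'), ('geometry', 'easy')]
```

Because the ordering is fixed in advance, the strategy cannot skip material the student has already mastered or revisit material it has forgotten — which is exactly the weakness the Teacher strategy addresses.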
### Teacher Strategy

- **RL-based curriculum learning**
- Uses a UCB bandit to balance exploration and exploitation
- Adapts based on student improvement (the reward signal)
- Optimizes for efficient learning
- Can strategically review topics to prevent forgetting
## Visualization Features

- **Color coding**: Teacher in green (highlighted as best), Random in red, Progressive in teal
- **Line styles**: Teacher drawn with a solid thick line, baselines with dashed/dotted lines
- **Annotations**: final accuracy values labeled on the plots
- **Target line**: the 75% accuracy threshold marked on the difficult-question plot
- **Summary statistics**: a table showing which strategies reached the target, and when
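The styling conventions above can be sketched with matplotlib as follows. This is a hypothetical helper, not the plotting code in `compare_strategies.py`: the function name, data layout (strategy name → list of accuracies), and output filename are assumptions for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def plot_difficult_accuracy(histories, target=0.75, out="comparison_sketch.png"):
    """Sketch of the difficult-question subplot: the colors, line styles,
    annotations, and target line mirror the conventions listed above."""
    styles = {
        "Teacher": dict(color="green", linestyle="-", linewidth=2.5),
        "Random": dict(color="red", linestyle="--"),
        "Progressive": dict(color="teal", linestyle=":"),
    }
    fig, ax = plt.subplots()
    for name, accs in histories.items():
        ax.plot(accs, label=name, **styles.get(name, {}))
        # Label the final accuracy value at the end of each curve.
        ax.annotate(f"{accs[-1]:.3f}", xy=(len(accs) - 1, accs[-1]))
    # Mark the 75% accuracy threshold.
    ax.axhline(target, color="gray", linestyle="--", label=f"{target:.0%} target")
    ax.set_xlabel("Iteration")
    ax.set_ylabel("Difficult-question accuracy")
    ax.legend()
    fig.savefig(out)
    plt.close(fig)

# Toy accuracy histories, for illustration only:
plot_difficult_accuracy({
    "Teacher": [0.3, 0.6, 0.88],
    "Random": [0.3, 0.5, 0.76],
    "Progressive": [0.2, 0.4, 0.52],
})
```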
## Customization

You can modify the parameters in `compare_strategies.py`:

```python
num_iterations = 500    # number of training iterations
target_accuracy = 0.75  # target accuracy on difficult questions
seed = 42               # random seed for reproducibility
```
## Files

- `compare_strategies.py` - main comparison script
- `comparison_all_strategies.png` - generated comparison plot
- `train_teacher.py` - teacher training logic
- `mock_student.py` - student agent implementation
- `mock_task_generator.py` - task generator
## Notes

- All strategies use the same student parameters for a fair comparison
- Evaluation uses held-out test sets
- The Teacher strategy learns from rewards based on student improvement
- Results may vary slightly due to randomness, but the Teacher should consistently outperform both baselines
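The improvement-based reward mentioned in the notes admits a very simple shaping: the change in held-out difficult-question accuracy between consecutive teacher steps. The function below is one plausible version of that idea; the actual signal in `train_teacher.py` may be scaled, clipped, or smoothed differently.

```python
def teacher_reward(prev_acc, new_acc):
    """Reward the teacher by how much the student's held-out accuracy
    changed since the last check. Positive when the student improves,
    negative when it regresses (e.g. due to forgetting)."""
    return new_acc - prev_acc

teacher_reward(0.50, 0.75)  # -> 0.25 (positive reward for improvement)
```

Under this shaping, arms whose questions stop producing improvement see their reward decay toward zero, which naturally pushes the UCB teacher toward topics where the student still has headroom.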