# Strategy Comparison: Teacher vs Baselines

## Overview

This module compares three training strategies for the student agent:

1. **Random Strategy**: the student receives random questions from the task generator until it can confidently pass difficult questions
2. **Progressive Strategy**: the student receives questions in progressive difficulty order (Easy → Medium → Hard) within each family, sequentially
3. **Teacher Strategy**: an RL teacher agent learns an optimal curriculum using a UCB bandit algorithm
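The UCB bandit idea behind the Teacher strategy can be sketched as follows. This is a minimal, illustrative UCB1 implementation, not the code in `train_teacher.py`: the arm definition (topic, difficulty) pairs, the topic names, and the exploration constant `c` are all assumptions made for the example.

```python
import math

class UCBTeacher:
    """Minimal UCB1 sketch: each arm is a (topic, difficulty) pair.

    Illustrative only -- the real teacher in train_teacher.py may define
    arms and rewards differently.
    """

    def __init__(self, arms, c=2.0):
        self.arms = list(arms)
        self.c = c                                  # exploration constant
        self.counts = {a: 0 for a in self.arms}     # pulls per arm
        self.values = {a: 0.0 for a in self.arms}   # running mean reward
        self.total = 0

    def select(self):
        # Play every arm once before applying the UCB formula.
        for a in self.arms:
            if self.counts[a] == 0:
                return a
        # Pick the arm maximizing mean reward + exploration bonus.
        return max(
            self.arms,
            key=lambda a: self.values[a]
            + self.c * math.sqrt(math.log(self.total) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.total += 1
        # Incremental update of the running mean.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Hypothetical topics/difficulties for illustration:
arms = [(t, d) for t in ("algebra", "geometry")
        for d in ("easy", "medium", "hard")]
teacher = UCBTeacher(arms)
arm = teacher.select()
teacher.update(arm, reward=0.1)  # reward = measured student improvement
```

In this scheme, arms whose questions drive student improvement accumulate higher mean reward and get selected more often, while the exploration bonus keeps the teacher occasionally revisiting under-sampled arms.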
## Goal

Demonstrate that the **Teacher-trained student performs best**, achieving the highest accuracy on difficult questions.

## Running the Comparison

```bash
cd teacher_agent_dev
python compare_strategies.py
```
This will:

- Train all three strategies for 500 iterations
- Track accuracy on general questions and on difficult questions
- Generate comparison plots showing all three strategies
- Print summary statistics
## Output

### Plot: `comparison_all_strategies.png`

The plot contains three subplots:

1. **General Accuracy Over Time**: how student accuracy improves on medium-difficulty questions
2. **Difficult Question Accuracy**: the **KEY METRIC** — accuracy on hard questions, the most important evidence of teacher superiority
3. **Learning Efficiency**: bar chart comparing iterations to reach the 75% target against final performance

### Key Metrics Tracked

- **General Accuracy**: student performance on medium-difficulty questions from all topics
- **Difficult Accuracy**: student performance on hard-difficulty questions (the target metric)
- **Iterations to Target**: how many iterations until the student reaches 75% accuracy on difficult questions
- **Final Accuracy**: performance after all 500 iterations
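The "Iterations to Target" metric can be computed from a per-iteration accuracy history with a small helper like the one below. This is an illustrative sketch; `compare_strategies.py` may track it differently, and the function name is hypothetical.

```python
def iterations_to_target(history, target=0.75):
    """Return the first 1-based iteration at which difficult-question
    accuracy meets the target, or None if it is never reached.

    `history` is a list of per-iteration accuracies (illustrative layout).
    """
    for i, acc in enumerate(history, start=1):
        if acc >= target:
            return i
    return None

# Target of 0.75 is first met at iteration 3 here:
iterations_to_target([0.4, 0.6, 0.8, 0.7])  # -> 3
```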
## Expected Results

The Teacher strategy should show:

- ✅ **Highest final accuracy** on difficult questions
- ✅ **Efficient learning** (good balance of speed and performance)
- ✅ **Better curriculum** (smarter topic/difficulty selection)
### Example Output

```
STRATEGY COMPARISON SUMMARY
======================================================================
Random      | ✓ Reached | Iterations: 51  | Final Acc: 0.760
Progressive | ✓ Reached | Iterations: 310 | Final Acc: 0.520
Teacher     | ✓ Reached | Iterations: 55  | Final Acc: 0.880
======================================================================
```

**Teacher wins with the highest final accuracy!**
## Strategy Details

### Random Strategy

- Completely random selection of topics and difficulties
- No curriculum structure
- Baseline for comparison
- May reach the target quickly by luck, but does not optimize learning
### Progressive Strategy

- Rigid curriculum: Easy → Medium → Hard for each topic, sequentially
- No adaptation to the student's needs
- Slow to reach difficult questions
- Does not account for forgetting or optimal pacing
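The rigid ordering above amounts to a simple nested loop. This generator is an illustrative sketch of the question-ordering policy only — the real strategy in `compare_strategies.py` presumably spends many iterations at each stage, and the topic names are hypothetical.

```python
def progressive_curriculum(topics, difficulties=("easy", "medium", "hard")):
    """Yield (topic, difficulty) pairs in the rigid order described above:
    every difficulty for one topic before moving on to the next topic."""
    for topic in topics:
        for difficulty in difficulties:
            yield topic, difficulty

# First four stages for two hypothetical topics:
list(progressive_curriculum(["algebra", "geometry"]))[:4]
# -> [('algebra', 'easy'), ('algebra', 'medium'),
#     ('algebra', 'hard'), ('geometry', 'easy')]
```

Because the ordering is fixed in advance, the strategy cannot skip material the student has already mastered or revisit material it has forgotten — which is exactly the weakness the Teacher strategy addresses.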
### Teacher Strategy

- **RL-based curriculum learning**
- Uses a UCB bandit to balance exploration and exploitation
- Adapts based on student improvement (the reward signal)
- Optimizes for efficient learning
- Can strategically review topics to prevent forgetting
## Visualization Features

- **Color coding**: Teacher in green (highlighted as best), Random in red, Progressive in teal
- **Line styles**: Teacher drawn with a solid thick line, baselines with dashed/dotted lines
- **Annotations**: final accuracy values labeled on the plots
- **Target line**: the 75% accuracy threshold marked on the difficult-question plot
- **Summary statistics**: a table showing which strategies reached the target, and when
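The styling conventions above can be sketched with matplotlib as follows. This is a hypothetical helper, not the plotting code in `compare_strategies.py`: the function name, data layout (strategy name → list of accuracies), and output filename are assumptions for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def plot_difficult_accuracy(histories, target=0.75, out="comparison_sketch.png"):
    """Sketch of the difficult-question subplot: the colors, line styles,
    annotations, and target line mirror the conventions listed above."""
    styles = {
        "Teacher": dict(color="green", linestyle="-", linewidth=2.5),
        "Random": dict(color="red", linestyle="--"),
        "Progressive": dict(color="teal", linestyle=":"),
    }
    fig, ax = plt.subplots()
    for name, accs in histories.items():
        ax.plot(accs, label=name, **styles.get(name, {}))
        # Label the final accuracy value at the end of each curve.
        ax.annotate(f"{accs[-1]:.3f}", xy=(len(accs) - 1, accs[-1]))
    # Mark the 75% accuracy threshold.
    ax.axhline(target, color="gray", linestyle="--", label=f"{target:.0%} target")
    ax.set_xlabel("Iteration")
    ax.set_ylabel("Difficult-question accuracy")
    ax.legend()
    fig.savefig(out)
    plt.close(fig)

# Toy accuracy histories, for illustration only:
plot_difficult_accuracy({
    "Teacher": [0.3, 0.6, 0.88],
    "Random": [0.3, 0.5, 0.76],
    "Progressive": [0.2, 0.4, 0.52],
})
```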
## Customization

You can modify the parameters in `compare_strategies.py`:

```python
num_iterations = 500    # number of training iterations
target_accuracy = 0.75  # target accuracy on difficult questions
seed = 42               # random seed for reproducibility
```
## Files

- `compare_strategies.py` - main comparison script
- `comparison_all_strategies.png` - generated comparison plot
- `train_teacher.py` - teacher training logic
- `mock_student.py` - student agent implementation
- `mock_task_generator.py` - task generator
## Notes

- All strategies use the same student parameters for a fair comparison
- Evaluation uses held-out test sets
- The Teacher strategy learns from rewards based on student improvement
- Results may vary slightly due to randomness, but the Teacher should consistently outperform both baselines
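The improvement-based reward mentioned in the notes admits a very simple shaping: the change in held-out difficult-question accuracy between consecutive teacher steps. The function below is one plausible version of that idea; the actual signal in `train_teacher.py` may be scaled, clipped, or smoothed differently.

```python
def teacher_reward(prev_acc, new_acc):
    """Reward the teacher by how much the student's held-out accuracy
    changed since the last check. Positive when the student improves,
    negative when it regresses (e.g. due to forgetting)."""
    return new_acc - prev_acc

teacher_reward(0.50, 0.75)  # -> 0.25 (positive reward for improvement)
```

Under this shaping, arms whose questions stop producing improvement see their reward decay toward zero, which naturally pushes the UCB teacher toward topics where the student still has headroom.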