# Teacher Agent Development System

A complete teacher agent system for developing and testing meta-RL curriculum learning algorithms independently.

## Overview

This system provides:

- **Mock Student Agent**: Realistic student with learning + forgetting (Ebbinghaus curve)
- **Mock Task Generator**: Simple task generator with multiple topics and difficulties
- **Teacher Agent**: UCB (Upper Confidence Bound) bandit algorithm for curriculum sequencing
- **Training Loop**: Complete training system with evaluation
- **Visualization**: Plotting utilities for analysis

## Installation

```bash
pip install -r requirements.txt
```
## Quick Start

### 1. Run Tests

```bash
python test_teacher.py
```

This verifies:

- Student learns with practice
- Student forgets over time
- Teacher explores actions
- Teacher exploits good actions
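The first two checks reduce to properties of the formulas given under Student Model below; a minimal self-contained sketch of what the tests assert (the actual suite in `test_teacher.py` may be structured differently):

```python
import math

def test_student_learns_and_forgets():
    # Learning: one practice step raises skill, with diminishing returns toward 1.0
    skill = 0.3
    new_skill = skill + 0.15 * 1.0 * (1 - skill)  # learning_rate=0.15, difficulty_factor=1.0
    assert new_skill > skill

    # Forgetting: retention decays as time since last practice grows
    assert math.exp(-0.05 * 10) > math.exp(-0.05 * 50)  # forgetting_rate=0.05

test_student_learns_and_forgets()
```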
### 2. Train Teacher Agent

```bash
python train_teacher.py
```

Expected output:

```
======================================================================
TEACHER AGENT TRAINING
======================================================================
Iterations: 500
Evaluation tasks: 15
Action space: 30 actions
======================================================================
Iteration 0 | Student Acc: 0.267 | Avg Reward: 0.850 | Action: his-ea-N
Iteration 50 | Student Acc: 0.453 | Avg Reward: 1.120 | Action: sci-me-R
...
Iteration 500 | Student Acc: 0.812 | Avg Reward: 0.780 | Action: lit-ha-N
```
### 3. Generate Visualizations

```python
from train_teacher import train_teacher
from visualize import plot_learning_curves, plot_curriculum_heatmap, plot_action_distributions

# Train teacher
history, teacher, student = train_teacher(num_iterations=500)

# Generate plots
plot_learning_curves(history)
plot_curriculum_heatmap(history)
plot_action_distributions(teacher)
```
### 4. Compare with Baselines

```python
from train_teacher import train_teacher, train_baseline_random, train_baseline_fixed
from visualize import plot_comparison

# Train all strategies
history_teacher, _, _ = train_teacher(num_iterations=500, verbose=False)
history_random = train_baseline_random(num_iterations=500)
history_fixed = train_baseline_fixed(num_iterations=500)

# Compare
plot_comparison({
    'teacher': history_teacher,
    'random': history_random,
    'fixed': history_fixed,
})
```
## Architecture

### Components

1. **interfaces.py**: Shared data structures (Task, StudentState, TeacherAction) and ABC interfaces
2. **mock_student.py**: Student agent with learning (improves with practice) and forgetting (Ebbinghaus curve)
3. **mock_task_generator.py**: Simple task generator with 5 topics × 3 difficulties
4. **teacher_agent.py**: UCB bandit algorithm for selecting curriculum actions
5. **train_teacher.py**: Main training loop connecting all components
6. **test_teacher.py**: Unit tests for all components
7. **visualize.py**: Plotting utilities for analysis
### Action Space

The teacher selects from **30 actions**:

- 5 topics: history, science, literature, geography, current_events
- 3 difficulties: easy, medium, hard
- 2 options: new material or review
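The 30 actions are just the Cartesian product of these three dimensions. A sketch of how the space could be enumerated (the abbreviated labels match the `his-ea-N` / `sci-me-R` codes in the training log above; the exact encoding inside `teacher_agent.py` may differ):

```python
from itertools import product

TOPICS = ["history", "science", "literature", "geography", "current_events"]
DIFFICULTIES = ["easy", "medium", "hard"]
MODES = ["new", "review"]

# 5 topics x 3 difficulties x 2 modes = 30 actions, labeled e.g. "his-ea-N"
ACTIONS = [
    f"{topic[:3]}-{diff[:2]}-{'R' if mode == 'review' else 'N'}"
    for topic, diff, mode in product(TOPICS, DIFFICULTIES, MODES)
]
print(len(ACTIONS))  # 30
```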
### Student Model

- **Learning**: Skill improves with practice: `new_skill = old_skill + learning_rate * difficulty_factor * (1 - old_skill)`
- **Forgetting**: Retention decays over time: `retention = exp(-forgetting_rate * time_since_practice)`
- **Effective Skill**: `effective_skill = base_skill * retention`
- **Accuracy**: `accuracy = 0.25 + 0.75 * effective_skill` (25% is random guessing on a 4-choice MCQ)
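Chained together, the four formulas amount to the following (a sketch using the default `learning_rate=0.15` and `forgetting_rate=0.05` shown under Customization; the function names here are illustrative):

```python
import math

def update_skill(old_skill, learning_rate=0.15, difficulty_factor=1.0):
    # Diminishing returns: the gain shrinks as skill approaches 1.0
    return old_skill + learning_rate * difficulty_factor * (1 - old_skill)

def retention(time_since_practice, forgetting_rate=0.05):
    # Ebbinghaus-style exponential decay
    return math.exp(-forgetting_rate * time_since_practice)

def accuracy(base_skill, time_since_practice):
    effective_skill = base_skill * retention(time_since_practice)
    return 0.25 + 0.75 * effective_skill  # 0.25 floor = random guessing on a 4-choice MCQ
```

With `base_skill=0` the student scores 0.25 (pure guessing); a fully skilled student who has just practiced scores 1.0.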
### Teacher Algorithm

**UCB (Upper Confidence Bound)**:

```
UCB(a) = estimated_reward(a) + exploration_bonus × sqrt(log(total_pulls) / pulls(a))
```

- Balances exploration (trying new actions) against exploitation (using known-good actions)
- The exploration bonus controls adventurousness (higher = more exploration)
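A minimal sketch of the selection rule (the real implementation in `teacher_agent.py` may track its statistics differently):

```python
import math

def ucb_select(pulls, reward_sums, exploration_bonus=2.0):
    """Return the index of the action with the highest UCB score."""
    total_pulls = sum(pulls)
    best_action, best_score = 0, float("-inf")
    for a, n in enumerate(pulls):
        if n == 0:
            return a  # cold start: always try untried actions first
        estimated_reward = reward_sums[a] / n
        score = estimated_reward + exploration_bonus * math.sqrt(math.log(total_pulls) / n)
        if score > best_score:
            best_action, best_score = a, score
    return best_action
```

The `n == 0` branch is what makes the division-by-zero concern under Troubleshooting a non-issue.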
### Reward Function

```
reward = improvement + difficulty_bonus + review_bonus + review_penalty

where:
- improvement = accuracy_after - accuracy_before
- difficulty_bonus = easy: 0.5, medium: 1.0, hard: 2.0
- review_bonus = 1.0 if review and improvement > 0
- review_penalty = -0.5 if review and accuracy > 0.9 (wasted review)
```
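In code, one reading of this reward (an assumption here: the penalty keys off the accuracy *before* the review; the actual implementation may differ):

```python
DIFFICULTY_BONUS = {"easy": 0.5, "medium": 1.0, "hard": 2.0}

def compute_reward(acc_before, acc_after, difficulty, is_review):
    improvement = acc_after - acc_before
    reward = improvement + DIFFICULTY_BONUS[difficulty]
    if is_review and improvement > 0:
        reward += 1.0   # review_bonus: the review actually helped
    if is_review and acc_before > 0.9:
        reward -= 0.5   # review_penalty: reviewing an already-mastered topic
    return reward
```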
## Expected Behavior

### Early Iterations (0-100)

- Teacher explores all topics/difficulties
- Tries mostly easy tasks (build foundation)
- High exploration, low exploitation

### Mid Iterations (100-300)

- Starts increasing difficulty
- Discovers which topics the student struggles with
- Begins strategic reviewing

### Late Iterations (300-500)

- Mostly medium/hard tasks (student is skilled)
- Reviews topics just before the forgetting threshold
- High exploitation of known-good curriculum

### Emergent Behaviors

- Teacher gives harder tasks as the student improves
- Teacher reviews topics ~30-50 iterations after practice (optimal timing)
- Teacher specializes in topics the student finds difficult

## Success Criteria

After training, you should see:

- ✅ Student reaches >70% accuracy by iteration 500
- ✅ Teacher discovers: easy tasks first → harder tasks later
- ✅ Teacher learns to review before forgetting
- ✅ Teacher reward stabilizes (not just random)
## File Structure

```
teacher_agent_dev/
├── interfaces.py           # Shared data structures and ABC interfaces
├── mock_student.py         # Mock student with learning + forgetting
├── mock_task_generator.py  # Simple task generator
├── teacher_agent.py        # MAIN: UCB bandit teacher algorithm
├── train_teacher.py        # Training loop
├── test_teacher.py         # Unit tests
├── visualize.py            # Plotting utilities
├── requirements.txt        # Dependencies
└── README.md               # This file
```
## Customization

### Adjust Student Learning

```python
student = MockStudentAgent(
    learning_rate=0.15,   # How fast the student learns (higher = faster)
    forgetting_rate=0.05  # How fast the student forgets (higher = faster)
)
```

### Adjust Teacher Exploration

```python
teacher = TeacherAgent(
    exploration_bonus=2.0  # Higher = more exploration, lower = more exploitation
)
```

### Add More Topics/Difficulties

Edit `mock_task_generator.py` to add more templates, or modify `teacher_agent.py` to adjust the action space.
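For example, adding a sixth topic might look like this (hypothetical: the actual variable names in `mock_task_generator.py` may differ):

```python
# In mock_task_generator.py (illustrative names):
TOPICS = [
    "history", "science", "literature", "geography", "current_events",
    "mathematics",  # newly added topic
]

# Each topic contributes 3 difficulties x 2 modes = 6 actions, so
# teacher_agent.py must now enumerate len(TOPICS) * 6 = 36 actions.
NUM_ACTIONS = len(TOPICS) * 3 * 2
```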
## Troubleshooting

**Issue**: Student doesn't learn
- **Solution**: Increase `learning_rate` in `MockStudentAgent`

**Issue**: Teacher doesn't explore
- **Solution**: Increase `exploration_bonus` in `TeacherAgent`

**Issue**: Forgetting too fast/slow
- **Solution**: Adjust `forgetting_rate` in `MockStudentAgent`

**Issue**: Division-by-zero errors
- **Solution**: UCB handles the cold start automatically (untried actions are selected first)
## Next Steps

1. **Replace mock components**: When teammates finish the real student/task generator, swap out the mock components
2. **Tune hyperparameters**: Adjust `learning_rate`, `forgetting_rate`, `exploration_bonus`
3. **Experiment with algorithms**: Try different bandit algorithms (Thompson Sampling, ε-greedy)
4. **Add features**: More sophisticated reward functions, state representations, etc.

## License

MIT