# Teacher Agent Development System
A complete teacher agent system for developing and testing meta-RL curriculum learning algorithms independently.
## Overview
This system provides:
- **Mock Student Agent**: Realistic student with learning + forgetting (Ebbinghaus curve)
- **Mock Task Generator**: Simple task generator with multiple topics and difficulties
- **Teacher Agent**: UCB (Upper Confidence Bound) bandit algorithm for curriculum sequencing
- **Training Loop**: Complete training system with evaluation
- **Visualization**: Plotting utilities for analysis
## Installation
```bash
pip install -r requirements.txt
```
## Quick Start
### 1. Run Tests
```bash
python test_teacher.py
```
This verifies:
- Student learns with practice
- Student forgets over time
- Teacher explores actions
- Teacher exploits good actions
### 2. Train Teacher Agent
```bash
python train_teacher.py
```
Expected output:
```
======================================================================
TEACHER AGENT TRAINING
======================================================================
Iterations: 500
Evaluation tasks: 15
Action space: 30 actions
======================================================================
Iteration 0 | Student Acc: 0.267 | Avg Reward: 0.850 | Action: his-ea-N
Iteration 50 | Student Acc: 0.453 | Avg Reward: 1.120 | Action: sci-me-R
...
Iteration 500 | Student Acc: 0.812 | Avg Reward: 0.780 | Action: lit-ha-N
```
### 3. Generate Visualizations
```python
from train_teacher import train_teacher
from visualize import plot_learning_curves, plot_curriculum_heatmap, plot_action_distributions
# Train teacher
history, teacher, student = train_teacher(num_iterations=500)
# Generate plots
plot_learning_curves(history)
plot_curriculum_heatmap(history)
plot_action_distributions(teacher)
```
### 4. Compare with Baselines
```python
from train_teacher import train_teacher, train_baseline_random, train_baseline_fixed
from visualize import plot_comparison
# Train all strategies
history_teacher, _, _ = train_teacher(num_iterations=500, verbose=False)
history_random = train_baseline_random(num_iterations=500)
history_fixed = train_baseline_fixed(num_iterations=500)
# Compare
plot_comparison({
'teacher': history_teacher,
'random': history_random,
'fixed': history_fixed
})
```
## Architecture
### Components
1. **interfaces.py**: Shared data structures (Task, StudentState, TeacherAction) and ABC interfaces
2. **mock_student.py**: Student agent with learning (improves with practice) and forgetting (Ebbinghaus curve)
3. **mock_task_generator.py**: Simple task generator with 5 topics × 3 difficulties
4. **teacher_agent.py**: UCB bandit algorithm for selecting curriculum actions
5. **train_teacher.py**: Main training loop connecting all components
6. **test_teacher.py**: Unit tests for all components
7. **visualize.py**: Plotting utilities for analysis
### Action Space
Teacher selects from **30 actions**:
- 5 topics: history, science, literature, geography, current_events
- 3 difficulties: easy, medium, hard
- 2 options: new material or review
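The 30-action grid is just the cross product of these three dimensions. A minimal sketch of how it could be enumerated (the names below are illustrative, not the actual identifiers in `teacher_agent.py`):

```python
from itertools import product

# Illustrative enumeration of the 5 x 3 x 2 action space described above.
TOPICS = ["history", "science", "literature", "geography", "current_events"]
DIFFICULTIES = ["easy", "medium", "hard"]
MODES = ["new", "review"]

ACTIONS = [
    {"topic": t, "difficulty": d, "mode": m}
    for t, d, m in product(TOPICS, DIFFICULTIES, MODES)
]
print(len(ACTIONS))  # 30
```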
### Student Model
- **Learning**: Skill improves with practice: `new_skill = old_skill + learning_rate * difficulty_factor * (1 - old_skill)`
- **Forgetting**: Retention decays over time: `retention = exp(-forgetting_rate * time_since_practice)`
- **Effective Skill**: `effective_skill = base_skill * retention`
- **Accuracy**: `accuracy = 0.25 + 0.75 * effective_skill` (25% is random guessing on 4-choice MCQ)
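The four update rules above can be sketched as standalone functions (a minimal sketch; the actual `MockStudentAgent` may track per-topic skills and additional state):

```python
import math

def practice(base_skill, learning_rate=0.15, difficulty_factor=1.0):
    """One practice step: diminishing returns as skill approaches 1."""
    return base_skill + learning_rate * difficulty_factor * (1 - base_skill)

def effective_skill(base_skill, forgetting_rate, time_since_practice):
    """Ebbinghaus-style exponential retention decay."""
    retention = math.exp(-forgetting_rate * time_since_practice)
    return base_skill * retention

def accuracy(eff_skill):
    """Map skill to accuracy; the 0.25 floor is chance on a 4-choice MCQ."""
    return 0.25 + 0.75 * eff_skill
```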
### Teacher Algorithm
**UCB (Upper Confidence Bound)**:
```
UCB(a) = estimated_reward(a) + exploration_bonus × sqrt(log(total_pulls) / pulls(a))
```
- Balances exploration (trying new actions) vs exploitation (using known-good actions)
- The exploration bonus controls how aggressively under-tried actions are sampled (higher = more exploration)
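A minimal sketch of action selection under this formula, including the cold-start rule mentioned in Troubleshooting (function and variable names are illustrative; the real logic lives in `teacher_agent.py`):

```python
import math

def ucb_select(estimates, pulls, total_pulls, exploration_bonus=2.0):
    """Return the action maximizing the UCB score.

    estimates: dict action -> mean observed reward
    pulls:     dict action -> number of times the action was tried
    Untried actions (0 pulls) are selected first, avoiding division by zero.
    """
    best_action, best_score = None, float("-inf")
    for action, n in pulls.items():
        if n == 0:
            return action  # cold start: try every action at least once
        score = estimates[action] + exploration_bonus * math.sqrt(
            math.log(total_pulls) / n
        )
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

With equal reward estimates, the less-pulled action wins because its confidence interval is wider.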
### Reward Function
```
reward = improvement + difficulty_bonus + review_bonus + review_penalty
where:
- improvement = accuracy_after - accuracy_before
- difficulty_bonus = easy:0.5, medium:1.0, hard:2.0
- review_bonus = 1.0 if review and improvement > 0
- review_penalty = -0.5 if review and accuracy > 0.9 (wasted review)
```
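The spec above translates directly into code. A hedged sketch (it assumes the `accuracy > 0.9` check uses the pre-task accuracy, which the spec leaves implicit):

```python
DIFFICULTY_BONUS = {"easy": 0.5, "medium": 1.0, "hard": 2.0}

def compute_reward(acc_before, acc_after, difficulty, is_review):
    """Illustrative reward per the spec above; the real function is in train_teacher.py."""
    improvement = acc_after - acc_before
    difficulty_bonus = DIFFICULTY_BONUS[difficulty]
    review_bonus = 1.0 if is_review and improvement > 0 else 0.0
    # Assumes "accuracy > 0.9" refers to accuracy before the review task.
    review_penalty = -0.5 if is_review and acc_before > 0.9 else 0.0
    return improvement + difficulty_bonus + review_bonus + review_penalty
```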
## Expected Behavior
### Early Iterations (0-100)
- Teacher explores all topics/difficulties
- Tries mostly easy tasks (build foundation)
- High exploration, low exploitation
### Mid Iterations (100-300)
- Starts increasing difficulty
- Discovers which topics student struggles with
- Begins strategic reviewing
### Late Iterations (300-500)
- Mostly medium/hard tasks (student is skilled)
- Reviews topics just before forgetting threshold
- High exploitation of known-good curriculum
### Emergent Behaviors
- Teacher gives harder tasks as student improves
- Teacher reviews topics ~30-50 iterations after practice (optimal timing)
- Teacher specializes in topics student finds difficult
## Success Criteria
After training, you should see:
- ✅ Student reaches >70% accuracy by iteration 500
- ✅ Teacher discovers: easy tasks first → harder tasks later
- ✅ Teacher learns to review before forgetting
- ✅ Teacher reward stabilizes (not just random)
## File Structure
```
teacher_agent_dev/
├── interfaces.py           # Shared data structures and ABC interfaces
├── mock_student.py         # Mock student with learning + forgetting
├── mock_task_generator.py  # Simple task generator
├── teacher_agent.py        # MAIN: UCB bandit teacher algorithm
├── train_teacher.py        # Training loop
├── test_teacher.py         # Unit tests
├── visualize.py            # Plotting utilities
├── requirements.txt        # Dependencies
└── README.md # This file
```
## Customization
### Adjust Student Learning
```python
student = MockStudentAgent(
learning_rate=0.15, # How fast student learns (higher = faster)
forgetting_rate=0.05 # How fast student forgets (higher = faster)
)
```
### Adjust Teacher Exploration
```python
teacher = TeacherAgent(
exploration_bonus=2.0 # Higher = more exploration, Lower = more exploitation
)
```
### Add More Topics/Difficulties
Edit `mock_task_generator.py` to add more templates or modify `teacher_agent.py` to adjust action space.
## Troubleshooting
**Issue**: Student doesn't learn
- **Solution**: Increase `learning_rate` in MockStudentAgent
**Issue**: Teacher doesn't explore
- **Solution**: Increase `exploration_bonus` in TeacherAgent
**Issue**: Forgetting too fast/slow
- **Solution**: Adjust `forgetting_rate` in MockStudentAgent
**Issue**: Division by zero errors
- **Solution**: UCB handles cold start automatically (untried actions selected first)
## Next Steps
1. **Replace mock components**: When teammates finish the real student and task generator, swap out the mock components
2. **Tune hyperparameters**: Adjust learning_rate, forgetting_rate, exploration_bonus
3. **Experiment with algorithms**: Try different bandit algorithms (Thompson Sampling, ε-greedy)
4. **Add features**: More sophisticated reward functions, state representations, etc.
## License
MIT