# Teacher Agent Development System
A complete teacher agent system for developing and testing meta-RL curriculum learning algorithms independently.
## Overview
This system provides:
- **Mock Student Agent**: Realistic student with learning + forgetting (Ebbinghaus curve)
- **Mock Task Generator**: Simple task generator with multiple topics and difficulties
- **Teacher Agent**: UCB (Upper Confidence Bound) bandit algorithm for curriculum sequencing
- **Training Loop**: Complete training system with evaluation
- **Visualization**: Plotting utilities for analysis
## Installation
```bash
pip install -r requirements.txt
```
## Quick Start
### 1. Run Tests
```bash
python test_teacher.py
```
This verifies:
- Student learns with practice
- Student forgets over time
- Teacher explores actions
- Teacher exploits good actions
### 2. Train Teacher Agent
```bash
python train_teacher.py
```
Expected output:
```
======================================================================
TEACHER AGENT TRAINING
======================================================================
Iterations: 500
Evaluation tasks: 15
Action space: 30 actions
======================================================================
Iteration 0 | Student Acc: 0.267 | Avg Reward: 0.850 | Action: his-ea-N
Iteration 50 | Student Acc: 0.453 | Avg Reward: 1.120 | Action: sci-me-R
...
Iteration 500 | Student Acc: 0.812 | Avg Reward: 0.780 | Action: lit-ha-N
```
### 3. Generate Visualizations
```python
from train_teacher import train_teacher
from visualize import plot_learning_curves, plot_curriculum_heatmap, plot_action_distributions

# Train teacher
history, teacher, student = train_teacher(num_iterations=500)

# Generate plots
plot_learning_curves(history)
plot_curriculum_heatmap(history)
plot_action_distributions(teacher)
```
### 4. Compare with Baselines
```python
from train_teacher import train_teacher, train_baseline_random, train_baseline_fixed
from visualize import plot_comparison
# Train all strategies
history_teacher, _, _ = train_teacher(num_iterations=500, verbose=False)
history_random = train_baseline_random(num_iterations=500)
history_fixed = train_baseline_fixed(num_iterations=500)
# Compare
plot_comparison({
    'teacher': history_teacher,
    'random': history_random,
    'fixed': history_fixed
})
```
## Architecture
### Components
1. **interfaces.py**: Shared data structures (Task, StudentState, TeacherAction) and ABC interfaces
2. **mock_student.py**: Student agent with learning (improves with practice) and forgetting (Ebbinghaus curve)
3. **mock_task_generator.py**: Simple task generator with 5 topics × 3 difficulties
4. **teacher_agent.py**: UCB bandit algorithm for selecting curriculum actions
5. **train_teacher.py**: Main training loop connecting all components
6. **test_teacher.py**: Unit tests for all components
7. **visualize.py**: Plotting utilities for analysis
### Action Space
Teacher selects from **30 actions**:
- 5 topics: history, science, literature, geography, current_events
- 3 difficulties: easy, medium, hard
- 2 options: new material or review
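The 5 × 3 × 2 combination can be enumerated directly. This is an illustrative sketch (the names `TOPICS`, `DIFFICULTIES`, and `MODES` are assumptions; the real definitions live in `teacher_agent.py`):

```python
from itertools import product

# Hypothetical enumeration of the 30-action space described above.
TOPICS = ["history", "science", "literature", "geography", "current_events"]
DIFFICULTIES = ["easy", "medium", "hard"]
MODES = ["new", "review"]  # new material or review

ACTIONS = list(product(TOPICS, DIFFICULTIES, MODES))
print(len(ACTIONS))  # 30
```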
### Student Model
- **Learning**: Skill improves with practice: `new_skill = old_skill + learning_rate * difficulty_factor * (1 - old_skill)`
- **Forgetting**: Retention decays over time: `retention = exp(-forgetting_rate * time_since_practice)`
- **Effective Skill**: `effective_skill = base_skill * retention`
- **Accuracy**: `accuracy = 0.25 + 0.75 * effective_skill` (25% is random guessing on 4-choice MCQ)
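The four formulas above compose into a small model. A minimal sketch (function names are illustrative, not the actual `mock_student.py` API):

```python
import math

def update_skill(old_skill, learning_rate, difficulty_factor):
    """One practice step: gains shrink as skill approaches 1 (diminishing returns)."""
    return old_skill + learning_rate * difficulty_factor * (1 - old_skill)

def retention(forgetting_rate, time_since_practice):
    """Ebbinghaus-style exponential decay of retention since last practice."""
    return math.exp(-forgetting_rate * time_since_practice)

def accuracy(base_skill, forgetting_rate, time_since_practice):
    """Accuracy on a 4-choice MCQ: 25% floor from random guessing."""
    effective_skill = base_skill * retention(forgetting_rate, time_since_practice)
    return 0.25 + 0.75 * effective_skill
```

With zero skill the student scores exactly 25% (chance), and accuracy decays toward that floor as time since practice grows.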
### Teacher Algorithm
**UCB (Upper Confidence Bound)**:
```
UCB(a) = estimated_reward(a) + exploration_bonus × sqrt(log(total_pulls) / pulls(a))
```
- Balances exploration (trying new actions) vs exploitation (using known-good actions)
- Exploration bonus controls adventurousness (higher = more exploration)
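A minimal sketch of that selection rule, including the cold-start handling mentioned under Troubleshooting (the function name and the dict-based bookkeeping are assumptions, not the actual `teacher_agent.py` interface):

```python
import math

def ucb_select(pulls, total_rewards, exploration_bonus=2.0):
    """Pick the action maximizing mean reward plus the UCB exploration term.

    pulls:         dict mapping action -> number of times tried
    total_rewards: dict mapping action -> sum of rewards received
    """
    total_pulls = sum(pulls.values())
    best_action, best_score = None, float("-inf")
    for action in pulls:
        if pulls[action] == 0:
            return action  # cold start: try every untried action first
        mean = total_rewards[action] / pulls[action]
        score = mean + exploration_bonus * math.sqrt(
            math.log(total_pulls) / pulls[action]
        )
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

Returning untried actions first is what avoids the `log`/division-by-zero problem: the exploration term is only ever computed once `pulls(a) > 0`.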
### Reward Function
```
reward = improvement + difficulty_bonus + review_bonus + review_penalty
where:
- improvement = accuracy_after - accuracy_before
- difficulty_bonus = easy:0.5, medium:1.0, hard:2.0
- review_bonus = 1.0 if review and improvement > 0
- review_penalty = -0.5 if review and accuracy > 0.9 (wasted review)
```
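The breakdown above translates to a short function. This sketch assumes the review penalty keys on pre-task accuracy (the description says only "accuracy > 0.9"), and the names are illustrative:

```python
# Per-difficulty bonus from the reward breakdown above.
DIFFICULTY_BONUS = {"easy": 0.5, "medium": 1.0, "hard": 2.0}

def compute_reward(acc_before, acc_after, difficulty, is_review):
    """Improvement plus difficulty bonus, plus review bonus/penalty terms."""
    improvement = acc_after - acc_before
    reward = improvement + DIFFICULTY_BONUS[difficulty]
    if is_review and improvement > 0:
        reward += 1.0   # review that actually helped
    if is_review and acc_before > 0.9:  # assumption: pre-task accuracy
        reward -= 0.5   # wasted review of an already-mastered topic
    return reward
```

Note how the terms push the teacher toward harder tasks (bigger difficulty bonus) while penalizing reviews of topics the student has already mastered.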
## Expected Behavior
### Early Iterations (0-100)
- Teacher explores all topics/difficulties
- Tries mostly easy tasks (build foundation)
- High exploration, low exploitation
### Mid Iterations (100-300)
- Starts increasing difficulty
- Discovers which topics student struggles with
- Begins strategic reviewing
### Late Iterations (300-500)
- Mostly medium/hard tasks (student is skilled)
- Reviews topics just before forgetting threshold
- High exploitation of known-good curriculum
### Emergent Behaviors
- Teacher gives harder tasks as student improves
- Teacher reviews topics ~30-50 iterations after practice (optimal timing)
- Teacher specializes in topics student finds difficult
## Success Criteria
After training, you should see:
- ✅ Student reaches >70% accuracy by iteration 500
- ✅ Teacher discovers: easy tasks first → harder tasks later
- ✅ Teacher learns to review before forgetting
- ✅ Teacher reward stabilizes (not just random)
## File Structure
```
teacher_agent_dev/
├── interfaces.py           # Shared data structures and ABC interfaces
├── mock_student.py         # Mock student with learning + forgetting
├── mock_task_generator.py  # Simple task generator
├── teacher_agent.py        # MAIN: UCB bandit teacher algorithm
├── train_teacher.py        # Training loop
├── test_teacher.py         # Unit tests
├── visualize.py            # Plotting utilities
├── requirements.txt        # Dependencies
└── README.md               # This file
```
## Customization
### Adjust Student Learning
```python
student = MockStudentAgent(
    learning_rate=0.15,   # How fast the student learns (higher = faster)
    forgetting_rate=0.05  # How fast the student forgets (higher = faster)
)
```
### Adjust Teacher Exploration
```python
teacher = TeacherAgent(
    exploration_bonus=2.0  # Higher = more exploration, lower = more exploitation
)
```
### Add More Topics/Difficulties
Edit `mock_task_generator.py` to add more templates or modify `teacher_agent.py` to adjust action space.
## Troubleshooting
**Issue**: Student doesn't learn
- **Solution**: Increase `learning_rate` in MockStudentAgent
**Issue**: Teacher doesn't explore
- **Solution**: Increase `exploration_bonus` in TeacherAgent
**Issue**: Forgetting too fast/slow
- **Solution**: Adjust `forgetting_rate` in MockStudentAgent
**Issue**: Division by zero errors
- **Solution**: UCB handles cold start automatically (untried actions selected first)
## Next Steps
1. **Replace mock components**: When teammates finish real student/task generator, swap out mock components
2. **Tune hyperparameters**: Adjust learning_rate, forgetting_rate, exploration_bonus
3. **Experiment with algorithms**: Try different bandit algorithms (Thompson sampling, ε-greedy)
4. **Add features**: More sophisticated reward functions, state representations, etc.
## License
MIT