---
title: MentorFlow
emoji: 🎓
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.9.0
app_file: app.py
pinned: false
license: mit
hardware: gpu-t4
---

# MentorFlow - Teacher-Student RL System

A meta-curriculum reinforcement learning system where an AI Teacher Agent learns to select optimal educational tasks to train an AI Student Agent.

## 🚀 Features

- **Three Training Strategies**: Compare Random, Progressive, and Teacher-guided curricula
- **LM Student (DistilBERT)**: Real neural network learning with memory decay
- **GPU Support**: Fast training with CUDA acceleration
- **Interactive Comparison**: Visualize learning curves and performance metrics
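
As a rough illustration of the memory decay named in the feature list, the Ebbinghaus forgetting curve is commonly modeled as `R = exp(-t / S)`, where `t` is the time since the last review and `S` is a stability constant. The sketch below assumes that form; the function names and the `stability` and `floor` parameters are hypothetical and are not taken from `student_agent.py`.

```python
import math


def retention(elapsed, stability):
    """Fraction of a skill retained after `elapsed` time units.

    Assumes the exponential form R = exp(-elapsed / stability).
    """
    if stability <= 0:
        raise ValueError("stability must be positive")
    if elapsed < 0:
        raise ValueError("elapsed must be non-negative")
    return math.exp(-elapsed / stability)


def decayed_accuracy(base_accuracy, elapsed, stability, floor=0.0):
    """Accuracy on a topic after forgetting, decaying toward `floor`.

    `floor` is an illustrative lower bound (for example, chance level).
    """
    return floor + (base_accuracy - floor) * retention(elapsed, stability)
```

Under this assumption, a topic that is not revisited loses accuracy smoothly over time, which is what gives a curriculum a reason to return to earlier topics.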

## 📊 Usage

1. **Set Parameters**:
   - Iterations: Number of training iterations (50-500)
   - Seed: Random seed for reproducibility
   - Device: Choose GPU (cuda) or CPU

2. **Run Comparison**:
   - Click "Run Comparison" to start training
   - Monitor progress in the output text
   - View generated comparison plots

3. **Analyze Results**:
   - Learning curves show how each strategy improves
   - Difficult-question performance shows final accuracy on hard questions
   - Curriculum diversity shows topic coverage

## ⚡ Performance

- **With GPU**: ~5-10 minutes for 500 iterations
- **With CPU**: ~15-30 minutes for 500 iterations

## 📁 Project Structure

```
MentorFlow/
├── app.py                      # Gradio web interface
├── teacher_agent_dev/          # Teacher agent system
│   ├── compare_strategies.py  # Main comparison script
│   ├── teacher_agent.py       # UCB bandit teacher
│   └── ...
├── student_agent_dev/          # LM Student system
│   ├── student_agent.py       # DistilBERT student
│   └── ...
└── requirements_hf.txt        # Dependencies
```

## 🔧 Technical Details

- **Teacher Agent**: UCB (Upper Confidence Bound) multi-armed bandit
- **Student Agent**: DistilBERT with online learning
- **Memory Decay**: Ebbinghaus forgetting curve
- **Task Generator**: Procedural generation with 15 topics × 7 difficulties
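
To make the teacher's selection rule concrete, here is a minimal UCB1 sketch in which each arm stands for one task type. It is an illustration under stated assumptions, not the code in `teacher_agent.py`: the class name, the `exploration` constant, the reward definition (1.0 when the student improves), and the made-up reward probabilities in the demo are all hypothetical.

```python
import math
import random


class UCB1Teacher:
    """Hypothetical UCB1 bandit; each arm is one task type."""

    def __init__(self, n_arms, exploration=2.0):
        self.counts = [0] * n_arms      # times each arm was chosen
        self.values = [0.0] * n_arms    # running mean reward per arm
        self.exploration = exploration  # assumed exploration constant
        self.total = 0                  # total number of updates

    def select(self):
        # Try every arm once before applying the UCB formula.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        # Score = mean reward + confidence bonus that shrinks with use.
        scores = [
            mean + math.sqrt(self.exploration * math.log(self.total) / count)
            for mean, count in zip(self.values, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.total += 1
        # Incremental update of the running mean.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def run_demo(steps=2000, seed=0):
    """Simulate a teacher on three arms with made-up reward probabilities."""
    rng = random.Random(seed)
    reward_probs = [0.2, 0.5, 0.8]  # illustrative only
    teacher = UCB1Teacher(n_arms=len(reward_probs))
    for _ in range(steps):
        arm = teacher.select()
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        teacher.update(arm, reward)
    return teacher.counts
```

If every topic and difficulty pair were treated as its own arm, the 15 topics and 7 difficulties listed above would give `UCB1Teacher(n_arms=15 * 7)`; whether the actual teacher structures its arms this way is not stated here.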

## 📖 More Information

See the main repository for detailed documentation and development guides.