aman-jaglan committed on
Commit 717ad39 · verified · 1 Parent(s): 0dddd69

update README.md

Files changed (1)
  1. README.md +73 -30
README.md CHANGED
@@ -1,75 +1,118 @@
1
  ---
2
  license: apache-2.0
3
- ---
4
-
5
- ---
6
  base_model: Qwen/Qwen3-8B
7
  tags:
8
  - adaptive-teaching
9
  - reinforcement-learning
10
  - educational
 
11
  datasets:
12
  - Arc-Intelligence/Arc-ATLAS-Teach-v0
13
  language:
14
  - en
15
  library_name: transformers
 
16
  ---
17
 
18
  # ATLAS-Teach-8B-Instruct
19
 
20
- An adaptive teaching model trained using the Reinforcement Collaborative Learning (RCL) framework. This is the supervised fine-tuning (SFT) checkpoint before reinforcement learning.
21
 
22
  ## Model Details
23
 
 
24
  - **Base Model**: Qwen/Qwen3-8B
25
- - **Model Size**: 8B parameters
26
- - **Training Stage**: Supervised Fine-tuning (Pre-RL)
27
- - **Framework**: RCL (Reinforcement Collaborative Learning)
28
 
29
- ## Training Data
30
 
31
- Trained on `Arc-Intelligence/Arc-ATLAS-Teach-v0` dataset with RCL-specific formatting for adaptive teaching.
32
 
33
- ## Intended Use
34
 
35
- This model is designed for:
36
- - Adaptive teaching based on student capability assessment
37
- - Educational content generation
38
- - Problem-solving assistance with tailored explanations
39
-
40
- ## Training Configuration
41
 
42
- - **Hardware**: 8x H100 GPUs
43
- - **Framework**: RCL
44
- - **Mixed Precision**: BF16
45
 
46
- ## Adaptive Teaching Protocol
47
 
48
- The model implements a two-pass teaching approach:
49
-
50
- 1. **Diagnostic Probing**: Assesses student understanding with minimal interaction
51
- 2. **Adaptive Teaching**: Generates tailored teaching based on diagnosed capability
52
 
53
  ## Usage
54
 
 
55
  ```python
56
  from transformers import AutoModelForCausalLM, AutoTokenizer
57
 
58
  model = AutoModelForCausalLM.from_pretrained("Arc-Intelligence/ATLAS-Teach-8B-Instruct")
59
  tokenizer = AutoTokenizer.from_pretrained("Arc-Intelligence/ATLAS-Teach-8B-Instruct")
60
 
61
- # Format your input according to the RCL teaching protocol
62
- prompt = "Question: {your_question}\n\nProvide adaptive teaching:"
63
  inputs = tokenizer(prompt, return_tensors="pt")
64
- outputs = model.generate(**inputs)
65
  response = tokenizer.decode(outputs[0], skip_special_tokens=True)
66
  ```
67
 
68
  ## Limitations
69
 
70
- - This is a pre-RL checkpoint; the full RCL training includes an additional RL phase
71
- - Performance metrics on specific benchmarks are being evaluated
72
 
73
  ## License
74
 
75
- Apache 2.0
 
1
  ---
2
  license: apache-2.0
3
  base_model: Qwen/Qwen3-8B
4
  tags:
5
  - adaptive-teaching
6
  - reinforcement-learning
7
  - educational
8
+ - reasoning
9
  datasets:
10
  - Arc-Intelligence/Arc-ATLAS-Teach-v0
11
  language:
12
  - en
13
  library_name: transformers
14
+ pipeline_tag: text-generation
15
  ---
16
 
17
  # ATLAS-Teach-8B-Instruct
18
 
19
+ A supervised fine-tuned teaching model that forms the foundation for Reinforcement Collaborative Learning (RCL). This checkpoint represents the initial teaching capability before reinforcement learning optimization.
20
 
21
  ## Model Details
22
 
23
+ ### Architecture
24
  - **Base Model**: Qwen/Qwen3-8B
25
+ - **Parameters**: 8B
26
+ - **Context Length**: 16,384 tokens
27
+ - **Training Stage**: Supervised Fine-tuning (SFT)
28
 
29
+ ### Training Framework
30
+ - **Method**: Reinforcement Collaborative Learning (RCL) - SFT phase
31
+ - **Hardware**: 4x H100 GPUs
32
+ - **Optimization**: DeepSpeed ZeRO-3
33
+ - **Precision**: BF16
34
 
35
+ ## Dataset
36
 
37
+ **Arc-Intelligence/Arc-ATLAS-Teach-v0**
38
+ - Custom dataset designed for adaptive teaching scenarios
39
+ - Formatted with RCL-specific teaching protocols
40
+ - Includes reasoning traces and solution demonstrations
41
 
42
+ ## Adaptive Teaching Approach
43
 
44
+ The model follows a structured teaching protocol:
45
 
46
+ ### Two-Pass System
47
+ 1. **Student Diagnostic**: Brief capability assessment (≤500 tokens)
48
+ 2. **Adaptive Response**: Tailored teaching based on diagnosed understanding level
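
The two passes listed above can be sketched as a small driver. This is an illustration under stated assumptions, not the RCL implementation: `student_fn` and `teacher_fn` are hypothetical callables of the form `(prompt, max_new_tokens) -> str` that would wrap `model.generate`, and the README does not say which model answers the diagnostic prompt. The diagnostic template copies the prompt from the Usage section; the teaching template is illustrative wording only.

```python
from typing import Callable, Dict

# Cap stated in the README for the diagnostic pass.
DIAGNOSTIC_MAX_TOKENS = 500

# Copied from the Usage section of this README.
DIAGNOSTIC_TEMPLATE = (
    "Question: {problem_text}\n\n"
    "Briefly describe:\n"
    "1. What type of problem this is\n"
    "2. The key concepts or steps needed\n"
    "3. Any potential challenges you see\n\n"
    "Your initial approach:"
)

# Assumption: illustrative wording, not the documented RCL teaching format.
TEACHING_TEMPLATE = (
    "Question: {problem_text}\n\n"
    "Student approach: {student_approach}\n\n"
    "Provide adaptive teaching:"
)

# Hypothetical signature: (prompt, max_new_tokens) -> generated text.
GenerateFn = Callable[[str, int], str]


def two_pass_teach(
    problem_text: str,
    student_fn: GenerateFn,
    teacher_fn: GenerateFn,
    teaching_max_tokens: int = 2048,
) -> Dict[str, str]:
    """Run the diagnostic pass, then the adaptive teaching pass."""
    # Pass 1: short diagnostic, limited to 500 new tokens.
    diagnostic_prompt = DIAGNOSTIC_TEMPLATE.format(problem_text=problem_text)
    student_approach = student_fn(diagnostic_prompt, DIAGNOSTIC_MAX_TOKENS)

    # Pass 2: teaching conditioned on the diagnosed approach.
    teaching_prompt = TEACHING_TEMPLATE.format(
        problem_text=problem_text, student_approach=student_approach
    )
    teaching = teacher_fn(teaching_prompt, teaching_max_tokens)

    return {"student_approach": student_approach, "teaching": teaching}
```

Passing the two generators as callables keeps the sketch independent of how the models are loaded, so it can be exercised with stub functions before any weights are downloaded.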
49
 
50
+ ### Key Features
51
+ - Asymmetric reward structure (2x penalty for performance degradation)
52
+ - Efficiency-aware teaching generation
53
+ - Solution tag enforcement (`<solution></solution>`)
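
One way to read the "2x penalty for performance degradation" item is as a reward on the change in student score, where a drop counts double. The README states only the 2x factor; the score-delta form, the function name, and the parameter names below are assumptions for illustration, and the efficiency term is not modeled.

```python
def asymmetric_reward(
    baseline_score: float,
    taught_score: float,
    degradation_penalty: float = 2.0,
) -> float:
    """Reward the change in student score, penalizing drops more heavily.

    Assumption: scores are comparable numbers (for example accuracy in
    [0, 1]) measured before and after teaching.
    """
    delta = taught_score - baseline_score
    if delta < 0:
        # Teaching made the student worse: scale the loss by the penalty factor.
        return degradation_penalty * delta
    return delta
```

For example, improving from 0.5 to 0.75 gives a reward of 0.25, while dropping from 0.75 to 0.5 gives -0.5.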
54
 
55
  ## Usage
56
 
57
+ ### Basic Generation
58
  ```python
59
  from transformers import AutoModelForCausalLM, AutoTokenizer
60
 
61
  model = AutoModelForCausalLM.from_pretrained("Arc-Intelligence/ATLAS-Teach-8B-Instruct")
62
  tokenizer = AutoTokenizer.from_pretrained("Arc-Intelligence/ATLAS-Teach-8B-Instruct")
63
 
64
+ # Example prompt following RCL format
65
+ prompt = """Question: {problem_text}
66
+
67
+ Briefly describe:
68
+ 1. What type of problem this is
69
+ 2. The key concepts or steps needed
70
+ 3. Any potential challenges you see
71
+
72
+ Your initial approach:"""
73
+
74
  inputs = tokenizer(prompt, return_tensors="pt")
75
+ outputs = model.generate(
76
+ **inputs,
77
+ max_new_tokens=2048,
78
+ temperature=0.7,
79
+ do_sample=True
80
+ )
81
  response = tokenizer.decode(outputs[0], skip_special_tokens=True)
82
  ```
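
In the snippet above, `{problem_text}` is a placeholder that still has to be filled in, and `outputs[0]` holds the prompt tokens followed by the generated continuation. A minimal sketch of both steps follows; the helper names `build_prompt` and `continuation_ids` are hypothetical and not part of the model's API.

```python
from typing import Sequence

PROMPT_TEMPLATE = """Question: {problem_text}

Briefly describe:
1. What type of problem this is
2. The key concepts or steps needed
3. Any potential challenges you see

Your initial approach:"""


def build_prompt(problem_text: str) -> str:
    """Fill the README prompt template with a concrete problem."""
    return PROMPT_TEMPLATE.format(problem_text=problem_text)


def continuation_ids(output_ids: Sequence[int], prompt_length: int) -> Sequence[int]:
    """Drop the echoed prompt tokens and keep only the generated ones."""
    return output_ids[prompt_length:]
```

With `transformers`, `prompt_length` would be `inputs["input_ids"].shape[-1]`, and the result of `continuation_ids(outputs[0], prompt_length)` can be passed to `tokenizer.decode(..., skip_special_tokens=True)` so that the response does not repeat the prompt.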
83
 
84
+ ### Teaching Format
85
+ The model expects structured input for optimal teaching generation:
86
+ - Problem statement with clear question
87
+ - Optional student approach for adaptive guidance
88
+ - Responses include `<solution>` tags for final answers
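
Since final answers are wrapped in `<solution>` tags, a response can be post-processed with a small parser. The sketch below uses a regular expression; taking the last tagged span when several appear is an assumption, not documented model behavior, and `extract_solution` is a hypothetical helper name.

```python
import re
from typing import Optional

# Non-greedy match so that each <solution>...</solution> pair is captured separately.
_SOLUTION_RE = re.compile(r"<solution>(.*?)</solution>", re.DOTALL)


def extract_solution(response: str) -> Optional[str]:
    """Return the text inside the last <solution> block, or None if absent."""
    matches = _SOLUTION_RE.findall(response)
    if not matches:
        return None
    return matches[-1].strip()
```

Returning `None` for untagged responses lets the caller distinguish a missing answer from an empty one.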
89
+
90
+ ## Training Configuration
91
+
92
+ Key hyperparameters from SFT phase:
93
+ - Learning rate: 1e-5
94
+ - Batch size: Per-device batch size of 1
95
+ - Mixed precision: BF16
96
+ - Gradient accumulation: Optimized for 4 GPU setup
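
The hyperparameters above fix the per-device batch size and the GPU count but not the number of gradient accumulation steps. As a sketch of how the three combine into the effective batch size per optimizer step (the value `grad_accum_steps=8` used in the example is hypothetical and does not come from this README):

```python
def effective_batch_size(
    per_device_batch_size: int,
    num_gpus: int,
    grad_accum_steps: int,
) -> int:
    """Number of samples contributing to one optimizer step."""
    return per_device_batch_size * num_gpus * grad_accum_steps


# Per-device batch size 1 and 4 GPUs are from the README;
# grad_accum_steps=8 is a made-up value for illustration.
example = effective_batch_size(per_device_batch_size=1, num_gpus=4, grad_accum_steps=8)
```

With these example values, each optimizer step would see 1 × 4 × 8 = 32 samples.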
97
+
98
  ## Limitations
99
 
100
+ - **Pre-RL Checkpoint**: This model has not undergone reinforcement learning optimization
101
+ - **Domain Scope**: Primarily trained on mathematical and reasoning problems
102
+ - **Token Limits**: Student diagnostic capped at 500 tokens for efficiency
103
+ - **Evaluation**: Full benchmark results pending RL phase completion
104
+
105
+ ## Future Development
106
+
107
+ This SFT checkpoint serves as the foundation for:
108
+ - Reinforcement learning with adaptive teaching rewards
109
+ - Student model capability assessment integration
110
+ - Multi-turn teaching dialogue optimization
111
 
112
  ## License
113
 
114
+ Apache 2.0
115
+
116
+ ## Repository
117
+
118
+ Training code and implementation details: [GitHub - RCL](https://github.com/Arc-Computer/RCL)