Jarrodbarnes committed · verified · Commit ba6b17f · 1 Parent(s): 717ad39

Update README.md

Files changed (1): README.md (+75 −76)
README.md CHANGED
@@ -1,118 +1,117 @@
  ---
  license: apache-2.0
- base_model: Qwen/Qwen3-8B
  tags:
- - adaptive-teaching
- - reinforcement-learning
- - educational
- - reasoning
  datasets:
- - Arc-Intelligence/Arc-ATLAS-Teach-v0
- language:
- - en
- library_name: transformers
- pipeline_tag: text-generation
  ---

- # ATLAS-Teach-8B-Instruct

- A supervised fine-tuned teaching model that forms the foundation for Reinforcement Collaborative Learning (RCL). This checkpoint represents the initial teaching capability before reinforcement learning optimization.

- ## Model Details

- ### Architecture
- - **Base Model**: Qwen/Qwen3-8B
- - **Parameters**: 8B
- - **Context Length**: 16,384 tokens
- - **Training Stage**: Supervised Fine-tuning (SFT)

- ### Training Framework
- - **Method**: Reinforcement Collaborative Learning (RCL) - SFT phase
- - **Hardware**: 4x H100 GPUs
- - **Optimization**: DeepSpeed ZeRO-3
- - **Precision**: BF16

- ## Dataset

- **Arc-Intelligence/Arc-ATLAS-Teach-v0**
- - Custom dataset designed for adaptive teaching scenarios
- - Formatted with RCL-specific teaching protocols
- - Includes reasoning traces and solution demonstrations

- ## Adaptive Teaching Approach

- The model follows a structured teaching protocol:

- ### Two-Pass System
- 1. **Student Diagnostic**: Brief capability assessment (≤500 tokens)
- 2. **Adaptive Response**: Tailored teaching based on diagnosed understanding level

- ### Key Features
- - Asymmetric reward structure (2x penalty for performance degradation)
- - Efficiency-aware teaching generation
- - Solution tag enforcement (`<solution></solution>`)

- ## Usage

- ### Basic Generation
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

- model = AutoModelForCausalLM.from_pretrained("Arc-Intelligence/ATLAS-Teach-8B-Instruct")
- tokenizer = AutoTokenizer.from_pretrained("Arc-Intelligence/ATLAS-Teach-8B-Instruct")
-
- # Example prompt following RCL format
- prompt = """Question: {problem_text}
-
- Briefly describe:
- 1. What type of problem this is
- 2. The key concepts or steps needed
- 3. Any potential challenges you see
-
- Your initial approach:"""
-
- inputs = tokenizer(prompt, return_tensors="pt")
  outputs = model.generate(
      **inputs,
-     max_new_tokens=2048,
      temperature=0.7,
      do_sample=True
  )
  response = tokenizer.decode(outputs[0], skip_special_tokens=True)
  ```

- ### Teaching Format
- The model expects structured input for optimal teaching generation:
- - Problem statement with clear question
- - Optional student approach for adaptive guidance
- - Responses include `<solution>` tags for final answers

- ## Training Configuration

- Key hyperparameters from SFT phase:
- - Learning rate: 1e-5
- - Batch size: Per-device batch size of 1
- - Mixed precision: BF16
- - Gradient accumulation: Optimized for 4 GPU setup

- ## Limitations

- - **Pre-RL Checkpoint**: This model has not undergone reinforcement learning optimization
- - **Domain Scope**: Primarily trained on mathematical and reasoning problems
- - **Token Limits**: Student diagnostic capped at 500 tokens for efficiency
- - **Evaluation**: Full benchmark results pending RL phase completion

- ## Future Development

- This SFT checkpoint serves as the foundation for:
- - Reinforcement learning with adaptive teaching rewards
- - Student model capability assessment integration
- - Multi-turn teaching dialogue optimization

- ## License

- Apache 2.0

- ## Repository

- Training code and implementation details: [GitHub - RCL](https://github.com/Arc-Computer/RCL)
  ---
  license: apache-2.0
+ language:
+ - en
  tags:
+ - supervised-fine-tuning
+ - teacher-model
+ - pedagogy
+ - reasoning
+ - sft
+ base_model: Qwen/Qwen3-8B
  datasets:
+ - Arc-Intelligence/Arc-ATLAS-Teach-v0
+ model-index:
+ - name: ATLAS-8B-Instruct
+   results: []
  ---

+ ![ATLAS Banner](https://huggingface.co/Arc-Intelligence/ATLAS-8B-Instruct/resolve/main/ATLAS.png)

+ # ATLAS-8B-Instruct

+ **ATLAS-8B-Instruct** is a specialized teaching model developed by Arc Intelligence. It is the result of the first phase of the [ATLAS Framework](https://github.com/Arc-Computer/ATLAS): Supervised Fine-Tuning (SFT).

+ This model serves as the crucial foundation for the final reinforcement learning teacher, `ATLAS-8B-Thinking`. It has been trained on the `Arc-ATLAS-Teach-v0` dataset to learn the formats and structures of effective pedagogy, including how to generate high-quality reasoning traces, explanations, and solution demonstrations.

+ Think of this model as having memorized the curriculum; it knows what good teaching looks like. It is the essential starting point before the RL phase teaches it *how to adapt* that teaching to individual students.

+ ## Model's Role in the ATLAS Framework

+ The ATLAS training pipeline is a two-stage process:

+ 1. **Phase 1: Supervised Fine-Tuning (SFT)** → This is the phase that produces **`ATLAS-8B-Instruct`**. It learns the core knowledge and teaching formats from a static dataset.
+ 2. **Phase 2: Reinforcement Learning (RL)** → This phase takes `ATLAS-8B-Instruct` as its starting point and trains it to become an adaptive teacher, resulting in the final `ATLAS-8B-Thinking` model.

+ This checkpoint is released for researchers who wish to replicate our work, build upon the SFT foundation, or experiment with the second-stage RL training.

+ ## How to Use

+ `ATLAS-8B-Instruct` is not a general-purpose chat model. It is designed to generate teaching content based on the structured format used in our dataset.

+ ### Basic Generation Example

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

+ model = AutoModelForCausalLM.from_pretrained(
+     "Arc-Intelligence/ATLAS-8B-Instruct",
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained("Arc-Intelligence/ATLAS-8B-Instruct")

+ # Example prompt following the SFT format
+ prompt = """Question: A farmer has 52 trees planted in a row over a length of 1850 meters. What is the distance between each tree?
+
+ Provide a step-by-step explanation to solve this problem."""

+ inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
  outputs = model.generate(
      **inputs,
+     max_new_tokens=512,
      temperature=0.7,
      do_sample=True
  )
  response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(response)
  ```
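One practical note on the example above: `model.generate` returns the prompt tokens followed by the completion, so the decoded string begins with the prompt itself. A small helper along these lines (hypothetical, not part of the ATLAS repository) isolates just the generated teaching content:

```python
def strip_prompt(decoded: str, prompt: str) -> str:
    """Remove the echoed prompt from a decoded generate() output.

    generate() returns prompt tokens plus completion tokens, so the
    decoded string normally starts with the original prompt text.
    """
    if decoded.startswith(prompt):
        return decoded[len(prompt):].lstrip()
    return decoded

# e.g. teaching_text = strip_prompt(response, prompt)
```

Equivalently, you can slice the output tensor at the prompt length, `outputs[0][inputs["input_ids"].shape[-1]:]`, before decoding.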

+ ### Continuing to RL Training

+ This model is the direct input for the second phase of the ATLAS training pipeline. To use this model as the base for RL training, follow the instructions in the main repository.

+ ```bash
+ # In the ATLAS repository, the RL script is configured
+ # to load an SFT checkpoint like this one.
+
+ # Run Phase 2: Reinforcement Learning (RL)
+ scripts/launch_with_server.sh 1 3 configs/run/teacher_rcl.yaml
+ ```
+
+ ## Training Details

+ - **Base Model:** [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)
+ - **Training Stage:** Supervised Fine-Tuning (SFT) only
+ - **Dataset:** [Arc-Intelligence/Arc-ATLAS-Teach-v0](https://huggingface.co/datasets/Arc-Intelligence/Arc-ATLAS-Teach-v0)
+ - **Context Length:** 8192 tokens
+ - **Hardware:** 4x H100 GPUs
+ - **Precision:** BF16
+ - **Framework:** DeepSpeed ZeRO-3
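For readers who want to reproduce a setup like the one listed above, a minimal DeepSpeed ZeRO stage-3 configuration might look like the sketch below. These are illustrative values only, not the project's actual config file; the real configs live in the GitHub repository.

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": "auto"
}
```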

+ ## Limitations
+
+ - **Pre-RL Checkpoint**: This model has not undergone the reinforcement learning optimization that teaches adaptive teaching. The full performance gains reported in our paper are only realized after the RL phase.
+ - **Domain Scope**: Primarily trained on the mathematical and reasoning problems present in the `Arc-ATLAS-Teach-v0` dataset.
+ - **Not for Chat**: The model is not intended for conversational use and performs best with prompts that match the SFT data format.

+ ## Citation

+ If you use the ATLAS framework or our models in your research, please cite our work:

+ ```bibtex
+ @misc{barnes2025atlas,
+   title={{ATLAS: Adaptive Teaching and Learning Alignment System for Reinforcement Learning}},
+   author={Jarrod Barnes and Aman Jaglan},
+   year={2025},
+   publisher={Arc Intelligence},
+   note={Technical Report},
+   url={https://github.com/Arc-Computer/ATLAS}
+ }
+ ```

+ ## Project Resources

+ - **GitHub Repository:** [https://github.com/Arc-Computer/ATLAS](https://github.com/Arc-Computer/ATLAS)
+ - **Final RL Model:** [ATLAS-8B-Thinking](https://huggingface.co/Arc-Intelligence/ATLAS-8B-Thinking)
+ - **Training Dataset:** [Arc-ATLAS-Teach-v0](https://huggingface.co/datasets/Arc-Intelligence/Arc-ATLAS-Teach-v0)