Jarrodbarnes committed · verified · Commit ba6b17f · 1 Parent(s): 717ad39

Update README.md

Files changed (1): README.md (+75 −76)
README.md CHANGED
@@ -1,118 +1,117 @@
  ---
  license: apache-2.0
- base_model: Qwen/Qwen3-8B
  tags:
- - adaptive-teaching
- - reinforcement-learning
- - educational
- - reasoning
  datasets:
- - Arc-Intelligence/Arc-ATLAS-Teach-v0
- language:
- - en
- library_name: transformers
- pipeline_tag: text-generation
  ---

- # ATLAS-Teach-8B-Instruct

- A supervised fine-tuned teaching model that forms the foundation for Reinforcement Collaborative Learning (RCL). This checkpoint represents the initial teaching capability before reinforcement learning optimization.

- ## Model Details

- ### Architecture
- - **Base Model**: Qwen/Qwen3-8B
- - **Parameters**: 8B
- - **Context Length**: 16,384 tokens
- - **Training Stage**: Supervised Fine-tuning (SFT)

- ### Training Framework
- - **Method**: Reinforcement Collaborative Learning (RCL) - SFT phase
- - **Hardware**: 4x H100 GPUs
- - **Optimization**: DeepSpeed ZeRO-3
- - **Precision**: BF16

- ## Dataset

- **Arc-Intelligence/Arc-ATLAS-Teach-v0**
- - Custom dataset designed for adaptive teaching scenarios
- - Formatted with RCL-specific teaching protocols
- - Includes reasoning traces and solution demonstrations

- ## Adaptive Teaching Approach

- The model follows a structured teaching protocol:

- ### Two-Pass System
- 1. **Student Diagnostic**: Brief capability assessment (≤500 tokens)
- 2. **Adaptive Response**: Tailored teaching based on diagnosed understanding level

- ### Key Features
- - Asymmetric reward structure (2x penalty for performance degradation)
- - Efficiency-aware teaching generation
- - Solution tag enforcement (`<solution></solution>`)

- ## Usage

- ### Basic Generation
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

- model = AutoModelForCausalLM.from_pretrained("Arc-Intelligence/ATLAS-Teach-8B-Instruct")
- tokenizer = AutoTokenizer.from_pretrained("Arc-Intelligence/ATLAS-Teach-8B-Instruct")
-
- # Example prompt following RCL format
- prompt = """Question: {problem_text}
-
- Briefly describe:
- 1. What type of problem this is
- 2. The key concepts or steps needed
- 3. Any potential challenges you see
-
- Your initial approach:"""
-
- inputs = tokenizer(prompt, return_tensors="pt")
  outputs = model.generate(
      **inputs,
-     max_new_tokens=2048,
      temperature=0.7,
      do_sample=True
  )
  response = tokenizer.decode(outputs[0], skip_special_tokens=True)
  ```

- ### Teaching Format
- The model expects structured input for optimal teaching generation:
- - Problem statement with clear question
- - Optional student approach for adaptive guidance
- - Responses include `<solution>` tags for final answers

- ## Training Configuration

- Key hyperparameters from SFT phase:
- - Learning rate: 1e-5
- - Batch size: Per-device batch size of 1
- - Mixed precision: BF16
- - Gradient accumulation: Optimized for 4 GPU setup

- ## Limitations

- - **Pre-RL Checkpoint**: This model has not undergone reinforcement learning optimization
- - **Domain Scope**: Primarily trained on mathematical and reasoning problems
- - **Token Limits**: Student diagnostic capped at 500 tokens for efficiency
- - **Evaluation**: Full benchmark results pending RL phase completion

- ## Future Development

- This SFT checkpoint serves as the foundation for:
- - Reinforcement learning with adaptive teaching rewards
- - Student model capability assessment integration
- - Multi-turn teaching dialogue optimization

- ## License

- Apache 2.0

- ## Repository

- Training code and implementation details: [GitHub - RCL](https://github.com/Arc-Computer/RCL)
  ---
  license: apache-2.0
+ language:
+ - en
  tags:
+ - supervised-fine-tuning
+ - teacher-model
+ - pedagogy
+ - reasoning
+ - sft
+ base_model: Qwen/Qwen3-8B
  datasets:
+ - Arc-Intelligence/Arc-ATLAS-Teach-v0
+ model-index:
+ - name: ATLAS-8B-Instruct
+   results: []
  ---

+ ![ATLAS Banner](https://huggingface.co/Arc-Intelligence/ATLAS-8B-Instruct/resolve/main/ATLAS.png)

+ # ATLAS-8B-Instruct

+ **ATLAS-8B-Instruct** is a specialized teaching model developed by Arc Intelligence. It is the result of the first phase of the [ATLAS Framework](https://github.com/Arc-Computer/ATLAS): Supervised Fine-Tuning (SFT).

+ This model serves as the crucial foundation for the final reinforcement learning teacher, `ATLAS-8B-Thinking`. It has been trained on the `Arc-ATLAS-Teach-v0` dataset to learn the formats and structures of effective pedagogy, including how to generate high-quality reasoning traces, explanations, and solution demonstrations.

+ Think of this model as having memorized the curriculum; it knows what good teaching looks like. It is the essential starting point before the RL phase teaches it *how to adapt* that teaching to individual students.

+ ## Model's Role in the ATLAS Framework

+ The ATLAS training pipeline is a two-stage process:

+ 1. **Phase 1: Supervised Fine-Tuning (SFT)** → This is the phase that produces **`ATLAS-8B-Instruct`**. It learns the core knowledge and teaching formats from a static dataset.
+ 2. **Phase 2: Reinforcement Learning (RL)** → This phase takes `ATLAS-8B-Instruct` as its starting point and trains it to become an adaptive teacher, resulting in the final `ATLAS-8B-Thinking` model.

+ This checkpoint is released for researchers who wish to replicate our work, build upon the SFT foundation, or experiment with the second-stage RL training.

+ ## How to Use

+ `ATLAS-8B-Instruct` is not a general-purpose chat model. It is designed to generate teaching content based on the structured format used in our dataset.

+ ### Basic Generation Example

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

+ model = AutoModelForCausalLM.from_pretrained(
+     "Arc-Intelligence/ATLAS-8B-Instruct",
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained("Arc-Intelligence/ATLAS-8B-Instruct")

+ # Example prompt following the SFT format
+ prompt = """Question: A farmer has 52 trees planted in a row over a length of 1850 meters. What is the distance between each tree?
+
+ Provide a step-by-step explanation to solve this problem."""

+ inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
  outputs = model.generate(
      **inputs,
+     max_new_tokens=512,
      temperature=0.7,
      do_sample=True
  )
  response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(response)
  ```
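One practical note on the example above: `model.generate` returns the prompt tokens followed by the completion, so the decoded string begins with the prompt itself. A small helper along these lines (hypothetical, not part of the ATLAS repository) isolates just the generated teaching content:

```python
def strip_prompt(decoded: str, prompt: str) -> str:
    """Remove the echoed prompt from a decoded generate() output.

    generate() returns prompt tokens plus completion tokens, so the
    decoded string normally starts with the original prompt text.
    """
    if decoded.startswith(prompt):
        return decoded[len(prompt):].lstrip()
    return decoded

# e.g. teaching_text = strip_prompt(response, prompt)
```

Equivalently, you can slice the output tensor at the prompt length, `outputs[0][inputs["input_ids"].shape[-1]:]`, before decoding.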

+ ### Continuing to RL Training

+ This model is the direct input for the second phase of the ATLAS training pipeline. To use this model as the base for RL training, follow the instructions in the main repository.

+ ```bash
+ # In the ATLAS repository, the RL script is configured
+ # to load an SFT checkpoint like this one.
+
+ # Run Phase 2: Reinforcement Learning (RL)
+ scripts/launch_with_server.sh 1 3 configs/run/teacher_rcl.yaml
+ ```
+
+ ## Training Details

+ - **Base Model:** [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)
+ - **Training Stage:** Supervised Fine-Tuning (SFT) only
+ - **Dataset:** [Arc-Intelligence/Arc-ATLAS-Teach-v0](https://huggingface.co/datasets/Arc-Intelligence/Arc-ATLAS-Teach-v0)
+ - **Context Length:** 8192 tokens
+ - **Hardware:** 4x H100 GPUs
+ - **Precision:** BF16
+ - **Framework:** DeepSpeed ZeRO-3
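For readers who want to reproduce a setup like the one listed above, a minimal DeepSpeed ZeRO stage-3 configuration might look like the sketch below. These are illustrative values only, not the project's actual config file; the real configs live in the GitHub repository.

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": "auto"
}
```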

+ ## Limitations
+
+ - **Pre-RL Checkpoint**: This model has not undergone the reinforcement learning optimization that teaches adaptive teaching. The full performance gains reported in our paper are only realized after the RL phase.
+ - **Domain Scope**: Primarily trained on the mathematical and reasoning problems present in the `Arc-ATLAS-Teach-v0` dataset.
+ - **Not for Chat**: The model is not intended for conversational use and performs best with prompts that match the SFT data format.

+ ## Citation

+ If you use the ATLAS framework or our models in your research, please cite our work:

+ ```bibtex
+ @misc{barnes2025atlas,
+   title={{ATLAS: Adaptive Teaching and Learning Alignment System for Reinforcement Learning}},
+   author={Jarrod Barnes and Aman Jaglan},
+   year={2025},
+   publisher={Arc Intelligence},
+   note={Technical Report},
+   url={https://github.com/Arc-Computer/ATLAS}
+ }
+ ```

+ ## Project Resources

+ - **GitHub Repository:** [https://github.com/Arc-Computer/ATLAS](https://github.com/Arc-Computer/ATLAS)
+ - **Final RL Model:** [ATLAS-8B-Thinking](https://huggingface.co/Arc-Intelligence/ATLAS-8B-Thinking)
+ - **Training Dataset:** [Arc-ATLAS-Teach-v0](https://huggingface.co/datasets/Arc-Intelligence/Arc-ATLAS-Teach-v0)