update README.md
README.md
---
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- adaptive-teaching
- reinforcement-learning
- educational
- reasoning
datasets:
- Arc-Intelligence/Arc-ATLAS-Teach-v0
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

# ATLAS-Teach-8B-Instruct

A supervised fine-tuned teaching model that forms the foundation for Reinforcement Collaborative Learning (RCL). This checkpoint represents the initial teaching capability before reinforcement learning optimization.

## Model Details

### Architecture
- **Base Model**: Qwen/Qwen3-8B
- **Parameters**: 8B
- **Context Length**: 16,384 tokens
- **Training Stage**: Supervised Fine-tuning (SFT)

### Training Framework
- **Method**: Reinforcement Collaborative Learning (RCL), SFT phase
- **Hardware**: 4x H100 GPUs
- **Optimization**: DeepSpeed ZeRO-3
- **Precision**: BF16 (see the loading sketch below)

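Since the checkpoint was trained in BF16, it can be loaded directly in that precision. A minimal sketch; the `device_map="auto"` placement is a suggestion (it requires `accelerate`) and is not prescribed by this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the checkpoint in the precision it was trained in (BF16).
model = AutoModelForCausalLM.from_pretrained(
    "Arc-Intelligence/ATLAS-Teach-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # assumption: accelerate is installed; drop this to load on CPU
)
tokenizer = AutoTokenizer.from_pretrained("Arc-Intelligence/ATLAS-Teach-8B-Instruct")
```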
## Dataset

**Arc-Intelligence/Arc-ATLAS-Teach-v0** (a loading sketch follows the list)

- Custom dataset designed for adaptive teaching scenarios
- Formatted with RCL-specific teaching protocols
- Includes reasoning traces and solution demonstrations

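The dataset can be pulled with the `datasets` library. A brief sketch; the default configuration, split names, and column layout are assumptions, so check the dataset card before relying on them:

```python
from datasets import load_dataset

# Assumption: the default configuration; adjust if the dataset card lists named configs.
ds = load_dataset("Arc-Intelligence/Arc-ATLAS-Teach-v0")

print(ds)                      # available splits and column names
first_split = next(iter(ds))   # e.g. "train", depending on how the dataset is published
print(ds[first_split][0])      # inspect one teaching example
```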
## Adaptive Teaching Approach

The model follows a structured teaching protocol.

### Two-Pass System
1. **Student Diagnostic**: Brief capability assessment (≤500 tokens)
2. **Adaptive Response**: Tailored teaching based on the diagnosed understanding level

### Key Features
- Asymmetric reward structure (2x penalty for performance degradation; see the sketch after this list)
- Efficiency-aware teaching generation
- Solution tag enforcement (`<solution></solution>`)

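The asymmetric reward can be pictured with a small sketch. This is illustrative only and assumes the reward is the change in student performance, credited one-to-one for gains and penalized at 2x for drops; the actual RCL reward is defined in the training code:

```python
def teaching_reward(score_before: float, score_after: float,
                    degradation_penalty: float = 2.0) -> float:
    """Illustrative asymmetric reward: gains count 1:1, drops are penalized at 2x."""
    delta = score_after - score_before
    return delta if delta >= 0 else degradation_penalty * delta

# Raising a student's score by 0.25 earns +0.25, but lowering it by 0.25 costs -0.5.
print(teaching_reward(0.50, 0.75))  # 0.25
print(teaching_reward(0.75, 0.50))  # -0.5
```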
## Usage

### Basic Generation

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Arc-Intelligence/ATLAS-Teach-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Arc-Intelligence/ATLAS-Teach-8B-Instruct")

# Example prompt following RCL format
prompt = """Question: {problem_text}

Briefly describe:
1. What type of problem this is
2. The key concepts or steps needed
3. Any potential challenges you see

Your initial approach:"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.7,
    do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```

### Teaching Format

The model expects structured input for optimal teaching generation:
- Problem statement with a clear question
- Optional student approach for adaptive guidance
- Responses include `<solution>` tags for the final answer (see the sketch below)

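The sketch below shows how a second-pass prompt might be assembled and how the enforced `<solution>` tags can be parsed from a response. The exact prompt wording and the placeholder values are assumptions; only the overall structure (problem, optional student approach, `<solution>` tags) comes from this card:

```python
import re

problem_text = "Solve for x: 2x + 3 = 11"                 # hypothetical problem
student_approach = "I would subtract 3 and then divide."  # hypothetical first-pass output

# Hypothetical adaptive (second-pass) prompt built from the pieces above.
adaptive_prompt = f"""Question: {problem_text}

Student's initial approach:
{student_approach}

Give guidance matched to this approach and put the final answer in <solution></solution> tags."""

def extract_solution(response: str) -> str | None:
    """Return the text inside the enforced <solution></solution> tags, if present."""
    match = re.search(r"<solution>(.*?)</solution>", response, re.DOTALL)
    return match.group(1).strip() if match else None

print(extract_solution("Work through it step by step. <solution>x = 4</solution>"))  # x = 4
```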
## Training Configuration

Key hyperparameters from the SFT phase (a rough config sketch follows the list):
- Learning rate: 1e-5
- Batch size: 1 per device
- Mixed precision: BF16
- Gradient accumulation: tuned for the 4x H100 setup

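For orientation, these values map onto a Hugging Face `TrainingArguments` configuration roughly as follows. This is a sketch rather than the RCL training script; the output path, accumulation steps, and epoch count are assumptions, and the DeepSpeed ZeRO-3 config would be supplied separately:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="atlas-teach-8b-sft",    # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,      # assumption: the card only says it is tuned for 4 GPUs
    bf16=True,
    num_train_epochs=1,                 # assumption: not stated on the card
    # DeepSpeed ZeRO-3 is enabled by passing a deepspeed config, omitted here.
)
```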
## Limitations

- **Pre-RL Checkpoint**: This model has not undergone reinforcement learning optimization
- **Domain Scope**: Primarily trained on mathematical and reasoning problems
- **Token Limits**: Student diagnostic capped at 500 tokens for efficiency
- **Evaluation**: Full benchmark results pending RL phase completion

## Future Development

This SFT checkpoint serves as the foundation for:
- Reinforcement learning with adaptive teaching rewards
- Student model capability assessment integration
- Multi-turn teaching dialogue optimization

## License

Apache 2.0

## Repository

Training code and implementation details: [GitHub - RCL](https://github.com/Arc-Computer/RCL)