Arc-Intelligence/ATLAS-8B-Thinking

Jarrodbarnes committed on Sep 11, 2025

Commit 125e3d7 (verified) · 1 parent: 2ab2667

Update README.md

Files changed (1)

README.md +2 -0

@@ -38,6 +38,8 @@ model-index:
 # ATLAS-8B-Thinking
+![ATLAS Banner](https://huggingface.co/Arc-Intelligence/ATLAS-8B-Thinking/resolve/main/ATLAS.jpg)
 **ATLAS-8B-Thinking** is a specialized teacher model developed by Arc Intelligence, designed to solve the core reliability problem in reinforcement learning for LLMs. Standard RL fine-tuning is often brittle, leading to performance degradation where new skills are learned at the expense of old ones.
 This model reframes the training process as one of **effective pedagogy**. Instead of just optimizing a student model, `ATLAS-8B-Thinking` first uses a lightweight **diagnostic probe** to assess the student's reasoning. Based on this diagnosis, it provides **adaptive guidance**—comprehensive help for struggling models and minimal intervention for capable ones. This "do no harm" approach ensures consistent capability improvement without the usual side effects of RL.
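
The README context lines above describe a two-stage flow: a diagnostic probe of the student's reasoning, followed by guidance whose depth depends on the diagnosis. A minimal sketch of that control flow is given below. It is an illustration only, not Arc Intelligence's implementation: the names `Diagnosis`, `probe_student`, and `guidance_for`, the token-overlap scoring, and the `0.8` threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Diagnosis:
    """Hypothetical result of the diagnostic probe."""
    score: float  # assumed scale: 0.0 (struggling) to 1.0 (capable)


def probe_student(student_attempt: str, reference: str) -> Diagnosis:
    """Toy stand-in for the probe: fraction of reference tokens the student produced."""
    attempt_tokens = set(student_attempt.lower().split())
    reference_tokens = set(reference.lower().split())
    if not reference_tokens:
        return Diagnosis(score=0.0)
    overlap = len(attempt_tokens & reference_tokens) / len(reference_tokens)
    return Diagnosis(score=overlap)


def guidance_for(diagnosis: Diagnosis, capable_threshold: float = 0.8) -> str:
    """Pick a guidance level: minimal for capable students, comprehensive otherwise."""
    if diagnosis.score >= capable_threshold:
        return "minimal"
    return "comprehensive"


if __name__ == "__main__":
    reference = "the answer is 42"
    for attempt in ("the answer is 42", "no idea"):
        diagnosis = probe_student(attempt, reference)
        print(attempt, "->", guidance_for(diagnosis))
```

In the actual model, both the probe and the guidance would presumably be generated by the teacher LLM rather than by a fixed rule; the sketch only shows the branch on the diagnosis.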