Add library_name and pipeline_tag metadata
#1
by nielsr HF Staff - opened

README.md CHANGED

@@ -1,7 +1,9 @@
 ---
-license: apache-2.0
 base_model:
 - Qwen/Qwen3-32B
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
 
 [](https://github.com/LianjiaTech/astra)
@@ -14,6 +16,8 @@ base_model:
 
 The **ASTRA-32B-Thinking-v1** model is derived from [**Qwen3-32B**](https://huggingface.co/Qwen/Qwen3-32B) and specifically optimized for multi-step, tool-augmented tasks, with enhanced agentic capabilities in complex tool use and structured reasoning.
 
+The model was introduced in the paper [ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas](https://huggingface.co/papers/2601.21558).
+
 We also provide a **14B** variant [**ASTRA-14B-Thinking-v1**](https://huggingface.co/Emperorizzis/ASTRAL-14B-Thinking-v1).
 
 
@@ -69,4 +73,4 @@ We then conduct **multi-turn**, **tool-integrated** Reinforcement Learning with
 }
 ```
 
-> **Note**: Although the model was trained with bf16 precision, verl saves checkpoints in float32 by default, and we did not change this setting.
+> **Note**: Although the model was trained with bf16 precision, verl saves checkpoints in float32 by default, and we did not change this setting.