Instructions to use Emperorizzis/ASTRA-14B-Thinking-v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Emperorizzis/ASTRA-14B-Thinking-v1 with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="Emperorizzis/ASTRA-14B-Thinking-v1")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    pipe(messages)

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("Emperorizzis/ASTRA-14B-Thinking-v1")
    model = AutoModelForCausalLM.from_pretrained("Emperorizzis/ASTRA-14B-Thinking-v1")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    # Decode only the newly generated tokens, skipping the prompt
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Emperorizzis/ASTRA-14B-Thinking-v1 with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "Emperorizzis/ASTRA-14B-Thinking-v1"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "Emperorizzis/ASTRA-14B-Thinking-v1",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker
docker model run hf.co/Emperorizzis/ASTRA-14B-Thinking-v1
- SGLang
How to use Emperorizzis/ASTRA-14B-Thinking-v1 with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "Emperorizzis/ASTRA-14B-Thinking-v1" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "Emperorizzis/ASTRA-14B-Thinking-v1",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "Emperorizzis/ASTRA-14B-Thinking-v1" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "Emperorizzis/ASTRA-14B-Thinking-v1",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
- Docker Model Runner
How to use Emperorizzis/ASTRA-14B-Thinking-v1 with Docker Model Runner:
docker model run hf.co/Emperorizzis/ASTRA-14B-Thinking-v1
ASTRA-14B-Thinking-v1
Model Description
The ASTRA-14B-Thinking-v1 model is derived from Qwen3-14B and specifically optimized for multi-step, tool-augmented tasks, with enhanced agentic capabilities in complex tool use and structured reasoning.
This model was introduced in the paper ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas. ASTRA is a fully automated end-to-end framework for training tool-augmented language model agents via scalable data synthesis and verifiable reinforcement learning.
We also provide a 32B variant, ASTRA-32B-Thinking-v1.
Model Performances
ASTRA-14B-Thinking-v1 achieves state-of-the-art performance on the BFCL-V3 multi-turn subset among models of comparable scale.
[Figure: results on the BFCL-V3 multi-turn subset]

Data Curation
The training data is built upon two core pillars of automation:
1. Tool-Grounded SFT Data
- Key Feature: We constructed an extensive tool pool from 1,585 MCP servers, encompassing 19,036 tools across 41 domains. The data pipeline analyzes schema-level dependencies to generate executable tool-chains, ensuring that the synthesized trajectories are realistic and parameter-satisfiable.
- Sample Data: ASTRA-SFT-1k
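
The idea of deriving executable, parameter-satisfiable tool-chains from schema-level dependencies can be illustrated with a small sketch. It assumes a hypothetical, simplified tool schema (a `Tool` with `name`, `inputs`, and `outputs` given as sets of parameter names) and treats a call as satisfiable when all of its inputs are already available. The actual ASTRA pipeline and the MCP schema format are not shown in this card, so every name below is illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical, simplified tool schema (not the real MCP format)."""
    name: str
    inputs: frozenset   # parameter names the tool requires
    outputs: frozenset  # parameter names the tool produces


def executable_chains(tools, initial_params, max_len=3):
    """Enumerate tool orderings in which every call is parameter-satisfiable.

    A call is satisfiable when all of its inputs are already available,
    either from the initial parameters or from earlier tool outputs.
    """
    chains = []

    def extend(chain, available):
        if chain:
            chains.append([t.name for t in chain])
        if len(chain) == max_len:
            return
        for tool in tools:
            if tool in chain:
                continue  # use each tool at most once per chain
            if tool.inputs <= available:
                extend(chain + [tool], available | tool.outputs)

    extend([], frozenset(initial_params))
    return chains


# Illustrative tools: each one consumes what the previous one produces.
tools = [
    Tool("geocode", frozenset({"city"}), frozenset({"lat", "lon"})),
    Tool("weather", frozenset({"lat", "lon"}), frozenset({"forecast"})),
    Tool("summarize", frozenset({"forecast"}), frozenset({"summary"})),
]

chains = executable_chains(tools, initial_params={"city"})
for chain in chains:
    print(" -> ".join(chain))
```

In this toy setup `weather` can never appear before `geocode`, because its `lat` and `lon` inputs only become available after `geocode` has run. That is the sense in which a dependency-aware chain is parameter-satisfiable.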
2. Automated Verifiable Environments Synthesis
- Key Feature: To support robust reinforcement learning, we synthesize fully verifiable environments implemented in Python. These environments are validated via sandboxed execution, providing multi-turn, step-wise verifiable training signals for reinforcement learning.
- Sample Data: ASTRA-RL-1k
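
To make "step-wise verifiable" concrete, here is a minimal sketch of a Python environment whose tool calls can be checked deterministically after each step. The `BankEnv` class, its tools, and the `run_and_verify` helper are hypothetical and are not taken from the ASTRA code base. Sandboxing is also omitted here, although the real environments are validated via sandboxed execution.

```python
class BankEnv:
    """Hypothetical toy environment: the state is a dict of account balances."""

    def __init__(self):
        self.balances = {"alice": 100, "bob": 20}

    # Tools the agent may call.
    def get_balance(self, account):
        return self.balances[account]

    def transfer(self, src, dst, amount):
        if amount <= 0 or self.balances[src] < amount:
            return {"ok": False, "error": "invalid amount"}
        self.balances[src] -= amount
        self.balances[dst] += amount
        return {"ok": True}


def run_and_verify(env, trajectory, checks):
    """Execute tool calls in order and score each step with a deterministic check.

    Returns one reward per step: 1.0 if that step's check passes, else 0.0.
    """
    rewards = []
    for (tool, kwargs), check in zip(trajectory, checks):
        result = getattr(env, tool)(**kwargs)
        rewards.append(1.0 if check(env, result) else 0.0)
    return rewards


# Each check inspects the tool result and the environment state after the step.
checks = [
    lambda env, result: result == 100,
    lambda env, result: result["ok"] and env.balances == {"alice": 70, "bob": 50},
]

good = [
    ("get_balance", {"account": "alice"}),
    ("transfer", {"src": "alice", "dst": "bob", "amount": 30}),
]
bad = [
    ("get_balance", {"account": "alice"}),
    ("transfer", {"src": "alice", "dst": "bob", "amount": 500}),
]

good_rewards = run_and_verify(BankEnv(), good, checks)
bad_rewards = run_and_verify(BankEnv(), bad, checks)
print(good_rewards, bad_rewards)
```

Because the checks depend only on the tool result and the environment state, the same trajectory always receives the same per-step rewards.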
Training Process
The model is trained in two sequential stages to enhance complex agentic decision-making:
Supervised Fine-Tuning (SFT)
Before RL, we cold-start the model with high-quality, multi-turn tool-use trajectories. This stage establishes strong behavioral priors for tool-calling formats, long-context understanding, and complex task planning, while ensuring tool diversity and improving coverage of real-world scenarios.
Reinforcement Learning (RL)
We then conduct multi-turn, tool-integrated Reinforcement Learning with Verifiable Rewards (RLVR). Training uses Adaptive Batch Filling to improve optimization stability and data utilization, and adopts batch-level token-loss averaging for more stable and efficient optimization. At each step, actions are executed in a code sandbox and deterministically verified to produce reliable rewards.
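
The card does not give the exact loss formula. A common reading of batch-level token-loss averaging is that per-token losses are summed over the whole batch and divided by the total token count, rather than averaged within each sequence first. The sketch below contrasts the two under that assumption; the function names and the toy numbers are illustrative.

```python
def sequence_mean_loss(token_losses):
    """Average within each sequence first, then across sequences."""
    per_seq = [sum(seq) / len(seq) for seq in token_losses]
    return sum(per_seq) / len(per_seq)


def batch_token_mean_loss(token_losses):
    """Average over every token in the batch, so each token has equal weight."""
    total = sum(sum(seq) for seq in token_losses)
    count = sum(len(seq) for seq in token_losses)
    return total / count


# Toy per-token losses for two trajectories of different lengths.
batch = [
    [1.0, 1.0],                       # short trajectory, 2 tokens
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],   # long trajectory, 6 tokens
]

seq_loss = sequence_mean_loss(batch)     # (1.0 + 0.5) / 2
tok_loss = batch_token_mean_loss(batch)  # (2.0 + 3.0) / 8
print(seq_loss, tok_loss)
```

With sequence-level averaging, each token of the short trajectory carries three times the weight of a token in the long one. With batch-level averaging, every token contributes equally, which matters when multi-turn trajectories vary widely in length.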
Disclaimer
- Non-endorsement & liability disclaimer: The model is provided for research and educational purposes only. It does not reflect the views, interests, beliefs, or endorsements of any individual or organization, and should not be interpreted as making claims about any group. The project maintainers disclaim responsibility for any direct or indirect harm or damages arising from the use or misuse of the model or related resources.
Citation
@misc{tian2026astraautomatedsynthesisagentic,
title={ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas},
author={Xiaoyu Tian and Haotian Wang and Shuaiting Chen and Hao Zhou and Kaichi Yu and Yudian Zhang and Jade Ouyang and Junxi Yin and Jiong Chen and Baoyan Guo and Lei Zhang and Junjie Tao and Yuansheng Song and Ming Cui and Chengwei Liu},
year={2026},
eprint={2601.21558},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2601.21558},
}
Note: Although the model was trained with bf16 precision, verl saves checkpoints in float32 by default, and we did not change this setting.
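
Because the checkpoint is stored in float32, it takes roughly twice the space of a bf16 copy. The sketch below estimates the raw weight size for both dtypes and shows one way to cast to bf16 at load time. It assumes a nominal 14e9 parameters (the exact count may differ), and `load_in_bf16` is a hypothetical helper that is defined but not called here. `torch_dtype` is the long-standing argument name in Transformers; newer releases may prefer `dtype`.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_size_gib(num_params, dtype):
    """Approximate size of the raw weights in GiB for a given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


def load_in_bf16(model_id="Emperorizzis/ASTRA-14B-Thinking-v1"):
    """Load the float32 checkpoint and cast weights to bf16.

    Requires torch and transformers; downloads the full checkpoint.
    """
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)


NOMINAL_PARAMS = 14e9  # assumption: nominal "14B"; the exact count may differ
fp32_gib = weight_size_gib(NOMINAL_PARAMS, "float32")
bf16_gib = weight_size_gib(NOMINAL_PARAMS, "bfloat16")
print(f"float32: ~{fp32_gib:.0f} GiB, bfloat16: ~{bf16_gib:.0f} GiB")
```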