
ASTRA-14B-Thinking-v1

Model Description

The ASTRA-14B-Thinking-v1 model is derived from Qwen3-14B and specifically optimized for multi-step, tool-augmented tasks, with enhanced agentic capabilities in complex tool use and structured reasoning.

We also provide a 32B variant ASTRA-32B-Thinking-v1.

Model Performances

ASTRA-14B-Thinking-v1 achieves state-of-the-art performance on the BFCL-V3 multi-turn subset among models of comparable scale.

(Figure: results on the BFCL-V3 multi-turn subset)

Data Curation

The training data is built upon two core pillars of automation:

1. Tool-Grounded SFT Data

  • Key Feature: We constructed an extensive tool pool from 1,585 MCP servers, encompassing 19,036 tools across 41 domains. The data pipeline analyzes schema-level dependencies to generate executable tool-chains, ensuring that the synthesized trajectories are realistic and parameter-satisfiable.
  • Sample Data: ASTRA-SFT-1k
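The schema-dependency idea behind chain construction can be sketched as structural matching: a tool can feed another when its output schema covers the other's required inputs. The tool schemas and the `can_chain` helper below are hypothetical illustrations, not the actual pipeline:

```python
# Sketch: chain two tools when the first tool's output schema can satisfy
# the second tool's required input parameters. All schemas here are made up.

def can_chain(producer: dict, consumer: dict) -> bool:
    """True if every required input of `consumer` appears in `producer`'s outputs."""
    outputs = producer["output_schema"]["properties"].keys()
    required = consumer["input_schema"].get("required", [])
    return all(param in outputs for param in required)

search_flights = {
    "name": "search_flights",
    "input_schema": {"required": ["origin", "destination"]},
    "output_schema": {"properties": {"flight_id": {"type": "string"},
                                     "price": {"type": "number"}}},
}
book_flight = {
    "name": "book_flight",
    "input_schema": {"required": ["flight_id"]},
    "output_schema": {"properties": {"confirmation": {"type": "string"}}},
}

can_chain(search_flights, book_flight)   # True: flight_id flows through
can_chain(book_flight, search_flights)   # False: origin/destination unavailable
```

A chain assembled this way is parameter-satisfiable by construction: every call in the trajectory can be filled from upstream outputs.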

2. Automated Verifiable Environments Synthesis

  • Key Feature: To support robust reinforcement learning, we synthesize fully verifiable environments implemented in Python. These environments are validated via sandboxed execution and provide multi-turn, step-wise verifiable training signals for reinforcement learning.

  • Sample Data: ASTRA-RL-1k
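The core property of such an environment is that its state is explicit Python data, every tool call mutates it deterministically, and a checker can grade each step against a spec. A minimal sketch with an invented task, tools, and reward scheme:

```python
# Sketch of a step-wise verifiable environment: deterministic state,
# deterministic tool effects, exact-match per-step rewards.
# The task, tools, and reward scheme are hypothetical illustrations.

class TodoEnv:
    def __init__(self):
        self.items: dict = {}  # item name -> done flag

    def call(self, tool: str, **args):
        if tool == "add_item":
            self.items[args["name"]] = False
        elif tool == "complete_item":
            self.items[args["name"]] = True
        else:
            raise ValueError(f"unknown tool: {tool}")
        return dict(self.items)

def step_reward(env: TodoEnv, expected_state: dict) -> float:
    """Deterministic per-step reward: 1.0 iff state matches the spec exactly."""
    return 1.0 if env.items == expected_state else 0.0

env = TodoEnv()
env.call("add_item", name="buy milk")
r1 = step_reward(env, {"buy milk": False})   # 1.0
env.call("complete_item", name="buy milk")
r2 = step_reward(env, {"buy milk": True})    # 1.0
```

Because the checker compares concrete state rather than model text, the same environment yields the same reward on every sandboxed run.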

Training Process

The model is trained in two sequential stages to enhance complex agentic decision-making:

  1. Supervised Fine-Tuning (SFT)
    Before RL, we cold-start the model with high-quality, multi-turn tool-use trajectories. This stage establishes strong behavioral priors for tool-calling formats, long-context understanding, and complex task planning, while ensuring tool diversity and improving coverage of real-world scenarios.

  2. Reinforcement Learning (RL)
    We then conduct multi-turn, tool-integrated Reinforcement Learning with Verifiable Rewards (RLVR). Training uses Adaptive Batch Filling to improve optimization stability and data utilization, and adopts batch-level token-loss averaging for more stable and efficient optimization. At each step, actions are executed in a code sandbox and deterministically verified to produce reliable rewards.
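The intuition behind batch-level token-loss averaging: averaging per sequence first lets a short rollout's few tokens carry as much weight as a long rollout's many tokens, while dividing the summed token loss once by the batch's total valid-token count weights every token equally. A minimal sketch with made-up losses (not the training code):

```python
# Sketch contrasting per-sequence vs batch-level token-loss averaging.
# Token losses are invented numbers; masks mark valid (non-padding) tokens.

def per_sequence_mean(losses, masks):
    """Average each sequence's loss first, then average over sequences."""
    seq_means = [sum(l * m for l, m in zip(ls, ms)) / sum(ms)
                 for ls, ms in zip(losses, masks)]
    return sum(seq_means) / len(seq_means)

def batch_level_mean(losses, masks):
    """Sum all valid token losses, divide once by the total valid-token count."""
    total = sum(l * m for ls, ms in zip(losses, masks)
                for l, m in zip(ls, ms))
    tokens = sum(m for ms in masks for m in ms)
    return total / tokens

losses = [[1.0, 1.0, 1.0, 1.0], [4.0, 0.0, 0.0, 0.0]]  # long vs short rollout
masks  = [[1,   1,   1,   1  ], [1,   0,   0,   0  ]]

per_sequence_mean(losses, masks)  # 2.5: the one-token rollout dominates
batch_level_mean(losses, masks)   # 1.6: every token weighted equally
```

With highly variable trajectory lengths, as in multi-turn tool use, the batch-level form avoids gradient spikes from unusually short rollouts.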

Disclaimer

  • Non-endorsement & liability disclaimer: The model is provided for research and educational purposes only. It does not reflect the views, interests, beliefs, or endorsements of any individual or organization, and should not be interpreted as making claims about any group. The project maintainers disclaim responsibility for any direct or indirect harm or damages arising from the use or misuse of the model or related resources.

Citation

@misc{tian2026astraautomatedsynthesisagentic,
      title={ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas}, 
      author={Xiaoyu Tian and Haotian Wang and Shuaiting Chen and Hao Zhou and Kaichi Yu and Yudian Zhang and Jade Ouyang and Junxi Yin and Jiong Chen and Baoyan Guo and Lei Zhang and Junjie Tao and Yuansheng Song and Ming Cui and Chengwei Liu},
      year={2026},
      eprint={2601.21558},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.21558}, 
}

Note: Although the model was trained with bf16 precision, verl saves checkpoints in float32 by default, and we did not change this setting.
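Since bfloat16 keeps float32's sign and exponent bits and drops the low 16 mantissa bits, weights trained in bf16 survive a float32 save unchanged and can be cast back losslessly. A NumPy sketch of the bit-level relationship for normal (non-NaN) values, illustrative only and not verl's conversion path:

```python
import numpy as np

def f32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
    """Round-to-nearest-even float32 -> bfloat16 bit patterns (normal values)."""
    bits = x.astype(np.float32).view(np.uint32)
    # Add 0x7FFF plus the lsb of the surviving half, then drop the low 16 bits.
    rounded = bits + 0x7FFF + ((bits >> 16) & 1)
    return (rounded >> 16).astype(np.uint16)

def bf16_bits_to_f32(b: np.ndarray) -> np.ndarray:
    """Widen bfloat16 bit patterns back to float32 by zero-filling the mantissa."""
    return (b.astype(np.uint32) << 16).view(np.float32)
```

Values already representable in bf16 (e.g. 1.0, 1.5, -2.0) round-trip exactly, which is why the float32 checkpoints carry no extra precision beyond the bf16 training weights.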
