Instructions for using CajZella/TRICE-30B with libraries and local apps.

Libraries

How to use CajZella/TRICE-30B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="CajZella/TRICE-30B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so it is also visible when run as a script
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CajZella/TRICE-30B")
model = AutoModelForCausalLM.from_pretrained("CajZella/TRICE-30B", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use CajZella/TRICE-30B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "CajZella/TRICE-30B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CajZella/TRICE-30B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
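
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the default base URL assumes the vLLM server above is listening on port 8000 (for the SGLang server described later, pass `base_url="http://localhost:30000"`).

```python
import json
import urllib.request


def build_chat_request(prompt, model="CajZella/TRICE-30B",
                       base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Standard OpenAI chat-completion response layout.
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(chat("What is the capital of France?"))
```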

SGLang

How to use CajZella/TRICE-30B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "CajZella/TRICE-30B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CajZella/TRICE-30B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "CajZella/TRICE-30B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CajZella/TRICE-30B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use CajZella/TRICE-30B with Docker Model Runner:
```
docker model run hf.co/CajZella/TRICE-30B
```

Teaching Thinking Models to Reason with Tools

A Full-Pipeline Recipe for Tool-Integrated Reasoning

Model Description

TRICE-30B is a tool-integrated reasoning model built on Qwen3-30B-A3B-Thinking-2507 and capable of Textual Reasoning Interleaved with Code Execution. Tool-integrated reasoning (TIR) offers a direct way to extend thinking models beyond the limits of text-only reasoning. Paradoxically, we observe that tool-enabled evaluation can degrade reasoning performance even when strong thinking models make almost no actual tool calls. To resolve this instability, we move beyond scattered techniques and propose a systematic, full-pipeline recipe spanning data preparation, SFT, the transition from SFT to RL, and RL itself. The goal is to inject natural tool-use behavior into a strong thinking model without sacrificing its no-tool reasoning ability. The resulting TRICE-30B achieves state-of-the-art TIR performance, surpassing both existing TIR methods and frontier open-source reasoning models at the same or even larger parameter scales.

Key Highlights

State-of-the-art ~30B TIR performance. With tools, TRICE-30B reaches 99.2% on AIME 2025, 92.5% on HMMT 2025, 82.5% on BeyondAIME, and an 81.9% average across five competition-level math benchmarks, a +14.8% average improvement over the base Qwen3-30B-A3B-Thinking-2507.
No-tool reasoning is preserved. The TIR capability is injected without degrading the intrinsic reasoning mode: TRICE-30B retains or even improves its text-only reasoning ability on most benchmarks.
Strong on the hardest problems. On APEX 2025, a collection of national and international Olympiad problems where most open-source models score near zero, TRICE-30B reaches 16.7%.
Surpasses much larger text-only models. TRICE-30B with tools outperforms Qwen3-235B-A22B-Thinking and DeepSeek-V3.2-Thinking in text-only mode on HMMT 2025, BeyondAIME, and IMOAnswerBench.
Cross-domain transfer. Although trained only on math data, the learned interleaved reasoning pattern transfers to other domains, with gains of up to +11.7% on FrontierScience, GPQA-Diamond, and LiveCodeBench.

Performance

Unless otherwise noted, all models are evaluated under our unified protocol on five competition-level benchmarks: AIME 2025, HMMT 2025, BeyondAIME, IMOAnswerBench, and APEX 2025. Every question is repeated 8 times to ensure reproducibility. We use a consistent configuration of 80K maximum rollout length and up to 128 tool calls in a stateful sandbox.
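
The card does not state how the 8 repetitions are aggregated into a single score. One common convention is the mean accuracy over all repetitions of each question, averaged across questions. The sketch below illustrates that convention only as an assumption; the function name, data layout, and toy outcomes are hypothetical and are not taken from the evaluation code.

```python
def mean_accuracy_over_repeats(results):
    """Average per-question accuracy, where each question has several runs.

    `results` maps a question id to a list of booleans, one per repetition.
    """
    per_question = [sum(runs) / len(runs) for runs in results.values()]
    return sum(per_question) / len(per_question)


# Toy data (made up): 2 questions, 8 repetitions each.
toy = {
    "q1": [True] * 8,         # solved in every repetition
    "q2": [True, False] * 4,  # solved in half of the repetitions
}
print(f"{mean_accuracy_over_repeats(toy):.2%}")  # prints 75.00%
```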

Generalization

Usage

TRICE-30B supports both text-only and tool-integrated inference. For multi-turn tool-integrated reasoning, we recommend deploying via SGLang and pairing it with a stateful Python sandbox for code execution.
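
To illustrate what "stateful" means here, the following is a toy sketch of a sandbox in which variables defined by one code execution remain available to later ones. The class name `StatefulSandbox` is hypothetical, and the sketch runs code in-process with no isolation, so it is not suitable for untrusted code. The card does not specify the model's tool-call format or how the sandbox is wired to the SGLang server, so that part is left out.

```python
import contextlib
import io
import traceback


class StatefulSandbox:
    """Toy in-process executor that keeps variables between calls.

    Not isolated or secure; a real deployment should run code in a
    separate, resource-limited process or container.
    """

    def __init__(self):
        self.namespace = {}  # persists across run() calls

    def run(self, code):
        """Execute code and return what it printed (or the traceback)."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)
        except Exception:
            # Return the error text so the model could react to it.
            buffer.write(traceback.format_exc())
        return buffer.getvalue()


sandbox = StatefulSandbox()
sandbox.run("x = 6 * 7")        # first tool call defines x
print(sandbox.run("print(x)"))  # a later call still sees x
```

In a multi-turn loop, each code snippet emitted by the model would be passed to `run`, and the returned text would be fed back to the model as the tool result.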

Citation

If you find this model or our recipe useful, please cite:

@misc{cheng2026teachingthinkingmodelsreason,
      title={Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning}, 
      author={Qianjia Cheng and Yuchen Zhang and Zhilin Wang and Yuxin Zuo and Shunkai Zhang and Yuchen Fan and Yu Qiao and Bowen Zhou and Ning Ding and Yu Cheng and Yun Luo and Ganqu Cui},
      year={2026},
      eprint={2605.06326},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.06326}, 
}

Acknowledgements

TRICE-30B is built on top of Qwen3-30B-A3B-Thinking-2507. Training is conducted with the Slime framework. We thank the open-source community for the models, tools, benchmarks, and infrastructure that made this work possible.