Instructions to use CajZella/TRICE-4B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use CajZella/TRICE-4B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="CajZella/TRICE-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForMultimodalLM

tokenizer = AutoTokenizer.from_pretrained("CajZella/TRICE-4B")
model = AutoModelForMultimodalLM.from_pretrained("CajZella/TRICE-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use CajZella/TRICE-4B with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "CajZella/TRICE-4B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "CajZella/TRICE-4B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
docker model run hf.co/CajZella/TRICE-4B
- SGLang
How to use CajZella/TRICE-4B with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "CajZella/TRICE-4B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "CajZella/TRICE-4B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "CajZella/TRICE-4B" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "CajZella/TRICE-4B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use CajZella/TRICE-4B with Docker Model Runner:
docker model run hf.co/CajZella/TRICE-4B
Teaching Thinking Models to Reason with Tools
A Full-Pipeline Recipe for Tool-Integrated Reasoning
Model Description
TRICE-4B is a tool-integrated reasoning model built on Qwen3-4B-Thinking-2507, capable of Textual Reasoning Interleaved with Code Execution. Tool-integrated reasoning (TIR) offers a direct way to extend thinking models beyond the limits of text-only reasoning. Paradoxically, we observe that tool-enabled evaluation can degrade reasoning performance even when strong thinking models make almost no actual tool calls. To resolve this instability, we move beyond scattered techniques and propose a systematic, full-pipeline recipe spanning data preparation, SFT, the transition from SFT to RL, and RL itself. The goal is to inject natural tool-use behavior into a strong thinking model without sacrificing its no-tool reasoning ability. The resulting TRICE-4B achieves state-of-the-art TIR performance, surpassing both existing TIR methods and frontier open-source reasoning models at the same or even larger parameter scales.
Key Highlights
- State-of-the-art <10B TIR performance. With tools, TRICE-4B reaches 96.7% on AIME 2025, 86.7% on HMMT 2025, 71.3% on BeyondAIME, and a 72.2% average across five competition-level math benchmarks, a +14.0% average improvement over the base Qwen3-4B-Thinking-2507.
- No-tool reasoning is preserved. The TIR capability is injected without degrading the intrinsic reasoning mode: TRICE-4B retains or even improves its text-only reasoning ability on most benchmarks.
- Cross-domain transfer. Although trained only on math data, the learned interleaved reasoning pattern transfers to different domains, with gains of up to +14.5% on FrontierScience, GPQA-Diamond, and LiveCodeBench.
Performance
We use a consistent configuration of 80K maximum rollout length and up to 128 tool calls in a stateful sandbox.
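To make "stateful sandbox" and the tool-call cap concrete, here is a minimal sketch in Python. It is not the evaluation harness used for TRICE-4B: the class name `StatefulSandbox` and its methods are hypothetical, and `exec` in the current process provides no isolation, so this only illustrates the two properties named above (variables persist between calls, and calls stop once the budget is spent).

```python
import contextlib
import io
import traceback


class StatefulSandbox:
    """Toy in-process executor (hypothetical, NOT secure): state persists, calls are capped."""

    def __init__(self, max_tool_calls=128):
        self.max_tool_calls = max_tool_calls
        self.calls_made = 0
        # One namespace shared by every call; this is what makes the sandbox stateful.
        self.namespace = {}

    def run(self, code):
        """Execute `code` and return whatever it printed, or a one-line error."""
        if self.calls_made >= self.max_tool_calls:
            return "Error: tool-call budget exhausted"
        self.calls_made += 1
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)
        except Exception:
            # Keep only the final traceback line so the error message stays short.
            buffer.write(traceback.format_exc().strip().splitlines()[-1])
        return buffer.getvalue()


sandbox = StatefulSandbox(max_tool_calls=2)
sandbox.run("x = 6 * 7")          # defines x, prints nothing
print(sandbox.run("print(x)"))    # the second call still sees x
print(sandbox.run("print(x)"))    # the third call exceeds the budget of 2
```

A real deployment would run the code in an isolated process or container; the shared namespace here only stands in for that persistent session.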
Generalization
Usage
TRICE-4B supports both text-only and tool-integrated inference. For multi-turn tool-integrated reasoning, we recommend deploying via SGLang and pairing it with a stateful Python sandbox for code execution.
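The multi-turn pattern described above can be sketched as a loop that alternates model turns with code execution. This is an illustrative sketch under stated assumptions, not the authors' inference code: the `<code>...</code>` tag, the `tool_loop` function, and the choice to return execution output as a user message are all hypothetical, so adapt them to the tool-call format your chat template actually produces. `generate` can be any callable that sends the message list to a served model (for example, the SGLang OpenAI-compatible endpoint), and `run_code` any executor, ideally a stateful sandbox.

```python
import re

# Assumption: the model wraps code it wants executed in <code>...</code>.
# This tag is hypothetical; replace the pattern with the real tool-call format.
CODE_TAG = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def tool_loop(generate, run_code, question, max_tool_calls=128):
    """Alternate model turns and code execution until no code is requested.

    generate(messages) -> str : any chat-completion callable (hypothetical interface)
    run_code(code) -> str     : any code executor returning its output as text
    """
    messages = [{"role": "user", "content": question}]
    for _ in range(max_tool_calls):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})
        match = CODE_TAG.search(reply)
        if match is None:
            # No code requested: treat this reply as the final answer.
            return reply, messages
        output = run_code(match.group(1))
        # Feed the execution result back to the model as the next turn.
        messages.append({"role": "user", "content": "Execution output:\n" + output})
    # Budget spent: ask once more for an answer without executing further code.
    final = generate(messages)
    messages.append({"role": "assistant", "content": final})
    return final, messages


# Offline usage example with stand-ins for the model and the sandbox.
scripted = iter(["Let me compute. <code>print(2 ** 10)</code>", "The answer is 1024."])
answer, history = tool_loop(
    generate=lambda messages: next(scripted),
    run_code=lambda code: "1024\n",
    question="What is 2 ** 10?",
)
print(answer)
```

Keeping `generate` and `run_code` as plain callables means the same loop works against any OpenAI-compatible server and any sandbox, and it can be tested offline as shown.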
Citation
If you find this model or our recipe useful, please cite:
@misc{cheng2026teachingthinkingmodelsreason,
title={Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning},
author={Qianjia Cheng and Yuchen Zhang and Zhilin Wang and Yuxin Zuo and Shunkai Zhang and Yuchen Fan and Yu Qiao and Bowen Zhou and Ning Ding and Yu Cheng and Yun Luo and Ganqu Cui},
year={2026},
eprint={2605.06326},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2605.06326},
}
Acknowledgements
TRICE-4B is built on top of Qwen3-4B-Thinking-2507. Training is conducted with the Slime framework. We thank the open-source community for the models, tools, benchmarks, and infrastructure that made this work possible.

