---
license: apache-2.0
---
# JT-Math-8B-Thinking
<p align="center">
<a href="https://www.arxiv.org/abs/2507.19748" target="_blank">
<img src="https://img.shields.io/badge/Paper-ArXiv-red">
</a>
<a href="https://huggingface.co/JT-LM/JT-Math-8B-Thinking" target="_blank">
<img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue">
</a>
<a href="https://www.modelscope.cn/models/JiuTian-AI/JT-Math-8B-Thinking" target="_blank">
<img src="https://img.shields.io/badge/%F0%9F%A4%96%20ModelScope-Models-blue">
</a>
</p>
We are excited to present JT-Math-8B-Thinking, a powerful 8-billion-parameter model from the JT-Math series, engineered specifically for advanced mathematical reasoning and complex problem-solving. It is fine-tuned on a carefully curated bilingual (English and Chinese) dataset to deliver high performance on mathematical tasks in both languages. The model has a 32,768-token context window, allowing it to process and reason over extensive and intricate problem descriptions.
JT-Math-8B-Thinking has been meticulously optimized to tackle difficult mathematical challenges, achieving state-of-the-art (SOTA) performance on key reasoning benchmarks when compared against models of a similar parameter class. Its development process involves a multi-stage training pipeline designed to maximize its reasoning capabilities.
For full transparency and reproducibility, please refer to our technical report, which details our training recipe and pipeline.
## Model Details
The performance of **JT-Math-8B-Thinking** stems from a meticulous, multi-stage training approach aimed at tackling complex mathematical challenges with state-of-the-art accuracy. Building on the **JT-Math-8B-Base** model, its training pipeline involved **Supervised Fine-Tuning (SFT)** using a high-quality, bilingual dataset of intricate math problems. This SFT phase leveraged the model's native **32,768-token context window**, enabling it to comprehend lengthy premises, multi-step instructions, and problems with extensive background information right from the start. Following SFT, an advanced **Reinforcement Learning (RL)** phase further refined its reasoning capabilities. This RL process employed a multi-stage curriculum, gradually introducing problems of increasing difficulty, and was specifically engineered to boost the model's focus and accuracy across the entire 32K context window, ensuring the coherence and precision of even the longest reasoning chains.
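As a practical note on the 32,768-token window: assuming, as is typical for decoder-only models, that prompt tokens and generated tokens share the same window, a long prompt leaves less room for the reasoning chain. The helper below is a minimal sketch of that arithmetic. The name `remaining_generation_budget` is hypothetical and not part of the model's API.

```python
CONTEXT_WINDOW = 32_768  # context length stated in this model card


def remaining_generation_budget(prompt_tokens: int,
                                context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many new tokens still fit after a prompt of `prompt_tokens` tokens.

    Assumption: the prompt and the generated tokens share one context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Clamp at zero so an over-long prompt yields an empty budget, not a negative one.
    return max(context_window - prompt_tokens, 0)
```

The result could be passed as `max_new_tokens` when the prompt is long, with `prompt_tokens` taken from the length of the tokenized input.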
## Model Downloads
We release the following models to support a wide range of applications.
| Model Name | Context Length | Hugging Face Link | ModelScope Link | Notes |
| ------------------- | -------------- | ---------------------------------------------------------- | ---------------------------------------------------------- | ---------------------------------------------------------- |
| JT-Math-8B-Thinking | 32K | [Link](https://huggingface.co/JT-LM/JT-Math-8B-Thinking) | [Link](https://www.modelscope.cn/models/JiuTian-AI/JT-Math-8B-Thinking) | The premier model for complex, long-context reasoning. |
------
## Evaluation Results
JT-Math-8B-Thinking achieves competitive performance among open-source models in the ~8B class on mathematical reasoning benchmarks.

## How to Get Started
This example shows how to use the `JT-Math-8B-Thinking` model to solve math problems.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "JT-LM/JT-Math-8B-Thinking"

# Load the tokenizer and model; trust_remote_code allows the repo's custom code to run.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?"
messages = [
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template and append the generation prompt.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

gen_kwargs = {
    "do_sample": True,
    "temperature": 0.65,  # Recommended temperature is 0.65
    "max_new_tokens": 32768,
}
generated_ids = model.generate(
    **model_inputs,
    **gen_kwargs
)

# Drop the prompt tokens so only the newly generated tokens are decoded.
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
raw_content = tokenizer.decode(output_ids, skip_special_tokens=True)

# Split the reasoning trace from the final answer at the last "</think>" marker.
if "</think>" in raw_content:
    thinking_content = raw_content.rsplit("</think>", 1)[0].strip("\n")
    content = raw_content.rsplit("</think>", 1)[1].strip("\n")
else:
    # No closing marker: treat the whole output as reasoning with no final answer.
    thinking_content = raw_content
    content = ""

print("raw content:", raw_content)
print("thinking content:", thinking_content)
print("content:", content)
```
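The post-processing step in the example above can also be factored into a small reusable function. This is a minimal sketch that mirrors the same `rsplit` logic. The name `split_thinking` is hypothetical, and it assumes the reasoning trace ends with a literal `</think>` marker, as in the example.

```python
from typing import Tuple


def split_thinking(raw_content: str, marker: str = "</think>") -> Tuple[str, str]:
    """Split decoded model output into (thinking_content, content).

    Splits at the LAST occurrence of `marker`, so earlier mentions of the
    marker inside the reasoning trace stay in the thinking part.
    """
    if marker in raw_content:
        thinking, answer = raw_content.rsplit(marker, 1)
        return thinking.strip("\n"), answer.strip("\n")
    # No marker found: treat everything as reasoning with an empty final answer.
    return raw_content, ""
```

With this helper, the branch in the example reduces to `thinking_content, content = split_thinking(raw_content)`.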
## Citation
If you use JT-Math-8B-Thinking in your research, please cite our work:
```latex
@article{jiutian-math2025,
title={JIUTIAN MATH: A MULTI-STAGE FRAMEWORK FOR ADVANCED MATHEMATICAL REASONING IN LARGE LANGUAGE MODELS},
author={Yifan Hao and Fangning Chao and Yaqian Hao and Zhaojun Cui and Huan Bai and Haiyu Zhang and Yankai Liu and Chao Deng and Junlan Feng},
journal={arXiv:2507.19748},
year={2025}
}
```