lyk2586 committed on
Commit 13b8496 · verified · 1 Parent(s): 52c7248

Update README.md

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -37,8 +37,7 @@ For full transparency and reproducibility, please refer to our technical report
 
 ## Model Details
 
-The performance of **JT-Math-8B-Thinking** stems from a meticulous, multi-stage training approach aimed at tackling complex mathematical challenges with state-of-the-art accuracy. Building on the **JT-Math-8B-Base** model, its training pipeline involved **Supervised Fine-Tuning (SFT)** using a high-quality, bilingual dataset of intricate math problems. This SFT phase leveraged the model's native **32,768-token context window**, enabling it to comprehend lengthy premises, multi-step instructions, and problems with extensive background information right from the start. Following SFT, an advanced **Reinforcement Learning (RL)** phase further refined its reasoning capabilities. This RL process employed a multi-stage curriculum, gradually introducing problems of increasing difficulty, and was specifically engineered to boost the model's focus and accuracy across the entire 32K context window, ensuring the coherence and precision of even the longest reasoning chains.
-
+JT-Math-8B-Thinking achieves its cutting-edge performance on complex mathematical challenges through a rigorous, multi-stage training methodology. Starting with the robust JT-Math-8B-Base model, our pipeline first implemented Supervised Fine-Tuning (SFT). This involved training on a high-quality, bilingual dataset of intricate math problems, capitalizing on the model's impressive native 32,768-token context window. This large context allowed the model to grasp lengthy problem descriptions, multi-step instructions, and extensive background details from the outset. Subsequently, an advanced Reinforcement Learning (RL) phase, incorporating a multi-stage curriculum of progressively harder problems, further honed its reasoning abilities.
 
 
 