caiyuchen/DAPO-step-21

@@ -1,38 +1,31 @@
-    ---
 license: apache-2.0
 tags:
-  - math
-  - rl
-  - qwen3
-  - dapomath17k
 library_name: transformers
 pipeline_tag: text-generation
 language: en
----
-# DAPO RL Checkpoint - Step 21
-This model is a reinforcement learning fine-tuned version of **Qwen3-8B-Base**, trained on the **DAPO-Math-17k** dataset using the **DAPO-style RL paradigm**.
-- **Base Model**: Qwen3-8B-Base
-- **Training Method**: Reinforcement Learning (DAPO)
-- **Dataset**: DAPO-Math-17k
-- **Checkpoint**: global_step_21
 ---
 ## 🔧 Prompt Format (Chat Template)
 During RL training and inference, each question is formatted as:
-{question}
-Please reason step by step, and put your final answer within \boxed{}
 Then wrapped using the chat template:
 ```python
 prompt = tokenizer.apply_chat_template(
-    [{"content": question_with_instruction, "role": "user"}],
     tokenize=False,
     add_generation_prompt=True,
 )
@@ -44,16 +37,16 @@ prompt = tokenizer.apply_chat_template(
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
-model = AutoModelForCausalLM.from_pretrained("caiyuchen/DAPO-step-21")
-tokenizer = AutoTokenizer.from_pretrained("caiyuchen/DAPO-step-21")
 question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$"
 question_with_instruction = question + "
-Please reason step by step, and put your final answer within \boxed{}"
 # Apply chat template
 prompt = tokenizer.apply_chat_template(
-    [{"content": question_with_instruction, "role": "user"}],
     tokenize=False,
     add_generation_prompt=True,
 )

+---
 license: apache-2.0
 tags:
+- math
+- rl
+- qwen3
+- dapomath17k
 library_name: transformers
 pipeline_tag: text-generation
 language: en
+datasets:
+- BytedTsinghua-SIA/DAPO-Math-17k
+base_model:
+- Qwen/Qwen3-8B-Base
 ---
 ## 🔧 Prompt Format (Chat Template)
 During RL training and inference, each question is formatted as:
+{{question}}
+Please reason step by step, and put your final answer within \boxed{{}}
 Then wrapped using the chat template:
 ```python
 prompt = tokenizer.apply_chat_template(
+    [{{"content": question_with_instruction, "role": "user"}}],
     tokenize=False,
     add_generation_prompt=True,
 )
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
+model = AutoModelForCausalLM.from_pretrained("caiyuchen/DAPO-step-{i}")
+tokenizer = AutoTokenizer.from_pretrained("caiyuchen/DAPO-step-{i}")
 question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$"
 question_with_instruction = question + "
+Please reason step by step, and put your final answer within \boxed{{}}"
 # Apply chat template
 prompt = tokenizer.apply_chat_template(
+    [{{"content": question_with_instruction, "role": "user"}}],
     tokenize=False,
     add_generation_prompt=True,
 )
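The usage snippet above stops at the chat-template step, and its instruction string is shown split across a raw line break. The following is a minimal sketch of the full round trip, assuming standard `transformers` calls. It is not taken from the model card: `build_question`, `extract_boxed`, and `generate_answer` are hypothetical helper names, and `max_new_tokens=2048` is an arbitrary illustrative value. The newline is written as an explicit `\n` escape and the backslash in `\boxed{}` is escaped so the literal stays valid Python.

```python
INSTRUCTION = "Please reason step by step, and put your final answer within \\boxed{}"


def build_question(question: str) -> str:
    """Append the instruction on its own line (explicit newline escape)."""
    return question + "\n" + INSTRUCTION


def extract_boxed(text: str):
    """Hypothetical helper: return the contents of the last \\boxed{...}, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1          # we are inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None        # braces never balanced


def generate_answer(model_id: str, question: str, max_new_tokens: int = 2048) -> str:
    """Load a checkpoint, apply the chat template, and decode only the new tokens.

    transformers is imported lazily so the helpers above work without it.
    max_new_tokens is an assumed value, not one stated in the model card.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    prompt = tokenizer.apply_chat_template(
        [{"content": build_question(question), "role": "user"}],
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the model's completion is returned.
    completion_ids = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(completion_ids, skip_special_tokens=True)
```

A call would look like `extract_boxed(generate_answer("caiyuchen/DAPO-step-21", question))`, with the repository id matching the checkpoint step you want to load. The brace counting in `extract_boxed` is there because answers such as `\frac{\pi}{2}` contain nested braces that a simple regular expression would cut short.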