caiyuchen committed on
Commit 38abaca · verified · 1 Parent(s): 16b9017

Upload README.md with huggingface_hub

  1. README.md +16 -23
README.md CHANGED
@@ -1,38 +1,31 @@
1
 
2
- ---
3
  license: apache-2.0
4
  tags:
5
- - math
6
- - rl
7
- - qwen3
8
- - dapomath17k
9
  library_name: transformers
10
  pipeline_tag: text-generation
11
  language: en
12
- ---
13
-
14
- # DAPO RL Checkpoint - Step 21
15
-
16
- This model is a reinforcement learning fine-tuned version of **Qwen3-8B-Base**, trained on the **DAPO-Math-17k** dataset using the **DAPO-style RL paradigm**.
17
-
18
- - **Base Model**: Qwen3-8B-Base
19
- - **Training Method**: Reinforcement Learning (DAPO)
20
- - **Dataset**: DAPO-Math-17k
21
- - **Checkpoint**: global_step_21
22
-
23
  ---
24
 
25
  ## 🔧 Prompt Format (Chat Template)
26
 
27
  During RL training and inference, each question is formatted as:
28
- {question}
29
- Please reason step by step, and put your final answer within oxed{}
30
 
31
  Then wrapped using the chat template:
32
 
33
  ```python
34
  prompt = tokenizer.apply_chat_template(
35
- [{"content": question_with_instruction, "role": "user"}],
36
  tokenize=False,
37
  add_generation_prompt=True,
38
  )
@@ -44,16 +37,16 @@ prompt = tokenizer.apply_chat_template(
44
  ```python
45
  from transformers import AutoModelForCausalLM, AutoTokenizer
46
 
47
- model = AutoModelForCausalLM.from_pretrained("caiyuchen/DAPO-step-21")
48
- tokenizer = AutoTokenizer.from_pretrained("caiyuchen/DAPO-step-21")
49
 
50
  question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$"
51
  question_with_instruction = question + "
52
- Please reason step by step, and put your final answer within \boxed{}"
53
 
54
  # Apply chat template
55
  prompt = tokenizer.apply_chat_template(
56
- [{"content": question_with_instruction, "role": "user"}],
57
  tokenize=False,
58
  add_generation_prompt=True,
59
  )
 
1
 
2
+ ---
3
  license: apache-2.0
4
  tags:
5
+ - math
6
+ - rl
7
+ - qwen3
8
+ - dapomath17k
9
  library_name: transformers
10
  pipeline_tag: text-generation
11
  language: en
12
+ datasets:
13
+ - BytedTsinghua-SIA/DAPO-Math-17k
14
+ base_model:
15
+ - Qwen/Qwen3-8B-Base
 
 
 
 
 
 
 
16
  ---
17
 
18
  ## 🔧 Prompt Format (Chat Template)
19
 
20
  During RL training and inference, each question is formatted as:
21
+ {{question}}
22
+ Please reason step by step, and put your final answer within boxed{{}}
23
 
24
  Then wrapped using the chat template:
25
 
26
  ```python
27
  prompt = tokenizer.apply_chat_template(
28
+ [{{"content": question_with_instruction, "role": "user"}}],
29
  tokenize=False,
30
  add_generation_prompt=True,
31
  )
 
37
  ```python
38
  from transformers import AutoModelForCausalLM, AutoTokenizer
39
 
40
+ model = AutoModelForCausalLM.from_pretrained("caiyuchen/DAPO-step-{i}")
41
+ tokenizer = AutoTokenizer.from_pretrained("caiyuchen/DAPO-step-{i}")
42
 
43
  question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$"
44
  question_with_instruction = question + "
45
+ Please reason step by step, and put your final answer within \boxed{{}}"
46
 
47
  # Apply chat template
48
  prompt = tokenizer.apply_chat_template(
49
+ [{{"content": question_with_instruction, "role": "user"}}],
50
  tokenize=False,
51
  add_generation_prompt=True,
52
  )
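
Reviewer note: the added lines above contain doubled braces (`{{question}}`, `\boxed{{}}`, `[{{...}}]`) and a literal `{i}` in the repository name, which looks like a Python format template that was uploaded without being rendered. This is an inference from the diff, not something the commit states. The `oxed{}` on the removed side is likewise consistent with `\b` having been read as a backspace escape in a non-raw string. A minimal sketch of how such a template would render with `str.format`, assuming a hypothetical `TEMPLATE` string and helper `render_readme_snippet` (neither appears in the commit), with step 21 taken from the removed lines:

```python
# Hypothetical template modeled on the added lines in the diff above.
# A raw string keeps the backslash in "\boxed"; in a normal string,
# "\b" would be interpreted as a backspace character.
TEMPLATE = r'''model = AutoModelForCausalLM.from_pretrained("caiyuchen/DAPO-step-{i}")
Please reason step by step, and put your final answer within \boxed{{}}
[{{"content": question_with_instruction, "role": "user"}}]'''


def render_readme_snippet(step: int) -> str:
    """Fill in the step number; str.format turns '{{' and '}}' into single braces."""
    return TEMPLATE.format(i=step)


rendered = render_readme_snippet(21)
print(rendered)
```

If the README is generated this way, calling the formatter before upload would give single braces and a concrete repository name such as `caiyuchen/DAPO-step-21`.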