wheattoast11 committed · Commit e1cde7a · verified · 1 Parent(s): 2583150

Training in progress, step 25

README.md CHANGED
@@ -4,11 +4,10 @@ library_name: transformers
  model_name: OmniCoder-9B-Zero-Phase1
  tags:
  - generated_from_trainer
- - hf_jobs
- - trackio
- - trl
- - trackio:https://wheattoast11-trackio.hf.space?project=zero-rl&runs=phase1-grpo-tool-calling&sidebar=collapsed
  - grpo
+ - trl
+ - hf_jobs
+ - trackio:https://wheattoast11-trackio.hf.space?project=zero-rl&runs=phase1-grpo-carl&sidebar=collapsed
  licence: license
  ---

@@ -31,14 +30,14 @@ print(output["generated_text"])
  ## Training procedure


- [<img src="https://raw.githubusercontent.com/gradio-app/trackio/refs/heads/main/trackio/assets/badge.png" alt="Visualize in Trackio" title="Visualize in Trackio" width="150" height="24"/>](https://wheattoast11-trackio.hf.space?project=zero-rl&runs=phase1-grpo-tool-calling&sidebar=collapsed)
+ [<img src="https://raw.githubusercontent.com/gradio-app/trackio/refs/heads/main/trackio/assets/badge.png" alt="Visualize in Trackio" title="Visualize in Trackio" width="150" height="24"/>](https://wheattoast11-trackio.hf.space?project=zero-rl&runs=phase1-grpo-carl&sidebar=collapsed)


  This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).

  ### Framework versions

- - TRL: 0.29.1
+ - TRL: 1.0.0
  - Transformers: 5.4.0
  - Pytorch: 2.11.0
  - Datasets: 4.8.4
@@ -55,7 +54,6 @@ Cite GRPO as:
  year = 2024,
  eprint = {arXiv:2402.03300},
  }
-
  ```

  Cite TRL as:
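
Note on the Trackio links: the only difference between the old and new URLs in README.md appears to be the `runs` query parameter. A minimal sketch using Python's standard `urllib.parse` to extract it from both URLs (the helper name `run_name` is illustrative and not part of the repository):

```python
from urllib.parse import urlparse, parse_qs

# URLs copied from the README.md hunks in this commit.
old_url = "https://wheattoast11-trackio.hf.space?project=zero-rl&runs=phase1-grpo-tool-calling&sidebar=collapsed"
new_url = "https://wheattoast11-trackio.hf.space?project=zero-rl&runs=phase1-grpo-carl&sidebar=collapsed"

def run_name(url):
    """Return the value of the `runs` query parameter (hypothetical helper)."""
    return parse_qs(urlparse(url).query)["runs"][0]

print(run_name(old_url), "->", run_name(new_url))
```

The `project` and `sidebar` parameters are identical in both links, so the dashboard link seems to point at a different run within the same `zero-rl` project.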
adapter_config.json CHANGED
@@ -29,13 +29,13 @@
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
- "o_proj",
+ "v_proj",
  "down_proj",
- "up_proj",
  "gate_proj",
+ "up_proj",
+ "q_proj",
  "k_proj",
- "v_proj",
- "q_proj"
+ "o_proj"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",
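
Note on `target_modules`: the old and new lists in this hunk appear to name the same seven projection layers in a different order. A minimal Python sketch to check that, with both lists copied from the hunk above (the variable and function names are illustrative, not from the repository):

```python
# Lists copied from the adapter_config.json hunk; names here are illustrative.
old_modules = ["o_proj", "down_proj", "up_proj", "gate_proj", "k_proj", "v_proj", "q_proj"]
new_modules = ["v_proj", "down_proj", "gate_proj", "up_proj", "q_proj", "k_proj", "o_proj"]

def same_modules(old, new):
    """Return True when both lists name the same modules, ignoring order."""
    return sorted(old) == sorted(new)

print(same_modules(old_modules, new_modules))
```

If the check passes, the reordering is most likely a serialization-order artifact rather than a change in which layers the adapter targets. That reading is an assumption; the diff itself does not state the cause.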
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0cb818e838ca7d176b8401520ebb127c63e56e3644e2b7a6a8abe548ce47be88
+ oid sha256:b790a795e153f111ad7426c8e95268b1d57f091533a777007bcf634e91aaa4b2
  size 232818320
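
Note on the LFS pointers: `adapter_model.safetensors` and `training_args.bin` are stored as Git LFS pointer files, so the diff shows only the `oid` (a SHA-256 digest) and `size` fields. A minimal sketch, assuming the downloaded file is available as bytes, of how a pointer could be parsed and checked against a file (the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical):

```python
import hashlib

def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields

def matches_pointer(data, pointer):
    """Check raw bytes against the pointer's sha256 oid and byte size."""
    algo, _, digest = pointer["oid"].partition(":")
    return (
        algo == "sha256"
        and len(data) == int(pointer["size"])
        and hashlib.sha256(data).hexdigest() == digest
    )

# New pointer for adapter_model.safetensors, copied from the hunk above.
new_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:b790a795e153f111ad7426c8e95268b1d57f091533a777007bcf634e91aaa4b2
size 232818320
""")
print(new_pointer["oid"], new_pointer["size"])
```

For the adapter weights the digest changed while the size stayed at 232818320 bytes, which is consistent with the same tensor layout holding updated values, though the pointer alone cannot confirm that.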
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2684b3deb1fb9e7a568591cf42459a21e00d8caeca8d7e424e8301cf1fd8828b
- size 7121
+ oid sha256:c69cc6976b95d3503a0e7b3ef5fab8913834b8902f107b0affad90a5b67ead28
+ size 7313