Thrillcrazyer committed on
Commit 35d7d04 · verified · 1 Parent(s): ed10480

Training in progress, step 100

README.md CHANGED
@@ -1,18 +1,17 @@
 ---
 base_model: Qwen/Qwen2.5-7B-Instruct
-datasets: DeepMath-103k
 library_name: transformers
 model_name: QWEN7_THIP
 tags:
 - generated_from_trainer
-- trl
 - grpo
+- trl
 licence: license
 ---
 
 # Model Card for QWEN7_THIP
 
-This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on the [DeepMath-103k](https://huggingface.co/datasets/DeepMath-103k) dataset.
+This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).
 It has been trained using [TRL](https://github.com/huggingface/trl).
 
 ## Quick start
@@ -28,7 +27,7 @@ print(output["generated_text"])
 
 ## Training procedure
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/pthpark1/THIP_COMPARE_QWEN7/runs/606inito)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/pthpark1/THIP_COMPARE_QWEN7/runs/ayndenuq)
 
 
 This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
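The model card only names GRPO. As a rough illustration of its central idea: with outcome supervision, the DeepSeekMath paper scores several completions sampled for the same prompt and computes each completion's advantage by normalizing its reward against the mean and standard deviation of that group, instead of relying on a learned value model. The sketch below assumes scalar rewards and one group per prompt. The function name `group_relative_advantages` and the `eps` guard are hypothetical, and it uses the population standard deviation; TRL's trainer may differ in such details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Sketch of a GRPO-style outcome advantage: (r_i - mean(r)) / std(r).

    `rewards` holds one scalar reward per completion sampled for the SAME prompt.
    `eps` is an assumed guard against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption here)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical group of 4 completions for one prompt: 1.0 = correct, 0.0 = wrong.
    rewards = [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(rewards))
```

Correct completions receive a positive advantage and incorrect ones a negative advantage of the same magnitude. A group in which every completion earns the same reward contributes zero advantage.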
model-00001-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:abb043cdd898bf230ad4900341c0ced1c8dbcfc0417fea686ece626dc89f5bda
+oid sha256:9680dafec67b89aaa46f7a2901a9e62cd995742e855ba1c3574f9816aa1ad8e8
 size 4877660776
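Each `.safetensors` entry in this commit is a Git LFS pointer, not the weights themselves. The pointer has three lines: a `version` line naming the LFS spec, an `oid sha256:` line with the content hash, and a `size` line in bytes. Here only the `oid` changes while the `size` stays the same. That pattern is consistent with a training checkpoint that overwrites tensors of the same shapes and dtype with new values. The sketch below parses a pointer and checks a downloaded shard against it. The function names and the local path are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse a Git LFS pointer: one 'key value' pair per line."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def verify_against_pointer(path: Path, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's byte size and SHA-256 digest match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    if path.stat().st_size != int(fields["size"]):
        return False  # cheap check first: skip hashing on a size mismatch
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Stream in chunks so multi-GB shards never have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


# New-side pointer for model-00001-of-00004.safetensors, copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9680dafec67b89aaa46f7a2901a9e62cd995742e855ba1c3574f9816aa1ad8e8
size 4877660776
"""

if __name__ == "__main__":
    fields = parse_lfs_pointer(POINTER)
    print(fields["oid"], fields["size"])
    shard = Path("model-00001-of-00004.safetensors")  # hypothetical local download
    if shard.exists():
        print(verify_against_pointer(shard, POINTER))
```

The size is checked first because it costs one `stat` call, while hashing a roughly 4.9 GB shard takes noticeably longer.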
model-00002-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d6a0a251b82f46269774f454230cb3a648a554356a978674c93fc384fddd3b26
+oid sha256:9845c0d6b43877c33c70bbcb9dd092d83eab709ad8366cb341f2b5d88e0950d7
 size 4932751008
model-00003-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2e949a9e49dc00a6cd635d562b96159a73a02fb584a5e63e2e6ef28761d4fbe3
+oid sha256:0576e362c907628f077efed4ba9f0d063cc6dd9876d6031b273234ebdfe1fa4e
 size 4330865200
model-00004-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:957269831bdd6e2896a7c0a379f902da467865fc735d13c682e77a952036c881
+oid sha256:8ee9a53587de644c4bb737559fc9a74aa4e6c922bf623451f678901f89e7568f
 size 1089994880
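Adding the four `size` fields gives the checkpoint's total on-disk footprint. Because no size changes in this commit, the total is the same before and after step 100. The sketch below also converts bytes to an approximate parameter count. It assumes bfloat16 storage (2 bytes per parameter), which this commit does not state. It also ignores the small JSON header in each safetensors file, so the count is a slight overestimate.

```python
# Shard sizes in bytes, copied from the LFS pointers above (unchanged in this commit).
SHARD_SIZES = {
    "model-00001-of-00004.safetensors": 4_877_660_776,
    "model-00002-of-00004.safetensors": 4_932_751_008,
    "model-00003-of-00004.safetensors": 4_330_865_200,
    "model-00004-of-00004.safetensors": 1_089_994_880,
}

BYTES_PER_PARAM = 2  # assumption: bfloat16 weights; float32 would be 4

total_bytes = sum(SHARD_SIZES.values())
# Slight overestimate: the byte total also includes each file's safetensors header.
approx_params = total_bytes / BYTES_PER_PARAM

print(f"total: {total_bytes:,} bytes ({total_bytes / 1e9:.2f} GB, {total_bytes / 2**30:.2f} GiB)")
print(f"~{approx_params / 1e9:.2f}B parameters at {BYTES_PER_PARAM} bytes each")
```

The shards total 15,231,271,864 bytes (about 15.2 GB). At 2 bytes per parameter that is roughly 7.6 billion parameters, consistent with the size of the Qwen2.5-7B base model.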