rokugatsu committed
Commit dce3149 · verified · 1 Parent(s): 561bb07

Upload DPO-trained Qwen3-4B-Instruct-2507 model

README.md CHANGED
@@ -35,7 +35,7 @@ and decrease the likelihood of 'rejected' responses for given prompts.
 - DPO Dataset: u-10bei/sft_alfworld_trajectory_dataset_v2
 - DPO Method: Direct Preference Optimization (DPO)
 - Max sequence length: 2048
-- Epochs: 2
+- Epochs: 0.25
 - Learning rate: 2e-06
 - Beta parameter (DPO loss): 0.1
 
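The beta parameter listed in the README is the temperature of the DPO loss: it scales how strongly the policy's log-probability margin between the chosen and rejected responses is pushed apart relative to the frozen reference model. A minimal sketch of the per-pair loss with beta=0.1 as in the README (not the actual training code used for this commit; the log-probability inputs are assumed to be pre-computed sums over response tokens):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    Each argument is the summed token log-probability of a response under
    the policy or the frozen reference model; beta=0.1 matches the README.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # written in a numerically stable branch for either sign of margin
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and reference agree exactly, the margin is zero and the loss is log(2); as the policy assigns relatively more probability to the chosen response, the loss falls toward zero.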
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:63dedf6f455c59df1553bfcd6beffc6c62bffa093364850f145810e425d9639e
+oid sha256:ce13f7ebf58fb0c58a7bb0c9701ac69735db8fab5e9ab7c2a2bc2520da5245c8
 size 4967215360
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8a34372ba04244bd2fc6859164df263f14260af837054a3dfe626fccf22a4d45
+oid sha256:8df0319703594e6ed3546f806b709173f49bbe9eac57044412b0281fae7db0c3
 size 3077766632