SaminSkyfall committed
Commit 28edc48 · verified · 1 Parent(s): b355892

Training in progress, step 5

README.md CHANGED
@@ -1,17 +1,16 @@
 ---
-base_model: Qwen/Qwen3-0.6B
 library_name: transformers
-model_name: sft_full_0.6_pre
+model_name: dpo_full_1.7
 tags:
 - generated_from_trainer
 - trl
-- sft
+- dpo
 licence: license
 ---
 
-# Model Card for sft_full_0.6_pre
+# Model Card for dpo_full_1.7
 
-This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B).
+This model is a fine-tuned version of [None](https://huggingface.co/None).
 It has been trained using [TRL](https://github.com/huggingface/trl).
 
 ## Quick start
@@ -20,17 +19,17 @@ It has been trained using [TRL](https://github.com/huggingface/trl).
 from transformers import pipeline
 
 question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="SaminSkyfall/sft_full_0.6_pre", device="cuda")
+generator = pipeline("text-generation", model="SaminSkyfall/dpo_full_1.7", device="cuda")
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
 
 ## Training procedure
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/samin-skyfall-ai/huggingface/runs/3d7ylu2c)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/samin-skyfall-ai/huggingface/runs/lc17gapu)
 
 
-This model was trained with SFT.
+This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
 
 ### Framework versions
 
@@ -42,7 +41,18 @@
 
 ## Citations
 
+Cite DPO as:
+
+```bibtex
+@inproceedings{rafailov2023direct,
+    title = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
+    author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
+    year = 2023,
+    booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023},
+    url = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
+    editor = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
+}
+```
 
 Cite TRL as:
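The updated card states the model was trained with DPO. As a hedged illustration only (this is not the repository's training code, and the helper name is invented for this sketch), the per-example DPO objective from the cited Rafailov et al. (2023) paper can be written in plain Python:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from Rafailov et al. (2023):
    -log(sigmoid(beta * (policy_logratio - reference_logratio)))."""
    policy_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_logratio - ref_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Toy log-probabilities: the policy prefers the chosen response more strongly
# than the reference does, so the loss falls below log(2), the value at
# initialization when the two log-ratios are equal.
loss = dpo_loss(-10.0, -14.0, -11.0, -13.0, beta=0.1)
```

In practice this computation is handled by TRL's DPO trainer; the sketch only shows the quantity being minimized.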
adapter_config.json CHANGED
@@ -1,7 +1,7 @@
 {
   "alpha_pattern": {},
   "auto_mapping": null,
-  "base_model_name_or_path": "Qwen/Qwen3-1.7B",
+  "base_model_name_or_path": "Qwen/Qwen3-0.6B",
   "bias": "none",
   "corda_config": null,
   "eva_config": null,
@@ -18,21 +18,24 @@
   "lora_dropout": 0.0,
   "megatron_config": null,
   "megatron_core": "megatron.core",
-  "modules_to_save": null,
+  "modules_to_save": [
+    "classifier",
+    "score"
+  ],
   "peft_type": "LORA",
   "r": 32,
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
     "up_proj",
-    "q_proj",
     "gate_proj",
     "o_proj",
     "k_proj",
     "down_proj",
-    "v_proj"
+    "v_proj",
+    "q_proj"
   ],
-  "task_type": "CAUSAL_LM",
+  "task_type": "SEQ_CLS",
   "trainable_token_indices": null,
   "use_dora": false,
   "use_rslora": false
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fc980bc3a09519384af5e0b77d15073ef269a7af6e5192ff97486fa1d14d071c
-size 139512976
+oid sha256:7b4852f593289d8d754fb266ca714000259357a59ed79306d2de4f8dba95c8d0
+size 80796648
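The new adapter size can be roughly cross-checked against the LoRA config above (r=32, seven target modules) and the float32 dtype. The Qwen3-0.6B projection shapes used below are assumptions taken from the public model config, not stated anywhere in this diff:

```python
# Assumed Qwen3-0.6B dimensions: hidden 1024, 28 layers, 16 query heads and
# 8 key/value heads of head_dim 128, MLP intermediate 3072.
HIDDEN, LAYERS, INTERMEDIATE = 1024, 28, 3072
Q_OUT, KV_OUT = 16 * 128, 8 * 128  # query / key-value projection widths
R = 32  # LoRA rank from adapter_config.json

shapes = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Each LoRA pair adds A (r x d_in) plus B (d_out x r) parameters per layer.
lora_params = LAYERS * sum(R * (d_in + d_out) for d_in, d_out in shapes.values())
approx_bytes = lora_params * 4  # 4 bytes per float32 parameter
print(lora_params, approx_bytes)
```

Under these assumptions the estimate lands within about 0.1% of the listed 80796648 bytes; the small remainder is plausibly the fully saved `score`/`classifier` heads from `modules_to_save` plus the safetensors header.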
config.json CHANGED
@@ -23,7 +23,7 @@
   "rope_theta": 1000000,
   "sliding_window": null,
   "tie_word_embeddings": true,
-  "torch_dtype": "float16",
+  "torch_dtype": "float32",
   "transformers_version": "4.51.3",
   "use_cache": true,
   "use_sliding_window": false,
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:174c6d288e87934a230721c5d098f7e5437de1c52f163a0b628628b9d70329d5
+oid sha256:22ec6d375dfdbdaa1a7fa473ec529a3f4bb6b5f096aa185eb87bec70ef2dc35b
 size 4969539560
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:aa6d7b6fa4e7b23181588fa473b007f5160094762338ba38b346688fefd8d1f0
+oid sha256:753d24c6c821e2a4ddd49039983732d238c3f062c6cd92361141e9f9d4e843fa
 size 1912795688
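Note that both model shards change content (new `oid`) but keep exactly the same byte sizes, while config.json flips `torch_dtype` from `float16` to `float32`. A quick back-of-the-envelope check (assuming the safetensors payload is roughly parameter count times bytes per dtype) suggests the weights were float32 all along and this commit merely corrects the metadata:

```python
# Shard sizes exactly as listed in this commit's LFS pointers.
shard_bytes = 4969539560 + 1912795688

# Implied parameter counts under each dtype hypothesis.
params_if_fp32 = shard_bytes / 4  # 4 bytes per float32 parameter
params_if_fp16 = shard_bytes / 2  # 2 bytes per float16 parameter

# ~1.72B parameters if float32 (consistent with a 1.7B-class model and the
# "dpo_full_1.7" name); ~3.44B if float16, which would not match.
print(f"{params_if_fp32 / 1e9:.2f}B if float32, {params_if_fp16 / 1e9:.2f}B if float16")
```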
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:888bfd9d9a3a3086b0ec0f1f1c63d1c7f032ef68e2a3082a9d85326812373d0d
+oid sha256:ce90abc86c871f836a6032a8a14fa996159108621f757ccb76c6a4b82a4a59fd
 size 6033
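All of the binary entries in this commit are Git LFS pointer files with the three-line `version` / `oid` / `size` layout shown above; the actual weights live in LFS storage. A minimal sketch of reading one (the helper name is illustrative, not part of any library):

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The new training_args.bin pointer from this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:ce90abc86c871f836a6032a8a14fa996159108621f757ccb76c6a4b82a4a59fd
size 6033"""

info = parse_lfs_pointer(pointer)
print(info["oid"], int(info["size"]))
```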