Commit de5c232 · verified · Ceachi committed · 1 parent: 56d51ae

Training in progress, step 500

Files changed (5)
  1. README.md +10 -12
  2. config.json +1 -0
  3. model.safetensors +1 -1
  4. special_tokens_map.json +3 -21
  5. training_args.bin +2 -2
README.md CHANGED
@@ -1,17 +1,17 @@
  ---
- base_model: Ceachi/HW2-supervised
+ base_model: openai-community/gpt2
  library_name: transformers
  model_name: HW2-dpo
  tags:
  - generated_from_trainer
+ - orpo
  - trl
- - dpo
  licence: license
  ---

  # Model Card for HW2-dpo

- This model is a fine-tuned version of [Ceachi/HW2-supervised](https://huggingface.co/Ceachi/HW2-supervised).
+ This model is a fine-tuned version of [openai-community/gpt2](https://huggingface.co/openai-community/gpt2).
  It has been trained using [TRL](https://github.com/huggingface/trl).

  ## Quick start
@@ -30,7 +30,7 @@ print(output["generated_text"])



- This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
+ This model was trained with ORPO, a method introduced in [ORPO: Monolithic Preference Optimization without Reference Model](https://huggingface.co/papers/2403.07691).

  ### Framework versions

@@ -42,16 +42,14 @@ This model was trained with DPO, a method introduced in [Direct Preference Optim

  ## Citations

- Cite DPO as:
+ Cite ORPO as:

  ```bibtex
- @inproceedings{rafailov2023direct,
- title = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
- author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
- year = 2023,
- booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023},
- url = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
- editor = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
+ @article{hong2024orpo,
+ title = {{ORPO: Monolithic Preference Optimization without Reference Model}},
+ author = {Jiwoo Hong and Noah Lee and James Thorne},
+ year = 2024,
+ eprint = {arXiv:2403.07691}
  }
  ```
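The card's training method changed from DPO to ORPO. Per the paper linked in the diff, ORPO needs no reference model: it adds an odds-ratio penalty to the plain SFT loss, L = L_SFT + λ·L_OR, where L_OR = -log σ(log odds(y_w) - log odds(y_l)) and odds(p) = p/(1-p). A minimal numeric sketch of that penalty (function names and the λ default are illustrative, not TRL's API):

```python
import math

def odds(p: float) -> float:
    """Odds of a probability: p / (1 - p)."""
    return p / (1.0 - p)

def orpo_penalty(p_chosen: float, p_rejected: float) -> float:
    """Odds-ratio term: -log sigmoid(log odds(chosen) - log odds(rejected))."""
    log_odds_ratio = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))

def orpo_loss(nll_chosen: float, p_chosen: float, p_rejected: float,
              lam: float = 0.1) -> float:
    """Monolithic objective: SFT NLL on the chosen response plus lam * L_OR."""
    return nll_chosen + lam * orpo_penalty(p_chosen, p_rejected)

# The penalty shrinks as the model prefers the chosen response more strongly.
print(orpo_penalty(0.9, 0.1) < orpo_penalty(0.6, 0.4))  # True
```

When the model assigns equal probability to both responses, the penalty is exactly log 2, and it decays toward zero as the preference margin grows.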
config.json CHANGED
@@ -16,6 +16,7 @@
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
+ "pad_token_id": 50256,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0b0d5761e1ed88e8383da4473c2337985fc6f8c80fb57eb37c61abe25728bc47
+ oid sha256:8cada5223d12d3fbd792f04699f4456fd9ca3a957cf29905e8d78ebf8e6c1868
  size 497774208
special_tokens_map.json CHANGED
@@ -1,24 +1,6 @@
  {
- "bos_token": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- },
- "eos_token": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- },
+ "bos_token": "<|endoftext|>",
+ "eos_token": "<|endoftext|>",
  "pad_token": "<|endoftext|>",
- "unk_token": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- }
+ "unk_token": "<|endoftext|>"
  }
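The diff collapses each special token from a full dict (with `lstrip`, `normalized`, and similar flags) down to a bare string; tokenizer loaders accept either serialization, and both name the same `<|endoftext|>` token. A small sketch showing the two forms resolve to the same content (the helper name is illustrative):

```python
import json

# Old-style entry (dict with flags) next to a new-style entry (bare string).
OLD = json.loads("""{
  "bos_token": {"content": "<|endoftext|>", "lstrip": false,
                "normalized": true, "rstrip": false, "single_word": false},
  "pad_token": "<|endoftext|>"
}""")

def token_content(value) -> str:
    """Both serializations name the same token: a bare string, or a dict with flags."""
    return value["content"] if isinstance(value, dict) else value

print({k: token_content(v) for k, v in OLD.items()})
# {'bos_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}
```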
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1a8405b5b12c2421511578a2a263c8726b704e2f8f7a5ea6d224f70360867944
- size 6609
+ oid sha256:a26db1abfd69f6f056e8d4ef498a373ecceb24089725c63a992fdee4bb81d1d6
+ size 5969
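Both binary files above are stored as Git LFS pointers rather than raw bytes: three `key value` lines giving the spec version, the sha256 object id, and the size in bytes (here the serialized training arguments shrank from 6609 to 5969 bytes). A sketch of parsing such a pointer, assuming the three-line format shown in the diff:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file: one 'key value' pair per line."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "sha256": fields["oid"].removeprefix("sha256:"),
        "size": int(fields["size"]),
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:a26db1abfd69f6f056e8d4ef498a373ecceb24089725c63a992fdee4bb81d1d6
size 5969
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # 5969
```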