Commit de5c232 · verified · Ceachi committed · 1 parent: 56d51ae

Training in progress, step 500

Files changed (5)
  1. README.md +10 -12
  2. config.json +1 -0
  3. model.safetensors +1 -1
  4. special_tokens_map.json +3 -21
  5. training_args.bin +2 -2
README.md CHANGED
@@ -1,17 +1,17 @@
  ---
- base_model: Ceachi/HW2-supervised
+ base_model: openai-community/gpt2
  library_name: transformers
  model_name: HW2-dpo
  tags:
  - generated_from_trainer
+ - orpo
  - trl
- - dpo
  licence: license
  ---

  # Model Card for HW2-dpo

- This model is a fine-tuned version of [Ceachi/HW2-supervised](https://huggingface.co/Ceachi/HW2-supervised).
+ This model is a fine-tuned version of [openai-community/gpt2](https://huggingface.co/openai-community/gpt2).
  It has been trained using [TRL](https://github.com/huggingface/trl).

  ## Quick start
@@ -30,7 +30,7 @@ print(output["generated_text"])



- This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
+ This model was trained with ORPO, a method introduced in [ORPO: Monolithic Preference Optimization without Reference Model](https://huggingface.co/papers/2403.07691).

  ### Framework versions

@@ -42,16 +42,14 @@ This model was trained with DPO, a method introduced in [Direct Preference Optim

  ## Citations

- Cite DPO as:
+ Cite ORPO as:

  ```bibtex
- @inproceedings{rafailov2023direct,
- title = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
- author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
- year = 2023,
- booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023},
- url = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
- editor = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
+ @article{hong2024orpo,
+ title = {{ORPO: Monolithic Preference Optimization without Reference Model}},
+ author = {Jiwoo Hong and Noah Lee and James Thorne},
+ year = 2024,
+ eprint = {arXiv:2403.07691}
  }
  ```
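The card's training method changed from DPO to ORPO. Per the paper linked in the diff, ORPO needs no reference model: it adds an odds-ratio penalty to the plain SFT loss, L = L_SFT + λ·L_OR, where L_OR = -log σ(log odds(y_w) - log odds(y_l)) and odds(p) = p/(1-p). A minimal numeric sketch of that penalty (function names and the λ default are illustrative, not TRL's API):

```python
import math

def odds(p: float) -> float:
    """Odds of a probability: p / (1 - p)."""
    return p / (1.0 - p)

def orpo_penalty(p_chosen: float, p_rejected: float) -> float:
    """Odds-ratio term: -log sigmoid(log odds(chosen) - log odds(rejected))."""
    log_odds_ratio = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))

def orpo_loss(nll_chosen: float, p_chosen: float, p_rejected: float,
              lam: float = 0.1) -> float:
    """Monolithic objective: SFT NLL on the chosen response plus lam * L_OR."""
    return nll_chosen + lam * orpo_penalty(p_chosen, p_rejected)

# The penalty shrinks as the model prefers the chosen response more strongly.
print(orpo_penalty(0.9, 0.1) < orpo_penalty(0.6, 0.4))  # True
```

When the model assigns equal probability to both responses, the penalty is exactly log 2, and it decays toward zero as the preference margin grows.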
config.json CHANGED
@@ -16,6 +16,7 @@
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
+ "pad_token_id": 50256,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0b0d5761e1ed88e8383da4473c2337985fc6f8c80fb57eb37c61abe25728bc47
+ oid sha256:8cada5223d12d3fbd792f04699f4456fd9ca3a957cf29905e8d78ebf8e6c1868
  size 497774208
special_tokens_map.json CHANGED
@@ -1,24 +1,6 @@
  {
- "bos_token": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- },
- "eos_token": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- },
+ "bos_token": "<|endoftext|>",
+ "eos_token": "<|endoftext|>",
  "pad_token": "<|endoftext|>",
- "unk_token": {
- "content": "<|endoftext|>",
- "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false
- }
+ "unk_token": "<|endoftext|>"
  }
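The diff collapses each special token from a full dict (with `lstrip`, `normalized`, and similar flags) down to a bare string; tokenizer loaders accept either serialization, and both name the same `<|endoftext|>` token. A small sketch showing the two forms resolve to the same content (the helper name is illustrative):

```python
import json

# Old-style entry (dict with flags) next to a new-style entry (bare string).
OLD = json.loads("""{
  "bos_token": {"content": "<|endoftext|>", "lstrip": false,
                "normalized": true, "rstrip": false, "single_word": false},
  "pad_token": "<|endoftext|>"
}""")

def token_content(value) -> str:
    """Both serializations name the same token: a bare string, or a dict with flags."""
    return value["content"] if isinstance(value, dict) else value

print({k: token_content(v) for k, v in OLD.items()})
# {'bos_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}
```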
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1a8405b5b12c2421511578a2a263c8726b704e2f8f7a5ea6d224f70360867944
- size 6609
+ oid sha256:a26db1abfd69f6f056e8d4ef498a373ecceb24089725c63a992fdee4bb81d1d6
+ size 5969
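Both binary files above are stored as Git LFS pointers rather than raw bytes: three `key value` lines giving the spec version, the sha256 object id, and the size in bytes (here the serialized training arguments shrank from 6609 to 5969 bytes). A sketch of parsing such a pointer, assuming the three-line format shown in the diff:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file: one 'key value' pair per line."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "sha256": fields["oid"].removeprefix("sha256:"),
        "size": int(fields["size"]),
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:a26db1abfd69f6f056e8d4ef498a373ecceb24089725c63a992fdee4bb81d1d6
size 5969
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # 5969
```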