justthzz committed on
Commit 05c7838 · verified · 1 Parent(s): ba7c245

Update README.md

Files changed (1): README.md +52 -11
README.md CHANGED
@@ -9,28 +9,69 @@ tags:
  licence: license
  ---

- # Model Card for distilgpt2-dpo-checkpoint

- This model is a fine-tuned version of [distilgpt2](https://huggingface.co/distilgpt2).
- It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start

  ```python
  from transformers import pipeline

- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="None", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
  ```

- ## Training procedure
-
- This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).

  ### Framework versions

  licence: license
  ---

+ # Preference-Tuned Summarizer using Direct Preference Optimization (DPO)
+
+ This repository hosts a lightweight text summarization model fine-tuned from DistilGPT2 with Direct Preference Optimization (DPO). The model was trained on preference-labeled data so that its summaries align more closely with human preferences than those produced by supervised fine-tuning alone.
 
+ ---
+
+ ## Model Details
+
+ - **Base model:** DistilGPT2
+ - **Fine-tuning method:** Direct Preference Optimization (DPO)
+ - **Dataset:** Preference pairs with `prompt`, `chosen`, and `rejected` summaries (see the sketch after this list)
+ - **Evaluation metrics:** ROUGE-1 (0.2841), ROUGE-L (0.2247), BLEU (0.0286)
+ - **Use case:** Generating high-quality, human-aligned text summaries
+
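+ A minimal sketch of what one preference row might look like; the field names come from the list above, while the text itself is invented for illustration:
+
+ ```python
+ # Invented example row; only the keys (prompt/chosen/rejected) are taken from this card.
+ example_row = {
+     "prompt": "Summarize: The city council approved the new transit budget on Tuesday.",
+     "chosen": "The council approved the new transit budget.",
+     "rejected": "Tuesday was a day on which several things happened in the city.",
+ }
+ ```
+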
+ ---
+
+ ## How to Use
+
+ You can load and use the model easily with the Hugging Face Transformers library:

  ```python
  from transformers import pipeline

+ summarizer = pipeline("text-generation", model="justthzz/preference-tuned-summarizer")
+ text = "Summarize: Your input text here."
+
+ summary = summarizer(text, max_new_tokens=150, do_sample=False)
+ print(summary[0]['generated_text'])
  ```

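+ Note that the `text-generation` pipeline returns the prompt followed by the continuation by default; pass `return_full_text=False` in the call above to keep only the newly generated summary.
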
+ ---
+
+ ## Files Included
+
+ - `pytorch_model.bin` - Model weights
+ - `config.json` - Model configuration
+ - Tokenizer files (`tokenizer.json`, `vocab.json`, `merges.txt`, etc.)
+ - Model card and this README

+ ---
+
+ ## About DPO
+
+ Direct Preference Optimization fine-tunes a model directly on preference-labeled data: for each prompt, the objective raises the likelihood of the chosen response relative to the rejected one while keeping the model close to a reference policy, with no separately trained reward model. This typically improves alignment with human judgments beyond supervised fine-tuning alone; a training sketch follows.
+
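+ A minimal sketch of a TRL-style DPO run, assuming a recent `trl` release; the hyperparameters and the tiny dataset are illustrative, not the actual training configuration:
+
+ ```python
+ from datasets import Dataset
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from trl import DPOConfig, DPOTrainer
+
+ model = AutoModelForCausalLM.from_pretrained("distilgpt2")
+ tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
+ tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
+
+ # Tiny illustrative dataset; rows follow the prompt/chosen/rejected layout above.
+ train_dataset = Dataset.from_list([{
+     "prompt": "Summarize: The meeting covered budget, staffing, and project timelines.",
+     "chosen": "The meeting covered budget, staffing, and timelines.",
+     "rejected": "Meetings are often long and cover many different things.",
+ }])
+
+ training_args = DPOConfig(
+     output_dir="preference-tuned-summarizer",
+     beta=0.1,  # strength of the implicit KL penalty toward the reference model
+     per_device_train_batch_size=2,
+     num_train_epochs=1,
+ )
+
+ trainer = DPOTrainer(
+     model=model,  # a frozen reference copy is created internally when ref_model is omitted
+     args=training_args,
+     train_dataset=train_dataset,
+     processing_class=tokenizer,
+ )
+ trainer.train()
+ ```
+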
+ ---
+
+ ## Evaluation Results

+ | Metric  | Base Summary (avg) | DPO Summary (avg) |
+ |---------|--------------------|-------------------|
+ | ROUGE-1 | 0.0442             | **0.2841**        |
+ | ROUGE-L | 0.0366             | **0.2247**        |
+ | BLEU    | 0.0000             | **0.0286**        |
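+
+ A sketch of how such scores could be computed with the Hugging Face `evaluate` library; the example strings below stand in for the real test set, which is not published here:
+
+ ```python
+ import evaluate
+
+ rouge = evaluate.load("rouge")
+ bleu = evaluate.load("bleu")
+
+ predictions = ["The council approved the new transit budget."]  # model summaries
+ references = ["The city council approved the transit budget."]  # gold summaries
+
+ rouge_scores = rouge.compute(predictions=predictions, references=references)
+ bleu_scores = bleu.compute(predictions=predictions, references=[[r] for r in references])
+
+ print(rouge_scores["rouge1"], rouge_scores["rougeL"], bleu_scores["bleu"])
+ ```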
 
+ ---
+
+ ## Links
+
+ - [GitHub Repository](https://github.com/justthzz/preference-tuned-summarizer)
+ - [Model on Hugging Face](https://huggingface.co/justthzz/preference-tuned-summarizer)
+
+ ---

  ### Framework versions