justthzz committed on
Commit 05c7838 · verified · 1 Parent(s): ba7c245

Update README.md

Files changed (1): README.md +52 -11
README.md CHANGED
@@ -9,28 +9,69 @@ tags:
  licence: license
  ---

- # Model Card for distilgpt2-dpo-checkpoint

- This model is a fine-tuned version of [distilgpt2](https://huggingface.co/distilgpt2).
- It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start

  ```python
  from transformers import pipeline

- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="None", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
  ```

- ## Training procedure
-
- This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).

  ### Framework versions

  licence: license
  ---

+ # Preference-Tuned Summarizer using Direct Preference Optimization (DPO)
+
+ This repository hosts a lightweight text summarization model fine-tuned from DistilGPT2 with Direct Preference Optimization (DPO). The model was trained on preference-labeled data so that its summaries align more closely with human preferences than those produced by supervised fine-tuning alone.
 
+ ---
+
+ ## Model Details
+
+ - **Base model:** DistilGPT2
+ - **Fine-tuning method:** Direct Preference Optimization (DPO)
+ - **Dataset:** Preference pairs with `prompt`, `chosen`, and `rejected` summaries (see the sketch after this list)
+ - **Evaluation metrics:** ROUGE-1 (0.2841), ROUGE-L (0.2247), BLEU (0.0286)
+ - **Use case:** Generating high-quality, human-aligned text summaries
+
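+ A minimal sketch of what one preference row might look like; the field names come from the list above, while the text itself is invented for illustration:
+
+ ```python
+ # Invented example row; only the keys (prompt/chosen/rejected) are taken from this card.
+ example_row = {
+     "prompt": "Summarize: The city council approved the new transit budget on Tuesday.",
+     "chosen": "The council approved the new transit budget.",
+     "rejected": "Tuesday was a day on which several things happened in the city.",
+ }
+ ```
+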
+ ---
+
+ ## How to Use
+
+ You can load and use the model easily with the Hugging Face Transformers library:

  ```python
  from transformers import pipeline

+ summarizer = pipeline("text-generation", model="justthzz/preference-tuned-summarizer")
+ text = "Summarize: Your input text here."
+
+ summary = summarizer(text, max_new_tokens=150, do_sample=False)
+ print(summary[0]['generated_text'])
  ```

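+ Note that the `text-generation` pipeline returns the prompt followed by the continuation by default; pass `return_full_text=False` in the call above to keep only the newly generated summary.
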
+ ---
+
+ ## Files Included
+
+ - `pytorch_model.bin` - Model weights
+ - `config.json` - Model configuration
+ - Tokenizer files (`tokenizer.json`, `vocab.json`, `merges.txt`, etc.)
+ - Model card and this README

+ ---
+
+ ## About DPO
+
+ Direct Preference Optimization fine-tunes a model directly on preference-labeled data: for each prompt, the objective raises the likelihood of the chosen response relative to the rejected one while keeping the model close to a reference policy, with no separately trained reward model. This typically improves alignment with human judgments beyond supervised fine-tuning alone; a training sketch follows.
+
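+ A minimal sketch of a TRL-style DPO run, assuming a recent `trl` release; the hyperparameters and the tiny dataset are illustrative, not the actual training configuration:
+
+ ```python
+ from datasets import Dataset
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from trl import DPOConfig, DPOTrainer
+
+ model = AutoModelForCausalLM.from_pretrained("distilgpt2")
+ tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
+ tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
+
+ # Tiny illustrative dataset; rows follow the prompt/chosen/rejected layout above.
+ train_dataset = Dataset.from_list([{
+     "prompt": "Summarize: The meeting covered budget, staffing, and project timelines.",
+     "chosen": "The meeting covered budget, staffing, and timelines.",
+     "rejected": "Meetings are often long and cover many different things.",
+ }])
+
+ training_args = DPOConfig(
+     output_dir="preference-tuned-summarizer",
+     beta=0.1,  # strength of the implicit KL penalty toward the reference model
+     per_device_train_batch_size=2,
+     num_train_epochs=1,
+ )
+
+ trainer = DPOTrainer(
+     model=model,  # a frozen reference copy is created internally when ref_model is omitted
+     args=training_args,
+     train_dataset=train_dataset,
+     processing_class=tokenizer,
+ )
+ trainer.train()
+ ```
+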
+ ---
+
+ ## Evaluation Results

+ | Metric  | Base Summary (avg) | DPO Summary (avg) |
+ |---------|--------------------|-------------------|
+ | ROUGE-1 | 0.0442             | **0.2841**        |
+ | ROUGE-L | 0.0366             | **0.2247**        |
+ | BLEU    | 0.0000             | **0.0286**        |
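+
+ A sketch of how such scores could be computed with the Hugging Face `evaluate` library; the example strings below stand in for the real test set, which is not published here:
+
+ ```python
+ import evaluate
+
+ rouge = evaluate.load("rouge")
+ bleu = evaluate.load("bleu")
+
+ predictions = ["The council approved the new transit budget."]  # model summaries
+ references = ["The city council approved the transit budget."]  # gold summaries
+
+ rouge_scores = rouge.compute(predictions=predictions, references=references)
+ bleu_scores = bleu.compute(predictions=predictions, references=[[r] for r in references])
+
+ print(rouge_scores["rouge1"], rouge_scores["rougeL"], bleu_scores["bleu"])
+ ```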
 
+ ---
+
+ ## Links
+
+ - [GitHub Repository](https://github.com/justthzz/preference-tuned-summarizer)
+ - [Model on Hugging Face](https://huggingface.co/justthzz/preference-tuned-summarizer)
+
+ ---

  ### Framework versions