https://huggingface.co/spaces/CatoG/DPO_Demo
Allows for LLM model selection, preference tuning of LLM responses, and model response tuning with LoRA and Direct Preference Optimization (DPO).
Tuned models / policies can be downloaded for further use.
This project is an interactive Direct Preference Optimization (DPO) playground for experimenting with real LLM behavior-tuning. The app lets you load a variety of open models, generate multiple candidate answers, and explicitly encode human preferences (chosen vs. rejected responses) through an intuitive Gradio interface.
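Each chosen/rejected judgment can be stored as a simple record of prompt, preferred response, and dispreferred response. A minimal sketch, assuming field names that follow the common TRL `DPOTrainer` convention (the helper and example strings are hypothetical, not taken from this app's code):

```python
# Minimal sketch of a preference dataset: each record pairs a prompt with a
# human-preferred ("chosen") and a dispreferred ("rejected") response.
# Field names follow the common TRL DPOTrainer convention (an assumption here).

def make_preference_pair(prompt, chosen, rejected):
    """Bundle one human preference judgment into a training record."""
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

dataset = [
    make_preference_pair(
        prompt="Explain LoRA in one sentence.",
        chosen="LoRA fine-tunes a model by learning small low-rank weight updates.",
        rejected="LoRA is a kind of radio technology.",
    ),
]

print(len(dataset), sorted(dataset[0].keys()))  # → 1 ['chosen', 'prompt', 'rejected']
```

Collecting many such records across different prompts is what gives the trainer a signal about the style or reasoning pattern you prefer.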
Using these preference pairs, the app trains a LoRA-adapted policy model against a frozen reference model, shifting the model’s behavior toward your desired style, tone, or reasoning pattern. You can explore how DPO changes alignment by collecting preferences, running training rounds, and immediately testing the tuned policy model on new prompts.
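At the core of this training step is the DPO objective: a logistic loss on the policy's log-probability margin over the frozen reference model. A plain-Python sketch for a single preference pair (the log-probabilities here are stand-in numbers, not real model outputs):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    beta controls how strongly the policy is pushed away from the reference.
    """
    # Implicit reward of each response: log-prob ratio vs. the frozen reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Logistic loss: small when the chosen response out-scores the rejected one.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Stand-in log-probabilities (illustrative numbers only).
loss_aligned = dpo_loss(-5.0, -9.0, -6.0, -6.0)    # policy already prefers "chosen"
loss_misaligned = dpo_loss(-9.0, -5.0, -6.0, -6.0)  # policy prefers "rejected"
print(loss_aligned < loss_misaligned)  # → True
```

Minimizing this loss nudges the LoRA-adapted policy toward assigning higher probability to chosen responses than rejected ones, relative to the reference model, which is why behavior shifts without a separately trained reward model.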
Purpose: an experimental tool for understanding alignment, safety, and model personalization techniques without requiring deep ML infrastructure. It supports multiple models, adjustable generation parameters, preference visualization, and downloadable tuned LoRA adapters for further use.