---
library_name: peft
license: apache-2.0
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
- llama
- lora
- prompt-optimization
- sft
- grpo
- rlhf
language:
- en
pipeline_tag: text-generation
---
# PPOpt-Llama-3.1-8B-Instruct-LoRA
A LoRA adapter for **Llama-3.1-8B-Instruct**, fine-tuned for the **prompt optimization** task.
## Model Description
This model is trained to optimize user prompts based on their interaction history and preferences. Given a user's conversation history and current query, it rewrites the query into a clearer, more specific, and better-structured prompt.
### Training Pipeline
- **Stage 1: SFT (Supervised Fine-Tuning)** - Trained on curated prompt optimization examples
- **Stage 2: GRPO (Group Relative Policy Optimization)** - Reinforcement learning with GPT-4o-mini as judge
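In GRPO, several candidate rewrites are sampled per query and each candidate's reward is computed relative to its own group, so candidates compete against siblings rather than an absolute baseline. The sketch below illustrates that group-relative reward computation with a hypothetical `judge_score` stub standing in for the actual GPT-4o-mini judge (the real judge and its scoring rubric are not part of this release; real GRPO implementations typically also normalize by the group's standard deviation):

```python
from typing import Callable, List

def judge_score(original_query: str, optimized_prompt: str) -> float:
    """Stub for the LLM judge; the actual pipeline queries GPT-4o-mini.
    Returns a placeholder score in [0, 1] based only on relative length."""
    return min(len(optimized_prompt) / (len(original_query) + 1), 1.0)

def grpo_rewards(query: str, group: List[str],
                 judge: Callable[[str, str], float] = judge_score) -> List[float]:
    """Group-relative rewards: each candidate's judge score minus the mean
    score of all candidates sampled for the same query."""
    scores = [judge(query, cand) for cand in group]
    mean = sum(scores) / len(scores)
    return [s - mean for s in scores]

rewards = grpo_rewards(
    "how to make it responsive",
    ["Make the flexbox layout responsive.",
     "Explain how to make a centered div responsive with CSS media queries."],
)
```

By construction the rewards of a group sum to zero, which is what makes the advantage estimate "relative": only better-than-siblings candidates are reinforced.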
### LoRA Configuration
| Parameter | Value |
|-----------|-------|
| r (rank) | 32 |
| lora_alpha | 32 |
| target_modules | all-linear |
| lora_dropout | 0 |
| bias | none |
## Usage
### Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load base model
base_model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
base_model_id,
torch_dtype="auto",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
# Load LoRA adapter
model = PeftModel.from_pretrained(model, "HowieHwong/ppopt")
```
### Inference Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load model
base_model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
base_model_id,
torch_dtype="auto",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, "HowieHwong/ppopt")
# Prepare input
conversation_history = """User: How do I center a div?
Assistant: You can use flexbox: display: flex; justify-content: center; align-items: center;
User: What about grid?
Assistant: With grid: display: grid; place-items: center;"""
current_query = "how to make it responsive"
prompt = f"""Based on the conversation history and user preferences, optimize the following query into a clearer, more specific prompt.
Conversation History:
{conversation_history}
Current Query: {current_query}
Optimized Prompt:"""
# Generate
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(
input_ids,
max_new_tokens=256,
temperature=0.7,
do_sample=True,
pad_token_id=tokenizer.eos_token_id
)
response = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
### Merge LoRA (Optional)
If you want to merge the adapter into the base model:
```python
# `model` is the PeftModel loaded above; the merged model no longer needs `peft` at inference
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged_ppopt_llama8b")
tokenizer.save_pretrained("merged_ppopt_llama8b")
```
## Intended Use
This model is designed for:
- Prompt optimization/rewriting systems
- Personalized query enhancement based on user history
- Research on prompt engineering automation
## License
This model is released under the Apache 2.0 license.