---

library_name: peft
license: apache-2.0
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
  - llama
  - lora
  - prompt-optimization
  - sft
  - grpo
  - rlhf
language:
  - en
pipeline_tag: text-generation
---


# PPOpt-Llama-3.1-8B-Instruct-LoRA

A LoRA adapter for **Llama-3.1-8B-Instruct**, fine-tuned for the **prompt optimization** task.

## Model Description

This model is trained to optimize user prompts based on their interaction history and preferences. Given a user's conversation history and current query, it rewrites the query into a clearer, more specific, and better-structured prompt.

### Training Pipeline

- **Stage 1: SFT (Supervised Fine-Tuning)** - Trained on curated prompt optimization examples
- **Stage 2: GRPO (Group Relative Policy Optimization)** - Reinforcement learning with GPT-4o-mini as judge
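The card does not publish the training scripts. As a rough sketch, a two-stage SFT → GRPO pipeline like the one described is commonly configured with the `trl` library; the hyperparameter values below are illustrative placeholders, not the authors' actual configuration:

```python
# Hypothetical config sketch of the two-stage pipeline using trl.
# All hyperparameter values are placeholders, not the authors' settings.
from trl import SFTConfig, GRPOConfig

# Stage 1: supervised fine-tuning on curated prompt-optimization examples
sft_args = SFTConfig(
    output_dir="ppopt-sft",
    num_train_epochs=1,
    per_device_train_batch_size=4,
)

# Stage 2: GRPO with a judge-based reward (e.g. scores from GPT-4o-mini)
grpo_args = GRPOConfig(
    output_dir="ppopt-grpo",
    num_generations=8,  # size of each sampled completion group
    per_device_train_batch_size=4,
)
```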

### LoRA Configuration

| Parameter | Value |
|-----------|-------|
| r (rank) | 32 |
| lora_alpha | 32 |
| target_modules | all-linear |
| lora_dropout | 0 |
| bias | none |



## Usage

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "HowieHwong/ppopt")
```



### Inference Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and attach the LoRA adapter
base_model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, "HowieHwong/ppopt")

# Prepare input
conversation_history = """User: How do I center a div?
Assistant: You can use flexbox: display: flex; justify-content: center; align-items: center;
User: What about grid?
Assistant: With grid: display: grid; place-items: center;"""

current_query = "how to make it responsive"

prompt = f"""Based on the conversation history and user preferences, optimize the following query into a clearer, more specific prompt.

Conversation History:
{conversation_history}

Current Query: {current_query}

Optimized Prompt:"""

# Generate (add_generation_prompt=True appends the assistant header
# so the model responds instead of continuing the user turn)
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
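The prompt template above can be factored into a small helper, which makes it easier to reuse across queries. The function name is illustrative, not a published API; the template wording mirrors the example:

```python
def build_optimization_prompt(conversation_history: str, current_query: str) -> str:
    """Assemble the prompt-optimization instruction shown in the example above."""
    return (
        "Based on the conversation history and user preferences, optimize the "
        "following query into a clearer, more specific prompt.\n\n"
        f"Conversation History:\n{conversation_history}\n\n"
        f"Current Query: {current_query}\n\n"
        "Optimized Prompt:"
    )

prompt = build_optimization_prompt(
    "User: How do I center a div?", "how to make it responsive"
)
```

The returned string can be passed as the single user message, exactly as in the inference example.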



### Merge LoRA (Optional)

If you want to merge the adapter into the base model:

```python
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged_ppopt_llama8b")
tokenizer.save_pretrained("merged_ppopt_llama8b")
```

## Intended Use

This model is designed for:
- Prompt optimization/rewriting systems
- Personalized query enhancement based on user history
- Research on prompt engineering automation


## License

This model is released under the Apache 2.0 license.