---
library_name: peft
license: apache-2.0
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
- llama
- lora
- prompt-optimization
- sft
- grpo
- rlhf
language:
- en
pipeline_tag: text-generation
---
# PPOpt-Llama-3.1-8B-Instruct-LoRA
A LoRA adapter for **Llama-3.1-8B-Instruct**, fine-tuned for the **prompt optimization** task.
## Model Description
This model is trained to optimize user prompts based on their interaction history and preferences. Given a user's conversation history and current query, it rewrites the query into a clearer, more specific, and better-structured prompt.
### Training Pipeline
- **Stage 1: SFT (Supervised Fine-Tuning)** - Trained on curated prompt optimization examples
- **Stage 2: GRPO (Group Relative Policy Optimization)** - Reinforcement learning with GPT-4o-mini as judge
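In GRPO, several candidate rewrites are sampled per query and each candidate's reward is computed relative to its own group, so candidates compete against siblings rather than an absolute baseline. The sketch below illustrates that group-relative reward computation with a hypothetical `judge_score` stub standing in for the actual GPT-4o-mini judge (the real judge and its scoring rubric are not part of this release; real GRPO implementations typically also normalize by the group's standard deviation):

```python
from typing import Callable, List

def judge_score(original_query: str, optimized_prompt: str) -> float:
    """Stub for the LLM judge; the actual pipeline queries GPT-4o-mini.
    Returns a placeholder score in [0, 1] based only on relative length."""
    return min(len(optimized_prompt) / (len(original_query) + 1), 1.0)

def grpo_rewards(query: str, group: List[str],
                 judge: Callable[[str, str], float] = judge_score) -> List[float]:
    """Group-relative rewards: each candidate's judge score minus the mean
    score of all candidates sampled for the same query."""
    scores = [judge(query, cand) for cand in group]
    mean = sum(scores) / len(scores)
    return [s - mean for s in scores]

rewards = grpo_rewards(
    "how to make it responsive",
    ["Make the flexbox layout responsive.",
     "Explain how to make a centered div responsive with CSS media queries."],
)
```

By construction the rewards of a group sum to zero, which is what makes the advantage estimate "relative": only better-than-siblings candidates are reinforced.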
### LoRA Configuration
| Parameter | Value |
|-----------|-------|
| r (rank) | 32 |
| lora_alpha | 32 |
| target_modules | all-linear |
| lora_dropout | 0 |
| bias | none |
## Usage
### Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load base model
base_model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
base_model_id,
torch_dtype="auto",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
# Load LoRA adapter
model = PeftModel.from_pretrained(model, "HowieHwong/ppopt")
```
### Inference Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load model
base_model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
base_model_id,
torch_dtype="auto",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, "HowieHwong/ppopt")
# Prepare input
conversation_history = """User: How do I center a div?
Assistant: You can use flexbox: display: flex; justify-content: center; align-items: center;
User: What about grid?
Assistant: With grid: display: grid; place-items: center;"""
current_query = "how to make it responsive"
prompt = f"""Based on the conversation history and user preferences, optimize the following query into a clearer, more specific prompt.
Conversation History:
{conversation_history}
Current Query: {current_query}
Optimized Prompt:"""
# Generate
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(
input_ids,
max_new_tokens=256,
temperature=0.7,
do_sample=True,
pad_token_id=tokenizer.eos_token_id
)
response = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
### Merge LoRA (Optional)
If you want to merge the adapter into the base model:
```python
# `model` is the PeftModel loaded above; the merged model no longer needs `peft` at inference
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged_ppopt_llama8b")
tokenizer.save_pretrained("merged_ppopt_llama8b")
```
## Intended Use
This model is designed for:
- Prompt optimization/rewriting systems
- Personalized query enhancement based on user history
- Research on prompt engineering automation
## License
This model is released under the Apache 2.0 license.