---
library_name: peft
license: apache-2.0
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
- llama
- lora
- prompt-optimization
- sft
- grpo
- rlhf
language:
- en
pipeline_tag: text-generation
---

# PPOpt-Llama-3.1-8B-Instruct-LoRA

A LoRA adapter for **Llama-3.1-8B-Instruct**, fine-tuned for the **prompt optimization** task.

## Model Description

This model is trained to optimize user prompts based on their interaction history and preferences. Given a user's conversation history and current query, it rewrites the query into a clearer, more specific, and better-structured prompt.

### Training Pipeline

- **Stage 1: SFT (Supervised Fine-Tuning)** - Trained on curated prompt optimization examples
- **Stage 2: GRPO (Group Relative Policy Optimization)** - Reinforcement learning with GPT-4o-mini as judge

### LoRA Configuration

| Parameter      | Value      |
|----------------|------------|
| r (rank)       | 32         |
| lora_alpha     | 32         |
| target_modules | all-linear |
| lora_dropout   | 0          |
| bias           | none       |

## Usage

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model
base_model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load the LoRA adapter
model = PeftModel.from_pretrained(model, "HowieHwong/ppopt")
```

### Inference Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the model
base_model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, "HowieHwong/ppopt")

# Prepare the input
conversation_history = """User: How do I center a div?
Assistant: You can use flexbox: display: flex; justify-content: center; align-items: center;
User: What about grid?
Assistant: With grid: display: grid; place-items: center;"""

current_query = "how to make it responsive"

prompt = f"""Based on the conversation history and user preferences, optimize the following query into a clearer, more specific prompt.

Conversation History:
{conversation_history}

Current Query: {current_query}

Optimized Prompt:"""

# Generate
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn header so the model responds
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

### Merge LoRA (Optional)

If you want to merge the adapter into the base model:

```python
# Fold the LoRA weights into the base model and drop the adapter wrappers
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged_ppopt_llama8b")
tokenizer.save_pretrained("merged_ppopt_llama8b")
```

## Intended Use

This model is designed for:

- Prompt optimization/rewriting systems
- Personalized query enhancement based on user history
- Research on prompt engineering automation

## License

This model is released under the Apache 2.0 license.
|