---
library_name: peft
license: apache-2.0
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
- llama
- lora
- prompt-optimization
- sft
- grpo
- rlhf
language:
- en
pipeline_tag: text-generation
---

# PPOpt-Llama-3.1-8B-Instruct-LoRA

A LoRA adapter for **Llama-3.1-8B-Instruct**, fine-tuned for the **prompt optimization** task.

## Model Description

This model is trained to optimize user prompts based on their interaction history and preferences. Given a user's conversation history and current query, it rewrites the query into a clearer, more specific, and better-structured prompt.

### Training Pipeline

- **Stage 1: SFT (Supervised Fine-Tuning)** - Trained on curated prompt optimization examples
- **Stage 2: GRPO (Group Relative Policy Optimization)** - Reinforcement learning with GPT-4o-mini as judge

### LoRA Configuration

| Parameter      | Value      |
|----------------|------------|
| r (rank)       | 32         |
| lora_alpha     | 32         |
| target_modules | all-linear |
| lora_dropout   | 0          |
| bias           | none       |

## Usage

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model
base_model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load the LoRA adapter
model = PeftModel.from_pretrained(model, "HowieHwong/ppopt")
```

### Inference Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the model
base_model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, "HowieHwong/ppopt")

# Prepare the input
conversation_history = """User: How do I center a div?
Assistant: You can use flexbox: display: flex; justify-content: center; align-items: center;
User: What about grid?
Assistant: With grid: display: grid; place-items: center;"""

current_query = "how to make it responsive"

prompt = f"""Based on the conversation history and user preferences, optimize the following query into a clearer, more specific prompt.

Conversation History:
{conversation_history}

Current Query: {current_query}

Optimized Prompt:"""

# Generate
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn header so the model responds
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

### Merge LoRA (Optional)

If you want to merge the adapter into the base model:

```python
# Fold the LoRA weights into the base model and drop the adapter wrappers
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged_ppopt_llama8b")
tokenizer.save_pretrained("merged_ppopt_llama8b")
```

## Intended Use

This model is designed for:

- Prompt optimization/rewriting systems
- Personalized query enhancement based on user history
- Research on prompt engineering automation

## License

This model is released under the Apache 2.0 license.
|