---
library_name: peft
license: apache-2.0
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
- llama
- lora
- prompt-optimization
- sft
- grpo
- rlhf
language:
- en
pipeline_tag: text-generation
---

# PPOpt-Llama-3.1-8B-Instruct-LoRA

A LoRA adapter for **Llama-3.1-8B-Instruct**, fine-tuned for the **prompt optimization** task.

## Model Description

This model is trained to optimize user prompts based on their interaction history and preferences. Given a user's conversation history and current query, it rewrites the query into a clearer, more specific, and better-structured prompt.

### Training Pipeline

- **Stage 1: SFT (Supervised Fine-Tuning)** - Trained on curated prompt optimization examples
- **Stage 2: GRPO (Group Relative Policy Optimization)** - Reinforcement learning with GPT-4o-mini as judge

### LoRA Configuration

| Parameter | Value |
|-----------|-------|
| r (rank) | 32 |
| lora_alpha | 32 |
| target_modules | all-linear |
| lora_dropout | 0 |
| bias | none |

## Usage

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model
base_model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load the LoRA adapter
model = PeftModel.from_pretrained(model, "HowieHwong/ppopt")
```

### Inference Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach the adapter
base_model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, "HowieHwong/ppopt")

# Prepare the input: prior turns plus the query to rewrite
conversation_history = """User: How do I center a div?
Assistant: You can use flexbox: display: flex; justify-content: center; align-items: center;
User: What about grid?
Assistant: With grid: display: grid; place-items: center;"""

current_query = "how to make it responsive"

prompt = f"""Based on the conversation history and user preferences, optimize the following query into a clearer, more specific prompt.

Conversation History:
{conversation_history}

Current Query: {current_query}

Optimized Prompt:"""

# Generate the optimized prompt
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

### Merge LoRA (Optional)

If you want to merge the adapter into the base model:

```python
# Merge the adapter weights into the base model and save the result
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged_ppopt_llama8b")
tokenizer.save_pretrained("merged_ppopt_llama8b")
```

## Intended Use

This model is designed for:

- Prompt optimization/rewriting systems
- Personalized query enhancement based on user history
- Research on prompt engineering automation

## License

This model is released under the Apache 2.0 license.