Upload PPOpt LoRA adapter (Llama-3.1-8B-Instruct)

Browse files

Files changed (3) hide show

README.md +81 -0
adapter_config.json +31 -0
adapter_model.safetensors +3 -0

README.md ADDED Viewed

	@@ -0,0 +1,81 @@

+---
+library_name: peft
+license: apache-2.0
+base_model: meta-llama/Llama-3.1-8B-Instruct
+tags:
+  - llama
+  - lora
+  - prompt-optimization
+  - sft
+  - grpo
+  - rlhf
+language:
+  - en
+pipeline_tag: text-generation
+---
+# PPOpt-Llama-3.1-8B-Instruct-LoRA
+A LoRA adapter for **Llama-3.1-8B-Instruct** fine-tuned for **Prompt Optimization** task.
+## Model Description
+This model is trained to optimize user prompts based on their interaction history and preferences. Given a user's conversation history and current query, it rewrites the query into a clearer, more specific, and better-structured prompt.
+### Training Pipeline
+- **Stage 1: SFT (Supervised Fine-Tuning)** - Trained on curated prompt optimization examples
+- **Stage 2: GRPO (Group Relative Policy Optimization)** - Reinforcement learning with GPT-4o-mini as judge
+### LoRA Configuration
+| Parameter | Value |
+|-----------|-------|
+| r (rank) | 32 |
+| lora_alpha | 32 |
+| target_modules | all-linear |
+| lora_dropout | 0 |
+| bias | none |
+## Usage
+### Quick Start
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from peft import PeftModel
+# Load base model
+base_model_id = "meta-llama/Llama-3.1-8B-Instruct"
+model = AutoModelForCausalLM.from_pretrained(
+    base_model_id,
+    torch_dtype="auto",
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained(base_model_id)
+# Load LoRA adapter
+model = PeftModel.from_pretrained(model, "YOUR_USERNAME/ppopt-llama-3.1-8b-lora")
+```
+### Merge LoRA (Optional)
+If you want to merge the adapter into the base model:
+```python
+merged_model = model.merge_and_unload()
+merged_model.save_pretrained("merged_ppopt_llama8b")
+tokenizer.save_pretrained("merged_ppopt_llama8b")
+```
+## Intended Use
+This model is designed for:
+- Prompt optimization/rewriting systems
+- Personalized query enhancement based on user history
+- Research on prompt engineering automation
+## License
+This model is released under the Apache 2.0 license.

adapter_config.json ADDED Viewed

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "meta-llama/Llama-3.1-8B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": false,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": "all-linear",
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_rslora": false
+}

adapter_model.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ce96b4ccec049b92522221ef19e5ff865759f56ef40383e96a081ea5d97ba112
+size 352546008