HowieHwong committed on
Commit
8b6e796
·
verified ·
1 Parent(s): 892d21a

Upload PPOpt LoRA adapter (Llama-3.1-8B-Instruct)

Files changed (3)
  1. README.md +81 -0
  2. adapter_config.json +31 -0
  3. adapter_model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,81 @@
---
library_name: peft
license: apache-2.0
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
- llama
- lora
- prompt-optimization
- sft
- grpo
- rlhf
language:
- en
pipeline_tag: text-generation
---

# PPOpt-Llama-3.1-8B-Instruct-LoRA

A LoRA adapter for **Llama-3.1-8B-Instruct**, fine-tuned for the **prompt optimization** task.

## Model Description

This model is trained to optimize user prompts based on their interaction history and preferences. Given a user's conversation history and current query, it rewrites the query into a clearer, more specific, and better-structured prompt.

### Training Pipeline

- **Stage 1: SFT (Supervised Fine-Tuning)** - trained on curated prompt-optimization examples
- **Stage 2: GRPO (Group Relative Policy Optimization)** - reinforcement learning with GPT-4o-mini as the judge

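The core idea behind GRPO is that rewards for a group of completions sampled from the same query are normalized against the group's own mean and standard deviation, so no separate value network is needed. The snippet below is an illustrative sketch of that group-relative advantage computation, not the actual training code used for this adapter:

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize each reward against its group's mean and std (GRPO-style).

    `rewards` are scalar judge scores for completions sampled from one query.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Hypothetical judge scores for 4 sampled rewrites of the same query
advantages = group_relative_advantages([0.2, 0.5, 0.9, 0.4])
```

The highest-scoring rewrite gets the largest positive advantage, and the advantages within a group sum to zero by construction.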
### LoRA Configuration

| Parameter | Value |
|-----------|-------|
| r (rank) | 32 |
| lora_alpha | 32 |
| target_modules | all-linear |
| lora_dropout | 0 |
| bias | none |

## Usage

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model
base_model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "YOUR_USERNAME/ppopt-llama-3.1-8b-lora")
```
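The exact instruction format this adapter was trained with is not documented here, so the following is a hypothetical sketch of how a conversation history and current query might be packed into chat messages before calling `tokenizer.apply_chat_template` and `model.generate`. The system prompt and helper function are illustrative assumptions, not the training template:

```python
def build_optimization_messages(history, query):
    """Pack user history and the current query into chat messages.

    `history` is a list of (user, assistant) turn pairs. The system prompt
    below is a guess at a reasonable instruction, not the adapter's
    actual training format.
    """
    context = "\n".join(f"User: {u}\nAssistant: {a}" for u, a in history)
    return [
        {
            "role": "system",
            "content": "Rewrite the user's query into a clearer, more specific prompt.",
        },
        {
            "role": "user",
            "content": f"Conversation history:\n{context}\n\nCurrent query: {query}",
        },
    ]

messages = build_optimization_messages(
    [("How do I sort a list?", "Use sorted() or list.sort().")],
    "make it faster",
)
# messages can then be fed to tokenizer.apply_chat_template(...) and model.generate(...)
```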

### Merge LoRA (Optional)

If you want to merge the adapter into the base model and save a standalone checkpoint:

```python
# Fold the LoRA weights into the base model's weights, then save
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged_ppopt_llama8b")
tokenizer.save_pretrained("merged_ppopt_llama8b")
```

## Intended Use

This model is designed for:
- Prompt optimization and rewriting systems
- Personalized query enhancement based on user history
- Research on automated prompt engineering

## License

This model is released under the Apache 2.0 license.
adapter_config.json ADDED
@@ -0,0 +1,31 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "meta-llama/Llama-3.1-8B-Instruct",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": false,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_bias": false,
  "lora_dropout": 0,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 32,
  "rank_pattern": {},
  "revision": null,
  "target_modules": "all-linear",
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_rslora": false
}
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ce96b4ccec049b92522221ef19e5ff865759f56ef40383e96a081ea5d97ba112
size 352546008