---
language:
- en
license: cc-by-nc-4.0
tags:
- text-generation-inference
- transformers
- mistral
- trl
base_model: alnrg2arg/blockchainlabs_7B_merged_test2_4
datasets:
- Intel/orca_dpo_pairs
---

This is a model from blockchainlabs test 2.4 (alnrg2arg/blockchainlabs_7B_merged_test2_4).

This project aims to build a small LLM for on-device use.

The overall pipeline for this iteration is:

1. Merge models to produce a 7B base model.
2. Prune the model to reduce its parameter count (50% sparsity).
3. Use DPO for the recovery phase after pruning.
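Step 2 of the pipeline (pruning to 50% sparsity) is not shown in this card. As a rough sketch of what unstructured 50% magnitude pruning means — not necessarily the pruning method used in this project — the smallest-magnitude half of the weights is zeroed:

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero the smallest-magnitude fraction of weights (unstructured pruning).

    Illustrative only: real pruning operates on per-layer weight tensors
    and may use a calibration-aware criterion. Ties at the threshold are
    also pruned.
    """
    flat = sorted(abs(w) for w in weights)
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = flat[k - 1]  # magnitude of the k-th smallest weight
    return [w if abs(w) > threshold else 0.0 for w in weights]
```

For example, pruning `[1.0, -2.0, 3.0, -4.0]` at 50% sparsity zeroes the two smallest-magnitude entries, leaving `[0.0, 0.0, 3.0, -4.0]`.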

This model, which is not pruned, is intended as the comparison baseline for the pruned model.

This is the code and the parameters I chose for this model (DPO):
```python
import torch
from transformers import TrainingArguments, AutoModelForCausalLM
from trl import DPOTrainer

dpo_trainer = DPOTrainer(
    model = model,                 # policy model being finetuned
    ref_model = None,              # None: DPOTrainer builds the frozen reference model itself
    args = TrainingArguments(
        per_device_train_batch_size = 8,
        gradient_accumulation_steps = 8,   # effective batch size of 64 per device
        warmup_ratio = 0.1,
        num_train_epochs = 3,
        learning_rate = 5e-6,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.0,
        lr_scheduler_type = "linear",
        seed = 42,
        output_dir = "output_DPO",
    ),
    beta = 0.1,                    # scales the preference term in the DPO loss
    train_dataset = dataset,
    # eval_dataset = raw_datasets["test"],
    tokenizer = tokenizer,
    max_length = 1024,             # max prompt + completion length in tokens
    max_prompt_length = 512,       # max prompt length in tokens
)
```
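For context on `beta = 0.1`: DPO optimizes a logistic loss on the difference between the policy-vs-reference log-probability ratios of the chosen and rejected completions, with beta scaling that difference. A minimal per-pair sketch in pure Python (sequence log-probabilities assumed precomputed):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # -log sigmoid(beta * ((log pi - log pi_ref)_chosen
    #                      - (log pi - log pi_ref)_rejected))
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy and reference agree exactly, the loss is log 2; it decreases as the policy prefers the chosen completion more strongly than the reference does, and a small beta (here 0.1) keeps that pressure gentle.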

The code and parameters are borrowed from https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing
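The card lists Intel/orca_dpo_pairs as the dataset but does not show how `dataset` was prepared. A sketch of mapping that dataset's fields (`system`, `question`, `chosen`, `rejected`) into the `prompt`/`chosen`/`rejected` columns DPOTrainer expects — the exact prompt template used for this model is an assumption:

```python
def to_dpo_format(example):
    # Intel/orca_dpo_pairs rows carry: system, question, chosen, rejected.
    # DPOTrainer expects columns: prompt, chosen, rejected.
    # Joining system + question with a newline is a hypothetical template,
    # not necessarily the one used for this model.
    return {
        "prompt": example["system"] + "\n" + example["question"],
        "chosen": example["chosen"],
        "rejected": example["rejected"],
    }
```

This would typically be applied with `load_dataset("Intel/orca_dpo_pairs", split="train").map(to_dpo_format)` before constructing the trainer.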