alnrg2arg/test3_sft_16bit

@@ -17,6 +17,60 @@ datasets:
 - **Finetuned from model :** alnrg2arg/blockchainlabs_7B_merged_test2_4
 Benchmark scores
 |    Tasks    |Version|Filter|n-shot| Metric |Value |   |Stderr|
@@ -49,51 +103,6 @@ Benchmark scores
 |-----|------:|----------|-----:|-----------|-----:|---|-----:|
 |gsm8k|      2|get-answer|     5|exact_match|0.7468|±  | 0.012|
-|      Tasks      |Version|Filter|n-shot|  Metric   | Value |   |Stderr|
-|-----------------|-------|------|-----:|-----------|------:|---|-----:|
-|truthfulqa       |N/A    |none  |     0|bleu_max   |16.3339|±  |0.3451|
-|                 |       |none  |     0|bleu_acc   | 0.4982|±  |0.0003|
-|                 |       |none  |     0|bleu_diff  | 1.2909|±  |0.1919|
-|                 |       |none  |     0|rouge1_max |41.6927|±  |0.5469|
-|                 |       |none  |     0|rouge1_acc | 0.5300|±  |0.0003|
-|                 |       |none  |     0|rouge1_diff| 1.4267|±  |0.3796|
-|                 |       |none  |     0|rouge2_max |27.3013|±  |0.6213|
-|                 |       |none  |     0|rouge2_acc | 0.4272|±  |0.0003|
-|                 |       |none  |     0|rouge2_diff| 1.5314|±  |0.4765|
-|                 |       |none  |     0|rougeL_max |37.8174|±  |0.5443|
-|                 |       |none  |     0|rougeL_acc | 0.4859|±  |0.0003|
-|                 |       |none  |     0|rougeL_diff| 1.2621|±  |0.3898|
-|                 |       |none  |     0|acc        | 0.6613|±  |0.0435|
-| - truthfulqa_gen|      3|none  |     0|bleu_max   |16.3339|±  |0.5874|
-|                 |       |none  |     0|bleu_acc   | 0.4982|±  |0.0175|
-|                 |       |none  |     0|bleu_diff  | 1.2909|±  |0.4381|
-|                 |       |none  |     0|rouge1_max |41.6927|±  |0.7396|
-|                 |       |none  |     0|rouge1_acc | 0.5300|±  |0.0175|
-|                 |       |none  |     0|rouge1_diff| 1.4267|±  |0.6161|
-|                 |       |none  |     0|rouge2_max |27.3013|±  |0.7882|
-|                 |       |none  |     0|rouge2_acc | 0.4272|±  |0.0173|
-|                 |       |none  |     0|rouge2_diff| 1.5314|±  |0.6903|
-|                 |       |none  |     0|rougeL_max |37.8174|±  |0.7378|
-|                 |       |none  |     0|rougeL_acc | 0.4859|±  |0.0175|
-|                 |       |none  |     0|rougeL_diff| 1.2621|±  |0.6243|
-| - truthfulqa_mc1|      2|none  |     0|acc        | 0.5753|±  |0.0173|
-| - truthfulqa_mc2|      2|none  |     0|acc        | 0.7043|±  |0.0150|
-|  Groups  |Version|Filter|n-shot|  Metric   | Value |   |Stderr|
-|----------|-------|------|-----:|-----------|------:|---|-----:|
-|truthfulqa|N/A    |none  |     0|bleu_max   |16.3339|±  |0.3451|
-|          |       |none  |     0|bleu_acc   | 0.4982|±  |0.0003|
-|          |       |none  |     0|bleu_diff  | 1.2909|±  |0.1919|
-|          |       |none  |     0|rouge1_max |41.6927|±  |0.5469|
-|          |       |none  |     0|rouge1_acc | 0.5300|±  |0.0003|
-|          |       |none  |     0|rouge1_diff| 1.4267|±  |0.3796|
-|          |       |none  |     0|rouge2_max |27.3013|±  |0.6213|
-|          |       |none  |     0|rouge2_acc | 0.4272|±  |0.0003|
-|          |       |none  |     0|rouge2_diff| 1.5314|±  |0.4765|
-|          |       |none  |     0|rougeL_max |37.8174|±  |0.5443|
-|          |       |none  |     0|rougeL_acc | 0.4859|±  |0.0003|
-|          |       |none  |     0|rougeL_diff| 1.2621|±  |0.3898|
-|          |       |none  |     0|acc        | 0.6613|±  |0.0435|
 Average 75.94

 - **Finetuned from model :** alnrg2arg/blockchainlabs_7B_merged_test2_4
+This is the SFT version of the blockchainlab test 2.4 model, alnrg2arg/blockchainlabs_7B_merged_test2_4.
+The project aims to build a small LLM for on-device use.
+The overall pipeline for this iteration is:
+1. Merge models to make a base model (7B).
+2. Prune the model to reduce the parameter count (50% sparsity).
+3. Recover from the pruning with DPO.
+This model is not pruned; it is intended as a baseline for comparison with the pruned model.
+DPO training consists of two stages, SFT and then DPO. This model is the intermediate stage (SFT).
+It can also be compared with the DPO version of the model.
+Below are the code and parameters I chose for this model (SFT).
+```
+from transformers import TrainingArguments
+from trl import SFTTrainer
+from datasets import load_dataset
+from unsloth import FastLanguageModel, FastMistralModel
+max_seq_length = 2048 # Supports automatic RoPE Scaling, so choose any number
+# Load model
+model, tokenizer = FastMistralModel.from_pretrained(
+    model_name = "alnrg2arg/blockchainlabs_7B_merged_test2_4",
+    max_seq_length = max_seq_length,
+    dtype = None, # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
+    load_in_4bit = True, # Use 4bit quantization to reduce memory usage. Can be False
+    #device_map = "balanced"
+    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
+)
+model = FastMistralModel.get_peft_model(
+    model,
+    r = 16,
+    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
+                      "gate_proj", "up_proj", "down_proj",],
+    lora_alpha = 16,
+    lora_dropout = 0, # Dropout = 0 is currently optimized
+    bias = "none",    # Bias = "none" is currently optimized
+    use_gradient_checkpointing = True,
+    random_state = 3407,
+    max_seq_length = max_seq_length,
+)
+```
+The code and parameters are borrowed from https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing
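As a rough sanity check on the LoRA settings above, the sketch below estimates how many trainable parameters `r = 16` adds across the seven target modules. It assumes the merged base model keeps the standard Mistral-7B layer shapes (hidden size 4096, MLP size 14336, 8 KV heads of dimension 128, 32 layers); none of these dimensions are stated in this card, so treat the result as an estimate.

```python
# Rough LoRA parameter count for the get_peft_model settings above.
# ASSUMPTION: standard Mistral-7B layer shapes; the card does not state them.
HIDDEN = 4096         # hidden size (assumed)
INTERMEDIATE = 14336  # MLP intermediate size (assumed)
KV_DIM = 8 * 128      # 8 KV heads x head dim 128 (assumed)
NUM_LAYERS = 32       # decoder layers (assumed)

R = 16           # r from the config above
LORA_ALPHA = 16  # lora_alpha from the config above

# (in_features, out_features) of each target module's weight matrix
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r):
    """LoRA adds A (r x in_features) and B (out_features x r) per module."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in SHAPES.values())
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # standard LoRA scaling factor alpha / r

print(f"per layer: {per_layer:,}")  # per layer: 1,310,720
print(f"total:     {total:,}")      # total:     41,943,040
print(f"scaling:   {scaling}")      # scaling:   1.0
```

Under these assumptions the adapters add about 42M trainable parameters, well under 1% of a 7B model, and `lora_alpha = r` leaves the adapter update unscaled.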
 Benchmark scores
 |    Tasks    |Version|Filter|n-shot| Metric |Value |   |Stderr|
 |-----|------:|----------|-----:|-----------|-----:|---|-----:|
 |gsm8k|      2|get-answer|     5|exact_match|0.7468|±  | 0.012|
 Average 75.94