|
|
---
language:
- en
license: cc-by-nc-4.0
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- trl
base_model: alnrg2arg/blockchainlabs_7B_merged_test2_4
datasets:
- Open-Orca/SlimOrca
---
|
|
|
|
|
# Uploaded model |
|
|
|
|
|
- **Finetuned from model:** alnrg2arg/blockchainlabs_7B_merged_test2_4
|
|
|
|
|
This is an SFT (supervised fine-tuning) version of the blockchainlabs test 2.4 model, alnrg2arg/blockchainlabs_7B_merged_test2_4.
|
|
|
|
|
This project aims to build a small LLM for on-device use.
|
|
|
|
|
The overall pipeline for this iteration is:

1. Merge models to build a 7B base model.
2. Prune the merged model to reduce its parameter count (50% sparsity; a minimal pruning sketch follows this list).
3. For the recovery phase after pruning, DPO is chosen.
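
The card does not state which pruning method the project uses. For illustration only, here is a minimal sketch of one common way to reach 50% unstructured sparsity, magnitude pruning with PyTorch's built-in utilities; everything except the model name is an assumption.

```
# Illustrative only: 50% unstructured magnitude pruning per linear layer.
# The project's actual pruning method is not specified in this card.
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "alnrg2arg/blockchainlabs_7B_merged_test2_4"
)

for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        # Zero out the 50% of weights with the smallest magnitude.
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the sparsity permanent
```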
|
|
|
|
|
This model is not pruned; it is intended as a baseline for comparison against the pruned model.
|
|
|
|
|
DPO training consists of two stages, SFT followed by DPO; this model is the intermediate (SFT) stage. It can therefore also be compared against the DPO version of the model (a rough sketch of the DPO stage appears after the SFT code below).
|
|
|
|
|
|
|
|
This is the code and the parameters I chose for this model (SFT).
|
|
|
|
|
```
from transformers import TrainingArguments
from trl import SFTTrainer
from datasets import load_dataset
from unsloth import FastLanguageModel, FastMistralModel

max_seq_length = 2048  # Supports automatic RoPE scaling, so choose any number

# Load the merged base model in 4-bit with unsloth
model, tokenizer = FastMistralModel.from_pretrained(
    model_name = "alnrg2arg/blockchainlabs_7B_merged_test2_4",
    max_seq_length = max_seq_length,
    dtype = None,  # None for auto detection; float16 for Tesla T4/V100, bfloat16 for Ampere+
    load_in_4bit = True,  # Use 4-bit quantization to reduce memory usage. Can be False
    # device_map = "balanced",
    # token = "hf_...",  # use one if loading gated models like meta-llama/Llama-2-7b-hf
)

# Attach LoRA adapters so only a small fraction of the weights is trained
model = FastMistralModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,  # Dropout = 0 is currently optimized
    bias = "none",  # Bias = "none" is currently optimized
    use_gradient_checkpointing = True,
    random_state = 3407,
    max_seq_length = max_seq_length,
)
```
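
The snippet above stops after attaching the LoRA adapters; the training call itself is not shown. Below is a minimal sketch of how the SFT step could proceed, reusing the imports and objects above together with the Open-Orca/SlimOrca dataset listed in the card metadata. The formatting function and hyperparameters are illustrative assumptions, not the author's actual settings.

```
# Illustrative SFT step (assumed hyperparameters, not the author's exact ones).
dataset = load_dataset("Open-Orca/SlimOrca", split = "train")

def formatting_func(batch):
    # SlimOrca stores ShareGPT-style "conversations"; flatten each to one string.
    texts = []
    for conv in batch["conversations"]:
        texts.append("\n".join(f"{t['from']}: {t['value']}" for t in conv))
    return texts

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    formatting_func = formatting_func,
    max_seq_length = max_seq_length,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        learning_rate = 2e-4,
        num_train_epochs = 1,
        output_dir = "outputs",
    ),
)
trainer.train()
```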
|
|
|
|
|
The code and parameters are borrowed from https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing |
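
The DPO stage itself is not shown in this card. For orientation only, here is a minimal sketch of what that stage could look like with trl's DPOTrainer, continuing from the SFT model above; the preference dataset, beta, and hyperparameters are placeholders and assumptions, not the project's actual choices.

```
# Illustrative DPO stage (assumed dataset and hyperparameters).
from trl import DPOTrainer

# Placeholder: any preference dataset with "prompt", "chosen", "rejected" columns.
preference_dataset = load_dataset("<a preference dataset>", split = "train")

dpo_trainer = DPOTrainer(
    model = model,                       # the SFT model from above
    ref_model = None,                    # with a LoRA/PEFT model, trl derives the reference model
    beta = 0.1,                          # strength of the KL penalty toward the reference
    train_dataset = preference_dataset,
    tokenizer = tokenizer,
    max_length = max_seq_length,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        learning_rate = 5e-6,
        output_dir = "outputs_dpo",
    ),
)
dpo_trainer.train()
```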
|
|
|
|
|
|
|
|
## Benchmark scores
|
|
|
|
|
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr| |
|
|
|-------------|------:|------|-----:|--------|-----:|---|-----:| |
|
|
|arc_challenge| 1|none | 25|acc |0.7116|± |0.0132| |
|
|
| | |none | 25|acc_norm|0.7346|± |0.0129| |
|
|
|
|
|
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr| |
|
|
|---------|------:|------|-----:|--------|-----:|---|-----:| |
|
|
|hellaswag| 1|none | 10|acc |0.7222|± |0.0045| |
|
|
| | |none | 10|acc_norm|0.8865|± |0.0032| |
|
|
|
|
|
| Tasks |Version|Filter|n-shot|Metric|Value | |Stderr| |
|
|
|--------------|------:|------|-----:|------|-----:|---|-----:| |
|
|
|truthfulqa_mc2| 2|none | 0|acc |0.7043|± | 0.015| |
|
|
|
|
|
| Groups |Version|Filter|n-shot|Metric|Value | |Stderr| |
|
|
|------------------|-------|------|-----:|------|-----:|---|-----:| |
|
|
|mmlu |N/A |none | 0|acc |0.6367|± |0.1258| |
|
|
| - humanities |N/A |none | 5|acc |0.5968|± |0.1122| |
|
|
| - other |N/A |none | 5|acc |0.7049|± |0.1123| |
|
|
| - social_sciences|N/A |none | 5|acc |0.7374|± |0.0774| |
|
|
| - stem |N/A |none | 5|acc |0.5309|± |0.1373| |
|
|
|
|
|
| Tasks |Version|Filter|n-shot|Metric|Value | |Stderr| |
|
|
|----------|------:|------|-----:|------|-----:|---|-----:| |
|
|
|winogrande| 1|none | 5|acc |0.8477|± |0.0101| |
|
|
|
|
|
|Tasks|Version| Filter |n-shot| Metric |Value | |Stderr| |
|
|
|-----|------:|----------|-----:|-----------|-----:|---|-----:| |
|
|
|gsm8k| 2|get-answer| 5|exact_match|0.7468|± | 0.012| |
|
|
|
|
|
|
|
|
|
|
|
Average: 75.94 (the mean of the six headline scores above: ARC acc_norm 73.46, HellaSwag acc_norm 88.65, TruthfulQA mc2 70.43, MMLU 63.67, Winogrande 84.77, and GSM8K 74.68).
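
The tables above follow the output format of EleutherAI's lm-evaluation-harness. A sketch of how comparable numbers could be reproduced; the task names and shot counts come from the tables, while the repo id, dtype, and batch size are assumptions.

```
# Sketch: reproduce the scores above with lm-evaluation-harness (v0.4-style API).
import lm_eval

MODEL_ID = "your-repo/your-model"  # placeholder: the repo id of this uploaded model

results = lm_eval.simple_evaluate(
    model = "hf",
    model_args = f"pretrained={MODEL_ID},dtype=bfloat16",
    tasks = ["arc_challenge"],  # likewise: hellaswag, truthfulqa_mc2, mmlu, winogrande, gsm8k
    num_fewshot = 25,           # 25 for ARC; the other tables use 10, 0, 5, 5, and 5 shots
    batch_size = 8,
)
print(results["results"])
```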