---
license: cc-by-nc-4.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- generated_from_trainer
- classification
- Transformer-heads
- finetune
- chatml
- gpt4
- synthetic data
- distillation
model-index:
- name: Mistral_classification_head_qlora
  results: []
datasets:
- dair-ai/emotion
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

# Mistral_classification_head_qlora

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e09e72e43b9464c835735f/qna1wMB7CLTe7lfpRy5x3.png)

**Mistral_classification_head_qlora** has a new transformer head attached for a sequence classification task, and the resulting model was fine-tuned on the [dair-ai/emotion](https://huggingface.co/datasets/dair-ai/emotion) dataset using QLoRA. The model was trained for 1 epoch on a single A40 GPU. The evaluation loss of the attached **emotion-head-3** was **1.313**. The base model used was:

* **[mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)**
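For context, the reported evaluation loss can be compared with a uniform-guess baseline. This assumes the 1.313 figure is a mean cross-entropy (natural log) over the six emotion classes of dair-ai/emotion; the card does not name the loss function, so treat the sketch below as illustrative arithmetic only.

```python
import math

# Assumption: 1.313 is a mean natural-log cross-entropy over 6 emotion classes.
NUM_CLASSES = 6
EVAL_LOSS = 1.313

# Loss of a model that always predicts the uniform distribution: ln(6).
uniform_loss = math.log(NUM_CLASSES)

# exp(-mean cross-entropy) is the geometric mean of the probability
# the model assigns to the true label.
geo_mean_true_prob = math.exp(-EVAL_LOSS)

print(f"uniform baseline loss: {uniform_loss:.3f}")
print(f"geometric-mean p(true label): {geo_mean_true_prob:.3f}")
```

Under that assumption the head is below the uniform baseline of about 1.792, and on a geometric-mean basis it gives the correct label a probability of roughly 0.27.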

This experiment was performed using the **[Transformer-heads library](https://github.com/center-for-humans-and-machines/transformer-heads/tree/main)**.
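Conceptually, a sequence-classification head on a causal LM maps a pooled hidden state to one logit per label. The sketch below illustrates that idea in plain Python with a toy hidden size and made-up weights. It is not the Transformer-heads API, and pooling the last non-padding token is an assumption about how the head is wired; the label names follow the order listed on the dair-ai/emotion dataset card.

```python
import math

# Label order of dair-ai/emotion as listed on its dataset card (ids 0-5).
LABELS = ["sadness", "joy", "love", "anger", "fear", "surprise"]


def last_token_state(hidden_states, attention_mask):
    """Hidden vector of the last real token; assumes right padding (mask = 1...1 0...0)."""
    last_idx = sum(attention_mask) - 1
    return hidden_states[last_idx]


def linear_head(x, weight, bias):
    """One logit per label: logits[j] = dot(weight[j], x) + bias[j]."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy final-layer hidden states: 3 real tokens + 1 padding token, hidden size 4
# (hypothetical values; Mistral-7B's real hidden size is 4096).
hidden_states = [
    [0.1, 0.0, 0.2, 0.3],
    [0.5, -0.1, 0.0, 0.2],
    [0.0, 1.0, 0.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],  # padding position, must be ignored
]
attention_mask = [1, 1, 1, 0]

# Hypothetical head weights: only the "joy" row responds to hidden dimension 1.
weight = [[0.0] * 4 for _ in LABELS]
weight[1][1] = 2.0
bias = [0.0] * len(LABELS)

pooled = last_token_state(hidden_states, attention_mask)
logits = linear_head(pooled, weight, bias)
probs = softmax(logits)
predicted = LABELS[max(range(len(probs)), key=probs.__getitem__)]
print(predicted, round(probs[1], 3))
```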

### Training Script

The training script for attaching a new transformer head for the classification task using QLoRA is available here:

[Training Script Colab](https://colab.research.google.com/drive/1rPaG-Q6d_CutPOlKzjsfmPvwebNg_X6i?usp=sharing)
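QLoRA keeps the base model's weights frozen (stored in 4-bit precision) and trains small low-rank adapters instead. The sketch below shows only the LoRA part of a layer's forward pass, y = W x + (alpha / r) · B A x, in plain Python. The 4-bit quantization is omitted, and the rank, alpha, and matrices are hypothetical values; the card does not list the adapter settings used for this model.

```python
def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical sizes: d_out = d_in = 3, rank r = 1.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # frozen base weight
A = [[0.5, 0.5, 0.0]]                                    # r x d_in
B_init = [[0.0], [0.0], [0.0]]                           # d_out x r, zero-initialised
x = [1.0, 2.0, 3.0]

# With B = 0 the adapter is a no-op, so training starts from the base model's behaviour.
y0 = lora_forward(W, A, B_init, x, alpha=16, r=1)

# After (hypothetical) training, B is non-zero and shifts the output.
B_trained = [[0.1], [0.0], [0.0]]
y1 = lora_forward(W, A, B_trained, x, alpha=16, r=1)
print(y0, y1)
```

Zero-initialising B is the usual LoRA choice because it leaves the pretrained model unchanged at step 0.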


### Evaluating the Emotion-Head-3

To evaluate the transformer head attached to the base model, refer to the following Colab notebook:

[Colab Notebook for Evaluation](https://colab.research.google.com/drive/15UpNnoKJIWjG3G_WJFOQebjpUWyNoPKT?usp=sharing)
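The notebook itself is linked rather than reproduced here. As a hedged illustration of how a classification head is commonly scored, the sketch below computes mean cross-entropy and accuracy from per-example logits and integer labels. The logits are made up, and the notebook may compute different or additional metrics.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one logit vector."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def evaluate(all_logits, labels):
    """Return (mean cross-entropy, accuracy) over a list of examples."""
    total_loss = 0.0
    correct = 0
    for logits, label in zip(all_logits, labels):
        total_loss += -log_softmax(logits)[label]
        pred = max(range(len(logits)), key=logits.__getitem__)
        correct += int(pred == label)
    n = len(labels)
    return total_loss / n, correct / n


# Hypothetical logits for 3 examples over the 6 emotion labels.
all_logits = [
    [2.0, 0.1, 0.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
]
labels = [0, 1, 4]  # the third example is misclassified (predicts 3, true label 4)
loss, acc = evaluate(all_logits, labels)
print(f"eval loss {loss:.3f}, accuracy {acc:.3f}")
```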


### Training hyperparameters

The following hyperparameters were used during training:

* `train_epochs = 1`
* `eval_epochs = 1`
* `logging_steps = 1`
* `train_batch_size = 4`
* `eval_batch_size = 4`

These variables are referenced by the following training arguments:

* `output_dir="emotion_linear_probe"`
* `learning_rate=0.00002`
* `num_train_epochs=train_epochs`
* `logging_steps=logging_steps`
* `do_eval=False`
* `remove_unused_columns=False`
* `optim="paged_adamw_32bit"`
* `gradient_checkpointing=True`
* `lr_scheduler_type="constant"`
* `ddp_find_unused_parameters=False`
* `per_device_train_batch_size=train_batch_size`
* `per_device_eval_batch_size=eval_batch_size`
* `report_to=["wandb"]`
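To make the schedule concrete, the sketch below works out the step count. It assumes the default train split of dair-ai/emotion (16,000 examples), a single GPU, and no gradient accumulation; none of these is stated above, so the accumulation value is taken to be the Transformers default of 1. The `constant` scheduler in Transformers applies neither warmup nor decay, so the learning rate stays at 2e-5 throughout.

```python
import math

# Assumptions not stated in the card: the default "split" train set
# (16,000 examples), a single GPU, and gradient_accumulation_steps = 1.
NUM_TRAIN_EXAMPLES = 16_000
GRAD_ACCUM_STEPS = 1

# Values from the hyperparameter list above.
train_epochs = 1
train_batch_size = 4
logging_steps = 1
learning_rate = 0.00002


def steps_per_epoch(n_examples, batch_size, grad_accum):
    """Optimizer steps per epoch on one device; a final partial batch still counts."""
    batches = math.ceil(n_examples / batch_size)
    return math.ceil(batches / grad_accum)


def constant_lr(step):
    """lr_scheduler_type="constant": the same learning rate at every step, no warmup."""
    return learning_rate


total_steps = train_epochs * steps_per_epoch(NUM_TRAIN_EXAMPLES, train_batch_size, GRAD_ACCUM_STEPS)
num_logged_steps = total_steps // logging_steps
print(total_steps, num_logged_steps, constant_lr(0), constant_lr(total_steps - 1))
```

Under these assumptions the run is 4,000 optimizer steps, and `logging_steps = 1` logs the training loss at every one of them.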



### Framework versions

- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu118
- Datasets 2.17.0
- Tokenizers 0.15.0
- Transformer-heads
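To compare a local environment against these versions, installed package versions can be read with the standard library's `importlib.metadata`. The distribution names below are assumptions (in particular `transformer-heads` for the Transformer-heads library), and a package that is not installed is reported as `None`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card; the distribution names are assumptions.
EXPECTED = {
    "transformers": "4.39.0.dev0",
    "torch": "2.1.2+cu118",
    "datasets": "2.17.0",
    "tokenizers": "0.15.0",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # "transformer-heads" is a hypothetical distribution name; no version is pinned above.
    found = installed_versions(list(EXPECTED) + ["transformer-heads"])
    for name, got in found.items():
        want = EXPECTED.get(name, "unpinned")
        print(f"{name}: installed={got}, card={want}")
```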