---
language: fr
license: apache-2.0
datasets:
- CATIE-AQ/DFP
library_name: peft
co2_eq_emissions: 110
base_model:
- mistralai/Mistral-7B-v0.1
pipeline_tag: text-generation
---

# Adapter for Mistral-7B-v0.1 fine-tuned on DFP

## Adapter Description

This adapter was created with the [PEFT](https://github.com/huggingface/peft) library.
It fine-tunes the base model [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on 1,280,000 random rows of the [Dataset of French Prompts (DFP)](https://huggingface.co/datasets/CATIE-AQ/DFP) using the [LoRA](https://arxiv.org/abs/2106.09685) method.
We trained 21,260,288 parameters out of 7,262,992,384, i.e. 0.29%.
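
The 21,260,288 figure can be reproduced from the LoRA configuration listed in the training procedure (rank 8 on the attention projections, the MLP projections and `lm_head`). The sketch below is a hand count, not output from PEFT; the layer dimensions are assumptions taken from the published Mistral-7B-v0.1 configuration rather than from this card.

```py
# Hand count of LoRA parameters: each adapted linear layer (d_in -> d_out)
# gains two low-rank matrices, A (r x d_in) and B (d_out x r).
# The dimensions below are assumed from the Mistral-7B-v0.1 config.
R = 8                 # LoRA rank, from the training script
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024         # 8 key/value heads x head size 128
VOCAB = 32000
NUM_LAYERS = 32


def lora_params(d_in, d_out, r=R):
    """Number of parameters added by one LoRA pair on a d_in -> d_out layer."""
    return r * (d_in + d_out)


per_layer = {
    "q_proj": lora_params(HIDDEN, HIDDEN),
    "k_proj": lora_params(HIDDEN, KV_DIM),
    "v_proj": lora_params(HIDDEN, KV_DIM),
    "o_proj": lora_params(HIDDEN, HIDDEN),
    "gate_proj": lora_params(HIDDEN, INTERMEDIATE),
    "up_proj": lora_params(HIDDEN, INTERMEDIATE),
    "down_proj": lora_params(INTERMEDIATE, HIDDEN),
}

# lm_head is adapted once, outside the transformer layers
total = NUM_LAYERS * sum(per_layer.values()) + lora_params(HIDDEN, VOCAB)
print(f"{total:,}")  # 21,260,288
```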

## Usage

### Code

```py
import torch

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the model and its inputs must live on the same device

# Load the base model, then attach the LoRA adapter weights on top of it
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(model, "CATIE-AQ/mistral7B-FR-InstructNLP-LoRA")
model = model.to(device)

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no dedicated pad token

prompt = '''Prenez l'énoncé suivant comme vrai : "Euh, non, pour être honnête, je n'ai jamais lu aucun des livres que j'étais supposé lire."\n Alors l'énoncé suivant : "Je n'ai pas lu beaucoup de livres." est "vrai", "faux", ou "incertain" ?'''
model_input = tokenizer(prompt, return_tensors="pt").to(device)

model.eval()
with torch.no_grad():
    output = model.generate(**model_input, max_new_tokens=100, pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```
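
For a decoder-only model, `generate` returns the prompt tokens followed by the continuation, so the decoded string starts with the prompt itself. To keep only the answer, the prompt tokens can be sliced off before decoding. A minimal sketch with a hypothetical helper `continuation_ids` (not part of PEFT or transformers), shown on plain lists of made-up token ids:

```py
def continuation_ids(output_ids, prompt_length):
    """Return only the tokens generated after the prompt."""
    return output_ids[prompt_length:]


# Toy ids for illustration only (not real tokenizer output)
prompt_ids = [1, 415, 2936]
generated = prompt_ids + [363, 1902, 2]

answer_ids = continuation_ids(generated, len(prompt_ids))
print(answer_ids)  # [363, 1902, 2]
```

With real tensors the same slice applies: take `model_input["input_ids"].shape[1]` as the prompt length and slice the first row returned by `model.generate` before passing it to `tokenizer.decode`.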

### Examples

Some examples from the test split of [Dataset of French Prompts (DFP)](https://huggingface.co/datasets/CATIE-AQ/DFP):  

**Input**:
```
Prenez l'énoncé suivant comme vrai : "Euh, non, pour être honnête, je n'ai jamais lu aucun des livres que j'étais supposé lire."\n Alors l'énoncé suivant : "Je n'ai pas lu beaucoup de livres." est "vrai", "faux", ou "incertain" ?
```
**Output**:
```
vrai
```


**Input**:
```
Commentaire du produit : "Voilà un film excellement tourné, scénarisé et, surtout, joué –il nous importe tellement qu'un film soit bien joué, indépendamment du personnage, du côté de la morale où il est, voire de l'histoire ! Excellement joué y compris par les acteurs secondaires, comme Dean Norris (Under The Dome, Breaking Bad) ou Vincent d\'Onofrio (New York Section Criminelle). C'est d'ailleurs amusant de remarquer, parmi ces acteurs secondaires, le patriarche de la famille de policiers new-yorkais de la série télé « Blue Bloods » (Len Cariou) : c'est lui qui donne justement le la du film, un la qui paraît subversif mais qui n'est que pro-arme, bien américain (Cf. le fameux deuxième amendement de la Constitution des États-Unis) ; c'est lui qui est l'étincelle quand il démontre sur le terrain qu'il « vaut se protéger soi-même ». Protéger les gentils, et les siens, c'est en effet le sujet du film. Et c'est aussi notre problème à nous (comment ferions-nous, nous ?). Dès la lecture du synopsis, Death Wish rappelle "Un justicier dans la ville" avec Charles Bronson ; d'ailleurs au Québec, ils ont repris ce titre, et ce n'est pas idiot vu que le titre en anglais ne vaut pas mieux que sa traduction en français (pulsion de mort). Mais peu importe qu'il s'agisse d'un remake –d'ailleurs qui se souvient du film avec Charles Bronson –à revoir peut-être? Il s'agit avant tout de la rage d'être entouré d'abrutis et de criminels, rage bien mise en scène, peu à peu, dès le début, comme si tout y participait (les ombres de Chicago, les phares dans la nuit, la lourdeur des nuages bas). Mais pas de rage chez le héros principal (Bruce Willis), qui ne se voit pas entouré d'abrutis et de criminels (un peu à cause de son métier). Il s'agit ensuite de la force naturelle de l'intelligence sur l'abruti, et l'on est satisfait que ce dernier se fasse avoir en toute beauté. 
Il s'agit enfin du risque de glissade (vers la vengeance aveugle), traduite par quelques images à ne pas mettre sous tous les yeux." Ce commentaire dépeint le produit sous un angle négatif ou positif ?
```
**Output**:
```
pos
```


**Input**:
```
Parmi la liste d'intentions suivantes :  "audio_volume_other, play_music, iot_hue_lighton, general_greet, calendar_set, audio_volume_down, social_query, audio_volume_mute, iot_wemo_on, iot_hue_lightup, audio_volume_up, iot_coffee, takeaway_query, qa_maths, play_game, cooking_query, iot_hue_lightdim, iot_wemo_off, music_settings, weather_query, news_query, alarm_remove, social_post, recommendation_events, transport_taxi, takeaway_order, music_query, calendar_query, lists_query, qa_currency, recommendation_movies, general_joke, recommendation_locations, email_querycontact, lists_remove, play_audiobook, email_addcontact, lists_createoradd, play_radio, qa_stock, alarm_query, email_sendemail, general_quirky, music_likeness, cooking_recipe, email_query, datetime_query, transport_traffic, play_podcasts, iot_hue_lightchange, calendar_remove, transport_query, transport_ticket, qa_factoid, iot_cleaning, alarm_set, datetime_convert, iot_hue_lightoff, qa_definition, music_dislikeness",\n indiquer celle présente dans le texte : quel jour de la semaine est le quinze août ? 

```

**Output**:
```
datetime_query
```


**Input**:
```
Simplifier la phrase suivante en la divisant tout en conservant son sens complet : "Le Centre international de science et de technologie a la personnalite juridique et jouit de la capacite juridique la plus etendue reconnue aux personnes morales en vertu des lois applicables dans la Communaute et, en particulier, peut contracter, acquerir ou aliener des biens meubles et immeubles et etre partie a des poursuites judiciaires." Version simplifiée :
```
**Output**:
```
Le Centre international pour la science et la technologie est dote de la personnalite juridique.  Il jouit de toute la capacite reconnue aux personnes morales par les lois applicables dans la Communaute et est ainsi plus particulierement habilite a contracter, a acquerir ou aliener des biens meubles ou immeubles et a ester en justice.
```

### In practice
This adapter was trained quickly (in just 11 hours) as a proof of concept and to test the recently released Mistral model.
More complete work would involve training on more data (1.28M rows were used, whereas DFP contains over 113M) and for longer (see the image below, where the loss could still decrease further).
It would also be possible to test other adapter methods and hyperparameters.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/637b5532a38fc0e66f7f39af/zi48FTSow2F-u4gIGIg1b.png)
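
The 1,280,000 rows correspond to exactly one pass under the schedule of the training script in the next section. A quick check, assuming a single GPU (the card lists one hardware type but does not state the device count):

```py
# Values copied from TrainingArguments in the training script
per_device_train_batch_size = 4
gradient_accumulation_steps = 32
max_steps = 10_000
warmup_steps = 1_000
num_devices = 1  # assumption: the device count is not stated in the card

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
rows_seen = effective_batch_size * max_steps
warmup_fraction = warmup_steps / max_steps

print(effective_batch_size)  # 128
print(rows_seen)             # 1280000
print(warmup_fraction)       # 0.1
```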


## Training procedure

```py
import os
from datasets import load_dataset
import torch
import accelerate
from transformers import AutoTokenizer, MistralForCausalLM, BitsAndBytesConfig, Trainer, TrainingArguments, DataCollatorForLanguageModeling, DataCollatorWithPadding

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

os.environ["WANDB_PROJECT"] = "mistral-7B-FR-Instruct-LORA"


# Load tokenizer and data
model_name = "mistralai/Mistral-7B-v0.1"
max_length=1024

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    model_max_length=max_length,
    padding_side="left", # important for causality
    add_eos_token=True)

tokenizer.pad_token = tokenizer.eos_token

def preprocess_data(x):

    inputs = x["inputs"]
    targets = x["targets"]

    prompts = [inputs[i] + " " + targets[i] for i in range(len(inputs))]

    inputs = tokenizer(
        prompts,
        truncation=True,
        max_length=max_length,
        padding=False
    )

    return inputs

# Load and tokenize data

train_dataset = load_dataset("CATIE-AQ/DFP", split="train", num_proc=16)
valid_dataset = load_dataset("CATIE-AQ/DFP", split="validation", num_proc=16)

# Sample a random subset
train_dataset = train_dataset.shuffle().select(range(1280000))
valid_dataset = valid_dataset.shuffle().select(range(500))

tokenized_train_dataset = train_dataset.map(preprocess_data, remove_columns=train_dataset.column_names, batched=True, batch_size=20)
tokenized_val_dataset = valid_dataset.map(preprocess_data, remove_columns=valid_dataset.column_names, batched=True, batch_size=20)

tokenized_train_dataset = tokenized_train_dataset.with_format("torch")
tokenized_val_dataset = tokenized_val_dataset.with_format("torch")

# Load model

# Optional quantization for QLoRA (left disabled here)
'''bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)'''

# Flash Attention 2 requires an Ampere or newer GPU (e.g. A100)!
model = MistralForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, use_flash_attention_2=True)

# Prepare LoRA

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)

training_args = TrainingArguments(
        output_dir="mistral7B-FR-Instruct",
        remove_unused_columns=True,
        warmup_steps=1000,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=32,
        max_steps=10000,
        learning_rate=5e-5,
        lr_scheduler_type="linear",
        logging_steps=50,
        fp16=True,
        optim="adamw_torch", # paged_adamw_8bit for QLoRA
        logging_dir="./logs",
        save_strategy="steps",
        save_steps=1000,
        evaluation_strategy="steps",
        eval_steps=500,
        do_eval=True,
        report_to="wandb"
    )

class DynamicDataCollator:

    def __init__(self, tokenizer):

        self.tokenizer = tokenizer

    def __call__(self, features):

        batch = self.tokenizer.pad(
            features,
            padding="longest",
            max_length=max_length,
            pad_to_multiple_of=8
        )

        labels = batch["input_ids"].clone()
        labels[labels == self.tokenizer.pad_token_id] = -100 # ignore padding indices for the loss
        labels[:, -1] = self.tokenizer.eos_token_id  # except final eos
        batch["labels"] = labels

        return batch

trainer = Trainer(
    model=model,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_val_dataset,
    args=training_args,
    data_collator=DynamicDataCollator(tokenizer)
)

model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()
```
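
Because the pad token is set to the EOS token, masking every pad id in the labels also masks the genuine end-of-sequence token, which is why the collator restores it in the last position. The effect can be seen on a toy left-padded batch. This is a plain-Python sketch with made-up token ids, not the tensor code used for training:

```py
PAD_ID = EOS_ID = 2    # pad token aliased to EOS, as in the script above
IGNORE_INDEX = -100    # label value skipped by the cross-entropy loss


def build_labels(input_ids):
    """Mirror the collator's label logic on lists of token ids."""
    labels = []
    for row in input_ids:
        # Mask every pad id; this also hides the real EOS since both share id 2
        masked = [IGNORE_INDEX if tok == PAD_ID else tok for tok in row]
        # With left padding the last position holds the sequence's own EOS
        # (assuming the sequence was not truncated), so restore it as a target
        masked[-1] = EOS_ID
        labels.append(masked)
    return labels


batch = [
    [2, 2, 1, 415, 2936, 2],       # short sequence, left-padded
    [1, 415, 2936, 363, 1902, 2],  # full-length sequence
]
for row in build_labels(batch):
    print(row)
# [-100, -100, 1, 415, 2936, 2]
# [1, 415, 2936, 363, 1902, 2]
```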

## Environmental Impact

*Carbon emissions were estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). The hardware, runtime, cloud provider, and compute region were utilized to estimate the carbon impact.*

- **Hardware Type:** A100 PCIe 40/80GB
- **Hours used:** 11h
- **Cloud Provider:** Private Infrastructure
- **Carbon Efficiency (kg/kWh):** 0.041 (estimated from [electricitymaps](https://app.electricitymaps.com/zone/FR) for October 6, 2023)
- **Carbon Emitted** *(Power consumption x Time x Carbon produced based on location of power grid)*: 0.11 kg eq. CO2
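
The formula in the last bullet can be checked numerically. The card does not state the power draw it used; the sketch below assumes 250 W, the nominal TDP of an A100 PCIe 40GB, which reproduces the reported figure after rounding.

```py
power_kw = 0.250           # assumption: nominal A100 PCIe 40GB TDP, not stated in the card
hours = 11                 # from the card
carbon_intensity = 0.041   # kg CO2 eq. per kWh, from the card

# Power consumption x Time x Carbon intensity of the grid
emissions_kg = power_kw * hours * carbon_intensity
print(round(emissions_kg, 2))  # 0.11
```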

## Citations
### PEFT library
```
@Misc{peft,
  title =        {PEFT: State-of-the-art Parameter-Efficient Fine-Tuning methods},
  author =       {Sourab Mangrulkar and Sylvain Gugger and Lysandre Debut and Younes Belkada and Sayak Paul and Benjamin Bossan},
  howpublished = {\url{https://github.com/huggingface/peft}},
  year =         {2022}
}
```

### Mistral-7B-v0.1
```
@misc{jiang2023mistral,
      title={Mistral 7B}, 
      author={Albert Q. Jiang and Alexandre Sablayrolles and Arthur Mensch and Chris Bamford and Devendra Singh Chaplot and Diego de las Casas and Florian Bressand and Gianna Lengyel and Guillaume Lample and Lucile Saulnier and Lélio Renard Lavaud and Marie-Anne Lachaux and Pierre Stock and Teven Le Scao and Thibaut Lavril and Thomas Wang and Timothée Lacroix and William El Sayed},
      year={2023},
      eprint={2310.06825},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

### DFP
```
@misc {centre_aquitain_des_technologies_de_l'information_et_electroniques_2023,
	author       = { {Centre Aquitain des Technologies de l'Information et Electroniques} },
	title        = { DFP (Revision 1d24c09) },
	year         = 2023,
	url          = { https://huggingface.co/datasets/CATIE-AQ/DFP },
	doi          = { 10.57967/hf/1200 },
	publisher    = { Hugging Face }
}
```

### LoRA
```
@misc{hu2021lora,
      title={LoRA: Low-Rank Adaptation of Large Language Models}, 
      author={Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen},
      year={2021},
      eprint={2106.09685},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```