---
library_name: transformers
license: apache-2.0
datasets:
- Intel/orca_dpo_pairs
language:
- en
---

# Model Card

This is a DPO finetune of mistralai/Mistral-7B-Instruct-v0.2, following the article: https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac

## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** Corianas
- **Model type:** [More Information Needed]
- **License:** Apache 2.0
- **Finetuned from model:** mistralai/Mistral-7B-Instruct-v0.2

## Instruction format

In order to leverage instruction fine-tuning, your prompt should be surrounded by `[INST]` and `[/INST]` tokens. The very first instruction should begin with a begin-of-sentence token id; subsequent instructions should not. The assistant's generation will be terminated by the end-of-sentence token id.

E.g.
```
text = "<s>[INST] What is your favourite condiment? [/INST]"
"Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
"[INST] Do you have mayonnaise recipes? [/INST]"
```
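
The template can be sketched as a small, hypothetical Python helper. This is only an illustration of the string format described above; in practice the tokenizer's chat template is the authoritative implementation.

```python
def build_prompt(messages):
    """Render a list of {"role", "content"} dicts into the [INST] format.

    Hypothetical helper, not part of any library: user turns are wrapped
    in [INST]...[/INST], assistant turns are closed with </s>, and the
    whole conversation starts with a single <s> token.
    """
    text = "<s>"
    for m in messages:
        if m["role"] == "user":
            text += f"[INST] {m['content']} [/INST]"
        elif m["role"] == "assistant":
            text += f"{m['content']}</s> "
    return text

prompt = build_prompt([{"role": "user", "content": "What is your favourite condiment?"}])
print(prompt)  # <s>[INST] What is your favourite condiment? [/INST]
```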

This format is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating) via the `apply_chat_template()` method:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```

## Model Architecture
This instruction model is based on Mistral-7B-v0.1, a transformer model with the following architecture choices:
- Grouped-Query Attention
- Sliding-Window Attention
- Byte-fallback BPE tokenizer
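
Sliding-window attention restricts each position to attending over a fixed window of recent positions rather than the full sequence. A toy mask illustrates the idea; the window size here is illustrative (Mistral-7B uses a 4096-token window):

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when position i may attend to position j.

    Causal (j <= i) and limited to the last `window` positions.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(5, 3)
# position 4 attends only to positions 2, 3 and 4
print([j for j in range(5) if mask[4][j]])  # [2, 3, 4]
```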
## How to Get Started with the Model

Use the chat-template code in the "Instruction format" section above to get started with the model.

## Training Details

### Training Data

[Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs)

### Training Procedure

The model was trained with Direct Preference Optimization (DPO), following https://medium.com/towards-data-science/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac
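
The core DPO objective used by that procedure can be illustrated with a minimal, self-contained sketch. This is not the `trl` implementation; the summed log-probabilities of each completion under the policy and the frozen reference model are assumed precomputed:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair: -log(sigmoid(beta * margin)),
    where the margin compares how much more the policy prefers the chosen
    answer over the rejected one, relative to the reference model."""
    logits = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When the policy prefers the chosen answer more than the reference does,
# the loss drops below log(2), its value at initialisation:
print(dpo_loss(-10.0, -20.0, -12.0, -18.0) < math.log(2))  # True
```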

#### Preprocessing [optional]

```python
def chatml_format(example):
    # Format system
    if len(example['system']) > 0:
        message = {"role": "user", "content": f"{example['system']}\n{example['question']}"}
        prompt = tokenizer.apply_chat_template([message], tokenize=False)
    else:
        # Format instruction
        message = {"role": "user", "content": example['question']}
        prompt = tokenizer.apply_chat_template([message], tokenize=False, add_generation_prompt=True)

    # Format chosen answer
    chosen = example['chosen'] + tokenizer.eos_token

    # Format rejected answer
    rejected = example['rejected'] + tokenizer.eos_token

    return {
        "prompt": prompt,
        "chosen": chosen,
        "rejected": rejected,
    }
```

#### Training Hyperparameters

```python
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)
```
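
With `per_device_train_batch_size=4` and `gradient_accumulation_steps=4`, the effective batch size is 16 per device. The schedule implied by `lr_scheduler_type="cosine"` with `warmup_steps=100` and `max_steps=200` can be sketched as follows (an approximation; the exact `transformers` curve may differ slightly at the boundaries):

```python
import math

def lr_at(step, base_lr=5e-5, warmup_steps=100, max_steps=200):
    """Linear warmup to base_lr, then cosine decay to zero at max_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(100) == 5e-5)  # True: warmup has just finished
print(lr_at(200))          # ~0.0: fully decayed
```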

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]