| --- |
| license: mit |
| datasets: |
| - Replete-AI/code_bagel |
| --- |
| # Phi-nut-Butter-Codebagel-v1 |
|
|
|  |
|
|
| ## Model Details |
|
|
| **Model Name:** Phi-nut-Butter-Codebagel-v1 |
| **Base Model:** [microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) |
| **Fine-tuning Method:** Supervised Fine-Tuning (SFT) |
| **Dataset:** [Code Bagel](https://huggingface.co/datasets/Replete-AI/code_bagel) |
| **Training Data:** 75,000 randomly selected rows from the Code Bagel dataset |
| **Training Duration:** 23 hours |
| **Hardware:** Nvidia RTX A4500 |
| **Epochs:** 3 |
|
|
| ## Training Procedure |
|
|
| This model was fine-tuned to improve its responses to code-related instructions. |
|
|
| Training used PEFT with TRL's SFTTrainer on 75,000 randomly selected rows from the Code Bagel dataset. |
| It ran for 3 epochs and took 23 hours on an Nvidia RTX A4500 GPU. |
|
|
| ## Intended Use |
|
|
| This model is intended for instruction-following tasks, particularly code-related ones. |
|
|
| ## Getting Started |
|
|
| ### Instruct Template |
| ```text |
| <|system|> |
| {system_message}<|end|> |
| <|user|> |
| {prompt}<|end|> |
| <|assistant|> |
| ``` |
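
The template can also be filled in programmatically. The helper below is a minimal sketch, not part of the model's tooling: `build_prompt` is a hypothetical name, and it assumes the spacing used in the Transformers example in this card (no space before `<|end|>`, one newline after each tag).

```python
def build_prompt(system_message: str, prompt: str) -> str:
    """Fill the instruct template with a system message and a user prompt.

    Hypothetical helper; the spacing is an assumption taken from the
    Transformers example in this card.
    """
    return (
        "<|system|>\n"
        f"{system_message}<|end|>\n"
        "<|user|>\n"
        f"{prompt}<|end|>\n"
        "<|assistant|>\n"
    )


if __name__ == "__main__":
    # Print a filled-in prompt for a sample coding question
    print(build_prompt(
        "You are an expert developer. Please help me with any coding questions.",
        "Create a function to get the total sum from an array of ints.",
    ))
```

The returned string can be passed to the tokenizer in place of a hand-written prompt.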
|
|
| ### Transformers |
|
|
| ```python |
| from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig |
| |
| model_name_or_path = "thesven/Phi-nut-Butter-Codebagel-v1" |
| |
| # BitsAndBytesConfig for loading the model in 4-bit precision |
| bnb_config = BitsAndBytesConfig( |
|     load_in_4bit=True, |
|     bnb_4bit_quant_type="nf4", |
|     bnb_4bit_compute_dtype="float16", |
| ) |
| |
| tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True) |
| model = AutoModelForCausalLM.from_pretrained( |
|     model_name_or_path, |
|     device_map="auto", |
|     trust_remote_code=False, |
|     revision="main", |
|     quantization_config=bnb_config, |
| ) |
| # Use the EOS token for padding during generation |
| model.generation_config.pad_token_id = tokenizer.eos_token_id |
| |
| prompt_template = ''' |
| <|system|> |
| You are an expert developer. Please help me with any coding questions.<|end|> |
| <|user|> |
| Create a function to get the total sum from an array of ints.<|end|> |
| <|assistant|> |
| ''' |
| |
| # Move the inputs to the device the model was placed on by device_map="auto" |
| input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.to(model.device) |
| output = model.generate(inputs=input_ids, temperature=0.1, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=256) |
| |
| # Decode only the newly generated tokens, skipping the prompt |
| generated_text = tokenizer.decode(output[0, len(input_ids[0]):], skip_special_tokens=True) |
| print(generated_text) |
| ``` |