phi-2

@@ -1,858 +0,0 @@
-{
-  "cells": [
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "GitHub Project: dstoolkit-phi2-finetune"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {
-        "nteract": {
-          "transient": {
-            "deleting": false
-          }
-        }
-      },
-      "source": [
-        "# Finetune Phi-2"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "This finetuning setup was run on Azure Machine Learning on a VM with a GPU (Standard_NC6s_v3, 1 x NVIDIA Tesla V100). Once the environment has been created on the VM from the associated requirements.txt file, it is much cleaner to open this notebook in VS Code if you are using AML, or in any other notebook editor.\n",
-        "\n",
-        "!NOTE: Compared to GPT models, little work has been done to make the output of Phi-2 safe, so be wary of using this model out of the box without safety checks or logic such as Azure Content Moderator."
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "#### Environment Setup"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "Clone this repo \n",
-        "\n",
-        "    git clone https://github.com/microsoft/dstoolkit-phi2-finetune.git\n",
-        "\n",
-        "Once the repo has been cloned, create a new Python environment and activate it\n",
-        "\n",
-        "    python -m virtualenv env\n",
-        "    env\\Scripts\\Activate\n",
-        "\n",
-        "Install Python requirements from requirements.txt\n",
-        "\n",
-        "    pip install -r requirements.txt"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "#### Setup"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 1,
-      "metadata": {
-        "gather": {
-          "logged": 1708528771565
-        }
-      },
-      "outputs": [
-        {
-          "name": "stderr",
-          "output_type": "stream",
-          "text": [
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
-            "  from .autonotebook import tqdm as notebook_tqdm\n"
-          ]
-        }
-      ],
-      "source": [
-        "# Import Libraries\n",
-        "from datasets import load_dataset\n",
-        "import torch\n",
-        "from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments, DataCollatorForLanguageModeling\n",
-        "from peft import LoraConfig, get_peft_model"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "#### Data Prep\n",
-        "Prepare the data as training and validation sets. Usually, in data science, we also hold out a test set to evaluate the model on unseen data that is representative of the real-world population. In this example, we can verify this during inference. The validation set helps us check that the model isn't overfitting to the training set during training. Because our use case involves training a QnA agent, it doesn't make sense to withhold questions from the training set, so I will use the entire dataset for training and a random subset to verify that the model is learning, alongside our training metrics. If your use case differs (such as using continuous text), you may change your files and split type to suit your needs. For training and validation we need two jsonl files, in which each line is a standalone JSON object.\n",
-        "\n",
-        "Each line can be formatted as question-answer key-value pairs if the text we wish to finetune on is in question-answer format, or as a single note if we are finetuning on continuous text.\n",
-        "\n",
-        "See below for both examples (a jsonl file looks like this):\n",
-        "\n",
-        "        {\"question\": \"Does the Sun rise in the East or West?\", \"answer\": \"The Sun rises in the East.\"}\n",
-        "        {\"question\": \"What is the biggest UK festival?\", \"answer\": \"The largest UK music festival is Glastonbury.\"}\n",
-        "OR\n",
-        "\n",
-        "        {\"note\": \"continuousTextExample1\"}\n",
-        "        {\"note\": \"continuousTextExample2\"}\n",
-        "\n",
-        "We can break continuous text into sections to make training quicker, so that each example is smaller. Whether or not this is done, there should be some form of data quality check or preprocessing step. The original paper which introduced Phi-1 (Textbooks are all you need) emphasized the need for high quality data over quantity, and the Phi models are all originally trained on relatively low quantity but high quality Python textbooks.\n",
-        "\n",
-        "For our example, we will use a QnA dataset and therefore the former format above. I have selected the [Microsoft 365 FAQ](https://www.microsoft.com/en-us/microsoft-365/microsoft-365-for-home-and-school-faq), which contains question-answer pairs of commonly asked questions about Microsoft 365 products. To convert this web page into the above format, I used the [Azure Language Studio's Question-Answering](https://learn.microsoft.com/en-us/azure/ai-services/language-service/question-answering/overview) service, which can parse an FAQ HTML page into question-answer pairs. These were then exported to a csv file. We can then use code or GPT (with examples) to automate generating the required jsonl file. A code example is provided in the cell below."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 2,
-      "metadata": {
-        "gather": {
-          "logged": 1708512707736
-        },
-        "jupyter": {
-          "outputs_hidden": false,
-          "source_hidden": false
-        },
-        "nteract": {
-          "transient": {
-            "deleting": false
-          }
-        }
-      },
-      "outputs": [],
-      "source": [
-        "# To convert csv/excel to jsonl - you do not need to run this cell if your own data is already formatted as above.\n",
-        "# !NOTE: in the source spreadsheet, every row is one QnA pair and the columns are: Question, Answer.\n",
-        "import pandas as pd\n",
-        "import json\n",
-        "from numpy import random\n",
-        "\n",
-        "trainSplit:float=0.7 # set our split between train and validation\n",
-        "\n",
-        "qnaData:object=pd.read_excel(\"QnA_MSFT365.xlsx\") # read in data\n",
-        "jsonList=list() # create empty list to store jsonl structure\n",
-        "for index, row in qnaData.iterrows(): # iterate over rows\n",
-        "    jsonList.append({\"question\": row['Question'], \"answer\": row['Answer']}) # append in required format\n",
-        "\n",
-        "indexSplit:int=int(trainSplit*len(jsonList)) # get the index where the train-val split will occur\n",
-        "random.shuffle(jsonList) # randomise list order so we can split it randomly\n",
-        "\n",
-        "# format into train and validation set\n",
-        "trainSet:list=jsonList # [:indexSplit] # commented as we wish to train over entire set\n",
-        "valSet:list=jsonList[indexSplit:]\n",
-        "\n",
-        "# save train and val\n",
-        "with open(\"train.jsonl\", 'w') as f:\n",
-        "    for item in trainSet:\n",
-        "        f.write(json.dumps(item) + \"\\n\")\n",
-        "with open(\"val.jsonl\", 'w') as f:\n",
-        "    for item in valSet:\n",
-        "        f.write(json.dumps(item) + \"\\n\")"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 3,
-      "metadata": {},
-      "outputs": [
-        {
-          "name": "stderr",
-          "output_type": "stream",
-          "text": [
-            "Generating train split: 56 examples [00:00, 744.42 examples/s]\n",
-            "Generating train split: 17 examples [00:00, 6966.60 examples/s]\n"
-          ]
-        }
-      ],
-      "source": [
-        "# Load and Format Data - saved as \"train.jsonl\", \"val.jsonl\"\n",
-        "dataName:str=\"train.jsonl\"\n",
-        "valName:str=\"val.jsonl\"\n",
-        "trainDataset, evalDataset = load_dataset('json', data_files=dataName, split='train'), load_dataset('json', data_files=valName, split='train')\n",
-        "\n",
-        "def formattingFunc(textExample:dict) -> str:\n",
-        "    \"\"\"\n",
-        "    This function formats our text to be continuous rather than in json format. The output of this function is submitted directly to Phi-2 for finetuning.\n",
-        "    \"\"\"\n",
-        "    text:str=f\"Question: {textExample['question']}\\nAnswer: {textExample['answer']}\" # if QnA\n",
-        "    # text:str=f\"{textExample['note']}\" # if continuous text\n",
-        "    return text"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "#### Load Model and Tokenizer\n",
-        "This is the model which will be finetuned - we will be using Phi-2. We will also pad the input data, which requires determining an appropriate max_length for our input tokens. A larger max_length is more computationally expensive, so it may be worth shortening your training and validation examples if they are large. Each input will be padded with our end of sequence (eos) token."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 7,
-      "metadata": {},
-      "outputs": [
-        {
-          "name": "stderr",
-          "output_type": "stream",
-          "text": [
-            "Loading checkpoint shards: 100%|██████████| 2/2 [00:06<00:00,  3.36s/it]\n",
-            "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
-          ]
-        }
-      ],
-      "source": [
-        "# Name of our base model\n",
-        "baseModelName:str=\"microsoft/phi-2\"\n",
-        "\n",
-        "# Load our base model\n",
-        "model:object=AutoModelForCausalLM.from_pretrained(baseModelName,\n",
-        "                                             torch_dtype=torch.float32, # fixes issue in inference related to float16 values producing \"!!!!\" rather than output.\n",
-        "                                             device_map=\"auto\",\n",
-        "                                             trust_remote_code=True,\n",
-        "                                             load_in_8bit=True)\n",
-        "\n",
-        "# Load our tokenizer\n",
-        "tokenizer:object=AutoTokenizer.from_pretrained(\n",
-        "    baseModelName,\n",
-        "    padding_side=\"left\", # add padding so that our input sequences are all the same length. Left means that pad token is repeated until we reach our input text.\n",
-        "    add_eos_token=True, # end of sequence token\n",
-        "    add_bos_token=True, # beginning of sequence token\n",
-        "    use_fast=False,\n",
-        ")\n",
-        "tokenizer.pad_token = tokenizer.eos_token # set our pad token to be the same as the eos token"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 8,
-      "metadata": {},
-      "outputs": [
-        {
-          "name": "stderr",
-          "output_type": "stream",
-          "text": [
-            "Map: 100%|██████████| 56/56 [00:00<00:00, 625.80 examples/s]\n",
-            "Map: 100%|██████████| 17/17 [00:00<00:00, 511.50 examples/s]\n",
-            "Map: 100%|██████████| 56/56 [00:00<00:00, 627.35 examples/s]\n",
-            "Map: 100%|██████████| 17/17 [00:00<00:00, 482.16 examples/s]"
-          ]
-        },
-        {
-          "name": "stdout",
-          "output_type": "stream",
-          "text": [
-            "| Diff Token Size |\n",
-            "Original Lengths: 261\n",
-            "Adjusted Lengths: 0\n"
-          ]
-        },
-        {
-          "name": "stderr",
-          "output_type": "stream",
-          "text": [
-            "\n"
-          ]
-        }
-      ],
-      "source": [
-        "def tokenizePrompt(prompt:object) -> dict:\n",
-        "    \"\"\"\n",
-        "    Tokenizes prompt based on prompt and tokenizer.\n",
-        "    \"\"\"\n",
-        "    tokenizedPrompt:dict=tokenizer(formattingFunc(prompt))\n",
-        "    return tokenizedPrompt\n",
-        "\n",
-        "# Format and Tokenize datasets.\n",
-        "tokenizedTrain:dict=trainDataset.map(tokenizePrompt)\n",
-        "tokenizedVal:dict=evalDataset.map(tokenizePrompt)\n",
-        "\n",
-        "# count lengths of both datasets so we can adjust max length\n",
-        "lengthTokens:list=[len(x['input_ids']) for x in tokenizedTrain] # count lengths of tokenizedTrain\n",
-        "if tokenizedVal is not None:\n",
-        "    lengthTokens += [len(x['input_ids']) for x in tokenizedVal] # count lengths of tokenizedVal\n",
-        "maxLengthTokens:int=max(lengthTokens) + 2 #  we could also visualise lengthTokens using matplotlib if we wish to see the distribution\n",
-        "tokenDiffOriginal:int=maxLengthTokens-min(lengthTokens) # create metric original\n",
-        "\n",
-        "# this function pads all token sequences to the same length using left-side padding and the eos token (set up above)\n",
-        "def tokenizePromptAdjustedLengths(prompt:object):\n",
-        "    \"\"\"\n",
-        "    Tokenizes prompt with adjusted lengths using left-side padding. All sequences will be of the same length, which will assist training.\n",
-        "    \"\"\"\n",
-        "    tokenizedResponse = tokenizer(\n",
-        "        formattingFunc(prompt),\n",
-        "        truncation=True,\n",
-        "        max_length=maxLengthTokens,\n",
-        "        padding=\"max_length\",\n",
-        "    )\n",
-        "    return tokenizedResponse\n",
-        "\n",
-        "del tokenizedTrain; del tokenizedVal # clean up old variables\n",
-        "tokenizedTrain:dict=trainDataset.map(tokenizePromptAdjustedLengths) # apply adjusted size tokenization\n",
-        "tokenizedVal:dict=evalDataset.map(tokenizePromptAdjustedLengths)\n",
-        "\n",
-        "# count adjusted size difference\n",
-        "lengthTokens:list=[len(x['input_ids']) for x in tokenizedTrain] # count lengths of tokenizedTrain\n",
-        "if tokenizedVal is not None:\n",
-        "    lengthTokens += [len(x['input_ids']) for x in tokenizedVal] # count lengths of tokenizedVal\n",
-        "tokenDiffAdjusted:int=max(lengthTokens)-min(lengthTokens) # create metric adjusted\n",
-        "\n",
-        "print(f\"| Diff Token Size |\\nOriginal Lengths: {tokenDiffOriginal}\\nAdjusted Lengths: {tokenDiffAdjusted}\") # compare size differences using metrics from original and adjusted lengths."
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "#### Get Model Information and Set up LoRA layers for finetuning.\n",
-        "LoRA (Low-Rank Adaptation) is a finetuning technique which freezes the pre-trained model weights and instead injects trainable low-rank matrices into each layer of the Transformer architecture (https://arxiv.org/abs/2106.09685)."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 9,
-      "metadata": {},
-      "outputs": [
-        {
-          "name": "stdout",
-          "output_type": "stream",
-          "text": [
-            "Model Architecture:\n",
-            "PeftModelForCausalLM(\n",
-            "  (base_model): LoraModel(\n",
-            "    (model): PhiForCausalLM(\n",
-            "      (model): PhiModel(\n",
-            "        (embed_tokens): Embedding(51200, 2560)\n",
-            "        (embed_dropout): Dropout(p=0.0, inplace=False)\n",
-            "        (layers): ModuleList(\n",
-            "          (0-31): 32 x PhiDecoderLayer(\n",
-            "            (self_attn): PhiAttention(\n",
-            "              (q_proj): Linear8bitLt(in_features=2560, out_features=2560, bias=True)\n",
-            "              (k_proj): Linear8bitLt(in_features=2560, out_features=2560, bias=True)\n",
-            "              (v_proj): Linear8bitLt(in_features=2560, out_features=2560, bias=True)\n",
-            "              (dense): Linear8bitLt(in_features=2560, out_features=2560, bias=True)\n",
-            "              (rotary_emb): PhiRotaryEmbedding()\n",
-            "            )\n",
-            "            (mlp): PhiMLP(\n",
-            "              (activation_fn): NewGELUActivation()\n",
-            "              (fc1): lora.Linear8bitLt(\n",
-            "                (base_layer): Linear8bitLt(in_features=2560, out_features=10240, bias=True)\n",
-            "                (lora_dropout): ModuleDict(\n",
-            "                  (default): Dropout(p=0.05, inplace=False)\n",
-            "                )\n",
-            "                (lora_A): ModuleDict(\n",
-            "                  (default): Linear(in_features=2560, out_features=64, bias=False)\n",
-            "                )\n",
-            "                (lora_B): ModuleDict(\n",
-            "                  (default): Linear(in_features=64, out_features=10240, bias=False)\n",
-            "                )\n",
-            "                (lora_embedding_A): ParameterDict()\n",
-            "                (lora_embedding_B): ParameterDict()\n",
-            "              )\n",
-            "              (fc2): lora.Linear8bitLt(\n",
-            "                (base_layer): Linear8bitLt(in_features=10240, out_features=2560, bias=True)\n",
-            "                (lora_dropout): ModuleDict(\n",
-            "                  (default): Dropout(p=0.05, inplace=False)\n",
-            "                )\n",
-            "                (lora_A): ModuleDict(\n",
-            "                  (default): Linear(in_features=10240, out_features=64, bias=False)\n",
-            "                )\n",
-            "                (lora_B): ModuleDict(\n",
-            "                  (default): Linear(in_features=64, out_features=2560, bias=False)\n",
-            "                )\n",
-            "                (lora_embedding_A): ParameterDict()\n",
-            "                (lora_embedding_B): ParameterDict()\n",
-            "              )\n",
-            "            )\n",
-            "            (input_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)\n",
-            "            (resid_dropout): Dropout(p=0.1, inplace=False)\n",
-            "          )\n",
-            "        )\n",
-            "        (final_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)\n",
-            "      )\n",
-            "      (lm_head): Linear(in_features=2560, out_features=51200, bias=True)\n",
-            "    )\n",
-            "  )\n",
-            ")\n",
-            "trainable params: 52,428,800 || all params: 2,832,112,640 || trainable%: 1.8512258043521885\n"
-          ]
-        }
-      ],
-      "source": [
-        "loraConfig:object=LoraConfig(\n",
-        "    r=64, # Rank of the low-rank matrices; controls the number of trainable parameters. A higher rank allows more parameters to be trained and larger update matrices (at more compute cost). Play with this and see how it affects the number of trainable params.\n",
-        "    lora_alpha=16, # LoRA scaling factor applied to the learned weights: alpha/r\n",
-        "    target_modules=[ # modules (eg attention blocks) to apply LoRA matrices to.\n",
-        "        \"Wqkv\", # not matched in the architecture printed by this cell (attention layers there are q_proj, k_proj, v_proj, dense), so only fc1 and fc2 receive adapters\n",
-        "        \"fc1\",\n",
-        "        \"fc2\",\n",
-        "    ],\n",
-        "    bias=\"none\", # should bias parameters also be trained: none, all, lora_only\n",
-        "    lora_dropout=0.05,  # Conventional\n",
-        "    task_type=\"CAUSAL_LM\",\n",
-        ")\n",
-        "\n",
-        "model:object=get_peft_model(model, loraConfig) # parameter-efficient fine tune - freeze pretrained model parameters and add small number of tunable adapters on top.\n",
-        "print(f\"Model Architecture:\\n{model}\")\n",
-        "model.print_trainable_parameters() # print trainable parameters"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "#### Training"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 10,
-      "metadata": {},
-      "outputs": [
-        {
-          "name": "stdout",
-          "output_type": "stream",
-          "text": [
-            "GPU COUNT: 1\n"
-          ]
-        }
-      ],
-      "source": [
-        "# Setup train run parameters\n",
-        "project:str=\"Finetune\"\n",
-        "modelName:str=baseModelName.replace(\"\\\\\", \"_\").replace(\"/\", \"_\")\n",
-        "run_name:str=f\"{project}-{modelName}\"\n",
-        "output_dir:str=\"./\" + run_name # this will be the dir to store run information and model weights\n",
-        "\n",
-        "# get GPU count for CUDA.\n",
-        "print(f\"GPU COUNT: {torch.cuda.device_count()}\")\n",
-        "if torch.cuda.device_count() > 1: # If more than 1 GPU\n",
-        "    model.is_parallelizable = True\n",
-        "    model.model_parallel = True"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 11,
-      "metadata": {},
-      "outputs": [
-        {
-          "name": "stderr",
-          "output_type": "stream",
-          "text": [
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n"
-          ]
-        },
-        {
-          "data": {
-            "text/html": [
-              "\n",
-              "    <div>\n",
-              "      \n",
-              "      <progress value='1000' max='1000' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
-              "      [1000/1000 18:37, Epoch 35/36]\n",
-              "    </div>\n",
-              "    <table border=\"1\" class=\"dataframe\">\n",
-              "  <thead>\n",
-              " <tr style=\"text-align: left;\">\n",
-              "      <th>Step</th>\n",
-              "      <th>Training Loss</th>\n",
-              "      <th>Validation Loss</th>\n",
-              "    </tr>\n",
-              "  </thead>\n",
-              "  <tbody>\n",
-              "    <tr>\n",
-              "      <td>50</td>\n",
-              "      <td>2.162500</td>\n",
-              "      <td>2.003207</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>100</td>\n",
-              "      <td>1.779000</td>\n",
-              "      <td>1.718166</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>150</td>\n",
-              "      <td>1.593500</td>\n",
-              "      <td>1.527336</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>200</td>\n",
-              "      <td>1.382200</td>\n",
-              "      <td>1.365474</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>250</td>\n",
-              "      <td>1.283400</td>\n",
-              "      <td>1.209522</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>300</td>\n",
-              "      <td>1.150800</td>\n",
-              "      <td>1.050137</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>350</td>\n",
-              "      <td>1.016200</td>\n",
-              "      <td>0.932797</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>400</td>\n",
-              "      <td>0.947200</td>\n",
-              "      <td>0.808629</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>450</td>\n",
-              "      <td>0.815600</td>\n",
-              "      <td>0.709524</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>500</td>\n",
-              "      <td>0.757900</td>\n",
-              "      <td>0.616750</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>550</td>\n",
-              "      <td>0.663500</td>\n",
-              "      <td>0.544908</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>600</td>\n",
-              "      <td>0.612800</td>\n",
-              "      <td>0.482578</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>650</td>\n",
-              "      <td>0.604100</td>\n",
-              "      <td>0.434236</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>700</td>\n",
-              "      <td>0.513200</td>\n",
-              "      <td>0.397516</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>750</td>\n",
-              "      <td>0.484000</td>\n",
-              "      <td>0.360740</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>800</td>\n",
-              "      <td>0.458000</td>\n",
-              "      <td>0.340060</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>850</td>\n",
-              "      <td>0.453700</td>\n",
-              "      <td>0.317527</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>900</td>\n",
-              "      <td>0.426400</td>\n",
-              "      <td>0.308004</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>950</td>\n",
-              "      <td>0.417900</td>\n",
-              "      <td>0.297917</td>\n",
-              "    </tr>\n",
-              "    <tr>\n",
-              "      <td>1000</td>\n",
-              "      <td>0.422100</td>\n",
-              "      <td>0.298028</td>\n",
-              "    </tr>\n",
-              "  </tbody>\n",
-              "</table><p>"
-            ],
-            "text/plain": [
-              "<IPython.core.display.HTML object>"
-            ]
-          },
-          "metadata": {},
-          "output_type": "display_data"
-        },
-        {
-          "name": "stderr",
-          "output_type": "stream",
-          "text": [
-            "Checkpoint destination directory ./Finetune-microsoft_phi-2/checkpoint-50 already exists and is non-empty.Saving will proceed but saved results may be invalid.\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
-            "Checkpoint destination directory ./Finetune-microsoft_phi-2/checkpoint-100 already exists and is non-empty.Saving will proceed but saved results may be invalid.\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
-            "Checkpoint destination directory ./Finetune-microsoft_phi-2/checkpoint-150 already exists and is non-empty.Saving will proceed but saved results may be invalid.\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
-            "Checkpoint destination directory ./Finetune-microsoft_phi-2/checkpoint-200 already exists and is non-empty.Saving will proceed but saved results may be invalid.\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
-            "Checkpoint destination directory ./Finetune-microsoft_phi-2/checkpoint-250 already exists and is non-empty.Saving will proceed but saved results may be invalid.\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
-            "Checkpoint destination directory ./Finetune-microsoft_phi-2/checkpoint-300 already exists and is non-empty.Saving will proceed but saved results may be invalid.\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
-            "Checkpoint destination directory ./Finetune-microsoft_phi-2/checkpoint-350 already exists and is non-empty.Saving will proceed but saved results may be invalid.\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
-            "Checkpoint destination directory ./Finetune-microsoft_phi-2/checkpoint-400 already exists and is non-empty.Saving will proceed but saved results may be invalid.\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
-            "Checkpoint destination directory ./Finetune-microsoft_phi-2/checkpoint-450 already exists and is non-empty.Saving will proceed but saved results may be invalid.\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
-            "Checkpoint destination directory ./Finetune-microsoft_phi-2/checkpoint-500 already exists and is non-empty.Saving will proceed but saved results may be invalid.\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
-            "Checkpoint destination directory ./Finetune-microsoft_phi-2/checkpoint-550 already exists and is non-empty.Saving will proceed but saved results may be invalid.\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
-            "Checkpoint destination directory ./Finetune-microsoft_phi-2/checkpoint-600 already exists and is non-empty.Saving will proceed but saved results may be invalid.\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
-            "Checkpoint destination directory ./Finetune-microsoft_phi-2/checkpoint-650 already exists and is non-empty.Saving will proceed but saved results may be invalid.\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
-            "Checkpoint destination directory ./Finetune-microsoft_phi-2/checkpoint-700 already exists and is non-empty.Saving will proceed but saved results may be invalid.\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n"
-          ]
-        },
-        {
-          "data": {
-            "text/plain": [
-              "TrainOutput(global_step=1000, training_loss=0.8971948947906494, metrics={'train_runtime': 1119.2081, 'train_samples_per_second': 1.787, 'train_steps_per_second': 0.893, 'total_flos': 9658921328640000.0, 'train_loss': 0.8971948947906494, 'epoch': 35.71})"
-            ]
-          },
-          "execution_count": 11,
-          "metadata": {},
-          "output_type": "execute_result"
-        }
-      ],
-      "source": [
-        "stepsSaveEvalLoss:int=50\n",
-        "numberStepPartitions:int=20 # max_steps = stepsSaveEvalLoss * numberStepPartitions, so the final step is always a multiple of stepsSaveEvalLoss and the last checkpoint is saved.\n",
-        "max_steps:int=stepsSaveEvalLoss*numberStepPartitions\n",
-        "trainer:object=Trainer(\n",
-        "    model=model,\n",
-        "    train_dataset=tokenizedTrain,\n",
-        "    eval_dataset=tokenizedVal,\n",
-        "    args=TrainingArguments(\n",
-        "        output_dir=output_dir, # output dir defined above\n",
-        "        warmup_steps=1, # number of warmup steps during which the learning rate ramps up from a low value to its maximum before the normal schedule begins; warmup can improve training stability.\n",
-        "        per_device_train_batch_size=2, # training batch size per device; must be a positive integer.\n",
-        "        gradient_accumulation_steps=1, # number of batches whose gradients are accumulated before each optimizer step; must be a positive integer. The effective batch size per device is this value multiplied by per_device_train_batch_size.\n",
-        "        max_steps=max_steps, # max number of training steps\n",
-        "        learning_rate=2.5e-5, # aim for small LR for finetuning scenarios\n",
-        "        optim=\"paged_adamw_8bit\", # optimiser used to update the trainable weights: paged 8-bit AdamW from bitsandbytes, which reduces optimiser memory use\n",
-        "        logging_dir=f\"{output_dir}/logs\", # Where logs are stored for training\n",
-        "        logging_steps=stepsSaveEvalLoss, # train loss cadence\n",
-        "        do_eval=True, # perform eval on eval set\n",
-        "        evaluation_strategy=\"steps\", # evaluate every eval_steps training steps\n",
-        "        eval_steps=stepsSaveEvalLoss, # eval loss cadence\n",
-        "        save_strategy=\"steps\", # checkpoint model progress strategy set to steps\n",
-        "        save_steps=stepsSaveEvalLoss, # save every x steps cadence\n",
-        "    ),\n",
-        "    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False), # mlm=False selects causal (next-token) language modelling instead of masked language modelling\n",
-        ")\n",
-        "model.config.use_cache = False  # silence warnings for training\n",
-        "\n",
-        "# Train - The output should be a table with a row at stepsSaveEvalLoss cadence and columns as Step, Training loss and Validation Loss.\n",
-        "trainer.train()"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "#### Inference of trained model"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
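-        "Before running inference, free the GPU memory held by the training process. Either find the process ID with nvidia-smi and kill it:\n",
-        "\n",
-        "    nvidia-smi\n",
-        "    kill [PID]\n",
-        "\n",
-        "or restart the kernel:\n",
-        "\n",
-        "    Kernel > Restart Kernel"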
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 1,
-      "metadata": {},
-      "outputs": [
-        {
-          "name": "stderr",
-          "output_type": "stream",
-          "text": [
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
-            "  from .autonotebook import tqdm as notebook_tqdm\n"
-          ]
-        }
-      ],
-      "source": [
-        "# Drop references to the model, tokenizer and trainer so their memory can be reclaimed\n",
-        "import gc\n",
-        "try: \n",
-        "    del model\n",
-        "    del tokenizer\n",
-        "    del trainer\n",
-        "except NameError:\n",
-        "    pass # nothing to delete, e.g. after a kernel restart\n",
-        "gc.collect()\n",
-        "\n",
-        "# load libraries for inference\n",
-        "import torch\n",
-        "from transformers import AutoTokenizer, AutoModelForCausalLM\n",
-        "from peft import PeftModel\n",
-        "\n",
-        "# memory cleared so recreate parameters\n",
-        "baseModelName:str=\"microsoft/phi-2\"\n",
-        "project:str=\"Finetune\"\n",
-        "max_steps:int=1000\n",
-        "\n",
-        "modelName:str=baseModelName.replace(\"\\\\\", \"_\").replace(\"/\", \"_\")\n",
-        "run_name:str=f\"{project}-{modelName}\"\n",
-        "output_dir:str=\"./\" + run_name # directory that stores run information and model weights"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 4,
-      "metadata": {},
-      "outputs": [
-        {
-          "name": "stderr",
-          "output_type": "stream",
-          "text": [
-            "Loading checkpoint shards: 100%|██████████| 2/2 [00:06<00:00,  3.32s/it]\n",
-            "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
-          ]
-        }
-      ],
-      "source": [
-        "# reload our base model and tokeniser\n",
-        "modelInference:object=AutoModelForCausalLM.from_pretrained(\n",
-        "    baseModelName,  # Phi2, same as before\n",
-        "    torch_dtype=torch.float32, # fixes issue in inference related to float16 values producing \"!!!!\" rather than output.\n",
-        "    device_map=\"auto\",                                      \n",
-        "    trust_remote_code=True,\n",
-        "    load_in_8bit=True,\n",
-        ")\n",
-        "tokenizerInference:object=AutoTokenizer.from_pretrained(baseModelName,\n",
-        "                                               add_bos_token=True,\n",
-        "                                               trust_remote_code=True,\n",
-        "                                               use_fast=False)\n",
-        "tokenizerInference.pad_token = tokenizerInference.eos_token\n",
-        "\n",
-        "# load finetuned QLoRA adapters which were saved during training\n",
-        "finetunedFolder:str=f\"{output_dir}/checkpoint-{max_steps}\" # use the final checkpoint by default (change this if an earlier checkpoint performs better)\n",
-        "FTmodel:object=PeftModel.from_pretrained(modelInference, finetunedFolder) # load FT model"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "The repetition penalty controls how likely the model is to repeat content. Values above 1.0 make repeated words or phrases less likely, while 1.0 (the value used below) applies no penalty.\n",
-        "\n",
-        "!NOTE: If the model produces \"!!!!\" instead of text, the cause is loading the model with torch_dtype=torch.float16; load it with torch.float32 instead. For details see https://huggingface.co/microsoft/phi-2/discussions/89\n"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 5,
-      "metadata": {},
-      "outputs": [
-        {
-          "name": "stderr",
-          "output_type": "stream",
-          "text": [
-            "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n",
-            "/anaconda/envs/phi2ENV/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
-            "  warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n"
-          ]
-        },
-        {
-          "name": "stdout",
-          "output_type": "stream",
-          "text": [
-            "question: How do I install Microsoft 365 or Office?\n",
-            "answer:  [Install Office](https://go.microsoft.com/fwlink/p/?LinkID=403719) and [Install Microsoft 365](https://go.microsoft.com/fwlink/p/?LinkID=808164) are the best ways to install Microsoft 365 or Office. You can also download and install older versions of Office on PC or Mac for free. Learn more about installing Office apps. [Office Home & Business](https://go.microsoft.com/fwlink/p/?LinkID=808164) and [Office Home & Student](https://go.microsoft.com/fwlink/p/?LinkID=808164) are subscription plans that include the Office apps, along with additional features. Learn more about Microsoft 365 subscriptions. [Office for Mac](https://go.microsoft.com/fwlink/p/?LinkID=808164) and [Office for Windows tablets](https://\n"
-          ]
-        }
-      ],
-      "source": [
-        "# generation parameters\n",
-        "repetition_penalty:float=1.0\n",
-        "max_tokens:int=200\n",
-        "\n",
-        "# test a prompt\n",
-        "testPrompt:str=\"How do I install Microsoft 365 or Office?\"\n",
-        "\n",
-        "formattedPrompt:str=f\"question: {testPrompt}\\nanswer: \" # format like training set formatting, see above.\n",
-        "tokenisedPrompt:dict=tokenizerInference(formattedPrompt, return_tensors=\"pt\").to(\"cuda\") # tokenise prompt\n",
-        "FTmodel.eval() # set in inference mode\n",
-        "with torch.no_grad():\n",
-        "    response:str=tokenizerInference.decode(FTmodel.generate(**tokenisedPrompt, max_new_tokens=max_tokens, repetition_penalty=repetition_penalty)[0], skip_special_tokens=True)\n",
-        "    print(response)"
-      ]
-    }
-  ],
-  "metadata": {
-    "kernel_info": {
-      "name": "python3"
-    },
-    "kernelspec": {
-      "display_name": "Python 3 (ipykernel)",
-      "language": "python",
-      "name": "python3"
-    },
-    "language_info": {
-      "codemirror_mode": {
-        "name": "ipython",
-        "version": 3
-      },
-      "file_extension": ".py",
-      "mimetype": "text/x-python",
-      "name": "python",
-      "nbconvert_exporter": "python",
-      "pygments_lexer": "ipython3",
-      "version": "3.8.18"
-    },
-    "microsoft": {
-      "ms_spell_check": {
-        "ms_spell_check_language": "en"
-      }
-    },
-    "nteract": {
-      "version": "nteract-front-end@1.0.0"
-    }
-  },
-  "nbformat": 4,
-  "nbformat_minor": 2
-}