Debugging memory leak in notebook...

Files changed (13)

.gitattributes +12 -3
checkpoint_dir/checkpoint-1200/README.md +202 -0
checkpoint_dir/checkpoint-1200/adapter_config.json +31 -0
checkpoint_dir/checkpoint-1200/adapter_model.safetensors +3 -0
checkpoint_dir/checkpoint-1200/optimizer.pt +3 -0
checkpoint_dir/checkpoint-1200/rng_state.pth +3 -0
checkpoint_dir/checkpoint-1200/scheduler.pt +3 -0
checkpoint_dir/checkpoint-1200/special_tokens_map.json +24 -0
checkpoint_dir/checkpoint-1200/tokenizer.json +0 -0
checkpoint_dir/checkpoint-1200/tokenizer_config.json +129 -0
checkpoint_dir/checkpoint-1200/trainer_state.json +453 -0
checkpoint_dir/checkpoint-1200/training_args.bin +3 -0
trainingNotebook.ipynb +301 -0

.gitattributes CHANGED

@@ -1,3 +1,14 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
@@ -33,10 +44,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
-# Auto detect text files and perform LF to CRLF conversion for Windows
-* text=auto
-# Handle specific file types
 *.ipynb text eol=lf
 *.json text eol=lf
 *.py text eol=lf

+# Treat all text files as text and auto-detect line endings
+* text=auto
+# Specific handling for different file types
+*.md text eol=lf
+*.json text eol=lf
+*.bin binary
+*.pt binary
+*.pth binary
+# Existing binary file types with LFS
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+# Specific file types to use LF line endings
 *.ipynb text eol=lf
 *.json text eol=lf
 *.py text eol=lf
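Git resolves attributes one attribute at a time. When several lines match a path, a later line overrides an earlier one for each attribute it mentions. `binary` is a built-in macro that expands to `-diff -merge -text`.

The sketch below is a simplified model of that rule. `resolve_attributes` and `parse_token` are hypothetical helpers. The model has these limitations:

* Patterns without a slash are matched against the basename with `fnmatch`.
* `**` is not handled.
* The macro does not also set a `binary` attribute.

It assumes the line order shown in the rendered diff above. Under that order, the later `*.bin filter=lfs ...` line supersedes `*.bin binary` for `diff` and `merge`. For authoritative answers on a real checkout, use `git check-attr -a <path>`.

```python
import posixpath
from fnmatch import fnmatchcase

# Built-in gitattributes macro: "binary" expands to "-diff -merge -text".
MACROS = {"binary": ["-diff", "-merge", "-text"]}

def parse_token(token):
    """Map one attribute token to (name, value) using gitattributes syntax."""
    if token.startswith("-"):
        return token[1:], False       # attribute unset
    if token.startswith("!"):
        return token[1:], None        # attribute unspecified
    if "=" in token:
        name, value = token.split("=", 1)
        return name, value            # attribute set to a value
    return token, True                # attribute set

def resolve_attributes(path, rules):
    """Simplified resolver: later matching lines override earlier ones, per attribute."""
    basename = posixpath.basename(path)
    attrs = {}
    for raw in rules:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *tokens = line.split()
        # Patterns without a slash match the basename (a simplification of git's rules).
        target = path if "/" in pattern else basename
        if not fnmatchcase(target, pattern):
            continue
        for token in tokens:
            for expanded in MACROS.get(token, [token]):
                name, value = parse_token(expanded)
                if value is None:
                    attrs.pop(name, None)
                else:
                    attrs[name] = value
    return attrs

# A subset of the rules, in the order shown in the rendered diff.
RULES = """
* text=auto
*.md text eol=lf
*.json text eol=lf
*.bin binary
*.pt binary
*.pth binary
*.bin filter=lfs diff=lfs merge=lfs -text
*.json text eol=lf
""".splitlines()

print(resolve_attributes("checkpoint_dir/checkpoint-1200/training_args.bin", RULES))
print(resolve_attributes("checkpoint_dir/checkpoint-1200/adapter_config.json", RULES))
```

The repeated `*.json text eol=lf` line is harmless: it sets the same values again.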

checkpoint_dir/checkpoint-1200/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: microsoft/Phi-3-mini-4k-instruct
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint_dir/checkpoint-1200/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "microsoft/Phi-3-mini-4k-instruct",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "qkv_proj",
+    "gate_up_proj",
+    "o_proj",
+    "down_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
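This config uses `r: 16` and `lora_alpha: 32`, and `use_rslora` is false. Standard LoRA therefore scales the low-rank update by `lora_alpha / r = 2`. Each adapted `Linear(in, out)` gains `r * (in + out)` trainable parameters: matrix A is `r x in` and matrix B is `out x r`.

The sketch below counts those parameters. It takes the target-module shapes from the `Phi3ForCausalLM` printout in `trainingNotebook.ipynb`, which shows 32 decoder layers. The constant names are illustrative. The result is a back-of-the-envelope check, not output from PEFT.

```python
# LoRA hyperparameters from adapter_config.json
R = 16
LORA_ALPHA = 32
NUM_LAYERS = 32  # "(0-31): 32 x Phi3DecoderLayer" in the notebook's model printout

# (in_features, out_features) of each target module per decoder layer,
# copied from the Phi3ForCausalLM printout in trainingNotebook.ipynb.
TARGET_SHAPES = {
    "qkv_proj": (3072, 9216),
    "o_proj": (3072, 3072),
    "gate_up_proj": (3072, 16384),
    "down_proj": (8192, 3072),
}

def lora_params(in_features, out_features, r):
    """A is (r x in_features) and B is (out_features x r)."""
    return r * in_features + out_features * r

scaling = LORA_ALPHA / R  # standard LoRA; rsLoRA would use LORA_ALPHA / sqrt(R)
per_layer = sum(lora_params(i, o, R) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS

print(f"scaling               = {scaling}")       # 2.0
print(f"LoRA params per layer = {per_layer:,}")   # 786,432
print(f"LoRA params total     = {total:,}")       # 25,165,824
print(f"16-bit weight bytes   = {total * 2:,}")   # 50,331,648
```

The ~50.33 MB figure is close to the 50,366,024 bytes recorded in the `adapter_model.safetensors` LFS pointer. That is consistent with the adapter being stored in a 16-bit dtype. The remaining ~34 KB is plausibly the safetensors header, but that split is an inference.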

checkpoint_dir/checkpoint-1200/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:509e03e583766096cb2af316df026b5160028032ac428d01fc9ab3221c1165e7
+size 50366024

checkpoint_dir/checkpoint-1200/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9643de235a33a2501cf495a506083c7277f374bcfbe93c717fa09b2e9edcbedd
+size 100878458

checkpoint_dir/checkpoint-1200/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fb6f83e7b0b934716a93a31924ee6f0df61bfca245f9915406b8453534851a41
+size 14180

checkpoint_dir/checkpoint-1200/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5683ab95794a7a05318f91f170307760f847ea5b44bfa96e35dd99041b516c87
+size 1064
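Each binary above is committed as a Git LFS pointer rather than as the file itself. A pointer has three lines: a `version` line, an `oid sha256:<hex>` line, and a `size` in bytes.

`parse_lfs_pointer` below is a hypothetical helper that handles only those three keys. It makes it easy to compare the recorded sizes with expectations. The expectations rest on three assumptions:

* The adapter has 25,165,824 LoRA parameters, as implied by `adapter_config.json` (r=16 on `qkv_proj`, `o_proj`, `gate_up_proj` and `down_proj` across 32 layers).
* Storage is 16-bit.
* The optimizer is Adam-style and keeps two moment buffers per parameter in the same dtype.

A match is a consistency check, not proof of how the files were written.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file (version / oid / size lines) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "oid_algo": algo, "oid": digest, "size": int(fields["size"])}

ADAPTER_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:509e03e583766096cb2af316df026b5160028032ac428d01fc9ab3221c1165e7
size 50366024
"""
OPTIMIZER_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:9643de235a33a2501cf495a506083c7277f374bcfbe93c717fa09b2e9edcbedd
size 100878458
"""

# 32 layers * r=16 * sum of (in + out) over qkv_proj, o_proj, gate_up_proj, down_proj
LORA_PARAMS = 32 * 16 * (12288 + 6144 + 19456 + 11264)  # 25,165,824
BYTES_PER_PARAM = 2  # assumption: fp16/bf16 storage

adapter = parse_lfs_pointer(ADAPTER_POINTER)
optimizer = parse_lfs_pointer(OPTIMIZER_POINTER)

expected_adapter = LORA_PARAMS * BYTES_PER_PARAM        # weights only
expected_optimizer = 2 * LORA_PARAMS * BYTES_PER_PARAM  # assumed exp_avg + exp_avg_sq

print(adapter["size"] - expected_adapter)      # 34376 leftover bytes (headers/metadata)
print(optimizer["size"] - expected_optimizer)  # 215162 leftover bytes (pickle overhead, step counters)
```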

checkpoint_dir/checkpoint-1200/special_tokens_map.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<unk>",
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint_dir/checkpoint-1200/tokenizer.json ADDED

The diff for this file is too large to render.

checkpoint_dir/checkpoint-1200/tokenizer_config.json ADDED

	@@ -0,0 +1,129 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": false
+    },
+    "32000": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32001": {
+      "content": "<|assistant|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": true
+    },
+    "32002": {
+      "content": "<|placeholder1|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": true
+    },
+    "32003": {
+      "content": "<|placeholder2|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": true
+    },
+    "32004": {
+      "content": "<|placeholder3|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": true
+    },
+    "32005": {
+      "content": "<|placeholder4|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": true
+    },
+    "32006": {
+      "content": "<|system|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": true
+    },
+    "32007": {
+      "content": "<|end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": true
+    },
+    "32008": {
+      "content": "<|placeholder5|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": true
+    },
+    "32009": {
+      "content": "<|placeholder6|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": true
+    },
+    "32010": {
+      "content": "<|user|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "chat_template": "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|user|>' + '\n' + message['content'] + '<|end|>' + '\n' + '<|assistant|>' + '\n'}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|end|>' + '\n'}}{% endif %}{% endfor %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|endoftext|>",
+  "model_max_length": 2048,
+  "pad_token": "<unk>",
+  "padding_side": "right",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
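The `chat_template` above only emits text for `user` and `assistant` messages. Any other role, including `system`, matches neither branch and renders nothing. As a result, the empty system message that the notebook's `apply_chat_template` prepends never appears in the training text.

The sketch below is a plain-Python transcription of that Jinja template. `render_phi3_template` is a hypothetical name, and this is not the `transformers` implementation. It only makes the behaviour visible without Jinja.

```python
def render_phi3_template(messages, bos_token="<s>"):
    """Plain-Python transcription of the chat_template in tokenizer_config.json."""
    out = [bos_token]
    for message in messages:
        if message["role"] == "user":
            out.append("<|user|>" + "\n" + message["content"] + "<|end|>" + "\n" + "<|assistant|>" + "\n")
        elif message["role"] == "assistant":
            out.append(message["content"] + "<|end|>" + "\n")
        # Any other role (e.g. "system") matches neither branch and renders nothing.
    return "".join(out)

conversation = [
    {"role": "system", "content": ""},  # what the notebook's apply_chat_template prepends
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
print(repr(render_phi3_template(conversation)))
# '<s><|user|>\nHi<|end|>\n<|assistant|>\nHello!<|end|>\n'
```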

checkpoint_dir/checkpoint-1200/trainer_state.json ADDED

	@@ -0,0 +1,453 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.01710376282782212,
+  "eval_steps": 500,
+  "global_step": 1200,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.00028506271379703536,
+      "grad_norm": NaN,
+      "learning_rate": 7.126567844925884e-09,
+      "loss": 0.7555,
+      "step": 20
+    },
+    {
+      "epoch": 0.0005701254275940707,
+      "grad_norm": NaN,
+      "learning_rate": 1.4253135689851768e-08,
+      "loss": 0.0,
+      "step": 40
+    },
+    {
+      "epoch": 0.0008551881413911061,
+      "grad_norm": NaN,
+      "learning_rate": 2.1379703534777654e-08,
+      "loss": 0.0,
+      "step": 60
+    },
+    {
+      "epoch": 0.0011402508551881414,
+      "grad_norm": NaN,
+      "learning_rate": 2.8506271379703537e-08,
+      "loss": 0.0,
+      "step": 80
+    },
+    {
+      "epoch": 0.0014253135689851768,
+      "grad_norm": NaN,
+      "learning_rate": 3.563283922462942e-08,
+      "loss": 0.0,
+      "step": 100
+    },
+    {
+      "epoch": 0.0017103762827822121,
+      "grad_norm": NaN,
+      "learning_rate": 4.275940706955531e-08,
+      "loss": 0.0,
+      "step": 120
+    },
+    {
+      "epoch": 0.0019954389965792475,
+      "grad_norm": NaN,
+      "learning_rate": 4.988597491448119e-08,
+      "loss": 0.0,
+      "step": 140
+    },
+    {
+      "epoch": 0.002280501710376283,
+      "grad_norm": NaN,
+      "learning_rate": 5.701254275940707e-08,
+      "loss": 0.0,
+      "step": 160
+    },
+    {
+      "epoch": 0.002565564424173318,
+      "grad_norm": NaN,
+      "learning_rate": 6.413911060433296e-08,
+      "loss": 0.0,
+      "step": 180
+    },
+    {
+      "epoch": 0.0028506271379703536,
+      "grad_norm": NaN,
+      "learning_rate": 7.126567844925884e-08,
+      "loss": 0.0,
+      "step": 200
+    },
+    {
+      "epoch": 0.003135689851767389,
+      "grad_norm": NaN,
+      "learning_rate": 7.839224629418473e-08,
+      "loss": 0.0,
+      "step": 220
+    },
+    {
+      "epoch": 0.0034207525655644243,
+      "grad_norm": NaN,
+      "learning_rate": 8.551881413911062e-08,
+      "loss": 0.0,
+      "step": 240
+    },
+    {
+      "epoch": 0.0037058152793614596,
+      "grad_norm": NaN,
+      "learning_rate": 9.264538198403649e-08,
+      "loss": 0.0,
+      "step": 260
+    },
+    {
+      "epoch": 0.003990877993158495,
+      "grad_norm": NaN,
+      "learning_rate": 9.977194982896237e-08,
+      "loss": 0.0,
+      "step": 280
+    },
+    {
+      "epoch": 0.00427594070695553,
+      "grad_norm": NaN,
+      "learning_rate": 1.0689851767388827e-07,
+      "loss": 0.0,
+      "step": 300
+    },
+    {
+      "epoch": 0.004561003420752566,
+      "grad_norm": NaN,
+      "learning_rate": 1.1402508551881415e-07,
+      "loss": 0.0,
+      "step": 320
+    },
+    {
+      "epoch": 0.004846066134549601,
+      "grad_norm": NaN,
+      "learning_rate": 1.2115165336374005e-07,
+      "loss": 0.0,
+      "step": 340
+    },
+    {
+      "epoch": 0.005131128848346636,
+      "grad_norm": NaN,
+      "learning_rate": 1.2827822120866592e-07,
+      "loss": 0.0,
+      "step": 360
+    },
+    {
+      "epoch": 0.005416191562143672,
+      "grad_norm": NaN,
+      "learning_rate": 1.3540478905359182e-07,
+      "loss": 0.0,
+      "step": 380
+    },
+    {
+      "epoch": 0.005701254275940707,
+      "grad_norm": NaN,
+      "learning_rate": 1.425313568985177e-07,
+      "loss": 0.0,
+      "step": 400
+    },
+    {
+      "epoch": 0.0059863169897377425,
+      "grad_norm": NaN,
+      "learning_rate": 1.4965792474344356e-07,
+      "loss": 0.0,
+      "step": 420
+    },
+    {
+      "epoch": 0.006271379703534778,
+      "grad_norm": NaN,
+      "learning_rate": 1.5678449258836946e-07,
+      "loss": 0.0,
+      "step": 440
+    },
+    {
+      "epoch": 0.006556442417331813,
+      "grad_norm": NaN,
+      "learning_rate": 1.6391106043329536e-07,
+      "loss": 0.0,
+      "step": 460
+    },
+    {
+      "epoch": 0.0068415051311288486,
+      "grad_norm": NaN,
+      "learning_rate": 1.7103762827822123e-07,
+      "loss": 0.0,
+      "step": 480
+    },
+    {
+      "epoch": 0.007126567844925884,
+      "grad_norm": NaN,
+      "learning_rate": 1.781641961231471e-07,
+      "loss": 0.0,
+      "step": 500
+    },
+    {
+      "epoch": 0.007411630558722919,
+      "grad_norm": NaN,
+      "learning_rate": 1.8529076396807298e-07,
+      "loss": 0.0,
+      "step": 520
+    },
+    {
+      "epoch": 0.007696693272519955,
+      "grad_norm": NaN,
+      "learning_rate": 1.9241733181299888e-07,
+      "loss": 0.0,
+      "step": 540
+    },
+    {
+      "epoch": 0.00798175598631699,
+      "grad_norm": NaN,
+      "learning_rate": 1.9954389965792475e-07,
+      "loss": 0.0,
+      "step": 560
+    },
+    {
+      "epoch": 0.008266818700114024,
+      "grad_norm": NaN,
+      "learning_rate": 2.0667046750285062e-07,
+      "loss": 0.0,
+      "step": 580
+    },
+    {
+      "epoch": 0.00855188141391106,
+      "grad_norm": NaN,
+      "learning_rate": 2.1379703534777655e-07,
+      "loss": 0.0,
+      "step": 600
+    },
+    {
+      "epoch": 0.008836944127708095,
+      "grad_norm": NaN,
+      "learning_rate": 2.2092360319270242e-07,
+      "loss": 0.0,
+      "step": 620
+    },
+    {
+      "epoch": 0.009122006841505131,
+      "grad_norm": NaN,
+      "learning_rate": 2.280501710376283e-07,
+      "loss": 0.0,
+      "step": 640
+    },
+    {
+      "epoch": 0.009407069555302166,
+      "grad_norm": NaN,
+      "learning_rate": 2.351767388825542e-07,
+      "loss": 0.0,
+      "step": 660
+    },
+    {
+      "epoch": 0.009692132269099202,
+      "grad_norm": NaN,
+      "learning_rate": 2.423033067274801e-07,
+      "loss": 0.0,
+      "step": 680
+    },
+    {
+      "epoch": 0.009977194982896237,
+      "grad_norm": NaN,
+      "learning_rate": 2.4942987457240596e-07,
+      "loss": 0.0,
+      "step": 700
+    },
+    {
+      "epoch": 0.010262257696693273,
+      "grad_norm": NaN,
+      "learning_rate": 2.5655644241733184e-07,
+      "loss": 0.0,
+      "step": 720
+    },
+    {
+      "epoch": 0.010547320410490307,
+      "grad_norm": NaN,
+      "learning_rate": 2.636830102622577e-07,
+      "loss": 0.0,
+      "step": 740
+    },
+    {
+      "epoch": 0.010832383124287344,
+      "grad_norm": NaN,
+      "learning_rate": 2.7080957810718363e-07,
+      "loss": 0.0,
+      "step": 760
+    },
+    {
+      "epoch": 0.011117445838084378,
+      "grad_norm": NaN,
+      "learning_rate": 2.779361459521095e-07,
+      "loss": 0.0,
+      "step": 780
+    },
+    {
+      "epoch": 0.011402508551881414,
+      "grad_norm": NaN,
+      "learning_rate": 2.850627137970354e-07,
+      "loss": 0.0,
+      "step": 800
+    },
+    {
+      "epoch": 0.011687571265678449,
+      "grad_norm": NaN,
+      "learning_rate": 2.9218928164196125e-07,
+      "loss": 0.0,
+      "step": 820
+    },
+    {
+      "epoch": 0.011972633979475485,
+      "grad_norm": NaN,
+      "learning_rate": 2.993158494868871e-07,
+      "loss": 0.0,
+      "step": 840
+    },
+    {
+      "epoch": 0.01225769669327252,
+      "grad_norm": NaN,
+      "learning_rate": 3.06442417331813e-07,
+      "loss": 0.0,
+      "step": 860
+    },
+    {
+      "epoch": 0.012542759407069556,
+      "grad_norm": NaN,
+      "learning_rate": 3.135689851767389e-07,
+      "loss": 0.0,
+      "step": 880
+    },
+    {
+      "epoch": 0.01282782212086659,
+      "grad_norm": NaN,
+      "learning_rate": 3.206955530216648e-07,
+      "loss": 0.0,
+      "step": 900
+    },
+    {
+      "epoch": 0.013112884834663626,
+      "grad_norm": NaN,
+      "learning_rate": 3.278221208665907e-07,
+      "loss": 0.0,
+      "step": 920
+    },
+    {
+      "epoch": 0.013397947548460661,
+      "grad_norm": NaN,
+      "learning_rate": 3.349486887115166e-07,
+      "loss": 0.0,
+      "step": 940
+    },
+    {
+      "epoch": 0.013683010262257697,
+      "grad_norm": NaN,
+      "learning_rate": 3.4207525655644247e-07,
+      "loss": 0.0,
+      "step": 960
+    },
+    {
+      "epoch": 0.013968072976054732,
+      "grad_norm": NaN,
+      "learning_rate": 3.4920182440136834e-07,
+      "loss": 0.0,
+      "step": 980
+    },
+    {
+      "epoch": 0.014253135689851768,
+      "grad_norm": NaN,
+      "learning_rate": 3.563283922462942e-07,
+      "loss": 0.0,
+      "step": 1000
+    },
+    {
+      "epoch": 0.014538198403648802,
+      "grad_norm": NaN,
+      "learning_rate": 3.634549600912201e-07,
+      "loss": 0.0,
+      "step": 1020
+    },
+    {
+      "epoch": 0.014823261117445839,
+      "grad_norm": NaN,
+      "learning_rate": 3.7058152793614596e-07,
+      "loss": 0.0,
+      "step": 1040
+    },
+    {
+      "epoch": 0.015108323831242873,
+      "grad_norm": NaN,
+      "learning_rate": 3.777080957810719e-07,
+      "loss": 0.0,
+      "step": 1060
+    },
+    {
+      "epoch": 0.01539338654503991,
+      "grad_norm": NaN,
+      "learning_rate": 3.8483466362599775e-07,
+      "loss": 0.0,
+      "step": 1080
+    },
+    {
+      "epoch": 0.015678449258836945,
+      "grad_norm": NaN,
+      "learning_rate": 3.919612314709236e-07,
+      "loss": 0.0,
+      "step": 1100
+    },
+    {
+      "epoch": 0.01596351197263398,
+      "grad_norm": NaN,
+      "learning_rate": 3.990877993158495e-07,
+      "loss": 0.0,
+      "step": 1120
+    },
+    {
+      "epoch": 0.016248574686431014,
+      "grad_norm": NaN,
+      "learning_rate": 4.0621436716077537e-07,
+      "loss": 0.0,
+      "step": 1140
+    },
+    {
+      "epoch": 0.01653363740022805,
+      "grad_norm": NaN,
+      "learning_rate": 4.1334093500570124e-07,
+      "loss": 0.0,
+      "step": 1160
+    },
+    {
+      "epoch": 0.016818700114025087,
+      "grad_norm": NaN,
+      "learning_rate": 4.204675028506271e-07,
+      "loss": 0.0,
+      "step": 1180
+    },
+    {
+      "epoch": 0.01710376282782212,
+      "grad_norm": NaN,
+      "learning_rate": 4.275940706955531e-07,
+      "loss": 0.0,
+      "step": 1200
+    }
+  ],
+  "logging_steps": 20,
+  "max_steps": 70160,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 100,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.105254905020416e+17,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
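Every `log_history` entry in this checkpoint reports `grad_norm: NaN`, and `loss` is exactly `0.0` from step 40 onward. Together these strongly suggest the run was not learning by the time this checkpoint was written. The log itself does not say why.

A small check like the sketch below can flag this early instead of after a checkpoint is saved. `find_bad_steps` is a hypothetical helper. It relies on Python's `json` module, which accepts the non-standard `NaN` token by default.

```python
import json
import math

def find_bad_steps(trainer_state):
    """Return (step, reason) pairs for entries with a non-finite grad norm or an exactly-zero loss."""
    problems = []
    for entry in trainer_state.get("log_history", []):
        step = entry.get("step")
        grad_norm = entry.get("grad_norm")
        loss = entry.get("loss")
        if grad_norm is not None and not math.isfinite(grad_norm):
            problems.append((step, f"grad_norm={grad_norm}"))
        if loss == 0.0:
            problems.append((step, "loss=0.0"))
    return problems

# Excerpt of checkpoint_dir/checkpoint-1200/trainer_state.json; json.loads parses NaN as float("nan").
excerpt = """{"log_history": [
  {"grad_norm": NaN, "loss": 0.7555, "step": 20},
  {"grad_norm": NaN, "loss": 0.0, "step": 40}
]}"""

for step, reason in find_bad_steps(json.loads(excerpt)):
    print(step, reason)
```

For the full file, `find_bad_steps(json.load(open("checkpoint_dir/checkpoint-1200/trainer_state.json")))` works the same way.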

checkpoint_dir/checkpoint-1200/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fa52f43a2566203d13a960b76a6ee31b373d6f336edbcfc792f40231efec3b62
+size 5112
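The scalar fields in `trainer_state.json` appear to follow directly from the notebook's training configuration. The sketch below reproduces the logged `learning_rate`, `epoch` and `total_flos` at step 1200. It assumes these `transformers` conventions:

* The warmup length is `ceil(warmup_ratio * max_steps)`.
* During warmup the learning rate ramps linearly as `learning_rate * step / warmup_steps`; the cosine decay only starts after warmup.
* The FLOP estimate is `6 * tokens * parameters`, excluding `nn.Embedding` weights (`lm_head` is a `Linear` and is counted).

It also assumes every packed sequence is a full 2,048 tokens and the batch size is 2. The parameter count comes from the notebook's module printout plus the LoRA adapters. `warmup_lr` and the constant names are illustrative.

```python
import math

# From the notebook's training_config, the SFTTrainer call, and trainer_state.json.
BASE_LR = 5.0e-06
WARMUP_RATIO = 0.2
MAX_STEPS = 70160
BATCH_SIZE = 2
SEQ_LEN = 2048  # max_seq_length passed to SFTTrainer with packing=True
STEP = 1200

def warmup_lr(step, base_lr, warmup_steps):
    """Learning rate during linear warmup (valid while step < warmup_steps)."""
    return base_lr * (step / max(1, warmup_steps))

warmup_steps = math.ceil(WARMUP_RATIO * MAX_STEPS)
lr = warmup_lr(STEP, BASE_LR, warmup_steps)
epoch = STEP / MAX_STEPS

# Parameter count from the Phi3ForCausalLM printout (32 decoder layers).
linear_per_layer = 3072 * 9216 + 3072 * 3072 + 3072 * 16384 + 8192 * 3072
norms_per_layer = 2 * 3072          # input_layernorm + post_attention_layernorm
embed = 32064 * 3072                # embed_tokens (excluded from the FLOP estimate)
lm_head = 32064 * 3072
final_norm = 3072
lora = 32 * 16 * (12288 + 6144 + 19456 + 11264)  # LoRA A/B matrices, r=16

base_total = 32 * (linear_per_layer + norms_per_layer) + embed + final_norm + lm_head
non_embedding = base_total - embed + lora

tokens = STEP * BATCH_SIZE * SEQ_LEN
total_flos = 6 * tokens * non_embedding

print(warmup_steps)        # 14032
print(lr)                  # compare with 4.275940706955531e-07 in log_history
print(epoch)               # compare with "epoch" at step 1200
print(base_total)          # 3821079552
print(float(total_flos))   # 1.105254905020416e+17
```

At step 1200 the schedule is still in warmup (1200 < 14032), so only the linear branch of the cosine schedule is exercised here.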

trainingNotebook.ipynb ADDED

	@@ -0,0 +1,301 @@

+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sys\n",
+    "# import logging\n",
+    "\n",
+    "import datasets\n",
+    "from datasets import load_dataset\n",
+    "from peft import LoraConfig\n",
+    "import torch\n",
+    "import transformers\n",
+    "from trl import SFTTrainer\n",
+    "from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "training_config = {\n",
+    "    \"bf16\": False,\n",
+    "    \"do_eval\": False,\n",
+    "    \"learning_rate\": 5.0e-06,\n",
+    "    \"log_level\": \"info\",\n",
+    "    \"logging_steps\": 20,\n",
+    "    \"logging_strategy\": \"steps\",\n",
+    "    \"lr_scheduler_type\": \"cosine\",\n",
+    "    \"num_train_epochs\": 1,\n",
+    "    \"max_steps\": -1,\n",
+    "    \"output_dir\": \"./checkpoint_dir\",\n",
+    "    \"overwrite_output_dir\": True,\n",
+    "    \"per_device_eval_batch_size\": 2,  # Reduce batch size to lower memory usage\n",
+    "    \"per_device_train_batch_size\": 2,  # Reduce batch size to lower memory usage\n",
+    "    \"remove_unused_columns\": True,\n",
+    "    \"save_steps\": 100,\n",
+    "    \"save_total_limit\": 1,\n",
+    "    \"seed\": 0,\n",
+    "    \"gradient_checkpointing\": True,\n",
+    "    \"gradient_checkpointing_kwargs\":{\"use_reentrant\": False},\n",
+    "    \"gradient_accumulation_steps\": 1,\n",
+    "    \"warmup_ratio\": 0.2,\n",
+    "}\n",
+    "\n",
+    "peft_config = {\n",
+    "    \"r\": 16,\n",
+    "    \"lora_alpha\": 32,\n",
+    "    \"lora_dropout\": 0.05,\n",
+    "    \"bias\": \"none\",\n",
+    "    \"task_type\": \"CAUSAL_LM\",\n",
+    "    \"target_modules\": \"all-linear\",\n",
+    "    \"modules_to_save\": None,\n",
+    "}\n",
+    "train_conf = TrainingArguments(**training_config)\n",
+    "peft_conf = LoraConfig(**peft_config)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.\n",
+      "Loading checkpoint shards: 100%|██████████| 2/2 [01:34<00:00, 47.42s/it]\n",
+      "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "Phi3ForCausalLM(\n",
+       "  (model): Phi3Model(\n",
+       "    (embed_tokens): Embedding(32064, 3072, padding_idx=32000)\n",
+       "    (embed_dropout): Dropout(p=0.0, inplace=False)\n",
+       "    (layers): ModuleList(\n",
+       "      (0-31): 32 x Phi3DecoderLayer(\n",
+       "        (self_attn): Phi3FlashAttention2(\n",
+       "          (o_proj): Linear(in_features=3072, out_features=3072, bias=False)\n",
+       "          (qkv_proj): Linear(in_features=3072, out_features=9216, bias=False)\n",
+       "          (rotary_emb): Phi3RotaryEmbedding()\n",
+       "        )\n",
+       "        (mlp): Phi3MLP(\n",
+       "          (gate_up_proj): Linear(in_features=3072, out_features=16384, bias=False)\n",
+       "          (down_proj): Linear(in_features=8192, out_features=3072, bias=False)\n",
+       "          (activation_fn): SiLU()\n",
+       "        )\n",
+       "        (input_layernorm): Phi3RMSNorm()\n",
+       "        (resid_attn_dropout): Dropout(p=0.0, inplace=False)\n",
+       "        (resid_mlp_dropout): Dropout(p=0.0, inplace=False)\n",
+       "        (post_attention_layernorm): Phi3RMSNorm()\n",
+       "      )\n",
+       "    )\n",
+       "    (norm): Phi3RMSNorm()\n",
+       "  )\n",
+       "  (lm_head): Linear(in_features=3072, out_features=32064, bias=False)\n",
+       ")"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "################\n",
+    "# Model Loading\n",
+    "################\n",
+    "checkpoint_path = \"microsoft/Phi-3-mini-4k-instruct\"\n",
+    "# checkpoint_path = \"microsoft/Phi-3-mini-128k-instruct\"\n",
+    "model_kwargs = dict(\n",
+    "    use_cache=False,\n",
+    "    trust_remote_code=True,\n",
+    "    attn_implementation=\"flash_attention_2\",  # loading the model with flash-attention support\n",
+    "    torch_dtype=torch.float16,  # Changed to float16\n",
+    "    device_map=None\n",
+    ")\n",
+    "model = AutoModelForCausalLM.from_pretrained(checkpoint_path, **model_kwargs)\n",
+    "tokenizer = AutoTokenizer.from_pretrained(checkpoint_path)\n",
+    "tokenizer.model_max_length = 2048\n",
+    "tokenizer.pad_token = tokenizer.unk_token  # use unk rather than eos token to prevent endless generation\n",
+    "tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids(tokenizer.pad_token)\n",
+    "tokenizer.padding_side = 'right'\n",
+    "\n",
+    "# Move the model to GPU\n",
+    "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
+    "model.to(device)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Applying chat template to train_sft (num_proc=10): 100%|██████████| 207865/207865 [00:05<00:00, 37564.48 examples/s] \n",
+      "Applying chat template to test_sft (num_proc=10): 100%|██████████| 23110/23110 [00:03<00:00, 7597.23 examples/s] \n"
+     ]
+    }
+   ],
+   "source": [
+    "##################\n",
+    "# Data Processing\n",
+    "##################\n",
+    "def apply_chat_template(example, tokenizer):\n",
+    "    messages = example[\"messages\"]\n",
+    "    # Add an empty system message if there is none\n",
+    "    if messages[0][\"role\"] != \"system\":\n",
+    "        messages.insert(0, {\"role\": \"system\", \"content\": \"\"})\n",
+    "    example[\"text\"] = tokenizer.apply_chat_template(\n",
+    "        messages, tokenize=False, add_generation_prompt=False)\n",
+    "    return example\n",
+    "\n",
+    "raw_dataset = load_dataset(\"HuggingFaceH4/ultrachat_200k\")\n",
+    "train_dataset = raw_dataset[\"train_sft\"]\n",
+    "test_dataset = raw_dataset[\"test_sft\"]\n",
+    "column_names = list(train_dataset.features)\n",
+    "\n",
+    "processed_train_dataset = train_dataset.map(\n",
+    "    apply_chat_template,\n",
+    "    fn_kwargs={\"tokenizer\": tokenizer},\n",
+    "    num_proc=10,\n",
+    "    remove_columns=column_names,\n",
+    "    desc=\"Applying chat template to train_sft\",\n",
+    ")\n",
+    "\n",
+    "processed_test_dataset = test_dataset.map(\n",
+    "    apply_chat_template,\n",
+    "    fn_kwargs={\"tokenizer\": tokenizer},\n",
+    "    num_proc=10,\n",
+    "    remove_columns=column_names,\n",
+    "    desc=\"Applying chat template to test_sft\",\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Generating train split: 875 examples [00:02, 313.15 examples/s]\n"
+     ]
+    },
+    {
+     "ename": "KeyboardInterrupt",
+     "evalue": "",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[1;31mKeyboardInterrupt\u001b[0m                         Traceback (most recent call last)",
+      "Cell \u001b[1;32mIn[7], line 4\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[38;5;66;03m###########\u001b[39;00m\n\u001b[0;32m      2\u001b[0m \u001b[38;5;66;03m# Training\u001b[39;00m\n\u001b[0;32m      3\u001b[0m \u001b[38;5;66;03m###########\u001b[39;00m\n\u001b[1;32m----> 4\u001b[0m trainer \u001b[38;5;241m=\u001b[39m \u001b[43mSFTTrainer\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m      5\u001b[0m \u001b[43m    \u001b[49m\u001b[43mmodel\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mmodel\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m      6\u001b[0m \u001b[43m    \u001b[49m\u001b[43margs\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mtrain_conf\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m      7\u001b[0m \u001b[43m    \u001b[49m\u001b[43mpeft_config\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mpeft_conf\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m      8\u001b[0m \u001b[43m    \u001b[49m\u001b[43mtrain_dataset\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mprocessed_train_dataset\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m      9\u001b[0m \u001b[43m    \u001b[49m\u001b[43meval_dataset\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mprocessed_test_dataset\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     10\u001b[0m \u001b[43m    \u001b[49m\u001b[43mmax_seq_length\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;241;43m2048\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[0;32m     11\u001b[0m \u001b[43m    \u001b[49m\u001b[43mdataset_text_field\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mtext\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[0;32m     12\u001b[0m \u001b[43m    \u001b[49m\u001b[43mtokenizer\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mtokenizer\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     13\u001b[0m \u001b[43m    \u001b[49m\u001b[43mpacking\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\n\u001b[0;32m     14\u001b[0m \u001b[43m)\u001b[49m\n\u001b[0;32m     15\u001b[0m train_result \u001b[38;5;241m=\u001b[39m trainer\u001b[38;5;241m.\u001b[39mtrain()\n\u001b[0;32m     16\u001b[0m metrics \u001b[38;5;241m=\u001b[39m train_result\u001b[38;5;241m.\u001b[39mmetrics\n",
+      "File \u001b[1;32me:\\Users\\frink\\Documents\\GitHub\\LLM Things\\Phi-3-training-Low-Ram\\venv\\lib\\site-packages\\trl\\trainer\\sft_trainer.py:283\u001b[0m, in \u001b[0;36mSFTTrainer.__init__\u001b[1;34m(self, model, args, data_collator, train_dataset, eval_dataset, tokenizer, model_init, compute_metrics, callbacks, optimizers, preprocess_logits_for_metrics, peft_config, dataset_text_field, packing, formatting_func, max_seq_length, infinite, num_of_sequences, chars_per_token, dataset_num_proc, dataset_batch_size, neftune_noise_alpha, model_init_kwargs, dataset_kwargs, eval_packing)\u001b[0m\n\u001b[0;32m    281\u001b[0m     dataset_kwargs \u001b[38;5;241m=\u001b[39m {}\n\u001b[0;32m    282\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m train_dataset \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m--> 283\u001b[0m     train_dataset \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_prepare_dataset(\n\u001b[0;32m    284\u001b[0m         train_dataset,\n\u001b[0;32m    285\u001b[0m         tokenizer,\n\u001b[0;32m    286\u001b[0m         packing,\n\u001b[0;32m    287\u001b[0m         dataset_text_field,\n\u001b[0;32m    288\u001b[0m         max_seq_length,\n\u001b[0;32m    289\u001b[0m         formatting_func,\n\u001b[0;32m    290\u001b[0m         num_of_sequences,\n\u001b[0;32m    291\u001b[0m         chars_per_token,\n\u001b[0;32m    292\u001b[0m         remove_unused_columns\u001b[38;5;241m=\u001b[39margs\u001b[38;5;241m.\u001b[39mremove_unused_columns \u001b[38;5;28;01mif\u001b[39;00m args \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;28;01mTrue\u001b[39;00m,\n\u001b[0;32m    293\u001b[0m         \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mdataset_kwargs,\n\u001b[0;32m    294\u001b[0m     )\n\u001b[0;32m    295\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m eval_dataset \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m    296\u001b[0m     _multiple \u001b[38;5;241m=\u001b[39m \u001b[38;5;28misinstance\u001b[39m(eval_dataset, \u001b[38;5;28mdict\u001b[39m)\n",
+      "File \u001b[1;32me:\\Users\\frink\\Documents\\GitHub\\LLM Things\\Phi-3-training-Low-Ram\\venv\\lib\\site-packages\\trl\\trainer\\sft_trainer.py:435\u001b[0m, in \u001b[0;36mSFTTrainer._prepare_dataset\u001b[1;34m(self, dataset, tokenizer, packing, dataset_text_field, max_seq_length, formatting_func, num_of_sequences, chars_per_token, remove_unused_columns, append_concat_token, add_special_tokens, skip_prepare_dataset)\u001b[0m\n\u001b[0;32m    424\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_prepare_non_packed_dataloader(\n\u001b[0;32m    425\u001b[0m         tokenizer,\n\u001b[0;32m    426\u001b[0m         dataset,\n\u001b[1;32m   (...)\u001b[0m\n\u001b[0;32m    431\u001b[0m         remove_unused_columns,\n\u001b[0;32m    432\u001b[0m     )\n\u001b[0;32m    434\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m--> 435\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_prepare_packed_dataloader\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m    436\u001b[0m \u001b[43m        \u001b[49m\u001b[43mtokenizer\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m    437\u001b[0m \u001b[43m        \u001b[49m\u001b[43mdataset\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m    438\u001b[0m \u001b[43m        \u001b[49m\u001b[43mdataset_text_field\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m    439\u001b[0m \u001b[43m        \u001b[49m\u001b[43mmax_seq_length\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m    440\u001b[0m \u001b[43m        \u001b[49m\u001b[43mnum_of_sequences\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m    441\u001b[0m \u001b[43m        \u001b[49m\u001b[43mchars_per_token\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m    442\u001b[0m \u001b[43m        \u001b[49m\u001b[43mformatting_func\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m    443\u001b[0m \u001b[43m        \u001b[49m\u001b[43mappend_concat_token\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m    444\u001b[0m \u001b[43m        \u001b[49m\u001b[43madd_special_tokens\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m    445\u001b[0m \u001b[43m    \u001b[49m\u001b[43m)\u001b[49m\n",
+      "File \u001b[1;32me:\\Users\\frink\\Documents\\GitHub\\LLM Things\\Phi-3-training-Low-Ram\\venv\\lib\\site-packages\\trl\\trainer\\sft_trainer.py:539\u001b[0m, in \u001b[0;36mSFTTrainer._prepare_packed_dataloader\u001b[1;34m(self, tokenizer, dataset, dataset_text_field, max_seq_length, num_of_sequences, chars_per_token, formatting_func, append_concat_token, add_special_tokens)\u001b[0m\n\u001b[0;32m    536\u001b[0m     \u001b[38;5;28;01myield from\u001b[39;00m constant_length_iterator\n\u001b[0;32m    538\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m--> 539\u001b[0m     packed_dataset \u001b[38;5;241m=\u001b[39m \u001b[43mDataset\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfrom_generator\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m    540\u001b[0m \u001b[43m        \u001b[49m\u001b[43mdata_generator\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mgen_kwargs\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m{\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mconstant_length_iterator\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m \u001b[49m\u001b[43mconstant_length_iterator\u001b[49m\u001b[43m}\u001b[49m\n\u001b[0;32m    541\u001b[0m \u001b[43m    \u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m    542\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m (DatasetGenerationError, SchemaInferenceError) \u001b[38;5;28;01mas\u001b[39;00m exc:\n\u001b[0;32m    543\u001b[0m     \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[0;32m    544\u001b[0m         \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mError occurred while packing the dataset. \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m    545\u001b[0m         \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mMake sure that your dataset has enough samples to at least yield one packed sequence.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m    546\u001b[0m     ) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mexc\u001b[39;00m\n",
+      "File \u001b[1;32me:\\Users\\frink\\Documents\\GitHub\\LLM Things\\Phi-3-training-Low-Ram\\venv\\lib\\site-packages\\datasets\\arrow_dataset.py:1125\u001b[0m, in \u001b[0;36mDataset.from_generator\u001b[1;34m(generator, features, cache_dir, keep_in_memory, gen_kwargs, num_proc, **kwargs)\u001b[0m\n\u001b[0;32m   1068\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"Create a Dataset from a generator.\u001b[39;00m\n\u001b[0;32m   1069\u001b[0m \n\u001b[0;32m   1070\u001b[0m \u001b[38;5;124;03mArgs:\u001b[39;00m\n\u001b[1;32m   (...)\u001b[0m\n\u001b[0;32m   1113\u001b[0m \u001b[38;5;124;03m```\u001b[39;00m\n\u001b[0;32m   1114\u001b[0m \u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[0;32m   1115\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mio\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mgenerator\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m GeneratorDatasetInputStream\n\u001b[0;32m   1117\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mGeneratorDatasetInputStream\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m   1118\u001b[0m \u001b[43m    \u001b[49m\u001b[43mgenerator\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mgenerator\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m   1119\u001b[0m \u001b[43m    \u001b[49m\u001b[43mfeatures\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mfeatures\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m   1120\u001b[0m \u001b[43m    \u001b[49m\u001b[43mcache_dir\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcache_dir\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m   1121\u001b[0m \u001b[43m    \u001b[49m\u001b[43mkeep_in_memory\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mkeep_in_memory\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m   1122\u001b[0m \u001b[43m    \u001b[49m\u001b[43mgen_kwargs\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mgen_kwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m   1123\u001b[0m \u001b[43m    \u001b[49m\u001b[43mnum_proc\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mnum_proc\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m   1124\u001b[0m \u001b[43m    \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m-> 1125\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mread\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n",
+      "File \u001b[1;32me:\\Users\\frink\\Documents\\GitHub\\LLM Things\\Phi-3-training-Low-Ram\\venv\\lib\\site-packages\\datasets\\io\\generator.py:47\u001b[0m, in \u001b[0;36mGeneratorDatasetInputStream.read\u001b[1;34m(self)\u001b[0m\n\u001b[0;32m     44\u001b[0m     verification_mode \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[0;32m     45\u001b[0m     base_path \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m---> 47\u001b[0m     \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mbuilder\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdownload_and_prepare\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m     48\u001b[0m \u001b[43m        \u001b[49m\u001b[43mdownload_config\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdownload_config\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     49\u001b[0m \u001b[43m        \u001b[49m\u001b[43mdownload_mode\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdownload_mode\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     50\u001b[0m \u001b[43m        \u001b[49m\u001b[43mverification_mode\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mverification_mode\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     51\u001b[0m \u001b[43m        \u001b[49m\u001b[43mbase_path\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mbase_path\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     52\u001b[0m \u001b[43m        \u001b[49m\u001b[43mnum_proc\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mnum_proc\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     53\u001b[0m \u001b[43m    \u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m     54\u001b[0m     dataset \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mbuilder\u001b[38;5;241m.\u001b[39mas_dataset(\n\u001b[0;32m     55\u001b[0m         split\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mtrain\u001b[39m\u001b[38;5;124m\"\u001b[39m, verification_mode\u001b[38;5;241m=\u001b[39mverification_mode, in_memory\u001b[38;5;241m=\u001b[39m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mkeep_in_memory\n\u001b[0;32m     56\u001b[0m     )\n\u001b[0;32m     57\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m dataset\n",
+      "File \u001b[1;32me:\\Users\\frink\\Documents\\GitHub\\LLM Things\\Phi-3-training-Low-Ram\\venv\\lib\\site-packages\\datasets\\builder.py:1027\u001b[0m, in \u001b[0;36mDatasetBuilder.download_and_prepare\u001b[1;34m(self, output_dir, download_config, download_mode, verification_mode, ignore_verifications, try_from_hf_gcs, dl_manager, base_path, use_auth_token, file_format, max_shard_size, num_proc, storage_options, **download_and_prepare_kwargs)\u001b[0m\n\u001b[0;32m   1025\u001b[0m     \u001b[38;5;28;01mif\u001b[39;00m num_proc \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m   1026\u001b[0m         prepare_split_kwargs[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mnum_proc\u001b[39m\u001b[38;5;124m\"\u001b[39m] \u001b[38;5;241m=\u001b[39m num_proc\n\u001b[1;32m-> 1027\u001b[0m     \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_download_and_prepare(\n\u001b[0;32m   1028\u001b[0m         dl_manager\u001b[38;5;241m=\u001b[39mdl_manager,\n\u001b[0;32m   1029\u001b[0m         verification_mode\u001b[38;5;241m=\u001b[39mverification_mode,\n\u001b[0;32m   1030\u001b[0m         \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mprepare_split_kwargs,\n\u001b[0;32m   1031\u001b[0m         \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mdownload_and_prepare_kwargs,\n\u001b[0;32m   1032\u001b[0m     )\n\u001b[0;32m   1033\u001b[0m \u001b[38;5;66;03m# Sync info\u001b[39;00m\n\u001b[0;32m   1034\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39minfo\u001b[38;5;241m.\u001b[39mdataset_size \u001b[38;5;241m=\u001b[39m \u001b[38;5;28msum\u001b[39m(split\u001b[38;5;241m.\u001b[39mnum_bytes \u001b[38;5;28;01mfor\u001b[39;00m split \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39minfo\u001b[38;5;241m.\u001b[39msplits\u001b[38;5;241m.\u001b[39mvalues())\n",
+      "File \u001b[1;32me:\\Users\\frink\\Documents\\GitHub\\LLM Things\\Phi-3-training-Low-Ram\\venv\\lib\\site-packages\\datasets\\builder.py:1789\u001b[0m, in \u001b[0;36mGeneratorBasedBuilder._download_and_prepare\u001b[1;34m(self, dl_manager, verification_mode, **prepare_splits_kwargs)\u001b[0m\n\u001b[0;32m   1788\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_download_and_prepare\u001b[39m(\u001b[38;5;28mself\u001b[39m, dl_manager, verification_mode, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mprepare_splits_kwargs):\n\u001b[1;32m-> 1789\u001b[0m     \u001b[38;5;28msuper\u001b[39m()\u001b[38;5;241m.\u001b[39m_download_and_prepare(\n\u001b[0;32m   1790\u001b[0m         dl_manager,\n\u001b[0;32m   1791\u001b[0m         verification_mode,\n\u001b[0;32m   1792\u001b[0m         check_duplicate_keys\u001b[38;5;241m=\u001b[39mverification_mode \u001b[38;5;241m==\u001b[39m VerificationMode\u001b[38;5;241m.\u001b[39mBASIC_CHECKS\n\u001b[0;32m   1793\u001b[0m         \u001b[38;5;129;01mor\u001b[39;00m verification_mode \u001b[38;5;241m==\u001b[39m VerificationMode\u001b[38;5;241m.\u001b[39mALL_CHECKS,\n\u001b[0;32m   1794\u001b[0m         \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mprepare_splits_kwargs,\n\u001b[0;32m   1795\u001b[0m     )\n",
+      "File \u001b[1;32me:\\Users\\frink\\Documents\\GitHub\\LLM Things\\Phi-3-training-Low-Ram\\venv\\lib\\site-packages\\datasets\\builder.py:1122\u001b[0m, in \u001b[0;36mDatasetBuilder._download_and_prepare\u001b[1;34m(self, dl_manager, verification_mode, **prepare_split_kwargs)\u001b[0m\n\u001b[0;32m   1118\u001b[0m split_dict\u001b[38;5;241m.\u001b[39madd(split_generator\u001b[38;5;241m.\u001b[39msplit_info)\n\u001b[0;32m   1120\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m   1121\u001b[0m     \u001b[38;5;66;03m# Prepare split will record examples associated to the split\u001b[39;00m\n\u001b[1;32m-> 1122\u001b[0m     \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_prepare_split(split_generator, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mprepare_split_kwargs)\n\u001b[0;32m   1123\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mOSError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[0;32m   1124\u001b[0m     \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mOSError\u001b[39;00m(\n\u001b[0;32m   1125\u001b[0m         \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mCannot find data file. \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m   1126\u001b[0m         \u001b[38;5;241m+\u001b[39m (\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmanual_download_instructions \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m   1127\u001b[0m         \u001b[38;5;241m+\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;124mOriginal error:\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m   1128\u001b[0m         \u001b[38;5;241m+\u001b[39m \u001b[38;5;28mstr\u001b[39m(e)\n\u001b[0;32m   1129\u001b[0m     ) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n",
+      "File \u001b[1;32me:\\Users\\frink\\Documents\\GitHub\\LLM Things\\Phi-3-training-Low-Ram\\venv\\lib\\site-packages\\datasets\\builder.py:1627\u001b[0m, in \u001b[0;36mGeneratorBasedBuilder._prepare_split\u001b[1;34m(self, split_generator, check_duplicate_keys, file_format, num_proc, max_shard_size)\u001b[0m\n\u001b[0;32m   1625\u001b[0m job_id \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m\n\u001b[0;32m   1626\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m pbar:\n\u001b[1;32m-> 1627\u001b[0m     \u001b[38;5;28;01mfor\u001b[39;00m job_id, done, content \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_prepare_split_single(\n\u001b[0;32m   1628\u001b[0m         gen_kwargs\u001b[38;5;241m=\u001b[39mgen_kwargs, job_id\u001b[38;5;241m=\u001b[39mjob_id, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39m_prepare_split_args\n\u001b[0;32m   1629\u001b[0m     ):\n\u001b[0;32m   1630\u001b[0m         \u001b[38;5;28;01mif\u001b[39;00m done:\n\u001b[0;32m   1631\u001b[0m             result \u001b[38;5;241m=\u001b[39m content\n",
+      "File \u001b[1;32me:\\Users\\frink\\Documents\\GitHub\\LLM Things\\Phi-3-training-Low-Ram\\venv\\lib\\site-packages\\datasets\\builder.py:1748\u001b[0m, in \u001b[0;36mGeneratorBasedBuilder._prepare_split_single\u001b[1;34m(self, gen_kwargs, fpath, file_format, max_shard_size, split_info, check_duplicate_keys, job_id)\u001b[0m\n\u001b[0;32m   1746\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m   1747\u001b[0m     _time \u001b[38;5;241m=\u001b[39m time\u001b[38;5;241m.\u001b[39mtime()\n\u001b[1;32m-> 1748\u001b[0m     \u001b[38;5;28;01mfor\u001b[39;00m key, record \u001b[38;5;129;01min\u001b[39;00m generator:\n\u001b[0;32m   1749\u001b[0m         \u001b[38;5;28;01mif\u001b[39;00m max_shard_size \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mand\u001b[39;00m writer\u001b[38;5;241m.\u001b[39m_num_bytes \u001b[38;5;241m>\u001b[39m max_shard_size:\n\u001b[0;32m   1750\u001b[0m             num_examples, num_bytes \u001b[38;5;241m=\u001b[39m writer\u001b[38;5;241m.\u001b[39mfinalize()\n",
+      "File \u001b[1;32me:\\Users\\frink\\Documents\\GitHub\\LLM Things\\Phi-3-training-Low-Ram\\venv\\lib\\site-packages\\datasets\\packaged_modules\\generator\\generator.py:30\u001b[0m, in \u001b[0;36mGenerator._generate_examples\u001b[1;34m(self, **gen_kwargs)\u001b[0m\n\u001b[0;32m     29\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_generate_examples\u001b[39m(\u001b[38;5;28mself\u001b[39m, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mgen_kwargs):\n\u001b[1;32m---> 30\u001b[0m     \u001b[38;5;28;01mfor\u001b[39;00m idx, ex \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28menumerate\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mconfig\u001b[38;5;241m.\u001b[39mgenerator(\u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mgen_kwargs)):\n\u001b[0;32m     31\u001b[0m         \u001b[38;5;28;01myield\u001b[39;00m idx, ex\n",
+      "File \u001b[1;32me:\\Users\\frink\\Documents\\GitHub\\LLM Things\\Phi-3-training-Low-Ram\\venv\\lib\\site-packages\\trl\\trainer\\sft_trainer.py:536\u001b[0m, in \u001b[0;36mSFTTrainer._prepare_packed_dataloader.<locals>.data_generator\u001b[1;34m(constant_length_iterator)\u001b[0m\n\u001b[0;32m    535\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mdata_generator\u001b[39m(constant_length_iterator):\n\u001b[1;32m--> 536\u001b[0m     \u001b[38;5;28;01myield from\u001b[39;00m constant_length_iterator\n",
+      "File \u001b[1;32me:\\Users\\frink\\Documents\\GitHub\\LLM Things\\Phi-3-training-Low-Ram\\venv\\lib\\site-packages\\trl\\trainer\\utils.py:466\u001b[0m, in \u001b[0;36mConstantLengthDataset.__iter__\u001b[1;34m(self)\u001b[0m\n\u001b[0;32m    464\u001b[0m             more_examples \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mFalse\u001b[39;00m\n\u001b[0;32m    465\u001b[0m             \u001b[38;5;28;01mbreak\u001b[39;00m\n\u001b[1;32m--> 466\u001b[0m tokenized_inputs \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mtokenizer\u001b[49m\u001b[43m(\u001b[49m\u001b[43mbuffer\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43madd_special_tokens\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43madd_special_tokens\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mtruncation\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mFalse\u001b[39;49;00m\u001b[43m)\u001b[49m[\n\u001b[0;32m    467\u001b[0m     \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124minput_ids\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m    468\u001b[0m ]\n\u001b[0;32m    469\u001b[0m all_token_ids \u001b[38;5;241m=\u001b[39m []\n\u001b[0;32m    470\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m tokenized_input \u001b[38;5;129;01min\u001b[39;00m tokenized_inputs:\n",
+      "File \u001b[1;32me:\\Users\\frink\\Documents\\GitHub\\LLM Things\\Phi-3-training-Low-Ram\\venv\\lib\\site-packages\\transformers\\tokenization_utils_base.py:2883\u001b[0m, in \u001b[0;36mPreTrainedTokenizerBase.__call__\u001b[1;34m(self, text, text_pair, text_target, text_pair_target, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)\u001b[0m\n\u001b[0;32m   2881\u001b[0m     \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_in_target_context_manager:\n\u001b[0;32m   2882\u001b[0m         \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_switch_to_input_mode()\n\u001b[1;32m-> 2883\u001b[0m     encodings \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_call_one(text\u001b[38;5;241m=\u001b[39mtext, text_pair\u001b[38;5;241m=\u001b[39mtext_pair, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mall_kwargs)\n\u001b[0;32m   2884\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m text_target \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m   2885\u001b[0m     \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_switch_to_target_mode()\n",
+      "File \u001b[1;32me:\\Users\\frink\\Documents\\GitHub\\LLM Things\\Phi-3-training-Low-Ram\\venv\\lib\\site-packages\\transformers\\tokenization_utils_base.py:2969\u001b[0m, in \u001b[0;36mPreTrainedTokenizerBase._call_one\u001b[1;34m(self, text, text_pair, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)\u001b[0m\n\u001b[0;32m   2964\u001b[0m         \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[0;32m   2965\u001b[0m             \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mbatch length of `text`: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mlen\u001b[39m(text)\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m does not match batch length of `text_pair`:\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m   2966\u001b[0m             \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mlen\u001b[39m(text_pair)\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m   2967\u001b[0m         )\n\u001b[0;32m   2968\u001b[0m     batch_text_or_text_pairs \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mlist\u001b[39m(\u001b[38;5;28mzip\u001b[39m(text, text_pair)) \u001b[38;5;28;01mif\u001b[39;00m text_pair \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;28;01melse\u001b[39;00m text\n\u001b[1;32m-> 2969\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mbatch_encode_plus(\n\u001b[0;32m   2970\u001b[0m         batch_text_or_text_pairs\u001b[38;5;241m=\u001b[39mbatch_text_or_text_pairs,\n\u001b[0;32m   2971\u001b[0m         add_special_tokens\u001b[38;5;241m=\u001b[39madd_special_tokens,\n\u001b[0;32m   2972\u001b[0m         padding\u001b[38;5;241m=\u001b[39mpadding,\n\u001b[0;32m   2973\u001b[0m         truncation\u001b[38;5;241m=\u001b[39mtruncation,\n\u001b[0;32m   2974\u001b[0m         max_length\u001b[38;5;241m=\u001b[39mmax_length,\n\u001b[0;32m   2975\u001b[0m         stride\u001b[38;5;241m=\u001b[39mstride,\n\u001b[0;32m   2976\u001b[0m         is_split_into_words\u001b[38;5;241m=\u001b[39mis_split_into_words,\n\u001b[0;32m   2977\u001b[0m         pad_to_multiple_of\u001b[38;5;241m=\u001b[39mpad_to_multiple_of,\n\u001b[0;32m   2978\u001b[0m         return_tensors\u001b[38;5;241m=\u001b[39mreturn_tensors,\n\u001b[0;32m   2979\u001b[0m         return_token_type_ids\u001b[38;5;241m=\u001b[39mreturn_token_type_ids,\n\u001b[0;32m   2980\u001b[0m         return_attention_mask\u001b[38;5;241m=\u001b[39mreturn_attention_mask,\n\u001b[0;32m   2981\u001b[0m         return_overflowing_tokens\u001b[38;5;241m=\u001b[39mreturn_overflowing_tokens,\n\u001b[0;32m   2982\u001b[0m         return_special_tokens_mask\u001b[38;5;241m=\u001b[39mreturn_special_tokens_mask,\n\u001b[0;32m   2983\u001b[0m         return_offsets_mapping\u001b[38;5;241m=\u001b[39mreturn_offsets_mapping,\n\u001b[0;32m   2984\u001b[0m         return_length\u001b[38;5;241m=\u001b[39mreturn_length,\n\u001b[0;32m   2985\u001b[0m         verbose\u001b[38;5;241m=\u001b[39mverbose,\n\u001b[0;32m   2986\u001b[0m         \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs,\n\u001b[0;32m   2987\u001b[0m     )\n\u001b[0;32m   2988\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m   2989\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mencode_plus(\n\u001b[0;32m   2990\u001b[0m         text\u001b[38;5;241m=\u001b[39mtext,\n\u001b[0;32m   2991\u001b[0m         text_pair\u001b[38;5;241m=\u001b[39mtext_pair,\n\u001b[1;32m   (...)\u001b[0m\n\u001b[0;32m   3007\u001b[0m         \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs,\n\u001b[0;32m   3008\u001b[0m     )\n",
+      "File \u001b[1;32me:\\Users\\frink\\Documents\\GitHub\\LLM Things\\Phi-3-training-Low-Ram\\venv\\lib\\site-packages\\transformers\\tokenization_utils_base.py:3160\u001b[0m, in \u001b[0;36mPreTrainedTokenizerBase.batch_encode_plus\u001b[1;34m(self, batch_text_or_text_pairs, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)\u001b[0m\n\u001b[0;32m   3150\u001b[0m \u001b[38;5;66;03m# Backward compatibility for 'truncation_strategy', 'pad_to_max_length'\u001b[39;00m\n\u001b[0;32m   3151\u001b[0m padding_strategy, truncation_strategy, max_length, kwargs \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_get_padding_truncation_strategies(\n\u001b[0;32m   3152\u001b[0m     padding\u001b[38;5;241m=\u001b[39mpadding,\n\u001b[0;32m   3153\u001b[0m     truncation\u001b[38;5;241m=\u001b[39mtruncation,\n\u001b[1;32m   (...)\u001b[0m\n\u001b[0;32m   3157\u001b[0m     \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs,\n\u001b[0;32m   3158\u001b[0m )\n\u001b[1;32m-> 3160\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_batch_encode_plus(\n\u001b[0;32m   3161\u001b[0m     batch_text_or_text_pairs\u001b[38;5;241m=\u001b[39mbatch_text_or_text_pairs,\n\u001b[0;32m   3162\u001b[0m     add_special_tokens\u001b[38;5;241m=\u001b[39madd_special_tokens,\n\u001b[0;32m   3163\u001b[0m     padding_strategy\u001b[38;5;241m=\u001b[39mpadding_strategy,\n\u001b[0;32m   3164\u001b[0m     truncation_strategy\u001b[38;5;241m=\u001b[39mtruncation_strategy,\n\u001b[0;32m   3165\u001b[0m     max_length\u001b[38;5;241m=\u001b[39mmax_length,\n\u001b[0;32m   3166\u001b[0m     stride\u001b[38;5;241m=\u001b[39mstride,\n\u001b[0;32m   3167\u001b[0m     is_split_into_words\u001b[38;5;241m=\u001b[39mis_split_into_words,\n\u001b[0;32m   3168\u001b[0m     pad_to_multiple_of\u001b[38;5;241m=\u001b[39mpad_to_multiple_of,\n\u001b[0;32m   3169\u001b[0m     return_tensors\u001b[38;5;241m=\u001b[39mreturn_tensors,\n\u001b[0;32m   3170\u001b[0m     return_token_type_ids\u001b[38;5;241m=\u001b[39mreturn_token_type_ids,\n\u001b[0;32m   3171\u001b[0m     return_attention_mask\u001b[38;5;241m=\u001b[39mreturn_attention_mask,\n\u001b[0;32m   3172\u001b[0m     return_overflowing_tokens\u001b[38;5;241m=\u001b[39mreturn_overflowing_tokens,\n\u001b[0;32m   3173\u001b[0m     return_special_tokens_mask\u001b[38;5;241m=\u001b[39mreturn_special_tokens_mask,\n\u001b[0;32m   3174\u001b[0m     return_offsets_mapping\u001b[38;5;241m=\u001b[39mreturn_offsets_mapping,\n\u001b[0;32m   3175\u001b[0m     return_length\u001b[38;5;241m=\u001b[39mreturn_length,\n\u001b[0;32m   3176\u001b[0m     verbose\u001b[38;5;241m=\u001b[39mverbose,\n\u001b[0;32m   3177\u001b[0m     \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs,\n\u001b[0;32m   3178\u001b[0m )\n",
+      "File \u001b[1;32me:\\Users\\frink\\Documents\\GitHub\\LLM Things\\Phi-3-training-Low-Ram\\venv\\lib\\site-packages\\transformers\\tokenization_utils_fast.py:511\u001b[0m, in \u001b[0;36mPreTrainedTokenizerFast._batch_encode_plus\u001b[1;34m(self, batch_text_or_text_pairs, add_special_tokens, padding_strategy, truncation_strategy, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose)\u001b[0m\n\u001b[0;32m    502\u001b[0m \u001b[38;5;66;03m# Set the truncation and padding strategy and restore the initial configuration\u001b[39;00m\n\u001b[0;32m    503\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mset_truncation_and_padding(\n\u001b[0;32m    504\u001b[0m     padding_strategy\u001b[38;5;241m=\u001b[39mpadding_strategy,\n\u001b[0;32m    505\u001b[0m     truncation_strategy\u001b[38;5;241m=\u001b[39mtruncation_strategy,\n\u001b[1;32m   (...)\u001b[0m\n\u001b[0;32m    508\u001b[0m     pad_to_multiple_of\u001b[38;5;241m=\u001b[39mpad_to_multiple_of,\n\u001b[0;32m    509\u001b[0m )\n\u001b[1;32m--> 511\u001b[0m encodings \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_tokenizer\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mencode_batch\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m    512\u001b[0m \u001b[43m    \u001b[49m\u001b[43mbatch_text_or_text_pairs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m    513\u001b[0m \u001b[43m    \u001b[49m\u001b[43madd_special_tokens\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43madd_special_tokens\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m    514\u001b[0m \u001b[43m    \u001b[49m\u001b[43mis_pretokenized\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mis_split_into_words\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m    515\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m    517\u001b[0m \u001b[38;5;66;03m# Convert encoding to dict\u001b[39;00m\n\u001b[0;32m    518\u001b[0m \u001b[38;5;66;03m# `Tokens` has type: Tuple[\u001b[39;00m\n\u001b[0;32m    519\u001b[0m \u001b[38;5;66;03m#                       List[Dict[str, List[List[int]]]] or List[Dict[str, 2D-Tensor]],\u001b[39;00m\n\u001b[0;32m    520\u001b[0m \u001b[38;5;66;03m#                       List[EncodingFast]\u001b[39;00m\n\u001b[0;32m    521\u001b[0m \u001b[38;5;66;03m#                    ]\u001b[39;00m\n\u001b[0;32m    522\u001b[0m \u001b[38;5;66;03m# with nested dimensions corresponding to batch, overflows, sequence length\u001b[39;00m\n\u001b[0;32m    523\u001b[0m tokens_and_encodings \u001b[38;5;241m=\u001b[39m [\n\u001b[0;32m    524\u001b[0m     \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_convert_encoding(\n\u001b[0;32m    525\u001b[0m         encoding\u001b[38;5;241m=\u001b[39mencoding,\n\u001b[1;32m   (...)\u001b[0m\n\u001b[0;32m    534\u001b[0m     \u001b[38;5;28;01mfor\u001b[39;00m encoding \u001b[38;5;129;01min\u001b[39;00m encodings\n\u001b[0;32m    535\u001b[0m ]\n",
+      "KeyboardInterrupt: "
+     ]
+    }
+   ],
+   "source": [
+    "###########\n",
+    "# Training\n",
+    "###########\n",
+    "trainer = SFTTrainer(\n",
+    "    model=model,\n",
+    "    args=train_conf,\n",
+    "    peft_config=peft_conf,\n",
+    "    train_dataset=processed_train_dataset,\n",
+    "    eval_dataset=processed_test_dataset,\n",
+    "    max_seq_length=2048,\n",
+    "    dataset_text_field=\"text\",\n",
+    "    tokenizer=tokenizer,\n",
+    "    packing=True\n",
+    ")\n",
+    "train_result = trainer.train()\n",
+    "metrics = train_result.metrics\n",
+    "trainer.log_metrics(\"train\", metrics)\n",
+    "trainer.save_metrics(\"train\", metrics)\n",
+    "trainer.save_state()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#############\n",
+    "# Evaluation\n",
+    "#############\n",
+    "tokenizer.padding_side = 'left'\n",
+    "metrics = trainer.evaluate()\n",
+    "metrics[\"eval_samples\"] = len(processed_test_dataset)\n",
+    "trainer.log_metrics(\"eval\", metrics)\n",
+    "trainer.save_metrics(\"eval\", metrics)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "############\n",
+    "# Save model\n",
+    "############\n",
+    "trainer.save_model(train_conf.output_dir)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
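The stored traceback ends in a `KeyboardInterrupt` raised inside `PreTrainedTokenizerFast._batch_encode_plus`, so the training cell was stopped while text was still being tokenized. The trainer is built with `packing=True` and `max_seq_length=2048`, and packing concatenates tokenized examples and cuts the resulting token stream into fixed-length blocks. The sketch below illustrates that packing idea in plain Python. It is not TRL's implementation: the function name `pack_sequences`, the end-of-sequence separator `eos_id`, and the choice to drop the final partial block are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List


def pack_sequences(
    token_lists: Iterable[List[int]], block_size: int, eos_id: int
) -> Iterator[List[int]]:
    """Illustrative packing sketch (hypothetical helper, not TRL's API).

    Appends eos_id after each tokenized example, concatenates everything into
    one stream, and yields blocks of exactly block_size tokens. Any leftover
    tokens that do not fill a whole block are dropped (an assumption here).
    """
    buffer: List[int] = []
    for tokens in token_lists:
        buffer.extend(tokens)
        buffer.append(eos_id)  # hypothetical separator between examples
        # Emit full blocks as soon as enough tokens have accumulated.
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]


if __name__ == "__main__":
    examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
    for block in pack_sequences(examples, block_size=4, eos_id=0):
        print(block)
```

Because it is written as a generator, packed blocks are produced lazily instead of being materialized all at once, which is the kind of trade-off that matters on a low-RAM machine. With `block_size=4` and `eos_id=0`, the three toy examples become the blocks `[1, 2, 3, 0]`, `[4, 5, 0, 6]` and `[7, 8, 9, 0]`.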