openpql committed
Commit 4057342 · verified · 1 Parent(s): bfec813

Upload nishka-base-model-v1 - Stage 2: PQL code generation (10,038 examples, 99.3% loss reduction)

README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: microsoft/Phi-3-mini-4k-instruct
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:microsoft/Phi-3-mini-4k-instruct
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.18.0
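The card's "How to Get Started with the Model" section is still a placeholder. Below is a minimal sketch of loading this adapter with the usual transformers + PEFT workflow; the adapter repo id `openpql/nishka-base-model-v1` is an assumption inferred from the commit message, not something the card confirms.

```python
# Minimal sketch (assumptions noted in comments): attach this LoRA adapter to the Phi-3-mini base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "microsoft/Phi-3-mini-4k-instruct"   # from adapter_config.json
ADAPTER_ID = "openpql/nishka-base-model-v1"    # assumed repo id, inferred from the commit message

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER_ID)  # loads the LoRA weights on top of the base model
model.eval()

# Example request; the PQL prompt is illustrative only.
messages = [{"role": "user", "content": "Write a PQL query that counts cases per activity."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```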
adapter_config.json ADDED
@@ -0,0 +1,46 @@
+ {
+ "alora_invocation_tokens": null,
+ "alpha_pattern": {},
+ "arrow_config": null,
+ "auto_mapping": null,
+ "base_model_name_or_path": "microsoft/Phi-3-mini-4k-instruct",
+ "bias": "none",
+ "corda_config": null,
+ "ensure_weight_tying": false,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 64,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "peft_version": "0.18.0",
+ "qalora_group_size": 16,
+ "r": 32,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "v_proj",
+ "k_proj",
+ "o_proj",
+ "down_proj",
+ "up_proj",
+ "gate_proj",
+ "q_proj"
+ ],
+ "target_parameters": null,
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
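The config above describes a rank-32 LoRA (scaling lora_alpha / r = 64 / 32 = 2.0, dropout 0.05) applied to every attention and MLP projection of the base model. For reference, a sketch of the equivalent PEFT `LoraConfig` reconstructed from this JSON (not taken from a training script):

```python
from peft import LoraConfig

# Reconstructed from adapter_config.json above; field names follow peft 0.18.
lora_config = LoraConfig(
    r=32,                  # LoRA rank
    lora_alpha=64,         # effective scaling = lora_alpha / r = 2.0
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
)
```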
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c7841eb0730a6387ec528d5537ed2175514da9fc2dc2ea6c6fc166ab9c2ea72e
+ size 71320216
chat_template.jinja ADDED
@@ -0,0 +1,8 @@
+ {% for message in messages %}{% if message['role'] == 'system' %}{{'<|system|>
+ ' + message['content'] + '<|end|>
+ '}}{% elif message['role'] == 'user' %}{{'<|user|>
+ ' + message['content'] + '<|end|>
+ '}}{% elif message['role'] == 'assistant' %}{{'<|assistant|>
+ ' + message['content'] + '<|end|>
+ '}}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>
+ ' }}{% else %}{{ eos_token }}{% endif %}
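This template wraps each turn in Phi-3's `<|system|>` / `<|user|>` / `<|assistant|>` markers terminated by `<|end|>`, and appends a bare `<|assistant|>` header when a generation prompt is requested. A short sketch of how a conversation renders through it; the messages are illustrative and the repo id is the same assumption as above:

```python
from transformers import AutoTokenizer

# Assumes the tokenizer bundled with this adapter repo (repo id inferred from the commit message).
tokenizer = AutoTokenizer.from_pretrained("openpql/nishka-base-model-v1")

messages = [
    {"role": "system", "content": "You translate questions into PQL."},
    {"role": "user", "content": "How many cases ended in the last 30 days?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# <|system|>
# You translate questions into PQL.<|end|>
# <|user|>
# How many cases ended in the last 30 days?<|end|>
# <|assistant|>
```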
checkpoint-1000/README.md ADDED
@@ -0,0 +1,207 @@
1
+ ---
2
+ base_model: microsoft/Phi-3-mini-4k-instruct
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - base_model:adapter:microsoft/Phi-3-mini-4k-instruct
7
+ - lora
8
+ - transformers
9
+ ---
10
+
11
+ # Model Card for Model ID
12
+
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
+
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
+
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
135
+
136
+ #### Summary
137
+
138
+
139
+
140
+ ## Model Examination [optional]
141
+
142
+ <!-- Relevant interpretability work for the model goes here -->
143
+
144
+ [More Information Needed]
145
+
146
+ ## Environmental Impact
147
+
148
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
+
150
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
+
152
+ - **Hardware Type:** [More Information Needed]
153
+ - **Hours used:** [More Information Needed]
154
+ - **Cloud Provider:** [More Information Needed]
155
+ - **Compute Region:** [More Information Needed]
156
+ - **Carbon Emitted:** [More Information Needed]
157
+
158
+ ## Technical Specifications [optional]
159
+
160
+ ### Model Architecture and Objective
161
+
162
+ [More Information Needed]
163
+
164
+ ### Compute Infrastructure
165
+
166
+ [More Information Needed]
167
+
168
+ #### Hardware
169
+
170
+ [More Information Needed]
171
+
172
+ #### Software
173
+
174
+ [More Information Needed]
175
+
176
+ ## Citation [optional]
177
+
178
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
+
180
+ **BibTeX:**
181
+
182
+ [More Information Needed]
183
+
184
+ **APA:**
185
+
186
+ [More Information Needed]
187
+
188
+ ## Glossary [optional]
189
+
190
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
+
192
+ [More Information Needed]
193
+
194
+ ## More Information [optional]
195
+
196
+ [More Information Needed]
197
+
198
+ ## Model Card Authors [optional]
199
+
200
+ [More Information Needed]
201
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.18.0
checkpoint-1000/adapter_config.json ADDED
@@ -0,0 +1,46 @@
1
+ {
2
+ "alora_invocation_tokens": null,
3
+ "alpha_pattern": {},
4
+ "arrow_config": null,
5
+ "auto_mapping": null,
6
+ "base_model_name_or_path": "microsoft/Phi-3-mini-4k-instruct",
7
+ "bias": "none",
8
+ "corda_config": null,
9
+ "ensure_weight_tying": false,
10
+ "eva_config": null,
11
+ "exclude_modules": null,
12
+ "fan_in_fan_out": false,
13
+ "inference_mode": true,
14
+ "init_lora_weights": true,
15
+ "layer_replication": null,
16
+ "layers_pattern": null,
17
+ "layers_to_transform": null,
18
+ "loftq_config": {},
19
+ "lora_alpha": 64,
20
+ "lora_bias": false,
21
+ "lora_dropout": 0.05,
22
+ "megatron_config": null,
23
+ "megatron_core": "megatron.core",
24
+ "modules_to_save": null,
25
+ "peft_type": "LORA",
26
+ "peft_version": "0.18.0",
27
+ "qalora_group_size": 16,
28
+ "r": 32,
29
+ "rank_pattern": {},
30
+ "revision": null,
31
+ "target_modules": [
32
+ "v_proj",
33
+ "k_proj",
34
+ "o_proj",
35
+ "down_proj",
36
+ "up_proj",
37
+ "gate_proj",
38
+ "q_proj"
39
+ ],
40
+ "target_parameters": null,
41
+ "task_type": "CAUSAL_LM",
42
+ "trainable_token_indices": null,
43
+ "use_dora": false,
44
+ "use_qalora": false,
45
+ "use_rslora": false
46
+ }
checkpoint-1000/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8e8064320b428eca165019a0bfeef0a40d8f9525834d4dbc21195bdc540c9ab9
3
+ size 71320216
checkpoint-1000/chat_template.jinja ADDED
@@ -0,0 +1,8 @@
1
+ {% for message in messages %}{% if message['role'] == 'system' %}{{'<|system|>
2
+ ' + message['content'] + '<|end|>
3
+ '}}{% elif message['role'] == 'user' %}{{'<|user|>
4
+ ' + message['content'] + '<|end|>
5
+ '}}{% elif message['role'] == 'assistant' %}{{'<|assistant|>
6
+ ' + message['content'] + '<|end|>
7
+ '}}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>
8
+ ' }}{% else %}{{ eos_token }}{% endif %}
checkpoint-1000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9361823b38b6bac577a234b53bce408a5f5336110168dc0186d2981b38646326
3
+ size 36361850
checkpoint-1000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:44ad2edfc022232d7559482a4f9f63f72d6f3103b63747b20b16cf10331846f3
3
+ size 14244
checkpoint-1000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6601cd067fdcef15ed2bcfc38e28e0f41a0eb7ffd9977b1eec7b9ed404e56ff7
3
+ size 1064
checkpoint-1000/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "<|endoftext|>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": "<|endoftext|>",
17
+ "unk_token": {
18
+ "content": "<unk>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ }
24
+ }
checkpoint-1000/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1000/tokenizer_config.json ADDED
@@ -0,0 +1,135 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_eos_token": false,
4
+ "add_prefix_space": null,
5
+ "added_tokens_decoder": {
6
+ "0": {
7
+ "content": "<unk>",
8
+ "lstrip": false,
9
+ "normalized": false,
10
+ "rstrip": false,
11
+ "single_word": false,
12
+ "special": true
13
+ },
14
+ "1": {
15
+ "content": "<s>",
16
+ "lstrip": false,
17
+ "normalized": false,
18
+ "rstrip": false,
19
+ "single_word": false,
20
+ "special": true
21
+ },
22
+ "2": {
23
+ "content": "</s>",
24
+ "lstrip": false,
25
+ "normalized": false,
26
+ "rstrip": true,
27
+ "single_word": false,
28
+ "special": false
29
+ },
30
+ "32000": {
31
+ "content": "<|endoftext|>",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false,
36
+ "special": true
37
+ },
38
+ "32001": {
39
+ "content": "<|assistant|>",
40
+ "lstrip": false,
41
+ "normalized": false,
42
+ "rstrip": true,
43
+ "single_word": false,
44
+ "special": true
45
+ },
46
+ "32002": {
47
+ "content": "<|placeholder1|>",
48
+ "lstrip": false,
49
+ "normalized": false,
50
+ "rstrip": true,
51
+ "single_word": false,
52
+ "special": true
53
+ },
54
+ "32003": {
55
+ "content": "<|placeholder2|>",
56
+ "lstrip": false,
57
+ "normalized": false,
58
+ "rstrip": true,
59
+ "single_word": false,
60
+ "special": true
61
+ },
62
+ "32004": {
63
+ "content": "<|placeholder3|>",
64
+ "lstrip": false,
65
+ "normalized": false,
66
+ "rstrip": true,
67
+ "single_word": false,
68
+ "special": true
69
+ },
70
+ "32005": {
71
+ "content": "<|placeholder4|>",
72
+ "lstrip": false,
73
+ "normalized": false,
74
+ "rstrip": true,
75
+ "single_word": false,
76
+ "special": true
77
+ },
78
+ "32006": {
79
+ "content": "<|system|>",
80
+ "lstrip": false,
81
+ "normalized": false,
82
+ "rstrip": true,
83
+ "single_word": false,
84
+ "special": true
85
+ },
86
+ "32007": {
87
+ "content": "<|end|>",
88
+ "lstrip": false,
89
+ "normalized": false,
90
+ "rstrip": true,
91
+ "single_word": false,
92
+ "special": true
93
+ },
94
+ "32008": {
95
+ "content": "<|placeholder5|>",
96
+ "lstrip": false,
97
+ "normalized": false,
98
+ "rstrip": true,
99
+ "single_word": false,
100
+ "special": true
101
+ },
102
+ "32009": {
103
+ "content": "<|placeholder6|>",
104
+ "lstrip": false,
105
+ "normalized": false,
106
+ "rstrip": true,
107
+ "single_word": false,
108
+ "special": true
109
+ },
110
+ "32010": {
111
+ "content": "<|user|>",
112
+ "lstrip": false,
113
+ "normalized": false,
114
+ "rstrip": true,
115
+ "single_word": false,
116
+ "special": true
117
+ }
118
+ },
119
+ "bos_token": "<s>",
120
+ "clean_up_tokenization_spaces": false,
121
+ "eos_token": "<|endoftext|>",
122
+ "extra_special_tokens": {},
123
+ "legacy": false,
124
+ "max_length": 2048,
125
+ "model_max_length": 4096,
126
+ "pad_token": "<|endoftext|>",
127
+ "padding_side": "right",
128
+ "sp_model_kwargs": {},
129
+ "stride": 0,
130
+ "tokenizer_class": "LlamaTokenizerFast",
131
+ "truncation_side": "right",
132
+ "truncation_strategy": "longest_first",
133
+ "unk_token": "<unk>",
134
+ "use_default_system_prompt": false
135
+ }
checkpoint-1000/trainer_state.json ADDED
@@ -0,0 +1,734 @@
1
+ {
2
+ "best_global_step": null,
3
+ "best_metric": null,
4
+ "best_model_checkpoint": null,
5
+ "epoch": 1.5929468021518232,
6
+ "eval_steps": 500,
7
+ "global_step": 1000,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.01593943016537159,
14
+ "grad_norm": 0.12403535097837448,
15
+ "learning_rate": 1.8e-05,
16
+ "loss": 1.1133,
17
+ "step": 10
18
+ },
19
+ {
20
+ "epoch": 0.03187886033074318,
21
+ "grad_norm": 0.21290278434753418,
22
+ "learning_rate": 3.8e-05,
23
+ "loss": 1.0413,
24
+ "step": 20
25
+ },
26
+ {
27
+ "epoch": 0.04781829049611477,
28
+ "grad_norm": 0.1645481288433075,
29
+ "learning_rate": 5.8e-05,
30
+ "loss": 0.9616,
31
+ "step": 30
32
+ },
33
+ {
34
+ "epoch": 0.06375772066148636,
35
+ "grad_norm": 0.2903112471103668,
36
+ "learning_rate": 7.800000000000001e-05,
37
+ "loss": 0.874,
38
+ "step": 40
39
+ },
40
+ {
41
+ "epoch": 0.07969715082685794,
42
+ "grad_norm": 0.2749190628528595,
43
+ "learning_rate": 9.8e-05,
44
+ "loss": 0.7337,
45
+ "step": 50
46
+ },
47
+ {
48
+ "epoch": 0.09563658099222953,
49
+ "grad_norm": 0.4625573754310608,
50
+ "learning_rate": 9.99862592554908e-05,
51
+ "loss": 0.5768,
52
+ "step": 60
53
+ },
54
+ {
55
+ "epoch": 0.11157601115760112,
56
+ "grad_norm": 0.31431907415390015,
57
+ "learning_rate": 9.993877008154289e-05,
58
+ "loss": 0.4941,
59
+ "step": 70
60
+ },
61
+ {
62
+ "epoch": 0.1275154413229727,
63
+ "grad_norm": 0.43303918838500977,
64
+ "learning_rate": 9.985739505534436e-05,
65
+ "loss": 0.3854,
66
+ "step": 80
67
+ },
68
+ {
69
+ "epoch": 0.1434548714883443,
70
+ "grad_norm": 0.5850837826728821,
71
+ "learning_rate": 9.974218939375599e-05,
72
+ "loss": 0.2906,
73
+ "step": 90
74
+ },
75
+ {
76
+ "epoch": 0.15939430165371588,
77
+ "grad_norm": 0.4940532147884369,
78
+ "learning_rate": 9.959323126934831e-05,
79
+ "loss": 0.2056,
80
+ "step": 100
81
+ },
82
+ {
83
+ "epoch": 0.17533373181908746,
84
+ "grad_norm": 0.3987452983856201,
85
+ "learning_rate": 9.94106217573578e-05,
86
+ "loss": 0.1315,
87
+ "step": 110
88
+ },
89
+ {
90
+ "epoch": 0.19127316198445907,
91
+ "grad_norm": 0.42741864919662476,
92
+ "learning_rate": 9.919448476710246e-05,
93
+ "loss": 0.0859,
94
+ "step": 120
95
+ },
96
+ {
97
+ "epoch": 0.20721259214983065,
98
+ "grad_norm": 0.24628710746765137,
99
+ "learning_rate": 9.894496695790344e-05,
100
+ "loss": 0.0663,
101
+ "step": 130
102
+ },
103
+ {
104
+ "epoch": 0.22315202231520223,
105
+ "grad_norm": 0.2623507082462311,
106
+ "learning_rate": 9.866223763956955e-05,
107
+ "loss": 0.0447,
108
+ "step": 140
109
+ },
110
+ {
111
+ "epoch": 0.2390914524805738,
112
+ "grad_norm": 0.14876964688301086,
113
+ "learning_rate": 9.834648865751254e-05,
114
+ "loss": 0.0372,
115
+ "step": 150
116
+ },
117
+ {
118
+ "epoch": 0.2550308826459454,
119
+ "grad_norm": 0.1500687152147293,
120
+ "learning_rate": 9.799793426257071e-05,
121
+ "loss": 0.0356,
122
+ "step": 160
123
+ },
124
+ {
125
+ "epoch": 0.270970312811317,
126
+ "grad_norm": 0.16156131029129028,
127
+ "learning_rate": 9.76168109656295e-05,
128
+ "loss": 0.0293,
129
+ "step": 170
130
+ },
131
+ {
132
+ "epoch": 0.2869097429766886,
133
+ "grad_norm": 0.20239904522895813,
134
+ "learning_rate": 9.720337737713739e-05,
135
+ "loss": 0.025,
136
+ "step": 180
137
+ },
138
+ {
139
+ "epoch": 0.30284917314206017,
140
+ "grad_norm": 0.10117224603891373,
141
+ "learning_rate": 9.675791403162645e-05,
142
+ "loss": 0.0209,
143
+ "step": 190
144
+ },
145
+ {
146
+ "epoch": 0.31878860330743175,
147
+ "grad_norm": 0.3235134482383728,
148
+ "learning_rate": 9.628072319735607e-05,
149
+ "loss": 0.0247,
150
+ "step": 200
151
+ },
152
+ {
153
+ "epoch": 0.33472803347280333,
154
+ "grad_norm": 0.10946714133024216,
155
+ "learning_rate": 9.577212867120947e-05,
156
+ "loss": 0.0213,
157
+ "step": 210
158
+ },
159
+ {
160
+ "epoch": 0.3506674636381749,
161
+ "grad_norm": 0.08509603142738342,
162
+ "learning_rate": 9.523247555898204e-05,
163
+ "loss": 0.0178,
164
+ "step": 220
165
+ },
166
+ {
167
+ "epoch": 0.3666068938035465,
168
+ "grad_norm": 0.07975362986326218,
169
+ "learning_rate": 9.466213004121041e-05,
170
+ "loss": 0.016,
171
+ "step": 230
172
+ },
173
+ {
174
+ "epoch": 0.38254632396891813,
175
+ "grad_norm": 0.061933811753988266,
176
+ "learning_rate": 9.406147912470143e-05,
177
+ "loss": 0.0143,
178
+ "step": 240
179
+ },
180
+ {
181
+ "epoch": 0.3984857541342897,
182
+ "grad_norm": 0.22777368128299713,
183
+ "learning_rate": 9.343093037992945e-05,
184
+ "loss": 0.0294,
185
+ "step": 250
186
+ },
187
+ {
188
+ "epoch": 0.4144251842996613,
189
+ "grad_norm": 0.11101654917001724,
190
+ "learning_rate": 9.277091166448022e-05,
191
+ "loss": 0.0159,
192
+ "step": 260
193
+ },
194
+ {
195
+ "epoch": 0.4303646144650329,
196
+ "grad_norm": 0.27989494800567627,
197
+ "learning_rate": 9.208187083272894e-05,
198
+ "loss": 0.0169,
199
+ "step": 270
200
+ },
201
+ {
202
+ "epoch": 0.44630404463040446,
203
+ "grad_norm": 0.07749247550964355,
204
+ "learning_rate": 9.136427543194967e-05,
205
+ "loss": 0.017,
206
+ "step": 280
207
+ },
208
+ {
209
+ "epoch": 0.46224347479577604,
210
+ "grad_norm": 0.14141874015331268,
211
+ "learning_rate": 9.061861238506194e-05,
212
+ "loss": 0.0152,
213
+ "step": 290
214
+ },
215
+ {
216
+ "epoch": 0.4781829049611476,
217
+ "grad_norm": 0.09233926236629486,
218
+ "learning_rate": 8.984538766023024e-05,
219
+ "loss": 0.0148,
220
+ "step": 300
221
+ },
222
+ {
223
+ "epoch": 0.4941223351265192,
224
+ "grad_norm": 0.08206266909837723,
225
+ "learning_rate": 8.904512592754034e-05,
226
+ "loss": 0.013,
227
+ "step": 310
228
+ },
229
+ {
230
+ "epoch": 0.5100617652918908,
231
+ "grad_norm": 0.09603980928659439,
232
+ "learning_rate": 8.821837020298547e-05,
233
+ "loss": 0.0122,
234
+ "step": 320
235
+ },
236
+ {
237
+ "epoch": 0.5260011954572624,
238
+ "grad_norm": 0.07990539819002151,
239
+ "learning_rate": 8.736568148000386e-05,
240
+ "loss": 0.0118,
241
+ "step": 330
242
+ },
243
+ {
244
+ "epoch": 0.541940625622634,
245
+ "grad_norm": 0.10424526035785675,
246
+ "learning_rate": 8.648763834881782e-05,
247
+ "loss": 0.0125,
248
+ "step": 340
249
+ },
250
+ {
251
+ "epoch": 0.5578800557880056,
252
+ "grad_norm": 0.13267765939235687,
253
+ "learning_rate": 8.558483660383245e-05,
254
+ "loss": 0.0134,
255
+ "step": 350
256
+ },
257
+ {
258
+ "epoch": 0.5738194859533772,
259
+ "grad_norm": 0.11512497812509537,
260
+ "learning_rate": 8.46578888393606e-05,
261
+ "loss": 0.0136,
262
+ "step": 360
263
+ },
264
+ {
265
+ "epoch": 0.5897589161187488,
266
+ "grad_norm": 0.09562960267066956,
267
+ "learning_rate": 8.37074240339482e-05,
268
+ "loss": 0.0115,
269
+ "step": 370
270
+ },
271
+ {
272
+ "epoch": 0.6056983462841203,
273
+ "grad_norm": 0.08062624931335449,
274
+ "learning_rate": 8.273408712358211e-05,
275
+ "loss": 0.0103,
276
+ "step": 380
277
+ },
278
+ {
279
+ "epoch": 0.6216377764494919,
280
+ "grad_norm": 0.055163562297821045,
281
+ "learning_rate": 8.173853856407011e-05,
282
+ "loss": 0.0123,
283
+ "step": 390
284
+ },
285
+ {
286
+ "epoch": 0.6375772066148635,
287
+ "grad_norm": 0.0700579285621643,
288
+ "learning_rate": 8.072145388289001e-05,
289
+ "loss": 0.013,
290
+ "step": 400
291
+ },
292
+ {
293
+ "epoch": 0.6535166367802351,
294
+ "grad_norm": 0.053463902324438095,
295
+ "learning_rate": 7.968352322081169e-05,
296
+ "loss": 0.0133,
297
+ "step": 410
298
+ },
299
+ {
300
+ "epoch": 0.6694560669456067,
301
+ "grad_norm": 0.09448480606079102,
302
+ "learning_rate": 7.86254508636036e-05,
303
+ "loss": 0.0142,
304
+ "step": 420
305
+ },
306
+ {
307
+ "epoch": 0.6853954971109782,
308
+ "grad_norm": 0.055674418807029724,
309
+ "learning_rate": 7.7547954764141e-05,
310
+ "loss": 0.0108,
311
+ "step": 430
312
+ },
313
+ {
314
+ "epoch": 0.7013349272763498,
315
+ "grad_norm": 0.07148294895887375,
316
+ "learning_rate": 7.645176605524049e-05,
317
+ "loss": 0.011,
318
+ "step": 440
319
+ },
320
+ {
321
+ "epoch": 0.7172743574417214,
322
+ "grad_norm": 0.062315683811903,
323
+ "learning_rate": 7.533762855355126e-05,
324
+ "loss": 0.011,
325
+ "step": 450
326
+ },
327
+ {
328
+ "epoch": 0.733213787607093,
329
+ "grad_norm": 0.13653713464736938,
330
+ "learning_rate": 7.420629825483993e-05,
331
+ "loss": 0.0106,
332
+ "step": 460
333
+ },
334
+ {
335
+ "epoch": 0.7491532177724647,
336
+ "grad_norm": 0.06053169071674347,
337
+ "learning_rate": 7.305854282101096e-05,
338
+ "loss": 0.0112,
339
+ "step": 470
340
+ },
341
+ {
342
+ "epoch": 0.7650926479378363,
343
+ "grad_norm": 0.05051583796739578,
344
+ "learning_rate": 7.189514105921132e-05,
345
+ "loss": 0.0104,
346
+ "step": 480
347
+ },
348
+ {
349
+ "epoch": 0.7810320781032078,
350
+ "grad_norm": 0.054913230240345,
351
+ "learning_rate": 7.071688239337245e-05,
352
+ "loss": 0.0092,
353
+ "step": 490
354
+ },
355
+ {
356
+ "epoch": 0.7969715082685794,
357
+ "grad_norm": 0.04786158353090286,
358
+ "learning_rate": 6.95245663285482e-05,
359
+ "loss": 0.01,
360
+ "step": 500
361
+ },
362
+ {
363
+ "epoch": 0.812910938433951,
364
+ "grad_norm": 0.055493079125881195,
365
+ "learning_rate": 6.831900190841232e-05,
366
+ "loss": 0.0149,
367
+ "step": 510
368
+ },
369
+ {
370
+ "epoch": 0.8288503685993226,
371
+ "grad_norm": 0.035253629088401794,
372
+ "learning_rate": 6.710100716628344e-05,
373
+ "loss": 0.0102,
374
+ "step": 520
375
+ },
376
+ {
377
+ "epoch": 0.8447897987646942,
378
+ "grad_norm": 0.10126087814569473,
379
+ "learning_rate": 6.58714085700503e-05,
380
+ "loss": 0.0102,
381
+ "step": 530
382
+ },
383
+ {
384
+ "epoch": 0.8607292289300658,
385
+ "grad_norm": 0.03980156034231186,
386
+ "learning_rate": 6.46310404613735e-05,
387
+ "loss": 0.0092,
388
+ "step": 540
389
+ },
390
+ {
391
+ "epoch": 0.8766686590954373,
392
+ "grad_norm": 0.05558431148529053,
393
+ "learning_rate": 6.338074448954471e-05,
394
+ "loss": 0.0086,
395
+ "step": 550
396
+ },
397
+ {
398
+ "epoch": 0.8926080892608089,
399
+ "grad_norm": 0.04321667551994324,
400
+ "learning_rate": 6.21213690403873e-05,
401
+ "loss": 0.0108,
402
+ "step": 560
403
+ },
404
+ {
405
+ "epoch": 0.9085475194261805,
406
+ "grad_norm": 0.06686638295650482,
407
+ "learning_rate": 6.0853768660585684e-05,
408
+ "loss": 0.0099,
409
+ "step": 570
410
+ },
411
+ {
412
+ "epoch": 0.9244869495915521,
413
+ "grad_norm": 0.05315929278731346,
414
+ "learning_rate": 5.957880347783449e-05,
415
+ "loss": 0.0108,
416
+ "step": 580
417
+ },
418
+ {
419
+ "epoch": 0.9404263797569237,
420
+ "grad_norm": 0.04230858013033867,
421
+ "learning_rate": 5.829733861720059e-05,
422
+ "loss": 0.0089,
423
+ "step": 590
424
+ },
425
+ {
426
+ "epoch": 0.9563658099222953,
427
+ "grad_norm": 0.0437534861266613,
428
+ "learning_rate": 5.70102436140943e-05,
429
+ "loss": 0.0083,
430
+ "step": 600
431
+ },
432
+ {
433
+ "epoch": 0.9723052400876668,
434
+ "grad_norm": 0.054816000163555145,
435
+ "learning_rate": 5.571839182424775e-05,
436
+ "loss": 0.0093,
437
+ "step": 610
438
+ },
439
+ {
440
+ "epoch": 0.9882446702530384,
441
+ "grad_norm": 0.03881492838263512,
442
+ "learning_rate": 5.442265983110123e-05,
443
+ "loss": 0.0088,
444
+ "step": 620
445
+ },
446
+ {
447
+ "epoch": 1.0031878860330743,
448
+ "grad_norm": 0.04418308287858963,
449
+ "learning_rate": 5.312392685099915e-05,
450
+ "loss": 0.0104,
451
+ "step": 630
452
+ },
453
+ {
454
+ "epoch": 1.019127316198446,
455
+ "grad_norm": 0.05048086866736412,
456
+ "learning_rate": 5.1823074136599605e-05,
457
+ "loss": 0.0088,
458
+ "step": 640
459
+ },
460
+ {
461
+ "epoch": 1.0350667463638175,
462
+ "grad_norm": 0.05302772670984268,
463
+ "learning_rate": 5.0520984378902146e-05,
464
+ "loss": 0.0098,
465
+ "step": 650
466
+ },
467
+ {
468
+ "epoch": 1.0510061765291892,
469
+ "grad_norm": 0.056556668132543564,
470
+ "learning_rate": 4.921854110829962e-05,
471
+ "loss": 0.0095,
472
+ "step": 660
473
+ },
474
+ {
475
+ "epoch": 1.0669456066945606,
476
+ "grad_norm": 0.04491036757826805,
477
+ "learning_rate": 4.791662809506025e-05,
478
+ "loss": 0.0091,
479
+ "step": 670
480
+ },
481
+ {
482
+ "epoch": 1.0828850368599323,
483
+ "grad_norm": 0.0727638527750969,
484
+ "learning_rate": 4.66161287496473e-05,
485
+ "loss": 0.0089,
486
+ "step": 680
487
+ },
488
+ {
489
+ "epoch": 1.0988244670253038,
490
+ "grad_norm": 0.06548187881708145,
491
+ "learning_rate": 4.5317925523282464e-05,
492
+ "loss": 0.0094,
493
+ "step": 690
494
+ },
495
+ {
496
+ "epoch": 1.1147638971906755,
497
+ "grad_norm": 0.04109744727611542,
498
+ "learning_rate": 4.402289930916053e-05,
499
+ "loss": 0.0088,
500
+ "step": 700
501
+ },
502
+ {
503
+ "epoch": 1.130703327356047,
504
+ "grad_norm": 0.07570062577724457,
505
+ "learning_rate": 4.2731928844720994e-05,
506
+ "loss": 0.0093,
507
+ "step": 710
508
+ },
509
+ {
510
+ "epoch": 1.1466427575214186,
511
+ "grad_norm": 0.04505423083901405,
512
+ "learning_rate": 4.1445890115382505e-05,
513
+ "loss": 0.0085,
514
+ "step": 720
515
+ },
516
+ {
517
+ "epoch": 1.1625821876867901,
518
+ "grad_norm": 0.040313106030225754,
519
+ "learning_rate": 4.016565576014478e-05,
520
+ "loss": 0.0092,
521
+ "step": 730
522
+ },
523
+ {
524
+ "epoch": 1.1785216178521618,
525
+ "grad_norm": 0.07987982034683228,
526
+ "learning_rate": 3.889209447946116e-05,
527
+ "loss": 0.0134,
528
+ "step": 740
529
+ },
530
+ {
531
+ "epoch": 1.1944610480175333,
532
+ "grad_norm": 0.045985572040081024,
533
+ "learning_rate": 3.762607044578357e-05,
534
+ "loss": 0.0093,
535
+ "step": 750
536
+ },
537
+ {
538
+ "epoch": 1.210400478182905,
539
+ "grad_norm": 0.0642857551574707,
540
+ "learning_rate": 3.636844271718016e-05,
541
+ "loss": 0.0103,
542
+ "step": 760
543
+ },
544
+ {
545
+ "epoch": 1.2263399083482764,
546
+ "grad_norm": 0.03940548002719879,
547
+ "learning_rate": 3.512006465442309e-05,
548
+ "loss": 0.0113,
549
+ "step": 770
550
+ },
551
+ {
552
+ "epoch": 1.2422793385136481,
553
+ "grad_norm": 0.05522974207997322,
554
+ "learning_rate": 3.388178334194232e-05,
555
+ "loss": 0.0088,
556
+ "step": 780
557
+ },
558
+ {
559
+ "epoch": 1.2582187686790198,
560
+ "grad_norm": 0.04090171679854393,
561
+ "learning_rate": 3.2654439013038165e-05,
562
+ "loss": 0.0092,
563
+ "step": 790
564
+ },
565
+ {
566
+ "epoch": 1.2741581988443913,
567
+ "grad_norm": 0.04401571303606033,
568
+ "learning_rate": 3.143886447974269e-05,
569
+ "loss": 0.008,
570
+ "step": 800
571
+ },
572
+ {
573
+ "epoch": 1.2900976290097628,
574
+ "grad_norm": 0.037025950849056244,
575
+ "learning_rate": 3.0235884567716737e-05,
576
+ "loss": 0.0074,
577
+ "step": 810
578
+ },
579
+ {
580
+ "epoch": 1.3060370591751345,
581
+ "grad_norm": 0.050563715398311615,
582
+ "learning_rate": 2.904631555656616e-05,
583
+ "loss": 0.0093,
584
+ "step": 820
585
+ },
586
+ {
587
+ "epoch": 1.3219764893405062,
588
+ "grad_norm": 0.10221158713102341,
589
+ "learning_rate": 2.7870964625956985e-05,
590
+ "loss": 0.0127,
591
+ "step": 830
592
+ },
593
+ {
594
+ "epoch": 1.3379159195058776,
595
+ "grad_norm": 0.05870247632265091,
596
+ "learning_rate": 2.671062930790511e-05,
597
+ "loss": 0.0088,
598
+ "step": 840
599
+ },
600
+ {
601
+ "epoch": 1.3538553496712493,
602
+ "grad_norm": 0.03522193804383278,
603
+ "learning_rate": 2.5566096945612727e-05,
604
+ "loss": 0.0103,
605
+ "step": 850
606
+ },
607
+ {
608
+ "epoch": 1.3697947798366208,
609
+ "grad_norm": 0.0318855457007885,
610
+ "learning_rate": 2.443814415921809e-05,
611
+ "loss": 0.0097,
612
+ "step": 860
613
+ },
614
+ {
615
+ "epoch": 1.3857342100019925,
616
+ "grad_norm": 0.04261662811040878,
617
+ "learning_rate": 2.3327536318821495e-05,
618
+ "loss": 0.0085,
619
+ "step": 870
620
+ },
621
+ {
622
+ "epoch": 1.401673640167364,
623
+ "grad_norm": 0.06472990661859512,
624
+ "learning_rate": 2.223502702514487e-05,
625
+ "loss": 0.0099,
626
+ "step": 880
627
+ },
628
+ {
629
+ "epoch": 1.4176130703327356,
630
+ "grad_norm": 0.033623527735471725,
631
+ "learning_rate": 2.1161357598177696e-05,
632
+ "loss": 0.0068,
633
+ "step": 890
634
+ },
635
+ {
636
+ "epoch": 1.4335525004981071,
637
+ "grad_norm": 0.07391348481178284,
638
+ "learning_rate": 2.0107256574155564e-05,
639
+ "loss": 0.0086,
640
+ "step": 900
641
+ },
642
+ {
643
+ "epoch": 1.4494919306634788,
644
+ "grad_norm": 0.04469392076134682,
645
+ "learning_rate": 1.907343921121359e-05,
646
+ "loss": 0.0082,
647
+ "step": 910
648
+ },
649
+ {
650
+ "epoch": 1.4654313608288505,
651
+ "grad_norm": 0.040570441633462906,
652
+ "learning_rate": 1.8060607004049323e-05,
653
+ "loss": 0.0078,
654
+ "step": 920
655
+ },
656
+ {
657
+ "epoch": 1.481370790994222,
658
+ "grad_norm": 0.03512577340006828,
659
+ "learning_rate": 1.7069447207924994e-05,
660
+ "loss": 0.0099,
661
+ "step": 930
662
+ },
663
+ {
664
+ "epoch": 1.4973102211595934,
665
+ "grad_norm": 0.05797265097498894,
666
+ "learning_rate": 1.6100632372331727e-05,
667
+ "loss": 0.0097,
668
+ "step": 940
669
+ },
670
+ {
671
+ "epoch": 1.5132496513249651,
672
+ "grad_norm": 0.05391458421945572,
673
+ "learning_rate": 1.5154819884632609e-05,
674
+ "loss": 0.0087,
675
+ "step": 950
676
+ },
677
+ {
678
+ "epoch": 1.5291890814903368,
679
+ "grad_norm": 0.04258555918931961,
680
+ "learning_rate": 1.4232651523993634e-05,
681
+ "loss": 0.0091,
682
+ "step": 960
683
+ },
684
+ {
685
+ "epoch": 1.5451285116557083,
686
+ "grad_norm": 0.05770336836576462,
687
+ "learning_rate": 1.3334753025905838e-05,
688
+ "loss": 0.0098,
689
+ "step": 970
690
+ },
691
+ {
692
+ "epoch": 1.5610679418210798,
693
+ "grad_norm": 0.1264369785785675,
694
+ "learning_rate": 1.2461733657593722e-05,
695
+ "loss": 0.0078,
696
+ "step": 980
697
+ },
698
+ {
699
+ "epoch": 1.5770073719864515,
700
+ "grad_norm": 0.03682301193475723,
701
+ "learning_rate": 1.16141858045982e-05,
702
+ "loss": 0.0083,
703
+ "step": 990
704
+ },
705
+ {
706
+ "epoch": 1.5929468021518232,
707
+ "grad_norm": 0.03113352321088314,
708
+ "learning_rate": 1.0792684568814503e-05,
709
+ "loss": 0.0085,
710
+ "step": 1000
711
+ }
712
+ ],
713
+ "logging_steps": 10,
714
+ "max_steps": 1256,
715
+ "num_input_tokens_seen": 0,
716
+ "num_train_epochs": 2,
717
+ "save_steps": 500,
718
+ "stateful_callbacks": {
719
+ "TrainerControl": {
720
+ "args": {
721
+ "should_epoch_stop": false,
722
+ "should_evaluate": false,
723
+ "should_log": false,
724
+ "should_save": true,
725
+ "should_training_stop": false
726
+ },
727
+ "attributes": {}
728
+ }
729
+ },
730
+ "total_flos": 6.805552409615647e+17,
731
+ "train_batch_size": 2,
732
+ "trial_name": null,
733
+ "trial_params": null
734
+ }
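The `log_history` above records the training loss every 10 steps, falling from 1.1133 at step 10 to 0.0085 at step 1000, in line with the loss reduction cited in the commit message. A small sketch for pulling that curve out of a downloaded checkpoint's `trainer_state.json` (the local path is an example):

```python
import json

# Path is an example; point it at the downloaded checkpoint directory.
with open("checkpoint-1000/trainer_state.json") as f:
    state = json.load(f)

entries = [e for e in state["log_history"] if "loss" in e]
steps = [e["step"] for e in entries]
losses = [e["loss"] for e in entries]
print(f"step {steps[0]}: loss {losses[0]:.4f}  ->  step {steps[-1]}: loss {losses[-1]:.4f}")
```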
checkpoint-1000/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:246e3c8fa418ba58f4a01f958ecbdae50370f929b9cea9496ce015290e0f4654
3
+ size 5432
checkpoint-1256/README.md ADDED
@@ -0,0 +1,207 @@
1
+ ---
2
+ base_model: microsoft/Phi-3-mini-4k-instruct
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - base_model:adapter:microsoft/Phi-3-mini-4k-instruct
7
+ - lora
8
+ - transformers
9
+ ---
10
+
11
+ # Model Card for Model ID
12
+
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
+
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
+
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
135
+
136
+ #### Summary
137
+
138
+
139
+
140
+ ## Model Examination [optional]
141
+
142
+ <!-- Relevant interpretability work for the model goes here -->
143
+
144
+ [More Information Needed]
145
+
146
+ ## Environmental Impact
147
+
148
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
+
150
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
+
152
+ - **Hardware Type:** [More Information Needed]
153
+ - **Hours used:** [More Information Needed]
154
+ - **Cloud Provider:** [More Information Needed]
155
+ - **Compute Region:** [More Information Needed]
156
+ - **Carbon Emitted:** [More Information Needed]
157
+
158
+ ## Technical Specifications [optional]
159
+
160
+ ### Model Architecture and Objective
161
+
162
+ [More Information Needed]
163
+
164
+ ### Compute Infrastructure
165
+
166
+ [More Information Needed]
167
+
168
+ #### Hardware
169
+
170
+ [More Information Needed]
171
+
172
+ #### Software
173
+
174
+ [More Information Needed]
175
+
176
+ ## Citation [optional]
177
+
178
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
+
180
+ **BibTeX:**
181
+
182
+ [More Information Needed]
183
+
184
+ **APA:**
185
+
186
+ [More Information Needed]
187
+
188
+ ## Glossary [optional]
189
+
190
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
+
192
+ [More Information Needed]
193
+
194
+ ## More Information [optional]
195
+
196
+ [More Information Needed]
197
+
198
+ ## Model Card Authors [optional]
199
+
200
+ [More Information Needed]
201
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.18.0
checkpoint-1256/adapter_config.json ADDED
@@ -0,0 +1,46 @@
1
+ {
2
+ "alora_invocation_tokens": null,
3
+ "alpha_pattern": {},
4
+ "arrow_config": null,
5
+ "auto_mapping": null,
6
+ "base_model_name_or_path": "microsoft/Phi-3-mini-4k-instruct",
7
+ "bias": "none",
8
+ "corda_config": null,
9
+ "ensure_weight_tying": false,
10
+ "eva_config": null,
11
+ "exclude_modules": null,
12
+ "fan_in_fan_out": false,
13
+ "inference_mode": true,
14
+ "init_lora_weights": true,
15
+ "layer_replication": null,
16
+ "layers_pattern": null,
17
+ "layers_to_transform": null,
18
+ "loftq_config": {},
19
+ "lora_alpha": 64,
20
+ "lora_bias": false,
21
+ "lora_dropout": 0.05,
22
+ "megatron_config": null,
23
+ "megatron_core": "megatron.core",
24
+ "modules_to_save": null,
25
+ "peft_type": "LORA",
26
+ "peft_version": "0.18.0",
27
+ "qalora_group_size": 16,
28
+ "r": 32,
29
+ "rank_pattern": {},
30
+ "revision": null,
31
+ "target_modules": [
32
+ "v_proj",
33
+ "k_proj",
34
+ "o_proj",
35
+ "down_proj",
36
+ "up_proj",
37
+ "gate_proj",
38
+ "q_proj"
39
+ ],
40
+ "target_parameters": null,
41
+ "task_type": "CAUSAL_LM",
42
+ "trainable_token_indices": null,
43
+ "use_dora": false,
44
+ "use_qalora": false,
45
+ "use_rslora": false
46
+ }
checkpoint-1256/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c7841eb0730a6387ec528d5537ed2175514da9fc2dc2ea6c6fc166ab9c2ea72e
3
+ size 71320216
checkpoint-1256/chat_template.jinja ADDED
@@ -0,0 +1,8 @@
1
+ {% for message in messages %}{% if message['role'] == 'system' %}{{'<|system|>
2
+ ' + message['content'] + '<|end|>
3
+ '}}{% elif message['role'] == 'user' %}{{'<|user|>
4
+ ' + message['content'] + '<|end|>
5
+ '}}{% elif message['role'] == 'assistant' %}{{'<|assistant|>
6
+ ' + message['content'] + '<|end|>
7
+ '}}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>
8
+ ' }}{% else %}{{ eos_token }}{% endif %}
checkpoint-1256/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2dd3b94735d2a469f6c21dda1ca77ccc30db4962316172379171aa3b1177b1cd
3
+ size 36361850
checkpoint-1256/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1541b4217c6c5351fd311de2e8c44cfa362607df83d0031c65afffeb4c6d446f
3
+ size 14244
checkpoint-1256/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:af21cce5671afa8bb2519c7cf220df32523bbf25d641971cadf3f21e46ed0c52
3
+ size 1064
checkpoint-1256/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "<|endoftext|>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": "<|endoftext|>",
17
+ "unk_token": {
18
+ "content": "<unk>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ }
24
+ }
checkpoint-1256/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1256/tokenizer_config.json ADDED
@@ -0,0 +1,135 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_eos_token": false,
4
+ "add_prefix_space": null,
5
+ "added_tokens_decoder": {
6
+ "0": {
7
+ "content": "<unk>",
8
+ "lstrip": false,
9
+ "normalized": false,
10
+ "rstrip": false,
11
+ "single_word": false,
12
+ "special": true
13
+ },
14
+ "1": {
15
+ "content": "<s>",
16
+ "lstrip": false,
17
+ "normalized": false,
18
+ "rstrip": false,
19
+ "single_word": false,
20
+ "special": true
21
+ },
22
+ "2": {
23
+ "content": "</s>",
24
+ "lstrip": false,
25
+ "normalized": false,
26
+ "rstrip": true,
27
+ "single_word": false,
28
+ "special": false
29
+ },
30
+ "32000": {
31
+ "content": "<|endoftext|>",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false,
36
+ "special": true
37
+ },
38
+ "32001": {
39
+ "content": "<|assistant|>",
40
+ "lstrip": false,
41
+ "normalized": false,
42
+ "rstrip": true,
43
+ "single_word": false,
44
+ "special": true
45
+ },
46
+ "32002": {
47
+ "content": "<|placeholder1|>",
48
+ "lstrip": false,
49
+ "normalized": false,
50
+ "rstrip": true,
51
+ "single_word": false,
52
+ "special": true
53
+ },
54
+ "32003": {
55
+ "content": "<|placeholder2|>",
56
+ "lstrip": false,
57
+ "normalized": false,
58
+ "rstrip": true,
59
+ "single_word": false,
60
+ "special": true
61
+ },
62
+ "32004": {
63
+ "content": "<|placeholder3|>",
64
+ "lstrip": false,
65
+ "normalized": false,
66
+ "rstrip": true,
67
+ "single_word": false,
68
+ "special": true
69
+ },
70
+ "32005": {
71
+ "content": "<|placeholder4|>",
72
+ "lstrip": false,
73
+ "normalized": false,
74
+ "rstrip": true,
75
+ "single_word": false,
76
+ "special": true
77
+ },
78
+ "32006": {
79
+ "content": "<|system|>",
80
+ "lstrip": false,
81
+ "normalized": false,
82
+ "rstrip": true,
83
+ "single_word": false,
84
+ "special": true
85
+ },
86
+ "32007": {
87
+ "content": "<|end|>",
88
+ "lstrip": false,
89
+ "normalized": false,
90
+ "rstrip": true,
91
+ "single_word": false,
92
+ "special": true
93
+ },
94
+ "32008": {
95
+ "content": "<|placeholder5|>",
96
+ "lstrip": false,
97
+ "normalized": false,
98
+ "rstrip": true,
99
+ "single_word": false,
100
+ "special": true
101
+ },
102
+ "32009": {
103
+ "content": "<|placeholder6|>",
104
+ "lstrip": false,
105
+ "normalized": false,
106
+ "rstrip": true,
107
+ "single_word": false,
108
+ "special": true
109
+ },
110
+ "32010": {
111
+ "content": "<|user|>",
112
+ "lstrip": false,
113
+ "normalized": false,
114
+ "rstrip": true,
115
+ "single_word": false,
116
+ "special": true
117
+ }
118
+ },
119
+ "bos_token": "<s>",
120
+ "clean_up_tokenization_spaces": false,
121
+ "eos_token": "<|endoftext|>",
122
+ "extra_special_tokens": {},
123
+ "legacy": false,
124
+ "max_length": 2048,
125
+ "model_max_length": 4096,
126
+ "pad_token": "<|endoftext|>",
127
+ "padding_side": "right",
128
+ "sp_model_kwargs": {},
129
+ "stride": 0,
130
+ "tokenizer_class": "LlamaTokenizerFast",
131
+ "truncation_side": "right",
132
+ "truncation_strategy": "longest_first",
133
+ "unk_token": "<unk>",
134
+ "use_default_system_prompt": false
135
+ }
checkpoint-1256/trainer_state.json ADDED
@@ -0,0 +1,909 @@
1
+ {
2
+ "best_global_step": null,
3
+ "best_metric": null,
4
+ "best_model_checkpoint": null,
5
+ "epoch": 2.0,
6
+ "eval_steps": 500,
7
+ "global_step": 1256,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.01593943016537159,
14
+ "grad_norm": 0.12403535097837448,
15
+ "learning_rate": 1.8e-05,
16
+ "loss": 1.1133,
17
+ "step": 10
18
+ },
19
+ {
20
+ "epoch": 0.03187886033074318,
21
+ "grad_norm": 0.21290278434753418,
22
+ "learning_rate": 3.8e-05,
23
+ "loss": 1.0413,
24
+ "step": 20
25
+ },
26
+ {
27
+ "epoch": 0.04781829049611477,
28
+ "grad_norm": 0.1645481288433075,
29
+ "learning_rate": 5.8e-05,
30
+ "loss": 0.9616,
31
+ "step": 30
32
+ },
33
+ {
34
+ "epoch": 0.06375772066148636,
35
+ "grad_norm": 0.2903112471103668,
36
+ "learning_rate": 7.800000000000001e-05,
37
+ "loss": 0.874,
38
+ "step": 40
39
+ },
40
+ {
41
+ "epoch": 0.07969715082685794,
42
+ "grad_norm": 0.2749190628528595,
43
+ "learning_rate": 9.8e-05,
44
+ "loss": 0.7337,
45
+ "step": 50
46
+ },
47
+ {
48
+ "epoch": 0.09563658099222953,
49
+ "grad_norm": 0.4625573754310608,
50
+ "learning_rate": 9.99862592554908e-05,
51
+ "loss": 0.5768,
52
+ "step": 60
53
+ },
54
+ {
55
+ "epoch": 0.11157601115760112,
56
+ "grad_norm": 0.31431907415390015,
57
+ "learning_rate": 9.993877008154289e-05,
58
+ "loss": 0.4941,
59
+ "step": 70
60
+ },
61
+ {
62
+ "epoch": 0.1275154413229727,
63
+ "grad_norm": 0.43303918838500977,
64
+ "learning_rate": 9.985739505534436e-05,
65
+ "loss": 0.3854,
66
+ "step": 80
67
+ },
68
+ {
69
+ "epoch": 0.1434548714883443,
70
+ "grad_norm": 0.5850837826728821,
71
+ "learning_rate": 9.974218939375599e-05,
72
+ "loss": 0.2906,
73
+ "step": 90
74
+ },
75
+ {
76
+ "epoch": 0.15939430165371588,
77
+ "grad_norm": 0.4940532147884369,
78
+ "learning_rate": 9.959323126934831e-05,
79
+ "loss": 0.2056,
80
+ "step": 100
81
+ },
82
+ {
83
+ "epoch": 0.17533373181908746,
84
+ "grad_norm": 0.3987452983856201,
85
+ "learning_rate": 9.94106217573578e-05,
86
+ "loss": 0.1315,
87
+ "step": 110
88
+ },
89
+ {
90
+ "epoch": 0.19127316198445907,
91
+ "grad_norm": 0.42741864919662476,
92
+ "learning_rate": 9.919448476710246e-05,
93
+ "loss": 0.0859,
94
+ "step": 120
95
+ },
96
+ {
97
+ "epoch": 0.20721259214983065,
98
+ "grad_norm": 0.24628710746765137,
99
+ "learning_rate": 9.894496695790344e-05,
100
+ "loss": 0.0663,
101
+ "step": 130
102
+ },
103
+ {
104
+ "epoch": 0.22315202231520223,
105
+ "grad_norm": 0.2623507082462311,
106
+ "learning_rate": 9.866223763956955e-05,
107
+ "loss": 0.0447,
108
+ "step": 140
109
+ },
110
+ {
111
+ "epoch": 0.2390914524805738,
112
+ "grad_norm": 0.14876964688301086,
113
+ "learning_rate": 9.834648865751254e-05,
114
+ "loss": 0.0372,
115
+ "step": 150
116
+ },
117
+ {
118
+ "epoch": 0.2550308826459454,
119
+ "grad_norm": 0.1500687152147293,
120
+ "learning_rate": 9.799793426257071e-05,
121
+ "loss": 0.0356,
122
+ "step": 160
123
+ },
124
+ {
125
+ "epoch": 0.270970312811317,
126
+ "grad_norm": 0.16156131029129028,
127
+ "learning_rate": 9.76168109656295e-05,
128
+ "loss": 0.0293,
129
+ "step": 170
130
+ },
131
+ {
132
+ "epoch": 0.2869097429766886,
133
+ "grad_norm": 0.20239904522895813,
134
+ "learning_rate": 9.720337737713739e-05,
135
+ "loss": 0.025,
136
+ "step": 180
137
+ },
138
+ {
139
+ "epoch": 0.30284917314206017,
140
+ "grad_norm": 0.10117224603891373,
141
+ "learning_rate": 9.675791403162645e-05,
142
+ "loss": 0.0209,
143
+ "step": 190
144
+ },
145
+ {
146
+ "epoch": 0.31878860330743175,
147
+ "grad_norm": 0.3235134482383728,
148
+ "learning_rate": 9.628072319735607e-05,
149
+ "loss": 0.0247,
150
+ "step": 200
151
+ },
152
+ {
153
+ "epoch": 0.33472803347280333,
154
+ "grad_norm": 0.10946714133024216,
155
+ "learning_rate": 9.577212867120947e-05,
156
+ "loss": 0.0213,
157
+ "step": 210
158
+ },
159
+ {
160
+ "epoch": 0.3506674636381749,
161
+ "grad_norm": 0.08509603142738342,
162
+ "learning_rate": 9.523247555898204e-05,
163
+ "loss": 0.0178,
164
+ "step": 220
165
+ },
166
+ {
167
+ "epoch": 0.3666068938035465,
168
+ "grad_norm": 0.07975362986326218,
169
+ "learning_rate": 9.466213004121041e-05,
170
+ "loss": 0.016,
171
+ "step": 230
172
+ },
173
+ {
174
+ "epoch": 0.38254632396891813,
175
+ "grad_norm": 0.061933811753988266,
176
+ "learning_rate": 9.406147912470143e-05,
177
+ "loss": 0.0143,
178
+ "step": 240
179
+ },
180
+ {
181
+ "epoch": 0.3984857541342897,
182
+ "grad_norm": 0.22777368128299713,
183
+ "learning_rate": 9.343093037992945e-05,
184
+ "loss": 0.0294,
185
+ "step": 250
186
+ },
187
+ {
188
+ "epoch": 0.4144251842996613,
189
+ "grad_norm": 0.11101654917001724,
190
+ "learning_rate": 9.277091166448022e-05,
191
+ "loss": 0.0159,
192
+ "step": 260
193
+ },
194
+ {
195
+ "epoch": 0.4303646144650329,
196
+ "grad_norm": 0.27989494800567627,
197
+ "learning_rate": 9.208187083272894e-05,
198
+ "loss": 0.0169,
199
+ "step": 270
200
+ },
201
+ {
202
+ "epoch": 0.44630404463040446,
203
+ "grad_norm": 0.07749247550964355,
204
+ "learning_rate": 9.136427543194967e-05,
205
+ "loss": 0.017,
206
+ "step": 280
207
+ },
208
+ {
209
+ "epoch": 0.46224347479577604,
210
+ "grad_norm": 0.14141874015331268,
211
+ "learning_rate": 9.061861238506194e-05,
212
+ "loss": 0.0152,
213
+ "step": 290
214
+ },
215
+ {
216
+ "epoch": 0.4781829049611476,
217
+ "grad_norm": 0.09233926236629486,
218
+ "learning_rate": 8.984538766023024e-05,
219
+ "loss": 0.0148,
220
+ "step": 300
221
+ },
222
+ {
223
+ "epoch": 0.4941223351265192,
224
+ "grad_norm": 0.08206266909837723,
225
+ "learning_rate": 8.904512592754034e-05,
226
+ "loss": 0.013,
227
+ "step": 310
228
+ },
229
+ {
230
+ "epoch": 0.5100617652918908,
231
+ "grad_norm": 0.09603980928659439,
232
+ "learning_rate": 8.821837020298547e-05,
233
+ "loss": 0.0122,
234
+ "step": 320
235
+ },
236
+ {
237
+ "epoch": 0.5260011954572624,
238
+ "grad_norm": 0.07990539819002151,
239
+ "learning_rate": 8.736568148000386e-05,
240
+ "loss": 0.0118,
241
+ "step": 330
242
+ },
243
+ {
244
+ "epoch": 0.541940625622634,
245
+ "grad_norm": 0.10424526035785675,
246
+ "learning_rate": 8.648763834881782e-05,
247
+ "loss": 0.0125,
248
+ "step": 340
249
+ },
250
+ {
251
+ "epoch": 0.5578800557880056,
252
+ "grad_norm": 0.13267765939235687,
253
+ "learning_rate": 8.558483660383245e-05,
254
+ "loss": 0.0134,
255
+ "step": 350
256
+ },
257
+ {
258
+ "epoch": 0.5738194859533772,
259
+ "grad_norm": 0.11512497812509537,
260
+ "learning_rate": 8.46578888393606e-05,
261
+ "loss": 0.0136,
262
+ "step": 360
263
+ },
264
+ {
265
+ "epoch": 0.5897589161187488,
266
+ "grad_norm": 0.09562960267066956,
267
+ "learning_rate": 8.37074240339482e-05,
268
+ "loss": 0.0115,
269
+ "step": 370
270
+ },
271
+ {
272
+ "epoch": 0.6056983462841203,
273
+ "grad_norm": 0.08062624931335449,
274
+ "learning_rate": 8.273408712358211e-05,
275
+ "loss": 0.0103,
276
+ "step": 380
277
+ },
278
+ {
279
+ "epoch": 0.6216377764494919,
280
+ "grad_norm": 0.055163562297821045,
281
+ "learning_rate": 8.173853856407011e-05,
282
+ "loss": 0.0123,
283
+ "step": 390
284
+ },
285
+ {
286
+ "epoch": 0.6375772066148635,
287
+ "grad_norm": 0.0700579285621643,
288
+ "learning_rate": 8.072145388289001e-05,
289
+ "loss": 0.013,
290
+ "step": 400
291
+ },
292
+ {
293
+ "epoch": 0.6535166367802351,
294
+ "grad_norm": 0.053463902324438095,
295
+ "learning_rate": 7.968352322081169e-05,
296
+ "loss": 0.0133,
297
+ "step": 410
298
+ },
299
+ {
300
+ "epoch": 0.6694560669456067,
301
+ "grad_norm": 0.09448480606079102,
302
+ "learning_rate": 7.86254508636036e-05,
303
+ "loss": 0.0142,
304
+ "step": 420
305
+ },
306
+ {
307
+ "epoch": 0.6853954971109782,
308
+ "grad_norm": 0.055674418807029724,
309
+ "learning_rate": 7.7547954764141e-05,
310
+ "loss": 0.0108,
311
+ "step": 430
312
+ },
313
+ {
314
+ "epoch": 0.7013349272763498,
315
+ "grad_norm": 0.07148294895887375,
316
+ "learning_rate": 7.645176605524049e-05,
317
+ "loss": 0.011,
318
+ "step": 440
319
+ },
320
+ {
321
+ "epoch": 0.7172743574417214,
322
+ "grad_norm": 0.062315683811903,
323
+ "learning_rate": 7.533762855355126e-05,
324
+ "loss": 0.011,
325
+ "step": 450
326
+ },
327
+ {
328
+ "epoch": 0.733213787607093,
329
+ "grad_norm": 0.13653713464736938,
330
+ "learning_rate": 7.420629825483993e-05,
331
+ "loss": 0.0106,
332
+ "step": 460
333
+ },
334
+ {
335
+ "epoch": 0.7491532177724647,
336
+ "grad_norm": 0.06053169071674347,
337
+ "learning_rate": 7.305854282101096e-05,
338
+ "loss": 0.0112,
339
+ "step": 470
340
+ },
341
+ {
342
+ "epoch": 0.7650926479378363,
343
+ "grad_norm": 0.05051583796739578,
344
+ "learning_rate": 7.189514105921132e-05,
345
+ "loss": 0.0104,
346
+ "step": 480
347
+ },
348
+ {
349
+ "epoch": 0.7810320781032078,
350
+ "grad_norm": 0.054913230240345,
351
+ "learning_rate": 7.071688239337245e-05,
352
+ "loss": 0.0092,
353
+ "step": 490
354
+ },
355
+ {
356
+ "epoch": 0.7969715082685794,
357
+ "grad_norm": 0.04786158353090286,
358
+ "learning_rate": 6.95245663285482e-05,
359
+ "loss": 0.01,
360
+ "step": 500
361
+ },
362
+ {
363
+ "epoch": 0.812910938433951,
364
+ "grad_norm": 0.055493079125881195,
365
+ "learning_rate": 6.831900190841232e-05,
366
+ "loss": 0.0149,
367
+ "step": 510
368
+ },
369
+ {
370
+ "epoch": 0.8288503685993226,
371
+ "grad_norm": 0.035253629088401794,
372
+ "learning_rate": 6.710100716628344e-05,
373
+ "loss": 0.0102,
374
+ "step": 520
375
+ },
376
+ {
377
+ "epoch": 0.8447897987646942,
378
+ "grad_norm": 0.10126087814569473,
379
+ "learning_rate": 6.58714085700503e-05,
380
+ "loss": 0.0102,
381
+ "step": 530
382
+ },
383
+ {
384
+ "epoch": 0.8607292289300658,
385
+ "grad_norm": 0.03980156034231186,
386
+ "learning_rate": 6.46310404613735e-05,
387
+ "loss": 0.0092,
388
+ "step": 540
389
+ },
390
+ {
391
+ "epoch": 0.8766686590954373,
392
+ "grad_norm": 0.05558431148529053,
393
+ "learning_rate": 6.338074448954471e-05,
394
+ "loss": 0.0086,
395
+ "step": 550
396
+ },
397
+ {
398
+ "epoch": 0.8926080892608089,
399
+ "grad_norm": 0.04321667551994324,
400
+ "learning_rate": 6.21213690403873e-05,
401
+ "loss": 0.0108,
402
+ "step": 560
403
+ },
404
+ {
405
+ "epoch": 0.9085475194261805,
406
+ "grad_norm": 0.06686638295650482,
407
+ "learning_rate": 6.0853768660585684e-05,
408
+ "loss": 0.0099,
409
+ "step": 570
410
+ },
411
+ {
412
+ "epoch": 0.9244869495915521,
413
+ "grad_norm": 0.05315929278731346,
414
+ "learning_rate": 5.957880347783449e-05,
415
+ "loss": 0.0108,
416
+ "step": 580
417
+ },
418
+ {
419
+ "epoch": 0.9404263797569237,
420
+ "grad_norm": 0.04230858013033867,
421
+ "learning_rate": 5.829733861720059e-05,
422
+ "loss": 0.0089,
423
+ "step": 590
424
+ },
425
+ {
426
+ "epoch": 0.9563658099222953,
427
+ "grad_norm": 0.0437534861266613,
428
+ "learning_rate": 5.70102436140943e-05,
429
+ "loss": 0.0083,
430
+ "step": 600
431
+ },
432
+ {
433
+ "epoch": 0.9723052400876668,
434
+ "grad_norm": 0.054816000163555145,
435
+ "learning_rate": 5.571839182424775e-05,
436
+ "loss": 0.0093,
437
+ "step": 610
438
+ },
439
+ {
440
+ "epoch": 0.9882446702530384,
441
+ "grad_norm": 0.03881492838263512,
442
+ "learning_rate": 5.442265983110123e-05,
443
+ "loss": 0.0088,
444
+ "step": 620
445
+ },
446
+ {
447
+ "epoch": 1.0031878860330743,
448
+ "grad_norm": 0.04418308287858963,
449
+ "learning_rate": 5.312392685099915e-05,
450
+ "loss": 0.0104,
451
+ "step": 630
452
+ },
453
+ {
454
+ "epoch": 1.019127316198446,
455
+ "grad_norm": 0.05048086866736412,
456
+ "learning_rate": 5.1823074136599605e-05,
457
+ "loss": 0.0088,
458
+ "step": 640
459
+ },
460
+ {
461
+ "epoch": 1.0350667463638175,
462
+ "grad_norm": 0.05302772670984268,
463
+ "learning_rate": 5.0520984378902146e-05,
464
+ "loss": 0.0098,
465
+ "step": 650
466
+ },
467
+ {
468
+ "epoch": 1.0510061765291892,
469
+ "grad_norm": 0.056556668132543564,
470
+ "learning_rate": 4.921854110829962e-05,
471
+ "loss": 0.0095,
472
+ "step": 660
473
+ },
474
+ {
475
+ "epoch": 1.0669456066945606,
476
+ "grad_norm": 0.04491036757826805,
477
+ "learning_rate": 4.791662809506025e-05,
478
+ "loss": 0.0091,
479
+ "step": 670
480
+ },
481
+ {
482
+ "epoch": 1.0828850368599323,
483
+ "grad_norm": 0.0727638527750969,
484
+ "learning_rate": 4.66161287496473e-05,
485
+ "loss": 0.0089,
486
+ "step": 680
487
+ },
488
+ {
489
+ "epoch": 1.0988244670253038,
490
+ "grad_norm": 0.06548187881708145,
491
+ "learning_rate": 4.5317925523282464e-05,
492
+ "loss": 0.0094,
493
+ "step": 690
494
+ },
495
+ {
496
+ "epoch": 1.1147638971906755,
497
+ "grad_norm": 0.04109744727611542,
498
+ "learning_rate": 4.402289930916053e-05,
499
+ "loss": 0.0088,
500
+ "step": 700
501
+ },
502
+ {
503
+ "epoch": 1.130703327356047,
504
+ "grad_norm": 0.07570062577724457,
505
+ "learning_rate": 4.2731928844720994e-05,
506
+ "loss": 0.0093,
507
+ "step": 710
508
+ },
509
+ {
510
+ "epoch": 1.1466427575214186,
511
+ "grad_norm": 0.04505423083901405,
512
+ "learning_rate": 4.1445890115382505e-05,
513
+ "loss": 0.0085,
514
+ "step": 720
515
+ },
516
+ {
517
+ "epoch": 1.1625821876867901,
518
+ "grad_norm": 0.040313106030225754,
519
+ "learning_rate": 4.016565576014478e-05,
520
+ "loss": 0.0092,
521
+ "step": 730
522
+ },
523
+ {
524
+ "epoch": 1.1785216178521618,
525
+ "grad_norm": 0.07987982034683228,
526
+ "learning_rate": 3.889209447946116e-05,
527
+ "loss": 0.0134,
528
+ "step": 740
529
+ },
530
+ {
531
+ "epoch": 1.1944610480175333,
532
+ "grad_norm": 0.045985572040081024,
533
+ "learning_rate": 3.762607044578357e-05,
534
+ "loss": 0.0093,
535
+ "step": 750
536
+ },
537
+ {
538
+ "epoch": 1.210400478182905,
539
+ "grad_norm": 0.0642857551574707,
540
+ "learning_rate": 3.636844271718016e-05,
541
+ "loss": 0.0103,
542
+ "step": 760
543
+ },
544
+ {
545
+ "epoch": 1.2263399083482764,
546
+ "grad_norm": 0.03940548002719879,
547
+ "learning_rate": 3.512006465442309e-05,
548
+ "loss": 0.0113,
549
+ "step": 770
550
+ },
551
+ {
552
+ "epoch": 1.2422793385136481,
553
+ "grad_norm": 0.05522974207997322,
554
+ "learning_rate": 3.388178334194232e-05,
555
+ "loss": 0.0088,
556
+ "step": 780
557
+ },
558
+ {
559
+ "epoch": 1.2582187686790198,
560
+ "grad_norm": 0.04090171679854393,
561
+ "learning_rate": 3.2654439013038165e-05,
562
+ "loss": 0.0092,
563
+ "step": 790
564
+ },
565
+ {
566
+ "epoch": 1.2741581988443913,
567
+ "grad_norm": 0.04401571303606033,
568
+ "learning_rate": 3.143886447974269e-05,
569
+ "loss": 0.008,
570
+ "step": 800
571
+ },
572
+ {
573
+ "epoch": 1.2900976290097628,
574
+ "grad_norm": 0.037025950849056244,
575
+ "learning_rate": 3.0235884567716737e-05,
576
+ "loss": 0.0074,
577
+ "step": 810
578
+ },
579
+ {
580
+ "epoch": 1.3060370591751345,
581
+ "grad_norm": 0.050563715398311615,
582
+ "learning_rate": 2.904631555656616e-05,
583
+ "loss": 0.0093,
584
+ "step": 820
585
+ },
586
+ {
587
+ "epoch": 1.3219764893405062,
588
+ "grad_norm": 0.10221158713102341,
589
+ "learning_rate": 2.7870964625956985e-05,
590
+ "loss": 0.0127,
591
+ "step": 830
592
+ },
593
+ {
594
+ "epoch": 1.3379159195058776,
595
+ "grad_norm": 0.05870247632265091,
596
+ "learning_rate": 2.671062930790511e-05,
597
+ "loss": 0.0088,
598
+ "step": 840
599
+ },
600
+ {
601
+ "epoch": 1.3538553496712493,
602
+ "grad_norm": 0.03522193804383278,
603
+ "learning_rate": 2.5566096945612727e-05,
604
+ "loss": 0.0103,
605
+ "step": 850
606
+ },
607
+ {
608
+ "epoch": 1.3697947798366208,
609
+ "grad_norm": 0.0318855457007885,
610
+ "learning_rate": 2.443814415921809e-05,
611
+ "loss": 0.0097,
612
+ "step": 860
613
+ },
614
+ {
615
+ "epoch": 1.3857342100019925,
616
+ "grad_norm": 0.04261662811040878,
617
+ "learning_rate": 2.3327536318821495e-05,
618
+ "loss": 0.0085,
619
+ "step": 870
620
+ },
621
+ {
622
+ "epoch": 1.401673640167364,
623
+ "grad_norm": 0.06472990661859512,
624
+ "learning_rate": 2.223502702514487e-05,
625
+ "loss": 0.0099,
626
+ "step": 880
627
+ },
628
+ {
629
+ "epoch": 1.4176130703327356,
630
+ "grad_norm": 0.033623527735471725,
631
+ "learning_rate": 2.1161357598177696e-05,
632
+ "loss": 0.0068,
633
+ "step": 890
634
+ },
635
+ {
636
+ "epoch": 1.4335525004981071,
637
+ "grad_norm": 0.07391348481178284,
638
+ "learning_rate": 2.0107256574155564e-05,
639
+ "loss": 0.0086,
640
+ "step": 900
641
+ },
642
+ {
643
+ "epoch": 1.4494919306634788,
644
+ "grad_norm": 0.04469392076134682,
645
+ "learning_rate": 1.907343921121359e-05,
646
+ "loss": 0.0082,
647
+ "step": 910
648
+ },
649
+ {
650
+ "epoch": 1.4654313608288505,
651
+ "grad_norm": 0.040570441633462906,
652
+ "learning_rate": 1.8060607004049323e-05,
653
+ "loss": 0.0078,
654
+ "step": 920
655
+ },
656
+ {
657
+ "epoch": 1.481370790994222,
658
+ "grad_norm": 0.03512577340006828,
659
+ "learning_rate": 1.7069447207924994e-05,
660
+ "loss": 0.0099,
661
+ "step": 930
662
+ },
663
+ {
664
+ "epoch": 1.4973102211595934,
665
+ "grad_norm": 0.05797265097498894,
666
+ "learning_rate": 1.6100632372331727e-05,
667
+ "loss": 0.0097,
668
+ "step": 940
669
+ },
670
+ {
671
+ "epoch": 1.5132496513249651,
672
+ "grad_norm": 0.05391458421945572,
673
+ "learning_rate": 1.5154819884632609e-05,
674
+ "loss": 0.0087,
675
+ "step": 950
676
+ },
677
+ {
678
+ "epoch": 1.5291890814903368,
679
+ "grad_norm": 0.04258555918931961,
680
+ "learning_rate": 1.4232651523993634e-05,
681
+ "loss": 0.0091,
682
+ "step": 960
683
+ },
684
+ {
685
+ "epoch": 1.5451285116557083,
686
+ "grad_norm": 0.05770336836576462,
687
+ "learning_rate": 1.3334753025905838e-05,
688
+ "loss": 0.0098,
689
+ "step": 970
690
+ },
691
+ {
692
+ "epoch": 1.5610679418210798,
693
+ "grad_norm": 0.1264369785785675,
694
+ "learning_rate": 1.2461733657593722e-05,
695
+ "loss": 0.0078,
696
+ "step": 980
697
+ },
698
+ {
699
+ "epoch": 1.5770073719864515,
700
+ "grad_norm": 0.03682301193475723,
701
+ "learning_rate": 1.16141858045982e-05,
702
+ "loss": 0.0083,
703
+ "step": 990
704
+ },
705
+ {
706
+ "epoch": 1.5929468021518232,
707
+ "grad_norm": 0.03113352321088314,
708
+ "learning_rate": 1.0792684568814503e-05,
709
+ "loss": 0.0085,
710
+ "step": 1000
711
+ },
712
+ {
713
+ "epoch": 1.6088862323171946,
714
+ "grad_norm": 0.03550567850470543,
715
+ "learning_rate": 9.997787378258121e-06,
716
+ "loss": 0.0084,
717
+ "step": 1010
718
+ },
719
+ {
720
+ "epoch": 1.624825662482566,
721
+ "grad_norm": 0.03575735166668892,
722
+ "learning_rate": 9.23003360882293e-06,
723
+ "loss": 0.009,
724
+ "step": 1020
725
+ },
726
+ {
727
+ "epoch": 1.6407650926479378,
728
+ "grad_norm": 0.0562940277159214,
729
+ "learning_rate": 8.489944218288908e-06,
730
+ "loss": 0.0097,
731
+ "step": 1030
732
+ },
733
+ {
734
+ "epoch": 1.6567045228133095,
735
+ "grad_norm": 0.04722534865140915,
736
+ "learning_rate": 7.778021392827211e-06,
737
+ "loss": 0.0078,
738
+ "step": 1040
739
+ },
740
+ {
741
+ "epoch": 1.672643952978681,
742
+ "grad_norm": 0.04047123342752457,
743
+ "learning_rate": 7.094748206242796e-06,
744
+ "loss": 0.0114,
745
+ "step": 1050
746
+ },
747
+ {
748
+ "epoch": 1.6885833831440527,
749
+ "grad_norm": 0.06225157901644707,
750
+ "learning_rate": 6.440588292185595e-06,
751
+ "loss": 0.0078,
752
+ "step": 1060
753
+ },
754
+ {
755
+ "epoch": 1.7045228133094241,
756
+ "grad_norm": 0.03725333884358406,
757
+ "learning_rate": 5.815985529552942e-06,
758
+ "loss": 0.008,
759
+ "step": 1070
760
+ },
761
+ {
762
+ "epoch": 1.7204622434747958,
763
+ "grad_norm": 0.029080206528306007,
764
+ "learning_rate": 5.221363741296298e-06,
765
+ "loss": 0.0083,
766
+ "step": 1080
767
+ },
768
+ {
769
+ "epoch": 1.7364016736401675,
770
+ "grad_norm": 0.03860724717378616,
771
+ "learning_rate": 4.657126406837148e-06,
772
+ "loss": 0.0079,
773
+ "step": 1090
774
+ },
775
+ {
776
+ "epoch": 1.752341103805539,
777
+ "grad_norm": 0.10423146188259125,
778
+ "learning_rate": 4.123656388286812e-06,
779
+ "loss": 0.0086,
780
+ "step": 1100
781
+ },
782
+ {
783
+ "epoch": 1.7682805339709105,
784
+ "grad_norm": 0.03647388890385628,
785
+ "learning_rate": 3.621315670656117e-06,
786
+ "loss": 0.007,
787
+ "step": 1110
788
+ },
789
+ {
790
+ "epoch": 1.7842199641362821,
791
+ "grad_norm": 0.03490564227104187,
792
+ "learning_rate": 3.1504451162311986e-06,
793
+ "loss": 0.0068,
794
+ "step": 1120
795
+ },
796
+ {
797
+ "epoch": 1.8001593943016538,
798
+ "grad_norm": 0.03614771366119385,
799
+ "learning_rate": 2.7113642332821045e-06,
800
+ "loss": 0.0086,
801
+ "step": 1130
802
+ },
803
+ {
804
+ "epoch": 1.8160988244670253,
805
+ "grad_norm": 0.04321877658367157,
806
+ "learning_rate": 2.3043709592610485e-06,
807
+ "loss": 0.0077,
808
+ "step": 1140
809
+ },
810
+ {
811
+ "epoch": 1.8320382546323968,
812
+ "grad_norm": 0.04423265904188156,
813
+ "learning_rate": 1.929741458637618e-06,
814
+ "loss": 0.0112,
815
+ "step": 1150
816
+ },
817
+ {
818
+ "epoch": 1.8479776847977685,
819
+ "grad_norm": 0.055377379059791565,
820
+ "learning_rate": 1.5877299355078533e-06,
821
+ "loss": 0.0077,
822
+ "step": 1160
823
+ },
824
+ {
825
+ "epoch": 1.8639171149631402,
826
+ "grad_norm": 0.05351576954126358,
827
+ "learning_rate": 1.2785684611046344e-06,
828
+ "loss": 0.0088,
829
+ "step": 1170
830
+ },
831
+ {
832
+ "epoch": 1.8798565451285116,
833
+ "grad_norm": 0.04308084771037102,
834
+ "learning_rate": 1.002466816326164e-06,
835
+ "loss": 0.0087,
836
+ "step": 1180
837
+ },
838
+ {
839
+ "epoch": 1.895795975293883,
840
+ "grad_norm": 0.05896038934588432,
841
+ "learning_rate": 7.596123493895991e-07,
842
+ "loss": 0.0106,
843
+ "step": 1190
844
+ },
845
+ {
846
+ "epoch": 1.9117354054592548,
847
+ "grad_norm": 0.039490342140197754,
848
+ "learning_rate": 5.501698487062446e-07,
849
+ "loss": 0.0089,
850
+ "step": 1200
851
+ },
852
+ {
853
+ "epoch": 1.9276748356246265,
854
+ "grad_norm": 0.03349680081009865,
855
+ "learning_rate": 3.742814310647602e-07,
856
+ "loss": 0.0081,
857
+ "step": 1210
858
+ },
859
+ {
860
+ "epoch": 1.943614265789998,
861
+ "grad_norm": 0.05369997024536133,
862
+ "learning_rate": 2.320664451980592e-07,
863
+ "loss": 0.0078,
864
+ "step": 1220
865
+ },
866
+ {
867
+ "epoch": 1.9595536959553694,
868
+ "grad_norm": 0.04113243520259857,
869
+ "learning_rate": 1.236213907994943e-07,
870
+ "loss": 0.0076,
871
+ "step": 1230
872
+ },
873
+ {
874
+ "epoch": 1.9754931261207411,
875
+ "grad_norm": 0.04669644311070442,
876
+ "learning_rate": 4.901985304315848e-08,
877
+ "loss": 0.0094,
878
+ "step": 1240
879
+ },
880
+ {
881
+ "epoch": 1.9914325562861128,
882
+ "grad_norm": 0.05660584941506386,
883
+ "learning_rate": 8.312452652831093e-09,
884
+ "loss": 0.0082,
885
+ "step": 1250
886
+ }
887
+ ],
888
+ "logging_steps": 10,
889
+ "max_steps": 1256,
890
+ "num_input_tokens_seen": 0,
891
+ "num_train_epochs": 2,
892
+ "save_steps": 500,
893
+ "stateful_callbacks": {
894
+ "TrainerControl": {
895
+ "args": {
896
+ "should_epoch_stop": false,
897
+ "should_evaluate": false,
898
+ "should_log": false,
899
+ "should_save": true,
900
+ "should_training_stop": true
901
+ },
902
+ "attributes": {}
903
+ }
904
+ },
905
+ "total_flos": 8.551088583926661e+17,
906
+ "train_batch_size": 2,
907
+ "trial_name": null,
908
+ "trial_params": null
909
+ }
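The logged training loss falls from 1.1133 at step 10 to 0.0082 at step 1250, which is where the roughly 99.3% loss reduction quoted for this stage comes from. A minimal sketch of recomputing that figure from trainer_state.json (the file name and layout come from this commit; the script itself is plain standard-library Python):

```python
import json

with open("checkpoint-1256/trainer_state.json") as f:
    state = json.load(f)

# Keep only the log entries that carry a training loss.
losses = [(e["step"], e["loss"]) for e in state["log_history"] if "loss" in e]

first_step, first_loss = losses[0]
last_step, last_loss = losses[-1]
reduction = 100.0 * (1.0 - last_loss / first_loss)

print(f"loss {first_loss} @ step {first_step} -> {last_loss} @ step {last_step}")
print(f"reduction: {reduction:.1f}%")  # ~99.3%
```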
checkpoint-1256/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:246e3c8fa418ba58f4a01f958ecbdae50370f929b9cea9496ce015290e0f4654
+ size 5432
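training_args.bin is stored as a Git LFS pointer; the underlying object is typically the pickled TrainingArguments that transformers' Trainer writes alongside each checkpoint. A hedged sketch of inspecting it after fetching the real file (weights_only=False is needed on recent PyTorch versions because this is an arbitrary pickled object rather than a tensor file):

```python
import torch

# Assumes the actual LFS object has been downloaded, not just the pointer file.
args = torch.load("checkpoint-1256/training_args.bin", weights_only=False)

print(args.num_train_epochs, args.per_device_train_batch_size, args.learning_rate)
```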
logs/events.out.tfevents.1766981779.3043de093a88.2371.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:518821adb0c4531d396eccb0799682ba40312f6961de4db9801247fcb7011fb5
+ size 31943
merged/generation_config.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 1,
+ "eos_token_id": [
+ 32000,
+ 32001,
+ 32007
+ ],
+ "pad_token_id": 32000,
+ "transformers_version": "4.57.3"
+ }
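The merged model's generation config stops decoding at any of three token ids: 32000 (`<|endoftext|>`), 32001 (`<|assistant|>`) or 32007 (`<|end|>`), and pads with 32000. A minimal sketch of generating with that config, assuming the merged weights sit in the `merged/` directory and the root-level tokenizer files are used (the prompt and paths are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("merged")  # picks up merged/generation_config.json
tok = AutoTokenizer.from_pretrained(".")                # root tokenizer.json / tokenizer_config.json

# Illustrative Phi-3-style chat prompt for PQL code generation.
prompt = "<|user|>\nWrite a PQL query that counts events per day.<|end|>\n<|assistant|>\n"
inputs = tok(prompt, return_tensors="pt")

# Generation halts at <|endoftext|>, <|assistant|> or <|end|> because
# eos_token_id is the list [32000, 32001, 32007] in the config above.
out = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```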
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "<|endoftext|>",
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,135 @@
+ {
+ "add_bos_token": false,
+ "add_eos_token": false,
+ "add_prefix_space": null,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": true,
+ "single_word": false,
+ "special": false
+ },
+ "32000": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32001": {
+ "content": "<|assistant|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": true,
+ "single_word": false,
+ "special": true
+ },
+ "32002": {
+ "content": "<|placeholder1|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": true,
+ "single_word": false,
+ "special": true
+ },
+ "32003": {
+ "content": "<|placeholder2|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": true,
+ "single_word": false,
+ "special": true
+ },
+ "32004": {
+ "content": "<|placeholder3|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": true,
+ "single_word": false,
+ "special": true
+ },
+ "32005": {
+ "content": "<|placeholder4|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": true,
+ "single_word": false,
+ "special": true
+ },
+ "32006": {
+ "content": "<|system|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": true,
+ "single_word": false,
+ "special": true
+ },
+ "32007": {
+ "content": "<|end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": true,
+ "single_word": false,
+ "special": true
+ },
+ "32008": {
+ "content": "<|placeholder5|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": true,
+ "single_word": false,
+ "special": true
+ },
+ "32009": {
+ "content": "<|placeholder6|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": true,
+ "single_word": false,
+ "special": true
+ },
+ "32010": {
+ "content": "<|user|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": true,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|endoftext|>",
+ "extra_special_tokens": {},
+ "legacy": false,
+ "max_length": 2048,
+ "model_max_length": 4096,
+ "pad_token": "<|endoftext|>",
+ "padding_side": "right",
+ "sp_model_kwargs": {},
+ "stride": 0,
+ "tokenizer_class": "LlamaTokenizerFast",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "<unk>",
+ "use_default_system_prompt": false
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:246e3c8fa418ba58f4a01f958ecbdae50370f929b9cea9496ce015290e0f4654
+ size 5432
training_metadata.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "stage": "2_SFT",
+ "base_model": "/workspace/nishka-gkc-phi3-base",
+ "training_start": "2025-12-29 04:16:19.019832",
+ "dataset": {
+ "path": "/workspace/pql_sft_training.jsonl",
+ "examples": 10038,
+ "estimated_tokens": 7000000
+ },
+ "lora_config": {
+ "r": 32,
+ "alpha": 64,
+ "dropout": 0.05,
+ "target_modules": [
+ "q_proj",
+ "k_proj",
+ "v_proj",
+ "o_proj",
+ "gate_proj",
+ "up_proj",
+ "down_proj"
+ ]
+ },
+ "training_args": {
+ "epochs": 2,
+ "batch_size": 2,
+ "learning_rate": 0.0001,
+ "max_length": 2048
+ }
+ }
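training_metadata.json records the Stage 2 SFT setup: LoRA rank 32 with alpha 64 and dropout 0.05 on all attention and MLP projections, trained for 2 epochs at learning rate 1e-4 over 10,038 PQL examples. A minimal PEFT sketch that reproduces that adapter configuration (the base-model path is copied from the metadata and is workspace-local, so substitute your own copy of the base checkpoint):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# "/workspace/nishka-gkc-phi3-base" is the path recorded in the metadata;
# it is not part of this repo, so point this at your own base model.
base = AutoModelForCausalLM.from_pretrained("/workspace/nishka-gkc-phi3-base")
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```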