Clean up: Remove redundant training checkpoint folder

Files changed (13)

last-checkpoint/README.md +0 -209
last-checkpoint/adapter_config.json +0 -46
last-checkpoint/adapter_model.safetensors +0 -3
last-checkpoint/chat_template.jinja +0 -87
last-checkpoint/optimizer.pt +0 -3
last-checkpoint/rng_state.pth +0 -3
last-checkpoint/scaler.pt +0 -3
last-checkpoint/scheduler.pt +0 -3
last-checkpoint/special_tokens_map.json +0 -24
last-checkpoint/tokenizer.json +0 -3
last-checkpoint/tokenizer_config.json +0 -0
last-checkpoint/trainer_state.json +0 -839
last-checkpoint/training_args.bin +0 -3

last-checkpoint/README.md DELETED

@@ -1,209 +0,0 @@
----
-base_model: mistralai/Mistral-Nemo-Instruct-2407
-library_name: peft
-pipeline_tag: text-generation
-tags:
-- base_model:adapter:mistralai/Mistral-Nemo-Instruct-2407
-- lora
-- sft
-- transformers
-- trl
----
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]
-### Framework versions
-- PEFT 0.18.1

last-checkpoint/adapter_config.json DELETED

@@ -1,46 +0,0 @@
-{
-  "alora_invocation_tokens": null,
-  "alpha_pattern": {},
-  "arrow_config": null,
-  "auto_mapping": null,
-  "base_model_name_or_path": "mistralai/Mistral-Nemo-Instruct-2407",
-  "bias": "none",
-  "corda_config": null,
-  "ensure_weight_tying": false,
-  "eva_config": null,
-  "exclude_modules": null,
-  "fan_in_fan_out": false,
-  "inference_mode": true,
-  "init_lora_weights": true,
-  "layer_replication": null,
-  "layers_pattern": null,
-  "layers_to_transform": null,
-  "loftq_config": {},
-  "lora_alpha": 32,
-  "lora_bias": false,
-  "lora_dropout": 0.05,
-  "megatron_config": null,
-  "megatron_core": "megatron.core",
-  "modules_to_save": null,
-  "peft_type": "LORA",
-  "peft_version": "0.18.1",
-  "qalora_group_size": 16,
-  "r": 16,
-  "rank_pattern": {},
-  "revision": null,
-  "target_modules": [
-    "up_proj",
-    "q_proj",
-    "down_proj",
-    "o_proj",
-    "v_proj",
-    "gate_proj",
-    "k_proj"
-  ],
-  "target_parameters": null,
-  "task_type": "CAUSAL_LM",
-  "trainable_token_indices": null,
-  "use_dora": false,
-  "use_qalora": false,
-  "use_rslora": false
-}
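
For readers skimming the deleted adapter config, the effective LoRA scaling factor follows from `r`, `lora_alpha`, and `use_rslora`. The snippet below is a minimal sketch, assuming the usual convention of `lora_alpha / r` (or `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled). The helper name `lora_scaling` is hypothetical, and the dictionary copies only the three relevant fields from the config above.

```python
import math

# Subset of the deleted adapter_config.json (values copied from the diff above).
adapter_config = {"r": 16, "lora_alpha": 32, "use_rslora": False}


def lora_scaling(cfg):
    """Return the multiplier applied to the low-rank update (hypothetical helper)."""
    r = cfg["r"]
    alpha = cfg["lora_alpha"]
    if cfg.get("use_rslora", False):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return alpha / math.sqrt(r)
    return alpha / r


scaling = lora_scaling(adapter_config)
print(scaling)
```

With the values in this checkpoint the low-rank update is scaled by 2.0 before being added to each of the seven targeted projection modules.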

last-checkpoint/adapter_model.safetensors DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:cd11b39803251198dcb7e030bb69c10b05cece6a9e45160afcc921794cb790cc
-size 228140600
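
The three-line bodies shown for the binary files in this commit are Git LFS pointer files, not the binaries themselves: each records the spec version, a SHA-256 object ID, and the size in bytes. As a rough sketch, a pointer like the one above can be parsed as follows; `parse_lfs_pointer` is a hypothetical helper written for illustration and is not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line has the form "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer text copied from the deleted adapter_model.safetensors entry.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:cd11b39803251198dcb7e030bb69c10b05cece6a9e45160afcc921794cb790cc\n"
    "size 228140600\n"
)
print(pointer["size"] / 1e6, "MB")
```

Read this way, the removed adapter weights were about 228 MB and the optimizer state about 116 MB, which is the bulk of what this cleanup frees.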

last-checkpoint/chat_template.jinja DELETED

@@ -1,87 +0,0 @@
-{%- if messages[0]["role"] == "system" %}
-    {%- set system_message = messages[0]["content"] %}
-    {%- set loop_messages = messages[1:] %}
-{%- else %}
-    {%- set loop_messages = messages %}
-{%- endif %}
-{%- if not tools is defined %}
-    {%- set tools = none %}
-{%- endif %}
-{%- set user_messages = loop_messages | selectattr("role", "equalto", "user") | list %}
-{#- This block checks for alternating user/assistant messages, skipping tool calling messages #}
-{%- set ns = namespace() %}
-{%- set ns.index = 0 %}
-{%- for message in loop_messages %}
-    {%- if not (message.role == "tool" or message.role == "tool_results" or (message.tool_calls is defined and message.tool_calls is not none)) %}
-        {%- if (message["role"] == "user") != (ns.index % 2 == 0) %}
-            {{- raise_exception("After the optional system message, conversation roles must alternate user/assistant/user/assistant/...") }}
-        {%- endif %}
-        {%- set ns.index = ns.index + 1 %}
-    {%- endif %}
-{%- endfor %}
-{{- bos_token }}
-{%- for message in loop_messages %}
-    {%- if message["role"] == "user" %}
-        {%- if tools is not none and (message == user_messages[-1]) %}
-            {{- "[AVAILABLE_TOOLS][" }}
-            {%- for tool in tools %}
-                {%- set tool = tool.function %}
-                {{- '{"type": "function", "function": {' }}
-                {%- for key, val in tool.items() if key != "return" %}
-                    {%- if val is string %}
-                        {{- '"' + key + '": "' + val + '"' }}
-                    {%- else %}
-                        {{- '"' + key + '": ' + val|tojson }}
-                    {%- endif %}
-                    {%- if not loop.last %}
-                        {{- ", " }}
-                    {%- endif %}
-                {%- endfor %}
-                {{- "}}" }}
-                {%- if not loop.last %}
-                    {{- ", " }}
-                {%- else %}
-                    {{- "]" }}
-                {%- endif %}
-            {%- endfor %}
-            {{- "[/AVAILABLE_TOOLS]" }}
-            {%- endif %}
-        {%- if loop.last and system_message is defined %}
-            {{- "[INST]" + system_message + "\n\n" + message["content"] + "[/INST]" }}
-        {%- else %}
-            {{- "[INST]" + message["content"] + "[/INST]" }}
-        {%- endif %}
-    {%- elif (message.tool_calls is defined and message.tool_calls is not none) %}
-        {{- "[TOOL_CALLS][" }}
-        {%- for tool_call in message.tool_calls %}
-            {%- set out = tool_call.function|tojson %}
-            {{- out[:-1] }}
-            {%- if not tool_call.id is defined or tool_call.id|length != 9 %}
-                {{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
-            {%- endif %}
-            {{- ', "id": "' + tool_call.id + '"}' }}
-            {%- if not loop.last %}
-                {{- ", " }}
-            {%- else %}
-                {{- "]" + eos_token }}
-            {%- endif %}
-        {%- endfor %}
-    {%- elif message["role"] == "assistant" %}
-        {{- message["content"] + eos_token}}
-    {%- elif message["role"] == "tool_results" or message["role"] == "tool" %}
-        {%- if message.content is defined and message.content.content is defined %}
-            {%- set content = message.content.content %}
-        {%- else %}
-            {%- set content = message.content %}
-        {%- endif %}
-        {{- '[TOOL_RESULTS]{"content": ' + content|string + ", " }}
-        {%- if not message.tool_call_id is defined or message.tool_call_id|length != 9 %}
-            {{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
-        {%- endif %}
-        {{- '"call_id": "' + message.tool_call_id + '"}[/TOOL_RESULTS]' }}
-    {%- else %}
-        {{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
-    {%- endif %}
-{%- endfor %}
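
To make the prompt layout produced by the deleted template easier to see, here is a plain-Python approximation of its user/assistant path. It is a sketch only: tool definitions, tool calls, and tool results are deliberately left out, the function name `render_basic` is hypothetical, and the default `<s>` / `</s>` tokens are taken from the `special_tokens_map.json` in this same commit.

```python
def render_basic(messages, bos_token="<s>", eos_token="</s>"):
    """Approximate the template for conversations without tools."""
    system_message = None
    if messages and messages[0]["role"] == "system":
        system_message = messages[0]["content"]
        messages = messages[1:]

    parts = [bos_token]
    for index, message in enumerate(messages):
        # After the optional system message, roles must alternate starting with "user".
        expected = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError("roles must alternate user/assistant/user/assistant/...")
        if message["role"] == "user":
            is_last = index == len(messages) - 1
            if is_last and system_message is not None:
                # The system message is folded into the final message when it is a user turn.
                parts.append("[INST]" + system_message + "\n\n" + message["content"] + "[/INST]")
            else:
                parts.append("[INST]" + message["content"] + "[/INST]")
        else:
            parts.append(message["content"] + eos_token)
    return "".join(parts)


prompt = render_basic([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
    {"role": "user", "content": "Bye"},
])
print(prompt)
```

As in the template, the system message is emitted only when the last message in the conversation is a user turn; a conversation that ends on an assistant message drops it.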

last-checkpoint/optimizer.pt DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:2acc6b93233f66c6ddb8b195904fe7cd974047004ffcd02f1d993e85ebc0a677
-size 116484839

last-checkpoint/rng_state.pth DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:b7883d803ebcafeb5684e5f2bcceb39f2a54258143c0c4972785bf0a17a36dc8
-size 14645

last-checkpoint/scaler.pt DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:7e188a4cd7f588ff088ff68a7d9c18ed5ca570c5b11d6790654dcb4e3accb81e
-size 1383

last-checkpoint/scheduler.pt DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:08f9e08af1aa8eb785ad1df11d9714b6c859fed11b125506168e50ec9ce7af28
-size 1465

last-checkpoint/special_tokens_map.json DELETED

@@ -1,24 +0,0 @@
-{
-  "bos_token": {
-    "content": "<s>",
-    "lstrip": false,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  },
-  "eos_token": {
-    "content": "</s>",
-    "lstrip": false,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  },
-  "pad_token": "<unk>",
-  "unk_token": {
-    "content": "<unk>",
-    "lstrip": false,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  }
-}

last-checkpoint/tokenizer.json DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:b0240ce510f08e6c2041724e9043e33be9d251d1e4a4d94eb68cd47b954b61d2
-size 17078292

last-checkpoint/tokenizer_config.json DELETED

The diff for this file is too large to render.

last-checkpoint/trainer_state.json DELETED

@@ -1,839 +0,0 @@
-{
-  "best_global_step": 750,
-  "best_metric": 0.5089643597602844,
-  "best_model_checkpoint": "./adapter-phase1/checkpoint-750",
-  "epoch": 1.2,
-  "eval_steps": 150,
-  "global_step": 750,
-  "is_hyper_param_search": false,
-  "is_local_process_zero": true,
-  "is_world_process_zero": true,
-  "log_history": [
-    {
-      "entropy": 0.5730658559128642,
-      "epoch": 0.016,
-      "grad_norm": 0.7699093818664551,
-      "learning_rate": 9.9744e-05,
-      "loss": 0.5612,
-      "mean_token_accuracy": 0.8584013111889363,
-      "num_tokens": 41645.0,
-      "step": 10
-    },
-    {
-      "entropy": 0.5554142463952303,
-      "epoch": 0.032,
-      "grad_norm": 0.4457343518733978,
-      "learning_rate": 9.9424e-05,
-      "loss": 0.5273,
-      "mean_token_accuracy": 0.8591317173093558,
-      "num_tokens": 70428.0,
-      "step": 20
-    },
-    {
-      "entropy": 0.5915529232472181,
-      "epoch": 0.048,
-      "grad_norm": 0.4764552116394043,
-      "learning_rate": 9.9104e-05,
-      "loss": 0.571,
-      "mean_token_accuracy": 0.8499542407691478,
-      "num_tokens": 93459.0,
-      "step": 30
-    },
-    {
-      "entropy": 0.6596924969926476,
-      "epoch": 0.064,
-      "grad_norm": 0.5651283264160156,
-      "learning_rate": 9.878400000000001e-05,
-      "loss": 0.6238,
-      "mean_token_accuracy": 0.8377377673983574,
-      "num_tokens": 111867.0,
-      "step": 40
-    },
-    {
-      "entropy": 0.7530947024002671,
-      "epoch": 0.08,
-      "grad_norm": 1.0816445350646973,
-      "learning_rate": 9.8464e-05,
-      "loss": 0.7105,
-      "mean_token_accuracy": 0.8222946774214506,
-      "num_tokens": 124704.0,
-      "step": 50
-    },
-    {
-      "entropy": 0.4857567956671119,
-      "epoch": 0.096,
-      "grad_norm": 0.6103881001472473,
-      "learning_rate": 9.8144e-05,
-      "loss": 0.4722,
-      "mean_token_accuracy": 0.8663083262741565,
-      "num_tokens": 164155.0,
-      "step": 60
-    },
-    {
-      "entropy": 0.49499469716101885,
-      "epoch": 0.112,
-      "grad_norm": 0.4260229170322418,
-      "learning_rate": 9.7824e-05,
-      "loss": 0.4937,
-      "mean_token_accuracy": 0.8636182025074959,
-      "num_tokens": 191809.0,
-      "step": 70
-    },
-    {
-      "entropy": 0.5589337946847082,
-      "epoch": 0.128,
-      "grad_norm": 0.5363767147064209,
-      "learning_rate": 9.750400000000001e-05,
-      "loss": 0.5229,
-      "mean_token_accuracy": 0.8587806306779384,
-      "num_tokens": 214088.0,
-      "step": 80
-    },
-    {
-      "entropy": 0.6311248591169715,
-      "epoch": 0.144,
-      "grad_norm": 0.5147454142570496,
-      "learning_rate": 9.718400000000001e-05,
-      "loss": 0.5924,
-      "mean_token_accuracy": 0.8405886068940163,
-      "num_tokens": 231940.0,
-      "step": 90
-    },
-    {
-      "entropy": 0.7393803182989359,
-      "epoch": 0.16,
-      "grad_norm": 0.8111797571182251,
-      "learning_rate": 9.6864e-05,
-      "loss": 0.6967,
-      "mean_token_accuracy": 0.8172304671257734,
-      "num_tokens": 244562.0,
-      "step": 100
-    },
-    {
-      "entropy": 0.4579536892473698,
-      "epoch": 0.176,
-      "grad_norm": 0.3394778072834015,
-      "learning_rate": 9.6544e-05,
-      "loss": 0.447,
-      "mean_token_accuracy": 0.8702179156243801,
-      "num_tokens": 284741.0,
-      "step": 110
-    },
-    {
-      "entropy": 0.49405701775103805,
-      "epoch": 0.192,
-      "grad_norm": 0.4252355694770813,
-      "learning_rate": 9.6224e-05,
-      "loss": 0.4859,
-      "mean_token_accuracy": 0.8623054280877114,
-      "num_tokens": 313346.0,
-      "step": 120
-    },
-    {
-      "entropy": 0.5604451755061746,
-      "epoch": 0.208,
-      "grad_norm": 0.4283740222454071,
-      "learning_rate": 9.590400000000001e-05,
-      "loss": 0.5192,
-      "mean_token_accuracy": 0.854694677516818,
-      "num_tokens": 336313.0,
-      "step": 130
-    },
-    {
-      "entropy": 0.5832516210153699,
-      "epoch": 0.224,
-      "grad_norm": 0.5574604272842407,
-      "learning_rate": 9.558400000000001e-05,
-      "loss": 0.5639,
-      "mean_token_accuracy": 0.8488325711339713,
-      "num_tokens": 354930.0,
-      "step": 140
-    },
-    {
-      "entropy": 0.7139101274311542,
-      "epoch": 0.24,
-      "grad_norm": 0.9666448831558228,
-      "learning_rate": 9.526400000000001e-05,
-      "loss": 0.6833,
-      "mean_token_accuracy": 0.8205961517989635,
-      "num_tokens": 367969.0,
-      "step": 150
-    },
-    {
-      "epoch": 0.24,
-      "eval_entropy": 0.5783390111327171,
-      "eval_loss": 0.571293830871582,
-      "eval_mean_token_accuracy": 0.8406577571630478,
-      "eval_num_tokens": 367969.0,
-      "eval_runtime": 949.8374,
-      "eval_samples_per_second": 2.106,
-      "eval_steps_per_second": 0.526,
-      "step": 150
-    },
-    {
-      "entropy": 0.4597647111862898,
-      "epoch": 0.256,
-      "grad_norm": 0.3567339777946472,
-      "learning_rate": 9.4944e-05,
-      "loss": 0.4466,
-      "mean_token_accuracy": 0.8714417025446892,
-      "num_tokens": 408832.0,
-      "step": 160
-    },
-    {
-      "entropy": 0.5107372861355544,
-      "epoch": 0.272,
-      "grad_norm": 0.365077406167984,
-      "learning_rate": 9.462400000000001e-05,
-      "loss": 0.4913,
-      "mean_token_accuracy": 0.8607851468026638,
-      "num_tokens": 437025.0,
-      "step": 170
-    },
-    {
-      "entropy": 0.549993178807199,
-      "epoch": 0.288,
-      "grad_norm": 0.921683669090271,
-      "learning_rate": 9.4304e-05,
-      "loss": 0.5275,
-      "mean_token_accuracy": 0.8546455435454845,
-      "num_tokens": 459640.0,
-      "step": 180
-    },
-    {
-      "entropy": 0.5974416058510542,
-      "epoch": 0.304,
-      "grad_norm": 0.5437716245651245,
-      "learning_rate": 9.3984e-05,
-      "loss": 0.5801,
-      "mean_token_accuracy": 0.8437417767941952,
-      "num_tokens": 477698.0,
-      "step": 190
-    },
-    {
-      "entropy": 0.7215298766270279,
-      "epoch": 0.32,
-      "grad_norm": 0.8138436079025269,
-      "learning_rate": 9.3664e-05,
-      "loss": 0.6601,
-      "mean_token_accuracy": 0.823812685161829,
-      "num_tokens": 490139.0,
-      "step": 200
-    },
-    {
-      "entropy": 0.41703027142211796,
-      "epoch": 0.336,
-      "grad_norm": 0.3584943115711212,
-      "learning_rate": 9.3344e-05,
-      "loss": 0.4339,
-      "mean_token_accuracy": 0.8721328835934401,
-      "num_tokens": 529888.0,
-      "step": 210
-    },
-    {
-      "entropy": 0.49080247841775415,
-      "epoch": 0.352,
-      "grad_norm": 0.384344220161438,
-      "learning_rate": 9.3024e-05,
-      "loss": 0.4712,
-      "mean_token_accuracy": 0.8645332992076874,
-      "num_tokens": 558603.0,
-      "step": 220
-    },
-    {
-      "entropy": 0.5349763100966811,
-      "epoch": 0.368,
-      "grad_norm": 0.45451048016548157,
-      "learning_rate": 9.2704e-05,
-      "loss": 0.5037,
-      "mean_token_accuracy": 0.8568188078701496,
-      "num_tokens": 581795.0,
-      "step": 230
-    },
-    {
-      "entropy": 0.5738583486527205,
-      "epoch": 0.384,
-      "grad_norm": 0.5654199123382568,
-      "learning_rate": 9.2384e-05,
-      "loss": 0.5418,
-      "mean_token_accuracy": 0.853222968429327,
-      "num_tokens": 600644.0,
-      "step": 240
-    },
-    {
-      "entropy": 0.7203033225610852,
-      "epoch": 0.4,
-      "grad_norm": 0.7510268688201904,
-      "learning_rate": 9.2064e-05,
-      "loss": 0.6766,
-      "mean_token_accuracy": 0.8211973272264004,
-      "num_tokens": 613463.0,
-      "step": 250
-    },
-    {
-      "entropy": 0.44393815845251083,
-      "epoch": 0.416,
-      "grad_norm": 0.28410282731056213,
-      "learning_rate": 9.174400000000001e-05,
-      "loss": 0.44,
-      "mean_token_accuracy": 0.8719362560659647,
-      "num_tokens": 653916.0,
-      "step": 260
-    },
-    {
-      "entropy": 0.49695795457810166,
-      "epoch": 0.432,
-      "grad_norm": 0.37558332085609436,
-      "learning_rate": 9.142400000000001e-05,
-      "loss": 0.4716,
-      "mean_token_accuracy": 0.8661885127425194,
-      "num_tokens": 682316.0,
-      "step": 270
-    },
-    {
-      "entropy": 0.5323772713541984,
-      "epoch": 0.448,
-      "grad_norm": 0.38234350085258484,
-      "learning_rate": 9.1104e-05,
-      "loss": 0.5153,
-      "mean_token_accuracy": 0.8568883746862411,
-      "num_tokens": 705595.0,
-      "step": 280
-    },
-    {
-      "entropy": 0.5631753627210856,
-      "epoch": 0.464,
-      "grad_norm": 0.5723901391029358,
-      "learning_rate": 9.0784e-05,
-      "loss": 0.5255,
-      "mean_token_accuracy": 0.8554483871906996,
-      "num_tokens": 724261.0,
-      "step": 290
-    },
-    {
-      "entropy": 0.7213884828612208,
-      "epoch": 0.48,
-      "grad_norm": 0.8089677095413208,
-      "learning_rate": 9.0464e-05,
-      "loss": 0.6531,
-      "mean_token_accuracy": 0.8282416380941868,
-      "num_tokens": 736888.0,
-      "step": 300
-    },
-    {
-      "epoch": 0.48,
-      "eval_entropy": 0.5434884746074676,
-      "eval_loss": 0.5595113635063171,
-      "eval_mean_token_accuracy": 0.8425255041122437,
-      "eval_num_tokens": 736888.0,
-      "eval_runtime": 960.1391,
-      "eval_samples_per_second": 2.083,
-      "eval_steps_per_second": 0.521,
-      "step": 300
-    },
-    {
-      "entropy": 0.4070850105956197,
-      "epoch": 0.496,
-      "grad_norm": 0.3164316713809967,
-      "learning_rate": 9.014400000000001e-05,
-      "loss": 0.4347,
-      "mean_token_accuracy": 0.8718821201473475,
-      "num_tokens": 777614.0,
-      "step": 310
-    },
-    {
-      "entropy": 0.4795668950304389,
-      "epoch": 0.512,
-      "grad_norm": 0.36268115043640137,
-      "learning_rate": 8.982400000000001e-05,
-      "loss": 0.4528,
-      "mean_token_accuracy": 0.8704403065145015,
-      "num_tokens": 806367.0,
-      "step": 320
-    },
-    {
-      "entropy": 0.5111968521028757,
-      "epoch": 0.528,
-      "grad_norm": 0.46226978302001953,
-      "learning_rate": 8.9504e-05,
-      "loss": 0.493,
-      "mean_token_accuracy": 0.8627706792205572,
-      "num_tokens": 829451.0,
-      "step": 330
-    },
-    {
-      "entropy": 0.5876390922814607,
-      "epoch": 0.544,
-      "grad_norm": 0.5480667352676392,
-      "learning_rate": 8.9184e-05,
-      "loss": 0.553,
-      "mean_token_accuracy": 0.8471334714442491,
-      "num_tokens": 848096.0,
-      "step": 340
-    },
-    {
-      "entropy": 0.6881475947797299,
-      "epoch": 0.56,
-      "grad_norm": 0.7945592403411865,
-      "learning_rate": 8.886400000000001e-05,
-      "loss": 0.6313,
-      "mean_token_accuracy": 0.8284968301653862,
-      "num_tokens": 861213.0,
-      "step": 350
-    },
-    {
-      "entropy": 0.41089317156001925,
-      "epoch": 0.576,
-      "grad_norm": 0.42293551564216614,
-      "learning_rate": 8.854400000000001e-05,
-      "loss": 0.4347,
-      "mean_token_accuracy": 0.872112188860774,
-      "num_tokens": 902600.0,
-      "step": 360
-    },
-    {
-      "entropy": 0.4904096835292876,
-      "epoch": 0.592,
-      "grad_norm": 0.36379531025886536,
-      "learning_rate": 8.8224e-05,
-      "loss": 0.4607,
-      "mean_token_accuracy": 0.8698205541819334,
-      "num_tokens": 931252.0,
-      "step": 370
-    },
-    {
-      "entropy": 0.5374009614810348,
-      "epoch": 0.608,
-      "grad_norm": 0.39369404315948486,
-      "learning_rate": 8.7904e-05,
-      "loss": 0.5105,
-      "mean_token_accuracy": 0.8589953418821097,
-      "num_tokens": 954440.0,
-      "step": 380
-    },
-    {
-      "entropy": 0.5679504619911313,
-      "epoch": 0.624,
-      "grad_norm": 0.4653433859348297,
-      "learning_rate": 8.7584e-05,
-      "loss": 0.5375,
-      "mean_token_accuracy": 0.8520914990454912,
-      "num_tokens": 973561.0,
-      "step": 390
-    },
-    {
-      "entropy": 0.6884654611349106,
-      "epoch": 0.64,
-      "grad_norm": 0.75364750623703,
-      "learning_rate": 8.7264e-05,
-      "loss": 0.6424,
-      "mean_token_accuracy": 0.8248051077127456,
-      "num_tokens": 986663.0,
-      "step": 400
-    },
-    {
-      "entropy": 0.4144328683614731,
-      "epoch": 0.656,
-      "grad_norm": 0.3373418152332306,
-      "learning_rate": 8.6944e-05,
-      "loss": 0.4187,
-      "mean_token_accuracy": 0.8739797297865153,
-      "num_tokens": 1027401.0,
-      "step": 410
-    },
-    {
-      "entropy": 0.4791171012446284,
-      "epoch": 0.672,
-      "grad_norm": 0.3650929033756256,
-      "learning_rate": 8.6624e-05,
-      "loss": 0.4467,
-      "mean_token_accuracy": 0.8709899630397558,
-      "num_tokens": 1056257.0,
-      "step": 420
-    },
-    {
-      "entropy": 0.5098350465297699,
-      "epoch": 0.688,
-      "grad_norm": 0.3901676833629608,
-      "learning_rate": 8.6304e-05,
-      "loss": 0.4884,
-      "mean_token_accuracy": 0.8597885746508837,
-      "num_tokens": 1079455.0,
-      "step": 430
-    },
-    {
-      "entropy": 0.5697799691930413,
-      "epoch": 0.704,
-      "grad_norm": 0.4802677631378174,
-      "learning_rate": 8.598400000000001e-05,
-      "loss": 0.5419,
-      "mean_token_accuracy": 0.8520946379750967,
-      "num_tokens": 1098043.0,
-      "step": 440
-    },
-    {
-      "entropy": 0.6666812863200903,
-      "epoch": 0.72,
-      "grad_norm": 0.683847188949585,
-      "learning_rate": 8.5664e-05,
-      "loss": 0.6184,
-      "mean_token_accuracy": 0.8354901738464833,
-      "num_tokens": 1111075.0,
-      "step": 450
-    },
-    {
-      "epoch": 0.72,
-      "eval_entropy": 0.5324991160035133,
-      "eval_loss": 0.528014600276947,
-      "eval_mean_token_accuracy": 0.8489134066104889,
-      "eval_num_tokens": 1111075.0,
-      "eval_runtime": 945.2593,
-      "eval_samples_per_second": 2.116,
-      "eval_steps_per_second": 0.529,
-      "step": 450
-    },
-    {
-      "entropy": 0.42183027137070894,
-      "epoch": 0.736,
-      "grad_norm": 0.30442023277282715,
-      "learning_rate": 8.5344e-05,
-      "loss": 0.4258,
-      "mean_token_accuracy": 0.8741536162793636,
-      "num_tokens": 1151218.0,
-      "step": 460
-    },
-    {
-      "entropy": 0.48677743785083294,
-      "epoch": 0.752,
-      "grad_norm": 0.3515753746032715,
-      "learning_rate": 8.5024e-05,
-      "loss": 0.4589,
-      "mean_token_accuracy": 0.8665938679128885,
-      "num_tokens": 1179788.0,
-      "step": 470
-    },
-    {
-      "entropy": 0.5252734636887908,
-      "epoch": 0.768,
-      "grad_norm": 0.396014004945755,
-      "learning_rate": 8.470400000000001e-05,
-      "loss": 0.5021,
-      "mean_token_accuracy": 0.8596790559589863,
-      "num_tokens": 1202980.0,
-      "step": 480
-    },
-    {
-      "entropy": 0.5485310789197684,
-      "epoch": 0.784,
-      "grad_norm": 0.4635639488697052,
-      "learning_rate": 8.438400000000001e-05,
-      "loss": 0.5269,
-      "mean_token_accuracy": 0.8574780374765396,
-      "num_tokens": 1221687.0,
-      "step": 490
-    },
-    {
-      "entropy": 0.6833245273679495,
-      "epoch": 0.8,
-      "grad_norm": 0.700869619846344,
-      "learning_rate": 8.406400000000001e-05,
-      "loss": 0.6387,
-      "mean_token_accuracy": 0.8237500067800283,
-      "num_tokens": 1234514.0,
-      "step": 500
-    },
-    {
-      "entropy": 0.40601076018065213,
-      "epoch": 0.816,
-      "grad_norm": 0.32708939909935,
-      "learning_rate": 8.3744e-05,
-      "loss": 0.408,
-      "mean_token_accuracy": 0.8766346644610167,
-      "num_tokens": 1275596.0,
-      "step": 510
-    },
-    {
-      "entropy": 0.4850410340353847,
-      "epoch": 0.832,
-      "grad_norm": 0.3244037926197052,
-      "learning_rate": 8.3424e-05,
-      "loss": 0.4573,
-      "mean_token_accuracy": 0.8669606879353523,
-      "num_tokens": 1304145.0,
-      "step": 520
-    },
-    {
-      "entropy": 0.515147590637207,
-      "epoch": 0.848,
-      "grad_norm": 0.38329896330833435,
-      "learning_rate": 8.310400000000001e-05,
-      "loss": 0.4863,
-      "mean_token_accuracy": 0.8632290612906217,
-      "num_tokens": 1326982.0,
-      "step": 530
-    },
-    {
-      "entropy": 0.5729153681546449,
-      "epoch": 0.864,
-      "grad_norm": 0.4813045263290405,
-      "learning_rate": 8.278400000000001e-05,
-      "loss": 0.5479,
-      "mean_token_accuracy": 0.847094003856182,
-      "num_tokens": 1345175.0,
-      "step": 540
-    },
-    {
-      "entropy": 0.6926866250112653,
-      "epoch": 0.88,
-      "grad_norm": 0.8769316673278809,
-      "learning_rate": 8.2464e-05,
-      "loss": 0.6399,
-      "mean_token_accuracy": 0.8273460377007723,
-      "num_tokens": 1357612.0,
-      "step": 550
-    },
-    {
-      "entropy": 0.43388050645589826,
-      "epoch": 0.896,
-      "grad_norm": 0.3002051115036011,
-      "learning_rate": 8.2144e-05,
-      "loss": 0.4221,
-      "mean_token_accuracy": 0.8723263714462519,
-      "num_tokens": 1395656.0,
-      "step": 560
-    },
-    {
-      "entropy": 0.47867969051003456,
-      "epoch": 0.912,
-      "grad_norm": 0.3462761640548706,
-      "learning_rate": 8.1824e-05,
-      "loss": 0.4537,
-      "mean_token_accuracy": 0.8681479040533304,
-      "num_tokens": 1423390.0,
-      "step": 570
-    },
-    {
-      "entropy": 0.5229433966800571,
-      "epoch": 0.928,
-      "grad_norm": 0.3625573217868805,
-      "learning_rate": 8.1504e-05,
-      "loss": 0.4876,
-      "mean_token_accuracy": 0.8632072422653436,
-      "num_tokens": 1446208.0,
-      "step": 580
-    },
-    {
-      "entropy": 0.5398129008710384,
-      "epoch": 0.944,
-      "grad_norm": 0.47574466466903687,
-      "learning_rate": 8.1184e-05,
-      "loss": 0.5157,
-      "mean_token_accuracy": 0.8563049588352442,
-      "num_tokens": 1464656.0,
-      "step": 590
-    },
-    {
-      "entropy": 0.6804929848760366,
-      "epoch": 0.96,
-      "grad_norm": 0.8095116019248962,
-      "learning_rate": 8.0864e-05,
-      "loss": 0.6355,
-      "mean_token_accuracy": 0.8217808835208416,
-      "num_tokens": 1477330.0,
-      "step": 600
-    },
-    {
-      "epoch": 0.96,
-      "eval_entropy": 0.5289207182526589,
-      "eval_loss": 0.5149514675140381,
-      "eval_mean_token_accuracy": 0.8529882525205612,
-      "eval_num_tokens": 1477330.0,
-      "eval_runtime": 926.8896,
-      "eval_samples_per_second": 2.158,
-      "eval_steps_per_second": 0.539,
-      "step": 600
-    },
-    {
-      "entropy": 0.4441436754539609,
-      "epoch": 0.976,
-      "grad_norm": 0.3561602830886841,
-      "learning_rate": 8.0544e-05,
-      "loss": 0.4314,
-      "mean_token_accuracy": 0.8683189991861582,
-      "num_tokens": 1511581.0,
-      "step": 610
-    },
-    {
-      "entropy": 0.5468204509466886,
-      "epoch": 0.992,
-      "grad_norm": 0.4606572985649109,
-      "learning_rate": 8.0224e-05,
-      "loss": 0.5182,
-      "mean_token_accuracy": 0.8496101208031177,
-      "num_tokens": 1531489.0,
-      "step": 620
-    },
-    {
-      "entropy": 0.5519248092547059,
-      "epoch": 1.008,
-      "grad_norm": 0.2525905966758728,
-      "learning_rate": 7.9904e-05,
-      "loss": 0.5057,
-      "mean_token_accuracy": 0.8535349532961846,
-      "num_tokens": 1562607.0,
-      "step": 630
-    },
-    {
-      "entropy": 0.40866237645968795,
-      "epoch": 1.024,
-      "grad_norm": 0.3288176357746124,
-      "learning_rate": 7.9584e-05,
-      "loss": 0.3852,
-      "mean_token_accuracy": 0.8830868355929852,
-      "num_tokens": 1596809.0,
-      "step": 640
-    },
-    {
-      "entropy": 0.45947086848318575,
-      "epoch": 1.04,
-      "grad_norm": 0.3615362048149109,
-      "learning_rate": 7.9264e-05,
-      "loss": 0.427,
-      "mean_token_accuracy": 0.8752098884433508,
-      "num_tokens": 1623093.0,
-      "step": 650
-    },
-    {
-      "entropy": 0.48522304855287074,
-      "epoch": 1.056,
-      "grad_norm": 0.6554524898529053,
-      "learning_rate": 7.894400000000001e-05,
-      "loss": 0.4329,
-      "mean_token_accuracy": 0.8734084539115429,
-      "num_tokens": 1644709.0,
-      "step": 660
-    },
-    {
-      "entropy": 0.49721850007772445,
-      "epoch": 1.072,
-      "grad_norm": 0.5433140397071838,
-      "learning_rate": 7.862400000000001e-05,
-      "loss": 0.4597,
-      "mean_token_accuracy": 0.8716968420892954,
-      "num_tokens": 1661390.0,
-      "step": 670
-    },
-    {
-      "entropy": 0.48405226683244107,
-      "epoch": 1.088,
-      "grad_norm": 0.3505966365337372,
-      "learning_rate": 7.8304e-05,
-      "loss": 0.462,
-      "mean_token_accuracy": 0.8606270607560873,
-      "num_tokens": 1689962.0,
-      "step": 680
-    },
-    {
-      "entropy": 0.4111426337622106,
-      "epoch": 1.104,
-      "grad_norm": 0.3443625271320343,
-      "learning_rate": 7.7984e-05,
-      "loss": 0.379,
-      "mean_token_accuracy": 0.8858214061707258,
-      "num_tokens": 1723548.0,
-      "step": 690
-    },
-    {
-      "entropy": 0.45519505683332684,
-      "epoch": 1.12,
-      "grad_norm": 0.43360981345176697,
-      "learning_rate": 7.766400000000001e-05,
-      "loss": 0.417,
-      "mean_token_accuracy": 0.8762232139706612,
-      "num_tokens": 1749751.0,
-      "step": 700
-    },
-    {
-      "entropy": 0.47432731073349715,
-      "epoch": 1.1360000000000001,
-      "grad_norm": 0.48396071791648865,
-      "learning_rate": 7.734400000000001e-05,
-      "loss": 0.4423,
-      "mean_token_accuracy": 0.8694507408887148,
-      "num_tokens": 1770706.0,
-      "step": 710
-    },
-    {
-      "entropy": 0.5454745849594473,
-      "epoch": 1.152,
-      "grad_norm": 0.6095965504646301,
-      "learning_rate": 7.702400000000001e-05,
-      "loss": 0.491,
-      "mean_token_accuracy": 0.8575698807835579,
-      "num_tokens": 1786806.0,
-      "step": 720
-    },
-    {
-      "entropy": 0.4902514209970832,
-      "epoch": 1.168,
-      "grad_norm": 0.3669983148574829,
-      "learning_rate": 7.6704e-05,
-      "loss": 0.4784,
-      "mean_token_accuracy": 0.8593317933380604,
-      "num_tokens": 1814214.0,
-      "step": 730
-    },
-    {
-      "entropy": 0.42295527271926403,
-      "epoch": 1.184,
-      "grad_norm": 0.37635281682014465,
-      "learning_rate": 7.6384e-05,
-      "loss": 0.4006,
-      "mean_token_accuracy": 0.8772692024707794,
-      "num_tokens": 1845867.0,
-      "step": 740
-    },
-    {
-      "entropy": 0.45570146273821593,
-      "epoch": 1.2,
-      "grad_norm": 0.4157540798187256,
-      "learning_rate": 7.6064e-05,
-      "loss": 0.4131,
-      "mean_token_accuracy": 0.8780728686600924,
-      "num_tokens": 1870856.0,
-      "step": 750
-    },
-    {
-      "epoch": 1.2,
-      "eval_entropy": 0.4963426645994186,
-      "eval_loss": 0.5089643597602844,
-      "eval_mean_token_accuracy": 0.8573515907526016,
-      "eval_num_tokens": 1870856.0,
-      "eval_runtime": 949.8047,
-      "eval_samples_per_second": 2.106,
-      "eval_steps_per_second": 0.526,
-      "step": 750
-    }
-  ],
-  "logging_steps": 10,
-  "max_steps": 3125,
-  "num_input_tokens_seen": 0,
-  "num_train_epochs": 5,
-  "save_steps": 150,
-  "stateful_callbacks": {
-    "TrainerControl": {
-      "args": {
-        "should_epoch_stop": false,
-        "should_evaluate": false,
-        "should_log": false,
-        "should_save": true,
-        "should_training_stop": false
-      },
-      "attributes": {}
-    }
-  },
-  "total_flos": 1.3058997783257088e+17,
-  "train_batch_size": 1,
-  "trial_name": null,
-  "trial_params": null
-}

last-checkpoint/training_args.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:d19add453be896fb8010267a01d849597b52aecb53969dce6ab3000e56f1b7d0
-size 6353