Akash-nath29 committed on Dec 27, 2025

Commit b001d4d (verified)

1 Parent(s): 070c045

Upload folder using huggingface_hub

Files changed (22)

README.md +207 -0
adapter_config.json +41 -0
adapter_model.safetensors +3 -0
chat_template.jinja +15 -0
checkpoint-10616/README.md +207 -0
checkpoint-10616/adapter_config.json +41 -0
checkpoint-10616/adapter_model.safetensors +3 -0
checkpoint-10616/chat_template.jinja +15 -0
checkpoint-10616/optimizer.pt +3 -0
checkpoint-10616/rng_state.pth +3 -0
checkpoint-10616/scaler.pt +3 -0
checkpoint-10616/scheduler.pt +3 -0
checkpoint-10616/special_tokens_map.json +24 -0
checkpoint-10616/tokenizer.json +0 -0
checkpoint-10616/tokenizer.model +3 -0
checkpoint-10616/tokenizer_config.json +43 -0
checkpoint-10616/trainer_state.json +1518 -0
checkpoint-10616/training_args.bin +3 -0
special_tokens_map.json +24 -0
tokenizer.json +0 -0
tokenizer.model +3 -0
tokenizer_config.json +43 -0

README.md ADDED

	@@ -0,0 +1,207 @@

+---
+base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:TinyLlama/TinyLlama-1.1B-Chat-v1.0
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.0

adapter_config.json ADDED

	@@ -0,0 +1,41 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
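As a rough reading of the configuration above, the sketch below computes the LoRA scaling factor (`lora_alpha / r`) and estimates the adapter's trainable parameter count. The base-model dimensions (hidden size 2048, 22 decoder layers, 4 key/value heads with head dimension 64) are assumptions based on the published TinyLlama-1.1B architecture and are not stated in this commit; the helper names are illustrative only.

```python
# Values taken from adapter_config.json in this commit.
LORA_R = 16
LORA_ALPHA = 32
TARGET_MODULES = ["q_proj", "v_proj"]

# ASSUMPTION: TinyLlama-1.1B dimensions, not stated anywhere in this commit.
HIDDEN_SIZE = 2048
NUM_LAYERS = 22
KV_DIM = 4 * 64  # 4 key/value heads x head_dim 64 (grouped-query attention)

# (in_features, out_features) of each targeted projection, under the assumption above.
PROJ_SHAPES = {"q_proj": (HIDDEN_SIZE, HIDDEN_SIZE), "v_proj": (HIDDEN_SIZE, KV_DIM)}


def lora_scaling(alpha: int, r: int) -> float:
    """Standard LoRA scaling alpha / r (use_rslora is false in the config)."""
    return alpha / r


def lora_param_count(r: int) -> int:
    """A is (r x in) and B is (out x r) for each targeted projection in each layer."""
    per_layer = sum(r * (i + o) for i, o in (PROJ_SHAPES[m] for m in TARGET_MODULES))
    return per_layer * NUM_LAYERS


scaling = lora_scaling(LORA_ALPHA, LORA_R)
n_params = lora_param_count(LORA_R)
print(scaling, n_params, n_params * 4)
```

Under these assumptions the estimate is 2,252,800 parameters, or 9,011,200 bytes at 4 bytes per value. That is close to the 9,022,864-byte size recorded for `adapter_model.safetensors` once a file header is allowed for, which suggests (but does not prove) that the adapter weights are stored in fp32.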

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:21a9f6bf2dabea33f9e8eac94e6e5a8e6d9f887636d9e57d91ae7bd7751c4443
+size 9022864
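The three lines above are a Git LFS pointer rather than the weights themselves: each line is a key followed by a value. A minimal sketch of reading such a pointer is shown here; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
from typing import Dict

# Pointer text copied from the diff above (without the leading '+').
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:21a9f6bf2dabea33f9e8eac94e6e5a8e6d9f887636d9e57d91ae7bd7751c4443
size 9022864
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


pointer = parse_lfs_pointer(POINTER_TEXT)
hash_algo, _, digest = pointer["oid"].partition(":")
size_bytes = int(pointer["size"])
print(hash_algo, digest[:12], size_bytes)
```

The same `oid` and `size` appear in the pointer for `checkpoint-10616/adapter_model.safetensors`, so the top-level adapter and the checkpoint adapter refer to byte-identical files.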

chat_template.jinja ADDED

	@@ -0,0 +1,15 @@

+{% for message in messages %}
+{% if message['role'] == 'user' %}
+{{ '<|user|>
+' + message['content'] + eos_token }}
+{% elif message['role'] == 'system' %}
+{{ '<|system|>
+' + message['content'] + eos_token }}
+{% elif message['role'] == 'assistant' %}
+{{ '<|assistant|>
+'  + message['content'] + eos_token }}
+{% endif %}
+{% if loop.last and add_generation_prompt %}
+{{ '<|assistant|>' }}
+{% endif %}
+{% endfor %}
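The template above can be mirrored in plain Python to see what a prompt looks like. This is a sketch, not the rendering code `transformers` uses: it assumes the Jinja environment trims the newline after block tags, so that only the newline following each `{{ ... }}` expression survives, and that `eos_token` is `</s>` as in `special_tokens_map.json`. The function name `render_chat` is hypothetical.

```python
from typing import Dict, List

EOS_TOKEN = "</s>"  # from special_tokens_map.json in this commit
KNOWN_ROLES = ("user", "system", "assistant")


def render_chat(messages: List[Dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Approximate the chat template: '<|role|>\\n' + content + eos per message."""
    parts = []
    for index, message in enumerate(messages):
        role = message["role"]
        if role in KNOWN_ROLES:  # the template emits nothing for any other role
            parts.append(f"<|{role}|>\n{message['content']}{EOS_TOKEN}\n")
        is_last = index == len(messages) - 1
        if is_last and add_generation_prompt:
            # ASSUMPTION: the newline after the expression is kept by the Jinja environment.
            parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = render_chat(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
    add_generation_prompt=True,
)
print(prompt)
```

For real use, the tokenizer's own `apply_chat_template` method is the safer route, since it renders the Jinja file directly instead of relying on this approximation.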

checkpoint-10616/README.md ADDED

	@@ -0,0 +1,207 @@

+---
+base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:TinyLlama/TinyLlama-1.1B-Chat-v1.0
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.0

checkpoint-10616/adapter_config.json ADDED

	@@ -0,0 +1,41 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

checkpoint-10616/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:21a9f6bf2dabea33f9e8eac94e6e5a8e6d9f887636d9e57d91ae7bd7751c4443
+size 9022864

checkpoint-10616/chat_template.jinja ADDED

	@@ -0,0 +1,15 @@

+{% for message in messages %}
+{% if message['role'] == 'user' %}
+{{ '<|user|>
+' + message['content'] + eos_token }}
+{% elif message['role'] == 'system' %}
+{{ '<|system|>
+' + message['content'] + eos_token }}
+{% elif message['role'] == 'assistant' %}
+{{ '<|assistant|>
+'  + message['content'] + eos_token }}
+{% endif %}
+{% if loop.last and add_generation_prompt %}
+{{ '<|assistant|>' }}
+{% endif %}
+{% endfor %}

checkpoint-10616/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:816ac19c4ac86e3db244a4228bf151a03dd1e609465d12af8b2166c8046451dc
+size 18096570

checkpoint-10616/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:88f36c0e2460424b1be2fa8226623d48a79dc8e549edf264739448b05ab8fb2a
+size 14244

checkpoint-10616/scaler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7ffccc23a120d376e0f2f995565bbda5433f522190fb429bd8d279fd31543d60
+size 988

checkpoint-10616/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:93034c30478e4d2ca305ba922054bbeb5b6cad5d864d01446e3cb89e4b0ad65b
+size 1064

checkpoint-10616/special_tokens_map.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "</s>",
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-10616/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-10616/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+size 499723

checkpoint-10616/tokenizer_config.json ADDED

	@@ -0,0 +1,43 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "legacy": false,
+  "model_max_length": 2048,
+  "pad_token": "</s>",
+  "padding_side": "right",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}

checkpoint-10616/trainer_state.json ADDED

	@@ -0,0 +1,1518 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 2.0,
+  "eval_steps": 500,
+  "global_step": 10616,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.009419743782969104,
+      "grad_norm": 4.3123979568481445,
+      "learning_rate": 2.884012539184953e-05,
+      "loss": 4.9378,
+      "step": 50
+    },
+    {
+      "epoch": 0.018839487565938208,
+      "grad_norm": 6.848845958709717,
+      "learning_rate": 5.956112852664577e-05,
+      "loss": 4.564,
+      "step": 100
+    },
+    {
+      "epoch": 0.02825923134890731,
+      "grad_norm": 8.258468627929688,
+      "learning_rate": 9.090909090909092e-05,
+      "loss": 3.9565,
+      "step": 150
+    },
+    {
+      "epoch": 0.037678975131876416,
+      "grad_norm": 6.301790714263916,
+      "learning_rate": 0.00012225705329153605,
+      "loss": 3.6785,
+      "step": 200
+    },
+    {
+      "epoch": 0.04709871891484552,
+      "grad_norm": 9.152833938598633,
+      "learning_rate": 0.0001536050156739812,
+      "loss": 3.6127,
+      "step": 250
+    },
+    {
+      "epoch": 0.05651846269781462,
+      "grad_norm": 7.454412460327148,
+      "learning_rate": 0.00018495297805642635,
+      "loss": 3.5764,
+      "step": 300
+    },
+    {
+      "epoch": 0.06593820648078372,
+      "grad_norm": 5.142513275146484,
+      "learning_rate": 0.0001999968537535188,
+      "loss": 3.6415,
+      "step": 350
+    },
+    {
+      "epoch": 0.07535795026375283,
+      "grad_norm": 6.578873157501221,
+      "learning_rate": 0.00019997311834207807,
+      "loss": 3.5959,
+      "step": 400
+    },
+    {
+      "epoch": 0.08477769404672193,
+      "grad_norm": 4.895726203918457,
+      "learning_rate": 0.00019992611846159642,
+      "loss": 3.5102,
+      "step": 450
+    },
+    {
+      "epoch": 0.09419743782969103,
+      "grad_norm": 5.39362907409668,
+      "learning_rate": 0.0001998558650492866,
+      "loss": 3.501,
+      "step": 500
+    },
+    {
+      "epoch": 0.10361718161266013,
+      "grad_norm": 4.40684700012207,
+      "learning_rate": 0.0001997623744536267,
+      "loss": 3.4602,
+      "step": 550
+    },
+    {
+      "epoch": 0.11303692539562923,
+      "grad_norm": 5.230076313018799,
+      "learning_rate": 0.00019964566843055576,
+      "loss": 3.5721,
+      "step": 600
+    },
+    {
+      "epoch": 0.12245666917859835,
+      "grad_norm": 5.450647354125977,
+      "learning_rate": 0.00019950577413841098,
+      "loss": 3.4872,
+      "step": 650
+    },
+    {
+      "epoch": 0.13187641296156744,
+      "grad_norm": 6.220117568969727,
+      "learning_rate": 0.00019934272413160784,
+      "loss": 3.6352,
+      "step": 700
+    },
+    {
+      "epoch": 0.14129615674453655,
+      "grad_norm": 4.26075553894043,
+      "learning_rate": 0.00019915655635306437,
+      "loss": 3.6413,
+      "step": 750
+    },
+    {
+      "epoch": 0.15071590052750566,
+      "grad_norm": 9.252241134643555,
+      "learning_rate": 0.00019894731412537162,
+      "loss": 3.4355,
+      "step": 800
+    },
+    {
+      "epoch": 0.16013564431047475,
+      "grad_norm": 3.9509174823760986,
+      "learning_rate": 0.00019871504614071213,
+      "loss": 3.444,
+      "step": 850
+    },
+    {
+      "epoch": 0.16955538809344387,
+      "grad_norm": 3.5256080627441406,
+      "learning_rate": 0.0001984598064495289,
+      "loss": 3.5399,
+      "step": 900
+    },
+    {
+      "epoch": 0.17897513187641295,
+      "grad_norm": 5.630585193634033,
+      "learning_rate": 0.0001981816544479476,
+      "loss": 3.4035,
+      "step": 950
+    },
+    {
+      "epoch": 0.18839487565938207,
+      "grad_norm": 6.200850486755371,
+      "learning_rate": 0.00019788065486395443,
+      "loss": 3.5025,
+      "step": 1000
+    },
+    {
+      "epoch": 0.19781461944235118,
+      "grad_norm": 4.016064643859863,
+      "learning_rate": 0.0001975568777423336,
+      "loss": 3.419,
+      "step": 1050
+    },
+    {
+      "epoch": 0.20723436322532027,
+      "grad_norm": 4.542566776275635,
+      "learning_rate": 0.00019721039842836744,
+      "loss": 3.4461,
+      "step": 1100
+    },
+    {
+      "epoch": 0.21665410700828938,
+      "grad_norm": 4.557881832122803,
+      "learning_rate": 0.0001968412975503029,
+      "loss": 3.5097,
+      "step": 1150
+    },
+    {
+      "epoch": 0.22607385079125847,
+      "grad_norm": 3.8412857055664062,
+      "learning_rate": 0.00019644966100058873,
+      "loss": 3.4363,
+      "step": 1200
+    },
+    {
+      "epoch": 0.23549359457422758,
+      "grad_norm": 4.408971786499023,
+      "learning_rate": 0.00019603557991588794,
+      "loss": 3.41,
+      "step": 1250
+    },
+    {
+      "epoch": 0.2449133383571967,
+      "grad_norm": 3.9113070964813232,
+      "learning_rate": 0.00019559915065586926,
+      "loss": 3.4993,
+      "step": 1300
+    },
+    {
+      "epoch": 0.2543330821401658,
+      "grad_norm": 3.450732707977295,
+      "learning_rate": 0.0001951404747807839,
+      "loss": 3.5027,
+      "step": 1350
+    },
+    {
+      "epoch": 0.26375282592313487,
+      "grad_norm": 5.115264892578125,
+      "learning_rate": 0.00019465965902783157,
+      "loss": 3.4742,
+      "step": 1400
+    },
+    {
+      "epoch": 0.273172569706104,
+      "grad_norm": 6.216214656829834,
+      "learning_rate": 0.0001941568152863219,
+      "loss": 3.4495,
+      "step": 1450
+    },
+    {
+      "epoch": 0.2825923134890731,
+      "grad_norm": 3.304917335510254,
+      "learning_rate": 0.00019363206057163718,
+      "loss": 3.5006,
+      "step": 1500
+    },
+    {
+      "epoch": 0.2920120572720422,
+      "grad_norm": 2.754880666732788,
+      "learning_rate": 0.00019308551699800185,
+      "loss": 3.5241,
+      "step": 1550
+    },
+    {
+      "epoch": 0.30143180105501133,
+      "grad_norm": 4.103003025054932,
+      "learning_rate": 0.00019251731175006577,
+      "loss": 3.3811,
+      "step": 1600
+    },
+    {
+      "epoch": 0.3108515448379804,
+      "grad_norm": 3.685828447341919,
+      "learning_rate": 0.0001919275770533073,
+      "loss": 3.4683,
+      "step": 1650
+    },
+    {
+      "epoch": 0.3202712886209495,
+      "grad_norm": 4.288295269012451,
+      "learning_rate": 0.00019131645014326362,
+      "loss": 3.4744,
+      "step": 1700
+    },
+    {
+      "epoch": 0.3296910324039186,
+      "grad_norm": 5.77810525894165,
+      "learning_rate": 0.00019068407323359485,
+      "loss": 3.4305,
+      "step": 1750
+    },
+    {
+      "epoch": 0.33911077618688773,
+      "grad_norm": 3.6046531200408936,
+      "learning_rate": 0.0001900305934829901,
+      "loss": 3.4458,
+      "step": 1800
+    },
+    {
+      "epoch": 0.34853051996985684,
+      "grad_norm": 3.6128194332122803,
+      "learning_rate": 0.0001893561629609224,
+      "loss": 3.5119,
+      "step": 1850
+    },
+    {
+      "epoch": 0.3579502637528259,
+      "grad_norm": 3.9442057609558105,
+      "learning_rate": 0.00018866093861226118,
+      "loss": 3.4399,
+      "step": 1900
+    },
+    {
+      "epoch": 0.367370007535795,
+      "grad_norm": 4.545738697052002,
+      "learning_rate": 0.00018794508222074995,
+      "loss": 3.4582,
+      "step": 1950
+    },
+    {
+      "epoch": 0.37678975131876413,
+      "grad_norm": 4.7795209884643555,
+      "learning_rate": 0.00018720876037135807,
+      "loss": 3.4424,
+      "step": 2000
+    },
+    {
+      "epoch": 0.38620949510173325,
+      "grad_norm": 4.363834857940674,
+      "learning_rate": 0.00018645214441151525,
+      "loss": 3.3851,
+      "step": 2050
+    },
+    {
+      "epoch": 0.39562923888470236,
+      "grad_norm": 3.016232967376709,
+      "learning_rate": 0.0001856754104112378,
+      "loss": 3.445,
+      "step": 2100
+    },
+    {
+      "epoch": 0.4050489826676714,
+      "grad_norm": 3.354729175567627,
+      "learning_rate": 0.00018487873912215576,
+      "loss": 3.4449,
+      "step": 2150
+    },
+    {
+      "epoch": 0.41446872645064053,
+      "grad_norm": 4.864713191986084,
+      "learning_rate": 0.0001840623159354508,
+      "loss": 3.3644,
+      "step": 2200
+    },
+    {
+      "epoch": 0.42388847023360965,
+      "grad_norm": 3.9920921325683594,
+      "learning_rate": 0.00018322633083871416,
+      "loss": 3.3918,
+      "step": 2250
+    },
+    {
+      "epoch": 0.43330821401657876,
+      "grad_norm": 3.2147321701049805,
+      "learning_rate": 0.00018237097837173546,
+      "loss": 3.4225,
+      "step": 2300
+    },
+    {
+      "epoch": 0.4427279577995479,
+      "grad_norm": 3.867025136947632,
+      "learning_rate": 0.0001814964575812316,
+      "loss": 3.392,
+      "step": 2350
+    },
+    {
+      "epoch": 0.45214770158251694,
+      "grad_norm": 3.8767242431640625,
+      "learning_rate": 0.0001806029719745273,
+      "loss": 3.4789,
+      "step": 2400
+    },
+    {
+      "epoch": 0.46156744536548605,
+      "grad_norm": 4.091196537017822,
+      "learning_rate": 0.00017969072947219736,
+      "loss": 3.454,
+      "step": 2450
+    },
+    {
+      "epoch": 0.47098718914845517,
+      "grad_norm": 3.7132067680358887,
+      "learning_rate": 0.00017875994235968222,
+      "loss": 3.4125,
+      "step": 2500
+    },
+    {
+      "epoch": 0.4804069329314243,
+      "grad_norm": 2.912659168243408,
+      "learning_rate": 0.0001778108272378874,
+      "loss": 3.2867,
+      "step": 2550
+    },
+    {
+      "epoch": 0.4898266767143934,
+      "grad_norm": 3.0867087841033936,
+      "learning_rate": 0.00017684360497277905,
+      "loss": 3.2998,
+      "step": 2600
+    },
+    {
+      "epoch": 0.49924642049736245,
+      "grad_norm": 4.568729400634766,
+      "learning_rate": 0.00017585850064398664,
+      "loss": 3.3506,
+      "step": 2650
+    },
+    {
+      "epoch": 0.5086661642803316,
+      "grad_norm": 3.520575761795044,
+      "learning_rate": 0.0001748557434924256,
+      "loss": 3.4694,
+      "step": 2700
+    },
+    {
+      "epoch": 0.5180859080633007,
+      "grad_norm": 3.2413041591644287,
+      "learning_rate": 0.00017383556686695098,
+      "loss": 3.4594,
+      "step": 2750
+    },
+    {
+      "epoch": 0.5275056518462697,
+      "grad_norm": 3.6528191566467285,
+      "learning_rate": 0.00017279820817005579,
+      "loss": 3.3906,
+      "step": 2800
+    },
+    {
+      "epoch": 0.5369253956292389,
+      "grad_norm": 3.3136487007141113,
+      "learning_rate": 0.0001717439088026254,
+      "loss": 3.3688,
+      "step": 2850
+    },
+    {
+      "epoch": 0.546345139412208,
+      "grad_norm": 3.751816511154175,
+      "learning_rate": 0.00017067291410776205,
+      "loss": 3.4496,
+      "step": 2900
+    },
+    {
+      "epoch": 0.5557648831951771,
+      "grad_norm": 3.006528377532959,
+      "learning_rate": 0.0001695854733136917,
+      "loss": 3.3974,
+      "step": 2950
+    },
+    {
+      "epoch": 0.5651846269781462,
+      "grad_norm": 4.444423675537109,
+      "learning_rate": 0.0001684818394757666,
+      "loss": 3.4432,
+      "step": 3000
+    },
+    {
+      "epoch": 0.5746043707611153,
+      "grad_norm": 4.628754138946533,
+      "learning_rate": 0.00016736226941757777,
+      "loss": 3.3712,
+      "step": 3050
+    },
+    {
+      "epoch": 0.5840241145440844,
+      "grad_norm": 3.5171656608581543,
+      "learning_rate": 0.00016622702367119022,
+      "loss": 3.3816,
+      "step": 3100
+    },
+    {
+      "epoch": 0.5934438583270535,
+      "grad_norm": 6.605661392211914,
+      "learning_rate": 0.00016507636641651497,
+      "loss": 3.3663,
+      "step": 3150
+    },
+    {
+      "epoch": 0.6028636021100227,
+      "grad_norm": 3.3791379928588867,
+      "learning_rate": 0.00016391056541983286,
+      "loss": 3.3478,
+      "step": 3200
+    },
+    {
+      "epoch": 0.6122833458929917,
+      "grad_norm": 3.7646875381469727,
+      "learning_rate": 0.0001627298919714832,
+      "loss": 3.3873,
+      "step": 3250
+    },
+    {
+      "epoch": 0.6217030896759608,
+      "grad_norm": 3.5667145252227783,
+      "learning_rate": 0.00016153462082273254,
+      "loss": 3.3857,
+      "step": 3300
+    },
+    {
+      "epoch": 0.6311228334589299,
+      "grad_norm": 4.5765838623046875,
+      "learning_rate": 0.0001603250301218381,
+      "loss": 3.3966,
+      "step": 3350
+    },
+    {
+      "epoch": 0.640542577241899,
+      "grad_norm": 5.043736457824707,
+      "learning_rate": 0.00015910140134932065,
+      "loss": 3.2627,
+      "step": 3400
+    },
+    {
+      "epoch": 0.6499623210248682,
+      "grad_norm": 2.7368907928466797,
+      "learning_rate": 0.00015786401925246195,
+      "loss": 3.3314,
+      "step": 3450
+    },
+    {
+      "epoch": 0.6593820648078372,
+      "grad_norm": 2.664144277572632,
+      "learning_rate": 0.00015661317177904192,
+      "loss": 3.3676,
+      "step": 3500
+    },
+    {
+      "epoch": 0.6688018085908063,
+      "grad_norm": 4.7799553871154785,
+      "learning_rate": 0.00015534915001033133,
+      "loss": 3.3537,
+      "step": 3550
+    },
+    {
+      "epoch": 0.6782215523737755,
+      "grad_norm": 3.967170238494873,
+      "learning_rate": 0.00015407224809335472,
+      "loss": 3.3891,
+      "step": 3600
+    },
+    {
+      "epoch": 0.6876412961567445,
+      "grad_norm": 5.556266784667969,
+      "learning_rate": 0.00015278276317244065,
+      "loss": 3.3001,
+      "step": 3650
+    },
+    {
+      "epoch": 0.6970610399397137,
+      "grad_norm": 3.9986824989318848,
+      "learning_rate": 0.00015148099532007376,
+      "loss": 3.2991,
+      "step": 3700
+    },
+    {
+      "epoch": 0.7064807837226827,
+      "grad_norm": 3.4342408180236816,
+      "learning_rate": 0.00015016724746706587,
+      "loss": 3.3767,
+      "step": 3750
+    },
+    {
+      "epoch": 0.7159005275056518,
+      "grad_norm": 3.237772226333618,
+      "learning_rate": 0.00014884182533206176,
+      "loss": 3.3902,
+      "step": 3800
+    },
+    {
+      "epoch": 0.725320271288621,
+      "grad_norm": 3.1298294067382812,
+      "learning_rate": 0.00014750503735039627,
+      "loss": 3.425,
+      "step": 3850
+    },
+    {
+      "epoch": 0.73474001507159,
+      "grad_norm": 2.7590181827545166,
+      "learning_rate": 0.00014615719460231902,
+      "loss": 3.4326,
+      "step": 3900
+    },
+    {
+      "epoch": 0.7441597588545592,
+      "grad_norm": 2.5840773582458496,
+      "learning_rate": 0.00014479861074060392,
+      "loss": 3.3956,
+      "step": 3950
+    },
+    {
+      "epoch": 0.7535795026375283,
+      "grad_norm": 3.954084873199463,
+      "learning_rate": 0.00014342960191755986,
+      "loss": 3.4343,
+      "step": 4000
+    },
+    {
+      "epoch": 0.7629992464204973,
+      "grad_norm": 3.5688316822052,
+      "learning_rate": 0.00014205048671145975,
+      "loss": 3.3426,
+      "step": 4050
+    },
+    {
+      "epoch": 0.7724189902034665,
+      "grad_norm": 3.0752432346343994,
+      "learning_rate": 0.0001406615860524051,
+      "loss": 3.3778,
+      "step": 4100
+    },
+    {
+      "epoch": 0.7818387339864356,
+      "grad_norm": 4.173229694366455,
+      "learning_rate": 0.00013926322314764325,
+      "loss": 3.3514,
+      "step": 4150
+    },
+    {
+      "epoch": 0.7912584777694047,
+      "grad_norm": 4.163102626800537,
+      "learning_rate": 0.0001378557234063546,
+      "loss": 3.5147,
+      "step": 4200
+    },
+    {
+      "epoch": 0.8006782215523738,
+      "grad_norm": 2.8861260414123535,
+      "learning_rate": 0.0001364394143639277,
+      "loss": 3.3723,
+      "step": 4250
+    },
+    {
+      "epoch": 0.8100979653353428,
+      "grad_norm": 6.053293704986572,
+      "learning_rate": 0.00013501462560573917,
+      "loss": 3.2998,
+      "step": 4300
+    },
+    {
+      "epoch": 0.819517709118312,
+      "grad_norm": 4.325138092041016,
+      "learning_rate": 0.0001335816886904571,
+      "loss": 3.4297,
+      "step": 4350
+    },
+    {
+      "epoch": 0.8289374529012811,
+      "grad_norm": 4.685932159423828,
+      "learning_rate": 0.00013214093707288467,
+      "loss": 3.4018,
+      "step": 4400
+    },
+    {
+      "epoch": 0.8383571966842502,
+      "grad_norm": 3.808549404144287,
+      "learning_rate": 0.00013069270602636296,
+      "loss": 3.3846,
+      "step": 4450
+    },
+    {
+      "epoch": 0.8477769404672193,
+      "grad_norm": 3.4035122394561768,
+      "learning_rate": 0.00012923733256475032,
+      "loss": 3.3903,
+      "step": 4500
+    },
+    {
+      "epoch": 0.8571966842501884,
+      "grad_norm": 3.155348300933838,
+      "learning_rate": 0.0001277751553639969,
+      "loss": 3.411,
+      "step": 4550
+    },
+    {
+      "epoch": 0.8666164280331575,
+      "grad_norm": 3.338160514831543,
+      "learning_rate": 0.00012630651468333216,
+      "loss": 3.3597,
+      "step": 4600
+    },
+    {
+      "epoch": 0.8760361718161266,
+      "grad_norm": 3.211545944213867,
+      "learning_rate": 0.00012483175228608428,
+      "loss": 3.3356,
+      "step": 4650
+    },
+    {
+      "epoch": 0.8854559155990958,
+      "grad_norm": 3.8928279876708984,
+      "learning_rate": 0.0001233512113601492,
+      "loss": 3.3746,
+      "step": 4700
+    },
+    {
+      "epoch": 0.8948756593820648,
+      "grad_norm": 3.73189377784729,
+      "learning_rate": 0.00012186523643812829,
+      "loss": 3.3233,
+      "step": 4750
+    },
+    {
+      "epoch": 0.9042954031650339,
+      "grad_norm": 3.3256442546844482,
+      "learning_rate": 0.00012037417331715326,
+      "loss": 3.3809,
+      "step": 4800
+    },
+    {
+      "epoch": 0.913715146948003,
+      "grad_norm": 2.753485918045044,
+      "learning_rate": 0.00011887836897841656,
+      "loss": 3.3312,
+      "step": 4850
+    },
+    {
+      "epoch": 0.9231348907309721,
+      "grad_norm": 3.896688461303711,
+      "learning_rate": 0.00011737817150642642,
+      "loss": 3.3309,
+      "step": 4900
+    },
+    {
+      "epoch": 0.9325546345139413,
+      "grad_norm": 2.9412078857421875,
+      "learning_rate": 0.00011587393000800495,
+      "loss": 3.3925,
+      "step": 4950
+    },
+    {
+      "epoch": 0.9419743782969103,
+      "grad_norm": 3.046893358230591,
+      "learning_rate": 0.0001143659945310485,
+      "loss": 3.3528,
+      "step": 5000
+    },
+    {
+      "epoch": 0.9513941220798794,
+      "grad_norm": 2.9416866302490234,
+      "learning_rate": 0.00011285471598306904,
+      "loss": 3.3705,
+      "step": 5050
+    },
+    {
+      "epoch": 0.9608138658628486,
+      "grad_norm": 3.031576633453369,
+      "learning_rate": 0.00011134044604953535,
+      "loss": 3.2594,
+      "step": 5100
+    },
+    {
+      "epoch": 0.9702336096458176,
+      "grad_norm": 3.1344618797302246,
+      "learning_rate": 0.00010982353711203335,
+      "loss": 3.3358,
+      "step": 5150
+    },
+    {
+      "epoch": 0.9796533534287868,
+      "grad_norm": 3.8994078636169434,
+      "learning_rate": 0.00010830434216626429,
+      "loss": 3.3859,
+      "step": 5200
+    },
+    {
+      "epoch": 0.9890730972117558,
+      "grad_norm": 3.5508439540863037,
+      "learning_rate": 0.0001067832147399001,
+      "loss": 3.4064,
+      "step": 5250
+    },
+    {
+      "epoch": 0.9984928409947249,
+      "grad_norm": 3.756168842315674,
+      "learning_rate": 0.0001052605088103149,
+      "loss": 3.4031,
+      "step": 5300
+    },
+    {
+      "epoch": 1.007912584777694,
+      "grad_norm": 3.423412799835205,
+      "learning_rate": 0.00010373657872221201,
+      "loss": 3.2772,
+      "step": 5350
+    },
+    {
+      "epoch": 1.0173323285606632,
+      "grad_norm": 2.908240556716919,
+      "learning_rate": 0.00010221177910516535,
+      "loss": 3.2389,
+      "step": 5400
+    },
+    {
+      "epoch": 1.0267520723436323,
+      "grad_norm": 3.300856113433838,
+      "learning_rate": 0.00010068646479109438,
+      "loss": 3.3032,
+      "step": 5450
+    },
+    {
+      "epoch": 1.0361718161266014,
+      "grad_norm": 3.262367010116577,
+      "learning_rate": 9.916099073169246e-05,
+      "loss": 3.2558,
+      "step": 5500
+    },
+    {
+      "epoch": 1.0455915599095704,
+      "grad_norm": 3.140495538711548,
+      "learning_rate": 9.763571191582666e-05,
+      "loss": 3.2801,
+      "step": 5550
+    },
+    {
+      "epoch": 1.0550113036925395,
+      "grad_norm": 4.067094326019287,
+      "learning_rate": 9.611098328692965e-05,
+      "loss": 3.3523,
+      "step": 5600
+    },
+    {
+      "epoch": 1.0644310474755088,
+      "grad_norm": 3.3451836109161377,
+      "learning_rate": 9.45871596604015e-05,
+      "loss": 3.1674,
+      "step": 5650
+    },
+    {
+      "epoch": 1.0738507912584778,
+      "grad_norm": 3.6689610481262207,
+      "learning_rate": 9.306459564104165e-05,
+      "loss": 3.1716,
+      "step": 5700
+    },
+    {
+      "epoch": 1.0832705350414469,
+      "grad_norm": 3.774512529373169,
+      "learning_rate": 9.154364554052994e-05,
+      "loss": 3.2854,
+      "step": 5750
+    },
+    {
+      "epoch": 1.092690278824416,
+      "grad_norm": 5.2747039794921875,
+      "learning_rate": 9.002466329497544e-05,
+      "loss": 3.2782,
+      "step": 5800
+    },
+    {
+      "epoch": 1.102110022607385,
+      "grad_norm": 4.102133750915527,
+      "learning_rate": 8.850800238255325e-05,
+      "loss": 3.2492,
+      "step": 5850
+    },
+    {
+      "epoch": 1.1115297663903543,
+      "grad_norm": 3.484194278717041,
+      "learning_rate": 8.699401574124738e-05,
+      "loss": 3.2691,
+      "step": 5900
+    },
+    {
+      "epoch": 1.1209495101733233,
+      "grad_norm": 3.3113062381744385,
+      "learning_rate": 8.548305568671955e-05,
+      "loss": 3.3834,
+      "step": 5950
+    },
+    {
+      "epoch": 1.1303692539562924,
+      "grad_norm": 3.179262161254883,
+      "learning_rate": 8.397547383032287e-05,
+      "loss": 3.2885,
+      "step": 6000
+    },
+    {
+      "epoch": 1.1397889977392615,
+      "grad_norm": 4.8219313621521,
+      "learning_rate": 8.247162099727914e-05,
+      "loss": 3.1998,
+      "step": 6050
+    },
+    {
+      "epoch": 1.1492087415222305,
+      "grad_norm": 3.6914913654327393,
+      "learning_rate": 8.097184714503958e-05,
+      "loss": 3.1671,
+      "step": 6100
+    },
+    {
+      "epoch": 1.1586284853051998,
+      "grad_norm": 3.255949020385742,
+      "learning_rate": 7.947650128184687e-05,
+      "loss": 3.2354,
+      "step": 6150
+    },
+    {
+      "epoch": 1.1680482290881689,
+      "grad_norm": 4.178644180297852,
+      "learning_rate": 7.801569373352544e-05,
+      "loss": 3.2532,
+      "step": 6200
+    },
+    {
+      "epoch": 1.177467972871138,
+      "grad_norm": 4.313507556915283,
+      "learning_rate": 7.653014082209849e-05,
+      "loss": 3.1376,
+      "step": 6250
+    },
+    {
+      "epoch": 1.186887716654107,
+      "grad_norm": 3.4054031372070312,
+      "learning_rate": 7.505004951696425e-05,
+      "loss": 3.212,
+      "step": 6300
+    },
+    {
+      "epoch": 1.196307460437076,
+      "grad_norm": 4.730232238769531,
+      "learning_rate": 7.357576424609412e-05,
+      "loss": 3.3218,
+      "step": 6350
+    },
+    {
+      "epoch": 1.2057272042200453,
+      "grad_norm": 5.010429382324219,
+      "learning_rate": 7.210762808635328e-05,
+      "loss": 3.3585,
+      "step": 6400
+    },
+    {
+      "epoch": 1.2151469480030144,
+      "grad_norm": 2.9518239498138428,
+      "learning_rate": 7.064598268366423e-05,
+      "loss": 3.234,
+      "step": 6450
+    },
+    {
+      "epoch": 1.2245666917859834,
+      "grad_norm": 3.174252986907959,
+      "learning_rate": 6.919116817350311e-05,
+      "loss": 3.256,
+      "step": 6500
+    },
+    {
+      "epoch": 1.2339864355689525,
+      "grad_norm": 4.318403720855713,
+      "learning_rate": 6.774352310174807e-05,
+      "loss": 3.2992,
+      "step": 6550
+    },
+    {
+      "epoch": 1.2434061793519215,
+      "grad_norm": 3.2725417613983154,
+      "learning_rate": 6.630338434589684e-05,
+      "loss": 3.2518,
+      "step": 6600
+    },
+    {
+      "epoch": 1.2528259231348908,
+      "grad_norm": 4.7729172706604,
+      "learning_rate": 6.48710870366732e-05,
+      "loss": 3.1914,
+      "step": 6650
+    },
+    {
+      "epoch": 1.2622456669178599,
+      "grad_norm": 4.15845251083374,
+      "learning_rate": 6.344696448003936e-05,
+      "loss": 3.3578,
+      "step": 6700
+    },
+    {
+      "epoch": 1.271665410700829,
+      "grad_norm": 4.36642599105835,
+      "learning_rate": 6.203134807963338e-05,
+      "loss": 3.2254,
+      "step": 6750
+    },
+    {
+      "epoch": 1.281085154483798,
+      "grad_norm": 3.920123338699341,
+      "learning_rate": 6.06245672596492e-05,
+      "loss": 3.3039,
+      "step": 6800
+    },
+    {
+      "epoch": 1.290504898266767,
+      "grad_norm": 3.520005941390991,
+      "learning_rate": 5.9226949388177074e-05,
+      "loss": 3.2138,
+      "step": 6850
+    },
+    {
+      "epoch": 1.2999246420497363,
+      "grad_norm": 4.262415409088135,
+      "learning_rate": 5.783881970102284e-05,
+      "loss": 3.2628,
+      "step": 6900
+    },
+    {
+      "epoch": 1.3093443858327054,
+      "grad_norm": 5.027698993682861,
+      "learning_rate": 5.6460501226023046e-05,
+      "loss": 3.2169,
+      "step": 6950
+    },
+    {
+      "epoch": 1.3187641296156745,
+      "grad_norm": 4.618427276611328,
+      "learning_rate": 5.509231470787404e-05,
+      "loss": 3.2332,
+      "step": 7000
+    },
+    {
+      "epoch": 1.3281838733986435,
+      "grad_norm": 4.263955593109131,
+      "learning_rate": 5.3734578533492506e-05,
+      "loss": 3.2219,
+      "step": 7050
+    },
+    {
+      "epoch": 1.3376036171816126,
+      "grad_norm": 4.1408915519714355,
+      "learning_rate": 5.238760865792434e-05,
+      "loss": 3.3457,
+      "step": 7100
+    },
+    {
+      "epoch": 1.3470233609645819,
+      "grad_norm": 4.737987995147705,
+      "learning_rate": 5.105171853081967e-05,
+      "loss": 3.2674,
+      "step": 7150
+    },
+    {
+      "epoch": 1.356443104747551,
+      "grad_norm": 3.757457733154297,
+      "learning_rate": 4.97272190234909e-05,
+      "loss": 3.2347,
+      "step": 7200
+    },
+    {
+      "epoch": 1.36586284853052,
+      "grad_norm": 4.666607856750488,
+      "learning_rate": 4.8414418356570646e-05,
+      "loss": 3.1712,
+      "step": 7250
+    },
+    {
+      "epoch": 1.375282592313489,
+      "grad_norm": 3.6052024364471436,
+      "learning_rate": 4.7113622028286695e-05,
+      "loss": 3.2267,
+      "step": 7300
+    },
+    {
+      "epoch": 1.384702336096458,
+      "grad_norm": 4.111656665802002,
+      "learning_rate": 4.582513274337014e-05,
+      "loss": 3.3016,
+      "step": 7350
+    },
+    {
+      "epoch": 1.3941220798794274,
+      "grad_norm": 4.049815654754639,
+      "learning_rate": 4.454925034261394e-05,
+      "loss": 3.1486,
+      "step": 7400
+    },
+    {
+      "epoch": 1.4035418236623964,
+      "grad_norm": 5.213015079498291,
+      "learning_rate": 4.328627173309776e-05,
+      "loss": 3.2477,
+      "step": 7450
+    },
+    {
+      "epoch": 1.4129615674453655,
+      "grad_norm": 3.8552088737487793,
+      "learning_rate": 4.203649081909552e-05,
+      "loss": 3.2842,
+      "step": 7500
+    },
+    {
+      "epoch": 1.4223813112283346,
+      "grad_norm": 4.220608234405518,
+      "learning_rate": 4.0800198433681856e-05,
+      "loss": 3.2362,
+      "step": 7550
+    },
+    {
+      "epoch": 1.4318010550113036,
+      "grad_norm": 3.6608469486236572,
+      "learning_rate": 3.957768227105295e-05,
+      "loss": 3.2575,
+      "step": 7600
+    },
+    {
+      "epoch": 1.441220798794273,
+      "grad_norm": 3.775721311569214,
+      "learning_rate": 3.83692268195782e-05,
+      "loss": 3.3166,
+      "step": 7650
+    },
+    {
+      "epoch": 1.450640542577242,
+      "grad_norm": 3.3248648643493652,
+      "learning_rate": 3.717511329559756e-05,
+      "loss": 3.2473,
+      "step": 7700
+    },
+    {
+      "epoch": 1.460060286360211,
+      "grad_norm": 4.1636962890625,
+      "learning_rate": 3.599561957798061e-05,
+      "loss": 3.2503,
+      "step": 7750
+    },
+    {
+      "epoch": 1.46948003014318,
+      "grad_norm": 4.054011821746826,
+      "learning_rate": 3.483102014346197e-05,
+      "loss": 3.2782,
+      "step": 7800
+    },
+    {
+      "epoch": 1.4788997739261491,
+      "grad_norm": 4.512032985687256,
+      "learning_rate": 3.368158600276878e-05,
+      "loss": 3.1953,
+      "step": 7850
+    },
+    {
+      "epoch": 1.4883195177091184,
+      "grad_norm": 4.303177833557129,
+      "learning_rate": 3.254758463755433e-05,
+      "loss": 3.2378,
+      "step": 7900
+    },
+    {
+      "epoch": 1.4977392614920875,
+      "grad_norm": 3.9611923694610596,
+      "learning_rate": 3.142927993815323e-05,
+      "loss": 3.2471,
+      "step": 7950
+    },
+    {
+      "epoch": 1.5071590052750565,
+      "grad_norm": 3.29958438873291,
+      "learning_rate": 3.032693214217227e-05,
+      "loss": 3.3052,
+      "step": 8000
+    },
+    {
+      "epoch": 1.5165787490580256,
+      "grad_norm": 3.985778570175171,
+      "learning_rate": 2.9240797773931095e-05,
+      "loss": 3.2962,
+      "step": 8050
+    },
+    {
+      "epoch": 1.5259984928409946,
+      "grad_norm": 4.667643070220947,
+      "learning_rate": 2.8171129584767374e-05,
+      "loss": 3.182,
+      "step": 8100
+    },
+    {
+      "epoch": 1.535418236623964,
+      "grad_norm": 3.338125705718994,
+      "learning_rate": 2.7118176494219484e-05,
+      "loss": 3.3367,
+      "step": 8150
+    },
+    {
+      "epoch": 1.544837980406933,
+      "grad_norm": 4.168639183044434,
+      "learning_rate": 2.608218353210127e-05,
+      "loss": 3.249,
+      "step": 8200
+    },
+    {
+      "epoch": 1.554257724189902,
+      "grad_norm": 4.156576633453369,
+      "learning_rate": 2.5063391781481782e-05,
+      "loss": 3.2778,
+      "step": 8250
+    },
+    {
+      "epoch": 1.563677467972871,
+      "grad_norm": 3.8702266216278076,
+      "learning_rate": 2.4062038322583514e-05,
+      "loss": 3.3314,
+      "step": 8300
+    },
+    {
+      "epoch": 1.5730972117558402,
+      "grad_norm": 3.8674046993255615,
+      "learning_rate": 2.3078356177612193e-05,
+      "loss": 3.3005,
+      "step": 8350
+    },
+    {
+      "epoch": 1.5825169555388094,
+      "grad_norm": 4.1688923835754395,
+      "learning_rate": 2.2112574256530648e-05,
+      "loss": 3.1497,
+      "step": 8400
+    },
+    {
+      "epoch": 1.5919366993217783,
+      "grad_norm": 3.7345731258392334,
+      "learning_rate": 2.1164917303789967e-05,
+      "loss": 3.2213,
+      "step": 8450
+    },
+    {
+      "epoch": 1.6013564431047476,
+      "grad_norm": 5.062714576721191,
+      "learning_rate": 2.0235605846029725e-05,
+      "loss": 3.1729,
+      "step": 8500
+    },
+    {
+      "epoch": 1.6107761868877166,
+      "grad_norm": 3.7708489894866943,
+      "learning_rate": 1.9324856140759927e-05,
+      "loss": 3.3534,
+      "step": 8550
+    },
+    {
+      "epoch": 1.6201959306706857,
+      "grad_norm": 4.154157638549805,
+      "learning_rate": 1.8432880126036266e-05,
+      "loss": 3.2141,
+      "step": 8600
+    },
+    {
+      "epoch": 1.629615674453655,
+      "grad_norm": 3.6767871379852295,
+      "learning_rate": 1.757715792857161e-05,
+      "loss": 3.2594,
+      "step": 8650
+    },
+    {
+      "epoch": 1.6390354182366238,
+      "grad_norm": 2.9394946098327637,
+      "learning_rate": 1.6722961935375305e-05,
+      "loss": 3.1776,
+      "step": 8700
+    },
+    {
+      "epoch": 1.648455162019593,
+      "grad_norm": 3.7877252101898193,
+      "learning_rate": 1.588814511235993e-05,
+      "loss": 3.3192,
+      "step": 8750
+    },
+    {
+      "epoch": 1.6578749058025621,
+      "grad_norm": 3.540524959564209,
+      "learning_rate": 1.5072901727449351e-05,
+      "loss": 3.2872,
+      "step": 8800
+    },
+    {
+      "epoch": 1.6672946495855312,
+      "grad_norm": 3.0098791122436523,
+      "learning_rate": 1.4277421493686417e-05,
+      "loss": 3.2883,
+      "step": 8850
+    },
+    {
+      "epoch": 1.6767143933685005,
+      "grad_norm": 4.260079860687256,
+      "learning_rate": 1.3501889525085553e-05,
+      "loss": 3.2825,
+      "step": 8900
+    },
+    {
+      "epoch": 1.6861341371514693,
+      "grad_norm": 4.635756969451904,
+      "learning_rate": 1.2746486293555393e-05,
+      "loss": 3.2375,
+      "step": 8950
+    },
+    {
+      "epoch": 1.6955538809344386,
+      "grad_norm": 3.4941935539245605,
+      "learning_rate": 1.2011387586901468e-05,
+      "loss": 3.2819,
+      "step": 9000
+    },
+    {
+      "epoch": 1.7049736247174077,
+      "grad_norm": 3.1473937034606934,
+      "learning_rate": 1.1296764467919386e-05,
+      "loss": 3.325,
+      "step": 9050
+    },
+    {
+      "epoch": 1.7143933685003767,
+      "grad_norm": 4.018296718597412,
+      "learning_rate": 1.0602783234587055e-05,
+      "loss": 3.259,
+      "step": 9100
+    },
+    {
+      "epoch": 1.723813112283346,
+      "grad_norm": 3.661189317703247,
+      "learning_rate": 9.929605381366025e-06,
+      "loss": 3.3664,
+      "step": 9150
+    },
+    {
+      "epoch": 1.7332328560663148,
+      "grad_norm": 3.716421127319336,
+      "learning_rate": 9.277387561620621e-06,
+      "loss": 3.2394,
+      "step": 9200
+    },
+    {
+      "epoch": 1.7426525998492841,
+      "grad_norm": 4.685845375061035,
+      "learning_rate": 8.646281551163372e-06,
+      "loss": 3.2201,
+      "step": 9250
+    },
+    {
+      "epoch": 1.7520723436322532,
+      "grad_norm": 4.318825721740723,
+      "learning_rate": 8.036434212935961e-06,
+      "loss": 3.2097,
+      "step": 9300
+    },
+    {
+      "epoch": 1.7614920874152222,
+      "grad_norm": 3.7548320293426514,
+      "learning_rate": 7.447987462832906e-06,
+      "loss": 3.3436,
+      "step": 9350
+    },
+    {
+      "epoch": 1.7709118311981915,
+      "grad_norm": 3.607837677001953,
+      "learning_rate": 6.881078236676819e-06,
+      "loss": 3.1546,
+      "step": 9400
+    },
+    {
+      "epoch": 1.7803315749811603,
+      "grad_norm": 3.722788095474243,
+      "learning_rate": 6.33583845835245e-06,
+      "loss": 3.1925,
+      "step": 9450
+    },
+    {
+      "epoch": 1.7897513187641296,
+      "grad_norm": 4.712146282196045,
+      "learning_rate": 5.812395009106986e-06,
+      "loss": 3.2728,
+      "step": 9500
+    },
+    {
+      "epoch": 1.7991710625470987,
+      "grad_norm": 3.2678236961364746,
+      "learning_rate": 5.310869698023957e-06,
+      "loss": 3.1844,
+      "step": 9550
+    },
+    {
+      "epoch": 1.8085908063300677,
+      "grad_norm": 2.833974599838257,
+      "learning_rate": 4.83137923367728e-06,
+      "loss": 3.3046,
+      "step": 9600
+    },
+    {
+      "epoch": 1.818010550113037,
+      "grad_norm": 3.119704484939575,
+      "learning_rate": 4.37403519697237e-06,
+      "loss": 3.2437,
+      "step": 9650
+    },
+    {
+      "epoch": 1.8274302938960059,
+      "grad_norm": 3.8922572135925293,
+      "learning_rate": 3.93894401518049e-06,
+      "loss": 3.3119,
+      "step": 9700
+    },
+    {
+      "epoch": 1.8368500376789751,
+      "grad_norm": 4.705239295959473,
+      "learning_rate": 3.526206937172283e-06,
+      "loss": 3.1159,
+      "step": 9750
+    },
+    {
+      "epoch": 1.8462697814619442,
+      "grad_norm": 5.051453590393066,
+      "learning_rate": 3.135920009856508e-06,
+      "loss": 3.2452,
+      "step": 9800
+    },
+    {
+      "epoch": 1.8556895252449133,
+      "grad_norm": 4.280581951141357,
+      "learning_rate": 2.7681740558291534e-06,
+      "loss": 3.181,
+      "step": 9850
+    },
+    {
+      "epoch": 1.8651092690278825,
+      "grad_norm": 3.615163564682007,
+      "learning_rate": 2.423054652238388e-06,
+      "loss": 3.3135,
+      "step": 9900
+    },
+    {
+      "epoch": 1.8745290128108514,
+      "grad_norm": 3.097933530807495,
+      "learning_rate": 2.1006421108701658e-06,
+      "loss": 3.158,
+      "step": 9950
+    },
+    {
+      "epoch": 1.8839487565938207,
+      "grad_norm": 4.366623401641846,
+      "learning_rate": 1.8010114594590344e-06,
+      "loss": 3.2268,
+      "step": 10000
+    },
+    {
+      "epoch": 1.8933685003767897,
+      "grad_norm": 4.784694671630859,
+      "learning_rate": 1.524232424228733e-06,
+      "loss": 3.2006,
+      "step": 10050
+    },
+    {
+      "epoch": 1.9027882441597588,
+      "grad_norm": 3.8721609115600586,
+      "learning_rate": 1.2703694136662613e-06,
+      "loss": 3.2527,
+      "step": 10100
+    },
+    {
+      "epoch": 1.912207987942728,
+      "grad_norm": 3.8283731937408447,
+      "learning_rate": 1.0394815035336791e-06,
+      "loss": 3.2675,
+      "step": 10150
+    },
+    {
+      "epoch": 1.921627731725697,
+      "grad_norm": 4.774003028869629,
+      "learning_rate": 8.316224231206704e-07,
+      "loss": 3.2669,
+      "step": 10200
+    },
+    {
+      "epoch": 1.9310474755086662,
+      "grad_norm": 4.009103775024414,
+      "learning_rate": 6.468405427413893e-07,
+      "loss": 3.2564,
+      "step": 10250
+    },
+    {
+      "epoch": 1.9404672192916352,
+      "grad_norm": 3.9507622718811035,
+      "learning_rate": 4.851788624783415e-07,
+      "loss": 3.316,
+      "step": 10300
+    },
+    {
+      "epoch": 1.9498869630746043,
+      "grad_norm": 4.7480340003967285,
+      "learning_rate": 3.466750021758891e-07,
+      "loss": 3.2233,
+      "step": 10350
+    },
+    {
+      "epoch": 1.9593067068575736,
+      "grad_norm": 3.8609328269958496,
+      "learning_rate": 2.313611926859638e-07,
+      "loss": 3.1851,
+      "step": 10400
+    },
+    {
+      "epoch": 1.9687264506405424,
+      "grad_norm": 3.8725392818450928,
+      "learning_rate": 1.39264268367556e-07,
+      "loss": 3.2008,
+      "step": 10450
+    },
+    {
+      "epoch": 1.9781461944235117,
+      "grad_norm": 3.1601712703704834,
+      "learning_rate": 7.040566084230981e-08,
+      "loss": 3.1425,
+      "step": 10500
+    },
+    {
+      "epoch": 1.9875659382064808,
+      "grad_norm": 5.7955098152160645,
+      "learning_rate": 2.4801394007123446e-08,
+      "loss": 3.2082,
+      "step": 10550
+    },
+    {
+      "epoch": 1.9969856819894498,
+      "grad_norm": 3.779911994934082,
+      "learning_rate": 2.462080305365433e-09,
+      "loss": 3.2009,
+      "step": 10600
+    }
+  ],
+  "logging_steps": 50,
+  "max_steps": 10616,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 6261647005286400.0,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
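
The `epoch` values logged above are consistent with `max_steps` and `num_train_epochs`: 10616 steps over 2 epochs gives 5308 steps per epoch, so step 5300 maps to epoch 5300 / 5308 ≈ 0.9985. The following is a minimal sketch of that arithmetic, plus a per-epoch average of the logged loss. It assumes equal-length epochs; the names `epoch_at`, `mean_loss_by_epoch`, and `SAMPLE_LOG` are hypothetical helpers for illustration and are not part of any library. The sample entries are copied from the log above, keeping only the `epoch`, `loss`, and `step` fields.

```python
from collections import defaultdict

# Values copied from the trainer state above.
MAX_STEPS = 10616
NUM_TRAIN_EPOCHS = 2

# A few log entries copied from the diff (epoch, loss, step fields only).
SAMPLE_LOG = [
    {"epoch": 0.6311228334589299, "loss": 3.3966, "step": 3350},
    {"epoch": 0.9984928409947249, "loss": 3.4031, "step": 5300},
    {"epoch": 1.007912584777694, "loss": 3.2772, "step": 5350},
    {"epoch": 1.9969856819894498, "loss": 3.2009, "step": 10600},
]


def epoch_at(step, max_steps=MAX_STEPS, num_epochs=NUM_TRAIN_EPOCHS):
    """Fractional epoch for a given step, assuming equal-length epochs."""
    steps_per_epoch = max_steps / num_epochs  # 5308.0 for this run
    return step / steps_per_epoch


def mean_loss_by_epoch(entries):
    """Average the logged loss, grouped by whole-epoch index (0, 1, ...)."""
    buckets = defaultdict(list)
    for entry in entries:
        buckets[int(entry["epoch"])].append(entry["loss"])
    return {
        epoch: sum(losses) / len(losses)
        for epoch, losses in sorted(buckets.items())
    }


if __name__ == "__main__":
    # Compare the recomputed epoch with the logged one for each sample entry.
    for entry in SAMPLE_LOG:
        print(entry["step"], epoch_at(entry["step"]), entry["epoch"])
    print(mean_loss_by_epoch(SAMPLE_LOG))
```

To summarize the whole run, the same `mean_loss_by_epoch` function could be applied to the full list of log entries loaded from `checkpoint-10616/trainer_state.json` with `json.load`, provided entries without a `loss` field (if any) are filtered out first.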

checkpoint-10616/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ab69d66570056eeedaac3e15c93029be77fde29561fd972597b2eaaa6d20f3bd
+size 5432

special_tokens_map.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "</s>",
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+size 499723

tokenizer_config.json ADDED

	@@ -0,0 +1,43 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "legacy": false,
+  "model_max_length": 2048,
+  "pad_token": "</s>",
+  "padding_side": "right",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
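
This config sets `pad_token` to the same string as `eos_token` (`</s>`, id 2 in `added_tokens_decoder`) with `padding_side` set to `right`. As a result, a token id of 2 does not by itself tell padding apart from a real end-of-sequence token; the attention mask carries that distinction. The following is a minimal pure-Python sketch of right padding under these settings. It is not the tokenizer's own implementation; `pad_right` is a hypothetical helper, and the ids 100, 200, and 300 are arbitrary placeholders rather than real vocabulary entries.

```python
# Ids taken from added_tokens_decoder in the config above.
BOS_ID = 1  # "<s>"
PAD_ID = 2  # "</s>", shared by eos_token and pad_token
MODEL_MAX_LENGTH = 2048  # model_max_length in the config above


def pad_right(batch, pad_id=PAD_ID, max_length=MODEL_MAX_LENGTH):
    """Truncate each sequence to max_length, then pad on the right.

    Returns (input_ids, attention_mask); mask is 1 for real tokens
    and 0 for padding positions.
    """
    truncated = [ids[:max_length] for ids in batch]
    width = max((len(ids) for ids in truncated), default=0)
    input_ids = [ids + [pad_id] * (width - len(ids)) for ids in truncated]
    attention_mask = [
        [1] * len(ids) + [0] * (width - len(ids)) for ids in truncated
    ]
    return input_ids, attention_mask


if __name__ == "__main__":
    # The first sequence ends with a real "</s>"; the second is padded with it.
    ids, mask = pad_right([[BOS_ID, 100, PAD_ID], [BOS_ID]])
    print(ids)
    print(mask)
```

In the example at the bottom of the block, both rows end in id 2, but only the mask shows that the first row's final token is a real end-of-sequence token while the second row's trailing tokens are padding.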