Full checkpoint for resuming training


Files changed (14)

results/checkpoint-1200/README.md +202 -0
results/checkpoint-1200/adapter_config.json +34 -0
results/checkpoint-1200/adapter_model.safetensors +3 -0
results/checkpoint-1200/merges.txt +0 -0
results/checkpoint-1200/optimizer.pt +3 -0
results/checkpoint-1200/rng_state.pth +3 -0
results/checkpoint-1200/scaler.pt +3 -0
results/checkpoint-1200/scheduler.pt +3 -0
results/checkpoint-1200/special_tokens_map.json +63 -0
results/checkpoint-1200/tokenizer.json +0 -0
results/checkpoint-1200/tokenizer_config.json +357 -0
results/checkpoint-1200/trainer_state.json +1114 -0
results/checkpoint-1200/training_args.bin +3 -0
results/checkpoint-1200/vocab.json +0 -0

results/checkpoint-1200/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: bigcode/starcoder2-3b
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.15.2.dev0

results/checkpoint-1200/adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "bigcode/starcoder2-3b",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_rslora": false
+}
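
The adapter configuration above sets `r` to 8 and `lora_alpha` to 16 on `q_proj` and `v_proj`, with `use_rslora` disabled. As a rough sanity check (a sketch, not part of the commit), the standard LoRA scaling factor `lora_alpha / r` and the number of adapter parameters can be computed as below. The base-model dimensions (30 layers, hidden size 3072, 2 key/value heads of dimension 128) are assumptions about `bigcode/starcoder2-3b` that this commit does not state and that should be checked against the base model's own config; `lora_scaling` and `lora_param_count` are hypothetical helper names.

```python
import json

# Subset of the adapter_config.json fields shown in the diff above.
adapter_config = json.loads("""
{"r": 8, "lora_alpha": 16, "use_rslora": false,
 "target_modules": ["v_proj", "q_proj"]}
""")

# ASSUMED base-model dimensions (not stated anywhere in this commit).
HIDDEN_SIZE = 3072
NUM_LAYERS = 30
KV_DIM = 2 * 128  # assumed: 2 key/value heads x head dimension 128

# (in_features, out_features) of each targeted projection, under the assumptions above.
PROJ_SHAPES = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "v_proj": (HIDDEN_SIZE, KV_DIM),
}


def lora_scaling(cfg):
    """Standard LoRA scaling, lora_alpha / r (valid because use_rslora is false)."""
    return cfg["lora_alpha"] / cfg["r"]


def lora_param_count(cfg):
    """Parameters in the A (r x in) and B (out x r) matrices over all layers."""
    r = cfg["r"]
    per_layer = sum(
        r * (PROJ_SHAPES[name][0] + PROJ_SHAPES[name][1])
        for name in cfg["target_modules"]
    )
    return per_layer * NUM_LAYERS


scaling = lora_scaling(adapter_config)
n_params = lora_param_count(adapter_config)
print(f"scaling={scaling}, params={n_params}, fp32 bytes={n_params * 4}")
```

Under these assumptions the fp32 payload comes to 9,093,120 bytes, which is close to the 9,108,904 bytes recorded for `adapter_model.safetensors` in this commit. The small remainder would be consistent with file header metadata, but that is an inference rather than something the commit confirms.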

results/checkpoint-1200/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aadac4b039bae373fdd4721162b0781dcca6c991bae66f228b25e86938e025d4
+size 9108904
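
The three lines above are a Git LFS pointer rather than the weights themselves: a spec version, a SHA-256 object id, and the size in bytes. The same layout is used for the other binary files in this checkpoint (`optimizer.pt`, `rng_state.pth`, `scaler.pt`, `scheduler.pt`, `training_args.bin`). A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper and handles only the three keys that appear in this diff.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the adapter_model.safetensors diff above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:aadac4b039bae373fdd4721162b0781dcca6c991bae66f228b25e86938e025d4\n"
    "size 9108904\n"
)
print(pointer["hash_algo"], pointer["size"])
```

After downloading the real file, its SHA-256 digest and byte length can be compared against `digest` and `size` to confirm the checkpoint was fetched intact.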

results/checkpoint-1200/merges.txt ADDED

The diff for this file is too large to render.

results/checkpoint-1200/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:abbc98f1f2e0b5315aeb9f79cd7f2c04e653a8bd49b8345dba6a8d0c6b41f7ac
+size 18287162

results/checkpoint-1200/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aa8a50f7b976d8c8ca34d880dd26f60dd2f851bac0a0a5095719fb54f5a75773
+size 14244

results/checkpoint-1200/scaler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:beeef06129d3879de46a6da795139adc62396b85b4a9bd7c58a4fe337c9a9c57
+size 988

results/checkpoint-1200/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5cfc5baadd288335fe7d83a0d3dd2b713a9e631fc75cb337745b4efa6e9e4c91
+size 1064

results/checkpoint-1200/special_tokens_map.json ADDED

	@@ -0,0 +1,63 @@

+{
+  "additional_special_tokens": [
+    "<|endoftext|>",
+    "<fim_prefix>",
+    "<fim_middle>",
+    "<fim_suffix>",
+    "<fim_pad>",
+    "<repo_name>",
+    "<file_sep>",
+    "<issue_start>",
+    "<issue_comment>",
+    "<issue_closed>",
+    "<jupyter_start>",
+    "<jupyter_text>",
+    "<jupyter_code>",
+    "<jupyter_output>",
+    "<jupyter_script>",
+    "<empty_output>",
+    "<code_to_intermediate>",
+    "<intermediate_to_code>",
+    "<pr>",
+    "<pr_status>",
+    "<pr_is_merged>",
+    "<pr_base>",
+    "<pr_file>",
+    "<pr_base_code>",
+    "<pr_diff>",
+    "<pr_diff_hunk>",
+    "<pr_comment>",
+    "<pr_event_id>",
+    "<pr_review>",
+    "<pr_review_state>",
+    "<pr_review_comment>",
+    "<pr_in_reply_to_review_id>",
+    "<pr_in_reply_to_comment_id>",
+    "<pr_diff_hunk_comment_line>",
+    "<NAME>",
+    "<EMAIL>",
+    "<KEY>",
+    "<PASSWORD>"
+  ],
+  "bos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
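
The special tokens above include the fill-in-the-middle markers `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>`. A minimal sketch of how such markers are commonly assembled into an infilling prompt is shown below. The prefix, suffix, middle ordering is an assumption based on the usual StarCoder-family convention, not something this commit specifies, and `build_fim_prompt` is a hypothetical helper.

```python
# Marker strings copied from special_tokens_map.json above.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix, suffix):
    """Build an infilling prompt; the model is expected to generate the middle.

    ASSUMPTION: prefix, then suffix, then the middle marker last.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(1, 2))\n",
)
print(prompt)
```

Text generated after `<fim_middle>` would then be spliced between the prefix and the suffix, with generation stopping at `<|endoftext|>`.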

results/checkpoint-1200/tokenizer.json ADDED

The diff for this file is too large to render.

results/checkpoint-1200/tokenizer_config.json ADDED

	@@ -0,0 +1,357 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<fim_prefix>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "<fim_middle>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<fim_suffix>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "<fim_pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "5": {
+      "content": "<repo_name>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "6": {
+      "content": "<file_sep>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "7": {
+      "content": "<issue_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "8": {
+      "content": "<issue_comment>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "9": {
+      "content": "<issue_closed>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "10": {
+      "content": "<jupyter_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "11": {
+      "content": "<jupyter_text>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "12": {
+      "content": "<jupyter_code>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "13": {
+      "content": "<jupyter_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "14": {
+      "content": "<jupyter_script>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "15": {
+      "content": "<empty_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "16": {
+      "content": "<code_to_intermediate>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "17": {
+      "content": "<intermediate_to_code>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "18": {
+      "content": "<pr>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "19": {
+      "content": "<pr_status>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "20": {
+      "content": "<pr_is_merged>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "21": {
+      "content": "<pr_base>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "22": {
+      "content": "<pr_file>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "23": {
+      "content": "<pr_base_code>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "24": {
+      "content": "<pr_diff>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "25": {
+      "content": "<pr_diff_hunk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "26": {
+      "content": "<pr_comment>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "27": {
+      "content": "<pr_event_id>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "28": {
+      "content": "<pr_review>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "29": {
+      "content": "<pr_review_state>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "30": {
+      "content": "<pr_review_comment>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "31": {
+      "content": "<pr_in_reply_to_review_id>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32": {
+      "content": "<pr_in_reply_to_comment_id>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "33": {
+      "content": "<pr_diff_hunk_comment_line>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "34": {
+      "content": "<NAME>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "35": {
+      "content": "<EMAIL>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "36": {
+      "content": "<KEY>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "37": {
+      "content": "<PASSWORD>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|endoftext|>",
+    "<fim_prefix>",
+    "<fim_middle>",
+    "<fim_suffix>",
+    "<fim_pad>",
+    "<repo_name>",
+    "<file_sep>",
+    "<issue_start>",
+    "<issue_comment>",
+    "<issue_closed>",
+    "<jupyter_start>",
+    "<jupyter_text>",
+    "<jupyter_code>",
+    "<jupyter_output>",
+    "<jupyter_script>",
+    "<empty_output>",
+    "<code_to_intermediate>",
+    "<intermediate_to_code>",
+    "<pr>",
+    "<pr_status>",
+    "<pr_is_merged>",
+    "<pr_base>",
+    "<pr_file>",
+    "<pr_base_code>",
+    "<pr_diff>",
+    "<pr_diff_hunk>",
+    "<pr_comment>",
+    "<pr_event_id>",
+    "<pr_review>",
+    "<pr_review_state>",
+    "<pr_review_comment>",
+    "<pr_in_reply_to_review_id>",
+    "<pr_in_reply_to_comment_id>",
+    "<pr_diff_hunk_comment_line>",
+    "<NAME>",
+    "<EMAIL>",
+    "<KEY>",
+    "<PASSWORD>"
+  ],
+  "bos_token": "<|endoftext|>",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|endoftext|>",
+  "extra_special_tokens": {},
+  "model_max_length": 1000000000000000019884624838656,
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>",
+  "vocab_size": 49152
+}
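
The `model_max_length` value above is not a usable context length. It equals `int(1e30)` in Python, which appears to be the sentinel that the `transformers` library writes when no maximum length is configured (an assumption about the library, not something this commit states). A short sketch of detecting it and substituting an explicit limit follows; `effective_max_length` and the fallback of 4096 are hypothetical and should be replaced with the limit actually intended for training or inference.

```python
# Value copied from tokenizer_config.json above.
MODEL_MAX_LENGTH = 1000000000000000019884624838656

# ASSUMPTION: "no limit configured" is encoded as int(1e30).
SENTINEL = int(1e30)


def effective_max_length(model_max_length, fallback):
    """Return the configured limit, or `fallback` when only the sentinel is set."""
    return fallback if model_max_length >= SENTINEL else model_max_length


# 4096 is a hypothetical fallback, not a value taken from this checkpoint.
max_len = effective_max_length(MODEL_MAX_LENGTH, fallback=4096)
print(MODEL_MAX_LENGTH == SENTINEL, max_len)
```

Passing an explicit `max_length` when tokenizing avoids relying on this placeholder for truncation.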

results/checkpoint-1200/trainer_state.json ADDED

	@@ -0,0 +1,1114 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 2.4242424242424243,
+  "eval_steps": 500,
+  "global_step": 1200,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.020202020202020204,
+      "grad_norm": 0.7669611573219299,
+      "learning_rate": 8.000000000000001e-06,
+      "loss": 3.4353,
+      "mean_token_accuracy": 0.4085813149809837,
+      "num_tokens": 19518.0,
+      "step": 10
+    },
+    {
+      "epoch": 0.04040404040404041,
+      "grad_norm": 1.0622327327728271,
+      "learning_rate": 1.7000000000000003e-05,
+      "loss": 2.917,
+      "mean_token_accuracy": 0.47086485363543035,
+      "num_tokens": 39828.0,
+      "step": 20
+    },
+    {
+      "epoch": 0.06060606060606061,
+      "grad_norm": 0.7585554718971252,
+      "learning_rate": 2.6000000000000002e-05,
+      "loss": 3.4086,
+      "mean_token_accuracy": 0.4174537444487214,
+      "num_tokens": 60025.0,
+      "step": 30
+    },
+    {
+      "epoch": 0.08080808080808081,
+      "grad_norm": 0.8688523173332214,
+      "learning_rate": 3.6e-05,
+      "loss": 3.0871,
+      "mean_token_accuracy": 0.44842766746878626,
+      "num_tokens": 79198.0,
+      "step": 40
+    },
+    {
+      "epoch": 0.10101010101010101,
+      "grad_norm": 1.1718096733093262,
+      "learning_rate": 4.600000000000001e-05,
+      "loss": 2.8734,
+      "mean_token_accuracy": 0.47476900182664394,
+      "num_tokens": 99513.0,
+      "step": 50
+    },
+    {
+      "epoch": 0.12121212121212122,
+      "grad_norm": 4.563867092132568,
+      "learning_rate": 5.500000000000001e-05,
+      "loss": 3.3615,
+      "mean_token_accuracy": 0.4317817037925124,
+      "num_tokens": 117370.0,
+      "step": 60
+    },
+    {
+      "epoch": 0.1414141414141414,
+      "grad_norm": 1.2560386657714844,
+      "learning_rate": 6.500000000000001e-05,
+      "loss": 3.283,
+      "mean_token_accuracy": 0.4325120337307453,
+      "num_tokens": 135492.0,
+      "step": 70
+    },
+    {
+      "epoch": 0.16161616161616163,
+      "grad_norm": 0.9355543255805969,
+      "learning_rate": 7.500000000000001e-05,
+      "loss": 2.6181,
+      "mean_token_accuracy": 0.5057813063263893,
+      "num_tokens": 155520.0,
+      "step": 80
+    },
+    {
+      "epoch": 0.18181818181818182,
+      "grad_norm": 3.2217044830322266,
+      "learning_rate": 8.5e-05,
+      "loss": 2.7865,
+      "mean_token_accuracy": 0.4679586015641689,
+      "num_tokens": 175768.0,
+      "step": 90
+    },
+    {
+      "epoch": 0.20202020202020202,
+      "grad_norm": 3.879002809524536,
+      "learning_rate": 9.5e-05,
+      "loss": 2.5889,
+      "mean_token_accuracy": 0.4929826859384775,
+      "num_tokens": 194625.0,
+      "step": 100
+    },
+    {
+      "epoch": 0.2222222222222222,
+      "grad_norm": 4.434224605560303,
+      "learning_rate": 9.96389891696751e-05,
+      "loss": 2.8938,
+      "mean_token_accuracy": 0.4700875423848629,
+      "num_tokens": 213171.0,
+      "step": 110
+    },
+    {
+      "epoch": 0.24242424242424243,
+      "grad_norm": 9.846081733703613,
+      "learning_rate": 9.891696750902527e-05,
+      "loss": 2.343,
+      "mean_token_accuracy": 0.5364672098308801,
+      "num_tokens": 233264.0,
+      "step": 120
+    },
+    {
+      "epoch": 0.26262626262626265,
+      "grad_norm": 1.6914633512496948,
+      "learning_rate": 9.819494584837545e-05,
+      "loss": 2.0776,
+      "mean_token_accuracy": 0.5672583125531674,
+      "num_tokens": 253987.0,
+      "step": 130
+    },
+    {
+      "epoch": 0.2828282828282828,
+      "grad_norm": 2.6192626953125,
+      "learning_rate": 9.747292418772563e-05,
+      "loss": 2.3453,
+      "mean_token_accuracy": 0.5441015616059304,
+      "num_tokens": 270290.0,
+      "step": 140
+    },
+    {
+      "epoch": 0.30303030303030304,
+      "grad_norm": 1.5915228128433228,
+      "learning_rate": 9.675090252707581e-05,
+      "loss": 2.3179,
+      "mean_token_accuracy": 0.5427416026592254,
+      "num_tokens": 287937.0,
+      "step": 150
+    },
+    {
+      "epoch": 0.32323232323232326,
+      "grad_norm": 3.6255054473876953,
+      "learning_rate": 9.6028880866426e-05,
+      "loss": 2.0695,
+      "mean_token_accuracy": 0.5739563502371311,
+      "num_tokens": 307751.0,
+      "step": 160
+    },
+    {
+      "epoch": 0.3434343434343434,
+      "grad_norm": 1.644443392753601,
+      "learning_rate": 9.530685920577617e-05,
+      "loss": 2.0005,
+      "mean_token_accuracy": 0.5894926242530346,
+      "num_tokens": 329149.0,
+      "step": 170
+    },
+    {
+      "epoch": 0.36363636363636365,
+      "grad_norm": 3.0595431327819824,
+      "learning_rate": 9.458483754512635e-05,
+      "loss": 2.1171,
+      "mean_token_accuracy": 0.563767921924591,
+      "num_tokens": 346950.0,
+      "step": 180
+    },
+    {
+      "epoch": 0.3838383838383838,
+      "grad_norm": 4.366697311401367,
+      "learning_rate": 9.386281588447655e-05,
+      "loss": 1.8502,
+      "mean_token_accuracy": 0.6056702233850956,
+      "num_tokens": 365017.0,
+      "step": 190
+    },
+    {
+      "epoch": 0.40404040404040403,
+      "grad_norm": 2.07828950881958,
+      "learning_rate": 9.314079422382673e-05,
+      "loss": 1.7173,
+      "mean_token_accuracy": 0.621714337170124,
+      "num_tokens": 385734.0,
+      "step": 200
+    },
+    {
+      "epoch": 0.42424242424242425,
+      "grad_norm": 2.536418914794922,
+      "learning_rate": 9.24187725631769e-05,
+      "loss": 1.8389,
+      "mean_token_accuracy": 0.6277161747217178,
+      "num_tokens": 403192.0,
+      "step": 210
+    },
+    {
+      "epoch": 0.4444444444444444,
+      "grad_norm": 1.2784960269927979,
+      "learning_rate": 9.169675090252709e-05,
+      "loss": 1.8463,
+      "mean_token_accuracy": 0.614106347411871,
+      "num_tokens": 423909.0,
+      "step": 220
+    },
+    {
+      "epoch": 0.46464646464646464,
+      "grad_norm": 2.1213629245758057,
+      "learning_rate": 9.097472924187727e-05,
+      "loss": 1.9916,
+      "mean_token_accuracy": 0.5884236626327037,
+      "num_tokens": 440385.0,
+      "step": 230
+    },
+    {
+      "epoch": 0.48484848484848486,
+      "grad_norm": 2.149017810821533,
+      "learning_rate": 9.025270758122743e-05,
+      "loss": 1.8883,
+      "mean_token_accuracy": 0.5964126840233803,
+      "num_tokens": 458254.0,
+      "step": 240
+    },
+    {
+      "epoch": 0.5050505050505051,
+      "grad_norm": 2.0171642303466797,
+      "learning_rate": 8.953068592057761e-05,
+      "loss": 2.0051,
+      "mean_token_accuracy": 0.5975183926522731,
+      "num_tokens": 473348.0,
+      "step": 250
+    },
+    {
+      "epoch": 0.5252525252525253,
+      "grad_norm": 2.7957370281219482,
+      "learning_rate": 8.88086642599278e-05,
+      "loss": 1.8217,
+      "mean_token_accuracy": 0.6270358674228191,
+      "num_tokens": 494498.0,
+      "step": 260
+    },
+    {
+      "epoch": 0.5454545454545454,
+      "grad_norm": 1.990042805671692,
+      "learning_rate": 8.808664259927798e-05,
+      "loss": 1.9135,
+      "mean_token_accuracy": 0.6138852916657924,
+      "num_tokens": 513100.0,
+      "step": 270
+    },
+    {
+      "epoch": 0.5656565656565656,
+      "grad_norm": 2.3455405235290527,
+      "learning_rate": 8.736462093862816e-05,
+      "loss": 1.73,
+      "mean_token_accuracy": 0.6234532974660396,
+      "num_tokens": 532747.0,
+      "step": 280
+    },
+    {
+      "epoch": 0.5858585858585859,
+      "grad_norm": 6.667909145355225,
+      "learning_rate": 8.664259927797834e-05,
+      "loss": 1.7277,
+      "mean_token_accuracy": 0.6382385298609734,
+      "num_tokens": 548769.0,
+      "step": 290
+    },
+    {
+      "epoch": 0.6060606060606061,
+      "grad_norm": 1.917138695716858,
+      "learning_rate": 8.592057761732852e-05,
+      "loss": 1.5142,
+      "mean_token_accuracy": 0.6500309258699417,
+      "num_tokens": 567923.0,
+      "step": 300
+    },
+    {
+      "epoch": 0.6262626262626263,
+      "grad_norm": 2.0420806407928467,
+      "learning_rate": 8.51985559566787e-05,
+      "loss": 1.7889,
+      "mean_token_accuracy": 0.6363476559519767,
+      "num_tokens": 585783.0,
+      "step": 310
+    },
+    {
+      "epoch": 0.6464646464646465,
+      "grad_norm": 2.097153425216675,
+      "learning_rate": 8.447653429602888e-05,
+      "loss": 1.8036,
+      "mean_token_accuracy": 0.6113098107278347,
+      "num_tokens": 603216.0,
+      "step": 320
+    },
+    {
+      "epoch": 0.6666666666666666,
+      "grad_norm": 1.5260653495788574,
+      "learning_rate": 8.375451263537906e-05,
+      "loss": 1.6468,
+      "mean_token_accuracy": 0.6486528031527996,
+      "num_tokens": 624173.0,
+      "step": 330
+    },
+    {
+      "epoch": 0.6868686868686869,
+      "grad_norm": 1.6897279024124146,
+      "learning_rate": 8.303249097472924e-05,
+      "loss": 1.6672,
+      "mean_token_accuracy": 0.6469507545232773,
+      "num_tokens": 644656.0,
+      "step": 340
+    },
+    {
+      "epoch": 0.7070707070707071,
+      "grad_norm": 3.271334648132324,
+      "learning_rate": 8.231046931407944e-05,
+      "loss": 1.7365,
+      "mean_token_accuracy": 0.6231018535792827,
+      "num_tokens": 664866.0,
+      "step": 350
+    },
+    {
+      "epoch": 0.7272727272727273,
+      "grad_norm": 2.4320480823516846,
+      "learning_rate": 8.158844765342962e-05,
+      "loss": 1.7142,
+      "mean_token_accuracy": 0.6582800924777985,
+      "num_tokens": 683588.0,
+      "step": 360
+    },
+    {
+      "epoch": 0.7474747474747475,
+      "grad_norm": 1.7879201173782349,
+      "learning_rate": 8.086642599277978e-05,
+      "loss": 1.7034,
+      "mean_token_accuracy": 0.6335549138486385,
+      "num_tokens": 701111.0,
+      "step": 370
+    },
+    {
+      "epoch": 0.7676767676767676,
+      "grad_norm": 2.026250123977661,
+      "learning_rate": 8.014440433212996e-05,
+      "loss": 1.7315,
+      "mean_token_accuracy": 0.647477601468563,
+      "num_tokens": 719347.0,
+      "step": 380
+    },
+    {
+      "epoch": 0.7878787878787878,
+      "grad_norm": 1.7138152122497559,
+      "learning_rate": 7.942238267148014e-05,
+      "loss": 1.612,
+      "mean_token_accuracy": 0.6578697174787521,
+      "num_tokens": 736038.0,
+      "step": 390
+    },
+    {
+      "epoch": 0.8080808080808081,
+      "grad_norm": 1.5255950689315796,
+      "learning_rate": 7.870036101083032e-05,
+      "loss": 1.8457,
+      "mean_token_accuracy": 0.6219270460307598,
+      "num_tokens": 754840.0,
+      "step": 400
+    },
+    {
+      "epoch": 0.8282828282828283,
+      "grad_norm": 3.739635705947876,
+      "learning_rate": 7.79783393501805e-05,
+      "loss": 1.7356,
+      "mean_token_accuracy": 0.6468625396490097,
+      "num_tokens": 769781.0,
+      "step": 410
+    },
+    {
+      "epoch": 0.8484848484848485,
+      "grad_norm": 1.507598638534546,
+      "learning_rate": 7.72563176895307e-05,
+      "loss": 1.692,
+      "mean_token_accuracy": 0.6468491986393928,
+      "num_tokens": 788586.0,
+      "step": 420
+    },
+    {
+      "epoch": 0.8686868686868687,
+      "grad_norm": 1.7837804555892944,
+      "learning_rate": 7.653429602888087e-05,
+      "loss": 1.5843,
+      "mean_token_accuracy": 0.6515591643750668,
+      "num_tokens": 808940.0,
+      "step": 430
+    },
+    {
+      "epoch": 0.8888888888888888,
+      "grad_norm": 1.6429297924041748,
+      "learning_rate": 7.581227436823105e-05,
+      "loss": 1.7314,
+      "mean_token_accuracy": 0.6319857247173786,
+      "num_tokens": 828022.0,
+      "step": 440
+    },
+    {
+      "epoch": 0.9090909090909091,
+      "grad_norm": 2.7530970573425293,
+      "learning_rate": 7.509025270758123e-05,
+      "loss": 1.7059,
+      "mean_token_accuracy": 0.6434222847223282,
+      "num_tokens": 845577.0,
+      "step": 450
+    },
+    {
+      "epoch": 0.9292929292929293,
+      "grad_norm": 1.5740615129470825,
+      "learning_rate": 7.436823104693141e-05,
+      "loss": 1.7016,
+      "mean_token_accuracy": 0.6465534403920173,
+      "num_tokens": 866655.0,
+      "step": 460
+    },
+    {
+      "epoch": 0.9494949494949495,
+      "grad_norm": 1.735592246055603,
+      "learning_rate": 7.36462093862816e-05,
+      "loss": 1.7066,
+      "mean_token_accuracy": 0.6451319254934788,
+      "num_tokens": 884148.0,
+      "step": 470
+    },
+    {
+      "epoch": 0.9696969696969697,
+      "grad_norm": 2.2288308143615723,
+      "learning_rate": 7.292418772563177e-05,
+      "loss": 1.5397,
+      "mean_token_accuracy": 0.657177159935236,
+      "num_tokens": 905387.0,
+      "step": 480
+    },
+    {
+      "epoch": 0.98989898989899,
+      "grad_norm": 2.363151788711548,
+      "learning_rate": 7.220216606498195e-05,
+      "loss": 1.919,
+      "mean_token_accuracy": 0.632861833833158,
+      "num_tokens": 925073.0,
+      "step": 490
+    },
+    {
+      "epoch": 1.0101010101010102,
+      "grad_norm": 2.896883487701416,
+      "learning_rate": 7.148014440433213e-05,
+      "loss": 1.7299,
+      "mean_token_accuracy": 0.6438414633274079,
+      "num_tokens": 941834.0,
+      "step": 500
+    },
+    {
+      "epoch": 1.0303030303030303,
+      "grad_norm": 5.034731388092041,
+      "learning_rate": 7.075812274368231e-05,
+      "loss": 1.6831,
+      "mean_token_accuracy": 0.6518400736153126,
+      "num_tokens": 958017.0,
+      "step": 510
+    },
+    {
+      "epoch": 1.0505050505050506,
+      "grad_norm": 1.8448883295059204,
+      "learning_rate": 7.003610108303249e-05,
+      "loss": 1.5903,
+      "mean_token_accuracy": 0.656456682831049,
+      "num_tokens": 974729.0,
+      "step": 520
+    },
+    {
+      "epoch": 1.0707070707070707,
+      "grad_norm": 1.8980131149291992,
+      "learning_rate": 6.931407942238267e-05,
+      "loss": 1.5521,
+      "mean_token_accuracy": 0.6531489036977292,
+      "num_tokens": 995648.0,
+      "step": 530
+    },
+    {
+      "epoch": 1.0909090909090908,
+      "grad_norm": 11.001644134521484,
+      "learning_rate": 6.859205776173285e-05,
+      "loss": 1.6765,
+      "mean_token_accuracy": 0.6484075963497162,
+      "num_tokens": 1013028.0,
+      "step": 540
+    },
+    {
+      "epoch": 1.1111111111111112,
+      "grad_norm": 2.1369686126708984,
+      "learning_rate": 6.787003610108303e-05,
+      "loss": 1.6332,
+      "mean_token_accuracy": 0.6697568111121655,
+      "num_tokens": 1035666.0,
+      "step": 550
+    },
+    {
+      "epoch": 1.1313131313131313,
+      "grad_norm": 1.4799697399139404,
+      "learning_rate": 6.714801444043321e-05,
+      "loss": 1.7022,
+      "mean_token_accuracy": 0.6447197504341602,
+      "num_tokens": 1055111.0,
+      "step": 560
+    },
+    {
+      "epoch": 1.1515151515151516,
+      "grad_norm": 2.329430341720581,
+      "learning_rate": 6.642599277978339e-05,
+      "loss": 1.7747,
+      "mean_token_accuracy": 0.6284119591116906,
+      "num_tokens": 1073114.0,
+      "step": 570
+    },
+    {
+      "epoch": 1.1717171717171717,
+      "grad_norm": 3.0006322860717773,
+      "learning_rate": 6.570397111913357e-05,
+      "loss": 1.6484,
+      "mean_token_accuracy": 0.6459825620055198,
+      "num_tokens": 1089325.0,
+      "step": 580
+    },
+    {
+      "epoch": 1.1919191919191918,
+      "grad_norm": 8.296801567077637,
+      "learning_rate": 6.498194945848377e-05,
+      "loss": 1.6361,
+      "mean_token_accuracy": 0.6575549930334091,
+      "num_tokens": 1105923.0,
+      "step": 590
+    },
+    {
+      "epoch": 1.2121212121212122,
+      "grad_norm": 2.0805375576019287,
+      "learning_rate": 6.425992779783394e-05,
+      "loss": 1.4366,
+      "mean_token_accuracy": 0.6729512564837933,
+      "num_tokens": 1127328.0,
+      "step": 600
+    },
+    {
+      "epoch": 1.2323232323232323,
+      "grad_norm": 2.0608692169189453,
+      "learning_rate": 6.353790613718412e-05,
+      "loss": 1.5935,
+      "mean_token_accuracy": 0.6634075284004212,
+      "num_tokens": 1147181.0,
+      "step": 610
+    },
+    {
+      "epoch": 1.2525252525252526,
+      "grad_norm": 3.865906238555908,
+      "learning_rate": 6.28158844765343e-05,
+      "loss": 1.5445,
+      "mean_token_accuracy": 0.6648930206894874,
+      "num_tokens": 1164753.0,
+      "step": 620
+    },
+    {
+      "epoch": 1.2727272727272727,
+      "grad_norm": 1.8212089538574219,
+      "learning_rate": 6.209386281588448e-05,
+      "loss": 1.6492,
+      "mean_token_accuracy": 0.6418032497167587,
+      "num_tokens": 1184594.0,
+      "step": 630
+    },
+    {
+      "epoch": 1.2929292929292928,
+      "grad_norm": 3.3243095874786377,
+      "learning_rate": 6.137184115523465e-05,
+      "loss": 1.5253,
+      "mean_token_accuracy": 0.669656652957201,
+      "num_tokens": 1206129.0,
+      "step": 640
+    },
+    {
+      "epoch": 1.3131313131313131,
+      "grad_norm": 1.6167833805084229,
+      "learning_rate": 6.064981949458484e-05,
+      "loss": 1.5478,
+      "mean_token_accuracy": 0.6591526836156845,
+      "num_tokens": 1226012.0,
+      "step": 650
+    },
+    {
+      "epoch": 1.3333333333333333,
+      "grad_norm": 3.81766676902771,
+      "learning_rate": 5.992779783393502e-05,
+      "loss": 1.788,
+      "mean_token_accuracy": 0.6285306230187416,
+      "num_tokens": 1242162.0,
+      "step": 660
+    },
+    {
+      "epoch": 1.3535353535353536,
+      "grad_norm": 1.2418630123138428,
+      "learning_rate": 5.9205776173285197e-05,
+      "loss": 1.498,
+      "mean_token_accuracy": 0.6632598295807839,
+      "num_tokens": 1265769.0,
+      "step": 670
+    },
+    {
+      "epoch": 1.3737373737373737,
+      "grad_norm": 5.77175235748291,
+      "learning_rate": 5.848375451263538e-05,
+      "loss": 1.5168,
+      "mean_token_accuracy": 0.668974144756794,
+      "num_tokens": 1284762.0,
+      "step": 680
+    },
+    {
+      "epoch": 1.393939393939394,
+      "grad_norm": 2.184446334838867,
+      "learning_rate": 5.776173285198556e-05,
+      "loss": 1.5881,
+      "mean_token_accuracy": 0.6551995210349559,
+      "num_tokens": 1303301.0,
+      "step": 690
+    },
+    {
+      "epoch": 1.4141414141414141,
+      "grad_norm": 1.2407817840576172,
+      "learning_rate": 5.703971119133574e-05,
+      "loss": 1.5,
+      "mean_token_accuracy": 0.6752019837498665,
+      "num_tokens": 1325905.0,
+      "step": 700
+    },
+    {
+      "epoch": 1.4343434343434343,
+      "grad_norm": 1.709302544593811,
+      "learning_rate": 5.631768953068592e-05,
+      "loss": 1.3928,
+      "mean_token_accuracy": 0.6914731428027153,
+      "num_tokens": 1345901.0,
+      "step": 710
+    },
+    {
+      "epoch": 1.4545454545454546,
+      "grad_norm": 1.451839566230774,
+      "learning_rate": 5.55956678700361e-05,
+      "loss": 1.7266,
+      "mean_token_accuracy": 0.6524573139846325,
+      "num_tokens": 1362788.0,
+      "step": 720
+    },
+    {
+      "epoch": 1.4747474747474747,
+      "grad_norm": 3.0613152980804443,
+      "learning_rate": 5.487364620938629e-05,
+      "loss": 1.5518,
+      "mean_token_accuracy": 0.669068893790245,
+      "num_tokens": 1379456.0,
+      "step": 730
+    },
+    {
+      "epoch": 1.494949494949495,
+      "grad_norm": 1.5313241481781006,
+      "learning_rate": 5.415162454873647e-05,
+      "loss": 1.4793,
+      "mean_token_accuracy": 0.6733302772045135,
+      "num_tokens": 1398659.0,
+      "step": 740
+    },
+    {
+      "epoch": 1.5151515151515151,
+      "grad_norm": 1.9046810865402222,
+      "learning_rate": 5.342960288808665e-05,
+      "loss": 1.441,
+      "mean_token_accuracy": 0.681334413588047,
+      "num_tokens": 1416828.0,
+      "step": 750
+    },
+    {
+      "epoch": 1.5353535353535355,
+      "grad_norm": 1.984887719154358,
+      "learning_rate": 5.270758122743683e-05,
+      "loss": 1.6379,
+      "mean_token_accuracy": 0.6509823858737945,
+      "num_tokens": 1431285.0,
+      "step": 760
+    },
+    {
+      "epoch": 1.5555555555555556,
+      "grad_norm": 1.1224578619003296,
+      "learning_rate": 5.1985559566787e-05,
+      "loss": 1.6412,
+      "mean_token_accuracy": 0.6585724964737892,
+      "num_tokens": 1451394.0,
+      "step": 770
+    },
+    {
+      "epoch": 1.5757575757575757,
+      "grad_norm": 1.988461971282959,
+      "learning_rate": 5.126353790613718e-05,
+      "loss": 1.7935,
+      "mean_token_accuracy": 0.6379878364503384,
+      "num_tokens": 1471734.0,
+      "step": 780
+    },
+    {
+      "epoch": 1.595959595959596,
+      "grad_norm": 1.495737075805664,
+      "learning_rate": 5.054151624548736e-05,
+      "loss": 1.5828,
+      "mean_token_accuracy": 0.6762645319104195,
+      "num_tokens": 1489257.0,
+      "step": 790
+    },
+    {
+      "epoch": 1.6161616161616161,
+      "grad_norm": 8.480497360229492,
+      "learning_rate": 4.981949458483755e-05,
+      "loss": 1.8259,
+      "mean_token_accuracy": 0.6398707143962383,
+      "num_tokens": 1506944.0,
+      "step": 800
+    },
+    {
+      "epoch": 1.6363636363636362,
+      "grad_norm": 3.5872299671173096,
+      "learning_rate": 4.909747292418773e-05,
+      "loss": 1.659,
+      "mean_token_accuracy": 0.6536437503993511,
+      "num_tokens": 1522614.0,
+      "step": 810
+    },
+    {
+      "epoch": 1.6565656565656566,
+      "grad_norm": 1.6361726522445679,
+      "learning_rate": 4.837545126353791e-05,
+      "loss": 1.6725,
+      "mean_token_accuracy": 0.6559996947646141,
+      "num_tokens": 1543689.0,
+      "step": 820
+    },
+    {
+      "epoch": 1.676767676767677,
+      "grad_norm": 2.0231411457061768,
+      "learning_rate": 4.765342960288809e-05,
+      "loss": 1.5075,
+      "mean_token_accuracy": 0.6640274345874786,
+      "num_tokens": 1563909.0,
+      "step": 830
+    },
+    {
+      "epoch": 1.696969696969697,
+      "grad_norm": 2.8920161724090576,
+      "learning_rate": 4.693140794223827e-05,
+      "loss": 1.7398,
+      "mean_token_accuracy": 0.6467047482728958,
+      "num_tokens": 1581501.0,
+      "step": 840
+    },
+    {
+      "epoch": 1.7171717171717171,
+      "grad_norm": 1.7013530731201172,
+      "learning_rate": 4.620938628158845e-05,
+      "loss": 1.5343,
+      "mean_token_accuracy": 0.6554797604680062,
+      "num_tokens": 1602745.0,
+      "step": 850
+    },
+    {
+      "epoch": 1.7373737373737375,
+      "grad_norm": 1.5854769945144653,
+      "learning_rate": 4.548736462093863e-05,
+      "loss": 1.5482,
+      "mean_token_accuracy": 0.6624557688832283,
+      "num_tokens": 1622681.0,
+      "step": 860
+    },
+    {
+      "epoch": 1.7575757575757576,
+      "grad_norm": 1.8224149942398071,
+      "learning_rate": 4.4765342960288806e-05,
+      "loss": 1.5386,
+      "mean_token_accuracy": 0.6684516966342926,
+      "num_tokens": 1640007.0,
+      "step": 870
+    },
+    {
+      "epoch": 1.7777777777777777,
+      "grad_norm": 3.453603744506836,
+      "learning_rate": 4.404332129963899e-05,
+      "loss": 1.517,
+      "mean_token_accuracy": 0.6810053952038289,
+      "num_tokens": 1662564.0,
+      "step": 880
+    },
+    {
+      "epoch": 1.797979797979798,
+      "grad_norm": 1.8291434049606323,
+      "learning_rate": 4.332129963898917e-05,
+      "loss": 1.4867,
+      "mean_token_accuracy": 0.6807132661342621,
+      "num_tokens": 1682205.0,
+      "step": 890
+    },
+    {
+      "epoch": 1.8181818181818183,
+      "grad_norm": 3.217017889022827,
+      "learning_rate": 4.259927797833935e-05,
+      "loss": 1.5669,
+      "mean_token_accuracy": 0.6671051770448685,
+      "num_tokens": 1697359.0,
+      "step": 900
+    },
+    {
+      "epoch": 1.8383838383838382,
+      "grad_norm": 1.371291160583496,
+      "learning_rate": 4.187725631768953e-05,
+      "loss": 1.4343,
+      "mean_token_accuracy": 0.6958822838962078,
+      "num_tokens": 1720184.0,
+      "step": 910
+    },
+    {
+      "epoch": 1.8585858585858586,
+      "grad_norm": 2.7192142009735107,
+      "learning_rate": 4.115523465703972e-05,
+      "loss": 1.4134,
+      "mean_token_accuracy": 0.6945954069495202,
+      "num_tokens": 1739446.0,
+      "step": 920
+    },
+    {
+      "epoch": 1.878787878787879,
+      "grad_norm": 2.4172279834747314,
+      "learning_rate": 4.043321299638989e-05,
+      "loss": 1.5238,
+      "mean_token_accuracy": 0.6700037866830826,
+      "num_tokens": 1758629.0,
+      "step": 930
+    },
+    {
+      "epoch": 1.898989898989899,
+      "grad_norm": 1.7151827812194824,
+      "learning_rate": 3.971119133574007e-05,
+      "loss": 1.5609,
+      "mean_token_accuracy": 0.665402963757515,
+      "num_tokens": 1777313.0,
+      "step": 940
+    },
+    {
+      "epoch": 1.9191919191919191,
+      "grad_norm": 2.2101497650146484,
+      "learning_rate": 3.898916967509025e-05,
+      "loss": 1.6266,
+      "mean_token_accuracy": 0.6585289388895035,
+      "num_tokens": 1797829.0,
+      "step": 950
+    },
+    {
+      "epoch": 1.9393939393939394,
+      "grad_norm": 1.5860098600387573,
+      "learning_rate": 3.826714801444044e-05,
+      "loss": 1.5842,
+      "mean_token_accuracy": 0.6568711154162884,
+      "num_tokens": 1819044.0,
+      "step": 960
+    },
+    {
+      "epoch": 1.9595959595959596,
+      "grad_norm": 2.2135324478149414,
+      "learning_rate": 3.754512635379062e-05,
+      "loss": 1.5017,
+      "mean_token_accuracy": 0.6738567680120469,
+      "num_tokens": 1837829.0,
+      "step": 970
+    },
+    {
+      "epoch": 1.9797979797979797,
+      "grad_norm": 1.8832942247390747,
+      "learning_rate": 3.68231046931408e-05,
+      "loss": 1.6386,
+      "mean_token_accuracy": 0.6536656714975834,
+      "num_tokens": 1854112.0,
+      "step": 980
+    },
+    {
+      "epoch": 2.0,
+      "grad_norm": 1.4356534481048584,
+      "learning_rate": 3.610108303249098e-05,
+      "loss": 1.5847,
+      "mean_token_accuracy": 0.661115899682045,
+      "num_tokens": 1869584.0,
+      "step": 990
+    },
+    {
+      "epoch": 2.0202020202020203,
+      "grad_norm": 3.277709484100342,
+      "learning_rate": 3.537906137184116e-05,
+      "loss": 1.5794,
+      "mean_token_accuracy": 0.6566751167178154,
+      "num_tokens": 1885930.0,
+      "step": 1000
+    },
+    {
+      "epoch": 2.04040404040404,
+      "grad_norm": 1.672176718711853,
+      "learning_rate": 3.4657039711191336e-05,
+      "loss": 1.6426,
+      "mean_token_accuracy": 0.669060529768467,
+      "num_tokens": 1908386.0,
+      "step": 1010
+    },
+    {
+      "epoch": 2.0606060606060606,
+      "grad_norm": 1.787185549736023,
+      "learning_rate": 3.3935018050541516e-05,
+      "loss": 1.487,
+      "mean_token_accuracy": 0.6758425906300545,
+      "num_tokens": 1928179.0,
+      "step": 1020
+    },
+    {
+      "epoch": 2.080808080808081,
+      "grad_norm": 1.1577355861663818,
+      "learning_rate": 3.3212996389891696e-05,
+      "loss": 1.4625,
+      "mean_token_accuracy": 0.6700806766748428,
+      "num_tokens": 1947571.0,
+      "step": 1030
+    },
+    {
+      "epoch": 2.101010101010101,
+      "grad_norm": 2.881878137588501,
+      "learning_rate": 3.249097472924188e-05,
+      "loss": 1.5191,
+      "mean_token_accuracy": 0.6762366116046905,
+      "num_tokens": 1965861.0,
+      "step": 1040
+    },
+    {
+      "epoch": 2.121212121212121,
+      "grad_norm": 1.5470958948135376,
+      "learning_rate": 3.176895306859206e-05,
+      "loss": 1.5277,
+      "mean_token_accuracy": 0.6653557240962982,
+      "num_tokens": 1987291.0,
+      "step": 1050
+    },
+    {
+      "epoch": 2.1414141414141414,
+      "grad_norm": 1.8662647008895874,
+      "learning_rate": 3.104693140794224e-05,
+      "loss": 1.4983,
+      "mean_token_accuracy": 0.6764706581830978,
+      "num_tokens": 2003260.0,
+      "step": 1060
+    },
+    {
+      "epoch": 2.1616161616161618,
+      "grad_norm": 1.2521296739578247,
+      "learning_rate": 3.032490974729242e-05,
+      "loss": 1.4703,
+      "mean_token_accuracy": 0.6709941066801548,
+      "num_tokens": 2020415.0,
+      "step": 1070
+    },
+    {
+      "epoch": 2.1818181818181817,
+      "grad_norm": 6.714540004730225,
+      "learning_rate": 2.9602888086642598e-05,
+      "loss": 1.6314,
+      "mean_token_accuracy": 0.6627085514366626,
+      "num_tokens": 2037429.0,
+      "step": 1080
+    },
+    {
+      "epoch": 2.202020202020202,
+      "grad_norm": 2.123655080795288,
+      "learning_rate": 2.888086642599278e-05,
+      "loss": 1.5588,
+      "mean_token_accuracy": 0.6700812846422195,
+      "num_tokens": 2054688.0,
+      "step": 1090
+    },
+    {
+      "epoch": 2.2222222222222223,
+      "grad_norm": 2.0840301513671875,
+      "learning_rate": 2.815884476534296e-05,
+      "loss": 1.7006,
+      "mean_token_accuracy": 0.6508332662284374,
+      "num_tokens": 2075865.0,
+      "step": 1100
+    },
+    {
+      "epoch": 2.242424242424242,
+      "grad_norm": 1.9797368049621582,
+      "learning_rate": 2.7436823104693144e-05,
+      "loss": 1.501,
+      "mean_token_accuracy": 0.6622319832444191,
+      "num_tokens": 2093688.0,
+      "step": 1110
+    },
+    {
+      "epoch": 2.2626262626262625,
+      "grad_norm": 2.007617950439453,
+      "learning_rate": 2.6714801444043324e-05,
+      "loss": 1.487,
+      "mean_token_accuracy": 0.6778388306498527,
+      "num_tokens": 2113155.0,
+      "step": 1120
+    },
+    {
+      "epoch": 2.282828282828283,
+      "grad_norm": 1.2606422901153564,
+      "learning_rate": 2.59927797833935e-05,
+      "loss": 1.4389,
+      "mean_token_accuracy": 0.6810906417667866,
+      "num_tokens": 2131996.0,
+      "step": 1130
+    },
+    {
+      "epoch": 2.303030303030303,
+      "grad_norm": 1.655875563621521,
+      "learning_rate": 2.527075812274368e-05,
+      "loss": 1.5242,
+      "mean_token_accuracy": 0.6587833181023598,
+      "num_tokens": 2151671.0,
+      "step": 1140
+    },
+    {
+      "epoch": 2.323232323232323,
+      "grad_norm": 1.516184687614441,
+      "learning_rate": 2.4548736462093864e-05,
+      "loss": 1.4613,
+      "mean_token_accuracy": 0.6847339481115341,
+      "num_tokens": 2173364.0,
+      "step": 1150
+    },
+    {
+      "epoch": 2.3434343434343434,
+      "grad_norm": 1.842247486114502,
+      "learning_rate": 2.3826714801444043e-05,
+      "loss": 1.5037,
+      "mean_token_accuracy": 0.6658960357308388,
+      "num_tokens": 2190588.0,
+      "step": 1160
+    },
+    {
+      "epoch": 2.3636363636363638,
+      "grad_norm": 3.459821939468384,
+      "learning_rate": 2.3104693140794227e-05,
+      "loss": 1.6169,
+      "mean_token_accuracy": 0.6574626617133618,
+      "num_tokens": 2212944.0,
+      "step": 1170
+    },
+    {
+      "epoch": 2.3838383838383836,
+      "grad_norm": 2.880796194076538,
+      "learning_rate": 2.2382671480144403e-05,
+      "loss": 1.4261,
+      "mean_token_accuracy": 0.6781707689166069,
+      "num_tokens": 2230777.0,
+      "step": 1180
+    },
+    {
+      "epoch": 2.404040404040404,
+      "grad_norm": 1.416815996170044,
+      "learning_rate": 2.1660649819494586e-05,
+      "loss": 1.539,
+      "mean_token_accuracy": 0.6707717284560204,
+      "num_tokens": 2248750.0,
+      "step": 1190
+    },
+    {
+      "epoch": 2.4242424242424243,
+      "grad_norm": 1.6914799213409424,
+      "learning_rate": 2.0938628158844766e-05,
+      "loss": 1.4456,
+      "mean_token_accuracy": 0.6809201754629612,
+      "num_tokens": 2266641.0,
+      "step": 1200
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 1485,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 50,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3.918999165635174e+16,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}

results/checkpoint-1200/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2c2e099a6969a2a35f5b0a318e89c5857fca33ddbae202ddebca99dadbbe51de
+size 5560

results/checkpoint-1200/vocab.json ADDED

The diff for this file is too large to render.