Add NER LoRA checkpoint (checkpoints/ner-wikiann-bert-smoke)

by SHA888 - opened Sep 23, 2025

base: refs/heads/main ← from: refs/pr/5


+118726

-0

Files changed (19)

checkpoints/ner-wikiann-bert-smoke/README.md +206 -0
checkpoints/ner-wikiann-bert-smoke/adapter_config.json +40 -0
checkpoints/ner-wikiann-bert-smoke/adapter_model.safetensors +3 -0
checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/README.md +206 -0
checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/adapter_config.json +40 -0
checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/adapter_model.safetensors +3 -0
checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/optimizer.pt +3 -0
checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/rng_state.pth +3 -0
checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/scheduler.pt +3 -0
checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/special_tokens_map.json +7 -0
checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/tokenizer.json +0 -0
checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/tokenizer_config.json +56 -0
checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/trainer_state.json +1796 -0
checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/training_args.bin +3 -0
checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/vocab.txt +0 -0
checkpoints/ner-wikiann-bert-smoke/special_tokens_map.json +7 -0
checkpoints/ner-wikiann-bert-smoke/tokenizer.json +0 -0
checkpoints/ner-wikiann-bert-smoke/tokenizer_config.json +56 -0
checkpoints/ner-wikiann-bert-smoke/vocab.txt +0 -0

checkpoints/ner-wikiann-bert-smoke/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: bert-base-cased
+library_name: peft
+tags:
+- base_model:adapter:bert-base-cased
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1

checkpoints/ner-wikiann-bert-smoke/adapter_config.json ADDED

	@@ -0,0 +1,40 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "bert-base-cased",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "classifier",
+    "score"
+  ],
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "value",
+    "query"
+  ],
+  "target_parameters": null,
+  "task_type": "TOKEN_CLS",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

checkpoints/ner-wikiann-bert-smoke/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8bb93371ec4eb323bf453e66bdc4110c3bfa598c0a133faaf661c4e7afcabecc
+size 1208052
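
Reviewer note: as a rough sanity check, the LFS pointer size above can be compared with the trainable parameter count implied by adapter_config.json (`r`, `lora_alpha`, `target_modules`, `modules_to_save`). The sketch below is an estimate under stated assumptions, not something taken from the PR: it assumes the standard bert-base-cased shapes (hidden size 768, 12 encoder layers), float32 storage, and a 7-label token-classification head (the usual WikiANN IOB2 tag set). The label count is not stated anywhere in this diff.

```python
# Back-of-the-envelope size estimate for the LoRA adapter in this PR.
HIDDEN = 768                  # bert-base-cased hidden size
LAYERS = 12                   # bert-base-cased encoder layers
R = 8                         # "r" in adapter_config.json
LORA_ALPHA = 16               # "lora_alpha" in adapter_config.json
TARGETS = ("query", "value")  # "target_modules" in adapter_config.json
NUM_LABELS = 7                # ASSUMPTION: WikiANN IOB2 tags, not stated in the diff
BYTES_PER_PARAM = 4           # ASSUMPTION: weights stored as float32

# With "use_rslora": false, the LoRA update is scaled by lora_alpha / r.
lora_scaling = LORA_ALPHA / R

# Each adapted linear layer gets A (r x in_features) and B (out_features x r).
params_per_module = R * HIDDEN + HIDDEN * R
lora_params = params_per_module * len(TARGETS) * LAYERS

# "modules_to_save" stores a full copy of the classifier head (weight + bias).
classifier_params = HIDDEN * NUM_LABELS + NUM_LABELS

total_params = lora_params + classifier_params
approx_bytes = total_params * BYTES_PER_PARAM

print(f"LoRA scaling:      {lora_scaling}")
print(f"LoRA params:       {lora_params}")
print(f"classifier params: {classifier_params}")
print(f"estimated bytes:   {approx_bytes} (pointer reports 1208052)")
```

Under these assumptions the estimate comes to 1,201,180 bytes, about 6.9 KB below the reported 1,208,052. That gap is plausibly the safetensors header, though this has not been checked against the actual file.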

checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: bert-base-cased
+library_name: peft
+tags:
+- base_model:adapter:bert-base-cased
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1

checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/adapter_config.json ADDED

	@@ -0,0 +1,40 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "bert-base-cased",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "classifier",
+    "score"
+  ],
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "value",
+    "query"
+  ],
+  "target_parameters": null,
+  "task_type": "TOKEN_CLS",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8bb93371ec4eb323bf453e66bdc4110c3bfa598c0a133faaf661c4e7afcabecc
+size 1208052

checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bee40b7f0ed2dfab88684f1401268be0581871a8769b7770d51dff3bb4af145d
+size 2445771

checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ee49f7c2a7be0f5271687059660052b26a1738f92fff221cd3fe6de0c4bbf6b2
+size 14645

checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7452baf421186c7ecb17aca0bed967e1dda02ffe91de0d6aa55e68a1398920e2
+size 1465
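
Reviewer note: scheduler.pt is only an LFS pointer here, so the schedule cannot be read from this diff. The `learning_rate` values in this checkpoint's trainer_state.json `log_history` do, however, fit a plain linear decay with no warmup. The sketch below checks that fit. The peak rate of 2e-5 is inferred from the logged numbers rather than stated in the PR, and `linear_decay_lr` is a hypothetical helper written for this check, not a function from any library.

```python
import math

PEAK_LR = 2e-5      # ASSUMPTION: inferred from the logged values, not stated in the PR
TOTAL_STEPS = 2500  # "global_step" at "epoch": 1.0 in trainer_state.json


def linear_decay_lr(logged_step: int) -> float:
    """Rate a no-warmup linear decay gives for a logged step.

    The logged values line up with the schedule evaluated at
    logged_step - 1, so that offset is applied here.
    """
    completed = logged_step - 1
    return PEAK_LR * max(0.0, (TOTAL_STEPS - completed) / TOTAL_STEPS)


# (step, learning_rate) pairs copied from log_history.
logged = [
    (10, 1.9928e-05),
    (20, 1.9848e-05),
    (500, 1.6008e-05),
    (1000, 1.2008e-05),
]

# True when every sampled entry agrees with the fitted schedule.
schedule_matches = all(
    math.isclose(linear_decay_lr(step), lr, rel_tol=1e-9) for step, lr in logged
)
print("linear decay matches logged values:", schedule_matches)
```

The sampled entries all match to floating-point precision. That is consistent with a single-epoch, 2500-step run decaying to zero, but the training arguments themselves (training_args.bin) are not visible in this diff.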

checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/special_tokens_map.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "cls_token": "[CLS]",
+  "mask_token": "[MASK]",
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "unk_token": "[UNK]"
+}

checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/tokenizer.json ADDED

The diff for this file is too large to render.

checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/tokenizer_config.json ADDED

	@@ -0,0 +1,56 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "[CLS]",
+  "do_lower_case": false,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}

checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/trainer_state.json ADDED

	@@ -0,0 +1,1796 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 2500,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.004,
+      "grad_norm": 3.243311882019043,
+      "learning_rate": 1.9928e-05,
+      "loss": 1.9443,
+      "step": 10
+    },
+    {
+      "epoch": 0.008,
+      "grad_norm": 3.4340174198150635,
+      "learning_rate": 1.9848e-05,
+      "loss": 1.9473,
+      "step": 20
+    },
+    {
+      "epoch": 0.012,
+      "grad_norm": 3.3173458576202393,
+      "learning_rate": 1.9768e-05,
+      "loss": 1.9152,
+      "step": 30
+    },
+    {
+      "epoch": 0.016,
+      "grad_norm": 5.302436351776123,
+      "learning_rate": 1.9688000000000002e-05,
+      "loss": 1.9221,
+      "step": 40
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 3.962202548980713,
+      "learning_rate": 1.9608000000000003e-05,
+      "loss": 1.8838,
+      "step": 50
+    },
+    {
+      "epoch": 0.024,
+      "grad_norm": 3.301600933074951,
+      "learning_rate": 1.9528000000000003e-05,
+      "loss": 1.8561,
+      "step": 60
+    },
+    {
+      "epoch": 0.028,
+      "grad_norm": 3.8835692405700684,
+      "learning_rate": 1.9448e-05,
+      "loss": 1.8696,
+      "step": 70
+    },
+    {
+      "epoch": 0.032,
+      "grad_norm": 4.735332012176514,
+      "learning_rate": 1.9368e-05,
+      "loss": 1.8405,
+      "step": 80
+    },
+    {
+      "epoch": 0.036,
+      "grad_norm": 3.618642568588257,
+      "learning_rate": 1.9288000000000002e-05,
+      "loss": 1.8151,
+      "step": 90
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 5.779294013977051,
+      "learning_rate": 1.9208000000000003e-05,
+      "loss": 1.8103,
+      "step": 100
+    },
+    {
+      "epoch": 0.044,
+      "grad_norm": 4.251532554626465,
+      "learning_rate": 1.9128e-05,
+      "loss": 1.8145,
+      "step": 110
+    },
+    {
+      "epoch": 0.048,
+      "grad_norm": 4.306429862976074,
+      "learning_rate": 1.9048e-05,
+      "loss": 1.7743,
+      "step": 120
+    },
+    {
+      "epoch": 0.052,
+      "grad_norm": 2.939504623413086,
+      "learning_rate": 1.8968000000000002e-05,
+      "loss": 1.7805,
+      "step": 130
+    },
+    {
+      "epoch": 0.056,
+      "grad_norm": 3.0483181476593018,
+      "learning_rate": 1.8888000000000003e-05,
+      "loss": 1.7268,
+      "step": 140
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 2.478515386581421,
+      "learning_rate": 1.8808e-05,
+      "loss": 1.6992,
+      "step": 150
+    },
+    {
+      "epoch": 0.064,
+      "grad_norm": 3.3856873512268066,
+      "learning_rate": 1.8728e-05,
+      "loss": 1.7258,
+      "step": 160
+    },
+    {
+      "epoch": 0.068,
+      "grad_norm": 2.6437103748321533,
+      "learning_rate": 1.8648000000000002e-05,
+      "loss": 1.6494,
+      "step": 170
+    },
+    {
+      "epoch": 0.072,
+      "grad_norm": 3.571794033050537,
+      "learning_rate": 1.8568000000000002e-05,
+      "loss": 1.6458,
+      "step": 180
+    },
+    {
+      "epoch": 0.076,
+      "grad_norm": 3.8345205783843994,
+      "learning_rate": 1.8488e-05,
+      "loss": 1.6268,
+      "step": 190
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 2.661442995071411,
+      "learning_rate": 1.8408e-05,
+      "loss": 1.5826,
+      "step": 200
+    },
+    {
+      "epoch": 0.084,
+      "grad_norm": 2.9777419567108154,
+      "learning_rate": 1.8328e-05,
+      "loss": 1.563,
+      "step": 210
+    },
+    {
+      "epoch": 0.088,
+      "grad_norm": 3.143165349960327,
+      "learning_rate": 1.8248000000000002e-05,
+      "loss": 1.5037,
+      "step": 220
+    },
+    {
+      "epoch": 0.092,
+      "grad_norm": 3.133349657058716,
+      "learning_rate": 1.8168e-05,
+      "loss": 1.5133,
+      "step": 230
+    },
+    {
+      "epoch": 0.096,
+      "grad_norm": 2.9322919845581055,
+      "learning_rate": 1.8088e-05,
+      "loss": 1.4559,
+      "step": 240
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 3.060082197189331,
+      "learning_rate": 1.8008e-05,
+      "loss": 1.4318,
+      "step": 250
+    },
+    {
+      "epoch": 0.104,
+      "grad_norm": 1.9127253293991089,
+      "learning_rate": 1.7928000000000002e-05,
+      "loss": 1.5185,
+      "step": 260
+    },
+    {
+      "epoch": 0.108,
+      "grad_norm": 2.1275875568389893,
+      "learning_rate": 1.7848e-05,
+      "loss": 1.4826,
+      "step": 270
+    },
+    {
+      "epoch": 0.112,
+      "grad_norm": 2.2487740516662598,
+      "learning_rate": 1.7768e-05,
+      "loss": 1.5274,
+      "step": 280
+    },
+    {
+      "epoch": 0.116,
+      "grad_norm": 2.3608744144439697,
+      "learning_rate": 1.7688e-05,
+      "loss": 1.4483,
+      "step": 290
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 2.2422122955322266,
+      "learning_rate": 1.7608e-05,
+      "loss": 1.4307,
+      "step": 300
+    },
+    {
+      "epoch": 0.124,
+      "grad_norm": 2.0272817611694336,
+      "learning_rate": 1.7528e-05,
+      "loss": 1.4409,
+      "step": 310
+    },
+    {
+      "epoch": 0.128,
+      "grad_norm": 2.469334363937378,
+      "learning_rate": 1.7448e-05,
+      "loss": 1.3973,
+      "step": 320
+    },
+    {
+      "epoch": 0.132,
+      "grad_norm": 2.6341335773468018,
+      "learning_rate": 1.7368e-05,
+      "loss": 1.479,
+      "step": 330
+    },
+    {
+      "epoch": 0.136,
+      "grad_norm": 2.263126850128174,
+      "learning_rate": 1.7288e-05,
+      "loss": 1.361,
+      "step": 340
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 2.4379866123199463,
+      "learning_rate": 1.7208000000000002e-05,
+      "loss": 1.3741,
+      "step": 350
+    },
+    {
+      "epoch": 0.144,
+      "grad_norm": 1.9549463987350464,
+      "learning_rate": 1.7128000000000003e-05,
+      "loss": 1.3458,
+      "step": 360
+    },
+    {
+      "epoch": 0.148,
+      "grad_norm": 1.8918944597244263,
+      "learning_rate": 1.7048000000000003e-05,
+      "loss": 1.3627,
+      "step": 370
+    },
+    {
+      "epoch": 0.152,
+      "grad_norm": 2.4447574615478516,
+      "learning_rate": 1.6968e-05,
+      "loss": 1.4518,
+      "step": 380
+    },
+    {
+      "epoch": 0.156,
+      "grad_norm": 2.7350759506225586,
+      "learning_rate": 1.6888e-05,
+      "loss": 1.2062,
+      "step": 390
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 5.4455156326293945,
+      "learning_rate": 1.6808000000000002e-05,
+      "loss": 1.3981,
+      "step": 400
+    },
+    {
+      "epoch": 0.164,
+      "grad_norm": 2.918653964996338,
+      "learning_rate": 1.6728000000000003e-05,
+      "loss": 1.3089,
+      "step": 410
+    },
+    {
+      "epoch": 0.168,
+      "grad_norm": 2.46914005279541,
+      "learning_rate": 1.6648e-05,
+      "loss": 1.3386,
+      "step": 420
+    },
+    {
+      "epoch": 0.172,
+      "grad_norm": 2.6897878646850586,
+      "learning_rate": 1.6568e-05,
+      "loss": 1.2054,
+      "step": 430
+    },
+    {
+      "epoch": 0.176,
+      "grad_norm": 2.6565287113189697,
+      "learning_rate": 1.6488000000000002e-05,
+      "loss": 1.3508,
+      "step": 440
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 2.7012150287628174,
+      "learning_rate": 1.6408000000000003e-05,
+      "loss": 1.2383,
+      "step": 450
+    },
+    {
+      "epoch": 0.184,
+      "grad_norm": 2.4276187419891357,
+      "learning_rate": 1.6328e-05,
+      "loss": 1.1929,
+      "step": 460
+    },
+    {
+      "epoch": 0.188,
+      "grad_norm": 2.3446202278137207,
+      "learning_rate": 1.6248e-05,
+      "loss": 1.2533,
+      "step": 470
+    },
+    {
+      "epoch": 0.192,
+      "grad_norm": 2.4274702072143555,
+      "learning_rate": 1.6168000000000002e-05,
+      "loss": 1.2686,
+      "step": 480
+    },
+    {
+      "epoch": 0.196,
+      "grad_norm": 2.160151720046997,
+      "learning_rate": 1.6088000000000002e-05,
+      "loss": 1.2466,
+      "step": 490
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 2.388549327850342,
+      "learning_rate": 1.6008e-05,
+      "loss": 1.2247,
+      "step": 500
+    },
+    {
+      "epoch": 0.204,
+      "grad_norm": 3.3736000061035156,
+      "learning_rate": 1.5928e-05,
+      "loss": 1.2758,
+      "step": 510
+    },
+    {
+      "epoch": 0.208,
+      "grad_norm": 1.8944320678710938,
+      "learning_rate": 1.5848e-05,
+      "loss": 1.2294,
+      "step": 520
+    },
+    {
+      "epoch": 0.212,
+      "grad_norm": 2.5767781734466553,
+      "learning_rate": 1.5768000000000002e-05,
+      "loss": 1.3096,
+      "step": 530
+    },
+    {
+      "epoch": 0.216,
+      "grad_norm": 2.6784284114837646,
+      "learning_rate": 1.5688e-05,
+      "loss": 1.2624,
+      "step": 540
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 2.188873052597046,
+      "learning_rate": 1.5608e-05,
+      "loss": 1.2169,
+      "step": 550
+    },
+    {
+      "epoch": 0.224,
+      "grad_norm": 2.460374593734741,
+      "learning_rate": 1.5528e-05,
+      "loss": 1.1571,
+      "step": 560
+    },
+    {
+      "epoch": 0.228,
+      "grad_norm": 2.694652795791626,
+      "learning_rate": 1.5448000000000002e-05,
+      "loss": 1.2368,
+      "step": 570
+    },
+    {
+      "epoch": 0.232,
+      "grad_norm": 2.3777997493743896,
+      "learning_rate": 1.5368e-05,
+      "loss": 1.2236,
+      "step": 580
+    },
+    {
+      "epoch": 0.236,
+      "grad_norm": 2.4224419593811035,
+      "learning_rate": 1.5288e-05,
+      "loss": 1.2562,
+      "step": 590
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 4.032938480377197,
+      "learning_rate": 1.5208e-05,
+      "loss": 1.2291,
+      "step": 600
+    },
+    {
+      "epoch": 0.244,
+      "grad_norm": 2.988703966140747,
+      "learning_rate": 1.5128e-05,
+      "loss": 1.1691,
+      "step": 610
+    },
+    {
+      "epoch": 0.248,
+      "grad_norm": 3.962585687637329,
+      "learning_rate": 1.5048e-05,
+      "loss": 1.2792,
+      "step": 620
+    },
+    {
+      "epoch": 0.252,
+      "grad_norm": 2.3877758979797363,
+      "learning_rate": 1.4968e-05,
+      "loss": 1.1589,
+      "step": 630
+    },
+    {
+      "epoch": 0.256,
+      "grad_norm": 2.211920976638794,
+      "learning_rate": 1.4888e-05,
+      "loss": 1.1132,
+      "step": 640
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 2.198737144470215,
+      "learning_rate": 1.4808e-05,
+      "loss": 1.2004,
+      "step": 650
+    },
+    {
+      "epoch": 0.264,
+      "grad_norm": 2.451795816421509,
+      "learning_rate": 1.4728000000000002e-05,
+      "loss": 1.1163,
+      "step": 660
+    },
+    {
+      "epoch": 0.268,
+      "grad_norm": 2.8584885597229004,
+      "learning_rate": 1.4648000000000003e-05,
+      "loss": 1.1064,
+      "step": 670
+    },
+    {
+      "epoch": 0.272,
+      "grad_norm": 3.7197935581207275,
+      "learning_rate": 1.4568000000000002e-05,
+      "loss": 1.1866,
+      "step": 680
+    },
+    {
+      "epoch": 0.276,
+      "grad_norm": 2.2992563247680664,
+      "learning_rate": 1.4488000000000003e-05,
+      "loss": 1.1086,
+      "step": 690
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 2.9502830505371094,
+      "learning_rate": 1.4408000000000002e-05,
+      "loss": 1.1798,
+      "step": 700
+    },
+    {
+      "epoch": 0.284,
+      "grad_norm": 2.0384209156036377,
+      "learning_rate": 1.4328000000000002e-05,
+      "loss": 1.0636,
+      "step": 710
+    },
+    {
+      "epoch": 0.288,
+      "grad_norm": 1.7637971639633179,
+      "learning_rate": 1.4248000000000001e-05,
+      "loss": 1.1358,
+      "step": 720
+    },
+    {
+      "epoch": 0.292,
+      "grad_norm": 2.0006768703460693,
+      "learning_rate": 1.4168000000000002e-05,
+      "loss": 1.0458,
+      "step": 730
+    },
+    {
+      "epoch": 0.296,
+      "grad_norm": 2.78169322013855,
+      "learning_rate": 1.4088000000000001e-05,
+      "loss": 1.1397,
+      "step": 740
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 2.5473616123199463,
+      "learning_rate": 1.4008000000000002e-05,
+      "loss": 1.0934,
+      "step": 750
+    },
+    {
+      "epoch": 0.304,
+      "grad_norm": 2.606701135635376,
+      "learning_rate": 1.3928000000000001e-05,
+      "loss": 1.0793,
+      "step": 760
+    },
+    {
+      "epoch": 0.308,
+      "grad_norm": 2.461888074874878,
+      "learning_rate": 1.3848000000000002e-05,
+      "loss": 1.0964,
+      "step": 770
+    },
+    {
+      "epoch": 0.312,
+      "grad_norm": 2.24332857131958,
+      "learning_rate": 1.3768000000000001e-05,
+      "loss": 1.0943,
+      "step": 780
+    },
+    {
+      "epoch": 0.316,
+      "grad_norm": 2.6332437992095947,
+      "learning_rate": 1.3688000000000002e-05,
+      "loss": 1.1136,
+      "step": 790
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 2.7040021419525146,
+      "learning_rate": 1.3608e-05,
+      "loss": 1.2109,
+      "step": 800
+    },
+    {
+      "epoch": 0.324,
+      "grad_norm": 3.8601574897766113,
+      "learning_rate": 1.3528000000000002e-05,
+      "loss": 1.2232,
+      "step": 810
+    },
+    {
+      "epoch": 0.328,
+      "grad_norm": 3.08937668800354,
+      "learning_rate": 1.3448e-05,
+      "loss": 1.1661,
+      "step": 820
+    },
+    {
+      "epoch": 0.332,
+      "grad_norm": 2.297661066055298,
+      "learning_rate": 1.3368000000000001e-05,
+      "loss": 0.9877,
+      "step": 830
+    },
+    {
+      "epoch": 0.336,
+      "grad_norm": 2.062130928039551,
+      "learning_rate": 1.3288e-05,
+      "loss": 0.9535,
+      "step": 840
+    },
+    {
+      "epoch": 0.34,
+      "grad_norm": 2.4786460399627686,
+      "learning_rate": 1.3208000000000001e-05,
+      "loss": 1.0308,
+      "step": 850
+    },
+    {
+      "epoch": 0.344,
+      "grad_norm": 2.18310809135437,
+      "learning_rate": 1.3128e-05,
+      "loss": 1.0117,
+      "step": 860
+    },
+    {
+      "epoch": 0.348,
+      "grad_norm": 2.427093029022217,
+      "learning_rate": 1.3048000000000001e-05,
+      "loss": 1.1716,
+      "step": 870
+    },
+    {
+      "epoch": 0.352,
+      "grad_norm": 2.1192243099212646,
+      "learning_rate": 1.2968e-05,
+      "loss": 1.1058,
+      "step": 880
+    },
+    {
+      "epoch": 0.356,
+      "grad_norm": 2.2957637310028076,
+      "learning_rate": 1.2888000000000001e-05,
+      "loss": 0.9935,
+      "step": 890
+    },
+    {
+      "epoch": 0.36,
+      "grad_norm": 2.9623489379882812,
+      "learning_rate": 1.2808e-05,
+      "loss": 0.9785,
+      "step": 900
+    },
+    {
+      "epoch": 0.364,
+      "grad_norm": 1.9601175785064697,
+      "learning_rate": 1.2728e-05,
+      "loss": 0.9503,
+      "step": 910
+    },
+    {
+      "epoch": 0.368,
+      "grad_norm": 2.708333730697632,
+      "learning_rate": 1.2648e-05,
+      "loss": 0.9301,
+      "step": 920
+    },
+    {
+      "epoch": 0.372,
+      "grad_norm": 1.687951683998108,
+      "learning_rate": 1.2568e-05,
+      "loss": 1.044,
+      "step": 930
+    },
+    {
+      "epoch": 0.376,
+      "grad_norm": 2.6426076889038086,
+      "learning_rate": 1.2488e-05,
+      "loss": 1.0762,
+      "step": 940
+    },
+    {
+      "epoch": 0.38,
+      "grad_norm": 2.6836400032043457,
+      "learning_rate": 1.2408e-05,
+      "loss": 0.9769,
+      "step": 950
+    },
+    {
+      "epoch": 0.384,
+      "grad_norm": 2.1915109157562256,
+      "learning_rate": 1.2328e-05,
+      "loss": 0.9242,
+      "step": 960
+    },
+    {
+      "epoch": 0.388,
+      "grad_norm": 2.1659505367279053,
+      "learning_rate": 1.2248000000000002e-05,
+      "loss": 0.9292,
+      "step": 970
+    },
+    {
+      "epoch": 0.392,
+      "grad_norm": 1.3131425380706787,
+      "learning_rate": 1.2168000000000003e-05,
+      "loss": 0.9952,
+      "step": 980
+    },
+    {
+      "epoch": 0.396,
+      "grad_norm": 1.6859830617904663,
+      "learning_rate": 1.2088000000000002e-05,
+      "loss": 0.9113,
+      "step": 990
+    },
+    {
+      "epoch": 0.4,
+      "grad_norm": 2.733213424682617,
+      "learning_rate": 1.2008000000000003e-05,
+      "loss": 1.0417,
+      "step": 1000
+    },
+    {
+      "epoch": 0.404,
+      "grad_norm": 1.8991081714630127,
+      "learning_rate": 1.1928000000000002e-05,
+      "loss": 1.0169,
+      "step": 1010
+    },
+    {
+      "epoch": 0.408,
+      "grad_norm": 2.302356719970703,
+      "learning_rate": 1.1848000000000002e-05,
+      "loss": 0.976,
+      "step": 1020
+    },
+    {
+      "epoch": 0.412,
+      "grad_norm": 3.2483420372009277,
+      "learning_rate": 1.1768000000000002e-05,
+      "loss": 0.9776,
+      "step": 1030
+    },
+    {
+      "epoch": 0.416,
+      "grad_norm": 2.8546910285949707,
+      "learning_rate": 1.1688000000000002e-05,
+      "loss": 1.0167,
+      "step": 1040
+    },
+    {
+      "epoch": 0.42,
+      "grad_norm": 1.6336743831634521,
+      "learning_rate": 1.1608000000000001e-05,
+      "loss": 0.932,
+      "step": 1050
+    },
+    {
+      "epoch": 0.424,
+      "grad_norm": 1.7520802021026611,
+      "learning_rate": 1.1528000000000002e-05,
+      "loss": 0.9445,
+      "step": 1060
+    },
+    {
+      "epoch": 0.428,
+      "grad_norm": 1.485300064086914,
+      "learning_rate": 1.1448000000000001e-05,
+      "loss": 0.8426,
+      "step": 1070
+    },
+    {
+      "epoch": 0.432,
+      "grad_norm": 2.4218764305114746,
+      "learning_rate": 1.1368000000000002e-05,
+      "loss": 0.8435,
+      "step": 1080
+    },
+    {
+      "epoch": 0.436,
+      "grad_norm": 2.309624195098877,
+      "learning_rate": 1.1288000000000001e-05,
+      "loss": 1.0399,
+      "step": 1090
+    },
+    {
+      "epoch": 0.44,
+      "grad_norm": 2.342698812484741,
+      "learning_rate": 1.1208000000000002e-05,
+      "loss": 0.976,
+      "step": 1100
+    },
+    {
+      "epoch": 0.444,
+      "grad_norm": 3.409608840942383,
+      "learning_rate": 1.1128000000000001e-05,
+      "loss": 0.9518,
+      "step": 1110
+    },
+    {
+      "epoch": 0.448,
+      "grad_norm": 2.1572208404541016,
+      "learning_rate": 1.1048000000000002e-05,
+      "loss": 0.8887,
+      "step": 1120
+    },
+    {
+      "epoch": 0.452,
+      "grad_norm": 1.6231638193130493,
+      "learning_rate": 1.0968e-05,
+      "loss": 0.9394,
+      "step": 1130
+    },
+    {
+      "epoch": 0.456,
+      "grad_norm": 2.244163990020752,
+      "learning_rate": 1.0888000000000001e-05,
+      "loss": 0.8558,
+      "step": 1140
+    },
+    {
+      "epoch": 0.46,
+      "grad_norm": 1.8961011171340942,
+      "learning_rate": 1.0808e-05,
+      "loss": 0.8702,
+      "step": 1150
+    },
+    {
+      "epoch": 0.464,
+      "grad_norm": 2.714932918548584,
+      "learning_rate": 1.0728000000000001e-05,
+      "loss": 0.925,
+      "step": 1160
+    },
+    {
+      "epoch": 0.468,
+      "grad_norm": 2.4342856407165527,
+      "learning_rate": 1.0648e-05,
+      "loss": 0.9112,
+      "step": 1170
+    },
+    {
+      "epoch": 0.472,
+      "grad_norm": 2.8005857467651367,
+      "learning_rate": 1.0568000000000001e-05,
+      "loss": 0.9538,
+      "step": 1180
+    },
+    {
+      "epoch": 0.476,
+      "grad_norm": 2.0518391132354736,
+      "learning_rate": 1.0488e-05,
+      "loss": 0.9468,
+      "step": 1190
+    },
+    {
+      "epoch": 0.48,
+      "grad_norm": 2.055837392807007,
+      "learning_rate": 1.0408000000000001e-05,
+      "loss": 0.9461,
+      "step": 1200
+    },
+    {
+      "epoch": 0.484,
+      "grad_norm": 2.783745527267456,
+      "learning_rate": 1.0328e-05,
+      "loss": 0.9171,
+      "step": 1210
+    },
+    {
+      "epoch": 0.488,
+      "grad_norm": 2.9242422580718994,
+      "learning_rate": 1.0248e-05,
+      "loss": 0.8805,
+      "step": 1220
+    },
+    {
+      "epoch": 0.492,
+      "grad_norm": 2.0500447750091553,
+      "learning_rate": 1.0168e-05,
+      "loss": 0.9339,
+      "step": 1230
+    },
+    {
+      "epoch": 0.496,
+      "grad_norm": 5.169466495513916,
+      "learning_rate": 1.0088e-05,
+      "loss": 0.9204,
+      "step": 1240
+    },
+    {
+      "epoch": 0.5,
+      "grad_norm": 2.055866003036499,
+      "learning_rate": 1.0008e-05,
+      "loss": 0.8512,
+      "step": 1250
+    },
+    {
+      "epoch": 0.504,
+      "grad_norm": 2.6294054985046387,
+      "learning_rate": 9.928e-06,
+      "loss": 0.8987,
+      "step": 1260
+    },
+    {
+      "epoch": 0.508,
+      "grad_norm": 2.4524762630462646,
+      "learning_rate": 9.848000000000001e-06,
+      "loss": 0.8888,
+      "step": 1270
+    },
+    {
+      "epoch": 0.512,
+      "grad_norm": 2.122523784637451,
+      "learning_rate": 9.768e-06,
+      "loss": 0.9447,
+      "step": 1280
+    },
+    {
+      "epoch": 0.516,
+      "grad_norm": 2.1826155185699463,
+      "learning_rate": 9.688000000000001e-06,
+      "loss": 0.9058,
+      "step": 1290
+    },
+    {
+      "epoch": 0.52,
+      "grad_norm": 2.5300002098083496,
+      "learning_rate": 9.608e-06,
+      "loss": 0.9846,
+      "step": 1300
+    },
+    {
+      "epoch": 0.524,
+      "grad_norm": 2.9742186069488525,
+      "learning_rate": 9.528000000000001e-06,
+      "loss": 0.8508,
+      "step": 1310
+    },
+    {
+      "epoch": 0.528,
+      "grad_norm": 2.04305100440979,
+      "learning_rate": 9.448e-06,
+      "loss": 0.8024,
+      "step": 1320
+    },
+    {
+      "epoch": 0.532,
+      "grad_norm": 3.284501314163208,
+      "learning_rate": 9.368e-06,
+      "loss": 0.8675,
+      "step": 1330
+    },
+    {
+      "epoch": 0.536,
+      "grad_norm": 2.4430627822875977,
+      "learning_rate": 9.288e-06,
+      "loss": 0.9219,
+      "step": 1340
+    },
+    {
+      "epoch": 0.54,
+      "grad_norm": 3.3678085803985596,
+      "learning_rate": 9.208e-06,
+      "loss": 0.8197,
+      "step": 1350
+    },
+    {
+      "epoch": 0.544,
+      "grad_norm": 2.4278366565704346,
+      "learning_rate": 9.128e-06,
+      "loss": 0.8925,
+      "step": 1360
+    },
+    {
+      "epoch": 0.548,
+      "grad_norm": 1.850438117980957,
+      "learning_rate": 9.048e-06,
+      "loss": 0.8104,
+      "step": 1370
+    },
+    {
+      "epoch": 0.552,
+      "grad_norm": 2.19571852684021,
+      "learning_rate": 8.968000000000001e-06,
+      "loss": 0.7943,
+      "step": 1380
+    },
+    {
+      "epoch": 0.556,
+      "grad_norm": 2.3027920722961426,
+      "learning_rate": 8.888e-06,
+      "loss": 0.8311,
+      "step": 1390
+    },
+    {
+      "epoch": 0.56,
+      "grad_norm": 2.2954397201538086,
+      "learning_rate": 8.808000000000001e-06,
+      "loss": 0.9068,
+      "step": 1400
+    },
+    {
+      "epoch": 0.564,
+      "grad_norm": 1.897251009941101,
+      "learning_rate": 8.728e-06,
+      "loss": 0.8213,
+      "step": 1410
+    },
+    {
+      "epoch": 0.568,
+      "grad_norm": 3.836338520050049,
+      "learning_rate": 8.648000000000001e-06,
+      "loss": 0.8516,
+      "step": 1420
+    },
+    {
+      "epoch": 0.572,
+      "grad_norm": 3.1619820594787598,
+      "learning_rate": 8.568e-06,
+      "loss": 0.9575,
+      "step": 1430
+    },
+    {
+      "epoch": 0.576,
+      "grad_norm": 2.0836496353149414,
+      "learning_rate": 8.488e-06,
+      "loss": 0.8856,
+      "step": 1440
+    },
+    {
+      "epoch": 0.58,
+      "grad_norm": 3.4184463024139404,
+      "learning_rate": 8.408e-06,
+      "loss": 0.689,
+      "step": 1450
+    },
+    {
+      "epoch": 0.584,
+      "grad_norm": 2.6366779804229736,
+      "learning_rate": 8.328e-06,
+      "loss": 0.8382,
+      "step": 1460
+    },
+    {
+      "epoch": 0.588,
+      "grad_norm": 2.03658390045166,
+      "learning_rate": 8.248e-06,
+      "loss": 0.8093,
+      "step": 1470
+    },
+    {
+      "epoch": 0.592,
+      "grad_norm": 2.4847910404205322,
+      "learning_rate": 8.168e-06,
+      "loss": 0.7996,
+      "step": 1480
+    },
+    {
+      "epoch": 0.596,
+      "grad_norm": 2.1482226848602295,
+      "learning_rate": 8.088e-06,
+      "loss": 0.8246,
+      "step": 1490
+    },
+    {
+      "epoch": 0.6,
+      "grad_norm": 2.0978434085845947,
+      "learning_rate": 8.008e-06,
+      "loss": 0.7359,
+      "step": 1500
+    },
+    {
+      "epoch": 0.604,
+      "grad_norm": 4.422749996185303,
+      "learning_rate": 7.928e-06,
+      "loss": 0.7479,
+      "step": 1510
+    },
+    {
+      "epoch": 0.608,
+      "grad_norm": 2.035379409790039,
+      "learning_rate": 7.848000000000002e-06,
+      "loss": 0.9107,
+      "step": 1520
+    },
+    {
+      "epoch": 0.612,
+      "grad_norm": 3.720285415649414,
+      "learning_rate": 7.768e-06,
+      "loss": 0.7922,
+      "step": 1530
+    },
+    {
+      "epoch": 0.616,
+      "grad_norm": 3.993703842163086,
+      "learning_rate": 7.688000000000002e-06,
+      "loss": 0.8273,
+      "step": 1540
+    },
+    {
+      "epoch": 0.62,
+      "grad_norm": 2.976177930831909,
+      "learning_rate": 7.608000000000001e-06,
+      "loss": 0.7356,
+      "step": 1550
+    },
+    {
+      "epoch": 0.624,
+      "grad_norm": 2.1457252502441406,
+      "learning_rate": 7.528000000000001e-06,
+      "loss": 0.7757,
+      "step": 1560
+    },
+    {
+      "epoch": 0.628,
+      "grad_norm": 3.819840908050537,
+      "learning_rate": 7.4480000000000005e-06,
+      "loss": 0.8886,
+      "step": 1570
+    },
+    {
+      "epoch": 0.632,
+      "grad_norm": 4.748547554016113,
+      "learning_rate": 7.3680000000000004e-06,
+      "loss": 0.8789,
+      "step": 1580
+    },
+    {
+      "epoch": 0.636,
+      "grad_norm": 2.702115058898926,
+      "learning_rate": 7.288e-06,
+      "loss": 0.9161,
+      "step": 1590
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 4.047061443328857,
+      "learning_rate": 7.208e-06,
+      "loss": 0.9155,
+      "step": 1600
+    },
+    {
+      "epoch": 0.644,
+      "grad_norm": 2.9659008979797363,
+      "learning_rate": 7.128e-06,
+      "loss": 0.8114,
+      "step": 1610
+    },
+    {
+      "epoch": 0.648,
+      "grad_norm": 3.6247856616973877,
+      "learning_rate": 7.048e-06,
+      "loss": 0.7938,
+      "step": 1620
+    },
+    {
+      "epoch": 0.652,
+      "grad_norm": 3.002122402191162,
+      "learning_rate": 6.968e-06,
+      "loss": 0.8127,
+      "step": 1630
+    },
+    {
+      "epoch": 0.656,
+      "grad_norm": 2.6267194747924805,
+      "learning_rate": 6.888e-06,
+      "loss": 0.8381,
+      "step": 1640
+    },
+    {
+      "epoch": 0.66,
+      "grad_norm": 3.199043035507202,
+      "learning_rate": 6.808e-06,
+      "loss": 0.8507,
+      "step": 1650
+    },
+    {
+      "epoch": 0.664,
+      "grad_norm": 3.576150417327881,
+      "learning_rate": 6.728e-06,
+      "loss": 0.731,
+      "step": 1660
+    },
+    {
+      "epoch": 0.668,
+      "grad_norm": 3.59387469291687,
+      "learning_rate": 6.648e-06,
+      "loss": 0.9536,
+      "step": 1670
+    },
+    {
+      "epoch": 0.672,
+      "grad_norm": 1.788014531135559,
+      "learning_rate": 6.568000000000001e-06,
+      "loss": 0.8538,
+      "step": 1680
+    },
+    {
+      "epoch": 0.676,
+      "grad_norm": 2.6225569248199463,
+      "learning_rate": 6.488000000000001e-06,
+      "loss": 0.8049,
+      "step": 1690
+    },
+    {
+      "epoch": 0.68,
+      "grad_norm": 3.2675693035125732,
+      "learning_rate": 6.408000000000001e-06,
+      "loss": 0.8805,
+      "step": 1700
+    },
+    {
+      "epoch": 0.684,
+      "grad_norm": 2.034709930419922,
+      "learning_rate": 6.328000000000001e-06,
+      "loss": 0.7499,
+      "step": 1710
+    },
+    {
+      "epoch": 0.688,
+      "grad_norm": 1.8868328332901,
+      "learning_rate": 6.248000000000001e-06,
+      "loss": 0.8066,
+      "step": 1720
+    },
+    {
+      "epoch": 0.692,
+      "grad_norm": 5.277392864227295,
+      "learning_rate": 6.168000000000001e-06,
+      "loss": 0.7236,
+      "step": 1730
+    },
+    {
+      "epoch": 0.696,
+      "grad_norm": 3.48518443107605,
+      "learning_rate": 6.088000000000001e-06,
+      "loss": 0.8388,
+      "step": 1740
+    },
+    {
+      "epoch": 0.7,
+      "grad_norm": 2.2365574836730957,
+      "learning_rate": 6.008000000000001e-06,
+      "loss": 0.7064,
+      "step": 1750
+    },
+    {
+      "epoch": 0.704,
+      "grad_norm": 2.5346803665161133,
+      "learning_rate": 5.928000000000001e-06,
+      "loss": 0.7754,
+      "step": 1760
+    },
+    {
+      "epoch": 0.708,
+      "grad_norm": 3.9243640899658203,
+      "learning_rate": 5.848000000000001e-06,
+      "loss": 0.85,
+      "step": 1770
+    },
+    {
+      "epoch": 0.712,
+      "grad_norm": 2.9606575965881348,
+      "learning_rate": 5.7680000000000005e-06,
+      "loss": 0.724,
+      "step": 1780
+    },
+    {
+      "epoch": 0.716,
+      "grad_norm": 6.477645397186279,
+      "learning_rate": 5.6880000000000004e-06,
+      "loss": 0.7925,
+      "step": 1790
+    },
+    {
+      "epoch": 0.72,
+      "grad_norm": 3.563234329223633,
+      "learning_rate": 5.608e-06,
+      "loss": 0.7662,
+      "step": 1800
+    },
+    {
+      "epoch": 0.724,
+      "grad_norm": 4.008406639099121,
+      "learning_rate": 5.528e-06,
+      "loss": 0.9116,
+      "step": 1810
+    },
+    {
+      "epoch": 0.728,
+      "grad_norm": 2.8766539096832275,
+      "learning_rate": 5.448e-06,
+      "loss": 0.8675,
+      "step": 1820
+    },
+    {
+      "epoch": 0.732,
+      "grad_norm": 3.101940155029297,
+      "learning_rate": 5.368000000000001e-06,
+      "loss": 0.8026,
+      "step": 1830
+    },
+    {
+      "epoch": 0.736,
+      "grad_norm": 2.1584157943725586,
+      "learning_rate": 5.288000000000001e-06,
+      "loss": 0.7255,
+      "step": 1840
+    },
+    {
+      "epoch": 0.74,
+      "grad_norm": 2.157031774520874,
+      "learning_rate": 5.208000000000001e-06,
+      "loss": 0.7803,
+      "step": 1850
+    },
+    {
+      "epoch": 0.744,
+      "grad_norm": 4.665865898132324,
+      "learning_rate": 5.128000000000001e-06,
+      "loss": 0.8971,
+      "step": 1860
+    },
+    {
+      "epoch": 0.748,
+      "grad_norm": 4.0717244148254395,
+      "learning_rate": 5.048000000000001e-06,
+      "loss": 0.8332,
+      "step": 1870
+    },
+    {
+      "epoch": 0.752,
+      "grad_norm": 3.106688976287842,
+      "learning_rate": 4.9680000000000005e-06,
+      "loss": 0.743,
+      "step": 1880
+    },
+    {
+      "epoch": 0.756,
+      "grad_norm": 3.3465564250946045,
+      "learning_rate": 4.8880000000000005e-06,
+      "loss": 0.8192,
+      "step": 1890
+    },
+    {
+      "epoch": 0.76,
+      "grad_norm": 3.8297150135040283,
+      "learning_rate": 4.808e-06,
+      "loss": 0.748,
+      "step": 1900
+    },
+    {
+      "epoch": 0.764,
+      "grad_norm": 4.068183898925781,
+      "learning_rate": 4.728e-06,
+      "loss": 0.8454,
+      "step": 1910
+    },
+    {
+      "epoch": 0.768,
+      "grad_norm": 3.2511086463928223,
+      "learning_rate": 4.648e-06,
+      "loss": 0.7184,
+      "step": 1920
+    },
+    {
+      "epoch": 0.772,
+      "grad_norm": 2.2039601802825928,
+      "learning_rate": 4.568e-06,
+      "loss": 0.6962,
+      "step": 1930
+    },
+    {
+      "epoch": 0.776,
+      "grad_norm": 2.8335773944854736,
+      "learning_rate": 4.488e-06,
+      "loss": 0.7419,
+      "step": 1940
+    },
+    {
+      "epoch": 0.78,
+      "grad_norm": 3.4596734046936035,
+      "learning_rate": 4.408000000000001e-06,
+      "loss": 0.7337,
+      "step": 1950
+    },
+    {
+      "epoch": 0.784,
+      "grad_norm": 3.2034690380096436,
+      "learning_rate": 4.328000000000001e-06,
+      "loss": 0.8237,
+      "step": 1960
+    },
+    {
+      "epoch": 0.788,
+      "grad_norm": 2.5316877365112305,
+      "learning_rate": 4.248000000000001e-06,
+      "loss": 0.7452,
+      "step": 1970
+    },
+    {
+      "epoch": 0.792,
+      "grad_norm": 3.9065985679626465,
+      "learning_rate": 4.168000000000001e-06,
+      "loss": 0.7351,
+      "step": 1980
+    },
+    {
+      "epoch": 0.796,
+      "grad_norm": 3.2880232334136963,
+      "learning_rate": 4.0880000000000005e-06,
+      "loss": 0.6624,
+      "step": 1990
+    },
+    {
+      "epoch": 0.8,
+      "grad_norm": 2.6434664726257324,
+      "learning_rate": 4.008e-06,
+      "loss": 0.6929,
+      "step": 2000
+    },
+    {
+      "epoch": 0.804,
+      "grad_norm": 2.8303961753845215,
+      "learning_rate": 3.928e-06,
+      "loss": 0.7728,
+      "step": 2010
+    },
+    {
+      "epoch": 0.808,
+      "grad_norm": 3.5802438259124756,
+      "learning_rate": 3.848e-06,
+      "loss": 0.7584,
+      "step": 2020
+    },
+    {
+      "epoch": 0.812,
+      "grad_norm": 5.526547431945801,
+      "learning_rate": 3.7680000000000006e-06,
+      "loss": 0.7928,
+      "step": 2030
+    },
+    {
+      "epoch": 0.816,
+      "grad_norm": 2.7042505741119385,
+      "learning_rate": 3.6880000000000005e-06,
+      "loss": 0.6035,
+      "step": 2040
+    },
+    {
+      "epoch": 0.82,
+      "grad_norm": 3.579296112060547,
+      "learning_rate": 3.6080000000000004e-06,
+      "loss": 0.7391,
+      "step": 2050
+    },
+    {
+      "epoch": 0.824,
+      "grad_norm": 2.9411709308624268,
+      "learning_rate": 3.5280000000000004e-06,
+      "loss": 0.7278,
+      "step": 2060
+    },
+    {
+      "epoch": 0.828,
+      "grad_norm": 3.7891061305999756,
+      "learning_rate": 3.4480000000000003e-06,
+      "loss": 0.7715,
+      "step": 2070
+    },
+    {
+      "epoch": 0.832,
+      "grad_norm": 3.165001630783081,
+      "learning_rate": 3.368e-06,
+      "loss": 0.7535,
+      "step": 2080
+    },
+    {
+      "epoch": 0.836,
+      "grad_norm": 2.631362199783325,
+      "learning_rate": 3.288e-06,
+      "loss": 0.7649,
+      "step": 2090
+    },
+    {
+      "epoch": 0.84,
+      "grad_norm": 8.075745582580566,
+      "learning_rate": 3.208e-06,
+      "loss": 0.7965,
+      "step": 2100
+    },
+    {
+      "epoch": 0.844,
+      "grad_norm": 2.704510450363159,
+      "learning_rate": 3.1280000000000004e-06,
+      "loss": 0.7002,
+      "step": 2110
+    },
+    {
+      "epoch": 0.848,
+      "grad_norm": 3.558990478515625,
+      "learning_rate": 3.0480000000000003e-06,
+      "loss": 0.7918,
+      "step": 2120
+    },
+    {
+      "epoch": 0.852,
+      "grad_norm": 3.2328171730041504,
+      "learning_rate": 2.9680000000000002e-06,
+      "loss": 0.6342,
+      "step": 2130
+    },
+    {
+      "epoch": 0.856,
+      "grad_norm": 4.635783672332764,
+      "learning_rate": 2.888e-06,
+      "loss": 0.7259,
+      "step": 2140
+    },
+    {
+      "epoch": 0.86,
+      "grad_norm": 6.2160234451293945,
+      "learning_rate": 2.808e-06,
+      "loss": 0.746,
+      "step": 2150
+    },
+    {
+      "epoch": 0.864,
+      "grad_norm": 2.262748956680298,
+      "learning_rate": 2.728e-06,
+      "loss": 0.7318,
+      "step": 2160
+    },
+    {
+      "epoch": 0.868,
+      "grad_norm": 2.518980026245117,
+      "learning_rate": 2.648e-06,
+      "loss": 0.7428,
+      "step": 2170
+    },
+    {
+      "epoch": 0.872,
+      "grad_norm": 4.256569862365723,
+      "learning_rate": 2.568e-06,
+      "loss": 0.6773,
+      "step": 2180
+    },
+    {
+      "epoch": 0.876,
+      "grad_norm": 4.446414470672607,
+      "learning_rate": 2.488e-06,
+      "loss": 0.8276,
+      "step": 2190
+    },
+    {
+      "epoch": 0.88,
+      "grad_norm": 4.081722736358643,
+      "learning_rate": 2.408e-06,
+      "loss": 0.6634,
+      "step": 2200
+    },
+    {
+      "epoch": 0.884,
+      "grad_norm": 3.0264625549316406,
+      "learning_rate": 2.3280000000000004e-06,
+      "loss": 0.6893,
+      "step": 2210
+    },
+    {
+      "epoch": 0.888,
+      "grad_norm": 2.5336077213287354,
+      "learning_rate": 2.2480000000000003e-06,
+      "loss": 0.7381,
+      "step": 2220
+    },
+    {
+      "epoch": 0.892,
+      "grad_norm": 2.8016293048858643,
+      "learning_rate": 2.1680000000000002e-06,
+      "loss": 0.7699,
+      "step": 2230
+    },
+    {
+      "epoch": 0.896,
+      "grad_norm": 4.106541156768799,
+      "learning_rate": 2.088e-06,
+      "loss": 0.7099,
+      "step": 2240
+    },
+    {
+      "epoch": 0.9,
+      "grad_norm": 2.492737054824829,
+      "learning_rate": 2.008e-06,
+      "loss": 0.7294,
+      "step": 2250
+    },
+    {
+      "epoch": 0.904,
+      "grad_norm": 2.4935927391052246,
+      "learning_rate": 1.928e-06,
+      "loss": 0.7498,
+      "step": 2260
+    },
+    {
+      "epoch": 0.908,
+      "grad_norm": 3.0411596298217773,
+      "learning_rate": 1.8480000000000001e-06,
+      "loss": 0.755,
+      "step": 2270
+    },
+    {
+      "epoch": 0.912,
+      "grad_norm": 3.7122206687927246,
+      "learning_rate": 1.7680000000000003e-06,
+      "loss": 0.7554,
+      "step": 2280
+    },
+    {
+      "epoch": 0.916,
+      "grad_norm": 2.088099718093872,
+      "learning_rate": 1.6880000000000002e-06,
+      "loss": 0.8632,
+      "step": 2290
+    },
+    {
+      "epoch": 0.92,
+      "grad_norm": 2.7009341716766357,
+      "learning_rate": 1.608e-06,
+      "loss": 0.6919,
+      "step": 2300
+    },
+    {
+      "epoch": 0.924,
+      "grad_norm": 6.126946449279785,
+      "learning_rate": 1.528e-06,
+      "loss": 0.7582,
+      "step": 2310
+    },
+    {
+      "epoch": 0.928,
+      "grad_norm": 2.268087387084961,
+      "learning_rate": 1.4480000000000002e-06,
+      "loss": 0.6809,
+      "step": 2320
+    },
+    {
+      "epoch": 0.932,
+      "grad_norm": 3.783358335494995,
+      "learning_rate": 1.368e-06,
+      "loss": 0.7445,
+      "step": 2330
+    },
+    {
+      "epoch": 0.936,
+      "grad_norm": 4.899768829345703,
+      "learning_rate": 1.288e-06,
+      "loss": 0.7178,
+      "step": 2340
+    },
+    {
+      "epoch": 0.94,
+      "grad_norm": 3.765782356262207,
+      "learning_rate": 1.2080000000000001e-06,
+      "loss": 0.7609,
+      "step": 2350
+    },
+    {
+      "epoch": 0.944,
+      "grad_norm": 4.587326526641846,
+      "learning_rate": 1.128e-06,
+      "loss": 0.6941,
+      "step": 2360
+    },
+    {
+      "epoch": 0.948,
+      "grad_norm": 4.030784606933594,
+      "learning_rate": 1.0480000000000002e-06,
+      "loss": 0.6918,
+      "step": 2370
+    },
+    {
+      "epoch": 0.952,
+      "grad_norm": 4.84142541885376,
+      "learning_rate": 9.68e-07,
+      "loss": 0.6424,
+      "step": 2380
+    },
+    {
+      "epoch": 0.956,
+      "grad_norm": 7.12539529800415,
+      "learning_rate": 8.880000000000001e-07,
+      "loss": 0.7323,
+      "step": 2390
+    },
+    {
+      "epoch": 0.96,
+      "grad_norm": 2.630418539047241,
+      "learning_rate": 8.08e-07,
+      "loss": 0.7747,
+      "step": 2400
+    },
+    {
+      "epoch": 0.964,
+      "grad_norm": 3.6240687370300293,
+      "learning_rate": 7.280000000000001e-07,
+      "loss": 0.6401,
+      "step": 2410
+    },
+    {
+      "epoch": 0.968,
+      "grad_norm": 2.6736936569213867,
+      "learning_rate": 6.48e-07,
+      "loss": 0.6785,
+      "step": 2420
+    },
+    {
+      "epoch": 0.972,
+      "grad_norm": 2.631958246231079,
+      "learning_rate": 5.680000000000001e-07,
+      "loss": 0.7415,
+      "step": 2430
+    },
+    {
+      "epoch": 0.976,
+      "grad_norm": 2.158177137374878,
+      "learning_rate": 4.88e-07,
+      "loss": 0.744,
+      "step": 2440
+    },
+    {
+      "epoch": 0.98,
+      "grad_norm": 4.066929817199707,
+      "learning_rate": 4.0800000000000005e-07,
+      "loss": 0.7974,
+      "step": 2450
+    },
+    {
+      "epoch": 0.984,
+      "grad_norm": 3.5081562995910645,
+      "learning_rate": 3.280000000000001e-07,
+      "loss": 0.7216,
+      "step": 2460
+    },
+    {
+      "epoch": 0.988,
+      "grad_norm": 3.929936170578003,
+      "learning_rate": 2.48e-07,
+      "loss": 0.6835,
+      "step": 2470
+    },
+    {
+      "epoch": 0.992,
+      "grad_norm": 2.827505350112915,
+      "learning_rate": 1.68e-07,
+      "loss": 0.7287,
+      "step": 2480
+    },
+    {
+      "epoch": 0.996,
+      "grad_norm": 3.1354634761810303,
+      "learning_rate": 8.800000000000001e-08,
+      "loss": 0.6532,
+      "step": 2490
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 2.7581207752227783,
+      "learning_rate": 8e-09,
+      "loss": 0.7283,
+      "step": 2500
+    },
+    {
+      "epoch": 1.0,
+      "eval_accuracy": 0.7924654812754545,
+      "eval_f1": 0.4610274716746857,
+      "eval_loss": 0.6840835213661194,
+      "eval_precision": 0.4109801317173059,
+      "eval_recall": 0.5249540506150149,
+      "eval_runtime": 53.0748,
+      "eval_samples_per_second": 188.413,
+      "eval_steps_per_second": 23.552,
+      "step": 2500
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 2500,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3884764894814016.0,
+  "train_batch_size": 8,
+  "trial_name": null,
+  "trial_params": null
+}
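
Reviewer note on the `trainer_state.json` log above: the logged `learning_rate` values look consistent with a linear decay to zero over `max_steps` = 2500, and the reported `eval_f1` looks like the harmonic mean of `eval_precision` and `eval_recall`. The sketch below checks both against a few values copied from the log. The peak learning rate of 2e-5 is not stated anywhere in this diff; it is an assumption inferred from the logged numbers, and the helper names are hypothetical.

```python
import math

# Values copied verbatim from the trainer_state.json log in this diff.
MAX_STEPS = 2500
LOGGED_LR = {800: 1.3608e-05, 1250: 1.0008e-05, 2000: 4.008e-06, 2500: 8e-09}
EVAL_PRECISION = 0.4109801317173059
EVAL_RECALL = 0.5249540506150149
EVAL_F1 = 0.4610274716746857

# Assumption: the peak learning rate is inferred from the log, not read from the diff.
ASSUMED_PEAK_LR = 2e-5


def linear_decay_lr(step, peak_lr=ASSUMED_PEAK_LR, max_steps=MAX_STEPS):
    """Learning rate in effect during 1-indexed `step`, assuming linear decay with no warmup."""
    # The value logged at `step` reflects a scheduler that has advanced step - 1 times.
    return peak_lr * (max_steps - (step - 1)) / max_steps


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Compare the reconstructed values with the logged ones.
lr_matches = all(
    math.isclose(linear_decay_lr(step), logged, rel_tol=1e-6)
    for step, logged in LOGGED_LR.items()
)
f1_matches = math.isclose(f1_score(EVAL_PRECISION, EVAL_RECALL), EVAL_F1, rel_tol=1e-6)

print("linear schedule matches log:", lr_matches)
print("F1 matches precision/recall:", f1_matches)
```

If both checks hold, the roughly 0.33-point gap between `eval_accuracy` (about 0.79) and `eval_f1` (about 0.46) is probably not a logging error. One possible explanation is that token-level accuracy is dominated by the `O` class while F1 is computed over entity spans, but that is a guess: the diff does not show how the metrics were computed.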

checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/training_args.bin ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c2aa3da10f1aa4078db41327431415d5a2b91757462db489e71c0643d1fd6815
+size 5777

checkpoints/ner-wikiann-bert-smoke/checkpoint-2500/vocab.txt ADDED Viewed

The diff for this file is too large to render. See raw diff

checkpoints/ner-wikiann-bert-smoke/special_tokens_map.json ADDED Viewed

	@@ -0,0 +1,7 @@

+{
+  "cls_token": "[CLS]",
+  "mask_token": "[MASK]",
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "unk_token": "[UNK]"
+}

checkpoints/ner-wikiann-bert-smoke/tokenizer.json ADDED Viewed

The diff for this file is too large to render. See raw diff

checkpoints/ner-wikiann-bert-smoke/tokenizer_config.json ADDED Viewed

	@@ -0,0 +1,56 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "[CLS]",
+  "do_lower_case": false,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}

checkpoints/ner-wikiann-bert-smoke/vocab.txt ADDED Viewed

The diff for this file is too large to render. See raw diff