Training in progress, step 146, checkpoint


Files changed (12)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +34 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +40 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer.model +3 -0
last-checkpoint/tokenizer_config.json +88 -0
last-checkpoint/trainer_state.json +1055 -0
last-checkpoint/training_args.bin +3 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: defog/sqlcoder-7b-2
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "defog/sqlcoder-7b-2",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "o_proj",
+    "gate_proj",
+    "q_proj",
+    "down_proj",
+    "v_proj",
+    "up_proj",
+    "k_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3c41ebb5355b482e4162781db16445d62945a9d03bbb0e145341ea955321c94e
+size 80013120
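
The adapter configuration and the pointer size above can be cross-checked with a little arithmetic. The sketch below counts LoRA parameters for `r = 8` over the seven target modules. The layer shapes (hidden size 4096, MLP size 11008, 32 decoder layers) are assumptions based on a CodeLlama-7B-style architecture and are not stated anywhere in this commit. Under those assumptions the adapter holds 19,988,480 parameters, or 79,953,920 bytes in fp32, which is 59,200 bytes short of the reported 80,013,120; a gap of that size would be consistent with safetensors header metadata, though the checkpoint itself does not confirm the dtype.

```python
# Rough LoRA size estimate. Only R, LORA_ALPHA, and the module names come from
# adapter_config.json; the layer shapes below are ASSUMED (CodeLlama-7B style).
HIDDEN = 4096          # assumed hidden size
INTERMEDIATE = 11008   # assumed MLP intermediate size
NUM_LAYERS = 32        # assumed number of decoder layers

R = 8                  # "r" in adapter_config.json
LORA_ALPHA = 16        # "lora_alpha" in adapter_config.json

# (fan_in, fan_out) of each targeted linear layer, under the assumptions above.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, r, num_layers):
    """Each adapted layer adds A (r x fan_in) and B (fan_out x r)."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


params = lora_param_count(MODULE_SHAPES, R, NUM_LAYERS)
fp32_bytes = params * 4
# With "use_rslora": false, the LoRA update is scaled by lora_alpha / r.
scaling = LORA_ALPHA / R

print(f"trainable LoRA parameters: {params:,}")
print(f"fp32 payload: {fp32_bytes:,} bytes")
print(f"scaling factor: {scaling}")
```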

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:75ca82ddc972ad9524a3be43d2fa258066014bb070b05f188ba109610a1e6cb0
+size 41119636

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b5393297e1bbcddf4861a745a1364a8bbdfebf48added837922527703b7c8817
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8dd0707e902cc421dc15bd6c24f9133efedd0e8bb56350fb0e5085b6588e6f18
+size 1064
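
The binary files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, the object hash, and the size in bytes. As a minimal sketch, a pointer like the ones above could be read as follows. The helper name `parse_lfs_pointer` is hypothetical, and the parser handles only the three fields shown in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer of the form shown in this diff (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line is "<key> <value>", split on the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid is written as "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer text copied from the scheduler.pt entry above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:8dd0707e902cc421dc15bd6c24f9133efedd0e8bb56350fb0e5085b6588e6f18
size 1064
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```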

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,40 @@

+{
+  "additional_special_tokens": [
+    "▁<PRE>",
+    "▁<MID>",
+    "▁<SUF>",
+    "▁<EOT>",
+    "▁<PRE>",
+    "▁<MID>",
+    "▁<SUF>",
+    "▁<EOT>"
+  ],
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render.

last-checkpoint/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:45ccb9c8b6b561889acea59191d66986d314e7cbd6a78abc6e49b139ca91c1e6
+size 500058

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,88 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32007": {
+      "content": "▁<PRE>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32008": {
+      "content": "▁<SUF>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32009": {
+      "content": "▁<MID>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32010": {
+      "content": "▁<EOT>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "▁<PRE>",
+    "▁<MID>",
+    "▁<SUF>",
+    "▁<EOT>",
+    "▁<PRE>",
+    "▁<MID>",
+    "▁<SUF>",
+    "▁<EOT>"
+  ],
+  "bos_token": "<s>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "eot_token": "▁<EOT>",
+  "fill_token": "<FILL_ME>",
+  "legacy": null,
+  "middle_token": "▁<MID>",
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "</s>",
+  "prefix_token": "▁<PRE>",
+  "sp_model_kwargs": {},
+  "suffix_token": "▁<SUF>",
+  "tokenizer_class": "CodeLlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
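
The `chat_template` above is a Jinja template that wraps each message in `<|start_header_id|>` / `<|end_header_id|>` markers, trims the message content, closes it with `<|eot_id|>`, and prefixes the first message with the BOS token. The plain-Python sketch below mirrors that logic for illustration only; `render_chat` is a hypothetical helper, not the tokenizer's own API. Note that the header markers do not appear among the special tokens shown in this diff, so how the tokenizer splits them is not determined by what is visible here.

```python
def render_chat(messages, bos_token="<s>", add_generation_prompt=False):
    """Mirror of the chat_template logic in tokenizer_config.json (illustrative)."""
    parts = []
    for index, message in enumerate(messages):
        # The template applies `trim` to the message content only.
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip()
            + "<|eot_id|>"
        )
        # Only the first message is prefixed with the BOS token.
        if index == 0:
            content = bos_token + content
        parts.append(content)
    if add_generation_prompt:
        # Open an assistant turn so that generation continues from here.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = render_chat(
    [{"role": "user", "content": "  List all customers.  "}],
    add_generation_prompt=True,
)
print(prompt)
```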

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,1055 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.25053625053625056,
+  "eval_steps": 500,
+  "global_step": 146,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.001716001716001716,
+      "grad_norm": 0.09711357951164246,
+      "learning_rate": 2e-05,
+      "loss": 1.2456,
+      "step": 1
+    },
+    {
+      "epoch": 0.003432003432003432,
+      "grad_norm": 0.09846765547990799,
+      "learning_rate": 4e-05,
+      "loss": 1.2554,
+      "step": 2
+    },
+    {
+      "epoch": 0.005148005148005148,
+      "grad_norm": 0.11242333054542542,
+      "learning_rate": 6e-05,
+      "loss": 1.3329,
+      "step": 3
+    },
+    {
+      "epoch": 0.006864006864006864,
+      "grad_norm": 0.11436708271503448,
+      "learning_rate": 8e-05,
+      "loss": 1.2801,
+      "step": 4
+    },
+    {
+      "epoch": 0.00858000858000858,
+      "grad_norm": 0.14455951750278473,
+      "learning_rate": 0.0001,
+      "loss": 1.3653,
+      "step": 5
+    },
+    {
+      "epoch": 0.010296010296010296,
+      "grad_norm": 0.1637658029794693,
+      "learning_rate": 9.999926144471874e-05,
+      "loss": 1.2566,
+      "step": 6
+    },
+    {
+      "epoch": 0.012012012012012012,
+      "grad_norm": 0.1974203735589981,
+      "learning_rate": 9.999704580069346e-05,
+      "loss": 1.2676,
+      "step": 7
+    },
+    {
+      "epoch": 0.013728013728013728,
+      "grad_norm": 0.20523439347743988,
+      "learning_rate": 9.999335313337923e-05,
+      "loss": 1.3186,
+      "step": 8
+    },
+    {
+      "epoch": 0.015444015444015444,
+      "grad_norm": 0.20027151703834534,
+      "learning_rate": 9.99881835518656e-05,
+      "loss": 1.2834,
+      "step": 9
+    },
+    {
+      "epoch": 0.01716001716001716,
+      "grad_norm": 0.22311723232269287,
+      "learning_rate": 9.998153720887342e-05,
+      "loss": 1.2737,
+      "step": 10
+    },
+    {
+      "epoch": 0.018876018876018877,
+      "grad_norm": 0.20790226757526398,
+      "learning_rate": 9.997341430075036e-05,
+      "loss": 1.2268,
+      "step": 11
+    },
+    {
+      "epoch": 0.02059202059202059,
+      "grad_norm": 0.18226559460163116,
+      "learning_rate": 9.99638150674651e-05,
+      "loss": 1.2264,
+      "step": 12
+    },
+    {
+      "epoch": 0.02230802230802231,
+      "grad_norm": 0.1926519274711609,
+      "learning_rate": 9.995273979260022e-05,
+      "loss": 1.1828,
+      "step": 13
+    },
+    {
+      "epoch": 0.024024024024024024,
+      "grad_norm": 0.20070688426494598,
+      "learning_rate": 9.994018880334383e-05,
+      "loss": 1.2728,
+      "step": 14
+    },
+    {
+      "epoch": 0.02574002574002574,
+      "grad_norm": 0.21836882829666138,
+      "learning_rate": 9.99261624704799e-05,
+      "loss": 1.2733,
+      "step": 15
+    },
+    {
+      "epoch": 0.027456027456027456,
+      "grad_norm": 0.2309180200099945,
+      "learning_rate": 9.991066120837731e-05,
+      "loss": 1.3171,
+      "step": 16
+    },
+    {
+      "epoch": 0.02917202917202917,
+      "grad_norm": 0.30299851298332214,
+      "learning_rate": 9.989368547497763e-05,
+      "loss": 1.2062,
+      "step": 17
+    },
+    {
+      "epoch": 0.03088803088803089,
+      "grad_norm": 0.23873500525951385,
+      "learning_rate": 9.987523577178155e-05,
+      "loss": 1.2422,
+      "step": 18
+    },
+    {
+      "epoch": 0.03260403260403261,
+      "grad_norm": 0.2547118365764618,
+      "learning_rate": 9.985531264383412e-05,
+      "loss": 1.1421,
+      "step": 19
+    },
+    {
+      "epoch": 0.03432003432003432,
+      "grad_norm": 0.24708643555641174,
+      "learning_rate": 9.983391667970859e-05,
+      "loss": 1.2756,
+      "step": 20
+    },
+    {
+      "epoch": 0.036036036036036036,
+      "grad_norm": 0.25113534927368164,
+      "learning_rate": 9.981104851148904e-05,
+      "loss": 1.1768,
+      "step": 21
+    },
+    {
+      "epoch": 0.037752037752037754,
+      "grad_norm": 0.25495368242263794,
+      "learning_rate": 9.978670881475172e-05,
+      "loss": 1.1657,
+      "step": 22
+    },
+    {
+      "epoch": 0.039468039468039465,
+      "grad_norm": 0.2855076789855957,
+      "learning_rate": 9.976089830854514e-05,
+      "loss": 1.2062,
+      "step": 23
+    },
+    {
+      "epoch": 0.04118404118404118,
+      "grad_norm": 0.30621200799942017,
+      "learning_rate": 9.973361775536866e-05,
+      "loss": 1.174,
+      "step": 24
+    },
+    {
+      "epoch": 0.0429000429000429,
+      "grad_norm": 0.30931034684181213,
+      "learning_rate": 9.97048679611502e-05,
+      "loss": 1.1911,
+      "step": 25
+    },
+    {
+      "epoch": 0.04461604461604462,
+      "grad_norm": 0.30715975165367126,
+      "learning_rate": 9.96746497752222e-05,
+      "loss": 1.1963,
+      "step": 26
+    },
+    {
+      "epoch": 0.04633204633204633,
+      "grad_norm": 0.30101117491722107,
+      "learning_rate": 9.964296409029675e-05,
+      "loss": 1.1409,
+      "step": 27
+    },
+    {
+      "epoch": 0.04804804804804805,
+      "grad_norm": 0.3699706792831421,
+      "learning_rate": 9.960981184243903e-05,
+      "loss": 1.22,
+      "step": 28
+    },
+    {
+      "epoch": 0.049764049764049766,
+      "grad_norm": 0.3153722286224365,
+      "learning_rate": 9.957519401103972e-05,
+      "loss": 1.1653,
+      "step": 29
+    },
+    {
+      "epoch": 0.05148005148005148,
+      "grad_norm": 0.3458651900291443,
+      "learning_rate": 9.953911161878612e-05,
+      "loss": 1.1326,
+      "step": 30
+    },
+    {
+      "epoch": 0.053196053196053195,
+      "grad_norm": 0.36204057931900024,
+      "learning_rate": 9.950156573163192e-05,
+      "loss": 1.24,
+      "step": 31
+    },
+    {
+      "epoch": 0.05491205491205491,
+      "grad_norm": 0.37383830547332764,
+      "learning_rate": 9.946255745876562e-05,
+      "loss": 1.1419,
+      "step": 32
+    },
+    {
+      "epoch": 0.05662805662805663,
+      "grad_norm": 0.3656322658061981,
+      "learning_rate": 9.942208795257786e-05,
+      "loss": 1.0709,
+      "step": 33
+    },
+    {
+      "epoch": 0.05834405834405834,
+      "grad_norm": 0.4202072024345398,
+      "learning_rate": 9.938015840862733e-05,
+      "loss": 1.1721,
+      "step": 34
+    },
+    {
+      "epoch": 0.06006006006006006,
+      "grad_norm": 0.40786582231521606,
+      "learning_rate": 9.93367700656055e-05,
+      "loss": 1.2163,
+      "step": 35
+    },
+    {
+      "epoch": 0.06177606177606178,
+      "grad_norm": 0.5648673176765442,
+      "learning_rate": 9.929192420529995e-05,
+      "loss": 1.0535,
+      "step": 36
+    },
+    {
+      "epoch": 0.06349206349206349,
+      "grad_norm": 0.39177295565605164,
+      "learning_rate": 9.924562215255655e-05,
+      "loss": 1.0978,
+      "step": 37
+    },
+    {
+      "epoch": 0.06520806520806521,
+      "grad_norm": 0.4586556553840637,
+      "learning_rate": 9.919786527524035e-05,
+      "loss": 1.1455,
+      "step": 38
+    },
+    {
+      "epoch": 0.06692406692406692,
+      "grad_norm": 0.5327500700950623,
+      "learning_rate": 9.91486549841951e-05,
+      "loss": 1.2103,
+      "step": 39
+    },
+    {
+      "epoch": 0.06864006864006864,
+      "grad_norm": 0.547812819480896,
+      "learning_rate": 9.90979927332016e-05,
+      "loss": 1.1527,
+      "step": 40
+    },
+    {
+      "epoch": 0.07035607035607036,
+      "grad_norm": 0.501379668712616,
+      "learning_rate": 9.904588001893477e-05,
+      "loss": 1.1252,
+      "step": 41
+    },
+    {
+      "epoch": 0.07207207207207207,
+      "grad_norm": 0.4829859733581543,
+      "learning_rate": 9.899231838091944e-05,
+      "loss": 1.1408,
+      "step": 42
+    },
+    {
+      "epoch": 0.07378807378807378,
+      "grad_norm": 0.5352693200111389,
+      "learning_rate": 9.893730940148482e-05,
+      "loss": 1.1626,
+      "step": 43
+    },
+    {
+      "epoch": 0.07550407550407551,
+      "grad_norm": 0.5864397883415222,
+      "learning_rate": 9.888085470571782e-05,
+      "loss": 1.1083,
+      "step": 44
+    },
+    {
+      "epoch": 0.07722007722007722,
+      "grad_norm": 0.573275089263916,
+      "learning_rate": 9.882295596141496e-05,
+      "loss": 1.0658,
+      "step": 45
+    },
+    {
+      "epoch": 0.07893607893607893,
+      "grad_norm": 0.6483068466186523,
+      "learning_rate": 9.87636148790332e-05,
+      "loss": 1.2207,
+      "step": 46
+    },
+    {
+      "epoch": 0.08065208065208065,
+      "grad_norm": 0.7409388422966003,
+      "learning_rate": 9.870283321163934e-05,
+      "loss": 1.2229,
+      "step": 47
+    },
+    {
+      "epoch": 0.08236808236808237,
+      "grad_norm": 0.8097925782203674,
+      "learning_rate": 9.864061275485821e-05,
+      "loss": 1.1792,
+      "step": 48
+    },
+    {
+      "epoch": 0.08408408408408409,
+      "grad_norm": 0.6753950119018555,
+      "learning_rate": 9.85769553468197e-05,
+      "loss": 1.1306,
+      "step": 49
+    },
+    {
+      "epoch": 0.0858000858000858,
+      "grad_norm": 0.9566589593887329,
+      "learning_rate": 9.851186286810441e-05,
+      "loss": 1.1484,
+      "step": 50
+    },
+    {
+      "epoch": 0.08751608751608751,
+      "grad_norm": 0.4181309640407562,
+      "learning_rate": 9.844533724168809e-05,
+      "loss": 1.1483,
+      "step": 51
+    },
+    {
+      "epoch": 0.08923208923208924,
+      "grad_norm": 0.44185400009155273,
+      "learning_rate": 9.837738043288486e-05,
+      "loss": 1.1995,
+      "step": 52
+    },
+    {
+      "epoch": 0.09094809094809095,
+      "grad_norm": 0.4195774793624878,
+      "learning_rate": 9.83079944492891e-05,
+      "loss": 1.1264,
+      "step": 53
+    },
+    {
+      "epoch": 0.09266409266409266,
+      "grad_norm": 0.39104709029197693,
+      "learning_rate": 9.823718134071623e-05,
+      "loss": 1.1822,
+      "step": 54
+    },
+    {
+      "epoch": 0.09438009438009438,
+      "grad_norm": 0.37261953949928284,
+      "learning_rate": 9.816494319914203e-05,
+      "loss": 1.0831,
+      "step": 55
+    },
+    {
+      "epoch": 0.0960960960960961,
+      "grad_norm": 0.32925930619239807,
+      "learning_rate": 9.809128215864097e-05,
+      "loss": 1.0933,
+      "step": 56
+    },
+    {
+      "epoch": 0.0978120978120978,
+      "grad_norm": 0.3111138939857483,
+      "learning_rate": 9.801620039532302e-05,
+      "loss": 1.0682,
+      "step": 57
+    },
+    {
+      "epoch": 0.09952809952809953,
+      "grad_norm": 0.32866913080215454,
+      "learning_rate": 9.793970012726954e-05,
+      "loss": 1.144,
+      "step": 58
+    },
+    {
+      "epoch": 0.10124410124410124,
+      "grad_norm": 0.32640203833580017,
+      "learning_rate": 9.786178361446759e-05,
+      "loss": 1.0917,
+      "step": 59
+    },
+    {
+      "epoch": 0.10296010296010295,
+      "grad_norm": 0.3133028447628021,
+      "learning_rate": 9.778245315874326e-05,
+      "loss": 1.1157,
+      "step": 60
+    },
+    {
+      "epoch": 0.10467610467610468,
+      "grad_norm": 0.3583860695362091,
+      "learning_rate": 9.770171110369362e-05,
+      "loss": 1.1668,
+      "step": 61
+    },
+    {
+      "epoch": 0.10639210639210639,
+      "grad_norm": 0.3254780173301697,
+      "learning_rate": 9.761955983461754e-05,
+      "loss": 1.089,
+      "step": 62
+    },
+    {
+      "epoch": 0.10810810810810811,
+      "grad_norm": 0.3556046783924103,
+      "learning_rate": 9.753600177844513e-05,
+      "loss": 1.0566,
+      "step": 63
+    },
+    {
+      "epoch": 0.10982410982410983,
+      "grad_norm": 0.3384726941585541,
+      "learning_rate": 9.745103940366616e-05,
+      "loss": 1.0697,
+      "step": 64
+    },
+    {
+      "epoch": 0.11154011154011154,
+      "grad_norm": 0.3599175214767456,
+      "learning_rate": 9.736467522025705e-05,
+      "loss": 1.1045,
+      "step": 65
+    },
+    {
+      "epoch": 0.11325611325611326,
+      "grad_norm": 0.3554559648036957,
+      "learning_rate": 9.727691177960677e-05,
+      "loss": 1.0962,
+      "step": 66
+    },
+    {
+      "epoch": 0.11497211497211497,
+      "grad_norm": 0.3742257356643677,
+      "learning_rate": 9.718775167444139e-05,
+      "loss": 1.2057,
+      "step": 67
+    },
+    {
+      "epoch": 0.11668811668811668,
+      "grad_norm": 0.3769656717777252,
+      "learning_rate": 9.709719753874758e-05,
+      "loss": 1.0612,
+      "step": 68
+    },
+    {
+      "epoch": 0.11840411840411841,
+      "grad_norm": 0.3800525963306427,
+      "learning_rate": 9.700525204769475e-05,
+      "loss": 1.0964,
+      "step": 69
+    },
+    {
+      "epoch": 0.12012012012012012,
+      "grad_norm": 0.37507057189941406,
+      "learning_rate": 9.691191791755603e-05,
+      "loss": 1.0917,
+      "step": 70
+    },
+    {
+      "epoch": 0.12183612183612183,
+      "grad_norm": 0.3837190568447113,
+      "learning_rate": 9.681719790562801e-05,
+      "loss": 1.0983,
+      "step": 71
+    },
+    {
+      "epoch": 0.12355212355212356,
+      "grad_norm": 0.4015348255634308,
+      "learning_rate": 9.672109481014929e-05,
+      "loss": 1.171,
+      "step": 72
+    },
+    {
+      "epoch": 0.12526812526812528,
+      "grad_norm": 0.40058571100234985,
+      "learning_rate": 9.662361147021779e-05,
+      "loss": 1.0362,
+      "step": 73
+    },
+    {
+      "epoch": 0.12698412698412698,
+      "grad_norm": 0.43071097135543823,
+      "learning_rate": 9.652475076570697e-05,
+      "loss": 1.1434,
+      "step": 74
+    },
+    {
+      "epoch": 0.1287001287001287,
+      "grad_norm": 0.40707287192344666,
+      "learning_rate": 9.642451561718064e-05,
+      "loss": 1.0598,
+      "step": 75
+    },
+    {
+      "epoch": 0.13041613041613043,
+      "grad_norm": 0.4879066050052643,
+      "learning_rate": 9.632290898580671e-05,
+      "loss": 1.135,
+      "step": 76
+    },
+    {
+      "epoch": 0.13213213213213212,
+      "grad_norm": 0.4728144407272339,
+      "learning_rate": 9.621993387326978e-05,
+      "loss": 1.0582,
+      "step": 77
+    },
+    {
+      "epoch": 0.13384813384813385,
+      "grad_norm": 0.45900958776474,
+      "learning_rate": 9.611559332168234e-05,
+      "loss": 1.1124,
+      "step": 78
+    },
+    {
+      "epoch": 0.13556413556413557,
+      "grad_norm": 0.44445809721946716,
+      "learning_rate": 9.600989041349505e-05,
+      "loss": 0.9992,
+      "step": 79
+    },
+    {
+      "epoch": 0.13728013728013727,
+      "grad_norm": 0.46638357639312744,
+      "learning_rate": 9.590282827140551e-05,
+      "loss": 1.0844,
+      "step": 80
+    },
+    {
+      "epoch": 0.138996138996139,
+      "grad_norm": 0.4662153720855713,
+      "learning_rate": 9.579441005826618e-05,
+      "loss": 1.0646,
+      "step": 81
+    },
+    {
+      "epoch": 0.14071214071214072,
+      "grad_norm": 0.5344727039337158,
+      "learning_rate": 9.568463897699079e-05,
+      "loss": 1.0671,
+      "step": 82
+    },
+    {
+      "epoch": 0.14242814242814242,
+      "grad_norm": 0.4650517702102661,
+      "learning_rate": 9.557351827045981e-05,
+      "loss": 0.9377,
+      "step": 83
+    },
+    {
+      "epoch": 0.14414414414414414,
+      "grad_norm": 0.5511200428009033,
+      "learning_rate": 9.546105122142463e-05,
+      "loss": 0.9586,
+      "step": 84
+    },
+    {
+      "epoch": 0.14586014586014587,
+      "grad_norm": 0.5554044842720032,
+      "learning_rate": 9.534724115241059e-05,
+      "loss": 1.0638,
+      "step": 85
+    },
+    {
+      "epoch": 0.14757614757614756,
+      "grad_norm": 0.5424264669418335,
+      "learning_rate": 9.523209142561877e-05,
+      "loss": 0.9638,
+      "step": 86
+    },
+    {
+      "epoch": 0.1492921492921493,
+      "grad_norm": 0.5627579092979431,
+      "learning_rate": 9.511560544282676e-05,
+      "loss": 1.072,
+      "step": 87
+    },
+    {
+      "epoch": 0.15100815100815101,
+      "grad_norm": 0.4964519441127777,
+      "learning_rate": 9.499778664528802e-05,
+      "loss": 0.9864,
+      "step": 88
+    },
+    {
+      "epoch": 0.1527241527241527,
+      "grad_norm": 0.5996085405349731,
+      "learning_rate": 9.487863851363038e-05,
+      "loss": 1.1311,
+      "step": 89
+    },
+    {
+      "epoch": 0.15444015444015444,
+      "grad_norm": 0.5993834733963013,
+      "learning_rate": 9.475816456775313e-05,
+      "loss": 1.0738,
+      "step": 90
+    },
+    {
+      "epoch": 0.15615615615615616,
+      "grad_norm": 0.6142958998680115,
+      "learning_rate": 9.4636368366723e-05,
+      "loss": 1.0995,
+      "step": 91
+    },
+    {
+      "epoch": 0.15787215787215786,
+      "grad_norm": 0.6285070180892944,
+      "learning_rate": 9.45132535086691e-05,
+      "loss": 1.0875,
+      "step": 92
+    },
+    {
+      "epoch": 0.15958815958815958,
+      "grad_norm": 0.6185257434844971,
+      "learning_rate": 9.43888236306766e-05,
+      "loss": 1.0276,
+      "step": 93
+    },
+    {
+      "epoch": 0.1613041613041613,
+      "grad_norm": 0.653634786605835,
+      "learning_rate": 9.426308240867921e-05,
+      "loss": 0.9737,
+      "step": 94
+    },
+    {
+      "epoch": 0.16302016302016303,
+      "grad_norm": 0.7628722786903381,
+      "learning_rate": 9.413603355735069e-05,
+      "loss": 1.1355,
+      "step": 95
+    },
+    {
+      "epoch": 0.16473616473616473,
+      "grad_norm": 0.7738482356071472,
+      "learning_rate": 9.400768082999504e-05,
+      "loss": 1.0764,
+      "step": 96
+    },
+    {
+      "epoch": 0.16645216645216646,
+      "grad_norm": 0.8096074461936951,
+      "learning_rate": 9.387802801843563e-05,
+      "loss": 1.01,
+      "step": 97
+    },
+    {
+      "epoch": 0.16816816816816818,
+      "grad_norm": 0.8919829726219177,
+      "learning_rate": 9.374707895290324e-05,
+      "loss": 1.0838,
+      "step": 98
+    },
+    {
+      "epoch": 0.16988416988416988,
+      "grad_norm": 0.8296171426773071,
+      "learning_rate": 9.361483750192282e-05,
+      "loss": 1.0158,
+      "step": 99
+    },
+    {
+      "epoch": 0.1716001716001716,
+      "grad_norm": 1.0655070543289185,
+      "learning_rate": 9.348130757219924e-05,
+      "loss": 1.0737,
+      "step": 100
+    },
+    {
+      "epoch": 0.17331617331617333,
+      "grad_norm": 0.3891691565513611,
+      "learning_rate": 9.334649310850189e-05,
+      "loss": 1.146,
+      "step": 101
+    },
+    {
+      "epoch": 0.17503217503217502,
+      "grad_norm": 0.42428693175315857,
+      "learning_rate": 9.321039809354814e-05,
+      "loss": 1.2445,
+      "step": 102
+    },
+    {
+      "epoch": 0.17674817674817675,
+      "grad_norm": 0.4223214089870453,
+      "learning_rate": 9.307302654788568e-05,
+      "loss": 1.1798,
+      "step": 103
+    },
+    {
+      "epoch": 0.17846417846417847,
+      "grad_norm": 0.4346042275428772,
+      "learning_rate": 9.293438252977371e-05,
+      "loss": 1.1602,
+      "step": 104
+    },
+    {
+      "epoch": 0.18018018018018017,
+      "grad_norm": 0.44222313165664673,
+      "learning_rate": 9.279447013506313e-05,
+      "loss": 1.095,
+      "step": 105
+    },
+    {
+      "epoch": 0.1818961818961819,
+      "grad_norm": 0.3971473276615143,
+      "learning_rate": 9.265329349707543e-05,
+      "loss": 1.0941,
+      "step": 106
+    },
+    {
+      "epoch": 0.18361218361218362,
+      "grad_norm": 0.3860844373703003,
+      "learning_rate": 9.251085678648072e-05,
+      "loss": 1.1248,
+      "step": 107
+    },
+    {
+      "epoch": 0.18532818532818532,
+      "grad_norm": 0.41026297211647034,
+      "learning_rate": 9.236716421117434e-05,
+      "loss": 1.2089,
+      "step": 108
+    },
+    {
+      "epoch": 0.18704418704418704,
+      "grad_norm": 0.3841565251350403,
+      "learning_rate": 9.222222001615274e-05,
+      "loss": 1.0596,
+      "step": 109
+    },
+    {
+      "epoch": 0.18876018876018877,
+      "grad_norm": 0.3690682053565979,
+      "learning_rate": 9.207602848338795e-05,
+      "loss": 1.0207,
+      "step": 110
+    },
+    {
+      "epoch": 0.19047619047619047,
+      "grad_norm": 0.38501814007759094,
+      "learning_rate": 9.192859393170108e-05,
+      "loss": 1.0972,
+      "step": 111
+    },
+    {
+      "epoch": 0.1921921921921922,
+      "grad_norm": 0.3843235671520233,
+      "learning_rate": 9.177992071663484e-05,
+      "loss": 1.1131,
+      "step": 112
+    },
+    {
+      "epoch": 0.19390819390819392,
+      "grad_norm": 0.38288792967796326,
+      "learning_rate": 9.163001323032474e-05,
+      "loss": 1.0493,
+      "step": 113
+    },
+    {
+      "epoch": 0.1956241956241956,
+      "grad_norm": 0.3953230679035187,
+      "learning_rate": 9.147887590136941e-05,
+      "loss": 1.0486,
+      "step": 114
+    },
+    {
+      "epoch": 0.19734019734019734,
+      "grad_norm": 0.40545186400413513,
+      "learning_rate": 9.132651319469975e-05,
+      "loss": 1.0206,
+      "step": 115
+    },
+    {
+      "epoch": 0.19905619905619906,
+      "grad_norm": 0.417492538690567,
+      "learning_rate": 9.117292961144704e-05,
+      "loss": 1.0398,
+      "step": 116
+    },
+    {
+      "epoch": 0.20077220077220076,
+      "grad_norm": 0.4677084982395172,
+      "learning_rate": 9.10181296888099e-05,
+      "loss": 1.0636,
+      "step": 117
+    },
+    {
+      "epoch": 0.20248820248820248,
+      "grad_norm": 0.4335651993751526,
+      "learning_rate": 9.08621179999204e-05,
+      "loss": 1.0361,
+      "step": 118
+    },
+    {
+      "epoch": 0.2042042042042042,
+      "grad_norm": 0.45066243410110474,
+      "learning_rate": 9.070489915370877e-05,
+      "loss": 1.0674,
+      "step": 119
+    },
+    {
+      "epoch": 0.2059202059202059,
+      "grad_norm": 0.4746512174606323,
+      "learning_rate": 9.05464777947674e-05,
+      "loss": 1.0673,
+      "step": 120
+    },
+    {
+      "epoch": 0.20763620763620763,
+      "grad_norm": 0.47187408804893494,
+      "learning_rate": 9.038685860321354e-05,
+      "loss": 0.9671,
+      "step": 121
+    },
+    {
+      "epoch": 0.20935220935220936,
+      "grad_norm": 0.47008371353149414,
+      "learning_rate": 9.022604629455105e-05,
+      "loss": 0.9979,
+      "step": 122
+    },
+    {
+      "epoch": 0.21106821106821108,
+      "grad_norm": 0.47808748483657837,
+      "learning_rate": 9.006404561953114e-05,
+      "loss": 1.0833,
+      "step": 123
+    },
+    {
+      "epoch": 0.21278421278421278,
+      "grad_norm": 0.4980291426181793,
+      "learning_rate": 8.9900861364012e-05,
+      "loss": 1.0631,
+      "step": 124
+    },
+    {
+      "epoch": 0.2145002145002145,
+      "grad_norm": 0.5272586345672607,
+      "learning_rate": 8.97364983488173e-05,
+      "loss": 1.1243,
+      "step": 125
+    },
+    {
+      "epoch": 0.21621621621621623,
+      "grad_norm": 0.5583837032318115,
+      "learning_rate": 8.957096142959403e-05,
+      "loss": 1.0698,
+      "step": 126
+    },
+    {
+      "epoch": 0.21793221793221793,
+      "grad_norm": 0.5071900486946106,
+      "learning_rate": 8.940425549666881e-05,
+      "loss": 0.9623,
+      "step": 127
+    },
+    {
+      "epoch": 0.21964821964821965,
+      "grad_norm": 0.5482715964317322,
+      "learning_rate": 8.923638547490351e-05,
+      "loss": 1.0047,
+      "step": 128
+    },
+    {
+      "epoch": 0.22136422136422138,
+      "grad_norm": 0.5646862983703613,
+      "learning_rate": 8.906735632354979e-05,
+      "loss": 0.9226,
+      "step": 129
+    },
+    {
+      "epoch": 0.22308022308022307,
+      "grad_norm": 0.59130859375,
+      "learning_rate": 8.889717303610255e-05,
+      "loss": 1.0076,
+      "step": 130
+    },
+    {
+      "epoch": 0.2247962247962248,
+      "grad_norm": 0.585742712020874,
+      "learning_rate": 8.872584064015241e-05,
+      "loss": 1.033,
+      "step": 131
+    },
+    {
+      "epoch": 0.22651222651222652,
+      "grad_norm": 0.6126258373260498,
+      "learning_rate": 8.85533641972372e-05,
+      "loss": 1.1072,
+      "step": 132
+    },
+    {
+      "epoch": 0.22822822822822822,
+      "grad_norm": 0.634410560131073,
+      "learning_rate": 8.837974880269246e-05,
+      "loss": 1.154,
+      "step": 133
+    },
+    {
+      "epoch": 0.22994422994422994,
+      "grad_norm": 0.6154122352600098,
+      "learning_rate": 8.820499958550082e-05,
+      "loss": 0.9805,
+      "step": 134
+    },
+    {
+      "epoch": 0.23166023166023167,
+      "grad_norm": 0.6427770256996155,
+      "learning_rate": 8.802912170814059e-05,
+      "loss": 1.0178,
+      "step": 135
+    },
+    {
+      "epoch": 0.23337623337623337,
+      "grad_norm": 0.6499301791191101,
+      "learning_rate": 8.785212036643317e-05,
+      "loss": 1.0243,
+      "step": 136
+    },
+    {
+      "epoch": 0.2350922350922351,
+      "grad_norm": 0.6780106425285339,
+      "learning_rate": 8.767400078938959e-05,
+      "loss": 1.1068,
+      "step": 137
+    },
+    {
+      "epoch": 0.23680823680823682,
+      "grad_norm": 0.6625889539718628,
+      "learning_rate": 8.7494768239056e-05,
+      "loss": 1.027,
+      "step": 138
+    },
+    {
+      "epoch": 0.2385242385242385,
+      "grad_norm": 0.6621667146682739,
+      "learning_rate": 8.731442801035831e-05,
+      "loss": 1.1563,
+      "step": 139
+    },
+    {
+      "epoch": 0.24024024024024024,
+      "grad_norm": 0.6983554363250732,
+      "learning_rate": 8.713298543094563e-05,
+      "loss": 1.0156,
+      "step": 140
+    },
+    {
+      "epoch": 0.24195624195624196,
+      "grad_norm": 0.6744945049285889,
+      "learning_rate": 8.695044586103296e-05,
+      "loss": 0.889,
+      "step": 141
+    },
+    {
+      "epoch": 0.24367224367224366,
+      "grad_norm": 0.7230704426765442,
+      "learning_rate": 8.676681469324286e-05,
+      "loss": 1.1166,
+      "step": 142
+    },
+    {
+      "epoch": 0.24538824538824539,
+      "grad_norm": 0.7530918121337891,
+      "learning_rate": 8.658209735244604e-05,
+      "loss": 1.0138,
+      "step": 143
+    },
+    {
+      "epoch": 0.2471042471042471,
+      "grad_norm": 0.8435884118080139,
+      "learning_rate": 8.639629929560127e-05,
+      "loss": 1.0743,
+      "step": 144
+    },
+    {
+      "epoch": 0.2488202488202488,
+      "grad_norm": 0.8213812708854675,
+      "learning_rate": 8.620942601159394e-05,
+      "loss": 1.077,
+      "step": 145
+    },
+    {
+      "epoch": 0.25053625053625056,
+      "grad_norm": 0.8471152782440186,
+      "learning_rate": 8.602148302107409e-05,
+      "loss": 1.0078,
+      "step": 146
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 583,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 146,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 9.82135792950313e+16,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
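The fields above (`log_history`, `logging_steps`, `save_steps`, `max_steps`) can be inspected programmatically. The following is a minimal sketch, not part of the commit: `summarize_log_history` is a hypothetical helper, and the embedded sample holds only the last three `log_history` records from this checkpoint. In practice the dictionary would presumably be loaded from `last-checkpoint/trainer_state.json` with `json.load`. The `save_points` list assumes checkpoints are written at every multiple of `save_steps`, which is an assumption about the save strategy rather than something the file states.

```python
import json

# Three records copied from the end of log_history above; all earlier
# entries are omitted to keep the sketch short.
SAMPLE_STATE = json.loads("""
{
  "log_history": [
    {"epoch": 0.2471042471042471, "grad_norm": 0.8435884118080139,
     "learning_rate": 8.639629929560127e-05, "loss": 1.0743, "step": 144},
    {"epoch": 0.2488202488202488, "grad_norm": 0.8213812708854675,
     "learning_rate": 8.620942601159394e-05, "loss": 1.077, "step": 145},
    {"epoch": 0.25053625053625056, "grad_norm": 0.8471152782440186,
     "learning_rate": 8.602148302107409e-05, "loss": 1.0078, "step": 146}
  ],
  "logging_steps": 1,
  "max_steps": 583,
  "save_steps": 146
}
""")


def summarize_log_history(state, last_n=3):
    """Hypothetical helper: summarize the last `last_n` logged training steps."""
    # Keep only records that carry a training loss (evaluation records may not).
    records = [r for r in state["log_history"] if "loss" in r][-last_n:]
    if not records:
        raise ValueError("log_history contains no training-loss records")
    return {
        "last_step": records[-1]["step"],
        "mean_loss": sum(r["loss"] for r in records) / len(records),
        "max_grad_norm": max(r["grad_norm"] for r in records),
        # Assumption: a checkpoint is written at every multiple of save_steps.
        "save_points": list(
            range(state["save_steps"], state["max_steps"] + 1, state["save_steps"])
        ),
    }


summary = summarize_log_history(SAMPLE_STATE)
print(summary)
```

Running the same helper over the full `log_history` with a larger `last_n` gives a quick way to check whether `loss` and `grad_norm` are trending up or down between checkpoints.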

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0f573e9d40174a4d8012325be4350256b3ed6b39fd7d02abcc28d941875d3234
+size 6776