Upload 16 files

Files changed (16)

checkpoint-10000/README.md +202 -0
checkpoint-10000/adapter_config.json +31 -0
checkpoint-10000/adapter_model.safetensors +3 -0
checkpoint-10000/optimizer.pt +3 -0
checkpoint-10000/rng_state.pth +3 -0
checkpoint-10000/scheduler.pt +3 -0
checkpoint-10000/trainer_state.json +321 -0
checkpoint-10000/training_args.bin +3 -0
checkpoint-15000/README.md +202 -0
checkpoint-15000/adapter_config.json +31 -0
checkpoint-15000/adapter_model.safetensors +3 -0
checkpoint-15000/optimizer.pt +3 -0
checkpoint-15000/rng_state.pth +3 -0
checkpoint-15000/scheduler.pt +3 -0
checkpoint-15000/trainer_state.json +461 -0
checkpoint-15000/training_args.bin +3 -0

checkpoint-10000/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: mistralai/Mistral-7B-Instruct-v0.3
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.12.1.dev0

checkpoint-10000/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj",
+    "o_proj",
+    "k_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
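
The adapter config above applies LoRA to the four attention projections with `r` = 64 and `lora_alpha` = 16. The minimal sketch below estimates what that implies for adapter size. It assumes the published Mistral-7B-Instruct-v0.3 dimensions (32 decoder layers, hidden size 4096, 8 key/value heads of dimension 128), which are not part of this diff. It counts the `lora_A` (r × in) and `lora_B` (out × r) matrices added to each targeted linear layer; `bias: "none"` means no extra bias terms. `lora_param_count` is a hypothetical helper, not a PEFT function.

```python
# Sketch: estimate the LoRA adapter size implied by adapter_config.json.
# ASSUMPTION (not stated in this diff): Mistral-7B-Instruct-v0.3 dimensions.
NUM_LAYERS = 32
HIDDEN = 4096
KV_DIM = 8 * 128  # k_proj / v_proj output width with 8 key/value heads

# (in_features, out_features) of each projection named in target_modules
PROJ_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_param_count(r: int, target_modules: list[str]) -> int:
    """Hypothetical helper: LoRA adds A (r x in) and B (out x r) per targeted layer."""
    per_layer = 0
    for name in target_modules:
        in_features, out_features = PROJ_SHAPES[name]
        per_layer += r * in_features + out_features * r
    return per_layer * NUM_LAYERS


# Values copied from adapter_config.json
config = {"r": 64, "lora_alpha": 16, "use_rslora": False,
          "target_modules": ["v_proj", "q_proj", "o_proj", "k_proj"]}

params = lora_param_count(config["r"], config["target_modules"])
fp32_bytes = params * 4  # 4 bytes per float32 value
# Standard (non-rsLoRA) LoRA scales the update by lora_alpha / r.
scaling = config["lora_alpha"] / config["r"]

print(params, fp32_bytes, scaling)  # 54525952 218103808 0.25
```

Under those assumptions the fp32 estimate of 218,103,808 bytes sits 34,768 bytes below the `size 218138576` recorded in the `adapter_model.safetensors` pointer, which is about what a safetensors JSON header for these tensors would occupy. That suggests, but does not prove, that the adapter weights were saved in fp32.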

checkpoint-10000/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:88f8382c39f31529eae6cff17b42872b6702cb97606fe66f186d9bb1a909218f
+size 218138576

checkpoint-10000/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0e432f8307a8167e698bcd6a3b088e19fdeded1b080cf610e961dddd575a3f50
+size 109573654

checkpoint-10000/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:309855861fcc2cc62e52564e621ff316c4ab4643bcacc4c1e98e913dee6c8246
+size 14244

checkpoint-10000/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:faab340554922de73396118a9f31bc4cf25c8f5ee09759ed41b96da2aedc029b
+size 1064

checkpoint-10000/trainer_state.json ADDED

	@@ -0,0 +1,321 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.17230090630276715,
+  "eval_steps": 10000,
+  "global_step": 10000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.004307522657569179,
+      "grad_norm": 0.4925552308559418,
+      "learning_rate": 2.4998864565799097e-05,
+      "loss": 0.9861,
+      "step": 250
+    },
+    {
+      "epoch": 0.008615045315138358,
+      "grad_norm": 0.9346778988838196,
+      "learning_rate": 2.4995440213189432e-05,
+      "loss": 0.8053,
+      "step": 500
+    },
+    {
+      "epoch": 0.012922567972707537,
+      "grad_norm": 0.9080485701560974,
+      "learning_rate": 2.498972755096465e-05,
+      "loss": 0.778,
+      "step": 750
+    },
+    {
+      "epoch": 0.017230090630276716,
+      "grad_norm": 0.8981316685676575,
+      "learning_rate": 2.498172762529356e-05,
+      "loss": 0.7423,
+      "step": 1000
+    },
+    {
+      "epoch": 0.021537613287845894,
+      "grad_norm": 0.8113252520561218,
+      "learning_rate": 2.4971441901215136e-05,
+      "loss": 0.7344,
+      "step": 1250
+    },
+    {
+      "epoch": 0.025845135945415074,
+      "grad_norm": 0.7558934688568115,
+      "learning_rate": 2.4958872262370206e-05,
+      "loss": 0.7288,
+      "step": 1500
+    },
+    {
+      "epoch": 0.03015265860298425,
+      "grad_norm": 0.9201549887657166,
+      "learning_rate": 2.4944021010656492e-05,
+      "loss": 0.725,
+      "step": 1750
+    },
+    {
+      "epoch": 0.03446018126055343,
+      "grad_norm": 0.7769947648048401,
+      "learning_rate": 2.4926890865807073e-05,
+      "loss": 0.7205,
+      "step": 2000
+    },
+    {
+      "epoch": 0.038767703918122606,
+      "grad_norm": 0.9018628597259521,
+      "learning_rate": 2.4907484964892315e-05,
+      "loss": 0.7156,
+      "step": 2250
+    },
+    {
+      "epoch": 0.04307522657569179,
+      "grad_norm": 0.8375287055969238,
+      "learning_rate": 2.4885806861745365e-05,
+      "loss": 0.7137,
+      "step": 2500
+    },
+    {
+      "epoch": 0.04738274923326097,
+      "grad_norm": 0.9508460760116577,
+      "learning_rate": 2.4861860526311346e-05,
+      "loss": 0.7101,
+      "step": 2750
+    },
+    {
+      "epoch": 0.05169027189083015,
+      "grad_norm": 0.8501839637756348,
+      "learning_rate": 2.4835650343920313e-05,
+      "loss": 0.7105,
+      "step": 3000
+    },
+    {
+      "epoch": 0.05599779454839932,
+      "grad_norm": 0.9571662545204163,
+      "learning_rate": 2.480718111448419e-05,
+      "loss": 0.7056,
+      "step": 3250
+    },
+    {
+      "epoch": 0.0603053172059685,
+      "grad_norm": 0.9728882312774658,
+      "learning_rate": 2.4776458051617728e-05,
+      "loss": 0.7051,
+      "step": 3500
+    },
+    {
+      "epoch": 0.06461283986353768,
+      "grad_norm": 0.8361214995384216,
+      "learning_rate": 2.4743486781683745e-05,
+      "loss": 0.7076,
+      "step": 3750
+    },
+    {
+      "epoch": 0.06892036252110686,
+      "grad_norm": 0.8233757615089417,
+      "learning_rate": 2.4708273342762746e-05,
+      "loss": 0.7011,
+      "step": 4000
+    },
+    {
+      "epoch": 0.07322788517867604,
+      "grad_norm": 0.883545994758606,
+      "learning_rate": 2.467082418354717e-05,
+      "loss": 0.6976,
+      "step": 4250
+    },
+    {
+      "epoch": 0.07753540783624521,
+      "grad_norm": 0.7400261759757996,
+      "learning_rate": 2.463114616216044e-05,
+      "loss": 0.6992,
+      "step": 4500
+    },
+    {
+      "epoch": 0.0818429304938144,
+      "grad_norm": 0.8324729204177856,
+      "learning_rate": 2.4589246544901002e-05,
+      "loss": 0.6964,
+      "step": 4750
+    },
+    {
+      "epoch": 0.08615045315138357,
+      "grad_norm": 0.7931947112083435,
+      "learning_rate": 2.4545133004911653e-05,
+      "loss": 0.697,
+      "step": 5000
+    },
+    {
+      "epoch": 0.09045797580895276,
+      "grad_norm": 1.0109580755233765,
+      "learning_rate": 2.4498813620774335e-05,
+      "loss": 0.6984,
+      "step": 5250
+    },
+    {
+      "epoch": 0.09476549846652194,
+      "grad_norm": 0.8544988632202148,
+      "learning_rate": 2.4450296875030704e-05,
+      "loss": 0.6924,
+      "step": 5500
+    },
+    {
+      "epoch": 0.09907302112409111,
+      "grad_norm": 0.7610718011856079,
+      "learning_rate": 2.4399591652628703e-05,
+      "loss": 0.6899,
+      "step": 5750
+    },
+    {
+      "epoch": 0.1033805437816603,
+      "grad_norm": 0.7059574723243713,
+      "learning_rate": 2.434670723929544e-05,
+      "loss": 0.6913,
+      "step": 6000
+    },
+    {
+      "epoch": 0.10768806643922947,
+      "grad_norm": 0.8259685635566711,
+      "learning_rate": 2.4291653319836688e-05,
+      "loss": 0.6909,
+      "step": 6250
+    },
+    {
+      "epoch": 0.11199558909679865,
+      "grad_norm": 0.7509642839431763,
+      "learning_rate": 2.423443997636329e-05,
+      "loss": 0.687,
+      "step": 6500
+    },
+    {
+      "epoch": 0.11630311175436783,
+      "grad_norm": 0.8483273386955261,
+      "learning_rate": 2.4175077686444806e-05,
+      "loss": 0.6853,
+      "step": 6750
+    },
+    {
+      "epoch": 0.120610634411937,
+      "grad_norm": 0.8091647028923035,
+      "learning_rate": 2.411357732119073e-05,
+      "loss": 0.6873,
+      "step": 7000
+    },
+    {
+      "epoch": 0.12491815706950618,
+      "grad_norm": 0.8299710750579834,
+      "learning_rate": 2.4049950143259663e-05,
+      "loss": 0.6858,
+      "step": 7250
+    },
+    {
+      "epoch": 0.12922567972707535,
+      "grad_norm": 0.7714266777038574,
+      "learning_rate": 2.398420780479675e-05,
+      "loss": 0.6827,
+      "step": 7500
+    },
+    {
+      "epoch": 0.13353320238464456,
+      "grad_norm": 0.7820263504981995,
+      "learning_rate": 2.3916362345299814e-05,
+      "loss": 0.6822,
+      "step": 7750
+    },
+    {
+      "epoch": 0.13784072504221373,
+      "grad_norm": 0.7763279676437378,
+      "learning_rate": 2.3846426189414538e-05,
+      "loss": 0.6809,
+      "step": 8000
+    },
+    {
+      "epoch": 0.1421482476997829,
+      "grad_norm": 0.7766652703285217,
+      "learning_rate": 2.3774412144659126e-05,
+      "loss": 0.6777,
+      "step": 8250
+    },
+    {
+      "epoch": 0.14645577035735208,
+      "grad_norm": 0.6801514029502869,
+      "learning_rate": 2.370033339907883e-05,
+      "loss": 0.6782,
+      "step": 8500
+    },
+    {
+      "epoch": 0.15076329301492125,
+      "grad_norm": 0.8070161938667297,
+      "learning_rate": 2.3624203518830812e-05,
+      "loss": 0.6758,
+      "step": 8750
+    },
+    {
+      "epoch": 0.15507081567249043,
+      "grad_norm": 0.7602735757827759,
+      "learning_rate": 2.3546036445699768e-05,
+      "loss": 0.6757,
+      "step": 9000
+    },
+    {
+      "epoch": 0.15937833833005963,
+      "grad_norm": 0.9054674506187439,
+      "learning_rate": 2.3465846494544707e-05,
+      "loss": 0.6751,
+      "step": 9250
+    },
+    {
+      "epoch": 0.1636858609876288,
+      "grad_norm": 61.813472747802734,
+      "learning_rate": 2.3383648350677484e-05,
+      "loss": 0.6874,
+      "step": 9500
+    },
+    {
+      "epoch": 0.16799338364519797,
+      "grad_norm": NaN,
+      "learning_rate": 2.3299457067173435e-05,
+      "loss": 0.694,
+      "step": 9750
+    },
+    {
+      "epoch": 0.17230090630276715,
+      "grad_norm": NaN,
+      "learning_rate": 2.321328806211471e-05,
+      "loss": 0.2783,
+      "step": 10000
+    },
+    {
+      "epoch": 0.17230090630276715,
+      "eval_loss": NaN,
+      "eval_runtime": 23846.4172,
+      "eval_samples_per_second": 5.215,
+      "eval_steps_per_second": 0.522,
+      "step": 10000
+    }
+  ],
+  "logging_steps": 250,
+  "max_steps": 58038,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 5000,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 4.94679407529984e+17,
+  "train_batch_size": 10,
+  "trial_name": null,
+  "trial_params": null
+}
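
In the `log_history` above, `grad_norm` jumps from below 1.0 to 61.8 at step 9500 and is `NaN` from step 9750 on, and the step-10000 `eval_loss` is also `NaN`. The drop in training loss to 0.2783 at step 10000 should therefore be treated with suspicion rather than read as progress. The minimal sketch below flags such steps automatically. The 10.0 spike threshold and the `first_bad_step` helper are illustrative choices, not part of the Trainer API. It relies on Python's `json` module accepting the non-standard `NaN` literal that appears in this file.

```python
import json
import math

# Excerpt of log_history from checkpoint-10000/trainer_state.json (values copied).
# json.loads accepts the NaN literal by default and returns float("nan").
state = json.loads("""
{"log_history": [
  {"grad_norm": 0.9054674506187439, "loss": 0.6751, "step": 9250},
  {"grad_norm": 61.813472747802734, "loss": 0.6874, "step": 9500},
  {"grad_norm": NaN, "loss": 0.694, "step": 9750},
  {"grad_norm": NaN, "loss": 0.2783, "step": 10000},
  {"eval_loss": NaN, "step": 10000}
]}
""")


def first_bad_step(log_history, key="grad_norm", spike=10.0):
    """Hypothetical helper: first (step, value) where `key` is non-finite or above `spike`."""
    for entry in log_history:
        value = entry.get(key)
        if value is not None and (not math.isfinite(value) or value > spike):
            return entry["step"], value
    return None  # nothing suspicious found


print(first_bad_step(state["log_history"]))                     # (9500, 61.813472747802734)
print(first_bad_step(state["log_history"], spike=math.inf))     # (9750, nan)
print(first_bad_step(state["log_history"], key="eval_loss"))    # (10000, nan)
```

Passing `spike=math.inf` reports only non-finite values, which isolates the first `NaN` gradient norm from the earlier spike.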

checkpoint-10000/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:24d7a7a251761e255f6bedff7b9574903bc57679a072ffc2d734a96acf1eec2a
+size 5176

checkpoint-15000/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: mistralai/Mistral-7B-Instruct-v0.3
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.12.1.dev0

checkpoint-15000/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj",
+    "o_proj",
+    "k_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

checkpoint-15000/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:88f8382c39f31529eae6cff17b42872b6702cb97606fe66f186d9bb1a909218f
+size 218138576
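
This pointer records the same `oid` and `size` as the checkpoint-10000 `adapter_model.safetensors` pointer. Because a Git LFS pointer identifies the stored file by its SHA-256 digest, the two checkpoints hold byte-identical adapter weights even though 5,000 more steps were logged in between. That would be consistent with the `NaN` gradient norms logged from step 9750 onward if those optimizer updates were skipped, but the diff alone does not show why the weights stopped changing. The sketch below parses both pointers (text copied from this diff) and compares them; `parse_lfs_pointer` is a hypothetical helper.

```python
# Sketch: parse Git LFS v1 pointer files and compare the objects they reference.
LFS_V1 = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Hypothetical helper: split each 'key value' pointer line into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_V1:
        raise ValueError("not a Git LFS v1 pointer")
    return fields


# adapter_model.safetensors pointers copied from this diff
ckpt_10000 = """version https://git-lfs.github.com/spec/v1
oid sha256:88f8382c39f31529eae6cff17b42872b6702cb97606fe66f186d9bb1a909218f
size 218138576"""

ckpt_15000 = """version https://git-lfs.github.com/spec/v1
oid sha256:88f8382c39f31529eae6cff17b42872b6702cb97606fe66f186d9bb1a909218f
size 218138576"""

a = parse_lfs_pointer(ckpt_10000)
b = parse_lfs_pointer(ckpt_15000)
# Same SHA-256 and size means the same stored file content.
same_object = (a["oid"], int(a["size"])) == (b["oid"], int(b["size"]))
print(same_object)  # True
```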

checkpoint-15000/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cb3a9d9a10ddfbbefe2695a1b4d620a14e3eb105c2bdf55c20571fca35a5cfcf
+size 109573654

checkpoint-15000/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fdca9506f960dd8d65c244fefe800042cbaafab042672671f1afd765d2171855
+size 14244

checkpoint-15000/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4d1284c66fac3bcdcf005329877e2cd7b120264617b070252a8e58981fa8655a
+size 1064

checkpoint-15000/trainer_state.json ADDED

	@@ -0,0 +1,461 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.2584513594541507,
+  "eval_steps": 10000,
+  "global_step": 15000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.004307522657569179,
+      "grad_norm": 0.4925552308559418,
+      "learning_rate": 2.4998864565799097e-05,
+      "loss": 0.9861,
+      "step": 250
+    },
+    {
+      "epoch": 0.008615045315138358,
+      "grad_norm": 0.9346778988838196,
+      "learning_rate": 2.4995440213189432e-05,
+      "loss": 0.8053,
+      "step": 500
+    },
+    {
+      "epoch": 0.012922567972707537,
+      "grad_norm": 0.9080485701560974,
+      "learning_rate": 2.498972755096465e-05,
+      "loss": 0.778,
+      "step": 750
+    },
+    {
+      "epoch": 0.017230090630276716,
+      "grad_norm": 0.8981316685676575,
+      "learning_rate": 2.498172762529356e-05,
+      "loss": 0.7423,
+      "step": 1000
+    },
+    {
+      "epoch": 0.021537613287845894,
+      "grad_norm": 0.8113252520561218,
+      "learning_rate": 2.4971441901215136e-05,
+      "loss": 0.7344,
+      "step": 1250
+    },
+    {
+      "epoch": 0.025845135945415074,
+      "grad_norm": 0.7558934688568115,
+      "learning_rate": 2.4958872262370206e-05,
+      "loss": 0.7288,
+      "step": 1500
+    },
+    {
+      "epoch": 0.03015265860298425,
+      "grad_norm": 0.9201549887657166,
+      "learning_rate": 2.4944021010656492e-05,
+      "loss": 0.725,
+      "step": 1750
+    },
+    {
+      "epoch": 0.03446018126055343,
+      "grad_norm": 0.7769947648048401,
+      "learning_rate": 2.4926890865807073e-05,
+      "loss": 0.7205,
+      "step": 2000
+    },
+    {
+      "epoch": 0.038767703918122606,
+      "grad_norm": 0.9018628597259521,
+      "learning_rate": 2.4907484964892315e-05,
+      "loss": 0.7156,
+      "step": 2250
+    },
+    {
+      "epoch": 0.04307522657569179,
+      "grad_norm": 0.8375287055969238,
+      "learning_rate": 2.4885806861745365e-05,
+      "loss": 0.7137,
+      "step": 2500
+    },
+    {
+      "epoch": 0.04738274923326097,
+      "grad_norm": 0.9508460760116577,
+      "learning_rate": 2.4861860526311346e-05,
+      "loss": 0.7101,
+      "step": 2750
+    },
+    {
+      "epoch": 0.05169027189083015,
+      "grad_norm": 0.8501839637756348,
+      "learning_rate": 2.4835650343920313e-05,
+      "loss": 0.7105,
+      "step": 3000
+    },
+    {
+      "epoch": 0.05599779454839932,
+      "grad_norm": 0.9571662545204163,
+      "learning_rate": 2.480718111448419e-05,
+      "loss": 0.7056,
+      "step": 3250
+    },
+    {
+      "epoch": 0.0603053172059685,
+      "grad_norm": 0.9728882312774658,
+      "learning_rate": 2.4776458051617728e-05,
+      "loss": 0.7051,
+      "step": 3500
+    },
+    {
+      "epoch": 0.06461283986353768,
+      "grad_norm": 0.8361214995384216,
+      "learning_rate": 2.4743486781683745e-05,
+      "loss": 0.7076,
+      "step": 3750
+    },
+    {
+      "epoch": 0.06892036252110686,
+      "grad_norm": 0.8233757615089417,
+      "learning_rate": 2.4708273342762746e-05,
+      "loss": 0.7011,
+      "step": 4000
+    },
+    {
+      "epoch": 0.07322788517867604,
+      "grad_norm": 0.883545994758606,
+      "learning_rate": 2.467082418354717e-05,
+      "loss": 0.6976,
+      "step": 4250
+    },
+    {
+      "epoch": 0.07753540783624521,
+      "grad_norm": 0.7400261759757996,
+      "learning_rate": 2.463114616216044e-05,
+      "loss": 0.6992,
+      "step": 4500
+    },
+    {
+      "epoch": 0.0818429304938144,
+      "grad_norm": 0.8324729204177856,
+      "learning_rate": 2.4589246544901002e-05,
+      "loss": 0.6964,
+      "step": 4750
+    },
+    {
+      "epoch": 0.08615045315138357,
+      "grad_norm": 0.7931947112083435,
+      "learning_rate": 2.4545133004911653e-05,
+      "loss": 0.697,
+      "step": 5000
+    },
+    {
+      "epoch": 0.09045797580895276,
+      "grad_norm": 1.0109580755233765,
+      "learning_rate": 2.4498813620774335e-05,
+      "loss": 0.6984,
+      "step": 5250
+    },
+    {
+      "epoch": 0.09476549846652194,
+      "grad_norm": 0.8544988632202148,
+      "learning_rate": 2.4450296875030704e-05,
+      "loss": 0.6924,
+      "step": 5500
+    },
+    {
+      "epoch": 0.09907302112409111,
+      "grad_norm": 0.7610718011856079,
+      "learning_rate": 2.4399591652628703e-05,
+      "loss": 0.6899,
+      "step": 5750
+    },
+    {
+      "epoch": 0.1033805437816603,
+      "grad_norm": 0.7059574723243713,
+      "learning_rate": 2.434670723929544e-05,
+      "loss": 0.6913,
+      "step": 6000
+    },
+    {
+      "epoch": 0.10768806643922947,
+      "grad_norm": 0.8259685635566711,
+      "learning_rate": 2.4291653319836688e-05,
+      "loss": 0.6909,
+      "step": 6250
+    },
+    {
+      "epoch": 0.11199558909679865,
+      "grad_norm": 0.7509642839431763,
+      "learning_rate": 2.423443997636329e-05,
+      "loss": 0.687,
+      "step": 6500
+    },
+    {
+      "epoch": 0.11630311175436783,
+      "grad_norm": 0.8483273386955261,
+      "learning_rate": 2.4175077686444806e-05,
+      "loss": 0.6853,
+      "step": 6750
+    },
+    {
+      "epoch": 0.120610634411937,
+      "grad_norm": 0.8091647028923035,
+      "learning_rate": 2.411357732119073e-05,
+      "loss": 0.6873,
+      "step": 7000
+    },
+    {
+      "epoch": 0.12491815706950618,
+      "grad_norm": 0.8299710750579834,
+      "learning_rate": 2.4049950143259663e-05,
+      "loss": 0.6858,
+      "step": 7250
+    },
+    {
+      "epoch": 0.12922567972707535,
+      "grad_norm": 0.7714266777038574,
+      "learning_rate": 2.398420780479675e-05,
+      "loss": 0.6827,
+      "step": 7500
+    },
+    {
+      "epoch": 0.13353320238464456,
+      "grad_norm": 0.7820263504981995,
+      "learning_rate": 2.3916362345299814e-05,
+      "loss": 0.6822,
+      "step": 7750
+    },
+    {
+      "epoch": 0.13784072504221373,
+      "grad_norm": 0.7763279676437378,
+      "learning_rate": 2.3846426189414538e-05,
+      "loss": 0.6809,
+      "step": 8000
+    },
+    {
+      "epoch": 0.1421482476997829,
+      "grad_norm": 0.7766652703285217,
+      "learning_rate": 2.3774412144659126e-05,
+      "loss": 0.6777,
+      "step": 8250
+    },
+    {
+      "epoch": 0.14645577035735208,
+      "grad_norm": 0.6801514029502869,
+      "learning_rate": 2.370033339907883e-05,
+      "loss": 0.6782,
+      "step": 8500
+    },
+    {
+      "epoch": 0.15076329301492125,
+      "grad_norm": 0.8070161938667297,
+      "learning_rate": 2.3624203518830812e-05,
+      "loss": 0.6758,
+      "step": 8750
+    },
+    {
+      "epoch": 0.15507081567249043,
+      "grad_norm": 0.7602735757827759,
+      "learning_rate": 2.3546036445699768e-05,
+      "loss": 0.6757,
+      "step": 9000
+    },
+    {
+      "epoch": 0.15937833833005963,
+      "grad_norm": 0.9054674506187439,
+      "learning_rate": 2.3465846494544707e-05,
+      "loss": 0.6751,
+      "step": 9250
+    },
+    {
+      "epoch": 0.1636858609876288,
+      "grad_norm": 61.813472747802734,
+      "learning_rate": 2.3383648350677484e-05,
+      "loss": 0.6874,
+      "step": 9500
+    },
+    {
+      "epoch": 0.16799338364519797,
+      "grad_norm": NaN,
+      "learning_rate": 2.3299457067173435e-05,
+      "loss": 0.694,
+      "step": 9750
+    },
+    {
+      "epoch": 0.17230090630276715,
+      "grad_norm": NaN,
+      "learning_rate": 2.321328806211471e-05,
+      "loss": 0.2783,
+      "step": 10000
+    },
+    {
+      "epoch": 0.17230090630276715,
+      "eval_loss": NaN,
+      "eval_runtime": 23846.4172,
+      "eval_samples_per_second": 5.215,
+      "eval_steps_per_second": 0.522,
+      "step": 10000
+    },
+    {
+      "epoch": 0.17660842896033632,
+      "grad_norm": NaN,
+      "learning_rate": 2.3125157115766692e-05,
+      "loss": 0.3376,
+      "step": 10250
+    },
+    {
+      "epoch": 0.18091595161790552,
+      "grad_norm": NaN,
+      "learning_rate": 2.3035080367688184e-05,
+      "loss": 0.1085,
+      "step": 10500
+    },
+    {
+      "epoch": 0.1852234742754747,
+      "grad_norm": NaN,
+      "learning_rate": 2.294307431377569e-05,
+      "loss": 0.1951,
+      "step": 10750
+    },
+    {
+      "epoch": 0.18953099693304387,
+      "grad_norm": NaN,
+      "learning_rate": 2.2849155803242555e-05,
+      "loss": 0.329,
+      "step": 11000
+    },
+    {
+      "epoch": 0.19383851959061305,
+      "grad_norm": NaN,
+      "learning_rate": 2.2753342035533286e-05,
+      "loss": 0.2104,
+      "step": 11250
+    },
+    {
+      "epoch": 0.19814604224818222,
+      "grad_norm": NaN,
+      "learning_rate": 2.265565055717384e-05,
+      "loss": 0.2739,
+      "step": 11500
+    },
+    {
+      "epoch": 0.2024535649057514,
+      "grad_norm": NaN,
+      "learning_rate": 2.255609925855826e-05,
+      "loss": 0.3571,
+      "step": 11750
+    },
+    {
+      "epoch": 0.2067610875633206,
+      "grad_norm": NaN,
+      "learning_rate": 2.2454706370672406e-05,
+      "loss": 0.1438,
+      "step": 12000
+    },
+    {
+      "epoch": 0.21106861022088977,
+      "grad_norm": NaN,
+      "learning_rate": 2.2351490461755282e-05,
+      "loss": 0.2218,
+      "step": 12250
+    },
+    {
+      "epoch": 0.21537613287845894,
+      "grad_norm": NaN,
+      "learning_rate": 2.224647043389858e-05,
+      "loss": 0.313,
+      "step": 12500
+    },
+    {
+      "epoch": 0.21968365553602812,
+      "grad_norm": NaN,
+      "learning_rate": 2.2139665519585153e-05,
+      "loss": 0.2029,
+      "step": 12750
+    },
+    {
+      "epoch": 0.2239911781935973,
+      "grad_norm": NaN,
+      "learning_rate": 2.2031095278166907e-05,
+      "loss": 0.2838,
+      "step": 13000
+    },
+    {
+      "epoch": 0.22829870085116646,
+      "grad_norm": NaN,
+      "learning_rate": 2.1920779592282877e-05,
+      "loss": 0.1918,
+      "step": 13250
+    },
+    {
+      "epoch": 0.23260622350873567,
+      "grad_norm": NaN,
+      "learning_rate": 2.180873866421809e-05,
+      "loss": 0.2968,
+      "step": 13500
+    },
+    {
+      "epoch": 0.23691374616630484,
+      "grad_norm": NaN,
+      "learning_rate": 2.1694993012203884e-05,
+      "loss": 0.4669,
+      "step": 13750
+    },
+    {
+      "epoch": 0.241221268823874,
+      "grad_norm": NaN,
+      "learning_rate": 2.1579563466660373e-05,
+      "loss": 0.2517,
+      "step": 14000
+    },
+    {
+      "epoch": 0.2455287914814432,
+      "grad_norm": NaN,
+      "learning_rate": 2.1462471166381747e-05,
+      "loss": 0.2358,
+      "step": 14250
+    },
+    {
+      "epoch": 0.24983631413901236,
+      "grad_norm": NaN,
+      "learning_rate": 2.134373755466508e-05,
+      "loss": 0.201,
+      "step": 14500
+    },
+    {
+      "epoch": 0.25414383679658153,
+      "grad_norm": NaN,
+      "learning_rate": 2.1223384375383393e-05,
+      "loss": 0.2276,
+      "step": 14750
+    },
+    {
+      "epoch": 0.2584513594541507,
+      "grad_norm": NaN,
+      "learning_rate": 2.1101433669003662e-05,
+      "loss": 0.5012,
+      "step": 15000
+    }
+  ],
+  "logging_steps": 250,
+  "max_steps": 58038,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 5000,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 7.41990724701143e+17,
+  "train_batch_size": 10,
+  "trial_name": null,
+  "trial_params": null
+}
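
The `learning_rate` entries decay smoothly from just under 2.5e-5. The sketch below reproduces them under stated assumptions: a peak rate of 2.5e-5 and a cosine decay to zero over `max_steps` = 58038, written in the same form as the `get_cosine_schedule_with_warmup` schedule in `transformers`. The 1-step warmup is a guess chosen because it fits the logged values; the actual scheduler settings are stored in `training_args.bin` and are not visible in this diff. The script also checks that, with `num_train_epochs` = 1, the logged `epoch` equals `global_step / max_steps`.

```python
import math

# ASSUMPTIONS (not stated in this diff): peak LR and warmup length.
PEAK_LR = 2.5e-5
MAX_STEPS = 58038   # "max_steps" in trainer_state.json
WARMUP_STEPS = 1    # hypothetical; chosen because it fits the log


def cosine_lr(step: int) -> float:
    """Linear warmup, then cosine decay from PEAK_LR to 0 at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# (step, logged learning_rate) pairs copied from log_history
logged = [
    (250, 2.4998864565799097e-05),
    (5000, 2.4545133004911653e-05),
    (10000, 2.321328806211471e-05),
    (15000, 2.1101433669003662e-05),
]
for step, lr in logged:
    print(step, math.isclose(cosine_lr(step), lr, rel_tol=1e-6))  # each line ends in True

# With num_train_epochs = 1, the logged "epoch" is step / max_steps.
print(math.isclose(15000 / MAX_STEPS, 0.2584513594541507))  # True
```

Under this schedule the learning rate keeps decaying on every step regardless of whether the gradients are finite, which matches the log: the `learning_rate` column gives no hint of the `NaN` problem.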

checkpoint-15000/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:24d7a7a251761e255f6bedff7b9574903bc57679a072ffc2d734a96acf1eec2a
+size 5176