Upload folder using huggingface_hub

Files changed (14)

README.md +175 -98
adapter_model.safetensors +1 -1
checkpoint-1416/adapter_model.safetensors +1 -1
checkpoint-1416/optimizer.pt +1 -1
checkpoint-1416/scaler.pt +1 -1
checkpoint-1416/trainer_state.json +39 -39
checkpoint-2124/adapter_model.safetensors +1 -1
checkpoint-2124/optimizer.pt +1 -1
checkpoint-2124/scaler.pt +1 -1
checkpoint-2124/trainer_state.json +58 -58
checkpoint-708/adapter_model.safetensors +1 -1
checkpoint-708/optimizer.pt +1 -1
checkpoint-708/scaler.pt +1 -1
checkpoint-708/trainer_state.json +20 -20

README.md CHANGED

@@ -1,129 +1,206 @@
 ---
-language: en
-license: apache-2.0
 tags:
-- text-classification
-- bert
 - lora
-- peft
-- 20-newsgroups
-datasets:
-- SetFit/20_newsgroups
-base_model: bert-base-uncased
-metrics:
-- accuracy
-model-index:
-- name: bert-lora-20newsgroups
-  results:
-  - task:
-      type: text-classification
-      name: Text Classification
-    dataset:
-      name: 20 Newsgroups
-      type: SetFit/20_newsgroups
-    metrics:
-    - type: accuracy
-      value: 0.82
-      name: Accuracy
 ---
-# BERT-LoRA for 20 Newsgroups Classification
-## Model Description
-This model is a **BERT-base-uncased** fine-tuned with **LoRA (Low-Rank Adaptation)** for multi-class text classification on the 20 Newsgroups dataset.
-- **Base Model:** bert-base-uncased
-- **Method:** LoRA (Parameter-Efficient Fine-Tuning)
-- **Task:** Multi-class text classification (20 categories)
-- **Dataset:** 20 Newsgroups (~11K training, ~7K test samples)
-- **Trainable Parameters:** ~300K (0.3% of total)
-- **Adapter Size:** ~2 MB
-## Categories
-The model classifies text into 20 newsgroup topics:
-- `alt.atheism`, `comp.graphics`, `comp.os.ms-windows.misc`, `comp.sys.ibm.pc.hardware`
-- `comp.sys.mac.hardware`, `comp.windows.x`, `misc.forsale`, `rec.autos`
-- `rec.motorcycles`, `rec.sport.baseball`, `rec.sport.hockey`, `sci.crypt`
-- `sci.electronics`, `sci.med`, `sci.space`, `soc.religion.christian`
-- `talk.politics.guns`, `talk.politics.mideast`, `talk.politics.misc`, `talk.religion.misc`
-## Usage
-### Installation
-```bash
-pip install transformers peft torch
-```
-### Load Model
-```python
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-from peft import PeftModel
-import torch
-# Load tokenizer
-tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
-# Load base model
-base_model = AutoModelForSequenceClassification.from_pretrained(
-    "bert-base-uncased",
-    num_labels=20
-)
-# Load LoRA adapters
-model = PeftModel.from_pretrained(base_model, "alialialialaiali/bert-lora-20newsgroups")
-model.eval()
-```
-### Make Predictions
-```python
-text = "NASA announced a new mission to Mars with advanced rovers."
-inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
-with torch.no_grad():
-    outputs = model(**inputs)
-    prediction = outputs.logits.argmax(-1).item()
-categories = [
-    "alt.atheism", "comp.graphics", "comp.os.ms-windows.misc",
-    "comp.sys.ibm.pc.hardware", "comp.sys.mac.hardware", "comp.windows.x",
-    "misc.forsale", "rec.autos", "rec.motorcycles", "rec.sport.baseball",
-    "rec.sport.hockey", "sci.crypt", "sci.electronics", "sci.med",
-    "sci.space", "soc.religion.christian", "talk.politics.guns",
-    "talk.politics.mideast", "talk.politics.misc", "talk.religion.misc"
-]
-print(f"Predicted category: {categories[prediction]}")
-# Output: sci.space
-```
-## Why LoRA?
-LoRA provides:
-- **99% smaller model size** (2 MB vs 440 MB)
-- **100x fewer trainable parameters** (300K vs 110M)
-- **Faster training** (15 min vs 2+ hours)
-- **Same accuracy** as full fine-tuning (~82%)
-Perfect for deployment, experimentation, and resource-constrained environments.
-## Citation
-```bibtex
-@misc{bert-lora-20newsgroups,
-  author = {Your Name},
-  title = {BERT-LoRA for 20 Newsgroups Classification},
-  year = {2024},
-  publisher = {HuggingFace},
-  howpublished = {\url{https://huggingface.co/your-username/bert-lora-20newsgroups}}
-}
-```
-## License
-Apache 2.0 (following base BERT model license)

 ---
+base_model: bert-base-uncased
+library_name: peft
 tags:
+- base_model:adapter:bert-base-uncased
 - lora
+- transformers
 ---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.0

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e9e4d8228f0562591a714cfbf9221c349a3177a65e8a99cb8e6aa999dc3aa8b9
 size 1248048

 version https://git-lfs.github.com/spec/v1
+oid sha256:69614487361448ac6f44cb9a64edbfa931a38fe2f6edc52f913d4550dbd62074
 size 1248048
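
The adapter files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, a SHA-256 object id, and the byte size. A minimal sketch of reading such a pointer and comparing the two sides of the diff above follows. The helper name `parse_lfs_pointer` is hypothetical and written for illustration; it is not part of `huggingface_hub` or Git LFS.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by a single space.
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the adapter_model.safetensors diff above.
OLD = """version https://git-lfs.github.com/spec/v1
oid sha256:e9e4d8228f0562591a714cfbf9221c349a3177a65e8a99cb8e6aa999dc3aa8b9
size 1248048
"""
NEW = """version https://git-lfs.github.com/spec/v1
oid sha256:69614487361448ac6f44cb9a64edbfa931a38fe2f6edc52f913d4550dbd62074
size 1248048
"""

old, new = parse_lfs_pointer(OLD), parse_lfs_pointer(NEW)
same_size = old["size"] == new["size"]
same_content = old["digest"] == new["digest"]
print(f"same size: {same_size}, same content: {same_content}")
```

An unchanged size with a different digest is what a retrained adapter of the same shape would produce: the tensor layout is identical and only the weight values differ.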

checkpoint-1416/adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:66aefa940619a9f9eb66e12d0603a8fba5b82b49ab35cba4a162d06aefe133c8
 size 1248048

 version https://git-lfs.github.com/spec/v1
+oid sha256:d422d38d19ea46fb5984a56f291ad3bdcb738fbb30dfa2811240b4dcb6057cd9
 size 1248048

checkpoint-1416/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:368796f41898a0bde6df4fd0d5231ceddc59fad592fff171e0c2160d2e7c1349
 size 2525771

 version https://git-lfs.github.com/spec/v1
+oid sha256:20b8ec4cc54c20180a510cc920959ebd12a2ef67bfbe6033b68585f7058bfff3
 size 2525771

checkpoint-1416/scaler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6174a0364b7b63dec3190ee042579706a2fb2134581c334e4f17057f0ee66353
 size 1383

 version https://git-lfs.github.com/spec/v1
+oid sha256:095aa02adb069a3c22551d16a914393ee95dfcb22f2e4d00568ad9f4e17128dd
 size 1383

checkpoint-1416/trainer_state.json CHANGED

@@ -1,6 +1,6 @@
 {
   "best_global_step": 1416,
-  "best_metric": 0.5379713223579394,
   "best_model_checkpoint": "./bert-lora-newsgroups/checkpoint-1416",
   "epoch": 2.0,
   "eval_steps": 500,
@@ -11,118 +11,118 @@
   "log_history": [
     {
       "epoch": 0.14124293785310735,
-      "grad_norm": 4.030381679534912,
       "learning_rate": 0.00019067796610169492,
-      "loss": 2.9403,
       "step": 100
     },
     {
       "epoch": 0.2824858757062147,
-      "grad_norm": 5.417972564697266,
       "learning_rate": 0.0001812617702448211,
-      "loss": 2.5091,
       "step": 200
     },
     {
       "epoch": 0.423728813559322,
-      "grad_norm": 4.303308486938477,
       "learning_rate": 0.00017184557438794729,
-      "loss": 2.1815,
       "step": 300
     },
     {
       "epoch": 0.5649717514124294,
-      "grad_norm": 5.2155537605285645,
       "learning_rate": 0.00016242937853107344,
-      "loss": 1.9917,
       "step": 400
     },
     {
       "epoch": 0.7062146892655368,
-      "grad_norm": 5.118275165557861,
       "learning_rate": 0.00015301318267419963,
-      "loss": 1.9013,
       "step": 500
     },
     {
       "epoch": 0.847457627118644,
-      "grad_norm": 5.051901817321777,
       "learning_rate": 0.0001435969868173258,
-      "loss": 1.7503,
       "step": 600
     },
     {
       "epoch": 0.9887005649717514,
-      "grad_norm": 5.84205961227417,
       "learning_rate": 0.00013418079096045197,
-      "loss": 1.6455,
       "step": 700
     },
     {
       "epoch": 1.0,
-      "eval_accuracy": 0.4633563462559745,
-      "eval_loss": 1.6542061567306519,
-      "eval_runtime": 61.4551,
-      "eval_samples_per_second": 122.561,
-      "eval_steps_per_second": 7.664,
       "step": 708
     },
     {
       "epoch": 1.1299435028248588,
-      "grad_norm": 5.543461322784424,
       "learning_rate": 0.00012476459510357815,
-      "loss": 1.5649,
       "step": 800
     },
     {
       "epoch": 1.271186440677966,
-      "grad_norm": 4.303509712219238,
       "learning_rate": 0.00011534839924670434,
-      "loss": 1.5064,
       "step": 900
     },
     {
       "epoch": 1.4124293785310735,
-      "grad_norm": 5.7473578453063965,
       "learning_rate": 0.00010593220338983052,
-      "loss": 1.4491,
       "step": 1000
     },
     {
       "epoch": 1.5536723163841808,
-      "grad_norm": 4.23817777633667,
       "learning_rate": 9.651600753295669e-05,
-      "loss": 1.4401,
       "step": 1100
     },
     {
       "epoch": 1.694915254237288,
-      "grad_norm": 5.211511611938477,
       "learning_rate": 8.709981167608286e-05,
-      "loss": 1.3813,
       "step": 1200
     },
     {
       "epoch": 1.8361581920903953,
-      "grad_norm": 4.599623203277588,
       "learning_rate": 7.768361581920904e-05,
-      "loss": 1.4401,
       "step": 1300
     },
     {
       "epoch": 1.9774011299435028,
-      "grad_norm": 3.9729771614074707,
       "learning_rate": 6.826741996233523e-05,
-      "loss": 1.3853,
       "step": 1400
     },
     {
       "epoch": 2.0,
-      "eval_accuracy": 0.5379713223579394,
-      "eval_loss": 1.4326964616775513,
-      "eval_runtime": 61.5418,
-      "eval_samples_per_second": 122.388,
-      "eval_steps_per_second": 7.653,
       "step": 1416
     }
   ],

 {
   "best_global_step": 1416,
+  "best_metric": 0.54182156133829,
   "best_model_checkpoint": "./bert-lora-newsgroups/checkpoint-1416",
   "epoch": 2.0,
   "eval_steps": 500,
   "log_history": [
     {
       "epoch": 0.14124293785310735,
+      "grad_norm": 4.619589805603027,
       "learning_rate": 0.00019067796610169492,
+      "loss": 2.9563,
       "step": 100
     },
     {
       "epoch": 0.2824858757062147,
+      "grad_norm": 4.369770526885986,
       "learning_rate": 0.0001812617702448211,
+      "loss": 2.4516,
       "step": 200
     },
     {
       "epoch": 0.423728813559322,
+      "grad_norm": 4.454040050506592,
       "learning_rate": 0.00017184557438794729,
+      "loss": 2.1271,
       "step": 300
     },
     {
       "epoch": 0.5649717514124294,
+      "grad_norm": 5.155438423156738,
       "learning_rate": 0.00016242937853107344,
+      "loss": 1.9125,
       "step": 400
     },
     {
       "epoch": 0.7062146892655368,
+      "grad_norm": 4.892629623413086,
       "learning_rate": 0.00015301318267419963,
+      "loss": 1.7896,
       "step": 500
     },
     {
       "epoch": 0.847457627118644,
+      "grad_norm": 4.983877658843994,
       "learning_rate": 0.0001435969868173258,
+      "loss": 1.6968,
       "step": 600
     },
     {
       "epoch": 0.9887005649717514,
+      "grad_norm": 8.334493637084961,
       "learning_rate": 0.00013418079096045197,
+      "loss": 1.5862,
       "step": 700
     },
     {
       "epoch": 1.0,
+      "eval_accuracy": 0.47357939458311205,
+      "eval_loss": 1.6120948791503906,
+      "eval_runtime": 59.1685,
+      "eval_samples_per_second": 127.297,
+      "eval_steps_per_second": 7.96,
       "step": 708
     },
     {
       "epoch": 1.1299435028248588,
+      "grad_norm": 6.713998794555664,
       "learning_rate": 0.00012476459510357815,
+      "loss": 1.5267,
       "step": 800
     },
     {
       "epoch": 1.271186440677966,
+      "grad_norm": 4.822694778442383,
       "learning_rate": 0.00011534839924670434,
+      "loss": 1.4934,
       "step": 900
     },
     {
       "epoch": 1.4124293785310735,
+      "grad_norm": 4.339609146118164,
       "learning_rate": 0.00010593220338983052,
+      "loss": 1.4728,
       "step": 1000
     },
     {
       "epoch": 1.5536723163841808,
+      "grad_norm": 3.8593039512634277,
       "learning_rate": 9.651600753295669e-05,
+      "loss": 1.4145,
       "step": 1100
     },
     {
       "epoch": 1.694915254237288,
+      "grad_norm": 4.826875686645508,
       "learning_rate": 8.709981167608286e-05,
+      "loss": 1.3815,
       "step": 1200
     },
     {
       "epoch": 1.8361581920903953,
+      "grad_norm": 4.669344902038574,
       "learning_rate": 7.768361581920904e-05,
+      "loss": 1.4495,
       "step": 1300
     },
     {
       "epoch": 1.9774011299435028,
+      "grad_norm": 4.768439769744873,
       "learning_rate": 6.826741996233523e-05,
+      "loss": 1.3959,
       "step": 1400
     },
     {
       "epoch": 2.0,
+      "eval_accuracy": 0.54182156133829,
+      "eval_loss": 1.4169427156448364,
+      "eval_runtime": 59.3986,
+      "eval_samples_per_second": 126.804,
+      "eval_steps_per_second": 7.929,
       "step": 1416
     }
   ],
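
The `learning_rate` and `epoch` fields are identical on both sides of this diff, so they depend only on the step number. The logged values are consistent with a linear decay to zero from a base rate of 2e-4 over 2124 steps with no warmup, and with 708 optimizer steps per epoch. The sketch below checks that reading against a few logged entries. The base rate, the absence of warmup, and the one-step offset are inferred from the numbers, not stated anywhere in the commit.

```python
import math

BASE_LR = 2e-4          # assumption: inferred from the logged values
TOTAL_STEPS = 2124      # matches the final checkpoint name (3 epochs x 708 steps)
STEPS_PER_EPOCH = 708   # matches the first checkpoint name


def logged_lr(step):
    """Linear decay to zero; the (step - 1) offset is an empirical fit to the log."""
    return BASE_LR * (TOTAL_STEPS - (step - 1)) / TOTAL_STEPS


def logged_epoch(step):
    """Fractional epoch reported alongside each log entry."""
    return step / STEPS_PER_EPOCH


# (step, learning_rate, epoch) triples copied from the log_history above.
LOGGED = [
    (100, 0.00019067796610169492, 0.14124293785310735),
    (700, 0.00013418079096045197, 0.9887005649717514),
    (1400, 6.826741996233523e-05, 1.9774011299435028),
]

matches = [
    math.isclose(logged_lr(step), lr, rel_tol=1e-9)
    and math.isclose(logged_epoch(step), epoch, rel_tol=1e-9)
    for step, lr, epoch in LOGGED
]
print("schedule reproduces the log:", all(matches))
```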

checkpoint-2124/adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e9e4d8228f0562591a714cfbf9221c349a3177a65e8a99cb8e6aa999dc3aa8b9
 size 1248048

 version https://git-lfs.github.com/spec/v1
+oid sha256:69614487361448ac6f44cb9a64edbfa931a38fe2f6edc52f913d4550dbd62074
 size 1248048

checkpoint-2124/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:80695a6e4bcaba681c66fc00cf28fbcbdcbae5feb36f3600fe535a0683f666e3
 size 2525771

 version https://git-lfs.github.com/spec/v1
+oid sha256:1f6f391741835ef82bea84849674667449cac8c88bfd85169f23569bb00357b8
 size 2525771

checkpoint-2124/scaler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:892ec425c2ad3d890afc4a30cd25cc06fe08542e6227a06b4f0c45a4de576716
 size 1383

 version https://git-lfs.github.com/spec/v1
+oid sha256:0d0c2fc6768514eef43c0b00e557d40961600f8d31084be0d527d115b41589fc
 size 1383

checkpoint-2124/trainer_state.json CHANGED

@@ -1,6 +1,6 @@
 {
   "best_global_step": 2124,
-  "best_metric": 0.5643919277748274,
   "best_model_checkpoint": "./bert-lora-newsgroups/checkpoint-2124",
   "epoch": 3.0,
   "eval_steps": 500,
@@ -11,176 +11,176 @@
   "log_history": [
     {
       "epoch": 0.14124293785310735,
-      "grad_norm": 4.030381679534912,
       "learning_rate": 0.00019067796610169492,
-      "loss": 2.9403,
       "step": 100
     },
     {
       "epoch": 0.2824858757062147,
-      "grad_norm": 5.417972564697266,
       "learning_rate": 0.0001812617702448211,
-      "loss": 2.5091,
       "step": 200
     },
     {
       "epoch": 0.423728813559322,
-      "grad_norm": 4.303308486938477,
       "learning_rate": 0.00017184557438794729,
-      "loss": 2.1815,
       "step": 300
     },
     {
       "epoch": 0.5649717514124294,
-      "grad_norm": 5.2155537605285645,
       "learning_rate": 0.00016242937853107344,
-      "loss": 1.9917,
       "step": 400
     },
     {
       "epoch": 0.7062146892655368,
-      "grad_norm": 5.118275165557861,
       "learning_rate": 0.00015301318267419963,
-      "loss": 1.9013,
       "step": 500
     },
     {
       "epoch": 0.847457627118644,
-      "grad_norm": 5.051901817321777,
       "learning_rate": 0.0001435969868173258,
-      "loss": 1.7503,
       "step": 600
     },
     {
       "epoch": 0.9887005649717514,
-      "grad_norm": 5.84205961227417,
       "learning_rate": 0.00013418079096045197,
-      "loss": 1.6455,
       "step": 700
     },
     {
       "epoch": 1.0,
-      "eval_accuracy": 0.4633563462559745,
-      "eval_loss": 1.6542061567306519,
-      "eval_runtime": 61.4551,
-      "eval_samples_per_second": 122.561,
-      "eval_steps_per_second": 7.664,
       "step": 708
     },
     {
       "epoch": 1.1299435028248588,
-      "grad_norm": 5.543461322784424,
       "learning_rate": 0.00012476459510357815,
-      "loss": 1.5649,
       "step": 800
     },
     {
       "epoch": 1.271186440677966,
-      "grad_norm": 4.303509712219238,
       "learning_rate": 0.00011534839924670434,
-      "loss": 1.5064,
       "step": 900
     },
     {
       "epoch": 1.4124293785310735,
-      "grad_norm": 5.7473578453063965,
       "learning_rate": 0.00010593220338983052,
-      "loss": 1.4491,
       "step": 1000
     },
     {
       "epoch": 1.5536723163841808,
-      "grad_norm": 4.23817777633667,
       "learning_rate": 9.651600753295669e-05,
-      "loss": 1.4401,
       "step": 1100
     },
     {
       "epoch": 1.694915254237288,
-      "grad_norm": 5.211511611938477,
       "learning_rate": 8.709981167608286e-05,
-      "loss": 1.3813,
       "step": 1200
     },
     {
       "epoch": 1.8361581920903953,
-      "grad_norm": 4.599623203277588,
       "learning_rate": 7.768361581920904e-05,
-      "loss": 1.4401,
       "step": 1300
     },
     {
       "epoch": 1.9774011299435028,
-      "grad_norm": 3.9729771614074707,
       "learning_rate": 6.826741996233523e-05,
-      "loss": 1.3853,
       "step": 1400
     },
     {
       "epoch": 2.0,
-      "eval_accuracy": 0.5379713223579394,
-      "eval_loss": 1.4326964616775513,
-      "eval_runtime": 61.5418,
-      "eval_samples_per_second": 122.388,
-      "eval_steps_per_second": 7.653,
       "step": 1416
     },
     {
       "epoch": 2.1186440677966103,
-      "grad_norm": 4.936285018920898,
       "learning_rate": 5.88512241054614e-05,
-      "loss": 1.3455,
       "step": 1500
     },
     {
       "epoch": 2.2598870056497176,
-      "grad_norm": 3.9144532680511475,
       "learning_rate": 4.9435028248587575e-05,
-      "loss": 1.303,
       "step": 1600
     },
     {
       "epoch": 2.401129943502825,
-      "grad_norm": 6.503249168395996,
       "learning_rate": 4.001883239171375e-05,
-      "loss": 1.2968,
       "step": 1700
     },
     {
       "epoch": 2.542372881355932,
-      "grad_norm": 4.896490573883057,
       "learning_rate": 3.060263653483992e-05,
-      "loss": 1.2855,
       "step": 1800
     },
     {
       "epoch": 2.68361581920904,
-      "grad_norm": 5.819763660430908,
       "learning_rate": 2.1186440677966103e-05,
-      "loss": 1.2719,
       "step": 1900
     },
     {
       "epoch": 2.824858757062147,
-      "grad_norm": 8.788325309753418,
       "learning_rate": 1.1770244821092279e-05,
-      "loss": 1.296,
       "step": 2000
     },
     {
       "epoch": 2.9661016949152543,
-      "grad_norm": 5.9179792404174805,
       "learning_rate": 2.3540489642184557e-06,
-      "loss": 1.2127,
       "step": 2100
     },
     {
       "epoch": 3.0,
-      "eval_accuracy": 0.5643919277748274,
-      "eval_loss": 1.3525854349136353,
-      "eval_runtime": 61.1908,
-      "eval_samples_per_second": 123.09,
-      "eval_steps_per_second": 7.697,
       "step": 2124
     }
   ],

 {
   "best_global_step": 2124,
+  "best_metric": 0.5742166755177908,
   "best_model_checkpoint": "./bert-lora-newsgroups/checkpoint-2124",
   "epoch": 3.0,
   "eval_steps": 500,
   "log_history": [
     {
       "epoch": 0.14124293785310735,
+      "grad_norm": 4.619589805603027,
       "learning_rate": 0.00019067796610169492,
+      "loss": 2.9563,
       "step": 100
     },
     {
       "epoch": 0.2824858757062147,
+      "grad_norm": 4.369770526885986,
       "learning_rate": 0.0001812617702448211,
+      "loss": 2.4516,
       "step": 200
     },
     {
       "epoch": 0.423728813559322,
+      "grad_norm": 4.454040050506592,
       "learning_rate": 0.00017184557438794729,
+      "loss": 2.1271,
       "step": 300
     },
     {
       "epoch": 0.5649717514124294,
+      "grad_norm": 5.155438423156738,
       "learning_rate": 0.00016242937853107344,
+      "loss": 1.9125,
       "step": 400
     },
     {
       "epoch": 0.7062146892655368,
+      "grad_norm": 4.892629623413086,
       "learning_rate": 0.00015301318267419963,
+      "loss": 1.7896,
       "step": 500
     },
     {
       "epoch": 0.847457627118644,
+      "grad_norm": 4.983877658843994,
       "learning_rate": 0.0001435969868173258,
+      "loss": 1.6968,
       "step": 600
     },
     {
       "epoch": 0.9887005649717514,
+      "grad_norm": 8.334493637084961,
       "learning_rate": 0.00013418079096045197,
+      "loss": 1.5862,
       "step": 700
     },
     {
       "epoch": 1.0,
+      "eval_accuracy": 0.47357939458311205,
+      "eval_loss": 1.6120948791503906,
+      "eval_runtime": 59.1685,
+      "eval_samples_per_second": 127.297,
+      "eval_steps_per_second": 7.96,
       "step": 708
     },
     {
       "epoch": 1.1299435028248588,
+      "grad_norm": 6.713998794555664,
       "learning_rate": 0.00012476459510357815,
+      "loss": 1.5267,
       "step": 800
     },
     {
       "epoch": 1.271186440677966,
+      "grad_norm": 4.822694778442383,
       "learning_rate": 0.00011534839924670434,
+      "loss": 1.4934,
       "step": 900
     },
     {
       "epoch": 1.4124293785310735,
+      "grad_norm": 4.339609146118164,
       "learning_rate": 0.00010593220338983052,
+      "loss": 1.4728,
       "step": 1000
     },
     {
       "epoch": 1.5536723163841808,
+      "grad_norm": 3.8593039512634277,
       "learning_rate": 9.651600753295669e-05,
+      "loss": 1.4145,
       "step": 1100
     },
     {
       "epoch": 1.694915254237288,
+      "grad_norm": 4.826875686645508,
       "learning_rate": 8.709981167608286e-05,
+      "loss": 1.3815,
       "step": 1200
     },
     {
       "epoch": 1.8361581920903953,
+      "grad_norm": 4.669344902038574,
       "learning_rate": 7.768361581920904e-05,
+      "loss": 1.4495,
       "step": 1300
     },
     {
       "epoch": 1.9774011299435028,
+      "grad_norm": 4.768439769744873,
       "learning_rate": 6.826741996233523e-05,
+      "loss": 1.3959,
       "step": 1400
     },
     {
       "epoch": 2.0,
+      "eval_accuracy": 0.54182156133829,
+      "eval_loss": 1.4169427156448364,
+      "eval_runtime": 59.3986,
+      "eval_samples_per_second": 126.804,
+      "eval_steps_per_second": 7.929,
       "step": 1416
     },
     {
       "epoch": 2.1186440677966103,
+      "grad_norm": 3.956120491027832,
       "learning_rate": 5.88512241054614e-05,
+      "loss": 1.3278,
       "step": 1500
     },
     {
       "epoch": 2.2598870056497176,
+      "grad_norm": 4.364845275878906,
       "learning_rate": 4.9435028248587575e-05,
+      "loss": 1.3065,
       "step": 1600
     },
     {
       "epoch": 2.401129943502825,
+      "grad_norm": 7.486156463623047,
       "learning_rate": 4.001883239171375e-05,
+      "loss": 1.305,
       "step": 1700
     },
     {
       "epoch": 2.542372881355932,
+      "grad_norm": 5.2779693603515625,
       "learning_rate": 3.060263653483992e-05,
+      "loss": 1.2618,
       "step": 1800
     },
     {
       "epoch": 2.68361581920904,
+      "grad_norm": 6.177374839782715,
       "learning_rate": 2.1186440677966103e-05,
+      "loss": 1.2691,
       "step": 1900
     },
     {
       "epoch": 2.824858757062147,
+      "grad_norm": 6.994251251220703,
       "learning_rate": 1.1770244821092279e-05,
+      "loss": 1.2931,
       "step": 2000
     },
     {
       "epoch": 2.9661016949152543,
+      "grad_norm": 5.824560642242432,
       "learning_rate": 2.3540489642184557e-06,
+      "loss": 1.25,
       "step": 2100
     },
     {
       "epoch": 3.0,
+      "eval_accuracy": 0.5742166755177908,
+      "eval_loss": 1.346989393234253,
+      "eval_runtime": 59.373,
+      "eval_samples_per_second": 126.859,
+      "eval_steps_per_second": 7.933,
       "step": 2124
     }
   ],
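
The per-epoch evaluation entries are the most useful part of this diff for comparing the two training runs. The sketch below filters them out of a `log_history` list and reports the accuracy change per epoch. The two dictionaries are trimmed copies of the old and new checkpoint-2124 states shown above, and the helper `eval_records` is a hypothetical name used for illustration. The evaluation set size is not stated in the commit; the last lines estimate it from the logged runtime and throughput.

```python
def eval_records(trainer_state):
    """Return only the log entries that carry evaluation metrics."""
    return [e for e in trainer_state["log_history"] if "eval_accuracy" in e]


# Trimmed copies of the old and new checkpoint-2124/trainer_state.json contents.
old_state = {"log_history": [
    {"epoch": 0.9887005649717514, "loss": 1.6455, "step": 700},
    {"epoch": 1.0, "eval_accuracy": 0.4633563462559745, "step": 708},
    {"epoch": 2.0, "eval_accuracy": 0.5379713223579394, "step": 1416},
    {"epoch": 3.0, "eval_accuracy": 0.5643919277748274, "step": 2124},
]}
new_state = {"log_history": [
    {"epoch": 0.9887005649717514, "loss": 1.5862, "step": 700},
    {"epoch": 1.0, "eval_accuracy": 0.47357939458311205, "step": 708},
    {"epoch": 2.0, "eval_accuracy": 0.54182156133829, "step": 1416},
    {"epoch": 3.0, "eval_accuracy": 0.5742166755177908, "step": 2124,
     "eval_runtime": 59.373, "eval_samples_per_second": 126.859},
]}

# Accuracy change per checkpoint step, new run minus old run.
deltas = {
    new["step"]: new["eval_accuracy"] - old["eval_accuracy"]
    for old, new in zip(eval_records(old_state), eval_records(new_state))
}
for step, delta in deltas.items():
    print(f"step {step}: accuracy change {delta:+.4f}")

# Estimated number of evaluation samples: runtime (s) x throughput (samples/s).
final = eval_records(new_state)[-1]
est_eval_samples = round(final["eval_runtime"] * final["eval_samples_per_second"])
print("estimated eval samples:", est_eval_samples)
```

Both runs improve at every epoch, and the new run is ahead at each checkpoint, though by about one percentage point or less, which may be within run-to-run variation.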

checkpoint-708/adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d23c675dc12816a7a6a43c6ffeaadd521d986b474193565a427d76fc72bd49a6
 size 1248048

 version https://git-lfs.github.com/spec/v1
+oid sha256:549011656f93fc51cc72d23359bc8e21b3dd81250e615d92d34451e3e8b99002
 size 1248048

checkpoint-708/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:572ac0046da1bb8df53d3a9e0f4eaafc20af278b9d312ddb2a20d3b54d7d3ad0
 size 2525771

 version https://git-lfs.github.com/spec/v1
+oid sha256:bd585be2ff94003b8f997489cb0bcf893d95af22684e17f820fafeeceb422bac
 size 2525771

checkpoint-708/scaler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0c592f329dc70c5676fffd35ef12a6c61d92ef0f0adf8134964e0033a1eb7e49
 size 1383

 version https://git-lfs.github.com/spec/v1
+oid sha256:0891fd11350acd22ac0c1e453dacc4966f9ea6e3940a6a560d05315fbefb6f3b
 size 1383

checkpoint-708/trainer_state.json CHANGED

@@ -1,6 +1,6 @@
 {
   "best_global_step": 708,
-  "best_metric": 0.4633563462559745,
   "best_model_checkpoint": "./bert-lora-newsgroups/checkpoint-708",
   "epoch": 1.0,
   "eval_steps": 500,
@@ -11,60 +11,60 @@
   "log_history": [
     {
       "epoch": 0.14124293785310735,
-      "grad_norm": 4.030381679534912,
       "learning_rate": 0.00019067796610169492,
-      "loss": 2.9403,
       "step": 100
     },
     {
       "epoch": 0.2824858757062147,
-      "grad_norm": 5.417972564697266,
       "learning_rate": 0.0001812617702448211,
-      "loss": 2.5091,
       "step": 200
     },
     {
       "epoch": 0.423728813559322,
-      "grad_norm": 4.303308486938477,
       "learning_rate": 0.00017184557438794729,
-      "loss": 2.1815,
       "step": 300
     },
     {
       "epoch": 0.5649717514124294,
-      "grad_norm": 5.2155537605285645,
       "learning_rate": 0.00016242937853107344,
-      "loss": 1.9917,
       "step": 400
     },
     {
       "epoch": 0.7062146892655368,
-      "grad_norm": 5.118275165557861,
       "learning_rate": 0.00015301318267419963,
-      "loss": 1.9013,
       "step": 500
     },
     {
       "epoch": 0.847457627118644,
-      "grad_norm": 5.051901817321777,
       "learning_rate": 0.0001435969868173258,
-      "loss": 1.7503,
       "step": 600
     },
     {
       "epoch": 0.9887005649717514,
-      "grad_norm": 5.84205961227417,
       "learning_rate": 0.00013418079096045197,
-      "loss": 1.6455,
       "step": 700
     },
     {
       "epoch": 1.0,
-      "eval_accuracy": 0.4633563462559745,
-      "eval_loss": 1.6542061567306519,
-      "eval_runtime": 61.4551,
-      "eval_samples_per_second": 122.561,
-      "eval_steps_per_second": 7.664,
       "step": 708
     }
   ],

 {
   "best_global_step": 708,
+  "best_metric": 0.47357939458311205,
   "best_model_checkpoint": "./bert-lora-newsgroups/checkpoint-708",
   "epoch": 1.0,
   "eval_steps": 500,
   "log_history": [
     {
       "epoch": 0.14124293785310735,
+      "grad_norm": 4.619589805603027,
       "learning_rate": 0.00019067796610169492,
+      "loss": 2.9563,
       "step": 100
     },
     {
       "epoch": 0.2824858757062147,
+      "grad_norm": 4.369770526885986,
       "learning_rate": 0.0001812617702448211,
+      "loss": 2.4516,
       "step": 200
     },
     {
       "epoch": 0.423728813559322,
+      "grad_norm": 4.454040050506592,
       "learning_rate": 0.00017184557438794729,
+      "loss": 2.1271,
       "step": 300
     },
     {
       "epoch": 0.5649717514124294,
+      "grad_norm": 5.155438423156738,
       "learning_rate": 0.00016242937853107344,
+      "loss": 1.9125,
       "step": 400
     },
     {
       "epoch": 0.7062146892655368,
+      "grad_norm": 4.892629623413086,
       "learning_rate": 0.00015301318267419963,
+      "loss": 1.7896,
       "step": 500
     },
     {
       "epoch": 0.847457627118644,
+      "grad_norm": 4.983877658843994,
       "learning_rate": 0.0001435969868173258,
+      "loss": 1.6968,
       "step": 600
     },
     {
       "epoch": 0.9887005649717514,
+      "grad_norm": 8.334493637084961,
       "learning_rate": 0.00013418079096045197,
+      "loss": 1.5862,
       "step": 700
     },
     {
       "epoch": 1.0,
+      "eval_accuracy": 0.47357939458311205,
+      "eval_loss": 1.6120948791503906,
+      "eval_runtime": 59.1685,
+      "eval_samples_per_second": 127.297,
+      "eval_steps_per_second": 7.96,
       "step": 708
     }
   ],