Ali Assi committed (verified)
Commit ec14de0 · Parent(s): 71fc12e

Upload folder using huggingface_hub
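The commit message indicates the files were pushed with the `huggingface_hub` client. A minimal sketch of how such a commit is typically produced, assuming a prior `huggingface-cli login`; the local folder path and repo id below are illustrative, not taken from this commit:

```python
# Sketch: push a training output folder to the Hub as a single commit,
# as the commit message suggests was done here.
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="./bert-lora-newsgroups",            # hypothetical local dir
    repo_id="your-username/bert-lora-20newsgroups",  # hypothetical repo id
    commit_message="Upload folder using huggingface_hub",
)
```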
README.md CHANGED
@@ -1,129 +1,206 @@
 ---
-language: en
-license: apache-2.0
+base_model: bert-base-uncased
+library_name: peft
 tags:
-- text-classification
-- bert
+- base_model:adapter:bert-base-uncased
 - lora
-- peft
-- 20-newsgroups
-datasets:
-- SetFit/20_newsgroups
-base_model: bert-base-uncased
-metrics:
-- accuracy
-model-index:
-- name: bert-lora-20newsgroups
-  results:
-  - task:
-      type: text-classification
-      name: Text Classification
-    dataset:
-      name: 20 Newsgroups
-      type: SetFit/20_newsgroups
-    metrics:
-    - type: accuracy
-      value: 0.82
-      name: Accuracy
+- transformers
 ---
 
-# BERT-LoRA for 20 Newsgroups Classification
-
-## Model Description
-
-This model is a **BERT-base-uncased** fine-tuned with **LoRA (Low-Rank Adaptation)** for multi-class text classification on the 20 Newsgroups dataset.
-
-- **Base Model:** bert-base-uncased
-- **Method:** LoRA (Parameter-Efficient Fine-Tuning)
-- **Task:** Multi-class text classification (20 categories)
-- **Dataset:** 20 Newsgroups (~11K training, ~7K test samples)
-- **Trainable Parameters:** ~300K (0.3% of total)
-- **Adapter Size:** ~2 MB
-
-## Categories
-
-The model classifies text into 20 newsgroup topics:
-- `alt.atheism`, `comp.graphics`, `comp.os.ms-windows.misc`, `comp.sys.ibm.pc.hardware`
-- `comp.sys.mac.hardware`, `comp.windows.x`, `misc.forsale`, `rec.autos`
-- `rec.motorcycles`, `rec.sport.baseball`, `rec.sport.hockey`, `sci.crypt`
-- `sci.electronics`, `sci.med`, `sci.space`, `soc.religion.christian`
-- `talk.politics.guns`, `talk.politics.mideast`, `talk.politics.misc`, `talk.religion.misc`
-
-## Usage
-
-### Installation
-
-```bash
-pip install transformers peft torch
-```
-
-### Load Model
-
-```python
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-from peft import PeftModel
-import torch
-
-# Load tokenizer
-tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
-
-# Load base model
-base_model = AutoModelForSequenceClassification.from_pretrained(
-    "bert-base-uncased",
-    num_labels=20
-)
-
-# Load LoRA adapters
-model = PeftModel.from_pretrained(base_model, "alialialialaiali/bert-lora-20newsgroups")
-model.eval()
-```
-
-### Make Predictions
-
-```python
-text = "NASA announced a new mission to Mars with advanced rovers."
-
-inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
-
-with torch.no_grad():
-    outputs = model(**inputs)
-    prediction = outputs.logits.argmax(-1).item()
-
-categories = [
-    "alt.atheism", "comp.graphics", "comp.os.ms-windows.misc",
-    "comp.sys.ibm.pc.hardware", "comp.sys.mac.hardware", "comp.windows.x",
-    "misc.forsale", "rec.autos", "rec.motorcycles", "rec.sport.baseball",
-    "rec.sport.hockey", "sci.crypt", "sci.electronics", "sci.med",
-    "sci.space", "soc.religion.christian", "talk.politics.guns",
-    "talk.politics.mideast", "talk.politics.misc", "talk.religion.misc"
-]
-
-print(f"Predicted category: {categories[prediction]}")
-# Output: sci.space
-```
-
-## Why LoRA?
-
-LoRA provides:
-- **99% smaller model size** (2 MB vs 440 MB)
-- **100x fewer trainable parameters** (300K vs 110M)
-- **Faster training** (15 min vs 2+ hours)
-- **Same accuracy** as full fine-tuning (~82%)
-
-Perfect for deployment, experimentation, and resource-constrained environments.
-
-## Citation
-
-```bibtex
-@misc{bert-lora-20newsgroups,
-  author = {Your Name},
-  title = {BERT-LoRA for 20 Newsgroups Classification},
-  year = {2024},
-  publisher = {HuggingFace},
-  howpublished = {\url{https://huggingface.co/your-username/bert-lora-20newsgroups}}
-}
-```
-
-## License
-
-Apache 2.0 (following base BERT model license)
+# Model Card for Model ID
+
+<!-- Provide a quick summary of what the model is/does. -->
+
+
+
+## Model Details
+
+### Model Description
+
+<!-- Provide a longer summary of what this model is. -->
+
+
+
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+
+### Model Sources [optional]
+
+<!-- Provide the basic links for the model. -->
+
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+
+## Uses
+
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+### Direct Use
+
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+[More Information Needed]
+
+### Downstream Use [optional]
+
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+[More Information Needed]
+
+### Out-of-Scope Use
+
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+[More Information Needed]
+
+## Bias, Risks, and Limitations
+
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+[More Information Needed]
+
+### Recommendations
+
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+## How to Get Started with the Model
+
+Use the code below to get started with the model.
+
+[More Information Needed]
+
+## Training Details
+
+### Training Data
+
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+[More Information Needed]
+
+### Training Procedure
+
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+#### Preprocessing [optional]
+
+[More Information Needed]
+
+
+#### Training Hyperparameters
+
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+#### Speeds, Sizes, Times [optional]
+
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+[More Information Needed]
+
+## Evaluation
+
+<!-- This section describes the evaluation protocols and provides the results. -->
+
+### Testing Data, Factors & Metrics
+
+#### Testing Data
+
+<!-- This should link to a Dataset Card if possible. -->
+
+[More Information Needed]
+
+#### Factors
+
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+[More Information Needed]
+
+#### Metrics
+
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+[More Information Needed]
+
+### Results
+
+[More Information Needed]
+
+#### Summary
+
+
+
+## Model Examination [optional]
+
+<!-- Relevant interpretability work for the model goes here -->
+
+[More Information Needed]
+
+## Environmental Impact
+
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+
+## Technical Specifications [optional]
+
+### Model Architecture and Objective
+
+[More Information Needed]
+
+### Compute Infrastructure
+
+[More Information Needed]
+
+#### Hardware
+
+[More Information Needed]
+
+#### Software
+
+[More Information Needed]
+
+## Citation [optional]
+
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+**BibTeX:**
+
+[More Information Needed]
+
+**APA:**
+
+[More Information Needed]
+
+## Glossary [optional]
+
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+[More Information Needed]
+
+## More Information [optional]
+
+[More Information Needed]
+
+## Model Card Authors [optional]
+
+[More Information Needed]
+
+## Model Card Contact
+
+[More Information Needed]
+### Framework versions
+
+- PEFT 0.18.0
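The removed card documented loading the adapter with PEFT; for deployment, the LoRA deltas can also be folded into the base weights. A hedged sketch using PEFT's `merge_and_unload`, reusing the repo id from the removed card (this is not something the commit itself does):

```python
# Sketch: merge the LoRA adapter into bert-base-uncased for standalone serving.
from transformers import AutoModelForSequenceClassification
from peft import PeftModel

base = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=20
)
model = PeftModel.from_pretrained(base, "alialialialaiali/bert-lora-20newsgroups")
merged = model.merge_and_unload()  # folds the low-rank update BA into the Linear weights
merged.save_pretrained("./bert-20newsgroups-merged")  # hypothetical output dir
```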
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e9e4d8228f0562591a714cfbf9221c349a3177a65e8a99cb8e6aa999dc3aa8b9
+oid sha256:69614487361448ac6f44cb9a64edbfa931a38fe2f6edc52f913d4550dbd62074
 size 1248048
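These pointer-file diffs swap only the `oid` line: under the Git LFS v1 spec, `oid sha256:...` is the SHA-256 digest of the tracked file's contents, so an unchanged `size` with a new `oid` suggests the adapter weights were retrained, not restructured. A small sketch for checking a downloaded file against its pointer:

```python
# Sketch: recompute the LFS oid (sha256 of file contents) for verification.
import hashlib

def lfs_oid(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Expected for the post-commit adapter_model.safetensors:
# 69614487361448ac6f44cb9a64edbfa931a38fe2f6edc52f913d4550dbd62074
print(lfs_oid("adapter_model.safetensors"))
```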
checkpoint-1416/adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:66aefa940619a9f9eb66e12d0603a8fba5b82b49ab35cba4a162d06aefe133c8
+oid sha256:d422d38d19ea46fb5984a56f291ad3bdcb738fbb30dfa2811240b4dcb6057cd9
 size 1248048
checkpoint-1416/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:368796f41898a0bde6df4fd0d5231ceddc59fad592fff171e0c2160d2e7c1349
+oid sha256:20b8ec4cc54c20180a510cc920959ebd12a2ef67bfbe6033b68585f7058bfff3
 size 2525771
checkpoint-1416/scaler.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6174a0364b7b63dec3190ee042579706a2fb2134581c334e4f17057f0ee66353
+oid sha256:095aa02adb069a3c22551d16a914393ee95dfcb22f2e4d00568ad9f4e17128dd
 size 1383
checkpoint-1416/trainer_state.json CHANGED
@@ -1,6 +1,6 @@
 {
   "best_global_step": 1416,
-  "best_metric": 0.5379713223579394,
+  "best_metric": 0.54182156133829,
   "best_model_checkpoint": "./bert-lora-newsgroups/checkpoint-1416",
   "epoch": 2.0,
   "eval_steps": 500,
@@ -11,118 +11,118 @@
   "log_history": [
     {
       "epoch": 0.14124293785310735,
-      "grad_norm": 4.030381679534912,
+      "grad_norm": 4.619589805603027,
       "learning_rate": 0.00019067796610169492,
-      "loss": 2.9403,
+      "loss": 2.9563,
       "step": 100
     },
     {
       "epoch": 0.2824858757062147,
-      "grad_norm": 5.417972564697266,
+      "grad_norm": 4.369770526885986,
       "learning_rate": 0.0001812617702448211,
-      "loss": 2.5091,
+      "loss": 2.4516,
       "step": 200
     },
     {
       "epoch": 0.423728813559322,
-      "grad_norm": 4.303308486938477,
+      "grad_norm": 4.454040050506592,
       "learning_rate": 0.00017184557438794729,
-      "loss": 2.1815,
+      "loss": 2.1271,
       "step": 300
     },
     {
       "epoch": 0.5649717514124294,
-      "grad_norm": 5.2155537605285645,
+      "grad_norm": 5.155438423156738,
       "learning_rate": 0.00016242937853107344,
-      "loss": 1.9917,
+      "loss": 1.9125,
       "step": 400
     },
     {
       "epoch": 0.7062146892655368,
-      "grad_norm": 5.118275165557861,
+      "grad_norm": 4.892629623413086,
       "learning_rate": 0.00015301318267419963,
-      "loss": 1.9013,
+      "loss": 1.7896,
       "step": 500
     },
     {
       "epoch": 0.847457627118644,
-      "grad_norm": 5.051901817321777,
+      "grad_norm": 4.983877658843994,
       "learning_rate": 0.0001435969868173258,
-      "loss": 1.7503,
+      "loss": 1.6968,
       "step": 600
     },
     {
       "epoch": 0.9887005649717514,
-      "grad_norm": 5.84205961227417,
+      "grad_norm": 8.334493637084961,
       "learning_rate": 0.00013418079096045197,
-      "loss": 1.6455,
+      "loss": 1.5862,
       "step": 700
     },
     {
       "epoch": 1.0,
-      "eval_accuracy": 0.4633563462559745,
-      "eval_loss": 1.6542061567306519,
-      "eval_runtime": 61.4551,
-      "eval_samples_per_second": 122.561,
-      "eval_steps_per_second": 7.664,
+      "eval_accuracy": 0.47357939458311205,
+      "eval_loss": 1.6120948791503906,
+      "eval_runtime": 59.1685,
+      "eval_samples_per_second": 127.297,
+      "eval_steps_per_second": 7.96,
       "step": 708
     },
     {
       "epoch": 1.1299435028248588,
-      "grad_norm": 5.543461322784424,
+      "grad_norm": 6.713998794555664,
       "learning_rate": 0.00012476459510357815,
-      "loss": 1.5649,
+      "loss": 1.5267,
       "step": 800
     },
     {
       "epoch": 1.271186440677966,
-      "grad_norm": 4.303509712219238,
+      "grad_norm": 4.822694778442383,
       "learning_rate": 0.00011534839924670434,
-      "loss": 1.5064,
+      "loss": 1.4934,
       "step": 900
     },
     {
       "epoch": 1.4124293785310735,
-      "grad_norm": 5.7473578453063965,
+      "grad_norm": 4.339609146118164,
       "learning_rate": 0.00010593220338983052,
-      "loss": 1.4491,
+      "loss": 1.4728,
      "step": 1000
     },
     {
       "epoch": 1.5536723163841808,
-      "grad_norm": 4.23817777633667,
+      "grad_norm": 3.8593039512634277,
       "learning_rate": 9.651600753295669e-05,
-      "loss": 1.4401,
+      "loss": 1.4145,
       "step": 1100
     },
     {
       "epoch": 1.694915254237288,
-      "grad_norm": 5.211511611938477,
+      "grad_norm": 4.826875686645508,
       "learning_rate": 8.709981167608286e-05,
-      "loss": 1.3813,
+      "loss": 1.3815,
       "step": 1200
     },
     {
       "epoch": 1.8361581920903953,
-      "grad_norm": 4.599623203277588,
+      "grad_norm": 4.669344902038574,
       "learning_rate": 7.768361581920904e-05,
-      "loss": 1.4401,
+      "loss": 1.4495,
       "step": 1300
     },
     {
       "epoch": 1.9774011299435028,
-      "grad_norm": 3.9729771614074707,
+      "grad_norm": 4.768439769744873,
       "learning_rate": 6.826741996233523e-05,
-      "loss": 1.3853,
+      "loss": 1.3959,
       "step": 1400
     },
     {
       "epoch": 2.0,
-      "eval_accuracy": 0.5379713223579394,
-      "eval_loss": 1.4326964616775513,
-      "eval_runtime": 61.5418,
-      "eval_samples_per_second": 122.388,
-      "eval_steps_per_second": 7.653,
+      "eval_accuracy": 0.54182156133829,
+      "eval_loss": 1.4169427156448364,
+      "eval_runtime": 59.3986,
+      "eval_samples_per_second": 126.804,
+      "eval_steps_per_second": 7.929,
       "step": 1416
     }
   ],
checkpoint-2124/adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e9e4d8228f0562591a714cfbf9221c349a3177a65e8a99cb8e6aa999dc3aa8b9
+oid sha256:69614487361448ac6f44cb9a64edbfa931a38fe2f6edc52f913d4550dbd62074
 size 1248048
checkpoint-2124/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:80695a6e4bcaba681c66fc00cf28fbcbdcbae5feb36f3600fe535a0683f666e3
+oid sha256:1f6f391741835ef82bea84849674667449cac8c88bfd85169f23569bb00357b8
 size 2525771
checkpoint-2124/scaler.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:892ec425c2ad3d890afc4a30cd25cc06fe08542e6227a06b4f0c45a4de576716
+oid sha256:0d0c2fc6768514eef43c0b00e557d40961600f8d31084be0d527d115b41589fc
 size 1383
checkpoint-2124/trainer_state.json CHANGED
@@ -1,6 +1,6 @@
 {
   "best_global_step": 2124,
-  "best_metric": 0.5643919277748274,
+  "best_metric": 0.5742166755177908,
   "best_model_checkpoint": "./bert-lora-newsgroups/checkpoint-2124",
   "epoch": 3.0,
   "eval_steps": 500,
@@ -11,176 +11,176 @@
   "log_history": [
     {
       "epoch": 0.14124293785310735,
-      "grad_norm": 4.030381679534912,
+      "grad_norm": 4.619589805603027,
       "learning_rate": 0.00019067796610169492,
-      "loss": 2.9403,
+      "loss": 2.9563,
       "step": 100
     },
     {
       "epoch": 0.2824858757062147,
-      "grad_norm": 5.417972564697266,
+      "grad_norm": 4.369770526885986,
       "learning_rate": 0.0001812617702448211,
-      "loss": 2.5091,
+      "loss": 2.4516,
       "step": 200
     },
     {
       "epoch": 0.423728813559322,
-      "grad_norm": 4.303308486938477,
+      "grad_norm": 4.454040050506592,
       "learning_rate": 0.00017184557438794729,
-      "loss": 2.1815,
+      "loss": 2.1271,
       "step": 300
     },
     {
       "epoch": 0.5649717514124294,
-      "grad_norm": 5.2155537605285645,
+      "grad_norm": 5.155438423156738,
       "learning_rate": 0.00016242937853107344,
-      "loss": 1.9917,
+      "loss": 1.9125,
       "step": 400
     },
     {
       "epoch": 0.7062146892655368,
-      "grad_norm": 5.118275165557861,
+      "grad_norm": 4.892629623413086,
       "learning_rate": 0.00015301318267419963,
-      "loss": 1.9013,
+      "loss": 1.7896,
       "step": 500
     },
     {
       "epoch": 0.847457627118644,
-      "grad_norm": 5.051901817321777,
+      "grad_norm": 4.983877658843994,
       "learning_rate": 0.0001435969868173258,
-      "loss": 1.7503,
+      "loss": 1.6968,
       "step": 600
     },
     {
       "epoch": 0.9887005649717514,
-      "grad_norm": 5.84205961227417,
+      "grad_norm": 8.334493637084961,
       "learning_rate": 0.00013418079096045197,
-      "loss": 1.6455,
+      "loss": 1.5862,
       "step": 700
     },
     {
       "epoch": 1.0,
-      "eval_accuracy": 0.4633563462559745,
-      "eval_loss": 1.6542061567306519,
-      "eval_runtime": 61.4551,
-      "eval_samples_per_second": 122.561,
-      "eval_steps_per_second": 7.664,
+      "eval_accuracy": 0.47357939458311205,
+      "eval_loss": 1.6120948791503906,
+      "eval_runtime": 59.1685,
+      "eval_samples_per_second": 127.297,
+      "eval_steps_per_second": 7.96,
       "step": 708
     },
     {
       "epoch": 1.1299435028248588,
-      "grad_norm": 5.543461322784424,
+      "grad_norm": 6.713998794555664,
       "learning_rate": 0.00012476459510357815,
-      "loss": 1.5649,
+      "loss": 1.5267,
       "step": 800
     },
     {
       "epoch": 1.271186440677966,
-      "grad_norm": 4.303509712219238,
+      "grad_norm": 4.822694778442383,
       "learning_rate": 0.00011534839924670434,
-      "loss": 1.5064,
+      "loss": 1.4934,
       "step": 900
     },
     {
       "epoch": 1.4124293785310735,
-      "grad_norm": 5.7473578453063965,
+      "grad_norm": 4.339609146118164,
       "learning_rate": 0.00010593220338983052,
-      "loss": 1.4491,
+      "loss": 1.4728,
       "step": 1000
     },
     {
       "epoch": 1.5536723163841808,
-      "grad_norm": 4.23817777633667,
+      "grad_norm": 3.8593039512634277,
       "learning_rate": 9.651600753295669e-05,
-      "loss": 1.4401,
+      "loss": 1.4145,
       "step": 1100
     },
     {
       "epoch": 1.694915254237288,
-      "grad_norm": 5.211511611938477,
+      "grad_norm": 4.826875686645508,
       "learning_rate": 8.709981167608286e-05,
-      "loss": 1.3813,
+      "loss": 1.3815,
       "step": 1200
     },
     {
       "epoch": 1.8361581920903953,
-      "grad_norm": 4.599623203277588,
+      "grad_norm": 4.669344902038574,
       "learning_rate": 7.768361581920904e-05,
-      "loss": 1.4401,
+      "loss": 1.4495,
       "step": 1300
     },
     {
       "epoch": 1.9774011299435028,
-      "grad_norm": 3.9729771614074707,
+      "grad_norm": 4.768439769744873,
       "learning_rate": 6.826741996233523e-05,
-      "loss": 1.3853,
+      "loss": 1.3959,
       "step": 1400
     },
     {
       "epoch": 2.0,
-      "eval_accuracy": 0.5379713223579394,
-      "eval_loss": 1.4326964616775513,
-      "eval_runtime": 61.5418,
-      "eval_samples_per_second": 122.388,
-      "eval_steps_per_second": 7.653,
+      "eval_accuracy": 0.54182156133829,
+      "eval_loss": 1.4169427156448364,
+      "eval_runtime": 59.3986,
+      "eval_samples_per_second": 126.804,
+      "eval_steps_per_second": 7.929,
       "step": 1416
     },
     {
       "epoch": 2.1186440677966103,
-      "grad_norm": 4.936285018920898,
+      "grad_norm": 3.956120491027832,
       "learning_rate": 5.88512241054614e-05,
-      "loss": 1.3455,
+      "loss": 1.3278,
       "step": 1500
     },
     {
       "epoch": 2.2598870056497176,
-      "grad_norm": 3.9144532680511475,
+      "grad_norm": 4.364845275878906,
       "learning_rate": 4.9435028248587575e-05,
-      "loss": 1.303,
+      "loss": 1.3065,
       "step": 1600
     },
     {
       "epoch": 2.401129943502825,
-      "grad_norm": 6.503249168395996,
+      "grad_norm": 7.486156463623047,
       "learning_rate": 4.001883239171375e-05,
-      "loss": 1.2968,
+      "loss": 1.305,
       "step": 1700
     },
     {
       "epoch": 2.542372881355932,
-      "grad_norm": 4.896490573883057,
+      "grad_norm": 5.2779693603515625,
       "learning_rate": 3.060263653483992e-05,
-      "loss": 1.2855,
+      "loss": 1.2618,
       "step": 1800
     },
     {
       "epoch": 2.68361581920904,
-      "grad_norm": 5.819763660430908,
+      "grad_norm": 6.177374839782715,
       "learning_rate": 2.1186440677966103e-05,
-      "loss": 1.2719,
+      "loss": 1.2691,
       "step": 1900
     },
     {
       "epoch": 2.824858757062147,
-      "grad_norm": 8.788325309753418,
+      "grad_norm": 6.994251251220703,
       "learning_rate": 1.1770244821092279e-05,
-      "loss": 1.296,
+      "loss": 1.2931,
       "step": 2000
     },
     {
       "epoch": 2.9661016949152543,
-      "grad_norm": 5.9179792404174805,
+      "grad_norm": 5.824560642242432,
       "learning_rate": 2.3540489642184557e-06,
-      "loss": 1.2127,
+      "loss": 1.25,
       "step": 2100
     },
     {
       "epoch": 3.0,
-      "eval_accuracy": 0.5643919277748274,
-      "eval_loss": 1.3525854349136353,
-      "eval_runtime": 61.1908,
-      "eval_samples_per_second": 123.09,
-      "eval_steps_per_second": 7.697,
+      "eval_accuracy": 0.5742166755177908,
+      "eval_loss": 1.346989393234253,
+      "eval_runtime": 59.373,
+      "eval_samples_per_second": 126.859,
+      "eval_steps_per_second": 7.933,
       "step": 2124
     }
   ],
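These `trainer_state.json` diffs are easier to scan programmatically than line by line; a short sketch that pulls the per-epoch eval accuracy out of the new state file (the path assumes the checkpoint layout above):

```python
# Sketch: list eval accuracy per epoch from a trainer_state.json like the ones diffed here.
import json

with open("checkpoint-2124/trainer_state.json") as f:
    state = json.load(f)

for entry in state["log_history"]:
    if "eval_accuracy" in entry:  # eval entries; the rest are training-loss logs
        print(f"epoch {entry['epoch']:.1f}: eval_accuracy={entry['eval_accuracy']:.4f}")
# For the new run: 0.4736 (epoch 1), 0.5418 (epoch 2), 0.5742 (epoch 3)
```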
checkpoint-708/adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d23c675dc12816a7a6a43c6ffeaadd521d986b474193565a427d76fc72bd49a6
+oid sha256:549011656f93fc51cc72d23359bc8e21b3dd81250e615d92d34451e3e8b99002
 size 1248048
checkpoint-708/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:572ac0046da1bb8df53d3a9e0f4eaafc20af278b9d312ddb2a20d3b54d7d3ad0
+oid sha256:bd585be2ff94003b8f997489cb0bcf893d95af22684e17f820fafeeceb422bac
 size 2525771
checkpoint-708/scaler.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0c592f329dc70c5676fffd35ef12a6c61d92ef0f0adf8134964e0033a1eb7e49
+oid sha256:0891fd11350acd22ac0c1e453dacc4966f9ea6e3940a6a560d05315fbefb6f3b
 size 1383
checkpoint-708/trainer_state.json CHANGED
@@ -1,6 +1,6 @@
 {
   "best_global_step": 708,
-  "best_metric": 0.4633563462559745,
+  "best_metric": 0.47357939458311205,
   "best_model_checkpoint": "./bert-lora-newsgroups/checkpoint-708",
   "epoch": 1.0,
   "eval_steps": 500,
@@ -11,60 +11,60 @@
   "log_history": [
     {
       "epoch": 0.14124293785310735,
-      "grad_norm": 4.030381679534912,
+      "grad_norm": 4.619589805603027,
       "learning_rate": 0.00019067796610169492,
-      "loss": 2.9403,
+      "loss": 2.9563,
       "step": 100
     },
     {
       "epoch": 0.2824858757062147,
-      "grad_norm": 5.417972564697266,
+      "grad_norm": 4.369770526885986,
       "learning_rate": 0.0001812617702448211,
-      "loss": 2.5091,
+      "loss": 2.4516,
       "step": 200
     },
     {
       "epoch": 0.423728813559322,
-      "grad_norm": 4.303308486938477,
+      "grad_norm": 4.454040050506592,
       "learning_rate": 0.00017184557438794729,
-      "loss": 2.1815,
+      "loss": 2.1271,
       "step": 300
     },
     {
       "epoch": 0.5649717514124294,
-      "grad_norm": 5.2155537605285645,
+      "grad_norm": 5.155438423156738,
       "learning_rate": 0.00016242937853107344,
-      "loss": 1.9917,
+      "loss": 1.9125,
       "step": 400
     },
     {
       "epoch": 0.7062146892655368,
-      "grad_norm": 5.118275165557861,
+      "grad_norm": 4.892629623413086,
       "learning_rate": 0.00015301318267419963,
-      "loss": 1.9013,
+      "loss": 1.7896,
       "step": 500
     },
     {
       "epoch": 0.847457627118644,
-      "grad_norm": 5.051901817321777,
+      "grad_norm": 4.983877658843994,
       "learning_rate": 0.0001435969868173258,
-      "loss": 1.7503,
+      "loss": 1.6968,
       "step": 600
     },
     {
       "epoch": 0.9887005649717514,
-      "grad_norm": 5.84205961227417,
+      "grad_norm": 8.334493637084961,
       "learning_rate": 0.00013418079096045197,
-      "loss": 1.6455,
+      "loss": 1.5862,
       "step": 700
     },
     {
       "epoch": 1.0,
-      "eval_accuracy": 0.4633563462559745,
-      "eval_loss": 1.6542061567306519,
-      "eval_runtime": 61.4551,
-      "eval_samples_per_second": 122.561,
-      "eval_steps_per_second": 7.664,
+      "eval_accuracy": 0.47357939458311205,
+      "eval_loss": 1.6120948791503906,
+      "eval_runtime": 59.1685,
+      "eval_samples_per_second": 127.297,
+      "eval_steps_per_second": 7.96,
       "step": 708
     }
   ],
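Each checkpoint directory carries `optimizer.pt` and `scaler.pt` (the fp16 gradient-scaler state) precisely so a run can be resumed mid-training. A hedged sketch of resuming with the 🤗 Trainer, where `model`, `train_ds`, and `eval_ds` stand in for the original training script's objects:

```python
# Sketch: resume training from checkpoint-708; the Trainer restores the
# optimizer and AMP scaler state saved alongside the adapter weights.
# `model`, `train_ds`, `eval_ds` are placeholders for the original objects.
from transformers import Trainer, TrainingArguments

args = TrainingArguments(output_dir="./bert-lora-newsgroups", fp16=True)
trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train(resume_from_checkpoint="./bert-lora-newsgroups/checkpoint-708")
```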