rlogh committed on
Commit
8edcbb2
·
verified ·
1 Parent(s): 0787107

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. README.md +176 -0
  2. checkpoint-18/config.json +36 -0
  3. checkpoint-18/model.safetensors +3 -0
  4. checkpoint-18/optimizer.pt +3 -0
  5. checkpoint-18/rng_state.pth +3 -0
  6. checkpoint-18/scaler.pt +3 -0
  7. checkpoint-18/scheduler.pt +3 -0
  8. checkpoint-18/special_tokens_map.json +7 -0
  9. checkpoint-18/tokenizer_config.json +58 -0
  10. checkpoint-18/trainer_state.json +73 -0
  11. checkpoint-18/training_args.bin +3 -0
  12. checkpoint-18/vocab.txt +0 -0
  13. checkpoint-27/config.json +36 -0
  14. checkpoint-27/model.safetensors +3 -0
  15. checkpoint-27/optimizer.pt +3 -0
  16. checkpoint-27/rng_state.pth +3 -0
  17. checkpoint-27/scaler.pt +3 -0
  18. checkpoint-27/scheduler.pt +3 -0
  19. checkpoint-27/special_tokens_map.json +7 -0
  20. checkpoint-27/tokenizer_config.json +58 -0
  21. checkpoint-27/trainer_state.json +96 -0
  22. checkpoint-27/training_args.bin +3 -0
  23. checkpoint-27/vocab.txt +0 -0
  24. checkpoint-36/config.json +36 -0
  25. checkpoint-36/model.safetensors +3 -0
  26. checkpoint-36/optimizer.pt +3 -0
  27. checkpoint-36/rng_state.pth +3 -0
  28. checkpoint-36/scaler.pt +3 -0
  29. checkpoint-36/scheduler.pt +3 -0
  30. checkpoint-36/special_tokens_map.json +7 -0
  31. checkpoint-36/tokenizer_config.json +58 -0
  32. checkpoint-36/trainer_state.json +119 -0
  33. checkpoint-36/training_args.bin +3 -0
  34. checkpoint-36/vocab.txt +0 -0
  35. checkpoint-45/config.json +36 -0
  36. checkpoint-45/model.safetensors +3 -0
  37. checkpoint-45/optimizer.pt +3 -0
  38. checkpoint-45/rng_state.pth +3 -0
  39. checkpoint-45/scaler.pt +3 -0
  40. checkpoint-45/scheduler.pt +3 -0
  41. checkpoint-45/special_tokens_map.json +7 -0
  42. checkpoint-45/tokenizer_config.json +58 -0
  43. checkpoint-45/trainer_state.json +142 -0
  44. checkpoint-45/training_args.bin +3 -0
  45. checkpoint-45/vocab.txt +0 -0
  46. checkpoint-54/config.json +36 -0
  47. checkpoint-54/model.safetensors +3 -0
  48. checkpoint-54/optimizer.pt +3 -0
  49. checkpoint-54/rng_state.pth +3 -0
  50. checkpoint-54/scaler.pt +3 -0
README.md ADDED
@@ -0,0 +1,176 @@
+ ---
+ license: mit
+ tags:
+ - text-classification
+ - cheese
+ - texture
+ - distilbert
+ - transformers
+ - fine-tuned
+ datasets:
+ - aslan-ng/cheese-text
+ metrics:
+ - accuracy
+ model-index:
+ - name: Cheese Texture Classifier (DistilBERT)
+   results:
+   - task:
+       type: text-classification
+       name: Cheese Texture Classification
+     dataset:
+       type: aslan-ng/cheese-text
+       name: Cheese Text Dataset
+     metrics:
+     - type: accuracy
+       value: 0.400
+       name: Test Accuracy
+ ---
28
+
29
+ # Cheese Texture Classifier (DistilBERT)
30
+
31
+ **Model Creator**: Rumi Loghmani (@rlogh)
32
+ **Original Dataset**: aslan-ng/cheese-text (by Aslan Noorghasemi)
33
+
34
+ This model performs 4-class texture classification on cheese descriptions using fine-tuned DistilBERT.
35
+
36
+ ## Model Description
37
+
38
+ - **Architecture**: DistilBERT-base-uncased fine-tuned for sequence classification
39
+ - **Task**: 4-class texture classification (hard, semi-hard, semi-soft, soft)
40
+ - **Input**: Cheese description text (up to 512 tokens)
41
+ - **Output**: 4-class probability distribution
42
+
43
+ ## Training Details
44
+
45
+ ### Data
46
+ - **Dataset**: [aslan-ng/cheese-text](https://huggingface.co/datasets/aslan-ng/cheese-text) (original split: 100 samples)
47
+ - **Train/Val/Test Split**: 70/15/15 (stratified)
48
+ - **Text Source**: Cheese descriptions from the dataset
49
+ - **Labels**: Texture categories (hard, semi-hard, semi-soft, soft)
50
+
51
+ ### Preprocessing
52
+ - **Tokenization**: DistilBERT tokenizer with 512 max length
53
+ - **Padding**: Max length padding
54
+ - **Truncation**: Long descriptions truncated to 512 tokens
55
+
56
+ ### Training Setup
57
+ - **Model**: distilbert-base-uncased
58
+ - **Epochs**: 10
59
+ - **Batch Size**: 8 (train/val)
60
+ - **Learning Rate**: 2e-5
61
+ - **Warmup Steps**: 10
62
+ - **Weight Decay**: 0.01
63
+ - **Optimizer**: AdamW
64
+ - **Scheduler**: Linear warmup + linear decay
65
+ - **Mixed Precision**: FP16 (if GPU available)
66
+ - **Seed**: 42 (for reproducibility)
67
+
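The training setup above fixes how many optimizer steps the run takes. A minimal sketch of that arithmetic follows, assuming the 70% training split holds exactly 70 of the 100 samples and that a checkpoint is written once per epoch (the save strategy itself is not shown in this commit, so that part is an assumption):

```python
import math

# Values taken from the README; the per-epoch checkpointing is an assumption.
total_samples = 100
train_percent = 70       # 70/15/15 split
batch_size = 8
num_epochs = 10

# Integer arithmetic avoids floating-point surprises in the split size.
train_samples = total_samples * train_percent // 100

# The final partial batch still counts as one optimizer step.
steps_per_epoch = math.ceil(train_samples / batch_size)
max_steps = steps_per_epoch * num_epochs

# If one checkpoint is saved per epoch, it lands on multiples of steps_per_epoch.
checkpoint_steps = [steps_per_epoch * epoch for epoch in range(1, num_epochs + 1)]

print(steps_per_epoch, max_steps)
print(checkpoint_steps[:5])
```

Under these assumptions the result is 9 steps per epoch and 90 steps in total, which lines up with the `checkpoint-18`, `checkpoint-27`, ... directory names and the `max_steps` value recorded in `trainer_state.json`.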
68
+ ### Hardware/Compute
69
+ - **Training Device**: GPU (CUDA)
70
+ - **Training Time**: ~5-10 minutes on GPU
71
+ - **Model Size**: ~67M parameters
72
+ - **Memory Usage**: ~2-4GB GPU memory
73
+
74
+ ## Performance
75
+
76
+ - **Test Accuracy**: 0.400
77
+ - **Test Loss**: 1.274
78
+
79
+ ### Class-wise Performance
+ | Class | Precision | Recall | F1-score | Support |
+ |---|---|---|---|---|
+ | hard | 0.50 | 0.33 | 0.40 | 3 |
+ | semi-hard | 0.33 | 0.50 | 0.40 | 4 |
+ | semi-soft | 0.33 | 0.50 | 0.40 | 4 |
+ | soft | 1.00 | 0.25 | 0.40 | 4 |
+ | accuracy | | | 0.40 | 15 |
+ | macro avg | 0.54 | 0.40 | 0.40 | 15 |
+ | weighted avg | 0.54 | 0.40 | 0.40 | 15 |
90
+
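The class-wise figures above can be cross-checked against each other. The sketch below recomputes F1 from the rounded precision and recall values and recovers the overall accuracy from recall and support. It works only from the rounded numbers in the report, so small rounding differences are possible.

```python
# (precision, recall, support) per class, copied from the report above.
report = {
    "hard":      (0.50, 0.33, 3),
    "semi-hard": (0.33, 0.50, 4),
    "semi-soft": (0.33, 0.50, 4),
    "soft":      (1.00, 0.25, 4),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

per_class_f1 = {name: f1(p, r) for name, (p, r, _) in report.items()}
macro_f1 = sum(per_class_f1.values()) / len(report)

# recall * support gives the number of correct predictions per class.
total = sum(support for _, _, support in report.values())
correct = sum(round(r * support) for _, r, support in report.values())
accuracy = correct / total

print({name: round(value, 2) for name, value in per_class_f1.items()})
print(round(macro_f1, 2), correct, total, round(accuracy, 2))
```

With only 15 test samples, each class contributes 1 or 2 correct predictions, so a single example changes accuracy by almost 7 percentage points.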
91
+
92
+ ## Usage
93
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ # Load the fine-tuned model and its tokenizer from the Hub
+ model_name = "rlogh/cheese-texture-classifier-distilbert"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ # Example prediction
+ text = "Feta is a crumbly, tangy Greek cheese with a salty bite and creamy undertones."
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
+
+ # Run inference without tracking gradients
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+     predicted_class = torch.argmax(predictions, dim=-1).item()
+
+ # Class order matches id2label in config.json
+ class_names = ["hard", "semi-hard", "semi-soft", "soft"]
+ print(f"Predicted texture: {class_names[predicted_class]}")
+ ```
115
+
116
+ ## Class Definitions
117
+
118
+ - **Hard**: Firm, aged cheeses that are dense and can be grated (e.g., Parmesan, Cheddar)
119
+ - **Semi-hard**: Moderately firm cheeses with some flexibility (e.g., Gouda, Swiss)
120
+ - **Semi-soft**: Cheeses with some give but maintain shape (e.g., Mozzarella, Blue cheese)
121
+ - **Soft**: Creamy, spreadable cheeses (e.g., Brie, Camembert, Cottage cheese)
122
+
123
+ ## Limitations and Ethics
124
+
125
+ ### Limitations
126
+ - **Small Dataset**: The dataset has only 100 samples (70 in the training split), limiting generalization
127
+ - **Text Quality**: Performance depends on description quality and consistency
128
+ - **Subjective Labels**: Texture classification has inherent subjectivity
129
+ - **Domain Specific**: Only applicable to cheese texture classification
130
+ - **Language**: English-only model
131
+
132
+ ### Ethical Considerations
133
+ - **Bias**: Model may reflect biases in the original dataset
134
+ - **Cultural Context**: Cheese descriptions may be culturally specific
135
+ - **Commercial Use**: Not intended for commercial cheese production decisions
136
+ - **Accuracy**: Should not be used for critical food safety applications
137
+
138
+ ### Recommendations
139
+ - Use for educational/research purposes only
140
+ - Validate predictions with domain experts
141
+ - Consider cultural context when applying to different regions
142
+ - Retrain with larger, more diverse datasets for production use
143
+
144
+ ## AI Usage Disclosure
145
+
146
+ This model was developed using:
147
+ - **Base Model**: DistilBERT (distilbert-base-uncased)
148
+ - **Training Framework**: Hugging Face Transformers
149
+ - **Fine-tuning**: Standard BERT fine-tuning techniques
150
+ - **No Additional AI**: No other AI systems were used in development
151
+
152
+ ## Citation
153
+
154
+ **Model Citation:**
155
+ ```bibtex
156
+ @misc{rlogh/cheese-texture-classifier-distilbert,
157
+ title={Cheese Texture Classifier (DistilBERT)},
158
+ author={Rumi Loghmani},
159
+ year={2024},
160
+ url={https://huggingface.co/rlogh/cheese-texture-classifier-distilbert}
161
+ }
162
+ ```
163
+
164
+ **Dataset Citation:**
165
+ ```bibtex
166
+ @misc{aslan-ng/cheese-text,
167
+ title={Cheese Text Dataset},
168
+ author={Aslan Noorghasemi},
169
+ year={2024},
170
+ url={https://huggingface.co/datasets/aslan-ng/cheese-text}
171
+ }
172
+ ```
173
+
174
+ ## License
175
+
176
+ MIT License - See LICENSE file for details.
checkpoint-18/config.json ADDED
@@ -0,0 +1,36 @@
1
+ {
2
+ "activation": "gelu",
3
+ "architectures": [
4
+ "DistilBertForSequenceClassification"
5
+ ],
6
+ "attention_dropout": 0.1,
7
+ "dim": 768,
8
+ "dropout": 0.1,
9
+ "dtype": "float32",
10
+ "hidden_dim": 3072,
11
+ "id2label": {
12
+ "0": "hard",
13
+ "1": "semi-hard",
14
+ "2": "semi-soft",
15
+ "3": "soft"
16
+ },
17
+ "initializer_range": 0.02,
18
+ "label2id": {
19
+ "hard": 0,
20
+ "semi-hard": 1,
21
+ "semi-soft": 2,
22
+ "soft": 3
23
+ },
24
+ "max_position_embeddings": 512,
25
+ "model_type": "distilbert",
26
+ "n_heads": 12,
27
+ "n_layers": 6,
28
+ "pad_token_id": 0,
29
+ "problem_type": "single_label_classification",
30
+ "qa_dropout": 0.1,
31
+ "seq_classif_dropout": 0.2,
32
+ "sinusoidal_pos_embds": false,
33
+ "tie_weights_": true,
34
+ "transformers_version": "4.56.1",
35
+ "vocab_size": 30522
36
+ }
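The `id2label` and `label2id` entries in this config define how a predicted class index maps to a texture name. A small sketch of reading that mapping from the raw JSON is shown below; the `predicted_class` value is a hypothetical stand-in for the argmax over the model's four logits.

```python
import json

# The label-mapping portion of config.json, as it appears in this commit.
config_text = """
{
  "id2label": {"0": "hard", "1": "semi-hard", "2": "semi-soft", "3": "soft"},
  "label2id": {"hard": 0, "semi-hard": 1, "semi-soft": 2, "soft": 3}
}
"""
config = json.loads(config_text)

# JSON object keys are always strings, so convert them to int for lookup.
id2label = {int(idx): label for idx, label in config["id2label"].items()}
label2id = config["label2id"]

predicted_class = 2  # hypothetical argmax over the four logits
print(id2label[predicted_class])  # semi-soft

# The two mappings should be exact inverses of each other.
assert all(label2id[label] == idx for idx, label in id2label.items())
```

The string keys matter only when the file is read directly, as here. Looking up `config["id2label"][2]` on the raw dictionary would raise a `KeyError`.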
checkpoint-18/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9186fe36c00e96c7e1bcb163360a0c126b68f912e75a7b99162c4d2bb613851c
3
+ size 267838720
checkpoint-18/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:764865d51baf2ef3802c555c974821b44033f32c9711af3cc57760694e04f01a
3
+ size 535740043
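The two entries above are Git LFS pointer files, which record only a hash and a byte size for the real artifact. As a rough sanity check, the sketch below parses both pointers and relates the sizes to the model. It assumes the weights are stored as 4-byte float32 values (as `"dtype": "float32"` in `config.json` suggests) and ignores file headers, so the parameter count is a slight overestimate.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields

model_ptr = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:9186fe36c00e96c7e1bcb163360a0c126b68f912e75a7b99162c4d2bb613851c
size 267838720
""")
optimizer_ptr = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:764865d51baf2ef3802c555c974821b44033f32c9711af3cc57760694e04f01a
size 535740043
""")

BYTES_PER_FLOAT32 = 4
approx_params = model_ptr["size"] / BYTES_PER_FLOAT32
ratio = optimizer_ptr["size"] / model_ptr["size"]

print(f"~{approx_params / 1e6:.1f}M parameters")
print(f"optimizer/model size ratio: {ratio:.2f}")
```

The estimate is consistent with the "~67M parameters" figure in the README. The optimizer state is about twice the size of the weights, which is what one would expect if AdamW stores two moment estimates per parameter.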
checkpoint-18/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:128fd6789dbecdaa703802c84141d4eeb7956a1f3aa57027a4a20d800b5b22e4
3
+ size 14645
checkpoint-18/scaler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b7bc59812fbca62deae46d0e188040dcc3d3d78eeda796733537d2b966b875be
3
+ size 1383
checkpoint-18/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c5cb1142c9841e543151dfa10e04da4d1ddc82e8206c217979304fd6bbdcf000
3
+ size 1465
checkpoint-18/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
checkpoint-18/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 512,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "DistilBertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
checkpoint-18/trainer_state.json ADDED
@@ -0,0 +1,73 @@
1
+ {
2
+ "best_global_step": 9,
3
+ "best_metric": 0.4,
4
+ "best_model_checkpoint": "./cheese-text-classifier\\checkpoint-9",
5
+ "epoch": 2.0,
6
+ "eval_steps": 500,
7
+ "global_step": 18,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.5555555555555556,
14
+ "grad_norm": 2.3992674350738525,
15
+ "learning_rate": 8.000000000000001e-06,
16
+ "loss": 1.3393,
17
+ "step": 5
18
+ },
19
+ {
20
+ "epoch": 1.0,
21
+ "eval_accuracy": 0.4,
22
+ "eval_loss": 1.3667316436767578,
23
+ "eval_runtime": 0.1993,
24
+ "eval_samples_per_second": 75.254,
25
+ "eval_steps_per_second": 10.034,
26
+ "step": 9
27
+ },
28
+ {
29
+ "epoch": 1.1111111111111112,
30
+ "grad_norm": 3.988067865371704,
31
+ "learning_rate": 1.8e-05,
32
+ "loss": 1.334,
33
+ "step": 10
34
+ },
35
+ {
36
+ "epoch": 1.6666666666666665,
37
+ "grad_norm": 3.5621254444122314,
38
+ "learning_rate": 1.9e-05,
39
+ "loss": 1.3276,
40
+ "step": 15
41
+ },
42
+ {
43
+ "epoch": 2.0,
44
+ "eval_accuracy": 0.4,
45
+ "eval_loss": 1.351041555404663,
46
+ "eval_runtime": 0.1866,
47
+ "eval_samples_per_second": 80.391,
48
+ "eval_steps_per_second": 10.719,
49
+ "step": 18
50
+ }
51
+ ],
52
+ "logging_steps": 5,
53
+ "max_steps": 90,
54
+ "num_input_tokens_seen": 0,
55
+ "num_train_epochs": 10,
56
+ "save_steps": 500,
57
+ "stateful_callbacks": {
58
+ "TrainerControl": {
59
+ "args": {
60
+ "should_epoch_stop": false,
61
+ "should_evaluate": false,
62
+ "should_log": false,
63
+ "should_save": true,
64
+ "should_training_stop": false
65
+ },
66
+ "attributes": {}
67
+ }
68
+ },
69
+ "total_flos": 18546097274880.0,
70
+ "train_batch_size": 8,
71
+ "trial_name": null,
72
+ "trial_params": null
73
+ }
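The `learning_rate` values in `log_history` can be reproduced from the README's settings (2e-5 peak, 10 warmup steps) and `max_steps` of 90. The sketch below implements a plain linear-warmup, linear-decay schedule. The `step - 1` offset is an observation that makes the numbers match, not a statement about how the trainer logs internally.

```python
import math

BASE_LR = 2e-5       # README: learning rate
WARMUP_STEPS = 10    # README: warmup steps
MAX_STEPS = 90       # trainer_state.json: max_steps

def linear_schedule_lr(step):
    """Learning rate after `step` scheduler updates (linear warmup, then linear decay)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = max(0, MAX_STEPS - step)
    return BASE_LR * remaining / (MAX_STEPS - WARMUP_STEPS)

# (global step, logged learning_rate) pairs from log_history above.
logged = [(5, 8.0e-06), (10, 1.8e-05), (15, 1.9e-05)]

matches = []
for step, lr in logged:
    # Assumption: the logged value reflects the schedule one update earlier.
    predicted = linear_schedule_lr(step - 1)
    matches.append(math.isclose(predicted, lr, rel_tol=1e-9))
    print(step, predicted, matches[-1])
```

The peak of 2e-5 is reached at step 10, after which the rate falls by 2e-5 / 80 per step until it reaches zero at step 90.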
checkpoint-18/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4e09a73f1acd70625a9205129f2e812d36048a9c26a346e45f2c59de8cf03c1d
3
+ size 5713
checkpoint-18/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-27/config.json ADDED
@@ -0,0 +1,36 @@
1
+ {
2
+ "activation": "gelu",
3
+ "architectures": [
4
+ "DistilBertForSequenceClassification"
5
+ ],
6
+ "attention_dropout": 0.1,
7
+ "dim": 768,
8
+ "dropout": 0.1,
9
+ "dtype": "float32",
10
+ "hidden_dim": 3072,
11
+ "id2label": {
12
+ "0": "hard",
13
+ "1": "semi-hard",
14
+ "2": "semi-soft",
15
+ "3": "soft"
16
+ },
17
+ "initializer_range": 0.02,
18
+ "label2id": {
19
+ "hard": 0,
20
+ "semi-hard": 1,
21
+ "semi-soft": 2,
22
+ "soft": 3
23
+ },
24
+ "max_position_embeddings": 512,
25
+ "model_type": "distilbert",
26
+ "n_heads": 12,
27
+ "n_layers": 6,
28
+ "pad_token_id": 0,
29
+ "problem_type": "single_label_classification",
30
+ "qa_dropout": 0.1,
31
+ "seq_classif_dropout": 0.2,
32
+ "sinusoidal_pos_embds": false,
33
+ "tie_weights_": true,
34
+ "transformers_version": "4.56.1",
35
+ "vocab_size": 30522
36
+ }
checkpoint-27/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:55b4ae0a98e84722b94644c4c957706abff999d48271c524f0fbcc939eef3fb4
3
+ size 267838720
checkpoint-27/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e43ef309428ff39308802dfcd91f6e2e108ced7d58eee8b2a0454b509db0c9e7
3
+ size 535740043
checkpoint-27/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:67c9ded891071528d1d7d37f98c82a4150c15973ace82e86232ed82afc455292
3
+ size 14645
checkpoint-27/scaler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a581324481e0fdf58dbfb67a41d1998d5a0b90e44f2e68a020bc18d50eaa9a6d
3
+ size 1383
checkpoint-27/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:261ab7bb7f2dd6da966c1a5d536f05b09c8e99cc5b65e2e3c3057a488f68aed4
3
+ size 1465
checkpoint-27/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
checkpoint-27/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 512,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "DistilBertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
checkpoint-27/trainer_state.json ADDED
@@ -0,0 +1,96 @@
1
+ {
2
+ "best_global_step": 9,
3
+ "best_metric": 0.4,
4
+ "best_model_checkpoint": "./cheese-text-classifier\\checkpoint-9",
5
+ "epoch": 3.0,
6
+ "eval_steps": 500,
7
+ "global_step": 27,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.5555555555555556,
14
+ "grad_norm": 2.3992674350738525,
15
+ "learning_rate": 8.000000000000001e-06,
16
+ "loss": 1.3393,
17
+ "step": 5
18
+ },
19
+ {
20
+ "epoch": 1.0,
21
+ "eval_accuracy": 0.4,
22
+ "eval_loss": 1.3667316436767578,
23
+ "eval_runtime": 0.1993,
24
+ "eval_samples_per_second": 75.254,
25
+ "eval_steps_per_second": 10.034,
26
+ "step": 9
27
+ },
28
+ {
29
+ "epoch": 1.1111111111111112,
30
+ "grad_norm": 3.988067865371704,
31
+ "learning_rate": 1.8e-05,
32
+ "loss": 1.334,
33
+ "step": 10
34
+ },
35
+ {
36
+ "epoch": 1.6666666666666665,
37
+ "grad_norm": 3.5621254444122314,
38
+ "learning_rate": 1.9e-05,
39
+ "loss": 1.3276,
40
+ "step": 15
41
+ },
42
+ {
43
+ "epoch": 2.0,
44
+ "eval_accuracy": 0.4,
45
+ "eval_loss": 1.351041555404663,
46
+ "eval_runtime": 0.1866,
47
+ "eval_samples_per_second": 80.391,
48
+ "eval_steps_per_second": 10.719,
49
+ "step": 18
50
+ },
51
+ {
52
+ "epoch": 2.2222222222222223,
53
+ "grad_norm": 3.5041863918304443,
54
+ "learning_rate": 1.775e-05,
55
+ "loss": 1.3127,
56
+ "step": 20
57
+ },
58
+ {
59
+ "epoch": 2.7777777777777777,
60
+ "grad_norm": 3.1173243522644043,
61
+ "learning_rate": 1.65e-05,
62
+ "loss": 1.2905,
63
+ "step": 25
64
+ },
65
+ {
66
+ "epoch": 3.0,
67
+ "eval_accuracy": 0.4,
68
+ "eval_loss": 1.3322917222976685,
69
+ "eval_runtime": 0.187,
70
+ "eval_samples_per_second": 80.199,
71
+ "eval_steps_per_second": 10.693,
72
+ "step": 27
73
+ }
74
+ ],
75
+ "logging_steps": 5,
76
+ "max_steps": 90,
77
+ "num_input_tokens_seen": 0,
78
+ "num_train_epochs": 10,
79
+ "save_steps": 500,
80
+ "stateful_callbacks": {
81
+ "TrainerControl": {
82
+ "args": {
83
+ "should_epoch_stop": false,
84
+ "should_evaluate": false,
85
+ "should_log": false,
86
+ "should_save": true,
87
+ "should_training_stop": false
88
+ },
89
+ "attributes": {}
90
+ }
91
+ },
92
+ "total_flos": 27819145912320.0,
93
+ "train_batch_size": 8,
94
+ "trial_name": null,
95
+ "trial_params": null
96
+ }
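The state above keeps `best_global_step` at 9 even though evaluation ran again at steps 18 and 27. One reading, sketched below, is that the best checkpoint is chosen on `eval_accuracy` and replaced only on a strict improvement. This is an assumption inferred from the logged values (accuracy stays at 0.4 while `eval_loss` keeps falling), not something the commit states.

```python
# (step, eval_accuracy) pairs from the evaluation entries in log_history.
evals = [(9, 0.4), (18, 0.4), (27, 0.4)]

best_step, best_metric = None, float("-inf")
for step, accuracy in evals:
    # Strict comparison: a tie does not replace the earlier checkpoint.
    if accuracy > best_metric:
        best_step, best_metric = step, accuracy

print(best_step, best_metric)  # 9 0.4
```

Had the selection used `eval_loss` instead, a later checkpoint would have been preferred, since the loss drops from about 1.367 to about 1.332 over the same three evaluations.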
checkpoint-27/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4e09a73f1acd70625a9205129f2e812d36048a9c26a346e45f2c59de8cf03c1d
3
+ size 5713
checkpoint-27/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-36/config.json ADDED
@@ -0,0 +1,36 @@
1
+ {
2
+ "activation": "gelu",
3
+ "architectures": [
4
+ "DistilBertForSequenceClassification"
5
+ ],
6
+ "attention_dropout": 0.1,
7
+ "dim": 768,
8
+ "dropout": 0.1,
9
+ "dtype": "float32",
10
+ "hidden_dim": 3072,
11
+ "id2label": {
12
+ "0": "hard",
13
+ "1": "semi-hard",
14
+ "2": "semi-soft",
15
+ "3": "soft"
16
+ },
17
+ "initializer_range": 0.02,
18
+ "label2id": {
19
+ "hard": 0,
20
+ "semi-hard": 1,
21
+ "semi-soft": 2,
22
+ "soft": 3
23
+ },
24
+ "max_position_embeddings": 512,
25
+ "model_type": "distilbert",
26
+ "n_heads": 12,
27
+ "n_layers": 6,
28
+ "pad_token_id": 0,
29
+ "problem_type": "single_label_classification",
30
+ "qa_dropout": 0.1,
31
+ "seq_classif_dropout": 0.2,
32
+ "sinusoidal_pos_embds": false,
33
+ "tie_weights_": true,
34
+ "transformers_version": "4.56.1",
35
+ "vocab_size": 30522
36
+ }
checkpoint-36/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9c9103761626f348ad8ca62f8ef15453e3af535e38f46e03550cf0d68cfff73a
3
+ size 267838720
checkpoint-36/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cedb5646f450b04e186c4c25cbb1dca1a8f0c7dd82ac5ae174215f95f7ea0834
3
+ size 535740043
checkpoint-36/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bf1e78dd4b5227c2191e07b09d8fa38a94689402f68cf150e526371b1a4a54c5
3
+ size 14645
checkpoint-36/scaler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:63893af7eb668aa4fbcd14246ec2138823b66be543ec33d10f54a50a14402964
3
+ size 1383
checkpoint-36/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:df566675e03ea54508b1a61983473558080c554f19f4207d0ed860743b01bbbf
3
+ size 1465
checkpoint-36/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
checkpoint-36/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 512,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "DistilBertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
checkpoint-36/trainer_state.json ADDED
@@ -0,0 +1,119 @@
1
+ {
2
+ "best_global_step": 9,
3
+ "best_metric": 0.4,
4
+ "best_model_checkpoint": "./cheese-text-classifier\\checkpoint-9",
5
+ "epoch": 4.0,
6
+ "eval_steps": 500,
7
+ "global_step": 36,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.5555555555555556,
14
+ "grad_norm": 2.3992674350738525,
15
+ "learning_rate": 8.000000000000001e-06,
16
+ "loss": 1.3393,
17
+ "step": 5
18
+ },
19
+ {
20
+ "epoch": 1.0,
21
+ "eval_accuracy": 0.4,
22
+ "eval_loss": 1.3667316436767578,
23
+ "eval_runtime": 0.1993,
24
+ "eval_samples_per_second": 75.254,
25
+ "eval_steps_per_second": 10.034,
26
+ "step": 9
27
+ },
28
+ {
29
+ "epoch": 1.1111111111111112,
30
+ "grad_norm": 3.988067865371704,
31
+ "learning_rate": 1.8e-05,
32
+ "loss": 1.334,
33
+ "step": 10
34
+ },
35
+ {
36
+ "epoch": 1.6666666666666665,
37
+ "grad_norm": 3.5621254444122314,
38
+ "learning_rate": 1.9e-05,
39
+ "loss": 1.3276,
40
+ "step": 15
41
+ },
42
+ {
43
+ "epoch": 2.0,
44
+ "eval_accuracy": 0.4,
45
+ "eval_loss": 1.351041555404663,
46
+ "eval_runtime": 0.1866,
47
+ "eval_samples_per_second": 80.391,
48
+ "eval_steps_per_second": 10.719,
49
+ "step": 18
50
+ },
51
+ {
52
+ "epoch": 2.2222222222222223,
53
+ "grad_norm": 3.5041863918304443,
54
+ "learning_rate": 1.775e-05,
55
+ "loss": 1.3127,
56
+ "step": 20
57
+ },
58
+ {
59
+ "epoch": 2.7777777777777777,
60
+ "grad_norm": 3.1173243522644043,
61
+ "learning_rate": 1.65e-05,
62
+ "loss": 1.2905,
63
+ "step": 25
64
+ },
65
+ {
66
+ "epoch": 3.0,
67
+ "eval_accuracy": 0.4,
68
+ "eval_loss": 1.3322917222976685,
69
+ "eval_runtime": 0.187,
70
+ "eval_samples_per_second": 80.199,
71
+ "eval_steps_per_second": 10.693,
72
+ "step": 27
73
+ },
74
+ {
75
+ "epoch": 3.3333333333333335,
76
+ "grad_norm": 5.531054973602295,
77
+ "learning_rate": 1.525e-05,
78
+ "loss": 1.2266,
79
+ "step": 30
80
+ },
81
+ {
82
+ "epoch": 3.888888888888889,
83
+ "grad_norm": 3.253871440887451,
84
+ "learning_rate": 1.4e-05,
85
+ "loss": 1.1597,
86
+ "step": 35
87
+ },
88
+ {
89
+ "epoch": 4.0,
90
+ "eval_accuracy": 0.4,
91
+ "eval_loss": 1.316666841506958,
92
+ "eval_runtime": 0.156,
93
+ "eval_samples_per_second": 96.134,
94
+ "eval_steps_per_second": 12.818,
95
+ "step": 36
96
+ }
97
+ ],
98
+ "logging_steps": 5,
99
+ "max_steps": 90,
100
+ "num_input_tokens_seen": 0,
101
+ "num_train_epochs": 10,
102
+ "save_steps": 500,
103
+ "stateful_callbacks": {
104
+ "TrainerControl": {
105
+ "args": {
106
+ "should_epoch_stop": false,
107
+ "should_evaluate": false,
108
+ "should_log": false,
109
+ "should_save": true,
110
+ "should_training_stop": false
111
+ },
112
+ "attributes": {}
113
+ }
114
+ },
115
+ "total_flos": 37092194549760.0,
116
+ "train_batch_size": 8,
117
+ "trial_name": null,
118
+ "trial_params": null
119
+ }
checkpoint-36/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4e09a73f1acd70625a9205129f2e812d36048a9c26a346e45f2c59de8cf03c1d
3
+ size 5713
checkpoint-36/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-45/config.json ADDED
@@ -0,0 +1,36 @@
1
+ {
2
+ "activation": "gelu",
3
+ "architectures": [
4
+ "DistilBertForSequenceClassification"
5
+ ],
6
+ "attention_dropout": 0.1,
7
+ "dim": 768,
8
+ "dropout": 0.1,
9
+ "dtype": "float32",
10
+ "hidden_dim": 3072,
11
+ "id2label": {
12
+ "0": "hard",
13
+ "1": "semi-hard",
14
+ "2": "semi-soft",
15
+ "3": "soft"
16
+ },
17
+ "initializer_range": 0.02,
18
+ "label2id": {
19
+ "hard": 0,
20
+ "semi-hard": 1,
21
+ "semi-soft": 2,
22
+ "soft": 3
23
+ },
24
+ "max_position_embeddings": 512,
25
+ "model_type": "distilbert",
26
+ "n_heads": 12,
27
+ "n_layers": 6,
28
+ "pad_token_id": 0,
29
+ "problem_type": "single_label_classification",
30
+ "qa_dropout": 0.1,
31
+ "seq_classif_dropout": 0.2,
32
+ "sinusoidal_pos_embds": false,
33
+ "tie_weights_": true,
34
+ "transformers_version": "4.56.1",
35
+ "vocab_size": 30522
36
+ }
checkpoint-45/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:de112e603b7ce6bea67f3c24b99fafac7bf34253d9bdf5ff989b09ac2b83ca4a
3
+ size 267838720
checkpoint-45/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:22a00865eaa9f97851e56a88891b92d080ddf724e23e3f3cfcdf1ccfa7f9ec5e
3
+ size 535740043
checkpoint-45/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5cbb01837febe7919f0932a54877b346f17063c101f3fe1e012d2f57b20df246
3
+ size 14645
checkpoint-45/scaler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7f0287937d4c6749477c019f5157bca25cfc7729fbfaf6d7a22ba9c2acf294b7
3
+ size 1383
checkpoint-45/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:eb288caf5a3a28e45dd5cedb94250ec50410cd795dd5578017e9d2d2319ad46d
3
+ size 1465
checkpoint-45/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
checkpoint-45/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 512,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "DistilBertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
checkpoint-45/trainer_state.json ADDED
@@ -0,0 +1,142 @@
+ {
+   "best_global_step": 9,
+   "best_metric": 0.4,
+   "best_model_checkpoint": "./cheese-text-classifier\\checkpoint-9",
+   "epoch": 5.0,
+   "eval_steps": 500,
+   "global_step": 45,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.5555555555555556,
+       "grad_norm": 2.3992674350738525,
+       "learning_rate": 8.000000000000001e-06,
+       "loss": 1.3393,
+       "step": 5
+     },
+     {
+       "epoch": 1.0,
+       "eval_accuracy": 0.4,
+       "eval_loss": 1.3667316436767578,
+       "eval_runtime": 0.1993,
+       "eval_samples_per_second": 75.254,
+       "eval_steps_per_second": 10.034,
+       "step": 9
+     },
+     {
+       "epoch": 1.1111111111111112,
+       "grad_norm": 3.988067865371704,
+       "learning_rate": 1.8e-05,
+       "loss": 1.334,
+       "step": 10
+     },
+     {
+       "epoch": 1.6666666666666665,
+       "grad_norm": 3.5621254444122314,
+       "learning_rate": 1.9e-05,
+       "loss": 1.3276,
+       "step": 15
+     },
+     {
+       "epoch": 2.0,
+       "eval_accuracy": 0.4,
+       "eval_loss": 1.351041555404663,
+       "eval_runtime": 0.1866,
+       "eval_samples_per_second": 80.391,
+       "eval_steps_per_second": 10.719,
+       "step": 18
+     },
+     {
+       "epoch": 2.2222222222222223,
+       "grad_norm": 3.5041863918304443,
+       "learning_rate": 1.775e-05,
+       "loss": 1.3127,
+       "step": 20
+     },
+     {
+       "epoch": 2.7777777777777777,
+       "grad_norm": 3.1173243522644043,
+       "learning_rate": 1.65e-05,
+       "loss": 1.2905,
+       "step": 25
+     },
+     {
+       "epoch": 3.0,
+       "eval_accuracy": 0.4,
+       "eval_loss": 1.3322917222976685,
+       "eval_runtime": 0.187,
+       "eval_samples_per_second": 80.199,
+       "eval_steps_per_second": 10.693,
+       "step": 27
+     },
+     {
+       "epoch": 3.3333333333333335,
+       "grad_norm": 5.531054973602295,
+       "learning_rate": 1.525e-05,
+       "loss": 1.2266,
+       "step": 30
+     },
+     {
+       "epoch": 3.888888888888889,
+       "grad_norm": 3.253871440887451,
+       "learning_rate": 1.4e-05,
+       "loss": 1.1597,
+       "step": 35
+     },
+     {
+       "epoch": 4.0,
+       "eval_accuracy": 0.4,
+       "eval_loss": 1.316666841506958,
+       "eval_runtime": 0.156,
+       "eval_samples_per_second": 96.134,
+       "eval_steps_per_second": 12.818,
+       "step": 36
+     },
+     {
+       "epoch": 4.444444444444445,
+       "grad_norm": 5.929306507110596,
+       "learning_rate": 1.275e-05,
+       "loss": 1.1161,
+       "step": 40
+     },
+     {
+       "epoch": 5.0,
+       "grad_norm": 4.499341011047363,
+       "learning_rate": 1.15e-05,
+       "loss": 1.1096,
+       "step": 45
+     },
+     {
+       "epoch": 5.0,
+       "eval_accuracy": 0.4,
+       "eval_loss": 1.304785132408142,
+       "eval_runtime": 0.1465,
+       "eval_samples_per_second": 102.38,
+       "eval_steps_per_second": 13.651,
+       "step": 45
+     }
+   ],
+   "logging_steps": 5,
+   "max_steps": 90,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 10,
+   "save_steps": 500,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": false
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 46365243187200.0,
+   "train_batch_size": 8,
+   "trial_name": null,
+   "trial_params": null
+ }
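In the `trainer_state.json` above, `log_history` interleaves training entries (with `loss`) and evaluation entries (with `eval_loss` and `eval_accuracy`). The sketch below shows one way to split them and summarize the run so far. It embeds an abridged copy of the values from the diff. The helper names `split_log_history` and `summarize` are hypothetical and are not part of the Trainer API.

```python
import json

# Abridged copy of checkpoint-45/trainer_state.json (values taken from the diff).
TRAINER_STATE = json.loads("""
{
  "best_global_step": 9,
  "best_metric": 0.4,
  "global_step": 45,
  "max_steps": 90,
  "log_history": [
    {"epoch": 0.5555555555555556, "loss": 1.3393, "step": 5},
    {"epoch": 1.0, "eval_accuracy": 0.4, "eval_loss": 1.3667316436767578, "step": 9},
    {"epoch": 2.0, "eval_accuracy": 0.4, "eval_loss": 1.351041555404663, "step": 18},
    {"epoch": 3.0, "eval_accuracy": 0.4, "eval_loss": 1.3322917222976685, "step": 27},
    {"epoch": 4.0, "eval_accuracy": 0.4, "eval_loss": 1.316666841506958, "step": 36},
    {"epoch": 5.0, "loss": 1.1096, "step": 45},
    {"epoch": 5.0, "eval_accuracy": 0.4, "eval_loss": 1.304785132408142, "step": 45}
  ]
}
""")


def split_log_history(state):
    """Separate training-loss entries from evaluation entries."""
    train = [e for e in state["log_history"] if "loss" in e]
    evals = [e for e in state["log_history"] if "eval_loss" in e]
    return train, evals


def summarize(state):
    """Report progress and whether the evaluation metrics are moving."""
    train, evals = split_log_history(state)
    accuracies = [e["eval_accuracy"] for e in evals]
    eval_losses = [e["eval_loss"] for e in evals]
    return {
        "progress": state["global_step"] / state["max_steps"],
        "accuracy_changed": len(set(accuracies)) > 1,
        "eval_loss_decreasing": all(
            a > b for a, b in zip(eval_losses, eval_losses[1:])
        ),
        "last_train_loss": train[-1]["loss"],
    }


summary = summarize(TRAINER_STATE)
print(summary)
```

On these values the run is halfway through (`global_step` 45 of `max_steps` 90), `eval_loss` falls at every epoch, and `eval_accuracy` stays at 0.4, which matches `best_global_step` still pointing at step 9.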
checkpoint-45/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4e09a73f1acd70625a9205129f2e812d36048a9c26a346e45f2c59de8cf03c1d
+ size 5713
checkpoint-45/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-54/config.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "activation": "gelu",
+   "architectures": [
+     "DistilBertForSequenceClassification"
+   ],
+   "attention_dropout": 0.1,
+   "dim": 768,
+   "dropout": 0.1,
+   "dtype": "float32",
+   "hidden_dim": 3072,
+   "id2label": {
+     "0": "hard",
+     "1": "semi-hard",
+     "2": "semi-soft",
+     "3": "soft"
+   },
+   "initializer_range": 0.02,
+   "label2id": {
+     "hard": 0,
+     "semi-hard": 1,
+     "semi-soft": 2,
+     "soft": 3
+   },
+   "max_position_embeddings": 512,
+   "model_type": "distilbert",
+   "n_heads": 12,
+   "n_layers": 6,
+   "pad_token_id": 0,
+   "problem_type": "single_label_classification",
+   "qa_dropout": 0.1,
+   "seq_classif_dropout": 0.2,
+   "sinusoidal_pos_embds": false,
+   "tie_weights_": true,
+   "transformers_version": "4.56.1",
+   "vocab_size": 30522
+ }
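The `id2label` and `label2id` entries in this `config.json` define how the four classifier outputs map to cheese texture labels. As a rough sketch of that mapping, the snippet below picks the label for a vector of logits and checks that the two dictionaries agree. It uses only the standard library and reads the raw JSON, where the `id2label` keys are strings. The function names and the example logits are hypothetical.

```python
import json

# Label mappings copied from checkpoint-54/config.json.
CONFIG = json.loads("""
{
  "id2label": {"0": "hard", "1": "semi-hard", "2": "semi-soft", "3": "soft"},
  "label2id": {"hard": 0, "semi-hard": 1, "semi-soft": 2, "soft": 3}
}
""")


def label_for_logits(logits, config):
    """Return the label whose class index has the largest logit."""
    if len(logits) != len(config["id2label"]):
        raise ValueError("expected one logit per label")
    best = max(range(len(logits)), key=lambda i: logits[i])
    # Keys are strings in the raw JSON file, so convert the index.
    return config["id2label"][str(best)]


def mappings_are_consistent(config):
    """Check that label2id is the exact inverse of id2label."""
    return all(
        config["label2id"][label] == int(idx)
        for idx, label in config["id2label"].items()
    )


# Made-up logits for illustration only; index 1 is the largest.
example_label = label_for_logits([0.1, 2.0, -1.0, 0.3], CONFIG)
print(example_label, mappings_are_consistent(CONFIG))
```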
checkpoint-54/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5aa846c5c226056cf5ce5353619db80041460988d55162b17a591cffe149a800
+ size 267838720
checkpoint-54/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d1eb8fea032ede5f8effc13bef01b07d87685d4ecd7c665082c802a5915b57d1
+ size 535740043
checkpoint-54/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:067a193dac0906a35e128e6d201f4390897d736a9c9e8d5a2830394c21b775c0
+ size 14645
checkpoint-54/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f8d96272c15c0b853424c972b8f4df29eb217faa78f36f6789f145f3aa953a26
+ size 1383