ucmp137538 committed on
Commit cb023f3 · verified · 1 Parent(s): 95c1f27

End of training

README.md ADDED
@@ -0,0 +1,54 @@
+ ---
+ license: apache-2.0
+ base_model: distilbert-base-cased
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: distilbert-base-cased-wikitext2
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # distilbert-base-cased-wikitext2
+
+ This model is a fine-tuned version of [distilbert-base-cased](https://huggingface.co/distilbert-base-cased) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - eval_loss: 3.3849
+ - eval_runtime: 14.1161
+ - eval_samples_per_second: 139.982
+ - eval_steps_per_second: 17.498
+ - step: 0
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 3.0
+
+ ### Framework versions
+
+ - Transformers 4.38.2
+ - Pytorch 2.2.1+cu121
+ - Datasets 2.18.0
+ - Tokenizers 0.15.2
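The card reports `eval_loss` but not perplexity. For a masked-LM evaluation, perplexity is simply the exponential of the mean cross-entropy loss, so it can be recovered directly from the 3.3849 figure above. A minimal sketch of that conversion:

```python
import math

# eval_loss reported in the model card above (mean cross-entropy, in nats per token)
eval_loss = 3.3849

# Perplexity is the exponential of the mean cross-entropy loss.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.1f}")  # ~ 29.5
```

A perplexity around 29.5 is what one would expect for a DistilBERT-scale model lightly fine-tuned on a small corpus such as wikitext-2.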
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "_name_or_path": "distilbert-base-cased",
+   "activation": "gelu",
+   "architectures": [
+     "DistilBertForMaskedLM"
+   ],
+   "attention_dropout": 0.1,
+   "dim": 768,
+   "dropout": 0.1,
+   "hidden_dim": 3072,
+   "initializer_range": 0.02,
+   "max_position_embeddings": 512,
+   "model_type": "distilbert",
+   "n_heads": 12,
+   "n_layers": 6,
+   "output_past": true,
+   "pad_token_id": 0,
+   "qa_dropout": 0.1,
+   "seq_classif_dropout": 0.2,
+   "sinusoidal_pos_embds": false,
+   "tie_weights_": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.38.2",
+   "vocab_size": 28996
+ }
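The config above fully determines the model's size. A back-of-the-envelope parameter count from those fields, assuming the standard `DistilBertForMaskedLM` layout (biased q/k/v/out projections, two-layer FFN, LayerNorms, and an MLM head whose projector weight is tied to the embeddings, per `tie_weights_: true`):

```python
# Architecture fields copied from the config.json above.
dim, hidden, n_layers = 768, 3072, 6
vocab, max_pos = 28996, 512

embeddings = (vocab + max_pos) * dim + 2 * dim           # token + position tables, embedding LayerNorm
attention  = 4 * (dim * dim + dim)                       # q/k/v/out projections, with biases
ffn        = dim * hidden + hidden + hidden * dim + dim  # two linear layers, with biases
per_layer  = attention + ffn + 2 * 2 * dim               # plus two LayerNorms per block
mlm_head   = dim * dim + dim + 2 * dim + vocab           # transform + LayerNorm + tied-projector bias

total = embeddings + n_layers * per_layer + mlm_head
print(total)  # 65,812,036 -> about 65.8M parameters
```

This lands at roughly 65.8M parameters, consistent with DistilBERT-base and with the float32 checkpoint size recorded below.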
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:64080c2b1998c3bd1214311ab9f1e97b40052fa81d508b8b534a936c4ba1626b
+ size 263260784
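The LFS pointer's `size` field gives a quick cross-check on the checkpoint: with `torch_dtype: float32` (4 bytes per parameter), the byte count divided by four should approximate the parameter count. A sketch of that check, ignoring the few-KB JSON header that the safetensors format prepends (so this slightly overestimates):

```python
# LFS size of model.safetensors, from the pointer above.
size_bytes = 263260784

# float32 weights are 4 bytes each; the small safetensors header is ignored.
approx_params = size_bytes // 4
print(approx_params)  # about 65.8M, consistent with DistilBERT-base
```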
runs/Mar21_15-37-51_f9a552b9426d/events.out.tfevents.1711035501.f9a552b9426d.2324.3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a8c074b8c864eaa1748444a1b0cda7a72e506a39f2072573494c8975452b2c31
+ size 297
runs/Mar21_15-39-21_f9a552b9426d/events.out.tfevents.1711035562.f9a552b9426d.2324.4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:18e29e18c92ddeedd08c565bb329bd1cb04f0082cbdfc963c97503526003b4b6
+ size 4852
runs/Mar21_15-39-21_f9a552b9426d/events.out.tfevents.1711035655.f9a552b9426d.2324.5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:06fe997735447fd19a431adf566e7547809a2e409861b324b61767be4b36fda3
+ size 297
runs/Mar21_15-41-18_f9a552b9426d/events.out.tfevents.1711035715.f9a552b9426d.2324.6 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4b4ef064ea9913b3340bdd8641f9373e9f94bdcba975ae6328e8dac9e0af28b2
+ size 297
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5e30ee225c29fa99346308039837ec43f8b6a6d7a77187654203d6f87fbf4ff1
+ size 4984