Improve language tag

#1
by lbourdois - opened
Files changed (1)
  1. README.md +97 -83
README.md CHANGED
@@ -1,84 +1,98 @@
- ---
- library_name: transformers
- license: apache-2.0
- base_model: Qwen/Qwen2.5-0.5B
- tags:
- - generated_from_trainer
- - qwen
- - GGUF
- - worldmodel
- - worldbuilding
- model-index:
- - name: capybara_finetuned_results3
-   results: []
- datasets:
- - archit11/worldbuilding
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # capybara_finetuned_results3
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) on the [archit11/worldbuilding](https://huggingface.co/datasets/archit11/worldbuilding) dataset.
- It achieves the following results on the evaluation set:
- - Loss: 5.6542
-
- ## Video demo (it's pretty bad)
-
- <video controls autoplay muted src="https://0x0.st/XgZs.mp4"></video>
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0002
- - train_batch_size: 1
- - eval_batch_size: 8
- - seed: 42
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 4
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 5
- - training_steps: 800
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 15.5311 | 0.0230 | 50 | 14.5422 |
- | 8.7477 | 0.0460 | 100 | 9.2952 |
- | 7.3554 | 0.0690 | 150 | 7.1992 |
- | 6.828 | 0.0920 | 200 | 6.7258 |
- | 6.4694 | 0.1150 | 250 | 6.3597 |
- | 6.3401 | 0.1381 | 300 | 6.1703 |
- | 6.1256 | 0.1611 | 350 | 6.0395 |
- | 6.0372 | 0.1841 | 400 | 5.9271 |
- | 6.0221 | 0.2071 | 450 | 5.8464 |
- | 5.8783 | 0.2301 | 500 | 5.7810 |
- | 5.8339 | 0.2531 | 550 | 5.7335 |
- | 5.8546 | 0.2761 | 600 | 5.6904 |
- | 5.9169 | 0.2991 | 650 | 5.6690 |
- | 5.7959 | 0.3221 | 700 | 5.6565 |
- | 5.7271 | 0.3451 | 750 | 5.6543 |
- | 5.8734 | 0.3682 | 800 | 5.6542 |
-
-
- ### Framework versions
-
- - Transformers 4.44.2
- - Pytorch 2.4.0
- - Datasets 3.0.0
+ ---
+ library_name: transformers
+ license: apache-2.0
+ base_model: Qwen/Qwen2.5-0.5B
+ tags:
+ - generated_from_trainer
+ - qwen
+ - GGUF
+ - worldmodel
+ - worldbuilding
+ datasets:
+ - archit11/worldbuilding
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ model-index:
+ - name: capybara_finetuned_results3
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # capybara_finetuned_results3
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) on the [archit11/worldbuilding](https://huggingface.co/datasets/archit11/worldbuilding) dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 5.6542
+
+ ## Video demo (it's pretty bad)
+
+ <video controls autoplay muted src="https://0x0.st/XgZs.mp4"></video>
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0002
+ - train_batch_size: 1
+ - eval_batch_size: 8
+ - seed: 42
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 4
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 5
+ - training_steps: 800
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 15.5311 | 0.0230 | 50 | 14.5422 |
+ | 8.7477 | 0.0460 | 100 | 9.2952 |
+ | 7.3554 | 0.0690 | 150 | 7.1992 |
+ | 6.828 | 0.0920 | 200 | 6.7258 |
+ | 6.4694 | 0.1150 | 250 | 6.3597 |
+ | 6.3401 | 0.1381 | 300 | 6.1703 |
+ | 6.1256 | 0.1611 | 350 | 6.0395 |
+ | 6.0372 | 0.1841 | 400 | 5.9271 |
+ | 6.0221 | 0.2071 | 450 | 5.8464 |
+ | 5.8783 | 0.2301 | 500 | 5.7810 |
+ | 5.8339 | 0.2531 | 550 | 5.7335 |
+ | 5.8546 | 0.2761 | 600 | 5.6904 |
+ | 5.9169 | 0.2991 | 650 | 5.6690 |
+ | 5.7959 | 0.3221 | 700 | 5.6565 |
+ | 5.7271 | 0.3451 | 750 | 5.6543 |
+ | 5.8734 | 0.3682 | 800 | 5.6542 |
+
+
+ ### Framework versions
+
+ - Transformers 4.44.2
+ - Pytorch 2.4.0
+ - Datasets 3.0.0
  - Tokenizers 0.19.1
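
For context, a minimal sketch (not part of this PR) of what the added `language` metadata enables, plus a quick way to exercise the checkpoint. The repo id `archit11/capybara_finetuned_results3` is an assumption inferred from the model name and the dataset owner on the card; adjust it to the actual repository.

```python
# Hedged sketch, not from the PR; the repo id below is hypothetical.
from huggingface_hub import HfApi
from transformers import AutoModelForCausalLM, AutoTokenizer

# Once the language tags are merged, the model becomes discoverable through
# the Hub's language filter (recent huggingface_hub versions):
api = HfApi()
for m in api.list_models(language="zho", search="capybara_finetuned"):
    print(m.id)

# Loading and sampling from the fine-tuned Qwen2.5-0.5B checkpoint:
model_id = "archit11/capybara_finetuned_results3"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Describe a city built on the back of a giant turtle."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```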