Training finished; upload final model


Files changed (6)

README.md +29 -19
all_results.json +6 -6
config.json +1 -1
model.safetensors +2 -2
train_results.json +6 -6
trainer_state.json +0 -0

README.md CHANGED

@@ -16,9 +16,9 @@ should probably proofread and complete it, then remove this comment. -->
 This model was trained from scratch on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.1519
-- Wer: 96.7751
-- Cer: 49.6435
 ## Model description
@@ -38,31 +38,41 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
-- train_batch_size: 12
-- eval_batch_size: 12
 - seed: 42
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 48
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 500
-- training_steps: 10000
-- mixed_precision_training: Native AMP
 ### Training results
 | Training Loss | Epoch   | Step  | Validation Loss | Wer     | Cer     |
 |:-------------:|:-------:|:-----:|:---------------:|:-------:|:-------:|
-| 1.8339        | 1.6507  | 1000  | 1.9115          | 99.6794 | 93.6205 |
-| 0.9948        | 3.3006  | 2000  | 1.2763          | 97.3503 | 59.4213 |
-| 0.7577        | 4.9513  | 3000  | 1.1085          | 96.6431 | 53.3468 |
-| 0.5464        | 6.6012  | 4000  | 1.0575          | 95.4927 | 48.2507 |
-| 0.4182        | 8.2510  | 5000  | 1.0574          | 96.2376 | 47.2929 |
-| 0.3164        | 9.9017  | 6000  | 1.0616          | 96.3885 | 49.4417 |
-| 0.2319        | 11.5516 | 7000  | 1.0929          | 96.2565 | 49.5535 |
-| 0.1899        | 13.2015 | 8000  | 1.1223          | 97.2749 | 48.6737 |
-| 0.1425        | 14.8522 | 9000  | 1.1422          | 96.6148 | 48.4484 |
-| 0.161         | 16.5021 | 10000 | 1.1519          | 96.7751 | 49.6435 |
 ### Framework versions

 This model was trained from scratch on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 1.0545
+- Wer: 96.5771
+- Cer: 54.8789
 ## Model description
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
+- train_batch_size: 4
+- eval_batch_size: 4
 - seed: 42
+- distributed_type: multi-GPU
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 32
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 500
+- training_steps: 20000
 ### Training results
 | Training Loss | Epoch   | Step  | Validation Loss | Wer     | Cer     |
 |:-------------:|:-------:|:-----:|:---------------:|:-------:|:-------:|
+| 0.9609        | 1.1013  | 1000  | 1.1190          | 99.2079 | 98.0002 |
+| 0.6483        | 2.2026  | 2000  | 0.8776          | 98.7082 | 82.7038 |
+| 0.4333        | 3.3040  | 3000  | 0.8139          | 97.8689 | 67.8442 |
+| 0.3453        | 4.4053  | 4000  | 0.7950          | 98.0481 | 68.7091 |
+| 0.2517        | 5.5066  | 5000  | 0.8068          | 96.9448 | 64.5243 |
+| 0.1854        | 6.6079  | 6000  | 0.8310          | 97.9915 | 73.3730 |
+| 0.1173        | 7.7093  | 7000  | 0.8566          | 97.6426 | 64.1145 |
+| 0.1049        | 8.8106  | 8000  | 0.8806          | 97.7275 | 70.6504 |
+| 0.0566        | 9.9119  | 9000  | 0.9025          | 97.7935 | 66.4983 |
+| 0.037         | 11.0132 | 10000 | 0.9284          | 97.5389 | 63.1154 |
+| 0.0139        | 12.1145 | 11000 | 0.9458          | 97.0297 | 60.9058 |
+| 0.013         | 13.2159 | 12000 | 0.9624          | 96.8223 | 57.8806 |
+| 0.008         | 14.3172 | 13000 | 0.9800          | 96.7185 | 57.1280 |
+| 0.0062        | 15.4185 | 14000 | 0.9948          | 96.6714 | 55.3007 |
+| 0.0044        | 16.5198 | 15000 | 1.0088          | 96.6808 | 57.2599 |
+| 0.0034        | 17.6211 | 16000 | 1.0242          | 96.5959 | 55.2440 |
+| 0.0029        | 18.7225 | 17000 | 1.0367          | 96.5865 | 55.6945 |
+| 0.0022        | 19.8238 | 18000 | 1.0447          | 96.6148 | 55.5518 |
+| 0.0021        | 20.9251 | 19000 | 1.0507          | 96.5771 | 55.5891 |
+| 0.0017        | 22.0264 | 20000 | 1.0545          | 96.5771 | 54.8789 |
 ### Framework versions
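
Reviewer note: in the added results table the validation loss reaches its minimum early and then rises, while the training loss keeps falling towards 0.0017 and the CER keeps improving until the last step. The sketch below is not part of the commit; it copies the added rows into a list and picks the best step per metric. The `Row` type and the `best_by` helper are names introduced here for illustration only.

```python
from collections import namedtuple

# One row of the "+" side of the training results table.
Row = namedtuple("Row", "step train_loss val_loss wer cer")

rows = [
    Row(1000, 0.9609, 1.1190, 99.2079, 98.0002),
    Row(2000, 0.6483, 0.8776, 98.7082, 82.7038),
    Row(3000, 0.4333, 0.8139, 97.8689, 67.8442),
    Row(4000, 0.3453, 0.7950, 98.0481, 68.7091),
    Row(5000, 0.2517, 0.8068, 96.9448, 64.5243),
    Row(6000, 0.1854, 0.8310, 97.9915, 73.3730),
    Row(7000, 0.1173, 0.8566, 97.6426, 64.1145),
    Row(8000, 0.1049, 0.8806, 97.7275, 70.6504),
    Row(9000, 0.0566, 0.9025, 97.7935, 66.4983),
    Row(10000, 0.037, 0.9284, 97.5389, 63.1154),
    Row(11000, 0.0139, 0.9458, 97.0297, 60.9058),
    Row(12000, 0.013, 0.9624, 96.8223, 57.8806),
    Row(13000, 0.008, 0.9800, 96.7185, 57.1280),
    Row(14000, 0.0062, 0.9948, 96.6714, 55.3007),
    Row(15000, 0.0044, 1.0088, 96.6808, 57.2599),
    Row(16000, 0.0034, 1.0242, 96.5959, 55.2440),
    Row(17000, 0.0029, 1.0367, 96.5865, 55.6945),
    Row(18000, 0.0022, 1.0447, 96.6148, 55.5518),
    Row(19000, 0.0021, 1.0507, 96.5771, 55.5891),
    Row(20000, 0.0017, 1.0545, 96.5771, 54.8789),
]


def best_by(metric):
    """Return the first row with the lowest value of the given metric."""
    return min(rows, key=lambda row: getattr(row, metric))


for metric in ("val_loss", "wer", "cer"):
    row = best_by(metric)
    print(f"best {metric}: {getattr(row, metric)} at step {row.step}")
```

Which metric should decide the "final" checkpoint is a judgment call that the commit itself does not state; the sketch only makes the trade-off visible.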

all_results.json CHANGED

@@ -1,8 +1,8 @@
 {
-    "epoch": 16.50206440957886,
-    "total_flos": 1.949150849531904e+19,
-    "train_loss": 0.8641470371723176,
-    "train_runtime": 21435.9254,
-    "train_samples_per_second": 22.392,
-    "train_steps_per_second": 0.467
 }

 {
+    "epoch": 22.026431718061673,
+    "total_flos": 3.376341480070185e+19,
+    "train_loss": 0.2651471851706505,
+    "train_runtime": 28635.2259,
+    "train_samples_per_second": 22.35,
+    "train_steps_per_second": 0.698
 }
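
Reviewer note: the throughput figures can be cross-checked against the hyperparameters in README.md. Dividing samples per second by steps per second should give roughly the total train batch size (48 before, 32 after), and runtime times steps per second should land near the configured step count (10000 before, 20000 after). A minimal sketch, with the values copied from the diff and helper names invented for this note:

```python
# Values copied from the "-" and "+" sides of all_results.json.
old_run = {
    "train_runtime": 21435.9254,
    "train_samples_per_second": 22.392,
    "train_steps_per_second": 0.467,
}
new_run = {
    "train_runtime": 28635.2259,
    "train_samples_per_second": 22.35,
    "train_steps_per_second": 0.698,
}


def implied_batch_size(run):
    """Samples consumed per optimizer step, inferred from throughput."""
    return run["train_samples_per_second"] / run["train_steps_per_second"]


def implied_steps(run):
    """Optimizer steps inferred from runtime and step rate."""
    return run["train_runtime"] * run["train_steps_per_second"]


for name, run in (("old", old_run), ("new", new_run)):
    print(name, round(implied_batch_size(run), 2), round(implied_steps(run)))
```

The small deviations from the exact values come from the rounding of the reported rates.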

config.json CHANGED

@@ -53,7 +53,7 @@
   "num_mel_bins": 80,
   "pad_token_id": 50257,
   "scale_embedding": false,
-  "torch_dtype": "bfloat16",
   "transformers_version": "4.48.3",
   "use_cache": true,
   "use_weighted_layer_sum": false,

   "num_mel_bins": 80,
   "pad_token_id": 50257,
   "scale_embedding": false,
+  "torch_dtype": "float16",
   "transformers_version": "4.48.3",
   "use_cache": true,
   "use_weighted_layer_sum": false,

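Reviewer note: the only change in config.json is `torch_dtype`, from `bfloat16` to `float16`. The diff does not say why. Both formats use 16 bits, but they split them differently: float16 has 5 exponent and 10 mantissa bits, bfloat16 has 8 exponent and 7 mantissa bits. The sketch below derives the resulting range and precision from those bit counts using plain Python; the function names are made up for this note.

```python
def max_finite(exponent_bits, mantissa_bits):
    """Largest finite value of an IEEE-style binary float format."""
    bias = 2 ** (exponent_bits - 1) - 1
    return (2 - 2.0 ** -mantissa_bits) * 2.0 ** bias


def machine_epsilon(mantissa_bits):
    """Gap between 1.0 and the next representable value."""
    return 2.0 ** -mantissa_bits


# (exponent bits, mantissa bits) for the two 16-bit formats
FLOAT16 = (5, 10)
BFLOAT16 = (8, 7)

for name, (e_bits, m_bits) in (("float16", FLOAT16), ("bfloat16", BFLOAT16)):
    print(name, max_finite(e_bits, m_bits), machine_epsilon(m_bits))
```

float16 therefore offers finer precision but a much narrower range than bfloat16, which is worth keeping in mind when loading these weights for inference or further fine-tuning.
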
model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b22de776648c8fc55dbdb37a34986669b21215c0d0cc7d4355ba0090a00314ad
-size 181508256

 version https://git-lfs.github.com/spec/v1
+oid sha256:ba7adc2f01886be15a853093db2793868d1db4e468dcd4089b4edbae9770053c
+size 181508056
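
Reviewer note: model.safetensors is stored through Git LFS, so the diff shows only the pointer file (spec version, SHA-256 object id, size in bytes). The new object is 200 bytes smaller than the old one. A minimal sketch for parsing such a pointer and checking a downloaded file against it follows; the helper names are hypothetical and any local path passed to `matches_pointer` is the caller's own.

```python
import hashlib
from pathlib import Path

# The "+" side of the pointer file, copied from the diff.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:ba7adc2f01886be15a853093db2793868d1db4e468dcd4089b4edbae9770053c
size 181508056
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its key/value fields."""
    fields = dict(
        line.split(" ", 1) for line in text.splitlines() if line.strip()
    )
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer):
    """Check a local file's size and SHA-256 digest against a pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    sha = hashlib.sha256()
    with path.open("rb") as handle:
        # Hash in 1 MiB chunks so large weight files are not read at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("model.safetensors", pointer)` on a local copy would confirm that the download corresponds to this commit.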

train_results.json CHANGED

@@ -1,8 +1,8 @@
 {
-    "epoch": 16.50206440957886,
-    "total_flos": 1.949150849531904e+19,
-    "train_loss": 0.8641470371723176,
-    "train_runtime": 21435.9254,
-    "train_samples_per_second": 22.392,
-    "train_steps_per_second": 0.467
 }

 {
+    "epoch": 22.026431718061673,
+    "total_flos": 3.376341480070185e+19,
+    "train_loss": 0.2651471851706505,
+    "train_runtime": 28635.2259,
+    "train_samples_per_second": 22.35,
+    "train_steps_per_second": 0.698
 }

trainer_state.json CHANGED

The diff for this file is too large to render. See raw diff