---
library_name: transformers
language:
- km
license: mit
base_model: facebook/mbart-large-50
tags:
- generated_from_trainer
datasets:
- S-Sethisak
metrics:
- wer
- bleu
model-index:
- name: KOCS - Khmer Orthographic Correction System
  results:
  - task:
      name: Sequence-to-sequence Language Modeling
      type: text2text-generation
    dataset:
      name: khmer-orthography-correction-dataset
      type: S-Sethisak
    metrics:
    - name: Wer
      type: wer
      value: 0.05181716833890747
    - name: Bleu
      type: bleu
      value:
        score: 91.23760284388217
        counts:
        - 64405
        - 56188
        - 48516
        - 41469
        totals:
        - 67099
        - 60367
        - 53696
        - 47271
        precisions:
        - 95.98503703482913
        - 93.07734358175824
        - 90.3530989272944
        - 87.7260899917497
        bp: 0.9945898677227208
        sys_len: 67099
        ref_len: 67463
---

# KOCS - Khmer Orthographic Correction System

This model is a fine-tuned version of [facebook/mbart-large-50](https://huggingface.co/facebook/mbart-large-50) on the khmer-orthography-correction-dataset.
It achieves the following results on the evaluation set:

- Loss: 0.0810
- Cer: 0.0214
- Wer: 0.0518
- Bleu: 91.2376 (n-gram precisions: 95.99 / 93.08 / 90.35 / 87.73; brevity penalty: 0.9946; sys_len: 67099; ref_len: 67463)

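The Bleu score above decomposes as the brevity penalty times the geometric mean of the four n-gram precisions. A quick sanity check in Python, using the statistics copied from the evaluation output:

```python
import math

# Final-epoch sacrebleu statistics from the evaluation output above
counts = [64405, 56188, 48516, 41469]  # matched 1- to 4-gram counts
totals = [67099, 60367, 53696, 47271]  # total 1- to 4-grams in the system output
sys_len, ref_len = 67099, 67463

# n-gram precisions, in percent
precisions = [100 * c / t for c, t in zip(counts, totals)]

# Brevity penalty: penalizes system output shorter than the reference
bp = math.exp(1 - ref_len / sys_len) if sys_len < ref_len else 1.0

# BLEU = brevity penalty * geometric mean of the n-gram precisions
bleu = bp * math.exp(sum(math.log(p) for p in precisions) / len(precisions))

print(round(bleu, 4))  # ≈ 91.2376
```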
## Model description

KOCS is a sequence-to-sequence Khmer orthographic correction model: given Khmer text containing spelling and orthography errors, it generates a corrected version. It was obtained by fine-tuning [facebook/mbart-large-50](https://huggingface.co/facebook/mbart-large-50) on a text2text-generation task.

## Intended uses & limitations

The model is intended for correcting orthographic errors in Khmer text. Residual errors remain (CER 0.0214, WER 0.0518 on the evaluation set), so output should be reviewed where accuracy is critical.

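As a sketch of intended use (the repo/checkpoint path, the `km_KH` language code, and the generation settings below are assumptions, not confirmed details of this model), inference follows the standard mBART-50 seq2seq pattern:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Placeholder id: substitute the actual Hub repo or local checkpoint path
model_id = "path/to/kocs-checkpoint"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# mBART-50 uses language codes; km_KH is Khmer
# (assumes the fine-tune kept the original language-code setup)
tokenizer.src_lang = "km_KH"

text = "..."  # a Khmer sentence with spelling errors
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["km_KH"],  # assumption
    max_new_tokens=256,
    num_beams=4,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```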
## Training and evaluation data

The model was fine-tuned and evaluated on the khmer-orthography-correction-dataset (S-Sethisak). Further details about the dataset are not documented here.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: adamw_torch_fused (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 10
- mixed_precision_training: Native AMP

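The hyperparameters above map onto `Seq2SeqTrainingArguments` roughly as follows. This is a sketch, not the exact training script; `output_dir` and `predict_with_generate` are assumptions:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="kocs-mbart-large-50",   # assumed
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,      # effective train batch size: 64
    num_train_epochs=10,
    lr_scheduler_type="linear",
    optim="adamw_torch_fused",
    seed=42,
    fp16=True,                          # "Native AMP" mixed precision
    predict_with_generate=True,         # assumed; needed to compute WER/CER/BLEU
)
```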
### Training results

| Training Loss | Epoch | Step | Validation Loss | Cer    | Wer    | Bleu  |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:-----:|
| 1.0649        | 1.0   | 842  | 0.3280          | 0.0732 | 0.1732 | 71.92 |
| 0.2503        | 2.0   | 1684 | 0.1941          | 0.0442 | 0.1100 | 81.84 |
| 0.1299        | 3.0   | 2526 | 0.1461          | 0.0393 | 0.0906 | 85.38 |
| 0.0876        | 4.0   | 3368 | 0.1189          | 0.0333 | 0.0754 | 87.62 |
| 0.0531        | 5.0   | 4210 | 0.1024          | 0.0291 | 0.0667 | 89.01 |
| 0.0357        | 6.0   | 5052 | 0.0933          | 0.0248 | 0.0596 | 90.12 |
| 0.0262        | 7.0   | 5894 | 0.0874          | 0.0247 | 0.0575 | 90.47 |
| 0.0191        | 8.0   | 6736 | 0.0838          | 0.0225 | 0.0533 | 91.05 |
| 0.015         | 9.0   | 7578 | 0.0817          | 0.0215 | 0.0529 | 91.06 |
| 0.0135        | 10.0  | 8420 | 0.0810          | 0.0214 | 0.0518 | 91.24 |

The Bleu column reports the corpus-level sacrebleu score; the full statistics for the final epoch (n-gram counts, totals, precisions, and brevity penalty) are given in the metadata above.

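Cer and Wer in the table are character- and word-level error rates: the Levenshtein edit distance between hypothesis and reference, divided by the reference length. A minimal, self-contained sketch of the computation:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),   # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]

def error_rate(ref, hyp):
    return edit_distance(ref, hyp) / len(ref)

ref, hyp = "the cat sat", "the cat sit"
cer = error_rate(ref, hyp)                  # character level
wer = error_rate(ref.split(), hyp.split())  # word level
print(round(cer, 3), round(wer, 3))
```

Note that for Khmer, word-level WER depends on how the text is segmented into words, since Khmer script does not use spaces between words.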
### Framework versions

- Transformers 4.57.2
- Pytorch 2.9.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1