vietnamese-correction-lora-v3
This model is a fine-tuned version of vinai/bartpho-syllable on the bmd1905/vi-error-correction-2.0 dataset. It achieves the following results on the evaluation set:
- "epoch": 5.0,
- "eval_f1_score": 0.5853658536585366,
- "eval_loss": 0.12532223761081696,
- "eval_precision": 1.0,
- "eval_recall": 0.41379310344827586,
- "eval_runtime": 18757.1356,
- "eval_sacrebleu": 37.83399298017529,
- "eval_samples_per_second": 6.85,
- "eval_steps_per_second": 1.142,
- "step": 27775
Training results
| Training Loss | Epoch | Step | Validation Loss | Sacrebleu | Precision | Recall | F1 Score |
|---|---|---|---|---|---|---|---|
| 0.2162 | 1 | 16666 | 0.3123 | 32.2314 | 1.0000 | 0.4138 | 0.5854 |
| 0.1994 | 2 | 22220 | 0.2335 | 34.4627 | 1.0000 | 0.4138 | 0.5854 |
| 0.1982 | 3 | 16666 | 0.1481 | 37.7962 | 1.0000 | 0.4138 | 0.5854 |
| 0.1272 | 4 | 22220 | 0.1272 | 37.7962 | 1.0000 | 0.4138 | 0.5854 |
| 0.1253 | 5 | 27775 | 0.1253 | 37.8340 | 1.0000 | 0.4138 | 0.5854 |
Training and evaluation data
```
DatasetDict({
    train: Dataset({
        features: ['input', 'output'],
        num_rows: 1_000_000
    })
    val: Dataset({
        features: ['input', 'output'],
        num_rows: 200_000
    })
    test: Dataset({
        features: ['input', 'output'],
        num_rows: 40_000
    })
})
```
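Assuming the dataset is hosted on the Hub under the id above and exposes the same `train`/`val`/`test` splits, it can be pulled with a sketch like this:

```python
from datasets import load_dataset

# Load the correction pairs; each example maps a noisy 'input' sentence
# to its corrected 'output'.
dataset = load_dataset("bmd1905/vi-error-correction-2.0")
print(dataset)  # expected to show the train/val/test splits listed above

sample = dataset["train"][0]
print(sample["input"], "->", sample["output"])
```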
Training procedure
The following bitsandbytes quantization config was used during training:
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: float16
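The same configuration can be rebuilt in code. The sketch below only mirrors the fields listed above; passing it to `from_pretrained` when loading the base model is an assumption about the training script rather than a verbatim excerpt:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization and float16 compute,
# matching the bitsandbytes settings listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForSeq2SeqLM.from_pretrained(
    "vinai/bartpho-syllable",
    quantization_config=bnb_config,
    device_map="auto",
)
```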
Training hyperparameters
The following hyperparameters were used during training:
- trainable params: 25,165,824 || all params: 326,801,408 || trainable%: 7.700647360735974

The base model is loaded in 4-bit precision, so training requires adding extra modules (adapters) on top of the frozen base weights with the `peft` library; see the examples at https://github.com/huggingface/peft for details, and the sketch after the list below.

- Num examples = 1,000,000
- Num epochs = 5
- Instantaneous batch size per device = 6
- Total train batch size (w. parallel, distributed & accumulation) = 144
- Gradient accumulation steps = 24
- Total optimization steps = 27,775
- Number of trainable parameters = 25,165,824
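The note above describes the usual QLoRA-style PEFT workflow: freeze the quantized base weights and train only small adapter modules. A minimal sketch follows; the LoRA rank, alpha, dropout, and target modules are illustrative assumptions, since the exact adapter configuration behind the 25,165,824 trainable parameters is not stated in this card.

```python
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

# `base_model` is the 4-bit quantized vinai/bartpho-syllable loaded as in the
# quantization sketch above. This call freezes the base weights and casts a few
# layers for numerically stable k-bit training.
model = prepare_model_for_kbit_training(base_model)

# Hypothetical adapter settings; only the resulting parameter counts are reported above.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints the trainable / total parameter breakdown
```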
Framework versions
- PEFT 0.4.0
- Transformers 4.47.0
- Pytorch 2.5.1+cu121
- Datasets 3.3.1
- Tokenizers 0.21.0