vietnamese-correction-lora-v3

This model is a fine-tuned version of vinai/bartpho-syllable on the bmd1905/vi-error-correction-2.0 dataset. It achieves the following results on the evaluation set:

  • "epoch": 5.0,
  • "eval_f1_score": 0.5853658536585366,
  • "eval_loss": 0.12532223761081696,
  • "eval_precision": 1.0,
  • "eval_recall": 0.41379310344827586,
  • "eval_runtime": 18757.1356,
  • "eval_sacrebleu": 37.83399298017529,
  • "eval_samples_per_second": 6.85,
  • "eval_steps_per_second": 1.142,
  • "step": 27775

Training results

| Training Loss | Epoch | Step  | Validation Loss | Sacrebleu | Precision | Recall | F1 Score |
|---------------|-------|-------|-----------------|-----------|-----------|--------|----------|
| 0.2162        | 1     | 16666 | 0.3123          | 32.2314   | 1.0000    | 0.4138 | 0.5854   |
| 0.1994        | 2     | 22220 | 0.2335          | 34.4627   | 1.0000    | 0.4138 | 0.5854   |
| 0.1982        | 3     | 16666 | 0.1481          | 37.7962   | 1.0000    | 0.4138 | 0.5854   |
| 0.1272        | 4     | 22220 | 0.1272          | 37.7962   | 1.0000    | 0.4138 | 0.5854   |
| 0.1253        | 5     | 27775 | 0.1253          | 37.8340   | 1.0000    | 0.4138 | 0.5854   |
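
The sacreBLEU column can be reproduced with the evaluate library. Below is a sketch of a compute_metrics hook following standard Seq2SeqTrainer conventions; the exact wiring used for the precision/recall/F1 columns is not documented in this card, so only the BLEU part is shown:

```python
import evaluate
import numpy as np

sacrebleu = evaluate.load("sacrebleu")

def build_compute_metrics(tokenizer):
    def compute_metrics(eval_preds):
        preds, labels = eval_preds
        # The Trainer pads labels with -100; swap those for the pad token
        # before decoding.
        labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
        pred_texts = tokenizer.batch_decode(preds, skip_special_tokens=True)
        label_texts = tokenizer.batch_decode(labels, skip_special_tokens=True)
        result = sacrebleu.compute(
            predictions=pred_texts,
            references=[[ref] for ref in label_texts],
        )
        return {"sacrebleu": result["score"]}
    return compute_metrics
```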

Training and evaluation data

DatasetDict({
    train: Dataset({
        features: ['input', 'output'],
        num_rows: 1_000_000
    })
    val: Dataset({
        features: ['input', 'output'],
        num_rows: 200_000
    })
    test: Dataset({
        features: ['input', 'output'],
        num_rows: 40_000
    })
})
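
These splits can be loaded straight from the Hub and tokenized for seq2seq training. A sketch using the input/output columns shown above; the max_length of 256 is an assumption, not a documented value:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("bmd1905/vi-error-correction-2.0")
tokenizer = AutoTokenizer.from_pretrained("vinai/bartpho-syllable")

def preprocess(batch):
    # "input" holds the noisy sentence, "output" the corrected reference.
    model_inputs = tokenizer(batch["input"], max_length=256, truncation=True)
    labels = tokenizer(text_target=batch["output"], max_length=256, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=["input", "output"])
```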

Training procedure

The following bitsandbytes quantization config was used during training:

  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: float16
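
In code, this corresponds to a BitsAndBytesConfig like the one below (a sketch; the original training script is not included in this card):

```python
import torch
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the base model in 4-bit NF4 with the config above.
base = AutoModelForSeq2SeqLM.from_pretrained(
    "vinai/bartpho-syllable",
    quantization_config=bnb_config,
    device_map="auto",
)
```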

Training hyperparameters

The following hyperparameters were used during training:

  • trainable params: 25,165,824 || all params: 326,801,408 || trainable%: 7.700647360735974

    The base model is loaded in 4-bit precision (per the quantization config above). To train it, you need to add adapter modules with the peft library and freeze the base model weights; see the examples at https://github.com/huggingface/peft and the sketch after this list.

  • Num examples = 1,000,000

  • Num Epochs = 5

  • Instantaneous batch size per device = 6

  • Total train batch size (w. parallel, distributed & accumulation) = 144

  • Gradient Accumulation steps = 24

  • Total optimization steps = 27,775

  • Number of trainable parameters = 25,165,824
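
A sketch of how the adapter and trainer could be assembled from these numbers. Only the batch size, gradient accumulation steps, and epoch count come from the list above; the LoRA rank, alpha, dropout, and target modules are undocumented assumptions:

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments

# bnb_config as defined in the quantization snippet above.
base = AutoModelForSeq2SeqLM.from_pretrained(
    "vinai/bartpho-syllable", quantization_config=bnb_config, device_map="auto"
)
base = prepare_model_for_kbit_training(base)

# Hypothetical LoRA settings: rank, alpha, dropout, and target modules
# are NOT stated in this card.
lora_config = LoraConfig(
    r=32,                                 # assumption
    lora_alpha=64,                        # assumption
    lora_dropout=0.05,                    # assumption
    target_modules=["q_proj", "v_proj"],  # assumption
    task_type="SEQ_2_SEQ_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # card reports ~25.2M trainable params

# Values below mirror the hyperparameter list above.
args = Seq2SeqTrainingArguments(
    output_dir="vietnamese-correction-lora-v3",
    per_device_train_batch_size=6,
    gradient_accumulation_steps=24,  # effective batch size 144 on one device
    num_train_epochs=5,
    fp16=True,                       # matches bnb_4bit_compute_dtype=float16
    predict_with_generate=True,
)
```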

Framework versions

  • PEFT 0.4.0
  • Transformers 4.47.0
  • Pytorch 2.5.1+cu121
  • Datasets 3.3.1
  • Tokenizers 0.21.0
