---
library_name: transformers
base_model: minpeter/pretrained-tiny-ko
tags:
- axolotl
- generated_from_trainer
datasets:
- lemon-mint/Korean-FineTome-100k
- lemon-mint/smol-koreantalk
- heegyu/open-korean-instructions-v20231020
- FreedomIntelligence/evol-instruct-korean
- FreedomIntelligence/alpaca-gpt4-korean
- FreedomIntelligence/sharegpt-korean
- coastral/korean-writing-style-instruct
- devngho/korean-instruction-mix
model-index:
- name: tiny-ko-sft
  results: []
---

<details><summary>See axolotl config</summary>

axolotl version: `0.10.0.dev0`

```yaml
base_model: minpeter/pretrained-tiny-ko
hub_model_id: minpeter/tiny-ko-sft
output_dir: ./outputs/tiny-ko-sft
wandb_project: "axolotl"
wandb_entity: "kasfiekfs-e"
chat_template: chatml
datasets:
  - path: lemon-mint/Korean-FineTome-100k
    type: chat_template
    split: train[:10%]
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
  - path: lemon-mint/smol-koreantalk
    type: chat_template
    split: train[:10%]
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
  - path: heegyu/open-korean-instructions-v20231020
    type: chat_template
    split: train[:10%]
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value
    roles:
      user: ["human", "user"]
      assistant: ["gpt", "assistant", "bot"]
      system: ["system", "input"]

  # NOTE: https://github.com/FreedomIntelligence/MultilingualSIFT
  - path: FreedomIntelligence/evol-instruct-korean
    type: chat_template
    split: train[:10%]
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value
  - path: FreedomIntelligence/alpaca-gpt4-korean
    type: chat_template
    split: train[:10%]
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value
  - path: FreedomIntelligence/sharegpt-korean
    type: chat_template
    split: train[:10%]
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value
  - path: coastral/korean-writing-style-instruct
    type: chat_template
    split: train[:10%]
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value
  - path: devngho/korean-instruction-mix
    type: chat_template
    split: train[:10%]
    field_messages: messages
    message_property_mappings:
      role: from
      content: value
dataset_prepared_path: last_run_prepared
val_set_size: 0.05
save_steps: 200
warmup_steps: 20
eval_steps: 200
sequence_len: 2048
# <<<< experimental settings <<<<
sample_packing: false
train_on_inputs: true
# >>>> experimental settings >>>>
pad_to_sequence_len: true
gradient_accumulation_steps: 4
micro_batch_size: 16
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-3
bf16: auto
tf32: false
added_tokens_overrides:
  128001: "<|im_end|>"
  128002: "<|im_start|>"
special_tokens:
  bos_token: <|begin_of_text|>
  eos_token: <|im_end|>
  pad_token: <|im_end|>

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
resume_from_checkpoint:
logging_steps: 1
flash_attention: true
num_epochs: 3
weight_decay: 0.0
```

</details>

# tiny-ko-sft
This model is a fine-tuned version of [minpeter/pretrained-tiny-ko](https://huggingface.co/minpeter/pretrained-tiny-ko) on the following datasets:

- [lemon-mint/Korean-FineTome-100k](https://huggingface.co/datasets/lemon-mint/Korean-FineTome-100k)
- [lemon-mint/smol-koreantalk](https://huggingface.co/datasets/lemon-mint/smol-koreantalk)
- [heegyu/open-korean-instructions-v20231020](https://huggingface.co/datasets/heegyu/open-korean-instructions-v20231020)
- [FreedomIntelligence/evol-instruct-korean](https://huggingface.co/datasets/FreedomIntelligence/evol-instruct-korean)
- [FreedomIntelligence/alpaca-gpt4-korean](https://huggingface.co/datasets/FreedomIntelligence/alpaca-gpt4-korean)
- [FreedomIntelligence/sharegpt-korean](https://huggingface.co/datasets/FreedomIntelligence/sharegpt-korean)
- [coastral/korean-writing-style-instruct](https://huggingface.co/datasets/coastral/korean-writing-style-instruct)
- [devngho/korean-instruction-mix](https://huggingface.co/datasets/devngho/korean-instruction-mix)

It achieves the following results on the evaluation set:
- Loss: 1.4059
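
Because the model was tuned with the ChatML template and `<|im_end|>` as the EOS token (see the config above), it can be queried through `transformers` as sketched below. This is a minimal example, assuming the tokenizer on the Hub carries the chat template; the Korean prompt is purely illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "minpeter/tiny-ko-sft"  # hub_model_id from the config above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a ChatML prompt; add_generation_prompt appends the assistant
# header so the model completes a reply rather than the user turn.
messages = [{"role": "user", "content": "대한민국의 수도는 어디인가요?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output_ids = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7
)
# Decode only the newly generated tokens, skipping <|im_end|> etc.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```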
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
Each of the eight datasets listed above was subsampled to the first 10% of its `train` split (`split: train[:10%]`) and concatenated; 5% of the combined data was held out for evaluation (`val_set_size: 0.05`).
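
The datasets arrive in two schemas: `messages` lists keyed by `role`/`content`, and ShareGPT-style `conversations` lists keyed by `from`/`value`. The config's `message_property_mappings` and `roles` options normalize both into ChatML turns. The sketch below shows the equivalent transformation in plain Python; it mirrors the `roles` block in the config and is not axolotl's actual implementation, and the function name and example record are illustrative.

```python
# Role aliases copied from the roles: block in the config above.
ROLE_MAP = {
    "human": "user", "user": "user",
    "gpt": "assistant", "assistant": "assistant", "bot": "assistant",
    "system": "system", "input": "system",
}

def to_chatml_messages(record: dict) -> list[dict]:
    """Normalize a ShareGPT-style record into role/content messages."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in record["conversations"]
    ]

example = {"conversations": [
    {"from": "human", "value": "대한민국의 수도는 어디인가요?"},
    {"from": "gpt", "value": "대한민국의 수도는 서울입니다."},
]}
print(to_chatml_messages(example))
# [{'role': 'user', 'content': '...'}, {'role': 'assistant', 'content': '...'}]
```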
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 32
- optimizer: 8-bit PagedAdamW (`paged_adamw_8bit`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 20
- training_steps: 2972
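
The derived batch figures follow directly from the config; a quick sanity check of the arithmetic (the device count is the one reported above):

```python
# Values from the config and the hyperparameter list above.
micro_batch_size = 16
gradient_accumulation_steps = 4
num_devices = 2

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 128  # matches the reported value

# Evaluation applies no gradient accumulation: 16 per device x 2 devices.
assert micro_batch_size * num_devices == 32

# 2972 total optimizer steps over 3 epochs -> ~990 steps per epoch,
# consistent with the results table (step 200 at roughly epoch 0.20).
print(2972 / 3)  # ~990.7
```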
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 3.9539 | 0.0010 | 1 | 3.9757 |
| 1.6999 | 0.2019 | 200 | 1.6884 |
| 1.6123 | 0.4037 | 400 | 1.6288 |
| 1.5387 | 0.6056 | 600 | 1.5876 |
| 1.5681 | 0.8075 | 800 | 1.5429 |
| 1.3066 | 1.0091 | 1000 | 1.5208 |
| 1.395 | 1.2110 | 1200 | 1.5007 |
| 1.3474 | 1.4128 | 1400 | 1.4699 |
| 1.3025 | 1.6147 | 1600 | 1.4383 |
| 1.2566 | 1.8166 | 1800 | 1.4117 |
| 1.1672 | 2.0182 | 2000 | 1.4227 |
| 1.1267 | 2.2200 | 2200 | 1.4141 |
| 1.0195 | 2.4219 | 2400 | 1.4098 |
| 1.084 | 2.6238 | 2600 | 1.4063 |
| 1.1254 | 2.8256 | 2800 | 1.4059 |
### Framework versions
- Transformers 4.52.3
- Pytorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1