See axolotl config

axolotl version: 0.12.2

base_model: Qwen/Qwen3-0.6B
trust_remote_code: true
strict: false

chat_template: qwen3

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

datasets:
  - path: ./Dataset/dataset_bpln.jsonl
    type: chat_template
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value
    roles:
      user: ["human"]
      assistant: ["gpt"]
      system: ["system"]

dataset_prepared_path: ./process
val_set_size: 0.01
output_dir: ./outputs/out

sequence_len: 256
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project: BPLN
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj

load_in_8bit: false
load_in_4bit: false

gradient_accumulation_steps: 4
micro_batch_size: 4
num_epochs: 3

optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 8e-5
weight_decay: 0.0

warmup_ratio: 0.05

bf16: true
fp16: false
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

flash_attention: false

logging_steps: 1
evals_per_epoch: 1
saves_per_epoch: 1
save_total_limit: 2

special_tokens:
  eos_token: "<|im_end|>"

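The `datasets` entry in the config describes ShareGPT-style records: messages live under `conversations`, each turn stores its role in `from` and its text in `value`, and the source role names `human`/`gpt`/`system` are mapped to `user`/`assistant`/`system`. The sketch below only illustrates the effect of those settings; it is not Axolotl's implementation, and the example record is hypothetical because the dataset contents are not shown in this card.

```python
# Illustrative only: mimics the effect of the dataset settings in the config,
# not Axolotl's internal code.
PROPERTY_MAPPINGS = {"role": "from", "content": "value"}
ROLE_ALIASES = {"user": ["human"], "assistant": ["gpt"], "system": ["system"]}


def normalize_record(record, field_messages="conversations"):
    """Convert one ShareGPT-style record into role/content messages."""
    # Invert the alias table: source role name -> canonical role name.
    alias_to_role = {
        alias: role for role, aliases in ROLE_ALIASES.items() for alias in aliases
    }
    messages = []
    for turn in record[field_messages]:
        source_role = turn[PROPERTY_MAPPINGS["role"]]
        messages.append(
            {
                "role": alias_to_role[source_role],
                "content": turn[PROPERTY_MAPPINGS["content"]],
            }
        )
    return messages


# Hypothetical record; the real dataset is not part of this card.
example = {
    "conversations": [
        {"from": "system", "value": "You are a helpful assistant."},
        {"from": "human", "value": "Hello"},
        {"from": "gpt", "value": "Hi, how can I help?"},
    ]
}
normalized = normalize_record(example)
print(normalized)
```

After this step each message has the `role`/`content` shape that a chat template such as `qwen3` expects.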
outputs/out

This model is a LoRA fine-tune of Qwen/Qwen3-0.6B on the local ./Dataset/dataset_bpln.jsonl dataset. It achieves the following results on the evaluation set:

Loss: 1.5580
Max memory active (GiB): 1.44
Max memory allocated (GiB): 1.44
Device memory reserved (GiB): 1.49

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 16
optimizer: adamw_torch (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 4
training_steps: 81
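The derived values in this list follow from the configured ones by simple arithmetic. The sketch below reproduces them under two assumptions that the card does not state: training ran on a single device, and the warmup step count is obtained by truncating `warmup_ratio * training_steps`.

```python
# Values taken from the config and the hyperparameter list.
micro_batch_size = 4
gradient_accumulation_steps = 4
num_epochs = 3
training_steps = 81
warmup_ratio = 0.05

# Assumption: one device. The card does not report the device count.
num_devices = 1

# Effective batch size per optimizer step.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices

# Assumption: truncation. 0.05 * 81 = 4.05, which truncates to the reported 4.
warmup_steps = int(warmup_ratio * training_steps)

# Optimizer steps per epoch, matching the evaluations at steps 27, 54, and 81.
steps_per_epoch = training_steps // num_epochs

print(total_train_batch_size, warmup_steps, steps_per_epoch)
```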

Training results

Training Loss	Epoch	Step	Validation Loss	Mem Active (GiB)	Mem Allocated (GiB)	Mem Reserved (GiB)
No log	0	0	1.8057	1.17	1.17	1.19
1.7214	0.9818	27	1.5839	1.44	1.44	1.49
1.7178	1.9455	54	1.5639	1.44	1.44	1.49
1.6562	2.9091	81	1.5580	1.44	1.44	1.49
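To make the validation losses in the table easier to compare, the following sketch converts them to perplexity and reports the change between evaluations. It assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card.

```python
import math

# (step, validation loss) pairs copied from the training results table.
results = [(0, 1.8057), (27, 1.5839), (54, 1.5639), (81, 1.5580)]

perplexities = []
previous_loss = None
for step, val_loss in results:
    # Assumption: loss is mean per-token cross-entropy in nats.
    perplexity = math.exp(val_loss)
    perplexities.append(perplexity)
    delta = "" if previous_loss is None else f", change {val_loss - previous_loss:+.4f}"
    print(f"step {step:>2}: loss {val_loss:.4f}, perplexity {perplexity:.2f}{delta}")
    previous_loss = val_loss

# Overall reduction in validation loss from step 0 to step 81.
total_drop = results[0][1] - results[-1][1]
print(f"total loss reduction: {total_drop:.4f}")
```

Most of the improvement occurs in the first epoch; the later two epochs change the validation loss by much smaller amounts.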

Framework versions

PEFT 0.17.0
Transformers 4.55.2
Pytorch 2.5.1+cu121
Datasets 4.0.0
Tokenizers 0.21.4
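With the frameworks listed above, the adapter could be loaded roughly as follows. This is a minimal sketch, not an officially documented usage: it assumes the repository `LucasMota10/qwen3_0.6B_BPLN` holds a PEFT LoRA adapter on top of `Qwen/Qwen3-0.6B`, and the helper names `load_finetuned` and `generate_reply` are hypothetical.

```python
def load_finetuned(
    adapter_id="LucasMota10/qwen3_0.6B_BPLN",  # assumption: repo holds a PEFT adapter
    base_id="Qwen/Qwen3-0.6B",
):
    """Load the base model in bf16 and attach the LoRA adapter."""
    # Local imports keep this sketch importable without the libraries installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model


def generate_reply(tokenizer, model, user_text, max_new_tokens=64):
    """Format a single user turn with the chat template and generate a reply."""
    messages = [{"role": "user", "content": user_text}]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    )
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is decoded.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True)
```

Calling `tokenizer, model = load_finetuned()` followed by `generate_reply(tokenizer, model, "Hello")` downloads the weights on first use, so it needs network access and the `transformers`, `peft`, and `torch` packages.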

Safetensors

Model size: 0.6B params
Tensor type: BF16

Model tree for LucasMota10/qwen3_0.6B_BPLN

Base model: Qwen/Qwen3-0.6B-Base
Finetuned: Qwen/Qwen3-0.6B
Adapter: this model