Built with Axolotl

See axolotl config

axolotl version: 0.13.0.dev0

base_model: Qwen/Qwen3-4B-Instruct-2507
# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
strict: false

chat_template: qwen3
datasets:
  - path: fischkas09/TPDB_conversations
    type: chat_template
    split: train
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
val_set_size: 0.15
output_dir: /workspace/axolotl/peptide_chat/model
dataset_prepared_path: /workspace/axolotl/peptide_chat/prepared_data

sequence_len: 4096
sample_packing: true
eval_sample_packing: true

load_in_4bit: true
adapter: qlora
lora_r: 8
lora_alpha: 16
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - down_proj
  - up_proj
lora_mlp_kernel: true
lora_qkv_kernel: true
lora_o_kernel: true

use_wandb: true
wandb_name: qwen3-qlora-peptide-chat
wandb_project: peptide-chat
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_torch_4bit
lr_scheduler: cosine
learning_rate: 0.0001

bf16: auto
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
auto_resume_from_checkpoints: true
logging_steps: 1
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.0
special_tokens:

# save_first_step: true  # uncomment this to validate checkpoint saving works with your config

workspace/axolotl/peptide_chat/model

This model is a fine-tuned version of Qwen/Qwen3-4B-Instruct-2507 on the fischkas09/TPDB_conversations dataset. It achieves the following results on the evaluation set:

  • Loss: 4.3237
  • Ppl: 75.4694
  • Memory/max Active (gib): 3.33
  • Memory/max Allocated (gib): 3.33
  • Memory/device Reserved (gib): 6.3

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: Use OptimizerNames.ADAMW_TORCH_4BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 20
  • training_steps: 204

Training results

Training Loss Epoch Step Validation Loss Ppl Active (gib) Allocated (gib) Reserved (gib)
No log 0 0 7.5089 1824.1217 3.31 3.31 5.3
6.1255 0.2468 17 5.8785 357.2578 3.33 3.33 6.3
4.8152 0.4936 34 4.8201 123.9785 3.33 3.33 6.3
4.6569 0.7405 51 4.6556 105.1744 3.33 3.33 6.31
4.5626 0.9873 68 4.5607 95.6505 3.33 3.33 6.3
4.4898 1.2323 85 4.4996 89.9799 3.33 3.33 6.3
4.4592 1.4791 102 4.4429 85.0226 3.33 3.33 6.3
4.4193 1.7260 119 4.3994 81.3985 3.33 3.33 6.3
4.3905 1.9728 136 4.3698 79.0246 3.33 3.33 6.3
4.2897 2.2178 153 4.3452 77.1110 3.33 3.33 6.3
4.318 2.4646 170 4.3318 76.0812 3.33 3.33 6.3
4.3452 2.7114 187 4.3248 75.5490 3.33 3.33 6.3
4.3629 2.9583 204 4.3237 75.4694 3.33 3.33 6.3

Framework versions

  • PEFT 0.18.1
  • Transformers 4.57.6
  • Pytorch 2.9.1+cu128
  • Datasets 4.5.0
  • Tokenizers 0.22.2
Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for fischkas09/peptide-chat-qwen3

Adapter
(5501)
this model

Dataset used to train fischkas09/peptide-chat-qwen3