qwen2_5_omni_text_image_half

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7678
  • Token Acc: 0.7693

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • total_eval_batch_size: 8
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.95) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 3.0
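
The effective batch size and warmup length follow from these settings. A minimal sketch in pure Python (no Transformers dependency); the total step count of 357 is taken from the final row of the results table below, and rounding warmup up with `ceil` is an assumption about the scheduler's behavior:

```python
import math

# Per-device settings from the hyperparameter list above.
train_batch_size = 2            # per device
gradient_accumulation_steps = 4
num_devices = 8

# Effective (total) train batch size.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # → 64

# Warmup length for the cosine schedule: 5% of total optimizer steps.
# 357 total steps comes from the final row of the results table.
total_steps = 357
warmup_steps = math.ceil(0.05 * total_steps)
print(warmup_steps)  # → 18
```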

Training results

| Training Loss | Epoch  | Step | Validation Loss | Token Acc |
|:-------------:|:------:|:----:|:---------------:|:---------:|
| 0.9669        | 0.4211 | 50   | 0.9984          | 0.7193    |
| 0.9312        | 0.8421 | 100  | 0.9554          | 0.7274    |
| 0.7948        | 1.2611 | 150  | 0.9151          | 0.7375    |
| 0.7381        | 1.6821 | 200  | 0.8547          | 0.7479    |
| 0.6022        | 2.1011 | 250  | 0.8082          | 0.7591    |
| 0.6003        | 2.5221 | 300  | 0.7770          | 0.7660    |
| 0.5722        | 2.9432 | 350  | 0.7678          | 0.7692    |
| 0.5801        | 3.0    | 357  | 0.7678          | 0.7693    |
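
Token Acc here is presumably the fraction of (non-padding) tokens whose predicted id matches the label. A minimal sketch of that metric; the `ignore_index=-100` convention for masked label positions is an assumption:

```python
def token_accuracy(pred_ids, label_ids, ignore_index=-100):
    """Fraction of non-ignored positions where the predicted token id
    matches the label id."""
    pairs = [(p, l) for p, l in zip(pred_ids, label_ids) if l != ignore_index]
    correct = sum(p == l for p, l in pairs)
    return correct / len(pairs)

# Three positions; the last is padding and excluded from the denominator.
preds = [17, 42, 99]
labels = [17, 42, -100]
print(token_accuracy(preds, labels))  # → 1.0
```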

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.22.1
Model weights

  • Format: Safetensors
  • Size: 9B params
  • Tensor type: BF16