Qwen3 0.6B Serbian

This is an experiment to see whether it is possible to "teach" a small model to think/speak Serbian by combining two datasets: PhD theses and Serbian/English translation pairs.

Full fine-tuning was done on a single GPU with 24 GB of VRAM.

Training was limited to 100,000 paragraphs to match the size of the model.
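
The exact training script is not published here, but the metric names in the output below (mean_token_accuracy, entropy, num_tokens) match what TRL's SFTTrainer reports. A minimal sketch of such a run, where the dataset file, column layout, and hyperparameters are illustrative assumptions rather than the author's exact recipe:

```python
# Hypothetical sketch of the continued-pretraining run; dataset file and
# hyperparameters are assumptions, not the author's exact configuration.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Plain-text Serbian corpus with a "text" column of paragraphs (assumed layout).
dataset = load_dataset("json", data_files="serbian_paragraphs.jsonl", split="train")
dataset = dataset.select(range(100_000))  # cap at 100,000 paragraphs

config = SFTConfig(
    output_dir="qwen3-0.6b-sr",
    num_train_epochs=1,               # matches 'epoch': 1.0 below
    per_device_train_batch_size=16,   # consistent with 6,250 steps over 100k samples
    bf16=True,                        # matches the BF16 tensor type
    logging_steps=50,
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B",          # the base model
    args=config,
    train_dataset=dataset,
)
trainer.train()
```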

Training results

TrainOutput(
  global_step=6250,
  training_loss=2.4078646276855467,
  metrics={
    'train_runtime': 6637.2579,
    'train_samples_per_second': 15.066,
    'train_steps_per_second': 0.942,
    'total_flos': 1.685871939134423e+17,
    'train_loss': 2.4078646276855467,
    'entropy': 2.320802170932293,
    'num_tokens': 44336234.0,
    'mean_token_accuracy': 0.5260497442260385,
    'epoch': 1.0
  }
)

Used as the base model for instruct-sr.
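
For reference, a minimal way to try the model via the standard transformers API (generic Hugging Face usage, not instructions from the author; the prompt is just an example). Note that this is a base model, so it continues raw text rather than following instructions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sagicc/Qwen3-0.6B-sr"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Base (non-instruct) model: give it Serbian text and let it continue.
prompt = "Beograd je glavni grad"  # "Belgrade is the capital"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```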

Model size: 0.6B params · Tensor type: BF16 (Safetensors)

Model tree for Sagicc/Qwen3-0.6B-sr

Finetuned from Qwen/Qwen3-0.6B · 1 further model finetuned from this one
