Qwen3 Serbian
This is an experiment in adapting Qwen3 models for Serbian.
This experiment tests whether it is possible to teach a small model to think and speak Serbian by combining two datasets: PhD theses and Serbian/English translation pairs.
Full-parameter training was run on a single GPU with 24 GB of VRAM.
The corpus was limited to 100,000 paragraphs to match the size of the model.
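The data preparation described above — two corpora merged and capped at 100,000 paragraphs — could be sketched as follows. The function and dataset names are illustrative placeholders, not the exact pipeline used for this model:

```python
import itertools
import random

def build_corpus(theses, translation_pairs, cap=100_000, seed=42):
    """Merge two paragraph sources and cap the combined corpus.

    `theses` is an iterable of Serbian paragraphs; `translation_pairs`
    is an iterable of (english, serbian) tuples. Both are hypothetical
    stand-ins for the actual datasets, which are not specified here.
    """
    # Keep each translation pair as one bilingual line so the model
    # sees the English and Serbian text side by side.
    pair_lines = (f"{en}\n{sr}" for en, sr in translation_pairs)
    merged = list(itertools.chain(theses, pair_lines))
    random.Random(seed).shuffle(merged)  # mix the two sources evenly
    return merged[:cap]                  # cap to fit the GPU/model budget

corpus = build_corpus(
    theses=[f"Pasus {i} iz doktorske teze." for i in range(60_000)],
    translation_pairs=[(f"Sentence {i}.", f"Rečenica {i}.") for i in range(60_000)],
)
print(len(corpus))  # 100000
```

Shuffling before truncation keeps the cap from dropping one source entirely; with a fixed seed the selection stays reproducible between runs.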
TrainOutput(
global_step=6250,
training_loss=2.4078646276855467,
metrics={
'train_runtime': 6637.2579,
'train_samples_per_second': 15.066,
'train_steps_per_second': 0.942,
'total_flos': 1.685871939134423e+17,
'train_loss': 2.4078646276855467,
'entropy': 2.320802170932293,
'num_tokens': 44336234.0,
'mean_token_accuracy': 0.5260497442260385,
'epoch': 1.0
}
)
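As a sanity check on the numbers reported above (not part of the original card), the mean cross-entropy loss implies a perplexity of roughly e^2.408 ≈ 11, and the token count divided by the runtime gives the overall training throughput:

```python
import math

train_loss = 2.4078646276855467   # mean cross-entropy (nats) from TrainOutput
num_tokens = 44_336_234           # total tokens seen in the epoch
runtime_s = 6637.2579             # train_runtime in seconds

perplexity = math.exp(train_loss)         # ppl = exp(mean cross-entropy)
tokens_per_sec = num_tokens / runtime_s   # end-to-end training throughput

print(f"perplexity ≈ {perplexity:.1f}")                # ≈ 11.1
print(f"throughput ≈ {tokens_per_sec:,.0f} tokens/s")  # ≈ 6,680
```

A perplexity around 11 after a single epoch on 44M tokens is plausible for continued pretraining of a small model on a new language.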
Used as the base model for instruct-sr.