Qwen3-0.6B SR Instruct

🇷🇸 Srpski

Opis Modela

Ovaj model je specijalizovana verzija Qwen3-0.6B, adaptirana (fine-tuned) za srpski jezik. Model je prošao kroz dve faze treninga:

  1. Osnovni model: Trening na visokokvalitetnim naučnim tekstovima.
  2. Instruct model: Podešavanje za praćenje uputstava (ChatML format) koristeći specifične setove podataka.

Karakteristike

  • Veličina: 0.6 milijardi parametara (izuzetno brz na manjim karticama).
  • Format: ChatML (<|im_start|>, <|im_end|>).

Preporučeni parametri za Inference

Za najbolju gramatiku i logiku, preporučuje se Beam Search:

# num_beams=5, do_sample=False, no_repeat_ngram_size=3

🇬🇧 English

Model Description

Qwen3-0.6B-SR-Instruct is a specialized, lightweight language model fine-tuned for the Serbian language.

The model underwent a two-stage training process:

  1. Base Model: Training on academic and scientific corpora.
  2. Instruction Tuning: Refined using specialized instruction sets in ChatML format to ensure professional and context-aware responses.

Key Features

  • Compact & Efficient: At 0.6B parameters, it offers high-speed inference even on consumer-grade GPUs.
  • Formatting: Uses ChatML template for clean conversational flow.
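Since the model expects raw ChatML, a single-turn prompt can be assembled by hand; a minimal sketch (the helper name `chatml_prompt` is illustrative, not part of the model's API):

```python
def chatml_prompt(user_msg, system_msg=None):
    """Assemble a single-turn ChatML prompt ending with an open assistant tag."""
    parts = []
    if system_msg:
        # Optional system turn to steer the model's behavior.
        parts.append(f"<|im_start|>system\n{system_msg}<|im_end|>\n")
    parts.append(f"<|im_start|>user\n{user_msg}<|im_end|>\n")
    # Leave the assistant tag open so generation continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = chatml_prompt("Objasni važnost digitalizacije arhiva.")
```

The Quick Start section below builds the same prompt inline as a single string.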

Usage & Inference Settings

To achieve optimal grammatical correctness in Serbian, we recommend using Beam Search over random sampling:

  • Beam Count: 5 (num_beams=5)
  • Sampling: Disabled (do_sample=False)
  • Repetition Penalty: 1.1 (repetition_penalty=1.1)
  • N-gram Blocking: no_repeat_ngram_size=3
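Collected as keyword arguments for a `transformers` text-generation pipeline (a sketch; the repetition penalty and n-gram blocking can be combined or used separately):

```python
# Recommended decoding settings for grammatically correct Serbian output.
GENERATION_KWARGS = {
    "num_beams": 5,             # beam search instead of random sampling
    "do_sample": False,         # deterministic decoding
    "repetition_penalty": 1.1,  # mild penalty against loops
    "no_repeat_ngram_size": 3,  # block verbatim 3-gram repeats
    "max_new_tokens": 300,
}

# Usage, assuming a text-generation pipeline `pipe` as in the Quick Start:
# output = pipe(prompt, **GENERATION_KWARGS)
```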

📈 Training Progress

The model was trained using a structured Instruct Tuning approach. The chart below visualizes the loss reduction over 1500 steps, showing a smooth convergence to a final loss of 1.2655.

  • Training Loss: Consistent decline, indicating effective learning of the Serbian instruction set.

  • Final Loss: 1.2655 after 1500 steps, consistent with stable convergence on the instruction set.

  • Hardware: Optimized for a single GPU with 24 GB of VRAM.

Training Graph

⚠️ Limitations (Ograničenja)

SR: S obzirom na veličinu od 0.6B parametara, model može imati sledeća ograničenja:

  • Halucinacije: Pri visokim temperaturama (sampling), model može generisati fiktivne podatke. Uvek koristite Beam Search za kritične informacije.

  • Sugestivnost: Model teži da se složi sa korisnikom čak i ako je tvrdnja netačna (kao što smo videli u testu sa "spaljivanjem"). Koristite stroge System Prompte.

EN: Given the 0.6B parameter scale, the following limitations apply:

  • Hallucinations: High temperature settings may lead to factual errors. Use Beam Search for accuracy.

  • Compliance Bias: The model might follow incorrect user premises. Use strong System Instructions to anchor the model's logic.
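One way to anchor the model is to prepend a system turn in the ChatML template; a minimal sketch (the system and user texts are illustrative examples, not from the training set):

```python
# System instruction telling the model to correct false premises
# instead of agreeing with them.
system = (
    "Ti si precizan asistent. Ako je tvrdnja korisnika netačna, "
    "ispravi je umesto da se složiš."
)
user = "Da li je Nikola Tesla rođen u Beogradu?"

# ChatML prompt with a system turn, a user turn, and an open assistant tag.
prompt = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\n{user}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)
```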

🛠️ Quick Start / Kako koristiti

from transformers import pipeline
import torch

# Repository id on the Hugging Face Hub.
model_id = "Sagicc/Qwen3-0.6B-sr-Instruct"

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,  # BF16 weights
    device_map="auto",
    trust_remote_code=True,
)

# ChatML prompt: a user turn followed by an open assistant tag.
prompt = "<|im_start|>user\nObjasni važnost digitalizacije arhiva.<|im_end|>\n<|im_start|>assistant\n"

# Beam search with sampling disabled, as recommended above.
output = pipe(prompt, max_new_tokens=300, num_beams=5, do_sample=False, no_repeat_ngram_size=3)
print(output[0]["generated_text"])

Dataset copyright Nikola Janković, 2025; licensed under the Creative Commons Attribution-NonCommercial 2.0 Generic (CC BY-NC 2.0) license.
