# Qwen3-0.6B Instruct (LoRA)
This is a fine-tuned version of Qwen3-0.6B optimized for instruction following. It serves as the "Instruct" foundation for further RL-based alignment (GRPO).
## Model Details
- Developed by: MaleekNoob
- Model Type: Causal Language Model
- Base Model: Qwen/Qwen3-0.6B
- Fine-tuning Technique: QLoRA (4-bit)
- Dataset: banghua/DL-SFT-Dataset
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

model_id = "Qwen/Qwen3-0.6B"
adapter_id = "MaleekNoob/qwen3-0.6b-lora-v1"

# The tokenizer is loaded from the adapter repo so any added tokens are picked up.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base_model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

inputs = tokenizer(
    "User: Explain what a Binary Search Tree is.\nAssistant:",
    return_tensors="pt",
).to(model.device)  # use the model's device rather than hard-coding "cuda"

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
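The raw `User: ... Assistant:` prompt above works, but Qwen-family models ship with a ChatML-style chat template, normally applied via `tokenizer.apply_chat_template`. The sketch below illustrates the general shape of such a prompt; the literal control tokens and the helper function are illustrative assumptions, not the model's exact template.

```python
# Simplified sketch of a ChatML-style prompt. In practice, prefer
# tokenizer.apply_chat_template(messages, tokenize=False,
# add_generation_prompt=True); the tokens below are illustrative.
def build_chatml_prompt(messages):
    """Render a list of {role, content} dicts into a ChatML-style string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # generation prompt for the reply
    return "".join(parts)

prompt = build_chatml_prompt(
    [{"role": "user", "content": "Explain what a Binary Search Tree is."}]
)
print(prompt)
```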
## Training Specifications
The model was trained on a single NVIDIA T4 GPU with the following hyperparameters to ensure stability and prevent catastrophic forgetting:
- Learning Rate: 5e-5
- Batch Size: 4 (Gradient Accumulation: 2)
- Epochs: 1
- Optimizer: AdamW
- LoRA Config: r=32, alpha=64, target_modules="all-linear"
## Results
This SFT phase successfully corrected base model hallucinations (e.g., misidentifying world capitals) and established a coherent conversational tone suitable for downstream RL alignment.