Qwen3-0.6B Instruct (LoRA)

This is a fine-tuned version of Qwen3-0.6B optimized for instruction following. It serves as the "Instruct" foundation for a subsequent RL-based alignment stage using GRPO (Group Relative Policy Optimization).

Model Details

  • Developed by: MaleekNoob
  • Model Type: Causal Language Model
  • Base Model: Qwen/Qwen3-0.6B
  • Fine-tuning Technique: QLoRA (4-bit)
  • Dataset: banghua/DL-SFT-Dataset
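
QLoRA fine-tuning loads the frozen base model in 4-bit before attaching LoRA adapters. A minimal sketch of such a load with transformers' BitsAndBytesConfig (assumes bitsandbytes is installed; the quantization settings shown are typical QLoRA defaults, not confirmed from this training run):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 quantization with bf16 compute, as in the original QLoRA recipe
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    quantization_config=bnb_config,
    device_map="auto",
)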

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

model_id = "Qwen/Qwen3-0.6B"
adapter_id = "MaleekNoob/qwen3-0.6b-lora-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)  # load from the base repo; adapter repos often omit tokenizer files
base_model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

inputs = tokenizer(
    "User: Explain what a Binary Search Tree is.\nAssistant:",
    return_tensors="pt"
).to(model.device)  # follow device_map placement instead of hard-coding "cuda"

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
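
The raw "User:/Assistant:" prompt above works, but Qwen models are trained on a ChatML-style template, so in practice prefer tokenizer.apply_chat_template. A pure-Python approximation of that format (illustrative only; the authoritative template lives in the tokenizer):

messages = [{"role": "user", "content": "Explain what a Binary Search Tree is."}]

def to_chatml(messages):
    # Rough sketch of Qwen's ChatML-style layout; use
    # tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    # for the exact format.
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # generation prompt
    return "".join(parts)

prompt = to_chatml(messages)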

Training Specifications

The model was trained on a single NVIDIA T4 GPU. The hyperparameters below were chosen to keep training stable and to limit catastrophic forgetting of the base model's capabilities:

  • Learning Rate: 5e-5
  • Batch Size: 4 (Gradient Accumulation: 2, effective batch size 8)
  • Epochs: 1
  • Optimizer: AdamW
  • LoRA Config: r=32, alpha=64, target_modules="all-linear"
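
With r=32 and alpha=64, the adapter's low-rank update is scaled by alpha/r = 2 before being added to the frozen weight. A tiny pure-Python sketch of that arithmetic (2x2 matrices stand in for the real dimensions; values are illustrative):

# Illustrative LoRA forward: W_eff = W + (alpha / r) * (B @ A)
r, alpha = 2, 4
scaling = alpha / r  # 2.0, same ratio as the real r=32, alpha=64

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight
A = [[0.1, 0.2], [0.3, 0.4]]  # trainable down-projection (r x d)
B = [[0.5, 0.0], [0.0, 0.5]]  # trainable up-projection (d x r)

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

delta = matmul(B, A)
W_eff = [[w + scaling * d for w, d in zip(w_row, d_row)]
         for w_row, d_row in zip(W, delta)]
# W_eff is approximately [[1.1, 0.2], [0.3, 1.4]]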

Results

This SFT phase successfully corrected base model hallucinations (e.g., misidentifying world capitals) and established a coherent conversational tone suitable for downstream RL alignment.
