Qwen3-0.6B Instruct (LoRA)

This is a fine-tuned version of Qwen3-0.6B optimized for instruction following. It serves as the "Instruct" foundation for a subsequent RL-based alignment stage using GRPO (Group Relative Policy Optimization).

Model Details

  • Developed by: MaleekNoob
  • Model Type: Causal Language Model
  • Base Model: Qwen/Qwen3-0.6B
  • Fine-tuning Technique: QLoRA (4-bit)
  • Dataset: banghua/DL-SFT-Dataset
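
QLoRA fine-tuning loads the frozen base model in 4-bit before attaching LoRA adapters. A minimal sketch of such a load with transformers' BitsAndBytesConfig (assumes bitsandbytes is installed; the quantization settings shown are typical QLoRA defaults, not confirmed from this training run):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 quantization with bf16 compute, as in the original QLoRA recipe
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    quantization_config=bnb_config,
    device_map="auto",
)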

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

model_id = "Qwen/Qwen3-0.6B"
adapter_id = "MaleekNoob/qwen3-0.6b-lora-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)  # load from the base repo; adapter repos often omit tokenizer files
base_model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

inputs = tokenizer(
    "User: Explain what a Binary Search Tree is.\nAssistant:",
    return_tensors="pt"
).to(model.device)  # follow device_map placement instead of hard-coding "cuda"

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
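
The raw "User:/Assistant:" prompt above works, but Qwen models are trained on a ChatML-style template, so in practice prefer tokenizer.apply_chat_template. A pure-Python approximation of that format (illustrative only; the authoritative template lives in the tokenizer):

messages = [{"role": "user", "content": "Explain what a Binary Search Tree is."}]

def to_chatml(messages):
    # Rough sketch of Qwen's ChatML-style layout; use
    # tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    # for the exact format.
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # generation prompt
    return "".join(parts)

prompt = to_chatml(messages)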

Training Specifications

The model was trained on a single NVIDIA T4 GPU. The hyperparameters below were chosen to keep training stable and to limit catastrophic forgetting of the base model's capabilities:

  • Learning Rate: 5e-5
  • Batch Size: 4 (Gradient Accumulation: 2, effective batch size 8)
  • Epochs: 1
  • Optimizer: AdamW
  • LoRA Config: r=32, alpha=64, target_modules="all-linear"
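
With r=32 and alpha=64, the adapter's low-rank update is scaled by alpha/r = 2 before being added to the frozen weight. A tiny pure-Python sketch of that arithmetic (2x2 matrices stand in for the real dimensions; values are illustrative):

# Illustrative LoRA forward: W_eff = W + (alpha / r) * (B @ A)
r, alpha = 2, 4
scaling = alpha / r  # 2.0, same ratio as the real r=32, alpha=64

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight
A = [[0.1, 0.2], [0.3, 0.4]]  # trainable down-projection (r x d)
B = [[0.5, 0.0], [0.0, 0.5]]  # trainable up-projection (d x r)

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

delta = matmul(B, A)
W_eff = [[w + scaling * d for w, d in zip(w_row, d_row)]
         for w_row, d_row in zip(W, delta)]
# W_eff is approximately [[1.1, 0.2], [0.3, 1.4]]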

Results

This SFT phase successfully corrected base model hallucinations (e.g., misidentifying world capitals) and established a coherent conversational tone suitable for downstream RL alignment.
