
Qwen3-4B-SFT (STEP100 Checkpoint)

This repository provides the LoRA adapter checkpoint at training step 100,
fine-tuned from:

Qwen/Qwen3-4B-Instruct-2507

It is the best-performing checkpoint selected during training.

⚠️ This repository contains LoRA adapter weights only.
The base model must be loaded separately.


Training Overview

This adapter was trained using:

  • Supervised Fine-Tuning (SFT)
  • QLoRA (4-bit, Unsloth)
  • Output-only loss (Chain-of-Thought masked)
  • Task-aware upsampling

Checkpoint:

  • Selected step: 100
  • Training max steps: 200
  • Best validation performance observed at step 100

Training Objective

The goal is to improve structured output accuracy across:

  • JSON
  • YAML
  • XML
  • TOML
  • CSV

Output-only Supervision

Loss is applied only to the final assistant output.

Intermediate reasoning (Chain-of-Thought) is masked.

Learning begins after output markers such as: `Output:`, `OUTPUT:`, `Final:`, `Answer:`, `Result:`, `Response:`

Configuration:

  • SFT_MASK_COT = 1
  • SFT_OUTPUT_LEARN_MODE = "after_marker"
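The "after_marker" masking can be sketched as follows. This is a hypothetical helper, not the actual training code: labels for tokens that end before the first output marker are set to -100, the ignore index used by cross-entropy loss in Hugging Face trainers.

```python
# Hypothetical sketch of output-only loss masking ("after_marker" mode).
# Tokens before the first output marker get label -100 (ignored by the
# loss); tokens from the marker onward are learned.
MARKERS = ["Output:", "OUTPUT:", "Final:", "Answer:", "Result:", "Response:"]

def mask_before_marker(text: str, labels: list, offsets: list) -> list:
    """Mask every label whose token span ends before the first marker.

    `offsets` are (start, end) character spans per token, as returned by
    a fast tokenizer with return_offsets_mapping=True.
    """
    positions = [text.find(m) for m in MARKERS if m in text]
    if not positions:
        return labels  # no marker found: leave labels unchanged
    cut = min(positions)
    return [-100 if end <= cut else lab
            for lab, (_, end) in zip(labels, offsets)]
```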

Task-aware Upsampling

Upsampling was enabled to rebalance structured transformation tasks.

Applied multipliers:

| Task Type | Multiplier |
|---|---|
| text_to_toml | 1.4 |
| text_to_xml | 1.4 |
| json_to_xml | 1.3 |
| yaml_to_xml | 1.3 |
| csv_to_xml | 1.3 |
| toml_to_xml | 1.3 |

Configuration:

  • SFT_USE_UPSAMPLING = 1

This improves XML-related transformation robustness while maintaining multi-format generalization.
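One common way to realize fractional multipliers like 1.3 or 1.4 is to duplicate each example by the integer part and sample the fractional part. A minimal sketch (illustrative only; the actual upsampling implementation is not part of this repo):

```python
import random

# Multipliers from the table above; unlisted tasks default to 1.0.
MULTIPLIERS = {
    "text_to_toml": 1.4, "text_to_xml": 1.4,
    "json_to_xml": 1.3, "yaml_to_xml": 1.3,
    "csv_to_xml": 1.3, "toml_to_xml": 1.3,
}

def upsample(examples: list, seed: int = 0) -> list:
    """Duplicate each example by its multiplier's integer part;
    the fractional part becomes a sampling probability."""
    rng = random.Random(seed)
    out = []
    for ex in examples:
        m = MULTIPLIERS.get(ex["task_type"], 1.0)
        whole = int(m)
        out.extend([ex] * whole)
        if rng.random() < m - whole:
            out.append(ex)
    return out
```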


Training Configuration

Base Model

Qwen/Qwen3-4B-Instruct-2507

Dataset

u-10bei/structured_data_with_cot_dataset_512_v2

Method

QLoRA (4-bit) via Unsloth

Hyperparameters

| Parameter | Value |
|---|---|
| Max sequence length | 512 |
| Epochs | 1 |
| Max steps | 200 |
| Learning rate | 1.5e-5 |
| Warmup ratio | 0.05 |
| Weight decay | 0.0 |
| Per-device train batch size | 2 |
| Gradient accumulation | 8 |
| Effective batch size | 16 |
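These hyperparameters map onto Hugging Face `TrainingArguments` roughly as follows. This is a sketch, not the actual Unsloth training script; the output path is illustrative, and the max sequence length is applied at tokenization rather than here.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameter table as TrainingArguments.
# Effective batch size = 2 (per device) x 8 (accumulation) = 16.
args = TrainingArguments(
    output_dir="qwen3-4b-sft",          # illustrative path
    max_steps=200,
    learning_rate=1.5e-5,
    warmup_ratio=0.05,
    weight_decay=0.0,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
)
```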

LoRA Configuration

| Parameter | Value |
|---|---|
| r | 32 |
| alpha | 64 |
| dropout | 0.1 |
| target modules | q_proj, k_proj, v_proj, o_proj |
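Expressed with peft's `LoraConfig`, the setup above looks like this (a sketch for reference; the adapter in this repo already encodes these values in `adapter_config.json`):

```python
from peft import LoraConfig

# LoRA configuration matching the table above.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```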

Repository Contents

This repository contains only:

  • adapter_config.json
  • adapter_model.safetensors
  • README.md

This is the STEP100 checkpoint only (final selected model).


Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "Gen-oze/Qwen3-4B-SFT"

tokenizer = AutoTokenizer.from_pretrained(base)

# Load the base model, then attach the LoRA adapter on top.
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()

# Example: generate a structured-output completion (prompt is illustrative).
messages = [{"role": "user", "content": "Convert to JSON: name=Alice, age=30"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```