
Qwen3-4B-SFT (STEP100 Checkpoint)

This repository provides the LoRA adapter checkpoint at training step 100,
fine-tuned from:

Qwen/Qwen3-4B-Instruct-2507

It is the best-performing checkpoint selected during training.

⚠️ This repository contains LoRA adapter weights only.
The base model must be loaded separately.


Training Overview

This adapter was trained using:

  • Supervised Fine-Tuning (SFT)
  • QLoRA (4-bit, Unsloth)
  • Output-only loss (Chain-of-Thought masked)
  • Task-aware upsampling

Checkpoint:

  • Selected step: 100
  • Training max steps: 200
  • Best validation performance observed at step 100

Training Objective

The goal is to improve structured output accuracy across:

  • JSON
  • YAML
  • XML
  • TOML
  • CSV

Output-only Supervision

Loss is applied only to the final assistant output.

Intermediate reasoning (Chain-of-Thought) is masked.

Learning begins after output markers such as: `Output:`, `OUTPUT:`, `Final:`, `Answer:`, `Result:`, `Response:`

Configuration:

  • SFT_MASK_COT = 1
  • SFT_OUTPUT_LEARN_MODE = "after_marker"
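The "after_marker" masking can be sketched as follows. This is a hypothetical helper, not the actual training code: labels for tokens that end before the first output marker are set to -100, the ignore index used by cross-entropy loss in Hugging Face trainers.

```python
# Hypothetical sketch of output-only loss masking ("after_marker" mode).
# Tokens before the first output marker get label -100 (ignored by the
# loss); tokens from the marker onward are learned.
MARKERS = ["Output:", "OUTPUT:", "Final:", "Answer:", "Result:", "Response:"]

def mask_before_marker(text: str, labels: list, offsets: list) -> list:
    """Mask every label whose token span ends before the first marker.

    `offsets` are (start, end) character spans per token, as returned by
    a fast tokenizer with return_offsets_mapping=True.
    """
    positions = [text.find(m) for m in MARKERS if m in text]
    if not positions:
        return labels  # no marker found: leave labels unchanged
    cut = min(positions)
    return [-100 if end <= cut else lab
            for lab, (_, end) in zip(labels, offsets)]
```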

Task-aware Upsampling

Upsampling was enabled to rebalance structured transformation tasks.

Applied multipliers:

| Task Type | Multiplier |
|---|---|
| text_to_toml | 1.4 |
| text_to_xml | 1.4 |
| json_to_xml | 1.3 |
| yaml_to_xml | 1.3 |
| csv_to_xml | 1.3 |
| toml_to_xml | 1.3 |

Configuration:

  • SFT_USE_UPSAMPLING = 1

This improves XML-related transformation robustness while maintaining multi-format generalization.
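One common way to realize fractional multipliers like 1.3 or 1.4 is to duplicate each example by the integer part and sample the fractional part. A minimal sketch (illustrative only; the actual upsampling implementation is not part of this repo):

```python
import random

# Multipliers from the table above; unlisted tasks default to 1.0.
MULTIPLIERS = {
    "text_to_toml": 1.4, "text_to_xml": 1.4,
    "json_to_xml": 1.3, "yaml_to_xml": 1.3,
    "csv_to_xml": 1.3, "toml_to_xml": 1.3,
}

def upsample(examples: list, seed: int = 0) -> list:
    """Duplicate each example by its multiplier's integer part;
    the fractional part becomes a sampling probability."""
    rng = random.Random(seed)
    out = []
    for ex in examples:
        m = MULTIPLIERS.get(ex["task_type"], 1.0)
        whole = int(m)
        out.extend([ex] * whole)
        if rng.random() < m - whole:
            out.append(ex)
    return out
```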


Training Configuration

Base Model

Qwen/Qwen3-4B-Instruct-2507

Dataset

u-10bei/structured_data_with_cot_dataset_512_v2

Method

QLoRA (4-bit) via Unsloth

Hyperparameters

| Parameter | Value |
|---|---|
| Max sequence length | 512 |
| Epochs | 1 |
| Max steps | 200 |
| Learning rate | 1.5e-5 |
| Warmup ratio | 0.05 |
| Weight decay | 0.0 |
| Per-device train batch size | 2 |
| Gradient accumulation | 8 |
| Effective batch size | 16 |
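These hyperparameters map onto Hugging Face `TrainingArguments` roughly as follows. This is a sketch, not the actual Unsloth training script; the output path is illustrative, and the max sequence length is applied at tokenization rather than here.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameter table as TrainingArguments.
# Effective batch size = 2 (per device) x 8 (accumulation) = 16.
args = TrainingArguments(
    output_dir="qwen3-4b-sft",          # illustrative path
    max_steps=200,
    learning_rate=1.5e-5,
    warmup_ratio=0.05,
    weight_decay=0.0,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
)
```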

LoRA Configuration

| Parameter | Value |
|---|---|
| r | 32 |
| alpha | 64 |
| dropout | 0.1 |
| target modules | q_proj, k_proj, v_proj, o_proj |
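Expressed with peft's `LoraConfig`, the setup above looks like this (a sketch for reference; the adapter in this repo already encodes these values in `adapter_config.json`):

```python
from peft import LoraConfig

# LoRA configuration matching the table above.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```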

Repository Contents

This repository contains only:

  • adapter_config.json
  • adapter_model.safetensors
  • README.md

This is the STEP100 checkpoint only (final selected model).


Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "Gen-oze/Qwen3-4B-SFT"

tokenizer = AutoTokenizer.from_pretrained(base)

# Load the base model, then attach the LoRA adapter on top.
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()

# Example: generate a structured-output completion (prompt is illustrative).
messages = [{"role": "user", "content": "Convert to JSON: name=Alice, age=30"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```