
Qwen3-4B-Upsample_multi5

This repository provides a LoRA adapter fine-tuned from:

Qwen/Qwen3-4B-Instruct-2507

using QLoRA (4-bit, Unsloth) with:

  • Structured output–focused Supervised Fine-Tuning (SFT)
  • Chain-of-Thought (CoT) masked loss
  • Task-aware upsampling

⚠️ This repository contains LoRA adapter weights only.
The base model must be loaded separately.


Training Objective

This adapter is trained to improve structured output accuracy across:

  • JSON
  • YAML
  • XML
  • TOML
  • CSV

Output-only supervised loss

Loss is applied only to the final assistant output.

  • Chain-of-Thought reasoning is masked.
  • Learning starts after output markers such as: `Output:`, `OUTPUT:`, `Final:`, `Answer:`, `Result:`, `Response:`

Configuration:

  • SFT_MASK_COT = 1
  • SFT_OUTPUT_LEARN_MODE = "after_marker"
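As a rough illustration, marker-based loss masking can be sketched like this (the function name and the character-offset format are assumptions for illustration, not the actual training code):

```python
IGNORE_INDEX = -100  # label value that cross-entropy loss skips

MARKERS = ("Output:", "OUTPUT:", "Final:", "Answer:", "Result:", "Response:")

def mask_before_marker(labels, offsets, text):
    """Mask every token that starts before the first output marker.

    `offsets` are (start, end) character spans per token, as returned by
    a tokenizer called with return_offsets_mapping=True.
    """
    positions = [text.find(m) for m in MARKERS if m in text]
    if not positions:
        return labels  # no marker found: leave labels untouched
    cut = min(positions)
    return [IGNORE_INDEX if start < cut else label
            for label, (start, _end) in zip(labels, offsets)]
```

With this scheme, only tokens at or after the marker contribute to the loss, so the Chain-of-Thought prefix is never directly supervised.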

Task-aware Upsampling (IMPORTANT)

This training run uses controlled upsampling to rebalance specific structured transformation tasks.

Upsampling rules applied:

| Task Type    | Multiplier |
|--------------|------------|
| text_to_toml | 1.4        |
| text_to_xml  | 1.4        |
| json_to_xml  | 1.3        |
| yaml_to_xml  | 1.3        |
| csv_to_xml   | 1.3        |
| toml_to_xml  | 1.3        |

Configuration:

  • SFT_USE_UPSAMPLING = 1

This improves XML-heavy transformation robustness while preserving multi-format generalization.


Training Configuration

Base Model

Qwen/Qwen3-4B-Instruct-2507

Dataset

u-10bei/structured_data_with_cot_dataset_512_v2

Method

QLoRA (4-bit) via Unsloth


Hyperparameters

| Parameter                    | Value  |
|------------------------------|--------|
| Max sequence length          | 512    |
| Epochs                       | 1      |
| Max steps                    | 200    |
| Learning rate                | 1.5e-5 |
| Warmup ratio                 | 0.05   |
| Weight decay                 | 0.0    |
| Per-device train batch size  | 2      |
| Per-device eval batch size   | 2      |
| Gradient accumulation        | 8      |
| Effective batch size         | 16     |
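The effective batch size follows directly from the per-device batch size and gradient accumulation (assuming a single device, which is not stated above):

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_devices = 1  # assumption: single-GPU training

effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_devices)  # 2 * 8 * 1 = 16
```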

LoRA Configuration

| Parameter      | Value                          |
|----------------|--------------------------------|
| r              | 32                             |
| alpha          | 64                             |
| dropout        | 0.1                            |
| target modules | q_proj, k_proj, v_proj, o_proj |
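Expressed as a `peft` `LoraConfig`, these values would look roughly like this (the `task_type` is an assumption; the exact config used in training is not published here):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",  # assumption: causal language modeling
)
```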

Checkpoints

Checkpoints are automatically pushed during training.

  • Saved every 25 steps
  • Maximum retained checkpoints: 2
  • Stored under: checkpoints/

Repository: Gen-oze/Qwen3-4B-Upsample_multi5
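The retention behavior (save every 25 steps, keep at most 2) can be simulated with a small sketch; in practice this corresponds to `Trainer`-style `save_steps` / `save_total_limit` settings:

```python
def retained_checkpoints(current_step, save_steps=25, limit=2):
    """Return the checkpoint steps still on disk after `current_step`."""
    saved = list(range(save_steps, current_step + 1, save_steps))
    return saved[-limit:]  # older checkpoints are deleted
```

So at step 100, only the checkpoints from steps 75 and 100 remain.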


Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "Gen-oze/Qwen3-4B-Upsample_multi5"

# Load the tokenizer and the base model
tokenizer = AutoTokenizer.from_pretrained(base)

model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the LoRA adapter weights on top of the base model
model = PeftModel.from_pretrained(model, adapter)
model.eval()
```
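Because the adapter is trained to emit structured output after a marker, a small hypothetical helper can strip any reasoning prefix from generated text and parse the payload (shown for JSON; the marker list mirrors the training setup):

```python
import json

MARKERS = ("Output:", "OUTPUT:", "Final:", "Answer:", "Result:", "Response:")

def extract_json(generated):
    """Parse the JSON that follows the first output marker, if present."""
    for marker in MARKERS:
        if marker in generated:
            return json.loads(generated.split(marker, 1)[1].strip())
    return json.loads(generated.strip())  # no marker: parse the whole text
```

The same pattern applies to the other supported formats by swapping in a YAML, TOML, XML, or CSV parser.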