qwen3-4b-structured-output-lora-last

必須ポイント(「LoRAアダプタのみ」であることを明記、ベースモデル名を明示)

1.モデルの概要

本モデルは、松尾研LLM講座2025応用編の最終課題として作成された提出モデルである。

ベースモデルには
Qwen/Qwen3-4B-Instruct-2507 を使用し、
QLoRA（4-bit）+ Unsloth により LoRA アダプターを追加学習した。

本リポジトリには LoRA アダプターの重みのみが含まれており、ベースモデルは別途ロードする必要がある。

Model Overview

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using QLoRA (4-bit, Unsloth).

This repository contains LoRA adapter weights only. The base model must be loaded separately.

2.学習目的・設計思想(Why)

assistant-only loss、COT mask(Output:以降のみ学習)

本モデルの主目的は、構造化出力（JSON / YAML / XML / TOML / CSV）を

余計な説明を混ぜず
形式を崩さず
指定された構造のみを返す

よう安定化させることである。

単に「正解を出す」だけでなく、実運用で頻発する以下の問題を減らすことを目標とした。

出力に説明文が混ざる（例: *Here is the JSON:*）
括弧・タグの崩れでパース不能になる
CSVがヘッダのみで終了する
Chain-of-Thought（中間推論）が混入する

そのため、本モデルでは

「構造化本体のみを学習対象として安定化する」

という設計思想を中心に改善を行った。

Training Objective

This adapter is trained to improve structured output accuracy (JSON / YAML / XML / TOML / CSV).

Loss is applied only to the final assistant output, while intermediate reasoning (Chain-of-Thought) is masked.

3.学習設定(How)

再現性のため必須

LoRA設定

r = 32
alpha = 64
dropout = 0.05
target modules:
q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj

構造化出力は「フォーマットの癖」を学ぶタスクであるため、 Attention層とFFN層の両方へLoRAを適用した。

rを過度に大きくせず、alphaを抑えめに設定することで、安定性と表現力のバランスを取った。

学習設定

learning rate = 1e-5
→ 形式学習には過小LRでは変化が出にくいため適度に設定
gradient accumulation = 16
→ VRAM制約下で実効バッチサイズを確保
max_steps = 320
→ エポック基準ではなくステップ基準で制御
weight_decay = 0.01
→ LoRAでも軽度の正則化を付与

シーケンス長

max_seq_len = 512

本課題では「長文対応」よりも「短い制約下で形式を壊さないこと」を優先した。

Training Configuration

Base model: Qwen/Qwen3-4B-Instruct-2507
Method: QLoRA (4-bit)
Max sequence length: 512
Epochs: 1
Learning rate: 1e-05
LoRA: r=32, alpha=64

4.使用方法(How to use)

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "Gamoooo/qwen3-4b-structured-output-lora-last"

tokenizer = AutoTokenizer.from_pretrained(
    base,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

model = PeftModel.from_pretrained(model, adapter)

prompt = "Convert this to JSON: name=John, age=30"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=200)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

5.データセット・ライセンス注意

使用データセットを変更した場合には、"Dataset License","Compliance" 欄も適切な形に書き換える

Sources & Terms (IMPORTANT)

Training data: u-10bei/structured_data_with_cot_dataset_512_v2

Dataset License: MIT License. This dataset is used and distributed under the terms of the MIT License. Compliance: Users must comply with the MIT license (including copyright notice) and the base model's original terms of use.

6.学習コードにおける主な変更点（配布コードからの変更）

学習データの assistant 出力に対し、以下の前処理を追加しました。

# 【追加】assistant 出力を正規化（構造化本体だけ + Output:\n 付与）
ds_all = ds_all.map(normalize_messages, desc="Normalize assistant outputs")
ds_all = ds_all.filter(lambda ex: ex["messages"][-1]["content"].strip() != "", desc="Drop empty after normalize")
ds_all = ds_all.filter(has_csv_two_lines, desc="Drop header-only CSV samples")

- コードフェンス内の構造化本体のみ抽出
- JSON / XML / CSV らしい開始記号を検出して抽出
- `Output:\n` を強制付与
- 正規化後に空になったサンプルを除去
- CSVがヘッダのみ（2行未満）のサンプルを除去

これにより、

- 説明文や前置きの混入を抑制
- Chain-of-Thought 混入の抑制
- 「構造化フォーマット本体」のみを学習対象に集中

させる設計としました。

### 3.2 データ品質フィルタの強化

assistant-only loss 設計では、
教師トークンが0（all-masked）になるサンプルが存在し得ます。

そのため、

- all-masked サンプルを事前に検出
- 学習・評価データから除去
- フィルタ後に再チェック

を追加し、NaN発生や評価不安定を防止しました。

### 3.3 ハイパーパラメータの再設計

配布コードから以下を調整しました。

- LoRA容量の見直し（r, alpha, dropout）
- 学習率の再設定
- gradient accumulation の増加
- step基準での学習制御

構造化出力は「形式の癖」を学ぶタスクであるため、
安定性と学習進行のバランスを重視しています。

### 3.4 学習健全性モニタの活用

labels のうち実際に loss 対象になっているトークン割合
（valid_ratio）を定期出力するコールバックを活用し、

- 学習が実際に assistant 出力へ乗っているか
- 教師信号が十分あるか

を確認できる設計としました。

## 本変更の意図

配布コードは汎用的な SFT 学習コードですが、
本提出モデルでは

> 「構造化出力に特化した安定学習」

を目的として、

- 教師データ品質の向上
- CoT混入の抑制
- フォーマット崩れの予防
- 学習の安定化

に重点を置いた改良を行っています。

Downloads last month: -

Model tree for Gamoooo/qwen3-4b-structured-output-lora-last

Base model

Qwen/Qwen3-4B-Instruct-2507

Adapter

(5500)

this model

Gamoooo
/

qwen3-4b-structured-output-lora-last