---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
  - daichira/structured-5k-mix-sft
  - daichira/structured-hard-sft-4k
  - u-10bei/dpo-dataset-qwen-cot
language:
  - en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
  - model-soup
  - weight-average
  - structured-output
  - qwen
  - dpo
  - sft
---

# exp040-soup-3model-weighted

A weighted model soup of three fine-tuned models for structured output generation (JSON / YAML / TOML / XML / CSV).

Full 16-bit merged weights. No adapter loading required.

## Model Soup Configuration

This model was created by weighted averaging of three independently trained models:

| Weight | Model | Training | Score |
|--------|-------|----------|-------|
| 0.50 | tomofusa/exp017-dpo-ipo-merged | SFT + DPO (IPO, lr=5e-7) | 0.789 |
| 0.25 | tomofusa/exp020-simpo-merged | SFT + CPO/SimPO (beta=2.5) | 0.789 |
| 0.25 | tomofusa/exp034-toml-upsample-dpo-merged | SFT (TOML upsampled) + DPO (IPO) | 0.765 |

Soup method: `model_A * 0.5 + model_B * 0.25 + model_C * 0.25`, applied to all weight tensors.
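The averaging step can be sketched as follows. This is a minimal illustration, not the exact merge script used; `weighted_soup` is a hypothetical helper name, and it assumes all models share identical parameter keys and shapes (true here, since all three derive from the same base model).

```python
def weighted_soup(state_dicts, weights):
    """Weighted average of matching parameter tensors across models."""
    assert abs(sum(weights) - 1.0) < 1e-6, "weights should sum to 1"
    keys = state_dicts[0].keys()
    # sum() starts from 0, so this works for both floats and torch tensors
    return {k: sum(w * sd[k] for w, sd in zip(weights, state_dicts)) for k in keys}

# Toy example with scalar "tensors"; real use would pass model.state_dict() mappings.
souped = weighted_soup(
    [{"w": 1.0}, {"w": 2.0}, {"w": 4.0}],
    [0.50, 0.25, 0.25],
)
```

With real models, the resulting dict would be loaded back via `model.load_state_dict(souped)` before saving the merged 16-bit checkpoint.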

## Training Pipeline (per source model)

All source models share the same base pipeline:

1. Base model: Qwen/Qwen3-4B-Instruct-2507
2. SFT: QLoRA on structured-output data (7,500 samples)
3. DPO: IPO/SimPO on u-10bei/dpo-dataset-qwen-cot (4,040 samples)
   - lr=5e-7, beta=0.1, epochs=1, LoRA r=64 / alpha=128
4. Merge + Soup: each model merged to 16-bit, then weighted-averaged
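For reference, the IPO objective used in the DPO step reduces to a squared-margin loss on sequence log-probabilities: it pushes the policy's chosen-vs-rejected log-ratio gap toward `1/(2*beta)`. A per-example sketch in plain Python (the actual trainer operates on batched tensors; `ipo_loss` is an illustrative helper, not code from this repo):

```python
def ipo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """IPO (Azar et al., 2023): squared distance between the policy's
    preference margin (relative to the reference model) and 1/(2*beta)."""
    margin = (policy_chosen_logp - policy_rejected_logp) - \
             (ref_chosen_logp - ref_rejected_logp)
    return (margin - 1.0 / (2.0 * beta)) ** 2

# With beta=0.1 the target margin is 5.0, so a margin of exactly 5 gives zero loss.
zero = ipo_loss(5.0, 0.0, 0.0, 0.0, beta=0.1)
```

Unlike the standard DPO sigmoid loss, this quadratic form bounds how far the policy is rewarded for increasing the margin, which is why a small `beta` (here 0.1) still yields a finite target.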

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "tomofusa/exp040-soup-3model-weighted"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate with the Qwen chat template (example prompt).
messages = [{"role": "user", "content": "Convert to JSON: name=Alice, age=30"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

## Sources & Terms